Updated phases documentation

Taddeus Kroes, 11 years ago
Parent
Commit
8259c48821
8 changed files with 110 additions and 105 deletions
  1. phases/boolop.mli (+5 -4)
  2. phases/constprop.mli (+55 -50)
  3. phases/context.mli (+7 -7)
  4. phases/desug.mli (+37 -36)
  5. phases/dimreduce.mli (+1 -1)
  6. phases/index.mli (+3 -5)
  7. phases/parse.mli (+1 -1)
  8. phases/peephole.mli (+1 -1)

+ 5 - 4
phases/boolop.mli

@@ -9,10 +9,11 @@
     transforming conjunction and disjunction operators into conditional
     operations of the form [a ? b : c]. This operation is supported by the
     assembly phase, which handles them similarly to if-else statements. The
-    benefit of using conditional operations, is that thiese are still
-    expressions. A transformation into actual if-else statements would retuire
-    adding additional statements, which creates other problems which we will not
-    discuss here.
+    benefit of using conditional operations, is that they are still expressions.
+    A transformation into actual if-else statements would require adding
+    additional statements, which is not only more difficult to implement, but
+    also changes the evaluation order, which (however in agreement with the C
+    standard) may lead to confusing behaviour.
 
     The applied transformations are as follows:
 {v b1 == b2    ==>   (int)b1 == (int)b2
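
Note on the hunk above: the rewrite of conjunction and disjunction into conditional expressions can be sketched as follows. This is a minimal illustration, not the code in [boolop.ml]: the [expr] type, its constructors and the function [lower] are hypothetical, and the two rewrite rules are the usual short-circuit encodings, assumed here because the transformation table is cut off by the hunk.

```ocaml
(* Hypothetical miniature expression type; the real CiviC AST differs. *)
type expr =
  | Bool of bool
  | Var of string
  | And of expr * expr
  | Or of expr * expr
  | Cond of expr * expr * expr  (* a ? b : c *)

(* Assumed rewrite rules:
     a && b  ==>  a ? b : false
     a || b  ==>  a ? true : b
   The result is still an expression, so no statements have to be added. *)
let rec lower = function
  | And (a, b) -> Cond (lower a, lower b, Bool false)
  | Or (a, b) -> Cond (lower a, Bool true, lower b)
  | Cond (c, t, e) -> Cond (lower c, lower t, lower e)
  | (Bool _ | Var _) as e -> e
```

Because the right-hand operand only appears in one branch of the conditional, it is evaluated only when the left-hand operand does not already decide the result.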

+ 55 - 50
phases/constprop.mli

@@ -1,24 +1,24 @@
-(** Rudimentary constant propagation and constant folding on generated
-    variables. *)
+(** Rudimentary constant propagation, constant folding and arithmetic
+    simplification on generated variables. *)
 
-(** The compiler sometimes generates variables of the form [__foo_1__], to make
-    sure that expressions are only executed once. In many cases, this leads to
-    over-complex constructions with more variable definitions den necessary, for
-    example when converting for-loops to while-loops. We use the knowledge of
-    these variables being only assigned once, by propagating the constant values
-    to their occurrences, and then apply arithmetic simplification to operators
-    to reduce the size and complexity of the generated code. Note that this can
-    only be applied to constants. For variables in general, some form of
-    liveness analysis would be required (e.g. first convert to Static Single
-    Assignment form). Expressions can only be propagated when they have no side
-    effects, i.e. when they do not contain function calls. The current
-    implementation only propagates [Bool|Int|Float] constants and simple
-    variable uses ([VarUse]).
+(** The compiler sometimes generates variables of the form [_foo_1_],
+    indicating that the variable is assigned exactly once (it is in SSA form).
+    In many cases, generated variables leads to over-complex constructions with
+    more variable definitions then necessary, for example when converting
+    for-loops to while-loops. We use the knowledge of these variables being in
+    SSA form, by propagating the constant values to their uses, and then apply
+    arithmetic simplification to operators to reduce the size and complexity of
+    the generated code. Note that this can only be applied when the assigned
+    expression is a constant or an SSA variable, since these have no side
+    effects and will not change in between the assignment and their uses. For
+    optimisation regular of variables, some form of liveness analysis would be
+    required.
 
-    Constant propagation is merged with some some arithmetic simplification here,
-    specifically targeting optimization oppertunities created bij earlier
-    constant propagation. This is utilized, for example, in array index
-    calculation when array dimensions are constant.
+    Constant propagation is supplemented with constand folding and some
+    arithmetic simplification, the latter specifically targeting optimisation
+    oppertunities created by earlier constant propagation. For example, in and
+    array index calculation when constant array dimensions are propagated, the
+    index calculation can often be simplified.
 
     The following example demonstrates the effectivity of this phase. An array
     assignment is transformed into a for-loop, which is transformed into a
@@ -26,51 +26,56 @@
     simplified.
 
 {v void foo() \{
-    int[2, 2] arr = [[1, 2], 3];
+    int[2, 2] arr = 3;
 \} v}
 
 After desugaring and array dimension reduction this becomes:
 
 {v void foo() \{
-    int ____i_6_7;
-    int __arr_1_2__;
-    int __arr_2_1__;
-    int __const_3__;
-    int __const_4__;
-    int __const_5__;
+    int _i_4;
+    int _stop_5_;
+    int _step_6_;
+    int _i_7;
+    int _stop_8_;
+    int _step_9_;
+    int _arr_0_;
+    int _arr_1_;
+    int _scalar_1_;
     int[] arr;
-    int __stop_8__;
-    int __step_9__;
-    __arr_1_2__ = 2;
-    __arr_2_1__ = 2;
-    __const_3__ = 1;
-    __const_4__ = 2;
-    __const_5__ = 3;
-    arr := <allocate>((__arr_1_2__ * __arr_2_1__));
-    arr[((0 * __arr_2_1__) + 0)] = __const_3__;
-    arr[((0 * __arr_2_1__) + 1)] = __const_4__;
-    ____i_6_7 = 0;
-    __stop_8__ = __arr_2_1__;
-    __step_9__ = 1;
-    while (((__step_9__ > 0) ? (____i_6_7 < __stop_8__) : (____i_6_7 > __stop_8__))) \{
-        arr[((1 * __arr_2_1__) + ____i_6_7)] = __const_5__;
-        ____i_6_7 = (____i_6_7 + __step_9__);
+    _arr_0_ = 2;
+    _arr_1_ = 2;
+    _scalar_1_ = 3;
+    arr := <allocate>((_arr_0_ * _arr_1_));
+    _i_4 = 0;
+    _stop_5_ = _arr_0_;
+    _step_6_ = 1;
+    while (((_step_6_ > 0) ? (_i_4 < _stop_5_) : (_i_4 > _stop_5_))) \{
+        _i_7 = 0;
+        _stop_8_ = _arr_1_;
+        _step_9_ = 1;
+        while (((_step_9_ > 0) ? (_i_7 < _stop_8_) : (_i_7 > _stop_8_))) \{
+            arr[((_i_4 * _arr_1_) + _i_7)] = _scalar_1_;
+            _i_7 = (_i_7 + _step_9_);
+        \}
+        _i_4 = (_i_4 + _step_6_);
     \}
-
 \} v}
 
 Constant propagation reduces this to:
 
 {v void foo() \{
-    int ____i_6_7;
+    int _i_4;
+    int _i_7;
     int[] arr;
     arr := <allocate>(4);
-    arr[0] = 1;
-    arr[1] = 2;
-    ____i_6_7 = 0;
-    while ((____i_6_7 < 2)) \{
-        arr[(2 + ____i_6_7)] = 3;
-        ____i_6_7 = (____i_6_7 + 1);
+    _i_4 = 0;
+    while ((_i_4 < 2)) \{
+        _i_7 = 0;
+        while ((_i_7 < 2)) \{
+            arr[((_i_4 * 2) + _i_7)] = 3;
+            _i_7 = (_i_7 + 1);
+        \}
+        _i_4 = (_i_4 + 1);
     \}
 \} v}
     *)
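
Note on the example above: the combination of propagation, folding and arithmetic simplification can be sketched on a miniature expression type. This is an illustration under stated assumptions, not the code in [constprop.ml]: the [expr] type, the association-list environment [env] and the function [simplify] are hypothetical, and only [+] and [*] are handled.

```ocaml
(* Hypothetical miniature expression type without function calls, so every
   expression is free of side effects. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* [env] maps single-assignment variables to their known constant value. *)
let rec simplify env = function
  | Int n -> Int n
  | Var x ->
    (* constant propagation: replace a use by the assigned constant *)
    (match List.assoc_opt x env with Some n -> Int n | None -> Var x)
  | Add (a, b) ->
    (match simplify env a, simplify env b with
     | Int x, Int y -> Int (x + y)           (* constant folding *)
     | Int 0, e | e, Int 0 -> e              (* 0 + e = e *)
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match simplify env a, simplify env b with
     | Int x, Int y -> Int (x * y)           (* constant folding *)
     | Int 0, _ | _, Int 0 -> Int 0          (* safe: no side effects *)
     | Int 1, e | e, Int 1 -> e              (* 1 * e = e *)
     | a', b' -> Mul (a', b'))
```

With [_arr_1_] known to be [2], the index expression [(_i_4 * _arr_1_) + _i_7] becomes [(_i_4 * 2) + _i_7], which matches the reduced loop body shown in the hunk.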

+ 7 - 7
phases/context.mli

@@ -2,23 +2,23 @@
 
 (** The desugared CiviC code contains [Var], [FunCall] and [Assign] nodes. These
     all use variables or functions identified by a [string] name. The context
-    analysis phase links each occurrence of this node to a declaration: a
+    analysis phase links each variable occurrence to its declaration: a
     [VarDec], [Param], [Dim], [GlobalDe[cf]] or [FunDe[cf]].  Since the original
     nodes only have a [string] field to save the declaration, new node types
     have been added which replace the name with a declaration node: [VarUse],
     [FunUse], and [VarLet].
 
-    The phase traverses into functions, but first finds declarations in the
-    entire outer scope of the function, since functions can use any function of
-    variable that is defined within the same scope.
-
     The whole analysis is done in one traversal. When a declaration node is
     encountered, its name and declaration are added to the currect scope (a
-    mutable hash table). When a vairable of fuction use is encountered, the name
+    mutable hash table). When a variable of fuction use is encountered, the name
     and declaration are looked up in the current scope. The scope is duplicated
     when entering a function, and restored when exiting the function, so that
     functions that are not subroutines of each other, do not share inner variable
-    definitions. *)
+    definitions. Note that the traversal traverses into functions AFTER it has
+    found all declarations in the outer scope of the function, since functions
+    can use any function of variable that is defined within the same scope (also
+    those defined after the function itself).
+    *)
 
 (** Traversal that replaces names with declarations. Exported for use in other
     phases. *)
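
Note on the scope handling described in this file: the idea of a mutable hash table that is duplicated on function entry, with function bodies visited only after all declarations of the enclosing scope are known, can be sketched as follows. The [node] type and the function [unresolved] are hypothetical simplifications; the real phase stores declaration nodes in the table and rewrites uses, whereas this sketch only reports names that cannot be resolved.

```ocaml
(* Hypothetical miniature AST; the real CiviC node types differ. *)
type node =
  | VarDec of string              (* variable declaration *)
  | Use of string                 (* variable or function use *)
  | FunDef of string * node list  (* function with a body *)

(* Returns the names of uses that have no declaration in scope. *)
let unresolved (program : node list) : string list =
  let errors = ref [] in
  let rec block scope nodes =
    let deferred = ref [] in
    List.iter (function
      | VarDec name -> Hashtbl.replace scope name ()
      | FunDef (name, body) ->
        Hashtbl.replace scope name ();
        (* do not enter the body yet: later declarations may be used in it *)
        deferred := body :: !deferred
      | Use name ->
        if not (Hashtbl.mem scope name) then errors := name :: !errors)
      nodes;
    (* each function works on its own copy, so sibling functions do not
       see each other's inner declarations *)
    List.iter (fun body -> block (Hashtbl.copy scope) body)
      (List.rev !deferred)
  in
  block (Hashtbl.create 16) program;
  List.rev !errors
```

Working on a copy makes the "restore on exit" step implicit: the outer table is never modified by declarations inside a function body.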

+ 37 - 36
phases/desug.mli

@@ -30,12 +30,13 @@ void foo() \{
 
     {3 Array initialisations}
 
-    A more complex class of initialisations are array initialisations. Arrays
-    can be initialised to a scalar value or to an array constant in bracket
-    notation. A scalar value is rewritten to a nested for-loop over all array
-    dmensions, with an assignment in the most nested loop. An array constant is
-    rewritten to a series of separate assign statements to the corresponding
-    array indices. The following example shows both transformations:
+    A more complex class of initialisations is that of array initialisations.
+    Arrays can be initialised to a scalar value or to an array literal in
+    bracket notation. A scalar value is rewritten to a nested for-loop over all
+    array dmensions, with an assignment in the most nested loop. An array
+    constant is rewritten to a series of separate assign statements to the
+    corresponding array indices. The following example shows both
+    transformations:
 {v void foo() \{
     int[3] a = 4;
     int[2, 2] b = [[3, 4], [5, 6]];
@@ -62,9 +63,9 @@ void foo() \{
     example to maintain readability.
 
     Note that array constants in bracket expressions must have a nesting level
-    that is equal to the number of array dimensions, or an error will occur.
+    that is equal to the number of array dimensions, else an error will occur.
 
-    {2 Prevent incorrect double evaluation}
+    {2 Move array dimensions and scalars into new variables}
 
     In the following code:
 {v int twos = 0;
@@ -89,44 +90,44 @@ void foo() \{
     and array dimensions, and replacing the original expression with the
     generated variables. Note that these variables are marked so-called
     "constant variables" since they are known to be assigned exactly once, and
-    thus optimizable by {!Constprop} in some cases. This way, only the
-    non-constant expressions are defined in new variables in the resulting code.
+    thus likely optimizable by {!Constprop}. This way, only the non-constant
+    expressions are defined in new variables in the final code.
 
     In the example above, [int[2, two()] a = two();] is transformed as follows:
 {v     ...
-    int _a_1_1_ = 2;  // will be propagated back by constant propagation
-    int _a_2_2_ = two();
-    int _scalar_3_ = two();
-    int[_a_1_1_, _a_2_2_] a = _scalar_3_;
+    int _a_0_ = 2;  // 2 will be propagated back by constant propagation
+    int _a_1_ = two();
+    int _scalar_1_ = two();
+    int[_a_0_, _a_1_] a = _scalar_1_;
     ...  v}
 resulting in:
 {v     ...
-    int _a_1_1_;
-    int _a_2_2_;
-    int _scalar_3_;
-    int[_a_1_1_, _a_2_2_] a;
-    _a_1_1_ = 2;
-    _a_2_2_ = two();
-    _scalar_3_ = two();
-    a := <allocate>(_a_1_1_, _a_2_2_);
-    for (int _i_4 = 0, _a_1_1_) \{
-        for (int _i_5 = 0, _a_2_2_) \{
-            a[_i_4, _i_5] = _scalar_3_;
+    int _a_0_;
+    int _a_1_;
+    int _scalar_1_;
+    int[_a_0_, _a_1_] a;
+    _a_0_ = 2;
+    _a_1_ = two();
+    _scalar_1_ = two();
+    a := <allocate>(_a_0_, _a_1_);
+    for (int _i_2 = 0, _a_0_) \{
+        for (int _i_3 = 0, _a_1_) \{
+            a[_i_2, _i_3] = _scalar_1_;
         \}
     \}
     ...  v}
 
-    The [_a_1_1_] here is formed from the array name [a], the number of the
-    dimension [1], and a global counter variable that happened to be [1] at the
-    moment he variable was generated. The counter is necessary to make the
-    variable name unique, even when the program contains illegal name clashes,
-    which would yield weird errors during context analysis. E.g., a second
-    definition [int[2] a;] would generate a new variable [_a_1] which would
-    clash with the earlier [_a_1], yielding an error on compiler-generated code
-    instead of on the definition of [a].
-
-    Note that the for-loops are actually transformed into while-loops, but not
-    in this example in order to maintain readability.
+    The transformation described above is applied to all array definitions,
+    including extern arrays. Although dimensions of extern arrays are not
+    expressions (but identifiers), the transformation is necessary in order to
+    generate consistent names to be imported/exported. E.g. in [int[n] a], [n]
+    is just a name given locally to the first dimension of [a]. Therefore it is
+    transformed into:
+{v     extern int _a_0_;
+    int[_a_0_] a; v}
+    Also, all occurrences of [n] in the rest of the module are replaced by
+    [_a_0_]. For exported arrays, the generated dimension variables need to be
+    exported as well.
 
     {2 Transforming for-loops to while-loops}
 

+ 1 - 1
phases/dimreduce.mli

@@ -2,7 +2,7 @@
     arrays, and pass original array dimensions in function calls. *)
 
 (**
-    This phase lowers multi-dimensional ararys to one-dimensional arrays. This
+    This phase lowers multi-dimensional arrays to one-dimensional arrays. This
     transformation is done in two steps.
 
     In the first step, function calls and function parameter lists are modified.

+ 3 - 5
phases/index.mli

@@ -11,11 +11,9 @@
     variable/function uses with the original [Var|Assign|FunCall] nodes. Then,
     index analysis is performed on declarations. Finally, the
     {!Context.analyse_context} traversal is re-run to carry the [Index]
-    annotations to the variable/function uses.
-
-    Note that we can safely assume that no errors will occur during this context
-    analysis, since incorrect uses would have been spotted by the earlier
-    context analysis already.
+    annotations to the variable/function uses. Note that we can safely assume
+    that no errors will occur during this context analysis, since incorrect uses
+    would have been identified by the earlier context analysis already.
     *)
 
 (** Main phase function, called by {!Main}. *)

+ 1 - 1
phases/parse.mli

@@ -5,7 +5,7 @@
     [LocMsg], so that the error can be highlighted in the input file code.
 
     The global files [lexer.mll] and [parser.mly] implement the grammar
-    specified by the CiviC language manual This includes the extensions of
+    specified by the CiviC language manual. This includes the extensions of
     nested functions and multi-dimensional arrays. The entire CiviC grammar
     implementation is summarized below. Note that Menhir parser syntax is used
     on some occasions. {v

+ 1 - 1
phases/peephole.mli

@@ -33,7 +33,7 @@ becomes [jump label].
     i{inc,dec} L C   |   i{inc,dec}_1 L v}
 
     Note that the [iload] and [iloadc] may also be in reverse order, which in
-    CiviC code is the difference between [i = i + 1;] and [i = 1 + i]. Both
+    CiviC code is the difference between [i = i + 1;] and [i = 1 + i;]. Both
     orders of succession are supported by the implementation.
     *)