
Updated phases documentation

Taddeus Kroes, 11 years ago
Commit 8259c48821
8 changed files with 110 additions and 105 deletions
  1. phases/boolop.mli (+5 -4)
  2. phases/constprop.mli (+55 -50)
  3. phases/context.mli (+7 -7)
  4. phases/desug.mli (+37 -36)
  5. phases/dimreduce.mli (+1 -1)
  6. phases/index.mli (+3 -5)
  7. phases/parse.mli (+1 -1)
  8. phases/peephole.mli (+1 -1)

+ 5 - 4
phases/boolop.mli

@@ -9,10 +9,11 @@
     transforming conjunction and disjunction operators into conditional
     operations of the form [a ? b : c]. This operation is supported by the
     assembly phase, which handles them similarly to if-else statements. The
-    benefit of using conditional operations, is that thiese are still
-    expressions. A transformation into actual if-else statements would retuire
-    adding additional statements, which creates other problems which we will not
-    discuss here.
+    benefit of using conditional operations is that they are still expressions.
+    A transformation into actual if-else statements would require adding
+    additional statements, which is not only more difficult to implement, but
+    also changes the evaluation order, which (although in agreement with the C
+    standard) may lead to confusing behaviour.
 
     The applied transformations are as follows:
 {v b1 == b2    ==>   (int)b1 == (int)b2
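
The boolop.mli text explains why conjunction and disjunction become conditional expressions, but the hunk ends before the full table of transformations. The sketch below assumes the usual short-circuit mapping (`a && b` to `a ? b : false`, `a || b` to `a ? true : b`) and uses a hypothetical `expr` type rather than the compiler's own AST:

```ocaml
(* Hypothetical, simplified expression AST (not the compiler's own types). *)
type expr =
  | Bool of bool
  | Var of string
  | And of expr * expr
  | Or of expr * expr
  | Cond of expr * expr * expr (* c ? t : e *)

(* Bottom-up rewrite of short-circuit operators (assumed mapping):
     a && b  ==>  a ? b : false
     a || b  ==>  a ? true : b
   [b] is still only evaluated when needed, so short-circuiting is kept. *)
let rec lower = function
  | And (a, b) -> Cond (lower a, lower b, Bool false)
  | Or (a, b) -> Cond (lower a, Bool true, lower b)
  | Cond (c, t, e) -> Cond (lower c, lower t, lower e)
  | (Bool _ | Var _) as e -> e

let rec to_string = function
  | Bool b -> string_of_bool b
  | Var v -> v
  | And (a, b) -> Printf.sprintf "(%s && %s)" (to_string a) (to_string b)
  | Or (a, b) -> Printf.sprintf "(%s || %s)" (to_string a) (to_string b)
  | Cond (c, t, e) ->
    Printf.sprintf "(%s ? %s : %s)" (to_string c) (to_string t) (to_string e)

let () =
  (* (a && b) || c *)
  let e = Or (And (Var "a", Var "b"), Var "c") in
  print_endline (to_string (lower e))
(* prints: ((a ? b : false) ? true : c) *)
```

Because the result is still an expression, it can be nested anywhere the original operator appeared, which is the advantage the documentation gives for preferring conditional expressions over if-else statements.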

+ 55 - 50
phases/constprop.mli

@@ -1,24 +1,24 @@
-(** Rudimentary constant propagation and constant folding on generated
-    variables. *)
+(** Rudimentary constant propagation, constant folding and arithmetic
+    simplification on generated variables. *)
 
-(** The compiler sometimes generates variables of the form [__foo_1__], to make
-    sure that expressions are only executed once. In many cases, this leads to
-    over-complex constructions with more variable definitions den necessary, for
-    example when converting for-loops to while-loops. We use the knowledge of
-    these variables being only assigned once, by propagating the constant values
-    to their occurrences, and then apply arithmetic simplification to operators
-    to reduce the size and complexity of the generated code. Note that this can
-    only be applied to constants. For variables in general, some form of
-    liveness analysis would be required (e.g. first convert to Static Single
-    Assignment form). Expressions can only be propagated when they have no side
-    effects, i.e. when they do not contain function calls. The current
-    implementation only propagates [Bool|Int|Float] constants and simple
-    variable uses ([VarUse]).
+(** The compiler sometimes generates variables of the form [_foo_1_],
+    indicating that the variable is assigned exactly once (it is in SSA form).
+    In many cases, generated variables lead to over-complex constructions with
+    more variable definitions than necessary, for example when converting
+    for-loops to while-loops. We exploit the fact that these variables are in
+    SSA form by propagating constant values to their uses, and then applying
+    arithmetic simplification to operators to reduce the size and complexity of
+    the generated code. Note that this can only be applied when the assigned
+    expression is a constant or an SSA variable, since these have no side
+    effects and do not change between the assignment and their uses. To
+    optimise regular variables, some form of liveness analysis would be
+    required.
 
-    Constant propagation is merged with some some arithmetic simplification here,
-    specifically targeting optimization oppertunities created bij earlier
-    constant propagation. This is utilized, for example, in array index
-    calculation when array dimensions are constant.
+    Constant propagation is supplemented with constant folding and some
+    arithmetic simplification, the latter specifically targeting optimisation
+    opportunities created by earlier constant propagation. For example, when
+    constant array dimensions are propagated into an array index calculation,
+    the index calculation can often be simplified.
 
     The following example demonstrates the effectiveness of this phase. An array
     assignment is transformed into a for-loop, which is transformed into a
@@ -26,51 +26,56 @@
     simplified.
 
 {v void foo() \{
-    int[2, 2] arr = [[1, 2], 3];
+    int[2, 2] arr = 3;
 \} v}
 
 After desugaring and array dimension reduction this becomes:
 
 {v void foo() \{
-    int ____i_6_7;
-    int __arr_1_2__;
-    int __arr_2_1__;
-    int __const_3__;
-    int __const_4__;
-    int __const_5__;
+    int _i_4;
+    int _stop_5_;
+    int _step_6_;
+    int _i_7;
+    int _stop_8_;
+    int _step_9_;
+    int _arr_0_;
+    int _arr_1_;
+    int _scalar_1_;
     int[] arr;
-    int __stop_8__;
-    int __step_9__;
-    __arr_1_2__ = 2;
-    __arr_2_1__ = 2;
-    __const_3__ = 1;
-    __const_4__ = 2;
-    __const_5__ = 3;
-    arr := <allocate>((__arr_1_2__ * __arr_2_1__));
-    arr[((0 * __arr_2_1__) + 0)] = __const_3__;
-    arr[((0 * __arr_2_1__) + 1)] = __const_4__;
-    ____i_6_7 = 0;
-    __stop_8__ = __arr_2_1__;
-    __step_9__ = 1;
-    while (((__step_9__ > 0) ? (____i_6_7 < __stop_8__) : (____i_6_7 > __stop_8__))) \{
-        arr[((1 * __arr_2_1__) + ____i_6_7)] = __const_5__;
-        ____i_6_7 = (____i_6_7 + __step_9__);
+    _arr_0_ = 2;
+    _arr_1_ = 2;
+    _scalar_1_ = 3;
+    arr := <allocate>((_arr_0_ * _arr_1_));
+    _i_4 = 0;
+    _stop_5_ = _arr_0_;
+    _step_6_ = 1;
+    while (((_step_6_ > 0) ? (_i_4 < _stop_5_) : (_i_4 > _stop_5_))) \{
+        _i_7 = 0;
+        _stop_8_ = _arr_1_;
+        _step_9_ = 1;
+        while (((_step_9_ > 0) ? (_i_7 < _stop_8_) : (_i_7 > _stop_8_))) \{
+            arr[((_i_4 * _arr_1_) + _i_7)] = _scalar_1_;
+            _i_7 = (_i_7 + _step_9_);
+        \}
+        _i_4 = (_i_4 + _step_6_);
     \}
-
 \} v}
 
 Constant propagation reduces this to:
 
 {v void foo() \{
-    int ____i_6_7;
+    int _i_4;
+    int _i_7;
     int[] arr;
     arr := <allocate>(4);
-    arr[0] = 1;
-    arr[1] = 2;
-    ____i_6_7 = 0;
-    while ((____i_6_7 < 2)) \{
-        arr[(2 + ____i_6_7)] = 3;
-        ____i_6_7 = (____i_6_7 + 1);
+    _i_4 = 0;
+    while ((_i_4 < 2)) \{
+        _i_7 = 0;
+        while ((_i_7 < 2)) \{
+            arr[((_i_4 * 2) + _i_7)] = 3;
+            _i_7 = (_i_7 + 1);
+        \}
+        _i_4 = (_i_4 + 1);
     \}
 \} v}
     *)
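
To make the constprop.mli description concrete, the following sketch propagates constants and SSA variables, folds constants and applies a few arithmetic identities. Everything here is an assumption for illustration: the `expr` type, the `is_ssa` test (a name that starts and ends with `_`, following the [_foo_1_] convention), and the encoding of comparison results as `1`/`0`. It reproduces three simplifications from the example above: the allocation size, the loop condition and the array index.

```ocaml
(* Hypothetical, simplified IR for illustration; comparisons yield 1 or 0. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Lt of expr * expr
  | Gt of expr * expr
  | Cond of expr * expr * expr

let rec to_string = function
  | Int n -> string_of_int n
  | Var x -> x
  | Add (a, b) -> bin "+" a b
  | Mul (a, b) -> bin "*" a b
  | Lt (a, b) -> bin "<" a b
  | Gt (a, b) -> bin ">" a b
  | Cond (c, t, f) ->
    Printf.sprintf "(%s ? %s : %s)" (to_string c) (to_string t) (to_string f)

and bin op a b = Printf.sprintf "(%s %s %s)" (to_string a) op (to_string b)

(* Assumed convention: generated SSA variables look like [_foo_1_], i.e. they
   start and end with '_'. Loop counters such as [_i_4] do not qualify. *)
let is_ssa name =
  let n = String.length name in
  n > 2 && name.[0] = '_' && name.[n - 1] = '_'

(* Record [name = value] only if [name] is SSA and [value] is a constant or
   another SSA variable: those cannot change between assignment and use. *)
let record env name value =
  if is_ssa name then
    match value with
    | Int _ -> Hashtbl.replace env name value
    | Var v when is_ssa v -> Hashtbl.replace env name value
    | _ -> ()

let rec simplify env e =
  match e with
  | Int _ -> e
  | Var x ->
    (* propagation; follows chains like _stop_5_ -> _arr_0_ -> 2 *)
    (match Hashtbl.find_opt env x with Some v -> simplify env v | None -> e)
  | Add (a, b) ->
    (match simplify env a, simplify env b with
     | Int x, Int y -> Int (x + y) (* constant folding *)
     | Int 0, e' | e', Int 0 -> e'
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match simplify env a, simplify env b with
     | Int x, Int y -> Int (x * y)
     | Int 0, _ | _, Int 0 -> Int 0 (* safe only because this IR has no calls *)
     | Int 1, e' | e', Int 1 -> e'
     | a', b' -> Mul (a', b'))
  | Lt (a, b) -> fold_cmp env ( < ) (fun a b -> Lt (a, b)) a b
  | Gt (a, b) -> fold_cmp env ( > ) (fun a b -> Gt (a, b)) a b
  | Cond (c, t, f) ->
    (match simplify env c with
     | Int 0 -> simplify env f
     | Int _ -> simplify env t
     | c' -> Cond (c', simplify env t, simplify env f))

and fold_cmp env op mk a b =
  match simplify env a, simplify env b with
  | Int x, Int y -> Int (if op x y then 1 else 0)
  | a', b' -> mk a' b'

let () =
  let env = Hashtbl.create 16 in
  record env "_arr_0_" (Int 2);
  record env "_arr_1_" (Int 2);
  record env "_stop_5_" (Var "_arr_0_");
  record env "_step_6_" (Int 1);
  record env "_i_4" (Int 0); (* not SSA: ignored *)
  let size = Mul (Var "_arr_0_", Var "_arr_1_") in
  let cond =
    Cond (Gt (Var "_step_6_", Int 0),
          Lt (Var "_i_4", Var "_stop_5_"),
          Gt (Var "_i_4", Var "_stop_5_")) in
  let index = Add (Mul (Var "_i_4", Var "_arr_1_"), Var "_i_7") in
  List.iter (fun e -> print_endline (to_string (simplify env e)))
    [ size; cond; index ]
(* prints:
   4
   (_i_4 < 2)
   ((_i_4 * 2) + _i_7) *)
```

Rewriting `0 * e` to `0` discards `e`, which is only acceptable here because the toy IR contains no function calls; this mirrors the documentation's restriction to side-effect-free expressions.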

+ 7 - 7
phases/context.mli

@@ -2,23 +2,23 @@
 
 (** The desugared CiviC code contains [Var], [FunCall] and [Assign] nodes. These
     all use variables or functions identified by a [string] name. The context
-    analysis phase links each occurrence of this node to a declaration: a
+    analysis phase links each variable/function use to its declaration: a
     [VarDec], [Param], [Dim], [GlobalDe[cf]] or [FunDe[cf]].  Since the original
     nodes only have a [string] field to save the declaration, new node types
     have been added which replace the name with a declaration node: [VarUse],
     [FunUse], and [VarLet].
 
-    The phase traverses into functions, but first finds declarations in the
-    entire outer scope of the function, since functions can use any function of
-    variable that is defined within the same scope.
-
     The whole analysis is done in one traversal. When a declaration node is
     encountered, its name and declaration are added to the current scope (a
-    mutable hash table). When a vairable of fuction use is encountered, the name
+    mutable hash table). When a variable or function use is encountered, the name
     and declaration are looked up in the current scope. The scope is duplicated
     when entering a function, and restored when exiting the function, so that
     functions that are not subroutines of each other do not share inner variable
-    definitions. *)
+    definitions. Note that the traversal enters a function only AFTER it has
+    found all declarations in the outer scope of that function, since functions
+    can use any function or variable that is defined within the same scope
+    (including those defined after the function itself).
+    *)
 
 (** Traversal that replaces names with declarations. Exported for use in other
     phases. *)
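
The scoping strategy in context.mli (a mutable hash table, copied on function entry, with function bodies visited only after all declarations of the enclosing scope are known) can be sketched as follows. The `node` and `decl` types and the `Undefined` exception are hypothetical simplifications; parameters, assignments and error locations are left out.

```ocaml
(* Hypothetical, heavily simplified AST: just enough to show the scoping. *)
type node =
  | VarDec of string
  | FunDec of string * node list (* name and body; parameters omitted *)
  | Use of string                (* variable use or function call *)

type decl = DVar of string | DFun of string

exception Undefined of string

(* Returns (name, declaration) for every use, in program order. *)
let rec analyse scope nodes =
  let links = ref [] and pending = ref [] in
  (* Walk this scope level in order; function bodies are only queued. *)
  List.iter
    (function
      | VarDec n -> Hashtbl.replace scope n (DVar n)
      | FunDec (n, body) ->
        Hashtbl.replace scope n (DFun n);
        pending := body :: !pending
      | Use n -> (
          match Hashtbl.find_opt scope n with
          | Some d -> links := (n, d) :: !links
          | None -> raise (Undefined n)))
    nodes;
  (* All declarations of this level are now known, including those after a
     function. Each body gets its own copy of the scope, so sibling functions
     do not see each other's inner declarations. *)
  List.iter
    (fun body ->
      links := List.rev_append (analyse (Hashtbl.copy scope) body) !links)
    (List.rev !pending);
  List.rev !links

let () =
  (* [helper] is used inside [main] before its declaration: still resolved. *)
  let prog =
    [ FunDec ("main", [ Use "helper"; VarDec "x"; Use "x" ]);
      FunDec ("helper", []) ]
  in
  List.iter (fun (n, _) -> print_endline ("resolved " ^ n))
    (analyse (Hashtbl.create 16) prog)
(* prints:
   resolved helper
   resolved x *)
```

Deferring the bodies is what lets a function refer to a later sibling, while `Hashtbl.copy` keeps a variable declared inside one function invisible to the functions next to it.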

+ 37 - 36
phases/desug.mli

@@ -30,12 +30,13 @@ void foo() \{
 
     {3 Array initialisations}
 
-    A more complex class of initialisations are array initialisations. Arrays
-    can be initialised to a scalar value or to an array constant in bracket
-    notation. A scalar value is rewritten to a nested for-loop over all array
-    dmensions, with an assignment in the most nested loop. An array constant is
-    rewritten to a series of separate assign statements to the corresponding
-    array indices. The following example shows both transformations:
+    A more complex class of initialisations is that of array initialisations.
+    Arrays can be initialised to a scalar value or to an array literal in
+    bracket notation. A scalar value is rewritten to a nested for-loop over all
+    array dimensions, with an assignment in the innermost loop. An array
+    literal is rewritten to a series of separate assign statements to the
+    corresponding array indices. The following example shows both
+    transformations:
 {v void foo() \{
     int[3] a = 4;
     int[2, 2] b = [[3, 4], [5, 6]];
@@ -62,9 +63,9 @@ void foo() \{
     example to maintain readability.
 
     Note that array constants in bracket expressions must have a nesting level
-    that is equal to the number of array dimensions, or an error will occur.
+    that is equal to the number of array dimensions, otherwise an error will occur.
 
-    {2 Prevent incorrect double evaluation}
+    {2 Move array dimensions and scalars into new variables}
 
     In the following code:
 {v int twos = 0;
@@ -89,44 +90,44 @@ void foo() \{
     and array dimensions, and replacing the original expression with the
     generated variables. Note that these variables are marked as so-called
     "constant variables" since they are known to be assigned exactly once, and
-    thus optimizable by {!Constprop} in some cases. This way, only the
-    non-constant expressions are defined in new variables in the resulting code.
+    thus likely optimizable by {!Constprop}. This way, only the non-constant
+    expressions are defined in new variables in the final code.
 
     In the example above, [int[2, two()] a = two();] is transformed as follows:
 {v     ...
-    int _a_1_1_ = 2;  // will be propagated back by constant propagation
-    int _a_2_2_ = two();
-    int _scalar_3_ = two();
-    int[_a_1_1_, _a_2_2_] a = _scalar_3_;
+    int _a_0_ = 2;  // 2 will be propagated back by constant propagation
+    int _a_1_ = two();
+    int _scalar_1_ = two();
+    int[_a_0_, _a_1_] a = _scalar_1_;
     ...  v}
 resulting in:
 {v     ...
-    int _a_1_1_;
-    int _a_2_2_;
-    int _scalar_3_;
-    int[_a_1_1_, _a_2_2_] a;
-    _a_1_1_ = 2;
-    _a_2_2_ = two();
-    _scalar_3_ = two();
-    a := <allocate>(_a_1_1_, _a_2_2_);
-    for (int _i_4 = 0, _a_1_1_) \{
-        for (int _i_5 = 0, _a_2_2_) \{
-            a[_i_4, _i_5] = _scalar_3_;
+    int _a_0_;
+    int _a_1_;
+    int _scalar_1_;
+    int[_a_0_, _a_1_] a;
+    _a_0_ = 2;
+    _a_1_ = two();
+    _scalar_1_ = two();
+    a := <allocate>(_a_0_, _a_1_);
+    for (int _i_2 = 0, _a_0_) \{
+        for (int _i_3 = 0, _a_1_) \{
+            a[_i_2, _i_3] = _scalar_1_;
         \}
     \}
     ...  v}
 
-    The [_a_1_1_] here is formed from the array name [a], the number of the
-    dimension [1], and a global counter variable that happened to be [1] at the
-    moment he variable was generated. The counter is necessary to make the
-    variable name unique, even when the program contains illegal name clashes,
-    which would yield weird errors during context analysis. E.g., a second
-    definition [int[2] a;] would generate a new variable [_a_1] which would
-    clash with the earlier [_a_1], yielding an error on compiler-generated code
-    instead of on the definition of [a].
-
-    Note that the for-loops are actually transformed into while-loops, but not
-    in this example in order to maintain readability.
+    The transformation described above is applied to all array definitions,
+    including extern arrays. Although the dimensions of extern arrays are
+    identifiers rather than expressions, the transformation is necessary in
+    order to generate consistent names to be imported/exported. E.g. in
+    [extern int[n] a;], [n] is just a name given locally to the first dimension
+    of [a]. Therefore it is transformed into:
+{v     extern int _a_0_;
+    int[_a_0_] a; v}
+    Also, all occurrences of [n] in the rest of the module are replaced by
+    [_a_0_]. For exported arrays, the generated dimension variables need to be
+    exported as well.
 
     {2 Transforming for-loops to while-loops}
 
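The array-literal rewriting from desug.mli (one assignment per element, with a check that the nesting level matches the number of dimensions) is sketched below. The `lit` type, `flatten`, `to_civic` and the error message are hypothetical; as in the documentation, only the nesting depth is checked, not the number of elements per dimension.

```ocaml
(* Hypothetical representation of an array initialiser in bracket notation. *)
type lit = Scalar of int | Arr of lit list

exception Nesting_error of string

(* Rewrite [int[d1, ..., dn] a = <literal>;] into (indices, value) pairs,
   one assignment per element, in row-major order. A scalar initialiser
   (rewritten to nested loops instead) is not handled here. *)
let flatten ndims lit =
  let rec go depth rev_idx lit acc =
    match lit with
    | Scalar v ->
      if depth <> ndims then
        raise (Nesting_error "nesting level differs from number of dimensions");
      (List.rev rev_idx, v) :: acc
    | Arr elems ->
      if depth >= ndims then
        raise (Nesting_error "nesting level differs from number of dimensions");
      let acc, _ =
        List.fold_left
          (fun (acc, i) e -> (go (depth + 1) (i :: rev_idx) e acc, i + 1))
          (acc, 0) elems
      in
      acc
  in
  List.rev (go 0 [] lit [])

(* Render the pairs as CiviC assignments, e.g. "b[0, 1] = 4;". *)
let to_civic name assigns =
  List.map
    (fun (idx, v) ->
      Printf.sprintf "%s[%s] = %d;" name
        (String.concat ", " (List.map string_of_int idx)) v)
    assigns

let () =
  (* int[2, 2] b = [[3, 4], [5, 6]]; *)
  let b = Arr [ Arr [ Scalar 3; Scalar 4 ]; Arr [ Scalar 5; Scalar 6 ] ] in
  List.iter print_endline (to_civic "b" (flatten 2 b))
(* prints:
   b[0, 0] = 3;
   b[0, 1] = 4;
   b[1, 0] = 5;
   b[1, 1] = 6; *)
```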

+ 1 - 1
phases/dimreduce.mli

@@ -2,7 +2,7 @@
     arrays, and pass original array dimensions in function calls. *)
 
 (**
-    This phase lowers multi-dimensional ararys to one-dimensional arrays. This
+    This phase lowers multi-dimensional arrays to one-dimensional arrays. This
     transformation is done in two steps.
 
     In the first step, function calls and function parameter lists are modified.
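
Lowering to one-dimensional arrays amounts to row-major index flattening, which is what produces index expressions such as `((_i_4 * _arr_1_) + _i_7)` in the constprop.mli example. A sketch with hypothetical names, parameterised over multiplication and addition so that the same code computes either a concrete offset or a CiviC index expression:

```ocaml
(* Row-major flattening (names are hypothetical). For dimensions d1..dn and
   indices i1..in the flat index is (((i1 * d2 + i2) * d3 + i3) ... ) * dn + in;
   the first dimension d1 only matters for the allocation size. *)
let flatten ~mul ~add dims idx =
  match dims, idx with
  | _ :: rest_dims, i1 :: rest_idx
    when List.length rest_dims = List.length rest_idx ->
    List.fold_left2 (fun acc d i -> add (mul acc d) i) i1 rest_dims rest_idx
  | _ -> invalid_arg "flatten: need as many indices as dimensions"

(* Allocation size: the product of all dimensions. *)
let size ~mul = function
  | [] -> invalid_arg "size: no dimensions"
  | d :: ds -> List.fold_left mul d ds

(* Symbolic versions that build CiviC expression text. *)
let sym_mul a b = Printf.sprintf "(%s * %s)" a b
let sym_add a b = Printf.sprintf "(%s + %s)" a b

let () =
  (* Concrete offset: in int[2, 3, 4] a, a[1, 2, 3] is element 1*12 + 2*4 + 3 *)
  Printf.printf "%d\n" (flatten ~mul:( * ) ~add:( + ) [ 2; 3; 4 ] [ 1; 2; 3 ]);
  (* Symbolic, matching the constprop.mli example *)
  print_endline (size ~mul:sym_mul [ "_arr_0_"; "_arr_1_" ]);
  print_endline
    (flatten ~mul:sym_mul ~add:sym_add [ "_arr_0_"; "_arr_1_" ] [ "_i_4"; "_i_7" ])
(* prints:
   23
   (_arr_0_ * _arr_1_)
   ((_i_4 * _arr_1_) + _i_7) *)
```

Since only the dimensions after the first appear in the index, a callee receiving a flattened array needs those dimension values at run time, which is consistent with the phase passing the original array dimensions in function calls.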

+ 3 - 5
phases/index.mli

@@ -11,11 +11,9 @@
     variable/function uses with the original [Var|Assign|FunCall] nodes. Then,
     index analysis is performed on declarations. Finally, the
     {!Context.analyse_context} traversal is re-run to carry the [Index]
-    annotations to the variable/function uses.
-
-    Note that we can safely assume that no errors will occur during this context
-    analysis, since incorrect uses would have been spotted by the earlier
-    context analysis already.
+    annotations to the variable/function uses. Note that we can safely assume
+    that no errors will occur during this context analysis, since incorrect uses
+    would have been identified by the earlier context analysis already.
     *)
 
 (** Main phase function, called by {!Main}. *)

+ 1 - 1
phases/parse.mli

@@ -5,7 +5,7 @@
     [LocMsg], so that the error can be highlighted in the input file code.
 
     The global files [lexer.mll] and [parser.mly] implement the grammar
-    specified by the CiviC language manual This includes the extensions of
+    specified by the CiviC language manual. This includes the extensions of
     nested functions and multi-dimensional arrays. The entire CiviC grammar
     implementation is summarized below. Note that Menhir parser syntax is used
     on some occasions. {v

+ 1 - 1
phases/peephole.mli

@@ -33,7 +33,7 @@ becomes [jump label].
     i{inc,dec} L C   |   i{inc,dec}_1 L v}
 
     Note that the [iload] and [iloadc] may also be in reverse order, which in
-    CiviC code is the difference between [i = i + 1;] and [i = 1 + i]. Both
+    CiviC code is the difference between [i = i + 1;] and [i = 1 + i;]. Both
     orders of succession are supported by the implementation.
     *)
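
The peephole.mli note about operand order can be illustrated with a small pattern-matching pass. The full instruction table is elided in this hunk, so the patterns below (`iload L; iloadc C; iadd; istore L` becoming `iinc L C`, or `iinc_1 L` when C is 1, and likewise for `isub`/`idec`) are an assumption based on the visible right-hand side `i{inc,dec} L C | i{inc,dec}_1 L`. For simplicity the constant is modelled as a literal value rather than a constant-pool index, and the `instr` type is hypothetical.

```ocaml
(* Hypothetical subset of CiviC VM instructions. Simplification: [Iloadc]
   carries the constant's value instead of a constant-pool index. *)
type instr =
  | Iload of int       (* push local variable L *)
  | Iloadc of int      (* push constant C *)
  | Iadd
  | Isub
  | Istore of int      (* pop into local variable L *)
  | Iinc of int * int  (* L += C *)
  | Idec of int * int  (* L -= C *)
  | Iinc_1 of int
  | Idec_1 of int

let inc l c = if c = 1 then Iinc_1 l else Iinc (l, c)
let dec l c = if c = 1 then Idec_1 l else Idec (l, c)

let rec optimise = function
  (* i = i + C;  and  i = i - C; *)
  | Iload l :: Iloadc c :: Iadd :: Istore l' :: rest when l = l' ->
    inc l c :: optimise rest
  | Iload l :: Iloadc c :: Isub :: Istore l' :: rest when l = l' ->
    dec l c :: optimise rest
  (* i = C + i;  addition commutes, so the reversed order is accepted too.
     No reversed [Isub] case: i = C - i; is not a decrement. *)
  | Iloadc c :: Iload l :: Iadd :: Istore l' :: rest when l = l' ->
    inc l c :: optimise rest
  | i :: rest -> i :: optimise rest
  | [] -> []

let () =
  (* i = i + 1;  and  i = 5 + i; *)
  assert (optimise [ Iload 0; Iloadc 1; Iadd; Istore 0 ] = [ Iinc_1 0 ]);
  assert (optimise [ Iloadc 5; Iload 2; Iadd; Istore 2 ] = [ Iinc (2, 5) ])
```

In this sketch only the addition pattern accepts both operand orders, since swapping the operands of `isub` would change the result.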