11 years ago · 8259c48821
--- a/phases/boolop.mli
+++ b/phases/boolop.mli
@@ -9,10 +9,11 @@
     transforming conjunction and disjunction operators into conditional
     operations of the form [a ? b : c]. This operation is supported by the
     assembly phase, which handles them similarly to if-else statements. The
-    benefit of using conditional operations, is that thiese are still
-    expressions. A transformation into actual if-else statements would retuire
-    adding additional statements, which creates other problems which we will not
-    discuss here.
+    benefit of using conditional operations is that they are still expressions.
+    A transformation into actual if-else statements would require adding
+    additional statements, which is not only more difficult to implement, but
+    also changes the evaluation order, which (although in agreement with the C
+    standard) may lead to confusing behaviour.
 
     The applied transformations are as follows:
 {v b1 == b2    ==>   (int)b1 == (int)b2
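The hunk stops before the conjunction and disjunction rules are listed, so the following is only a minimal sketch assuming the usual short-circuit encoding: [a && b] becomes [a ? b : false] and [a || b] becomes [a ? true : b]. The [expr] type and the functions [lower] and [eval] are hypothetical and far smaller than the compiler's AST; [eval] exists only to check that the rewrite preserves meaning.

```ocaml
(* Hypothetical, minimal expression type; not the compiler's actual AST. *)
type expr =
  | Bool of bool
  | Var of string
  | And of expr * expr
  | Or of expr * expr
  | Cond of expr * expr * expr  (* [c ? t : e] *)

(* Replace [a && b] by [a ? b : false] and [a || b] by [a ? true : b],
   recursing into every subexpression. *)
let rec lower = function
  | And (a, b) -> Cond (lower a, lower b, Bool false)
  | Or (a, b) -> Cond (lower a, Bool true, lower b)
  | Cond (c, t, e) -> Cond (lower c, lower t, lower e)
  | (Bool _ | Var _) as leaf -> leaf

(* Short-circuit evaluator, used only to check that [lower] preserves meaning. *)
let rec eval env = function
  | Bool b -> b
  | Var x -> List.assoc x env
  | And (a, b) -> eval env a && eval env b
  | Or (a, b) -> eval env a || eval env b
  | Cond (c, t, e) -> if eval env c then eval env t else eval env e
```

Because a conditional evaluates only one of its branches, the right operand is still evaluated only when needed, so the rewrite keeps short-circuit behaviour while the result remains a single expression.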
			
--- a/phases/constprop.mli
+++ b/phases/constprop.mli
@@ -1,24 +1,24 @@
-(** Rudimentary constant propagation and constant folding on generated
-    variables. *)
+(** Rudimentary constant propagation, constant folding and arithmetic
+    simplification on generated variables. *)
 
-(** The compiler sometimes generates variables of the form [__foo_1__], to make
-    sure that expressions are only executed once. In many cases, this leads to
-    over-complex constructions with more variable definitions den necessary, for
-    example when converting for-loops to while-loops. We use the knowledge of
-    these variables being only assigned once, by propagating the constant values
-    to their occurrences, and then apply arithmetic simplification to operators
-    to reduce the size and complexity of the generated code. Note that this can
-    only be applied to constants. For variables in general, some form of
-    liveness analysis would be required (e.g. first convert to Static Single
-    Assignment form). Expressions can only be propagated when they have no side
-    effects, i.e. when they do not contain function calls. The current
-    implementation only propagates [Bool|Int|Float] constants and simple
-    variable uses ([VarUse]).
+(** The compiler sometimes generates variables of the form [_foo_1_],
+    indicating that the variable is assigned exactly once (it is in SSA form).
+    In many cases, generated variables lead to over-complex constructions with
+    more variable definitions than necessary, for example when converting
+    for-loops to while-loops. We use the knowledge of these variables being in
+    SSA form by propagating the constant values to their uses, and then apply
+    arithmetic simplification to operators to reduce the size and complexity of
+    the generated code. Note that this can only be applied when the assigned
+    expression is a constant or an SSA variable, since these have no side
+    effects and will not change in between the assignment and their uses. For
+    optimisation of regular variables, some form of liveness analysis would be
+    required.
 
-    Constant propagation is merged with some some arithmetic simplification here,
-    specifically targeting optimization oppertunities created bij earlier
-    constant propagation. This is utilized, for example, in array index
-    calculation when array dimensions are constant.
+    Constant propagation is supplemented with constant folding and some
+    arithmetic simplification, the latter specifically targeting optimisation
+    opportunities created by earlier constant propagation. For example, in an
+    array index calculation when constant array dimensions are propagated, the
+    index calculation can often be simplified.
 
     The following example demonstrates the effectiveness of this phase. An array
     assignment is transformed into a for-loop, which is transformed into a
@@ -26,51 +26,56 @@
     simplified.
 
 {v void foo() \{
-    int[2, 2] arr = [[1, 2], 3];
+    int[2, 2] arr = 3;
 \} v}
 
 After desugaring and array dimension reduction this becomes:
 
 {v void foo() \{
-    int ____i_6_7;
-    int __arr_1_2__;
-    int __arr_2_1__;
-    int __const_3__;
-    int __const_4__;
-    int __const_5__;
+    int _i_4;
+    int _stop_5_;
+    int _step_6_;
+    int _i_7;
+    int _stop_8_;
+    int _step_9_;
+    int _arr_0_;
+    int _arr_1_;
+    int _scalar_1_;
     int[] arr;
-    int __stop_8__;
-    int __step_9__;
-    __arr_1_2__ = 2;
-    __arr_2_1__ = 2;
-    __const_3__ = 1;
-    __const_4__ = 2;
-    __const_5__ = 3;
-    arr := <allocate>((__arr_1_2__ * __arr_2_1__));
-    arr[((0 * __arr_2_1__) + 0)] = __const_3__;
-    arr[((0 * __arr_2_1__) + 1)] = __const_4__;
-    ____i_6_7 = 0;
-    __stop_8__ = __arr_2_1__;
-    __step_9__ = 1;
-    while (((__step_9__ > 0) ? (____i_6_7 < __stop_8__) : (____i_6_7 > __stop_8__))) \{
-        arr[((1 * __arr_2_1__) + ____i_6_7)] = __const_5__;
-        ____i_6_7 = (____i_6_7 + __step_9__);
+    _arr_0_ = 2;
+    _arr_1_ = 2;
+    _scalar_1_ = 3;
+    arr := <allocate>((_arr_0_ * _arr_1_));
+    _i_4 = 0;
+    _stop_5_ = _arr_0_;
+    _step_6_ = 1;
+    while (((_step_6_ > 0) ? (_i_4 < _stop_5_) : (_i_4 > _stop_5_))) \{
+        _i_7 = 0;
+        _stop_8_ = _arr_1_;
+        _step_9_ = 1;
+        while (((_step_9_ > 0) ? (_i_7 < _stop_8_) : (_i_7 > _stop_8_))) \{
+            arr[((_i_4 * _arr_1_) + _i_7)] = _scalar_1_;
+            _i_7 = (_i_7 + _step_9_);
+        \}
+        _i_4 = (_i_4 + _step_6_);
     \}
-
 \} v}
 
 Constant propagation reduces this to:
 
 {v void foo() \{
-    int ____i_6_7;
+    int _i_4;
+    int _i_7;
     int[] arr;
     arr := <allocate>(4);
-    arr[0] = 1;
-    arr[1] = 2;
-    ____i_6_7 = 0;
-    while ((____i_6_7 < 2)) \{
-        arr[(2 + ____i_6_7)] = 3;
-        ____i_6_7 = (____i_6_7 + 1);
+    _i_4 = 0;
+    while ((_i_4 < 2)) \{
+        _i_7 = 0;
+        while ((_i_7 < 2)) \{
+            arr[((_i_4 * 2) + _i_7)] = 3;
+            _i_7 = (_i_7 + 1);
+        \}
+        _i_4 = (_i_4 + 1);
     \}
 \} v}
     *)
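A minimal sketch of the propagate-then-simplify idea, on a hypothetical straight-line IR of assignments without loops or side effects. It assumes, judging only from the names in the example above, that single-assignment variables are exactly those whose names start and end with an underscore (such as [_arr_1_]), so loop counters like [_i_4] are never propagated. The names [is_ssa], [simplify] and [propagate] are illustrative, and the folding of the [_step_6_ > 0] test is not modelled.

```ocaml
(* Hypothetical toy IR: integer expressions without side effects. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* Assumption: generated single-assignment names start and end with '_'
   (e.g. [_arr_1_]); loop counters such as [_i_4] do not. *)
let is_ssa name =
  let n = String.length name in
  n >= 2 && name.[0] = '_' && name.[n - 1] = '_'

(* Substitute known values, then fold constants and apply simple identities
   bottom-up. [env] maps SSA names to a constant or another SSA variable. *)
let rec simplify env e =
  match e with
  | Int _ -> e
  | Var x -> (match List.assoc_opt x env with Some v -> v | None -> e)
  | Add (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Int (x + y)
      | Int 0, e' | e', Int 0 -> e'
      | a', b' -> Add (a', b'))
  | Mul (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Int (x * y)
      (* Dropping the other operand is safe only because there are no side effects. *)
      | Int 0, _ | _, Int 0 -> Int 0
      | Int 1, e' | e', Int 1 -> e'
      | a', b' -> Mul (a', b'))

(* Walk straight-line assignments in order. An SSA variable whose simplified
   right-hand side is a constant or another SSA variable is recorded in [env]
   and its assignment dropped; all other assignments are kept, simplified. *)
let propagate stmts =
  let step (env, kept) (x, rhs) =
    match simplify env rhs with
    | Int _ as v when is_ssa x -> ((x, v) :: env, kept)
    | Var y as v when is_ssa x && is_ssa y -> ((x, v) :: env, kept)
    | rhs' -> (env, (x, rhs') :: kept)
  in
  let _, kept = List.fold_left step ([], []) stmts in
  List.rev kept
```

Fed with [_arr_0_ = 2; _arr_1_ = 2] followed by the allocation size and index expression from the example, this sketch yields the size [4] and the index [((_i_4 * 2) + _i_7)], matching the propagated code above.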
			
--- a/phases/context.mli
+++ b/phases/context.mli
@@ -2,23 +2,23 @@
 
 (** The desugared CiviC code contains [Var], [FunCall] and [Assign] nodes. These
     all use variables or functions identified by a [string] name. The context
-    analysis phase links each occurrence of this node to a declaration: a
+    analysis phase links each variable/function occurrence to its declaration: a
     [VarDec], [Param], [Dim], [GlobalDe[cf]] or [FunDe[cf]].  Since the original
     nodes only have a [string] field to save the declaration, new node types
     have been added which replace the name with a declaration node: [VarUse],
     [FunUse], and [VarLet].
 
-    The phase traverses into functions, but first finds declarations in the
-    entire outer scope of the function, since functions can use any function of
-    variable that is defined within the same scope.
-
     The whole analysis is done in one traversal. When a declaration node is
     encountered, its name and declaration are added to the current scope (a
-    mutable hash table). When a vairable of fuction use is encountered, the name
+    mutable hash table). When a variable or function use is encountered, the name
     and declaration are looked up in the current scope. The scope is duplicated
     when entering a function, and restored when exiting the function, so that
     functions that are not subroutines of each other do not share inner variable
-    definitions. *)
+    definitions. Note that the phase traverses into functions only AFTER it has
+    found all declarations in the outer scope of the function, since functions
+    can use any function or variable that is defined within the same scope (also
+    those defined after the function itself).
+    *)
 
 (** Traversal that replaces names with declarations. Exported for use in other
     phases. *)
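The scoping rules described above can be sketched on a hypothetical toy AST: one pass over a scope adds declarations to a mutable [Hashtbl] and resolves uses as they are met, while function bodies are deferred until the whole scope has been seen and each is then analysed with its own copy of the table. The types [item] and [kind] and the function [analyse] are illustrative assumptions; the real phase also handles parameters, dimensions and the replacement of nodes.

```ocaml
(* Hypothetical toy AST; the real phase works on the compiler's full AST. *)
type item =
  | VarDec of string
  | FunDef of string * item list  (* nested function with its own body *)
  | Use of string                  (* variable or function use *)

type kind = KVar | KFun

(* One pass over a scope: declarations are added to the mutable table and
   uses are resolved as they are met, but function bodies are deferred until
   the whole scope has been seen. Each body then gets its own copy of the
   table, so its local declarations never leak into sibling functions.
   Returns the resolved uses; raises [Failure] on an unknown name. *)
let rec analyse (scope : (string, kind) Hashtbl.t) (items : item list) =
  let resolved = ref [] and deferred = ref [] in
  List.iter
    (function
      | VarDec x -> Hashtbl.replace scope x KVar
      | FunDef (f, body) ->
          Hashtbl.replace scope f KFun;
          deferred := body :: !deferred
      | Use x -> (
          match Hashtbl.find_opt scope x with
          | Some k -> resolved := (x, k) :: !resolved
          | None -> failwith ("undefined name: " ^ x)))
    items;
  let bodies = List.rev !deferred in
  List.rev !resolved
  @ List.concat_map (fun body -> analyse (Hashtbl.copy scope) body) bodies
```

Copying the table on function entry plays the role of "duplicating the scope": discarding the copy afterwards restores the outer scope without any explicit undo step.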
			
--- a/phases/desug.mli
+++ b/phases/desug.mli
@@ -30,12 +30,13 @@ void foo() \{
 
     {3 Array initialisations}
 
-    A more complex class of initialisations are array initialisations. Arrays
-    can be initialised to a scalar value or to an array constant in bracket
-    notation. A scalar value is rewritten to a nested for-loop over all array
-    dmensions, with an assignment in the most nested loop. An array constant is
-    rewritten to a series of separate assign statements to the corresponding
-    array indices. The following example shows both transformations:
+    A more complex class of initialisations is that of array initialisations.
+    Arrays can be initialised to a scalar value or to an array literal in
+    bracket notation. A scalar value is rewritten to a nested for-loop over all
+    array dimensions, with an assignment in the innermost loop. An array
+    literal is rewritten to a series of separate assign statements to the
+    corresponding array indices. The following example shows both
+    transformations:
 {v void foo() \{
     int[3] a = 4;
     int[2, 2] b = [[3, 4], [5, 6]];
@@ -62,9 +63,9 @@ void foo() \{
     example to maintain readability.
 
     Note that array constants in bracket expressions must have a nesting level
-    that is equal to the number of array dimensions, or an error will occur.
+    that is equal to the number of array dimensions, otherwise an error will occur.
 
-    {2 Prevent incorrect double evaluation}
+    {2 Move array dimensions and scalars into new variables}
 
     In the following code:
 {v int twos = 0;
@@ -89,44 +90,44 @@ void foo() \{
     and array dimensions, and replacing the original expression with the
     generated variables. Note that these variables are marked as so-called
     "constant variables" since they are known to be assigned exactly once, and
-    thus optimizable by {!Constprop} in some cases. This way, only the
-    non-constant expressions are defined in new variables in the resulting code.
+    thus likely optimizable by {!Constprop}. This way, only the non-constant
+    expressions are defined in new variables in the final code.
 
     In the example above, [int[2, two()] a = two();] is transformed as follows:
 {v     ...
-    int _a_1_1_ = 2;  // will be propagated back by constant propagation
-    int _a_2_2_ = two();
-    int _scalar_3_ = two();
-    int[_a_1_1_, _a_2_2_] a = _scalar_3_;
+    int _a_0_ = 2;  // 2 will be propagated back by constant propagation
+    int _a_1_ = two();
+    int _scalar_1_ = two();
+    int[_a_0_, _a_1_] a = _scalar_1_;
     ...  v}
 resulting in:
 {v     ...
-    int _a_1_1_;
-    int _a_2_2_;
-    int _scalar_3_;
-    int[_a_1_1_, _a_2_2_] a;
-    _a_1_1_ = 2;
-    _a_2_2_ = two();
-    _scalar_3_ = two();
-    a := <allocate>(_a_1_1_, _a_2_2_);
-    for (int _i_4 = 0, _a_1_1_) \{
-        for (int _i_5 = 0, _a_2_2_) \{
-            a[_i_4, _i_5] = _scalar_3_;
+    int _a_0_;
+    int _a_1_;
+    int _scalar_1_;
+    int[_a_0_, _a_1_] a;
+    _a_0_ = 2;
+    _a_1_ = two();
+    _scalar_1_ = two();
+    a := <allocate>(_a_0_, _a_1_);
+    for (int _i_2 = 0, _a_0_) \{
+        for (int _i_3 = 0, _a_1_) \{
+            a[_i_2, _i_3] = _scalar_1_;
         \}
     \}
     ...  v}
 
-    The [_a_1_1_] here is formed from the array name [a], the number of the
-    dimension [1], and a global counter variable that happened to be [1] at the
-    moment he variable was generated. The counter is necessary to make the
-    variable name unique, even when the program contains illegal name clashes,
-    which would yield weird errors during context analysis. E.g., a second
-    definition [int[2] a;] would generate a new variable [_a_1] which would
-    clash with the earlier [_a_1], yielding an error on compiler-generated code
-    instead of on the definition of [a].
-
-    Note that the for-loops are actually transformed into while-loops, but not
-    in this example in order to maintain readability.
+    The transformation described above is applied to all array definitions,
+    including extern arrays. Although the dimensions of extern arrays are
+    identifiers rather than expressions, the transformation is necessary to
+    generate consistent names to be imported/exported. E.g., in [int[n] a], [n]
+    is just a name given locally to the first dimension of [a]. Therefore it is
+    transformed into:
+{v     extern int _a_0_;
+    int[_a_0_] a; v}
+    Also, all occurrences of [n] in the rest of the module are replaced by
+    [_a_0_]. For exported arrays, the generated dimension variables need to be
+    exported as well.
 
     {2 Transforming for-loops to while-loops}
 
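The scalar-initialisation rewrite for [int[2, two()] a = two();] can be sketched as a function that emits CiviC source lines. This is only an illustration under stated assumptions: the real phase builds AST nodes rather than strings and later turns the for-loops into while-loops, and the hypothetical [counter] argument stands in for the compiler's global fresh-name counter, numbered here so that the output matches the example above.

```ocaml
(* Hypothetical sketch: print the desugared CiviC lines for
   [int[d0, ..., dn] name = scalar;], where the dimension expressions and the
   scalar are given as already-printed source strings. [counter] stands in for
   the compiler's global counter for fresh names (an assumption). *)
let desugar_scalar_init ~counter name dims scalar =
  let n = List.length dims in
  (* Dimension variables are numbered by dimension: _a_0_, _a_1_, ... *)
  let dim_vars = List.mapi (fun i _ -> Printf.sprintf "_%s_%d_" name i) dims in
  incr counter;
  let scalar_var = Printf.sprintf "_scalar_%d_" !counter in
  (* One fresh loop counter per dimension, numbered by the global counter. *)
  let base = !counter in
  counter := base + n;
  let loop_vars = List.init n (fun i -> Printf.sprintf "_i_%d" (base + i + 1)) in
  let indent k = String.make (4 * k) ' ' in
  let decls =
    List.map (fun v -> "int " ^ v ^ ";") (dim_vars @ [ scalar_var ])
    @ [ Printf.sprintf "int[%s] %s;" (String.concat ", " dim_vars) name ]
  in
  let assigns =
    List.map2 (Printf.sprintf "%s = %s;") dim_vars dims
    @ [ Printf.sprintf "%s = %s;" scalar_var scalar;
        Printf.sprintf "%s := <allocate>(%s);" name (String.concat ", " dim_vars) ]
  in
  (* Nested loops, the assignment in the innermost one, then closing braces. *)
  let opens =
    List.mapi
      (fun k (i, d) -> Printf.sprintf "%sfor (int %s = 0, %s) {" (indent k) i d)
      (List.combine loop_vars dim_vars)
  in
  let body =
    Printf.sprintf "%s%s[%s] = %s;" (indent n) name
      (String.concat ", " loop_vars) scalar_var
  in
  let closes = List.init n (fun k -> indent (n - 1 - k) ^ "}") in
  decls @ assigns @ opens @ [ body ] @ closes
```

Evaluating each dimension expression and the scalar exactly once into a fresh variable is what keeps [two()] from being called again on every loop iteration or allocation.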
			
--- a/phases/dimreduce.mli
+++ b/phases/dimreduce.mli
@@ -2,7 +2,7 @@
     arrays, and pass original array dimensions in function calls. *)
 
 (**
-    This phase lowers multi-dimensional ararys to one-dimensional arrays. This
+    This phase lowers multi-dimensional arrays to one-dimensional arrays. This
     transformation is done in two steps.
 
     In the first step, function calls and function parameter lists are modified.
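Lowering to one dimension requires mapping a multi-dimensional index to a single offset. The sketch below assumes a row-major layout, which is what the index expression [((_i_4 * _arr_1_) + _i_7)] in the constant-propagation example suggests; the function names are hypothetical. [flatten_expr] builds the same shape of CiviC expression text, and [flat_size] gives the length of the one-dimensional array.

```ocaml
(* Row-major flattening of an index [a[i0, i1, ..., in]] into a single offset,
   using Horner's scheme: ((i0 * d1 + i1) * d2 + i2) ... The first dimension
   d0 is not needed for the offset, only for the allocation size. *)
let flatten_index dims idxs =
  match (dims, idxs) with
  | _ :: rest_dims, i0 :: rest_idxs ->
      List.fold_left2 (fun acc d i -> (acc * d) + i) i0 rest_dims rest_idxs
  | _ -> invalid_arg "flatten_index: empty index"

(* The same computation, building CiviC expression text instead of a number. *)
let flatten_expr dims idxs =
  match (dims, idxs) with
  | _ :: rest_dims, i0 :: rest_idxs ->
      List.fold_left2
        (fun acc d i -> Printf.sprintf "((%s * %s) + %s)" acc d i)
        i0 rest_dims rest_idxs
  | _ -> invalid_arg "flatten_expr: empty index"

(* Size of the one-dimensional array: the product of all dimensions. *)
let flat_size dims = List.fold_left ( * ) 1 dims
```

[List.fold_left2] raises [Invalid_argument] when the number of indices differs from the number of dimensions, which in this sketch doubles as a basic arity check.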
			
--- a/phases/index.mli
+++ b/phases/index.mli
@@ -11,11 +11,9 @@
     variable/function uses with the original [Var|Assign|FunCall] nodes. Then,
     index analysis is performed on declarations. Finally, the
     {!Context.analyse_context} traversal is re-run to carry the [Index]
-    annotations to the variable/function uses.
-
-    Note that we can safely assume that no errors will occur during this context
-    analysis, since incorrect uses would have been spotted by the earlier
-    context analysis already.
+    annotations to the variable/function uses. Note that we can safely assume
+    that no errors will occur during this context analysis, since incorrect uses
+    would have been identified by the earlier context analysis already.
     *)
 
 (** Main phase function, called by {!Main}. *)
			
--- a/phases/parse.mli
+++ b/phases/parse.mli
@@ -5,7 +5,7 @@
     [LocMsg], so that the error can be highlighted in the input file code.
 
     The global files [lexer.mll] and [parser.mly] implement the grammar
-    specified by the CiviC language manual This includes the extensions of
+    specified by the CiviC language manual. This includes the extensions of
     nested functions and multi-dimensional arrays. The entire CiviC grammar
     implementation is summarized below. Note that Menhir parser syntax is used
     on some occasions. {v
			
--- a/phases/peephole.mli
+++ b/phases/peephole.mli
@@ -33,7 +33,7 @@ becomes [jump label].
     i{inc,dec} L C   |   i{inc,dec}_1 L v}
 
     Note that [iload] and [iloadc] may also appear in reverse order, which in
-    CiviC code is the difference between [i = i + 1;] and [i = 1 + i]. Both
+    CiviC code is the difference between [i = i + 1;] and [i = 1 + i;]. Both
     orders are supported by the implementation.
     *)
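Only the last row of the peephole table ([i{inc,dec} L C | i{inc,dec}_1 L]) and the note on operand order are visible in this hunk, so the following is a minimal sketch assuming the matched sequence is [iload L; iloadc C; iadd|isub; istore L]. The [instr] type is a hypothetical subset in which [Iloadc] carries the constant's value directly; a real assembler may reference a constant-pool entry instead.

```ocaml
(* Hypothetical subset of the assembly instructions. For simplicity [Iloadc]
   carries the constant's value; a real assembler may refer to a constant
   pool entry instead. *)
type instr =
  | Iload of int   (* push local variable *)
  | Iloadc of int  (* push constant *)
  | Iadd
  | Isub
  | Istore of int  (* pop into local variable *)
  | Iinc of int * int
  | Idec of int * int
  | Iinc_1 of int
  | Idec_1 of int

let inc l c = if c = 1 then Iinc_1 l else Iinc (l, c)
let dec l c = if c = 1 then Idec_1 l else Idec (l, c)

let rec peephole = function
  (* [i = i + c] and [i = c + i]: addition is commutative, so both orders match. *)
  | ( Iload l :: Iloadc c :: Iadd :: Istore l' :: rest
    | Iloadc c :: Iload l :: Iadd :: Istore l' :: rest )
    when l = l' ->
      inc l c :: peephole rest
  (* [i = i - c]; the reversed order computes [c - i] and is left alone. *)
  | Iload l :: Iloadc c :: Isub :: Istore l' :: rest when l = l' ->
      dec l c :: peephole rest
  | i :: rest -> i :: peephole rest
  | [] -> []
```

In this sketch, the reversed operand order is accepted only for [iadd], since [c - i] is not a decrement of [i]; the guard [l = l'] ensures the value is stored back into the same local it was loaded from.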