11 years ago · 8259c48821
--- a/phases/boolop.mli
+++ b/phases/boolop.mli
@@ -9,10 +9,11 @@
     transforming conjunction and disjunction operators into conditional
     operations of the form [a ? b : c]. This operation is supported by the
     assembly phase, which handles them similarly to if-else statements. The
-    benefit of using conditional operations, is that thiese are still
-    expressions. A transformation into actual if-else statements would retuire
-    adding additional statements, which creates other problems which we will not
-    discuss here.
+    benefit of using conditional operations is that they are still expressions.
+    A transformation into actual if-else statements would require adding
+    additional statements, which is not only more difficult to implement, but
+    also changes the evaluation order, which (although in agreement with the C
+    standard) may lead to confusing behaviour.
 
     The applied transformations are as follows:
 {v b1 == b2    ==>   (int)b1 == (int)b2
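The concrete rewrite rules appear further down in `boolop.mli` and are not part of this hunk. The OCaml sketch below assumes the usual encoding, `a && b ==> a ? b : false` and `a || b ==> a ? true : b`. It uses a hypothetical `expr` type, not the compiler's real AST. The point it illustrates is that the result is still a single expression, and the second operand is still evaluated only when the original operator would evaluate it.

```ocaml
(* Hypothetical AST; not the compiler's real types. *)
type expr =
  | Bool of bool
  | Var of string
  | Not of expr
  | And of expr * expr
  | Or of expr * expr
  | Cond of expr * expr * expr (* c ? t : e *)

(* Rewrite short-circuit operators into conditionals, bottom-up:
     a && b  ==>  a ? b : false
     a || b  ==>  a ? true : b
   Each operand occurs exactly once in the result, and [b] sits in a branch,
   so it is still only evaluated when the original operator would do so. *)
let rec lower = function
  | And (a, b) -> Cond (lower a, lower b, Bool false)
  | Or (a, b) -> Cond (lower a, Bool true, lower b)
  | Not e -> Not (lower e)
  | Cond (c, t, e) -> Cond (lower c, lower t, lower e)
  | (Bool _ | Var _) as e -> e

(* Reference evaluator, used only to check that [lower] preserves meaning. *)
let rec eval env = function
  | Bool b -> b
  | Var x -> List.assoc x env
  | Not e -> not (eval env e)
  | And (a, b) -> eval env a && eval env b
  | Or (a, b) -> eval env a || eval env b
  | Cond (c, t, e) -> if eval env c then eval env t else eval env e
```

For example, `x && y || !z` becomes `(x ? y : false) ? true : !z`. The compiler itself needs no evaluator. It is included here only to make the equivalence checkable.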
														
--- a/phases/constprop.mli
+++ b/phases/constprop.mli
@@ -1,24 +1,24 @@
-(** Rudimentary constant propagation and constant folding on generated
-    variables. *)
+(** Rudimentary constant propagation, constant folding and arithmetic
+    simplification on generated variables. *)
 
-(** The compiler sometimes generates variables of the form [__foo_1__], to make
-    sure that expressions are only executed once. In many cases, this leads to
-    over-complex constructions with more variable definitions den necessary, for
-    example when converting for-loops to while-loops. We use the knowledge of
-    these variables being only assigned once, by propagating the constant values
-    to their occurrences, and then apply arithmetic simplification to operators
-    to reduce the size and complexity of the generated code. Note that this can
-    only be applied to constants. For variables in general, some form of
-    liveness analysis would be required (e.g. first convert to Static Single
-    Assignment form). Expressions can only be propagated when they have no side
-    effects, i.e. when they do not contain function calls. The current
-    implementation only propagates [Bool|Int|Float] constants and simple
-    variable uses ([VarUse]).
+(** The compiler sometimes generates variables of the form [_foo_1_],
+    indicating that the variable is assigned exactly once (it is in SSA form).
+    In many cases, generated variables lead to over-complex constructions with
+    more variable definitions than necessary, for example when converting
+    for-loops to while-loops. We exploit the fact that these variables are in
+    SSA form by propagating the constant values to their uses, and then apply
+    arithmetic simplification to operators to reduce the size and complexity of
+    the generated code. Note that this can only be applied when the assigned
+    expression is a constant or an SSA variable, since these have no side
+    effects and will not change between the assignment and their uses. For
+    the optimisation of regular variables, some form of liveness analysis would
+    be required.
 
-    Constant propagation is merged with some some arithmetic simplification here,
-    specifically targeting optimization oppertunities created bij earlier
-    constant propagation. This is utilized, for example, in array index
-    calculation when array dimensions are constant.
+    Constant propagation is supplemented with constant folding and some
+    arithmetic simplification, the latter specifically targeting optimisation
+    opportunities created by earlier constant propagation. For example, when
+    constant array dimensions are propagated into an array index calculation,
+    the index expression can often be simplified.
 
     The following example demonstrates the effectivity of this phase. An array
     assignment is transformed into a for-loop, which is transformed into a
@@ -26,51 +26,56 @@
     simplified.
 
 {v void foo() \{
-    int[2, 2] arr = [[1, 2], 3];
+    int[2, 2] arr = 3;
 \} v}
 
 After desugaring and array dimension reduction this becomes:
 
 {v void foo() \{
-    int ____i_6_7;
-    int __arr_1_2__;
-    int __arr_2_1__;
-    int __const_3__;
-    int __const_4__;
-    int __const_5__;
+    int _i_4;
+    int _stop_5_;
+    int _step_6_;
+    int _i_7;
+    int _stop_8_;
+    int _step_9_;
+    int _arr_0_;
+    int _arr_1_;
+    int _scalar_1_;
     int[] arr;
-    int __stop_8__;
-    int __step_9__;
-    __arr_1_2__ = 2;
-    __arr_2_1__ = 2;
-    __const_3__ = 1;
-    __const_4__ = 2;
-    __const_5__ = 3;
-    arr := <allocate>((__arr_1_2__ * __arr_2_1__));
-    arr[((0 * __arr_2_1__) + 0)] = __const_3__;
-    arr[((0 * __arr_2_1__) + 1)] = __const_4__;
-    ____i_6_7 = 0;
-    __stop_8__ = __arr_2_1__;
-    __step_9__ = 1;
-    while (((__step_9__ > 0) ? (____i_6_7 < __stop_8__) : (____i_6_7 > __stop_8__))) \{
-        arr[((1 * __arr_2_1__) + ____i_6_7)] = __const_5__;
-        ____i_6_7 = (____i_6_7 + __step_9__);
+    _arr_0_ = 2;
+    _arr_1_ = 2;
+    _scalar_1_ = 3;
+    arr := <allocate>((_arr_0_ * _arr_1_));
+    _i_4 = 0;
+    _stop_5_ = _arr_0_;
+    _step_6_ = 1;
+    while (((_step_6_ > 0) ? (_i_4 < _stop_5_) : (_i_4 > _stop_5_))) \{
+        _i_7 = 0;
+        _stop_8_ = _arr_1_;
+        _step_9_ = 1;
+        while (((_step_9_ > 0) ? (_i_7 < _stop_8_) : (_i_7 > _stop_8_))) \{
+            arr[((_i_4 * _arr_1_) + _i_7)] = _scalar_1_;
+            _i_7 = (_i_7 + _step_9_);
+        \}
+        _i_4 = (_i_4 + _step_6_);
     \}
-
 \} v}
 
 Constant propagation reduces this to:
 
 {v void foo() \{
-    int ____i_6_7;
+    int _i_4;
+    int _i_7;
     int[] arr;
     arr := <allocate>(4);
-    arr[0] = 1;
-    arr[1] = 2;
-    ____i_6_7 = 0;
-    while ((____i_6_7 < 2)) \{
-        arr[(2 + ____i_6_7)] = 3;
-        ____i_6_7 = (____i_6_7 + 1);
+    _i_4 = 0;
+    while ((_i_4 < 2)) \{
+        _i_7 = 0;
+        while ((_i_7 < 2)) \{
+            arr[((_i_4 * 2) + _i_7)] = 3;
+            _i_7 = (_i_7 + 1);
+        \}
+        _i_4 = (_i_4 + 1);
     \}
 \} v}
     *)
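Below is a minimal sketch of the propagation, folding and simplification described above. It works on a hypothetical integer-only `expr` type. The names `simplify`, `collect` and `Env` are illustrative and not taken from the compiler. As in the comment, it assumes that every recorded variable is assigned exactly once and that expressions have no side effects. That assumption is what makes rewriting `x * 0` to `0` safe here.

```ocaml
(* Hypothetical integer-only AST; the compiler's real node types differ. *)
type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Lt of expr * expr
  | Gt of expr * expr
  | Cond of expr * expr * expr

(* Maps a single-assignment (SSA) variable to its known constant value. *)
module Env = Map.Make (String)

(* Propagate known constants, fold constant subterms, and apply a few
   arithmetic identities. Rewriting [x * 0] to [0] drops [x] entirely, which
   is only valid because these expressions have no side effects. *)
let rec simplify env e =
  match e with
  | Int _ | Bool _ -> e
  | Var x -> ( match Env.find_opt x env with Some c -> Int c | None -> e)
  | Add (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Int (x + y)
      | Int 0, t | t, Int 0 -> t
      | a', b' -> Add (a', b'))
  | Mul (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Int (x * y)
      | Int 0, _ | _, Int 0 -> Int 0
      | Int 1, t | t, Int 1 -> t
      | a', b' -> Mul (a', b'))
  | Lt (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Bool (x < y)
      | a', b' -> Lt (a', b'))
  | Gt (a, b) -> (
      match (simplify env a, simplify env b) with
      | Int x, Int y -> Bool (x > y)
      | a', b' -> Gt (a', b'))
  | Cond (c, t, f) -> (
      match simplify env c with
      | Bool true -> simplify env t
      | Bool false -> simplify env f
      | c' -> Cond (c', simplify env t, simplify env f))

(* Walk the SSA assignments in program order and record every variable whose
   right-hand side simplifies to a constant (this also covers copies such as
   [_stop_5_ = _arr_0_]). *)
let collect assigns =
  List.fold_left
    (fun env (x, rhs) ->
      match simplify env rhs with Int c -> Env.add x c env | _ -> env)
    Env.empty assigns
```

Suppose `collect` has seen `_arr_0_ = 2`, `_arr_1_ = 2`, `_stop_5_ = _arr_0_` and `_step_6_ = 1`. Then `simplify` turns `_arr_0_ * _arr_1_` into `Int 4`. It turns the generated condition `((_step_6_ > 0) ? (_i_4 < _stop_5_) : (_i_4 > _stop_5_))` into `Lt (Var "_i_4", Int 2)`, which matches `(_i_4 < 2)` in the example above.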
														
--- a/phases/context.mli
+++ b/phases/context.mli
@@ -2,23 +2,23 @@
 
 (** The desugared CiviC code contains [Var], [FunCall] and [Assign] nodes. These
     all use variables or functions identified by a [string] name. The context
-    analysis phase links each occurrence of this node to a declaration: a
+    analysis phase links each occurrence of a name to its declaration: a
     [VarDec], [Param], [Dim], [GlobalDe[cf]] or [FunDe[cf]].  Since the original
     nodes only have a [string] field to save the declaration, new node types
     have been added which replace the name with a declaration node: [VarUse],
     [FunUse], and [VarLet].
 
-    The phase traverses into functions, but first finds declarations in the
-    entire outer scope of the function, since functions can use any function of
-    variable that is defined within the same scope.
-
     The whole analysis is done in one traversal. When a declaration node is
     encountered, its name and declaration are added to the currect scope (a
-    mutable hash table). When a vairable of fuction use is encountered, the name
+    mutable hash table). When a variable or function use is encountered, the name
     and declaration are looked up in the current scope. The scope is duplicated
     when entering a function, and restored when exiting the function, so that
     functions that are not subroutines of each other, do not share inner variable
-    definitions. *)
+    definitions. Note that the traversal descends into a function only AFTER it
+    has found all declarations in the enclosing scope of that function, since
+    functions can use any function or variable defined within the same scope
+    (including those defined after the function itself).
+    *)
 
 (** Traversal that replaces names with declarations. Exported for use in other
     phases. *)
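The scoping discipline described above can be sketched with the standard library's `Hashtbl`. Declarations are added while a scope is walked. Function bodies are deferred until the whole scope has been seen. Each body is then analysed with a `Hashtbl.copy` of the enclosing scope, and that copy is simply discarded afterwards. The three-constructor `stmt` type and the integer declaration ids are hypothetical simplifications of the real [VarDec]/[FunDec]/[VarUse] nodes.

```ocaml
(* Hypothetical, heavily simplified AST. *)
type stmt =
  | VarDec of string              (* declares a variable *)
  | Use of string                 (* uses a variable or function by name *)
  | FunDef of string * stmt list  (* declares a function with a body *)

(* Returns, for every use in program order, the name and the id of the
   declaration it resolves to. Ids stand in for declaration nodes. *)
let analyse (prog : stmt list) : (string * int) list =
  let next_id = ref 0 in
  let resolved = ref [] in
  let declare scope name =
    incr next_id;
    Hashtbl.replace scope name !next_id
  in
  let rec scope_pass scope stmts =
    (* Function bodies are deferred until every declaration of this scope
       has been added, so they may refer to later definitions. *)
    let deferred = ref [] in
    List.iter
      (function
        | VarDec x -> declare scope x
        | FunDef (f, body) ->
            declare scope f;
            deferred := body :: !deferred
        | Use x -> (
            match Hashtbl.find_opt scope x with
            | Some id -> resolved := (x, id) :: !resolved
            | None -> failwith ("undefined name: " ^ x)))
      stmts;
    (* Each body works on its own copy of the scope. Dropping the copy on
       return restores the outer scope, so sibling functions never see each
       other's local declarations. *)
    List.iter (fun body -> scope_pass (Hashtbl.copy scope) body) (List.rev !deferred)
  in
  scope_pass (Hashtbl.create 16) prog;
  List.rev !resolved
```

With this ordering, the body of a function `f` may use a function `g` that is declared later in the same scope. A variable declared inside one function stays invisible to its siblings, because each body only extends its own copy of the table.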
														
--- a/phases/desug.mli
+++ b/phases/desug.mli
@@ -30,12 +30,13 @@ void foo() \{
 
     {3 Array initialisations}
 
-    A more complex class of initialisations are array initialisations. Arrays
-    can be initialised to a scalar value or to an array constant in bracket
-    notation. A scalar value is rewritten to a nested for-loop over all array
-    dmensions, with an assignment in the most nested loop. An array constant is
-    rewritten to a series of separate assign statements to the corresponding
-    array indices. The following example shows both transformations:
+    A more complex class of initialisations is that of array initialisations.
+    Arrays can be initialised to a scalar value or to an array literal in
+    bracket notation. A scalar value is rewritten to a nested for-loop over all
+    array dimensions, with an assignment in the innermost loop. An array
+    literal is rewritten to a series of separate assign statements to the
+    corresponding array indices. The following example shows both
+    transformations:
 {v void foo() \{
     int[3] a = 4;
     int[2, 2] b = [[3, 4], [5, 6]];
@@ -62,9 +63,9 @@ void foo() \{
     example to maintain readability.
 
     Note that array constants in bracket expressions must have a nesting level
-    that is equal to the number of array dimensions, or an error will occur.
+    that equals the number of array dimensions; otherwise an error will occur.
 
-    {2 Prevent incorrect double evaluation}
+    {2 Move array dimensions and scalars into new variables}
 
     In the following code:
 {v int twos = 0;
@@ -89,44 +90,44 @@ void foo() \{
     and array dimensions, and replacing the original expression with the
     generated variables. Note that these variables are marked so-called
     "constant variables" since they are known to be assigned exactly once, and
-    thus optimizable by {!Constprop} in some cases. This way, only the
-    non-constant expressions are defined in new variables in the resulting code.
+    thus likely optimizable by {!Constprop}. This way, only the non-constant
+    expressions are defined in new variables in the final code.
 
     In the example above, [int[2, two()] a = two();] is transformed as follows:
 {v     ...
-    int _a_1_1_ = 2;  // will be propagated back by constant propagation
-    int _a_2_2_ = two();
-    int _scalar_3_ = two();
-    int[_a_1_1_, _a_2_2_] a = _scalar_3_;
+    int _a_0_ = 2;  // 2 will be propagated back by constant propagation
+    int _a_1_ = two();
+    int _scalar_1_ = two();
+    int[_a_0_, _a_1_] a = _scalar_1_;
     ...  v}
 resulting in:
 {v     ...
-    int _a_1_1_;
-    int _a_2_2_;
-    int _scalar_3_;
-    int[_a_1_1_, _a_2_2_] a;
-    _a_1_1_ = 2;
-    _a_2_2_ = two();
-    _scalar_3_ = two();
-    a := <allocate>(_a_1_1_, _a_2_2_);
-    for (int _i_4 = 0, _a_1_1_) \{
-        for (int _i_5 = 0, _a_2_2_) \{
-            a[_i_4, _i_5] = _scalar_3_;
+    int _a_0_;
+    int _a_1_;
+    int _scalar_1_;
+    int[_a_0_, _a_1_] a;
+    _a_0_ = 2;
+    _a_1_ = two();
+    _scalar_1_ = two();
+    a := <allocate>(_a_0_, _a_1_);
+    for (int _i_2 = 0, _a_0_) \{
+        for (int _i_3 = 0, _a_1_) \{
+            a[_i_2, _i_3] = _scalar_1_;
         \}
     \}
     ...  v}
 
-    The [_a_1_1_] here is formed from the array name [a], the number of the
-    dimension [1], and a global counter variable that happened to be [1] at the
-    moment he variable was generated. The counter is necessary to make the
-    variable name unique, even when the program contains illegal name clashes,
-    which would yield weird errors during context analysis. E.g., a second
-    definition [int[2] a;] would generate a new variable [_a_1] which would
-    clash with the earlier [_a_1], yielding an error on compiler-generated code
-    instead of on the definition of [a].
-
-    Note that the for-loops are actually transformed into while-loops, but not
-    in this example in order to maintain readability.
+    The transformation described above is applied to all array definitions,
+    including extern arrays. Although the dimensions of extern arrays are
+    identifiers rather than expressions, the transformation is necessary to
+    generate consistent names to be imported/exported. E.g. in [int[n] a], [n]
+    is just a name given locally to the first dimension of [a], so the
+    declaration is transformed into:
+{v     extern int _a_0_;
+    int[_a_0_] a; v}
+    Also, all occurrences of [n] in the rest of the module are replaced by
+    [_a_0_]. For exported arrays, the generated dimension variables need to be
+    exported as well.
 
     {2 Transforming for-loops to while-loops}
 
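The scalar-initialisation rewrite in the `resulting in:` example above can be sketched as a small string generator. This is an illustration only, since the real phase rewrites AST nodes. The `scalar_init` function, its `~first` counter argument and the `_i_<n>` loop-counter names are assumptions, chosen so that the output reproduces that example. As the section heading notes, the generated for-loops are later lowered to while-loops.

```ocaml
(* Generate CiviC source lines for [int[dims] name = scalar;] after the
   dimensions and the scalar have been moved into variables. [first] is the
   number used for the first loop counter; all names are illustrative. *)
let scalar_init ~first ~name ~scalar dims =
  let counters = List.mapi (fun k _ -> Printf.sprintf "_i_%d" (first + k)) dims in
  let indent depth = String.make (4 * depth) ' ' in
  (* The single assignment in the innermost loop. *)
  let assign =
    Printf.sprintf "%s%s[%s] = %s;"
      (indent (List.length dims)) name (String.concat ", " counters) scalar
  in
  (* Wrap the assignment in one for-loop per dimension, outermost first. *)
  let rec wrap depth = function
    | [] -> [ assign ]
    | (i, d) :: rest ->
        (Printf.sprintf "%sfor (int %s = 0, %s) {" (indent depth) i d
        :: wrap (depth + 1) rest)
        @ [ indent depth ^ "}" ]
  in
  wrap 0 (List.combine counters dims)
```

For instance, `List.iter print_endline (scalar_init ~first:2 ~name:"a" ~scalar:"_scalar_1_" ["_a_0_"; "_a_1_"])` prints the two nested loops from the example, with `a[_i_2, _i_3] = _scalar_1_;` in the innermost loop.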
														
--- a/phases/dimreduce.mli
+++ b/phases/dimreduce.mli
@@ -2,7 +2,7 @@
     arrays, and pass original array dimensions in function calls. *)
 
 (**
-    This phase lowers multi-dimensional ararys to one-dimensional arrays. This
+    This phase lowers multi-dimensional arrays to one-dimensional arrays. This
     transformation is done in two steps.
 
     In the first step, function calls and function parameter lists are modified.
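The hunk ends before the second step is described, so what follows is only a hedged sketch of the index arithmetic that such a lowering needs, assuming row-major order. That assumption is consistent with the flattened indices in the Constprop example, such as `arr[((_i_4 * _arr_1_) + _i_7)]`. The function names are hypothetical.

```ocaml
(* Row-major flattening: for dimensions [d0; d1; ...; dn] and indices
   [i0; i1; ...; in] the flat index is
   (((i0 * d1) + i1) * d2 + i2) ... * dn + in.
   The first dimension d0 only matters for the allocation size. *)
let flat_index dims indices =
  match List.combine dims indices with
  | [] -> invalid_arg "flat_index: no dimensions"
  | (_, i0) :: rest -> List.fold_left (fun acc (d, i) -> (acc * d) + i) i0 rest

(* The same computation, emitted as a CiviC-style index expression. *)
let flat_index_expr dims indices =
  match List.combine dims indices with
  | [] -> invalid_arg "flat_index_expr: no dimensions"
  | (_, i0) :: rest ->
      List.fold_left
        (fun acc (d, i) -> Printf.sprintf "((%s * %s) + %s)" acc d i)
        i0 rest

(* Total number of elements to allocate for the flattened array. *)
let flat_size dims = List.fold_left ( * ) 1 dims
```

`flat_index_expr ["_arr_0_"; "_arr_1_"] ["_i_4"; "_i_7"]` yields `((_i_4 * _arr_1_) + _i_7)`, and `flat_size [2; 2]` gives the `4` passed to `<allocate>` after constant propagation.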
														
--- a/phases/index.mli
+++ b/phases/index.mli
@@ -11,11 +11,9 @@
     variable/function uses with the original [Var|Assign|FunCall] nodes. Then,
     index analysis is performed on declarations. Finally, the
     {!Context.analyse_context} traversal is re-run to carry the [Index]
-    annotations to the variable/function uses.
-
-    Note that we can safely assume that no errors will occur during this context
-    analysis, since incorrect uses would have been spotted by the earlier
-    context analysis already.
+    annotations to the variable/function uses. Note that we can safely assume
+    that no errors will occur during this context analysis, since incorrect uses
+    would have been identified by the earlier context analysis already.
     *)
 
 (** Main phase function, called by {!Main}. *)
--- a/phases/parse.mli
+++ b/phases/parse.mli
@@ -5,7 +5,7 @@
     [LocMsg], so that the error can be highlighted in the input file code.
 
     The global files [lexer.mll] and [parser.mly] implement the grammar
-    specified by the CiviC language manual This includes the extensions of
+    specified by the CiviC language manual. This includes the extensions of
     nested functions and multi-dimensional arrays. The entire CiviC grammar
     implementation is summarized below. Note that Menhir parser syntax is used
     on some occasions. {v
--- a/phases/peephole.mli
+++ b/phases/peephole.mli
@@ -33,7 +33,7 @@ becomes [jump label].
     i{inc,dec} L C   |   i{inc,dec}_1 L v}
 
     Note that the [iload] and [iloadc] may also be in reverse order, which in
-    CiviC code is the difference between [i = i + 1;] and [i = 1 + i]. Both
+    CiviC code is the difference between [i = i + 1;] and [i = 1 + i;]. Both
     orders of succession are supported by the implementation.
     *)
 
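Only the last row of the rewrite table is visible in this hunk. The sketch below assumes that the sequence `iload L; iloadc C; i{add,sub}; istore L` becomes `i{inc,dec} L C`, or `i{inc,dec}_1 L` when C is 1. It models instructions with a hypothetical `instr` type in which `Iloadc` carries the constant value directly; a real assembler may use a constant-pool index instead. As a design choice of this sketch, reversed operand order is accepted only for addition, since `1 - i` is not `i - 1`.

```ocaml
(* Hypothetical instruction type; here [Iloadc c] holds the value itself. *)
type instr =
  | Iload of int        (* push local variable L *)
  | Iloadc of int       (* push integer constant C *)
  | Iadd
  | Isub
  | Istore of int       (* pop into local variable L *)
  | Iinc of int * int   (* local L += C *)
  | Idec of int * int   (* local L -= C *)
  | Iinc_1 of int       (* local L += 1 *)
  | Idec_1 of int       (* local L -= 1 *)

(* Choose the short form when the constant is 1. *)
let inc l c = if c = 1 then Iinc_1 l else Iinc (l, c)
let dec l c = if c = 1 then Idec_1 l else Idec (l, c)

let rec peephole = function
  (* i = i + C;  and  i = i - C; *)
  | Iload l :: Iloadc c :: Iadd :: Istore l' :: rest when l = l' -> inc l c :: peephole rest
  | Iload l :: Iloadc c :: Isub :: Istore l' :: rest when l = l' -> dec l c :: peephole rest
  (* i = C + i;  (reversed operands are only safe for the commutative iadd) *)
  | Iloadc c :: Iload l :: Iadd :: Istore l' :: rest when l = l' -> inc l c :: peephole rest
  | i :: rest -> i :: peephole rest
  | [] -> []
```

Both `[Iload 0; Iloadc 1; Iadd; Istore 0]` and `[Iloadc 1; Iload 0; Iadd; Istore 0]` reduce to `[Iinc_1 0]`. A sequence that stores into a different local than it loads is left unchanged.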