taddeus
/
civicaml


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155
							(** Desugaring: split variable initialisations, prevent incorrect double
    evaluation, and transform for-loops to while-loops. *)

(** {2 Split variable initialisations}

    Variable initialisations are split into declarations and assignments, to
    generalize the AST format. This makes context analysis easier, since
    initialisations need not be considered. The assignments are placed in the
    function body, after local fuction declarations (which are not in the
    example below). Initialisations fo global variables are moved to a new
    function called "__init", which is a reserved function that is called by the
    VM before the main function is called.

{v int glob = 1;
void foo() \{
  int foo = 0;
  int bar = foo;
\} v}
    becomes:
{v export void __init() \{
    glob = 1;
\}
int glob;
void foo() \{
  int foo;
  int bar;
  foo = 0;
  bar = foo;
\} v}

    {3 Array initialisations}

    A more complex class of initialisations are array initialisations. Arrays
    can be initialised to a scalar value or to an array constant in bracket
    notation. A scalar value is rewritten to a nested for-loop over all array
    dmensions, with an assignment in the most nested loop. An array constant is
    rewritten to a series of separate assign statements to the corresponding
    array indices. The following example shows both transformations:
{v void foo() \{
    int[3] a = 4;
    int[2, 2] b = [[3, 4], [5, 6]];
\} v}

    This is transformed into:

{v void foo() \{
    int[3] a;
    int[2, 2] b;
    a = __allocate(3);
    for (int i = 0, 3) \{
        a[i] = 4;
    \}
    b = __allocate(2, 2);
    b[0, 0] = 1;
    b[0, 1] = 2;
    b[1, 0] = 3;
    b[1, 1] = 4;
\} v}

    Actually, the dimensions of [a] and [b] should have been replaced by new
    variables earlier for reasons described below, but this is not done in the
    example to maintain readability.

    Note that array constants in bracket expressions must have a nesting level
    that is equal to the number of array dimensions, or an error will occur.

    {2 Prevent incorrect double evaluation}

    In the following code:
{v int twos = 0;
int two() \{
  twos = twos + 1;
  return 2;
\}
void foo() \{
  int[2, two()] a = two();
  printInt(twos);
  printInt(a[1, 1]);
\} v}

    [two()] must be evaluated exactly twice in order to preserve correct
    behaviour. However, the scalar inialisation will be transformed into a
    for-loop, evaluating [two()] in each loop iteration. Moreover, [a[1, 1]]
    would at a later stage (during array dimension reduction) be transformed
    into [a[(1 * two()) + 1]], thus incorrectly evaluating [two()] an additional
    time.

    The problem is solved by generating new variables for scalar initialisations
    and array dimensions, and replacing the original expression with the
    generated variables. Note that these variables are marked so-called
    "constant variables" since they are known to be assigned exactly once, and
    thus optimizable by {!Constprop} in some cases. This way, only the
    non-constant expressions are defined in new variables in the resulting code.

    In the example above, [int[2, two()] a = two();] is transformed as follows:
{v     ...
    int _a_1_1_ = 2;  // will be propagated back by constant propagation
    int _a_2_2_ = two();
    int _scalar_3_ = two();
    int[_a_1_1_, _a_2_2_] a = _scalar_3_;
    ...  v}
resulting in:
{v     ...
    int _a_1_1_;
    int _a_2_2_;
    int _scalar_3_;
    int[_a_1_1_, _a_2_2_] a;
    _a_1_1_ = 2;
    _a_2_2_ = two();
    _scalar_3_ = two();
    a := <allocate>(_a_1_1_, _a_2_2_);
    for (int _i_4 = 0, _a_1_1_) \{
        for (int _i_5 = 0, _a_2_2_) \{
            a[_i_4, _i_5] = _scalar_3_;
        \}
    \}
    ...  v}

    The [_a_1_1_] here is formed from the array name [a], the number of the
    dimension [1], and a global counter variable that happened to be [1] at the
    moment he variable was generated. The counter is necessary to make the
    variable name unique, even when the program contains illegal name clashes,
    which would yield weird errors during context analysis. E.g., a second
    definition [int[2] a;] would generate a new variable [_a_1] which would
    clash with the earlier [_a_1], yielding an error on compiler-generated code
    instead of on the definition of [a].

    Note that the for-loops are actually transformed into while-loops, but not
    in this example in order to maintain readability.

    {2 Transforming for-loops to while-loops}

{v for (int i = <start>, <stop>, <step>) \{
    <body>
\} v}

    is transformed into:

{v _i_1 = <start>;
_step_2 = <step>;
_stop_3 = <stop>;
while ((_step_2 > 0) ? (_i_1 < _stop_3) : (_i_1 > _stop_3)) \{
    <body>
    _i_1 = _i_1 + _step_2;
\} v}

    Here, [_i_1], [_step_2] and [_stop_3] are fresh variables. Definitions of
    these new variables are added to the scope of the current function. Every
    occurrence of [i] in [<body>] is replaced with the fresh variable [_i_1],
    this prevents problems with nested for-loops that use the same induction
    variable.
    *)

(** Main phase function, called by {!Main}. *)
val phase : Main.phase_func