| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155 |
- (** Desugaring: split variable initialisations, prevent incorrect double
- evaluation, and transform for-loops to while-loops. *)
- (** {2 Split variable initialisations}
- Variable initialisations are split into declarations and assignments, to
- generalize the AST format. This makes context analysis easier, since
- initialisations need not be considered. The assignments are placed in the
- function body, after local fuction declarations (which are not in the
- example below). Initialisations fo global variables are moved to a new
- function called "__init", which is a reserved function that is called by the
- VM before the main function is called.
- {v int glob = 1;
- void foo() \{
- int foo = 0;
- int bar = foo;
- \} v}
- becomes:
- {v export void __init() \{
- glob = 1;
- \}
- int glob;
- void foo() \{
- int foo;
- int bar;
- foo = 0;
- bar = foo;
- \} v}
- {3 Array initialisations}
- A more complex class of initialisations are array initialisations. Arrays
- can be initialised to a scalar value or to an array constant in bracket
- notation. A scalar value is rewritten to a nested for-loop over all array
- dmensions, with an assignment in the most nested loop. An array constant is
- rewritten to a series of separate assign statements to the corresponding
- array indices. The following example shows both transformations:
- {v void foo() \{
- int[3] a = 4;
- int[2, 2] b = [[3, 4], [5, 6]];
- \} v}
- This is transformed into:
- {v void foo() \{
- int[3] a;
- int[2, 2] b;
- a = __allocate(3);
- for (int i = 0, 3) \{
- a[i] = 4;
- \}
- b = __allocate(2, 2);
- b[0, 0] = 1;
- b[0, 1] = 2;
- b[1, 0] = 3;
- b[1, 1] = 4;
- \} v}
- Actually, the dimensions of [a] and [b] should have been replaced by new
- variables earlier for reasons described below, but this is not done in the
- example to maintain readability.
- Note that array constants in bracket expressions must have a nesting level
- that is equal to the number of array dimensions, or an error will occur.
- {2 Prevent incorrect double evaluation}
- In the following code:
- {v int twos = 0;
- int two() \{
- twos = twos + 1;
- return 2;
- \}
- void foo() \{
- int[2, two()] a = two();
- printInt(twos);
- printInt(a[1, 1]);
- \} v}
- [two()] must be evaluated exactly twice in order to preserve correct
- behaviour. However, the scalar inialisation will be transformed into a
- for-loop, evaluating [two()] in each loop iteration. Moreover, [a[1, 1]]
- would at a later stage (during array dimension reduction) be transformed
- into [a[(1 * two()) + 1]], thus incorrectly evaluating [two()] an additional
- time.
- The problem is solved by generating new variables for scalar initialisations
- and array dimensions, and replacing the original expression with the
- generated variables. Note that these variables are marked so-called
- "constant variables" since they are known to be assigned exactly once, and
- thus optimizable by {!Constprop} in some cases. This way, only the
- non-constant expressions are defined in new variables in the resulting code.
- In the example above, [int[2, two()] a = two();] is transformed as follows:
- {v ...
- int _a_1_1_ = 2; // will be propagated back by constant propagation
- int _a_2_2_ = two();
- int _scalar_3_ = two();
- int[_a_1_1_, _a_2_2_] a = _scalar_3_;
- ... v}
- resulting in:
- {v ...
- int _a_1_1_;
- int _a_2_2_;
- int _scalar_3_;
- int[_a_1_1_, _a_2_2_] a;
- _a_1_1_ = 2;
- _a_2_2_ = two();
- _scalar_3_ = two();
- a := <allocate>(_a_1_1_, _a_2_2_);
- for (int _i_4 = 0, _a_1_1_) \{
- for (int _i_5 = 0, _a_2_2_) \{
- a[_i_4, _i_5] = _scalar_3_;
- \}
- \}
- ... v}
- The [_a_1_1_] here is formed from the array name [a], the number of the
- dimension [1], and a global counter variable that happened to be [1] at the
- moment he variable was generated. The counter is necessary to make the
- variable name unique, even when the program contains illegal name clashes,
- which would yield weird errors during context analysis. E.g., a second
- definition [int[2] a;] would generate a new variable [_a_1] which would
- clash with the earlier [_a_1], yielding an error on compiler-generated code
- instead of on the definition of [a].
- Note that the for-loops are actually transformed into while-loops, but not
- in this example in order to maintain readability.
- {2 Transforming for-loops to while-loops}
- {v for (int i = <start>, <stop>, <step>) \{
- <body>
- \} v}
- is transformed into:
- {v _i_1 = <start>;
- _step_2 = <step>;
- _stop_3 = <stop>;
- while ((_step_2 > 0) ? (_i_1 < _stop_3) : (_i_1 > _stop_3)) \{
- <body>
- _i_1 = _i_1 + _step_2;
- \} v}
- Here, [_i_1], [_step_2] and [_stop_3] are fresh variables. Definitions of
- these new variables are added to the scope of the current function. Every
- occurrence of [i] in [<body>] is replaced with the fresh variable [_i_1],
- this prevents problems with nested for-loops that use the same induction
- variable.
- *)
- (** Main phase function, called by {!Main}. *)
- val phase : Main.phase_func
|