desug.mli 5.1 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155
  1. (** Desugaring: split variable initialisations, prevent incorrect double
  2. evaluation, and transform for-loops to while-loops. *)
  3. (** {2 Split variable initialisations}
  4. Variable initialisations are split into declarations and assignments, to
  5. generalize the AST format. This makes context analysis easier, since
  6. initialisations need not be considered. The assignments are placed in the
  7. function body, after local fuction declarations (which are not in the
  8. example below). Initialisations fo global variables are moved to a new
  9. function called "__init", which is a reserved function that is called by the
  10. VM before the main function is called.
  11. {v int glob = 1;
  12. void foo() \{
  13. int foo = 0;
  14. int bar = foo;
  15. \} v}
  16. becomes:
  17. {v export void __init() \{
  18. glob = 1;
  19. \}
  20. int glob;
  21. void foo() \{
  22. int foo;
  23. int bar;
  24. foo = 0;
  25. bar = foo;
  26. \} v}
  27. {3 Array initialisations}
  28. A more complex class of initialisations are array initialisations. Arrays
  29. can be initialised to a scalar value or to an array constant in bracket
  30. notation. A scalar value is rewritten to a nested for-loop over all array
  31. dmensions, with an assignment in the most nested loop. An array constant is
  32. rewritten to a series of separate assign statements to the corresponding
  33. array indices. The following example shows both transformations:
  34. {v void foo() \{
  35. int[3] a = 4;
  36. int[2, 2] b = [[3, 4], [5, 6]];
  37. \} v}
  38. This is transformed into:
  39. {v void foo() \{
  40. int[3] a;
  41. int[2, 2] b;
  42. a = __allocate(3);
  43. for (int i = 0, 3) \{
  44. a[i] = 4;
  45. \}
  46. b = __allocate(2, 2);
  47. b[0, 0] = 1;
  48. b[0, 1] = 2;
  49. b[1, 0] = 3;
  50. b[1, 1] = 4;
  51. \} v}
  52. Actually, the dimensions of [a] and [b] should have been replaced by new
  53. variables earlier for reasons described below, but this is not done in the
  54. example to maintain readability.
  55. Note that array constants in bracket expressions must have a nesting level
  56. that is equal to the number of array dimensions, or an error will occur.
  57. {2 Prevent incorrect double evaluation}
  58. In the following code:
  59. {v int twos = 0;
  60. int two() \{
  61. twos = twos + 1;
  62. return 2;
  63. \}
  64. void foo() \{
  65. int[2, two()] a = two();
  66. printInt(twos);
  67. printInt(a[1, 1]);
  68. \} v}
  69. [two()] must be evaluated exactly twice in order to preserve correct
  70. behaviour. However, the scalar inialisation will be transformed into a
  71. for-loop, evaluating [two()] in each loop iteration. Moreover, [a[1, 1]]
  72. would at a later stage (during array dimension reduction) be transformed
  73. into [a[(1 * two()) + 1]], thus incorrectly evaluating [two()] an additional
  74. time.
  75. The problem is solved by generating new variables for scalar initialisations
  76. and array dimensions, and replacing the original expression with the
  77. generated variables. Note that these variables are marked so-called
  78. "constant variables" since they are known to be assigned exactly once, and
  79. thus optimizable by {!Constprop} in some cases. This way, only the
  80. non-constant expressions are defined in new variables in the resulting code.
  81. In the example above, [int[2, two()] a = two();] is transformed as follows:
  82. {v ...
  83. int _a_1_1_ = 2; // will be propagated back by constant propagation
  84. int _a_2_2_ = two();
  85. int _scalar_3_ = two();
  86. int[_a_1_1_, _a_2_2_] a = _scalar_3_;
  87. ... v}
  88. resulting in:
  89. {v ...
  90. int _a_1_1_;
  91. int _a_2_2_;
  92. int _scalar_3_;
  93. int[_a_1_1_, _a_2_2_] a;
  94. _a_1_1_ = 2;
  95. _a_2_2_ = two();
  96. _scalar_3_ = two();
  97. a := <allocate>(_a_1_1_, _a_2_2_);
  98. for (int _i_4 = 0, _a_1_1_) \{
  99. for (int _i_5 = 0, _a_2_2_) \{
  100. a[_i_4, _i_5] = _scalar_3_;
  101. \}
  102. \}
  103. ... v}
  104. The [_a_1_1_] here is formed from the array name [a], the number of the
  105. dimension [1], and a global counter variable that happened to be [1] at the
  106. moment he variable was generated. The counter is necessary to make the
  107. variable name unique, even when the program contains illegal name clashes,
  108. which would yield weird errors during context analysis. E.g., a second
  109. definition [int[2] a;] would generate a new variable [_a_1] which would
  110. clash with the earlier [_a_1], yielding an error on compiler-generated code
  111. instead of on the definition of [a].
  112. Note that the for-loops are actually transformed into while-loops, but not
  113. in this example in order to maintain readability.
  114. {2 Transforming for-loops to while-loops}
  115. {v for (int i = <start>, <stop>, <step>) \{
  116. <body>
  117. \} v}
  118. is transformed into:
  119. {v _i_1 = <start>;
  120. _step_2 = <step>;
  121. _stop_3 = <stop>;
  122. while ((_step_2 > 0) ? (_i_1 < _stop_3) : (_i_1 > _stop_3)) \{
  123. <body>
  124. _i_1 = _i_1 + _step_2;
  125. \} v}
  126. Here, [_i_1], [_step_2] and [_stop_3] are fresh variables. Definitions of
  127. these new variables are added to the scope of the current function. Every
  128. occurrence of [i] in [<body>] is replaced with the fresh variable [_i_1],
  129. this prevents problems with nested for-loops that use the same induction
  130. variable.
  131. *)
  132. (** Main phase function, called by {!Main}. *)
  133. val phase : Main.phase_func