desug.mli 4.7 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148
  1. (** Desugaring: split variable initialisations, prevent incorrect double
  2. evaluation, and transform for-loops to while-loops. *)
  3. (** {2 Split variable initialisations}
  4. Variable initialisations are split into declarations and assignments, to
  5. generalize the AST format. This makes context analysis easier, since
  6. initialisations need not be considered. The assignments are placed in the
  7. function body, after local fuction declarations (which are not in the
  8. example below). Initialisations fo global variables are moved to a new
  9. function called "__init", which is a reserved function that is called by the
  10. VM before the main function is called.
  11. {v int glob = 1;
  12. void foo() \{
  13. int foo = 0;
  14. int bar = foo;
  15. \} v}
  16. becomes:
  17. {v export void __init() \{
  18. glob = 1;
  19. \}
  20. int glob;
  21. void foo() \{
  22. int foo;
  23. int bar;
  24. foo = 0;
  25. bar = foo;
  26. \} v}
  27. {3 Array initialisations}
  28. A more complex class of initialisations are array initialisations. Arrays
  29. can be initialised to a scalar value or to an array constant in bracket
  30. notation. A scalar value is rewritten to a nested for-loop over all array
  31. dmensions, with an assignment in the most nested loop. An array constant is
  32. rewritten to a series of separate assign statements to the corresponding
  33. array indices. The following example shows both transformations:
  34. {v void foo() \{
  35. int[3] a = 4;
  36. int[2, 2] b = [[3, 4], [5, 6]];
  37. \} v}
  38. This is transformed into:
  39. {v void foo() \{
  40. int[] a;
  41. int[] b;
  42. a = __allocate(3);
  43. for (int i = 0, 3) \{
  44. a[i] = 4;
  45. \}
  46. b = __allocate(2, 2);
  47. b[0, 0] = 1;
  48. b[0, 1] = 2;
  49. b[1, 0] = 3;
  50. b[1, 1] = 4;
  51. \} v}
  52. {2 Prevent incorrect double evaluation}
  53. In the following code:
  54. {v int twos = 0;
  55. int two() \{
  56. twos = twos + 1;
  57. return 2;
  58. \}
  59. void foo() \{
  60. int[2, two()] a = two();
  61. printInt(twos);
  62. printInt(a[1, 1]);
  63. \} v}
  64. [two()] must be evaluated exactly twice in order to preserve correct
  65. behaviour. However, the scalar inialisation will be transformed into a
  66. for-loop, evaluating [two()] in each loop iteration. Moreover, [a[1, 1]]
  67. would at a later stage (during array dimension reduction) be transformed
  68. into [a[(1 * two()) + 1]], thus incorrectly evaluating [two()] an additional
  69. time.
  70. The problem is solved by generating new variables for scalar initialisations
  71. and array dimensions, and replacing the original expression with the
  72. generated variables. Note that these variables are marked so-called
  73. "constant variables" since they are known to be assigned exactly once, and
  74. thus optimizable by {!Constprop} in some cases. This way, only the
  75. non-constant expressions are defined in new variables in the resulting code.
  76. In the example above, [int[2, two()] a = two();] is transformed as follows:
  77. {v ...
  78. int _a_1_1_ = 2; // will be propagated back by constant propagation
  79. int _a_2_2_ = two();
  80. int _scalar_3_ = two();
  81. int[_a_1_1_, _a_2_2_] a = _scalar_3_;
  82. ... v}
  83. resulting in:
  84. {v ...
  85. int _a_1_1_;
  86. int _a_2_2_;
  87. int _scalar_3_;
  88. int[_a_1_1_, _a_2_2_] a;
  89. _a_1_1_ = 2;
  90. _a_2_2_ = two();
  91. _scalar_3_ = two();
  92. a := <allocate>(_a_1_1_, _a_2_2_);
  93. for (int _i_4 = 0, _a_1_1_) \{
  94. for (int _i_5 = 0, _a_2_2_) \{
  95. a[_i_4, _i_5] = _scalar_3_;
  96. \}
  97. \}
  98. ... v}
  99. The [_a_1_1_] here is formed from the array name [a], the number of the
  100. dimension [1], and a global counter variable that happened to be [1] at the
  101. moment he variable was generated. The counter is necessary to make the
  102. variable name unique, even when the program contains illegal name clashes,
  103. which would yield weird errors during context analysis. E.g., a second
  104. definition [int[2] a;] would generate a new variable [_a_1] which would
  105. clash with the earlier [_a_1], yielding an error on compiler-generated code
  106. instead of on the definition of [a].
  107. Note that the for-loops are actually transformed into while-loops, but not
  108. in this example in order to maintain readability.
  109. {2 Transforming for-loops to while-loops}
  110. {v for (int i = <start>, <stop>, <step>) \{
  111. <body>
  112. \} v}
  113. is transformed into:
  114. {v _i_1 = <start>;
  115. _step_2 = <step>;
  116. _stop_3 = <stop>;
  117. while ((_step_2 > 0) ? (_i_1 < _stop_3) : (_i_1 > _stop_3)) \{
  118. <body>
  119. _i_1 = _i_1 + _step_2;
  120. \} v}
  121. Here, [_i_1], [_step_2] and [_stop_3] are fresh variables. Definitions of
  122. these new variables are added to the scope of the current function. Every
  123. occurrence of [i] in [<body>] is replaced with the fresh variable [_i_1],
  124. this prevents problems with nested for-loops that use the same induction
  125. variable.
  126. *)
  127. (** Main phase function, called by {!Main}. *)
  128. val phase : Main.phase_func