\documentclass[10pt,a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{subfigure}
\usepackage{float}
\usepackage{hyperref}
\title{Peephole Optimizer}
\author{Jayke Meijer (6049885), Richard Torenvliet (6138861), Tadde\"us Kroes
(6054129)}
\begin{document}
\maketitle
\tableofcontents
\pagebreak
\section{Introduction}
The goal of this assignment is to implement the optimization stage of a
compiler. To reach this goal, the parser and the optimizer part of the
compiler have to be implemented.
Our input is the output of the xgcc cross compiler on a C program. This
output is Assembly code, but not optimized. The assignment includes a number
of C programs to test with. An important part of the assignment is parsing
this data, which is done with Lex and Yacc. The Lexer finds the tokens that
match the regular expressions provided in the Lexer specification. After the
Lexer, Yacc takes over and turns the recognized tokens into actions.
\section{Design}
There are two general types of optimizations of the assembly code: global
optimizations and optimizations on so-called basic blocks. These
optimizations will be discussed separately.
\subsection{Global optimizations}
We only perform one global optimization, which is optimizing branch-jump
statements. The unoptimized Assembly code contains sequences of code of the
following structure:
\begin{verbatim}
beq ...,$Lx
j $Ly
$Lx: ...
\end{verbatim}
This is inefficient, since there is a jump to a label that directly follows
this code. It is more efficient to replace the branch statement with a
\texttt{bne} (the opposite case) to the label used in the jump statement.
This way the jump statement can be eliminated, since the next label follows
anyway. The same can of course be done for the opposite case, where a
\texttt{bne} is changed into a \texttt{beq}.
Since this optimization works across jumps and labels, we cannot perform it
during the basic block optimizations. The reason for this will become
clearer in the following section.
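To illustrate, here is a minimal sketch of this transformation in Python,
the language of our implementation. It assumes statements are simple
\texttt{(type, name, args)} tuples; our actual \texttt{Statement} objects,
described in the Implementation section, differ in detail.
\begin{verbatim}
INVERSE = {'beq': 'bne', 'bne': 'beq'}

def optimize_branch_jump(statements):
    """Rewrite 'beq ...,$Lx; j $Ly; $Lx:' into 'bne ...,$Ly; $Lx:'."""
    out = list(statements)
    i = 0
    while i < len(out) - 2:
        branch, jump, label = out[i], out[i + 1], out[i + 2]
        if (branch[0] == 'command' and branch[1] in INVERSE
                and jump[0] == 'command' and jump[1] == 'j'
                and label[0] == 'label'
                and branch[2][-1] == label[1]):
            # Invert the branch, point it at the jump's target and
            # drop the jump; the original target label follows anyway.
            out[i] = ('command', INVERSE[branch[1]],
                      branch[2][:-1] + [jump[2][0]])
            del out[i + 1]
        else:
            i += 1
    return out
\end{verbatim}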
\subsection{Basic Block Optimizations}
Optimizations on basic blocks are the more important part of the optimizer.
First, what is a basic block? A basic block is a sequence of statements
guaranteed to be executed in that order, and that order alone. This is the
case for a piece of code not containing any branches or jumps.
To create basic blocks, one first has to define what the leader of a basic
block is. We call a statement a leader if it is either a jump/branch
statement, or the target of such a statement. A basic block then runs from
one leader until the next leader.
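A sketch of this division into blocks, using the same illustrative tuple
representation; the set of jump and branch instruction names shown here is
only a small subset of the real one:
\begin{verbatim}
JUMPS = {'j', 'jal', 'beq', 'bne'}   # illustrative subset

def is_leader(statement):
    stype, name, args = statement
    return stype == 'label' or (stype == 'command' and name in JUMPS)

def split_blocks(statements):
    bounds = [i for i, s in enumerate(statements) if is_leader(s)]
    if not bounds or bounds[0] != 0:
        bounds.insert(0, 0)          # code before the first leader
    bounds.append(len(statements))
    return [statements[a:b] for a, b in zip(bounds, bounds[1:])
            if a < b]
\end{verbatim}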
There are quite a few optimizations we perform on these basic blocks, so we
will describe the types of optimizations here instead of each individual
optimization.
\subsubsection*{Standard peephole optimizations}
These are optimizations that simply look for a certain statement or pattern
of statements, and optimize it. For example,
\begin{verbatim}
mov $regA,$regB
instr $regA, $regA,...
\end{verbatim}
can be optimized into
\begin{verbatim}
instr $regA, $regB,...
\end{verbatim}
since the register \texttt{\$regA} gets overwritten by the second
instruction anyway, and the instruction can just as well use \texttt{\$regB}
instead of \texttt{\$regA}. There are a few more of these cases, which are
the same as those described on the course's practical assignment page
\footnote{\url{http://staff.science.uva.nl/~andy/compiler/prac.html}} and in
Appendix \ref{opt}.
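A sketch of how this particular pattern could be matched and rewritten on
the simplified tuple representation (not our exact code):
\begin{verbatim}
def fold_move(block):
    """Fold 'mov A,B; instr A,A,...' into 'instr A,B,...'."""
    out, i = [], 0
    while i < len(block):
        cur = block[i]
        nxt = block[i + 1] if i + 1 < len(block) else None
        if (nxt is not None and cur[0] == 'command'
                and cur[1] == 'mov' and len(nxt[2]) >= 2
                and nxt[2][0] == cur[2][0] == nxt[2][1]):
            reg_a, reg_b = cur[2][0], cur[2][1]
            # Drop the mov and read from $regB directly.
            out.append((nxt[0], nxt[1], [reg_a, reg_b] + nxt[2][2:]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
\end{verbatim}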
\subsubsection*{Common subexpression elimination}
A more advanced optimization is common subexpression elimination. This means
that an expensive operation such as a multiplication or addition is
performed only once, after which the result is `copied' into the variables
where it is needed:
\begin{verbatim}
non-optimized:              optimized:
addu $regA, $regB, 4        addu $regD, $regB, 4
...                         move $regA, $regD
addu $regC, $regB, 4        move $regC, $regD
\end{verbatim}
A standard method for doing this is the creation of a DAG, or Directed
Acyclic Graph. However, this requires a fairly advanced implementation. Our
implementation is slightly less fancy, but easier to implement.
We search from the end of the block upwards for instructions that are
eligible for CSE. If we find one, we check further up in the code for
occurrences of the same expression, and add those to a temporary storage
list. This is done until we reach the beginning of the block or until one of
the arguments of the expression is assigned.
We then insert the instruction above its first use and write its result to a
new variable. All occurrences of the expression can now be replaced by a
move from the new variable into the original destination variable of the
instruction.
This method is less efficient than the DAG, but because the basic blocks are
in general not very large and the execution time of the optimizer is not a
primary concern, this is not a big problem.
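A simplified sketch of this backward scan; \texttt{new\_reg} is a
hypothetical helper that hands out an unused register, and only a few
instruction names are treated as eligible here:
\begin{verbatim}
ELIGIBLE = ('addu', 'mult')          # illustrative subset

def eliminate_cse(block, new_reg):
    """One round of CSE; returns True if the block was changed."""
    for i in range(len(block) - 1, -1, -1):
        stype, name, args = block[i]
        if stype != 'command' or name not in ELIGIBLE:
            continue
        sources = args[1:]
        matches = [i]
        for j in range(i - 1, -1, -1):
            jtype, jname, jargs = block[j]
            if jtype == 'command' and jargs and jargs[0] in sources:
                break                # an argument is assigned: stop
            if jname == name and jargs[1:] == sources:
                matches.append(j)
        if len(matches) > 1:
            first, tmp = min(matches), new_reg()
            # Compute the expression once, into a fresh register...
            block.insert(first, ('command', name, [tmp] + sources))
            # ...and turn every occurrence into a move from it.
            for k in matches:        # indices shifted by the insert
                dest = block[k + 1][2][0]
                block[k + 1] = ('command', 'move', [dest, tmp])
            return True
    return False
\end{verbatim}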
\subsubsection*{Constant folding}
Another optimization is constant folding: replacing an expensive step such
as an addition with a simpler step such as loading a constant. Of course,
this is not always possible. It is possible in cases where an operation is
applied to two constants, or to a constant and a variable that is known for
certain to hold a particular value at that point. For example:
\begin{verbatim}
li   $regA, 1              li $regA, 1
addu $regB, $regA, 2  ->   li $regB, 3
\end{verbatim}
Of course, if \texttt{\$regA} is not used after this, it can be removed,
which will be done by the dead code elimination.
One problem we encountered is that after a \texttt{li}, the program often
also stores the loaded value in memory, so we had to check whether that was
still necessary here as well.
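A sketch of this folding for the \texttt{li}/\texttt{addu} case shown above,
tracking which registers currently hold a known constant; our optimizer
handles more instructions than just \texttt{addu}:
\begin{verbatim}
def const_of(arg, known):
    """The constant value of an operand, or None if unknown."""
    try:
        return int(arg)              # immediate operand
    except ValueError:
        return known.get(arg)        # register with a known value

def fold_constants(block):
    known, out = {}, []              # register -> constant value
    for stype, name, args in block:
        if stype == 'command' and name == 'li':
            known[args[0]] = int(args[1])
        elif stype == 'command' and name == 'addu':
            a = const_of(args[1], known)
            b = const_of(args[2], known)
            if a is not None and b is not None:
                known[args[0]] = a + b
                out.append(('command', 'li', [args[0], str(a + b)]))
                continue
            known.pop(args[0], None)  # result no longer constant
        elif stype == 'command' and args:
            known.pop(args[0], None)  # any other write invalidates
        out.append((stype, name, args))
    return out
\end{verbatim}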
\subsubsection*{Copy propagation}
Copy propagation `unpacks' a move instruction, by replacing its destination
address with its source address in the code following the move instruction.
This is not a direct optimization, but it does allow for a more effective
dead code elimination.
The code of the block is checked linearly. When a move operation is
encountered, the source and destination address of this move are stored.
When a normal operation with a source and a destination address is found, a
number of checks are performed.
The first check is whether the destination address is stored as the
destination address of a move instruction. If so, that move instruction is
no longer valid, so the optimization cannot be applied to it. Otherwise, we
continue with the second check.
In the second check, the source address is compared to the destination
addresses of all still-valid move operations. If these are the same, the
source address found in the current operation is replaced with the source
address of the move operation.
An example would be the following:
\begin{verbatim}
move $regA, $regB             move $regA, $regB
...                           ...
Code not writing $regA,  ->   ...
$regB                         ...
...                           ...
addu $regC, $regA, ...        addu $regC, $regB, ...
\end{verbatim}
This code shows that \texttt{\$regA} is replaced with \texttt{\$regB}. This
way, the move instruction may have become useless, in which case it will be
removed by the dead code elimination.
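A sketch of this linear scan, where \texttt{moves} maps the destination of
each still-valid move to its source:
\begin{verbatim}
def propagate_copies(block):
    moves, out = {}, []              # move destination -> source
    for stype, name, args in block:
        if stype != 'command' or not args:
            out.append((stype, name, args))
            continue
        dest, sources = args[0], args[1:]
        # Second check: replace sources that are move destinations.
        sources = [moves.get(s, s) for s in sources]
        # First check: writing 'dest' invalidates every stored move
        # that has it as destination (or as source).
        moves = dict((d, s) for d, s in moves.items()
                     if dest not in (d, s))
        if name == 'move' and sources:
            moves[dest] = sources[0]
        out.append((stype, name, [dest] + sources))
    return out
\end{verbatim}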
\subsubsection*{Algebraic transformations}
Some expressions can easily be replaced with simpler ones if you look at
what they mean algebraically. An example is the statement $x = y + 0$, or in
Assembly \texttt{addu \$1, \$2, 0}. This can easily be changed into $x = y$,
or \texttt{move \$1, \$2}.
Another case is multiplication by a power of two, which can be done far more
efficiently by shifting left a number of times. An example:
\texttt{mult \$regA, \$regB, 4 -> sll \$regA, \$regB, 2}. We perform this
optimization for any multiplication by a power of two.
There are a number of such cases, all of which are once again listed in
appendix \ref{opt}.
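A sketch of a few of these rewrites; the power-of-two test uses the standard
bit trick $n \& (n-1) = 0$:
\begin{verbatim}
def simplify(statement):
    """Return an algebraically simpler equivalent, or the original."""
    stype, name, args = statement
    if name in ('addu', 'subu') and len(args) == 3 and args[2] == '0':
        return ('command', 'move', args[:2])     # x = y + 0 -> x = y
    if name == 'mult' and len(args) == 3 and args[2].isdigit():
        n = int(args[2])
        if n == 0:
            return ('command', 'li', [args[0], '0'])
        if n == 1:
            return ('command', 'move', args[:2])
        if n & (n - 1) == 0:                     # n is a power of two
            shift = n.bit_length() - 1           # log2(n)
            return ('command', 'sll', args[:2] + [str(shift)])
    return statement
\end{verbatim}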
\section{Implementation}
We decided to implement the optimizer in Python. We chose this programming
language because Python makes it easy to manipulate strings, work
object-oriented, etc.
It turns out that Lex and Yacc are also available as a Python module named
PLY (Python Lex-Yacc). This allows us to use one language, Python, instead
of two (C and Python). It also means no debugging is needed in C, only in
Python, which makes our assignment more feasible.
The program has three steps: parsing the Assembly code into a data structure
we can use, the so-called Intermediate Representation (IR), performing
optimizations on this IR, and writing the IR back to Assembly.
\subsection{Parsing}
The parsing is done with PLY, which allows us to perform Lex-Yacc tasks in
Python by using a Lex-Yacc-like syntax. This way there is no need to combine
languages, as we otherwise would have to, since Lex and Yacc are coupled
with C.
We decided not to have the parser recognize every possible instruction
exactly, but only to determine whether something is, for example, a command,
a comment or a gcc directive. Each line is then transformed into an object
called a Statement. A Statement has a type, a name and optionally a list of
arguments. Together these statements form a statement list, which is placed
in another object called a Block. In the beginning there is one Block for
the entire program, but after the global optimizations this is split into
several Blocks, which are the basic blocks.
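To give an impression, here is a minimal PLY lexer in this spirit; the token
set and regular expressions are illustrative and much simpler than our
actual specification:
\begin{verbatim}
import ply.lex as lex

tokens = ('COMMENT', 'DIRECTIVE', 'LABEL', 'WORD', 'COMMA')

t_COMMENT   = r'\#.*'                 # assembler comment
t_DIRECTIVE = r'\.[a-z]+'             # gcc directive, e.g. .text
t_LABEL     = r'[$A-Za-z_][$A-Za-z0-9_]*:'
t_WORD      = r'[$A-Za-z0-9_.+-]+'    # mnemonic, register, literal
t_COMMA     = r','
t_ignore    = ' \t'

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("addu $2,$2,$3\n")
for token in lexer:
    print(token.type, token.value)
\end{verbatim}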
\subsection{Optimizations}
The optimizations are done in two separate steps. First the global
optimizations are performed, which are only the optimizations on branch-jump
constructions. This is repeated until there are no more changes.
After all possible global optimizations are done, the program is divided
into basic blocks. The algorithm to do this is described earlier: all jump
and branch instructions are leaders, as are their targets, and a basic block
runs from one leader to the next.
After the division into basic blocks, optimizations are performed on each of
these blocks. This is also done repeatedly, since sometimes several steps
are needed to fully optimize something.
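The resulting driver is essentially a pair of fixed-point loops. A sketch,
with the pass functions standing in for the ones sketched in the Design
section:
\begin{verbatim}
def simplify_block(block):
    return [simplify(s) for s in block]

BLOCK_PASSES = (fold_move, fold_constants, propagate_copies,
                simplify_block)

def optimize(statements):
    # Phase 1: global branch-jump optimization, until nothing changes.
    while True:
        new = optimize_branch_jump(statements)
        if new == statements:
            break
        statements = new
    # Phase 2: per-block optimizations, also run to a fixed point.
    blocks = split_blocks(statements)
    for block in blocks:
        changed = True
        while changed:
            changed = False
            for optimization in BLOCK_PASSES:
                new = optimization(block)
                if new != block:
                    block[:] = new
                    changed = True
    return blocks
\end{verbatim}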
\subsection{Writing}
Once all the optimizations have been done, the IR needs to be rewritten into
Assembly code, so that the xgcc cross compiler can turn it into binary code.
The writer expects a list of statements, so first the blocks have to be
concatenated into a single list again. After this is done, the list is
passed on to the writer, which writes the instructions back to Assembly and
saves the file so we can let xgcc compile it.
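A sketch of this final phase; the exact output formatting (tabs, comment
style) shown here is illustrative:
\begin{verbatim}
def write_assembly(blocks, path):
    statements = [s for block in blocks for s in block]
    with open(path, 'w') as out:
        for stype, name, args in statements:
            if stype == 'label':
                out.write('%s:\n' % name)
            elif stype == 'comment':
                out.write('\t# %s\n' % name)
            else:                    # command or directive
                out.write('\t%s\t%s\n' % (name, ','.join(args)))
\end{verbatim}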
\section{Results}
\subsection{pi.c}
\subsection{acron.c}
\subsection{whet.c}
\subsection{slalom.c}
\subsection{clinpack.c}
\section{Conclusion}
\appendix
\section{List of all optimizations}
\label{opt}
\textbf{Global optimizations}
\begin{verbatim}
beq ...,$Lx        bne ...,$Ly
j $Ly          ->  $Lx: ...
$Lx: ...

bne ...,$Lx        beq ...,$Ly
j $Ly          ->  $Lx: ...
$Lx: ...
\end{verbatim}
\textbf{Standard basic block optimizations}
\begin{verbatim}
mov $regA,$regA         ->  ---  // remove it

mov $regA,$regB         ->  instr $regA, $regB,...
instr $regA, $regA,...

instr $regA,...             instr $4,...
mov [$4-$7], $regA      ->  jal XXX
jal XXX

sw $regA,XXX            ->  sw $regA, XXX
ld $regA,XXX

shift $regA,$regA,0     ->  ---  // remove it

add $regA,$regA,X       ->  lw ...,X($regA)
lw ...,0($regA)
\end{verbatim}
\textbf{Advanced basic block optimizations}
\begin{verbatim}
# Common subexpression elimination
addu $regA, $regB, 4         addu $regD, $regB, 4
...                          move $regA, $regD
Code not writing $regB   ->  ...
...                          ...
addu $regC, $regB, 4         move $regC, $regD

# Constant folding
li   $regA, 1                li $regA, 1
addu $regB, $regA, 2     ->  li $regB, 3

# Copy propagation
move $regA, $regB            move $regA, $regB
...                          ...
Code not writing $regA,  ->  ...
$regB                        ...
...                          ...
addu $regC, $regA, ...       addu $regC, $regB, ...

# Algebraic transformations
addu $regA, $regB, 0     ->  move $regA, $regB
subu $regA, $regB, 0     ->  move $regA, $regB
mult $regA, $regB, 1     ->  move $regA, $regB
mult $regA, $regB, 0     ->  li   $regA, 0
mult $regA, $regB, 2     ->  sll  $regA, $regB, 1
\end{verbatim}
\end{document}