
\documentclass[10pt,a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{subfigure}
\usepackage{float}
\usepackage{hyperref}

\title{Peephole Optimizer}
\author{Jayke Meijer (6049885), Richard Torenvliet (6138861), Tadde\"us Kroes
(6054129)}

\begin{document}

\maketitle
\pagebreak
\tableofcontents
\pagebreak
\section{Introduction}
The goal of this assignment is to implement the optimization stage of a
compiler. To reach this goal, the parser and the optimizer parts of the
compiler have to be implemented.

Our input is the output of the xgcc cross compiler on a number of given C
programs. This output is Assembly code, but not optimized. An important part
of the assignment is parsing this data, which is done with Lex and Yacc. The
Lexer finds the keywords that match the regular expressions provided to it.
After the Lexer, Yacc takes over: it matches sequences of keywords against
grammar rules and turns them into actions.
\section{Design}
There are two general types of optimizations of the assembly code: global
optimizations and optimizations on so-called basic blocks. These
optimizations will be discussed separately.
\subsection{Global optimizations}
We only perform one global optimization, which is optimizing branch-jump
statements. The unoptimized Assembly code contains sequences of code with the
following structure:
\begin{verbatim}
beq ...,$Lx
j   $Ly
$Lx: ...
\end{verbatim}
This is inefficient: the branch targets a label that directly follows the
jump statement. It would be more efficient to replace the branch statement
with a \texttt{bne} (the opposite condition) to the label used in the jump
statement. This way the jump statement can be eliminated, since execution
falls through to the next label anyway. The same can of course be done for
the opposite case, where a \texttt{bne} is changed into a \texttt{beq}.
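The example above then becomes:
\begin{verbatim}
bne ...,$Ly
$Lx: ...
\end{verbatim}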
Since this optimization works across jumps and labels, we cannot perform it
during the basic block optimizations. The reason for this will become clear
in the following section.
\subsection{Basic block optimizations}
Optimizations on basic blocks are the more important part of the optimizer.

First, what is a basic block? A basic block is a sequence of statements
guaranteed to be executed in that order, and that order alone. This is the
case for a piece of code not containing any branches or jumps.

To create the basic blocks, we need to define what the leader of a basic
block is. We call a statement a leader if it is either a jump/branch
statement or the target of such a statement. A basic block then runs from one
leader (not including that leader) up to and including the next leader.
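As an illustration, a minimal sketch of this division in Python, assuming
each statement is already parsed into an object with (hypothetical)
\texttt{is\_jump()} and \texttt{is\_target()} predicates:
\begin{verbatim}
def find_basic_blocks(statements):
    """Split a statement list into basic blocks. A leader (a jump,
    branch or jump/branch target) closes the current block, so each
    block runs up to and including the next leader."""
    blocks = []
    current = []

    for statement in statements:
        current.append(statement)

        if statement.is_jump() or statement.is_target():
            blocks.append(current)
            current = []

    if current:
        blocks.append(current)

    return blocks
\end{verbatim}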
There are quite a few optimizations we perform on these basic blocks, so we
will describe the types of optimizations here instead of each individual
optimization.

\subsubsection*{Standard peephole optimizations}
These are optimizations that simply look for a certain statement or pattern
of statements and replace it with a more efficient version. For example,
\begin{verbatim}
mov   $regA,$regB
instr $regA,$regA,...
\end{verbatim}
can be optimized into
\begin{verbatim}
instr $regA,$regB,...
\end{verbatim}
since the register \texttt{\$regA} is overwritten by the second instruction
anyway, and that instruction can just as well use \texttt{\$regB} instead of
\texttt{\$regA}. There are a few more of these cases, which are the same as
those described on the lab
page\footnote{\url{http://staff.science.uva.nl/~andy/compiler/prac.html}} and
in Appendix \ref{opt}.
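As a sketch of how such a pattern can be matched, assume (hypothetically)
that each parsed statement has a \texttt{name} and a list of \texttt{args};
the example above could then be implemented as:
\begin{verbatim}
def fold_move_into_instruction(block):
    """Rewrite `mov $regA,$regB` followed by an instruction that
    both reads and writes $regA into one instruction reading $regB."""
    result = []
    i = 0

    while i < len(block):
        cur = block[i]
        nxt = block[i + 1] if i + 1 < len(block) else None

        if (cur.name == 'move' and nxt is not None
                and len(nxt.args) >= 2
                and nxt.args[0] == cur.args[0]   # writes $regA
                and nxt.args[1] == cur.args[0]): # reads $regA
            nxt.args[1] = cur.args[1]  # read $regB directly
            result.append(nxt)
            i += 2  # the move itself is dropped
        else:
            result.append(cur)
            i += 1

    return result
\end{verbatim}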
\subsubsection*{Common subexpression elimination}
A more advanced optimization is common subexpression elimination (CSE). This
means that expensive operations such as a multiplication or an addition are
performed only once, after which the result is `copied' into the variables
where it is needed.

A standard method for doing this is the creation of a DAG, or Directed
Acyclic Graph. However, this requires a fairly advanced implementation. Our
implementation is slightly less fancy, but easier to implement.

We search from the end of the block upwards for instructions that are
eligible for CSE. If we find one, we check further up in the code for the
same instruction, and add any matches to a temporary storage list. This is
done until the beginning of the block is reached or until one of the
arguments of the expression is assigned. All occurrences of the expression
can then be replaced by a move from a new variable, which is generated above
the first occurrence and holds the value of the expression.

This is a less efficient method, but because the basic blocks are in general
not very large and the execution time of the optimizer is not a primary
concern, this is not a big problem.
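A minimal sketch of one such pass in Python, assuming hypothetical helpers
(\texttt{is\_eligible}, \texttt{same\_expression},
\texttt{writes\_argument\_of}, \texttt{with\_destination},
\texttt{make\_move} and a fresh-register generator \texttt{new\_reg}) rather
than our exact implementation:
\begin{verbatim}
def eliminate_common_subexpression(block, new_reg):
    """One pass of the scheme described above. Returns True if the
    block was changed, so a driver can repeat until a fixed point."""
    for i in range(len(block) - 1, -1, -1):
        expr = block[i]
        if not is_eligible(expr):
            continue

        matches = [i]
        for j in range(i - 1, -1, -1):
            if same_expression(block[j], expr):
                matches.append(j)
            elif writes_argument_of(block[j], expr):
                break  # an argument is reassigned; stop searching

        if len(matches) < 2:
            continue

        first = min(matches)
        temp = new_reg()
        # Compute the expression once, above the first occurrence...
        block.insert(first, expr.with_destination(temp))
        # ...and turn every occurrence into a move from the result.
        for m in matches:
            block[m + 1] = make_move(block[m + 1].destination, temp)
        return True

    return False
\end{verbatim}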
\section{Implementation}
We decided to implement the optimizer in Python. We chose this programming
language because Python makes it easy to manipulate strings, work
object-oriented, etc.

It turns out that Lex and Yacc are also available as a Python module named
PLY (Python Lex-Yacc). This allows us to use one language, Python, instead of
two, i.e. C and Python. It also means no debugging is needed in C, only in
Python, which makes the assignment more feasible.
The program performs three steps: parsing the Assembly code into a data
structure we can work with (the so-called Intermediate Representation, or
IR), performing optimizations on this IR, and writing the IR back to
Assembly.
\subsection{Parsing with PLY}
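As an impression, a minimal PLY lexer for a few Assembly tokens could look as
follows (the token set shown here is illustrative, not our complete grammar):
\begin{verbatim}
import ply.lex as lex

tokens = ('COMMA', 'REGISTER', 'LABEL', 'INSTRUCTION')

t_COMMA = r','
t_REGISTER = r'\$[a-zA-Z0-9]+'
t_ignore = ' \t'

def t_LABEL(t):
    r'\$?[A-Za-z_][A-Za-z0-9_]*:'
    t.value = t.value[:-1]  # strip the trailing colon
    return t

def t_INSTRUCTION(t):
    r'[a-z]+'
    return t

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('beq $a0,$L1')

for token in lexer:
    print(token.type, token.value)
\end{verbatim}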
\subsection{Optimizations}
\subsection{Writing}
\section{Results}
\subsection{pi.c}
\subsection{arcron.c}
\subsection{whet.c}
\subsection{slalom.c}
\subsection{clinpack.c}
\section{Conclusion}
\appendix
\section{Total list of optimizations}
\label{opt}
\textbf{Global optimizations}
\begin{verbatim}
beq ...,$Lx          bne ...,$Ly
j   $Ly          ->  $Lx: ...
$Lx: ...

bne ...,$Lx          beq ...,$Ly
j   $Ly          ->  $Lx: ...
$Lx: ...
\end{verbatim}
\textbf{Simple basic block optimizations}
\begin{verbatim}
mov $regA,$regA        ->  ---  // remove it

mov   $regA,$regB      ->  instr $regA,$regB,...
instr $regA,$regA,...

instr $regA,...            instr $4,...
mov   [$4-$7],$regA    ->  jal XXX
jal   XXX

sw $regA,XXX           ->  sw $regA,XXX
ld $regA,XXX

shift $regA,$regA,0    ->  ---  // remove it

add $regA,$regA,X      ->  lw ...,X($regA)
lw  ...,0($regA)
\end{verbatim}
\textbf{Advanced basic block optimizations}
\begin{verbatim}
addu $regA,$regB,4          addu $regD,$regB,4
...                         move $regA,$regD
Code not writing $regB  ->  ...
...                         ...
addu $regC,$regB,4          move $regC,$regD
\end{verbatim}

\end{document}
  153. \end{document}