| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393 |
- \documentclass[10pt,a4paper]{article}
- \usepackage[latin1]{inputenc}
- \usepackage{amsmath,amsfonts,amssymb,booktabs,graphicx,listings,subfigure}
- \usepackage{float,hyperref}
- \title{Peephole Optimizer}
- \author{Jayke Meijer (6049885), Richard Torenvliet (6138861), Tadde\"us Kroes
- (6054129)}
- \begin{document}
- \maketitle
- \tableofcontents
- \pagebreak
- \section{Introduction}
- The goal of the assignment is to implement the optimization stage of the
- compiler. To reach this goal the parser and the optimizer part of the compiler
- have to be implemented.
- The output of the xgcc cross compiler on a C program is our input. The output
- of the xgcc cross compiler is in the form of Assembly code, but not optimized.
- Our assignment includes a number of C programs. An important part of the
- assignment is parsing the data. Parsing the data is done with Lex and Yacc. The
- Lexer is a program that finds keywords that meets the regular expression
- provided in the Lexer. After the Lexer, the Yaccer takes over. Yacc can turn
- the keywords in to an action.
- \section{Design}
- There are two general types of optimizations of the assembly code, global
- optimizations and optimizations on a so-called basic block. These optimizations
- will be discussed separately
- \subsection{Global optimizations}
- We only perform one global optimization, which is optimizing branch-jump
- statements. The unoptimized Assembly code contains sequences of code of the
- following structure:
- \begin{verbatim}
- beq ...,$Lx
- j $Ly
- $Lx: ...
- \end{verbatim}
- This is inefficient, since there is a jump to a label that follows this code.
- It would be more efficient to replace the branch statement with a \texttt{bne}
- (the opposite case) to the label used in the jump statement. This way the jump
- statement can be eliminated, since the next label follows anyway. The same can
- of course be done for the opposite case, where a \texttt{bne} is changed into a
- \texttt{beq}.
- Since this optimization is done between two series of codes with jumps and
- labels, we can not perform this code during the basic block optimizations.
- \subsection{Basic Block Optimizations}
- Optimizations on basic blocks are a more important part of the optimizer.
- First, what is a basic block? A basic block is a sequence of statements
- guaranteed to be executed in that order, and that order alone. This is the case
- for a piece of code not containing any branches or jumps.
- To create a basic block, you need to define what is the leader of a basic
- block. We call a statement a leader if it is either a jump/branch statement, or
- the target of such a statement. Then a basic block runs from one leader until
- the next leader.
- There are quite a few optimizations we perform on these basic blocks, so we
- will describe the types of optimizations here in stead of each optimization.
- \subsubsection*{Standard peephole optimizations}
- These are optimizations that simply look for a certain statement or pattern of
- statements, and optimize these. For example,
- \begin{verbatim}
- mov $regA,$regB
- instr $regA, $regA,...
- \end{verbatim}
- can be optimized into
- \begin{verbatim}
- instr $regA, $regB,...
- \end{verbatim}
- since the register \texttt{\$regA} gets overwritten by the second instruction
- anyway, and the instruction can easily use \texttt{\$regB} in stead of
- \texttt{\$regA}. There are a few more of these cases, which are the same as
- those described on the practicum page
- \footnote{\url{http://staff.science.uva.nl/~andy/compiler/prac.html}} and in
- Appendix \ref{opt}.
- \subsubsection*{Common subexpression elimination}
- A more advanced optimization is common subexpression elimination. This means
- that expensive operations as a multiplication or addition are performed only
- once and the result is then `copied' into variables where needed.
- \begin{verbatim}
- addu $2,$4,$3 addu = $t1, $4, $3
- ... mov = $2, $t1
- ... -> ...
- ... ...
- addu $5,$4,$3 mov = $4, $t1
- \end{verbatim}
- A standard method for doing this is the creation of a DAG or Directed Acyclic
- Graph. However, this requires a fairly advanced implementation. Our
- implementation is a slightly less fancy, but easier to implement.
- We search from the end of the block up for instructions that are eligible for
- CSE. If we find one, we check further up in the code for the same instruction,
- and add that to a temporary storage list. This is done until the beginning of
- the block or until one of the arguments of this expression is assigned.
- We now add the instruction above the first use, and write the result in a new
- variable. Then all occurrences of this expression can be replaced by a move of
- from new variable into the original destination variable of the instruction.
- This is a less efficient method then the dag, but because the basic blocks are
- in general not very large and the execution time of the optimizer is not a
- primary concern, this is not a big problem.
- \subsubsection*{Fold constants}
- Constant folding is an optimization where the outcome of arithmetics are
- calculated at compile time. If a value x is assigned to a certain value, lets
- say 10, than all next occurences of \texttt{x} are replaced by 10 until a
- redefinition of x. Arithmetics in Assembly are always performed between two
- variables or a variable and a constant. If this is not the case the calculation
- is not possible. See \ref{opt} for an example. In other words until the current
- definition of \texttt{x} becomes dead. Therefore reaching definitions analysis
- is needed. Reaching definitions is a form of liveness analysis, we use the
- liveness analysis within a block and not between blocks.
- During the constant folding, so-called algebraic transformations are performed
- as well. Some expression can easily be replaced with more simple once if you
- look at what they are saying algebraically. An example is the statement
- $x = y + 0$, or in Assembly \texttt{addu \$1, \$2, 0}. This can easily be
- changed into $x = y$ or \texttt{move \$1, \$2}.
- Another case is the multiplication with a power of two. This can be done way
- more efficiently by shifting left a number of times. An example:
- \texttt{mult \$regA, \$regB, 4 -> sll \$regA, \$regB, 2}. We perform this
- optimization for any multiplication with a power of two.
- There are a number of such cases, all of which are once again stated in
- appendix \ref{opt}.
- \subsubsection*{Copy propagation}
- Copy propagation `unpacks' a move instruction, by replacing its destination
- address with its source address in the code following the move instruction.
- This is not a direct optimization, but this does allow for a more effective
- dead code elimination.
- The code of the block is checked linearly. When a move operation is
- encountered, the source and destination address of this move are stored. When
- a normal operation with a source and a destination address are found, a number
- of checks are performed.
- The first check is whether the destination address is stored as a destination
- address of a move instruction. If so, this move instruction is no longer valid,
- so the optimizations can not be done. Otherwise, continue with the second
- check.
- In the second check, the source address is compared to the destination
- addresses of all still valid move operations. If these are the same, in the
- current operation the found source address is replaced with the source address
- of the move operation.
- An example would be the following:
- \begin{verbatim}
- move $regA, $regB move $regA, $regB
- ... ...
- Code not writing $regA, -> ...
- $regB ...
- ... ...
- addu $regC, $regA, ... addu $regC, $regB, ...
- \end{verbatim}
- This code shows that \texttt{\$regA} is replaced with \texttt{\$regB}. This
- way, the move instruction might have become useless, and it will then be
- removed by the dead code elimination.
- \subsection{Dead code elimination}
- The final optimization that is performed is dead code elimination. This means
- that when an instruction is executed, but the result is never used, that
- instruction can be removed.
- To be able to properly perform dead code elimination, we need to know whether a
- variable will be used, before it is overwritten again. If it does, we call the
- variable live, otherwise the variable is dead. The technique to find out if a
- variable is live is called liveness analysis. We implemented this for the
- entire code, by analyzing each block, and using the variables that come in the
- block live as the variables that exit its predecessor live.
- \section{Implementation}
- We decided to implement the optimization in Python. We chose this programming
- language because Python is an easy language to manipulate strings, work
- object-oriented etc.
- It turns out that a Lex and Yacc are also available as a Python module,
- named PLY(Python Lex-Yacc). This allows us to use one language, Python, instead
- of two, i.e. C and Python. Also no debugging is needed in C, only in Python
- which makes our assignment more feasible.
- The program has three steps, parsing the Assembly code into a datastructure we
- can use, the so-called Intermediate Representation, performing optimizations on
- this IR and writing the IR back to Assembly.
- \subsection{Parsing}
- The parsing is done with PLY, which allows us to perform Lex-Yacc tasks in
- Python by using a Lex-Yacc like syntax. This way there is no need to combine
- languages like we should do otherwise since Lex and Yacc are coupled with C.
- The decision was made to not recognize exactly every possible instruction in
- the parser, but only if something is for example a command, a comment or a gcc
- directive. We then transform per line to an object called a Statement. A
- statement has a type, a name and optionally a list of arguments. These
- statements together form a statement list, which is placed in another object
- called a Block. In the beginning there is one block for the entire program, but
- after global optimizations this will be separated in several blocks that are
- the basic blocks.
- \subsection{Optimizations}
- The optimizations are done in two different steps. First the global
- optimizations are performed, which are only the optimizations on branch-jump
- constructions. This is done repeatedly until there are no more changes.
- After all possible global optimizations are done, the program is seperated into
- basic blocks. The algorithm to do this is described earlier, and means all
- jump and branch instructions are called leaders, as are their targets. A basic
- block then goes from leader to leader.
- After the division in basic blocks, optimizations are performed on each of
- these basic blocks. This is also done repeatedly, since some times several
- steps can be done to optimize something.
- \subsection{Writing}
- Once all the optimizations have been done, the IR needs to be rewritten into
- Assembly code. After this step the xgcc crosscompiler can make binary code from
- the generated Assembly code.
- The writer expects a list of statements, so first the blocks have to be
- concatenated again into a list. After this is done, the list is passed on to
- the writer, which writes the instructions back to Assembly and saves the file
- so we can let xgcc compile it. We also write the original statements to a file,
- so differences in tabs, spaces and newlines do not show up when we check the
- differences between the optimized and non-optimized files.
- \subsection{Execution}
- To execute the optimizer, the following command can be given:\\
- \texttt{./main <original file> <optimized file> <rewritten original file>}
- \section{Testing}
- Of course, it has to be guaranteed that the optimized code still functions
- exactly the same as the none-optimized code. To do this, testing is an
- important part of out program. We have two stages of testing. The first stage
- is unit testing. The second stage is to test whether the compiled code has
- exactly the same output.
- \subsection{Unit testing}
- For almost every piece of important code, unit tests are available. Unit tests
- give the possibility to check whether each small part of the program, for
- instance each small function, is performing as expected. This way bugs are
- found early and very exactly. Otherwise, one would only see that there is a
- mistake in the program, not knowing where this bug is. Naturally, this means
- debugging is a lot easier.
- The unit tests can be run by executing \texttt{make test} in the root folder of
- the project. This does require the \texttt{textrunner} module.
- Also available is a coverage report. This report shows how much of the code has
- been unit tested. To make this report, the command \texttt{make coverage} can
- be run in the root folder. The report is than added as a folder \emph{coverage}
- in which a \emph{index.html} can be used to see the entire report.
- \subsection{Ouput comparison}
- In order to check whether the optimization does not change the functioning of
- the program, the output of the provided benchmark programs has to be compared
- to the output after optimization. If any of these outputs is not equal to the
- original output, our optimizations are to aggressive, or there is a bug
- somewhere in the code.
- \section{Results}
- The following results have been obtained:\\
- \begin{tabular}{|c|c|c|c|c|c|}
- \hline
- Benchmark & Original & Optimized & Original & Optimized & Performance \\
- & Instructions & instructions & cycles & cycles & boost(cycles)\\
- \hline
- pi & 134 & & & & \\
- acron & & & & & \\
- dhrystone & & & & & \\
- whet & & & & & \\
- slalom & & & & & \\
- clinpack & & & & & \\
- \hline
- \end{tabular}
- \appendix
- \section{List of all optimizations}
- \label{opt}
- \textbf{Global optimizations}
- \begin{verbatim}
- beq ...,$Lx bne ...,$Ly
- j $Ly -> $Lx: ...
- $Lx: ...
- bne ...,$Lx beq ...,$Ly
- j $Ly -> $Lx: ...
- $Lx: ...
- \end{verbatim}
- \textbf{Standard basic block optimizations}
- \begin{verbatim}
- mov $regA,$regA -> --- // remove it
- mov $regA,$regB -> instr $regA, $regB,...
- instr $regA, $regA,...
- instr $regA,... instr $4,...
- mov [$4-$7], $regA -> jal XXX
- jal XXX
- sw $regA,XXX -> sw $regA, XXX
- ld $regA,XXX
- shift $regA,$regA,0 -> --- // remove it
- add $regA,$regA,X -> lw ...,X($regA)
- lw ...,0($regA)
- \end{verbatim}
- \textbf{Advanced basic block optimizations}
- \begin{verbatim}
- # Common subexpression elimination
- addu $regA, $regB, 4 addu $regD, $regB, 4
- ... move $regA, $regD
- Code not writing $regB -> ...
- ... ...
- addu $regC, $regB, 4 move $regC, $regD
- # Constant folding
- li $regA, constA ""
- sw $regA, 16($fp) ""
- li $regA, constB -> ""
- sw $regA, 20($fp) ""
- lw $regA, 16($fp) ""
- lw $regB, 20($fp) ""
- addu $regA, $regA, $regA $li regA, (constA + constB) at compile time
- # Copy propagation
- move $regA, $regB move $regA, $regB
- ... ...
- Code not writing $regA, -> ...
- $regB ...
- ... ...
- addu $regC, $regA, ... addu $regC, $regB, ...
- # Algebraic transformations
- addu $regA, $regB, 0 -> move $regA, $regB
- subu $regA, $regB, 0 -> move $regA, $regB
- mult $regA, $regB, 1 -> move $regA, $regB
- mult $regA, $regB, 0 -> li $regA, 0
- mult $regA, $regB, 2 -> sll $regA, $regB, 1
- \end{verbatim}
- \end{document}
|