
Worked on report.

Taddeus Kroes 14 years ago
parent
commit
4c34a8234d
1 file changed with 253 additions and 183 deletions

+ 253 - 183
report/report.tex

@@ -3,6 +3,10 @@
 \usepackage{amsmath,amsfonts,amssymb,booktabs,graphicx,listings,subfigure}
 \usepackage{float,hyperref}
 
+% Paragraph indentation
+\setlength{\parindent}{0pt}
+\setlength{\parskip}{1ex plus 0.5ex minus 0.2ex}
+
 \title{Peephole Optimizer}
 \author{Jayke Meijer (6049885), Richard Torenvliet (6138861), Tadde\"us Kroes
     (6054129)}
@@ -16,248 +20,301 @@
 
 \section{Introduction}
 
-The goal of the assignment is to implement the optimization stage of the
-compiler. To reach this goal the parser and the optimizer part of the compiler
-have to be implemented.
+The goal of the assignment is to implement the peephole optimization stage of
+the xgcc cross compiler. This requires a MIPS Assembly parser to parse the
+output of the compiler. Also, an Assembly writer is needed to write the
+optimized statements back to valid Assembly code for the assembler.
 
-The output of the xgcc cross compiler on a C program is our input. The output
-of the xgcc cross compiler is in the form of Assembly code, but not optimized.
-Our assignment includes a number of C programs. An important part of the
-assignment is parsing the data. Parsing the data is done with Lex and Yacc. The
-Lexer is a program that finds keywords that meets the regular expression
-provided in the Lexer. After the Lexer, the Yaccer takes over. Yacc can turn
-the keywords in to an action.
+The assignment provides a number of benchmarks written in C. The objective is
+to obtain a high speedup in the number of cycles for these benchmarks.
 
-\section{Design}
+\section{Types of optimizations}
 
-There are two general types of optimizations of the assembly code, global
-optimizations and optimizations on a so-called basic block. These optimizations
-will be discussed separately
+There are two general types of optimizations on the assembly code: global
+optimizations and optimizations on so-called basic blocks. These optimizations
+will be discussed individually below.
 
 \subsection{Global optimizations}
 
 We only perform one global optimization, which is optimizing branch-jump
-statements. The unoptimized Assembly code contains sequences of code of the
-following structure:
+statements. The unoptimized Assembly code contains sequences of statements with
+the following structure:
 \begin{verbatim}
     beq ...,$Lx
     j $Ly
 $Lx:   ...
 \end{verbatim}
-This is inefficient, since there is a jump to a label that follows this code.
-It would be more efficient to replace the branch statement with a \texttt{bne}
-(the opposite case) to the label used in the jump statement. This way the jump
-statement can be eliminated, since the next label follows anyway. The same can
-of course be done for the opposite case, where a \texttt{bne} is changed into a
-\texttt{beq}.
+%This is inefficient, since there is a branch to a label that follows this code.
+In this code, it is more efficient to replace the branch statement with a
+\texttt{bne} (the opposite case) to the label used in the jump statement. This
+way, the jump statement can be eliminated since the label directly follows it.
+The same can be done for the opposite case, where a \texttt{bne} is changed
+into a \texttt{beq}.
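+
+For the example above, the optimized code then becomes:
+\begin{verbatim}
+    bne ...,$Ly
+$Lx:   ...
+\end{verbatim}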
 
 Since this optimization crosses the boundaries between basic blocks (it spans
 jumps and labels), we cannot perform it during the basic block optimizations.
 
 \subsection{Basic Block Optimizations}
 
-Optimizations on basic blocks are a more important part of the optimizer.
-First, what is a basic block? A basic block is a sequence of statements
+Optimizations on basic blocks form a more extensive part of the optimizer.
+
+First of all, what is a basic block? A basic block is a sequence of statements
 guaranteed to be executed in that order, and that order alone. This is the case
-for a piece of code not containing any branches or jumps.
+for a piece of code not containing any branches or jumps (except for the last
+statement).
 
-To create a basic block, you need to define what is the leader of a basic
-block. We call a statement a leader if it is either a jump/branch statement, or
-the target of such a statement. Then a basic block runs from one leader until
-the next leader.
+To divide the code into basic blocks, the ``leaders'' have to be found. A
+statement is a leader if it is either a jump or branch statement, or the
+target of such a statement. Each leader is the start of a new basic block.
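+
+As an illustration, this division could be sketched in Python as follows
+(\texttt{is\_jump()}, \texttt{jump\_target()} and \texttt{is\_label()} are
+hypothetical helpers, not our literal API):
+\begin{verbatim}
+def split_into_blocks(statements):
+    # Collect the targets of all jump/branch statements.
+    targets = set(s.jump_target() for s in statements if s.is_jump())
+
+    blocks, current = [], []
+    for s in statements:
+        # A label that is a jump target starts a new basic block.
+        if s.is_label() and s.name in targets and current:
+            blocks.append(current)
+            current = []
+        current.append(s)
+        # A jump/branch statement may only occur as the last
+        # statement of a block, so it ends the current block.
+        if s.is_jump():
+            blocks.append(current)
+            current = []
+    if current:
+        blocks.append(current)
+    return blocks
+\end{verbatim}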
 
-There are quite a few optimizations we perform on these basic blocks, so we
-will describe the types of optimizations here in stead of each optimization.
+There are five types of optimizations performed on basic blocks in our
+implementation. Each is described individually below.
 
-\subsubsection*{Standard peephole optimizations}
+\subsubsection{Standard peephole optimizations}
 
-These are optimizations that simply look for a certain statement or pattern of
+These are optimizations that look for a certain statement or pattern of
 statements, and optimize these. For example,
 \begin{verbatim}
 mov $regA,$regB
 instr $regA, $regA,...
 \end{verbatim}
-can be optimized into
+can be optimized to:
 \begin{verbatim}
 instr $regA, $regB,...
 \end{verbatim}
-since the register \texttt{\$regA} gets overwritten by the second instruction
-anyway, and the instruction can easily use \texttt{\$regB} in stead of
-\texttt{\$regA}. There are a few more of these cases, which are the same as
-those described on the practicum page
+\texttt{\$regA} contains the same value as \texttt{\$regB} after the move
+statement, so \texttt{instr} can use \texttt{\$regB} directly. Since
+\texttt{instr} overwrites \texttt{\$regA}, the move statement has no further
+effect after \texttt{instr} and can be removed.
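+
+A minimal sketch of this pattern match (the \texttt{Statement} attributes and
+methods are again hypothetical):
+\begin{verbatim}
+def optimize_move_instr(block):
+    statements = block.statements
+    for i in range(len(statements) - 1):
+        move, instr = statements[i], statements[i + 1]
+        # Match "move $regA,$regB" followed by an instruction that
+        # overwrites $regA and also reads it.
+        if move.name == 'move' and instr.is_command() \
+                and instr.args[0] == move.args[0] \
+                and move.args[0] in instr.args[1:]:
+            # Read $regB directly and drop the redundant move.
+            instr.args = [instr.args[0]] + \
+                [move.args[1] if a == move.args[0] else a
+                 for a in instr.args[1:]]
+            del statements[i]
+            return True   # a change was made; the caller iterates again
+    return False
+\end{verbatim}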
+
+There are a few more of these cases, which are described on the practicum page
 \footnote{\url{http://staff.science.uva.nl/~andy/compiler/prac.html}} and in
 Appendix \ref{opt}.
 
-\subsubsection*{Common subexpression elimination}
+\subsubsection{Common subexpression elimination}
 
 A more advanced optimization is common subexpression elimination. This means
-that expensive operations as a multiplication or addition are performed only
-once and the result is then `copied' into variables where needed.
+that expensive operations like multiplications or additions are performed only
+once and the result is then `copied' into registers where needed.
 \begin{verbatim}
-
-addu	$2,$4,$3              addu = $t1, $4, $3
-...                        mov = $2, $t1
+addu $2,$4,$3              addu $8, $4, $3  # $8 is free
+...                        move $2, $8
 ...                   ->   ...
 ...                        ...
-addu	$5,$4,$3              mov = $4, $t1
-
+addu $5,$4,$3              move $5, $8
 \end{verbatim}
 
-A standard method for doing this is the creation of a DAG or Directed Acyclic
-Graph. However, this requires a fairly advanced implementation. Our
-implementation is a slightly less fancy, but easier to implement.
-We search from the end of the block up for instructions that are eligible for
-CSE. If we find one, we check further up in the code for the same instruction,
-and add that to a temporary storage list. This is done until the beginning of
-the block or until one of the arguments of this expression is assigned.
+A standard method for doing this is the use of a DAG (Directed Acyclic
+Graph). However, this requires either the code to be in Static single
+assignment
+form\footnote{\url{http://en.wikipedia.org/wiki/Static\_single\_assignment\_form}},
+or an advanced liveness check. Our code base contains a (partially tested)
+implementation of DAG creation, but it is not used in the final version.
+However, we do implement a simplified version of common subexpression
+elimination:
+
+The statement list of a block is traversed in reversed order, looking for
+instructions that are eligible for CSE (\texttt{addu}, for example). If such an
+instruction is found, it is marked and the rest of the statement list is
+traversed while marking all statements that are equal to the found instruction.
+If a statement assigns a register that is used by the instruction, traversal
+stops.
+
+If more than one instruction has been marked, a new instruction is inserted
+above the first occurrence (the last occurrence in reversed order). This
+instruction performs the calculation and saves it in a free temporary register.
+Then, each occurrence is replaced by a \texttt{move} of the free register to
+its original destination register.
+
+This method is obviously less efficient than the DAG. However, since the
+basic blocks are generally not very large and the execution time of the
+optimizer is not a primary concern, this is not a large problem.
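+
+In Python, the traversal described above looks roughly like this (all helper
+methods are hypothetical):
+\begin{verbatim}
+def eliminate_common_subexpression(block):
+    statements = block.statements
+    # Walk backwards, looking for an instruction eligible for CSE.
+    for i in reversed(range(len(statements))):
+        s = statements[i]
+        if not s.is_eligible_for_cse():      # e.g. addu
+            continue
+        # Walk further up, marking equal expressions, until one of
+        # the expression's source registers is assigned.
+        occurrences = [i]
+        for j in reversed(range(i)):
+            t = statements[j]
+            if t.same_expression(s):
+                occurrences.append(j)
+            elif t.assigns_any(s.source_registers()):
+                break
+        if len(occurrences) > 1:
+            reg = block.find_free_register()
+            # Compute the expression once, above the first occurrence,
+            # into the free register...
+            statements.insert(min(occurrences), s.with_destination(reg))
+            # ...and replace each occurrence by a move from that
+            # register (indices shifted by one due to the insert).
+            for k in occurrences:
+                old = statements[k + 1]
+                statements[k + 1] = make_move(old.destination(), reg)
+            return True
+    return False
+\end{verbatim}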
+
+\subsubsection{Constant folding}
 
-We now add the instruction above the first use, and write the result in a new
-variable. Then all occurrences of this expression can be replaced by a move of
-from new variable into the original destination variable of the instruction.
-
-This is a less efficient method then the DAG, but because the basic blocks are
-in general not very large and the execution time of the optimizer is not a
-primary concern, this is not a big problem.
-
-\subsubsection*{Fold constants}
 Constant folding is an optimization where the outcome of arithmetic is
-calculated at compile time. If a value x is assigned to a certain value, lets
-say 10, than all next occurences of \texttt{x} are replaced by 10 until a
-redefinition of x. Arithmetics in Assembly are always performed between two
-variables or a variable and a constant. If this is not the case the calculation
-is not possible. See \ref{opt} for an example. In other words until the current
-definition of \texttt{x} becomes dead. Therefore reaching definitions analysis
-is needed. Reaching definitions is a form of liveness analysis, we use the
-liveness analysis within a block and not between blocks.
-
-During the constant folding, so-called algebraic transformations are performed
-as well. Some expression can easily be replaced with more simple once if you
-look at what they are saying algebraically. An example is the statement
-$x = y + 0$, or in Assembly \texttt{addu \$1, \$2, 0}. This can easily be
-changed into $x = y$ or \texttt{move \$1, \$2}.
+calculated at compile time. If a register \texttt{x} is known to contain a
+constant value, all following uses of \texttt{x} can be replaced by that value
+until a redefinition of \texttt{x}.
 
-Another case is the multiplication with a power of two. This can be done way
-more efficiently by shifting left a number of times. An example:
-\texttt{mult \$regA, \$regB, 4    ->  sll  \$regA, \$regB, 2}. We perform this
-optimization for any multiplication with a power of two.
+Arithmetics in Assembly are always performed between two registers or a
+register and a constant. If the current values of all used registers are
+known, the expression can be evaluated at compile time and the instruction can
+be replaced by an immediate load of the result. See \ref{opt} for an example.
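+
+A sketch of how such folding within a block could be implemented, assuming
+hypothetical helpers like \texttt{is\_arithmetic()} and \texttt{evaluate()}:
+\begin{verbatim}
+def fold_constants(block):
+    known = {}                          # register -> constant value
+    for s in block.statements:
+        if s.name == 'li':              # immediate load: value known
+            known[s.args[0]] = int(s.args[1], 0)
+        elif s.is_arithmetic() and all(op in known or is_constant(op)
+                                       for op in s.args[1:]):
+            # All operands known: evaluate at compile time and
+            # replace the instruction by an immediate load.
+            result = s.evaluate(known)
+            s.replace_with('li', s.args[0], hex(result))
+            known[s.args[0]] = result
+        elif s.assigns_register():
+            # Redefinition with an unknown value.
+            known.pop(s.args[0], None)
+\end{verbatim}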
 
-There are a number of such cases, all of which are once again stated in
-appendix \ref{opt}.
+%In other words until the current definition of \texttt{x} becomes dead.
+%Therefore reaching definitions analysis is needed. Reaching definitions is a
+%form of liveness analysis, we use the liveness analysis within a block and not
+%between blocks.
 
-\subsubsection*{Copy propagation}
-
-Copy propagation `unpacks' a move instruction, by replacing its destination
-address with its source address in the code following the move instruction.
-
-This is not a direct optimization, but this does allow for a more effective
-dead code elimination.
-
-The code of the block is checked linearly. When a move operation is
-encountered, the source and destination address of this move are stored. When
-a normal operation with a source and a destination address are found, a number
-of checks are performed.
-
-The first check is whether the destination address is stored as a destination
-address of a move instruction. If so, this move instruction is no longer valid,
-so the optimizations can not be done. Otherwise, continue with the second
-check.
-
-In the second check, the source address is compared to the destination
-addresses of all still valid move operations. If these are the same, in the
-current operation the found source address is replaced with the source address
-of the move operation.
+During the constant folding, so-called algebraic transformations are performed
+as well. When a calculation uses constants, it can sometimes be replaced by a
+load or move instruction. An example is the statement
+$x = y + 0$, or in Assembly: \texttt{addu \$1, \$2, 0}. This can be replaced by
+$x = y$ or \texttt{move \$1, \$2}. A list of transformations that are performed
+can be found in appendix \ref{opt}.
+
+\subsubsection{Copy propagation}
+
+Copy propagation replaces usage of registers that have been assigned the value
+of another register earlier. In Assembly code, such an assignment is in the
+form of a \texttt{move} instruction.
+
+This is not a direct optimization, but it often creates dead code (the
+\texttt{move} statement) that can then be eliminated.
+
+To perform copy propagation within the same basic block, the block is traversed
+until a \texttt{move x, y} instruction is encountered. For each of these ``copy
+statements'', the rest of the block is traversed while looking for usage of the
+\texttt{move}'s destination address \texttt{x}. These usages are replaced by
+usages of \texttt{y}, until either \texttt{x} or \texttt{y} is re-assigned.
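+
+A sketch of this traversal (again with illustrative helper methods):
+\begin{verbatim}
+def propagate_copies(block):
+    copies = {}                      # destination -> source register
+    for s in block.statements:
+        # Replace usages of known copy destinations by their sources.
+        for dest, src in copies.items():
+            s.replace_usage(dest, src)
+        # Re-assigning x or y invalidates the copy "move x, y".
+        if s.assigns_register():
+            copies = {x: y for x, y in copies.items()
+                      if s.args[0] not in (x, y)}
+        # Record a newly encountered copy statement.
+        if s.name == 'move':
+            copies[s.args[0]] = s.args[1]
+\end{verbatim}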
+
+%Copy propagation `unpacks' a move instruction, by replacing its destination
+%address with its source address in the code following the move instruction.
+%
+%This is not a direct optimization, but this does allow for a more effective
+%dead code elimination.
+%
+%The code of the block is traversed linearly. If a move operation is
+%encountered, the source and destination address of this move are stored. If a
+%normal operation with a source and a destination address are found, a number of
+%checks are performed.
+%
+%The first check is whether the destination address is stored as a destination
+%address of a move instruction. If so, this move instruction is no longer valid,
+%so the optimizations can not be done. Otherwise, continue with the second
+%check.
+%
+%In the second check, the source address is compared to the destination
+%addresses of all still valid move operations. If these are the same, in the
+%current operation the found source address is replaced with the source address
+%of the move operation.
 
 An example would be the following:
 \begin{verbatim}
-move $regA, $regB           move $regA, $regB
-...                         ...
-Code not writing $regA, ->  ...
-$regB                       ...
-...                         ...
-addu $regC, $regA, ...      addu $regC, $regB, ...
+move $regA, $regB                    move $regA, $regB
+...                                  ...
+Code not writing $regA or $regB  ->  ...
+...                                  ...
+addu $regC, $regA, ...               addu $regC, $regB, ...
 \end{verbatim}
-This code shows that \texttt{\$regA} is replaced with \texttt{\$regB}. This
-way, the move instruction might have become useless, and it will then be
-removed by the dead code elimination.
-
-\subsection{Dead code elimination}
-
-The final optimization that is performed is dead code elimination. This means
-that when an instruction is executed, but the result is never used, that
-instruction can be removed.
-
-To be able to properly perform dead code elimination, we need to know whether a
-variable will be used, before it is overwritten again. If it does, we call the
-variable live, otherwise the variable is dead. The technique to find out if a
-variable is live is called liveness analysis. We implemented this for the
-entire code, by analysing each block, and using the variables that come in the
-block live as the variables that exit its predecessor live.
+\texttt{\$regA} is replaced with \texttt{\$regB}. Now, the move instruction
+might have become useless. If so, it will be removed by dead code elimination.
+
+To also replace usages in successors of the basic block, a Reaching
+Definitions analysis is used: if a \texttt{move}-statement is in the
+$REACH_{out}$ set of the block, it may reach a usage in one of the block's
+successors. To be able to replace such a usage, the definition must be the
+only definition reaching it. To determine this, copy propagation defines a new
+dataflow problem that yields the $COPY_{in}$ and $COPY_{out}$ sets. The
+definition is the only reaching definition if it is in the successor's
+$COPY_{in}$ set. If this is the case, the usage can be replaced by the source
+register of the \texttt{move}-statement.
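+
+The corresponding dataflow equations, which we assume take the usual forward
+``must'' form (an intersection over all predecessors), are:
+\begin{align*}
+COPY_{in}(B)  &= \bigcap_{P \in pred(B)} COPY_{out}(P) \\
+COPY_{out}(B) &= gen(B) \cup (COPY_{in}(B) \setminus kill(B))
+\end{align*}
+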
+Note: though we implemented the algorithm as described above, we did not
+encounter any replacements between basic blocks while optimizing the provided
+benchmark scripts. This might mean that our implementation of the copy
+propagation dataflow problem is incorrect; it is based on the lecture slides,
+which only briefly describe the algorithm.
+
+\subsubsection{Dead code elimination}
+
+The final optimization that is performed is dead code elimination. This removes
+statements of which the result is never used.
+
+To determine if a register is used from a certain point in the code, liveness
+analysis is used. A register is ``live'' at a certain point in the code if it
+holds a value that may be needed in the future. Using the $LIVE_{out}$ set
+that is generated by the analysis, we can check if a register is dead after a
+certain point in a basic block. Each statement that assigns a register which
+is dead from that point on is removed.
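+
+A sketch of this backward scan over a block, starting from its $LIVE_{out}$
+set (helper methods are illustrative):
+\begin{verbatim}
+def eliminate_dead_code(block):
+    live = set(block.live_out)
+    kept = []
+    for s in reversed(block.statements):
+        if s.assigns_register() and s.args[0] not in live \
+                and not s.has_side_effects():
+            continue                    # result never used: dead code
+        if s.assigns_register():
+            live.discard(s.args[0])     # a definition kills liveness
+        live.update(s.used_registers()) # usages make registers live
+        kept.append(s)
+    block.statements = list(reversed(kept))
+\end{verbatim}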
 
 \section{Implementation}
 
-We decided to implement the optimization in Python. We chose this programming
+We decided to implement the optimizations in Python. We chose this programming
 language because Python makes it easy to manipulate strings and to work
-object-oriented etc.
-It turns out that a Lex and Yacc are also available as a Python module,
-named PLY(Python Lex-Yacc). This allows us to use one language, Python, instead
-of two, i.e. C and Python. Also no debugging is needed in C, only in Python
-which makes our assignment more feasible.
-
-The program has three steps, parsing the Assembly code into a datastructure we
-can use, the so-called Intermediate Representation, performing optimizations on
-this IR and writing the IR back to Assembly.
+in an object-oriented way.
 
-\subsection{Parsing}
+To implement the parser, we use a Python variant of Yacc and Lex named
+PLY (Python Lex-Yacc). By using this module instead of the regular C
+implementations of Yacc and Lex, we only use a single language in the entire
+project.
 
-The parsing is done with PLY, which allows us to perform Lex-Yacc tasks in
-Python by using a Lex-Yacc like syntax. This way there is no need to combine
-languages like we should do otherwise since Lex and Yacc are coupled with C.
+The program has three steps:
+\begin{enumerate}
+    \item Parsing the Assembly code to an Intermediate Representation (IR).
+    \item Performing optimizations on the IR.
+    \item Writing the IR back to Assembly code.
+\end{enumerate}
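+
+In code, this pipeline boils down to something like the following (the
+function names are illustrative, not our exact API):
+\begin{verbatim}
+def main(source_file, target_file):
+    program = parse_file(source_file)   # 1. Assembly -> IR
+    program.optimize()                  # 2. optimize the IR
+    write_file(program, target_file)    # 3. IR -> Assembly
+\end{verbatim}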
 
-The decision was made to not recognize exactly every possible instruction in
-the parser, but only if something is for example a command, a comment or a gcc
-directive. We then transform per line to an object called a Statement. A
-statement has a type, a name and optionally a list of arguments. These
-statements together form a statement list, which is placed in another object
-called a Block. In the beginning there is one block for the entire program, but
-after global optimizations this will be separated in several blocks that are
-the basic blocks.
+Our code is provided with this report, and is also available on GitHub: \\
+\url{https://github.com/taddeus/peephole}
 
-\subsection{Optimizations}
+\subsection{Structure}
 
-The optimizations are done in two different steps. First the global
-optimizations are performed, which are only the optimizations on branch-jump
-constructions. This is done repeatedly until there are no more changes.
+% TODO
 
-After all possible global optimizations are done, the program is separated into
-basic blocks. The algorithm to do this is described earlier, and means all
-jump and branch instructions are called leaders, as are their targets. A basic
-block then goes from leader to leader.
+\subsection{Parsing}
 
-After the division in basic blocks, optimizations are performed on each of
-these basic blocks. This is also done repeatedly, since some times several
-steps can be done to optimize something.
+The parser is implemented using PLY, which expresses the usual Lex-Yacc rules
+as Python functions, with the grammar encoded in their docstrings.
+
+The parser assumes that it is given valid Assembly code as input, so it does
+not validate whether, for example, command arguments are valid. This design
+decision was made because the optimizer uses the output of a compiler, which
+should produce valid Assembly code.
+
+The parser recognizes 4 types of ``statements'':
+\begin{itemize}
+    \item \textbf{comment} Line starting with a `\#'.
+    \item \textbf{directive} Compiler (gcc) directive. These are matched and
+                             treated in the same way as comments.
+    \item \textbf{command} Machine instruction, followed by 0 to 3 arguments
+                           and optionally an inline comment.
+    \item \textbf{label} Line containing a \texttt{WORD} token, followed by a
+                         colon (`:').
+\end{itemize}
+
+Each statement is represented by a \texttt{Statement} object containing a type,
+a name, optionally a list of arguments and optionally a list of extra options
+(such as inline comments). The parsed list of statements forms a
+\texttt{Program} object, which is the return value of the parser.
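+
+To give an impression of what the PLY grammar looks like, here are two
+simplified rules (the token names and \texttt{Statement} signature are
+hypothetical):
+\begin{verbatim}
+def p_line_command(p):
+    'line : WORD arguments NEWLINE'
+    # A machine instruction becomes a Statement of type "command".
+    p[0] = Statement('command', name=p[1], args=p[2])
+
+def p_line_label(p):
+    'line : WORD COLON NEWLINE'
+    p[0] = Statement('label', name=p[1])
+\end{verbatim}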
+
+\subsection{Optimization loop}
+
+The optimizations are performed in a loop until no more changes are made. The
+optimization loop first performs global optimizations on the entire statement
+list of the program. Second, all dataflow analyses are performed (basic block
+creation, flow graph generation, liveness, reaching definitions, copy
+propagation). Finally, all basic block-level optimizations are executed. If
+either the global or one of the block optimizations yields a change in the
+statements, another iteration is executed.
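+
+Schematically (method names are illustrative):
+\begin{verbatim}
+def optimize(program):
+    changed = True
+    while changed:
+        changed = program.optimize_global()   # branch-jump etc.
+        program.perform_dataflow_analysis()   # blocks, liveness, ...
+        for block in program.blocks:
+            changed |= optimize_block(block)  # peephole, CSE, ...
+\end{verbatim}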
 
 \subsection{Writing}
 
-Once all the optimizations have been done, the IR needs to be rewritten into
-Assembly code. After this step the xgcc crosscompiler can make binary code from
-the generated Assembly code.
+Once all the optimizations have been done, the IR needs to be rewritten to
+Assembly code. After this step, the xgcc cross compiler can make binary code
+from the generated Assembly code.
 
 The writer expects a list of statements, so first the blocks have to be
 concatenated again into a list. After this is done, the list is passed on to
-the writer, which writes the instructions back to Assembly and saves the file
-so we can let xgcc compile it. The original statements can also written to a
-file, so differences in tabs, spaces and newlines do not show up when checking
-the differences between the optimized and non-optimized files.
+the writer, which writes the instructions back to Assembly and saves the file.
+We believe that the writer code is self-explanatory, so we will not discuss it
+in detail here.
+
+The writer has a slightly different output format than the xgcc compiler in
+some cases. Therefore, the main execution file has an option to also write the
+original statement list back to a file. This way, differences in tabs, spaces
+and newlines do not show up when checking the differences between the
+optimized and non-optimized files.
 
 \subsection{Execution}
 
-To execute the optimizer, the following command can be given:\\
-\texttt{./main.py <original file> <optimized file> <rewritten original file>}\\
+To execute the optimizer, the following command can be given: \\
+\texttt{./main.py <original file> <optimized file> <rewritten original file>} \\
 There is also a script available that runs the optimizer and automatically
 starts the program \emph{meld}. In meld it is easy to visually compare the
-original file and the optimized file. The command to execute this script is:\\
-\texttt{./run <benchmark name (e.g. whet)>}\\
+original file and the optimized file. The command to execute this script is: \\
+\texttt{./run <benchmark name (e.g. whet)>}
 
 \section{Testing}
 
@@ -277,7 +334,7 @@ mistake in the program, not knowing where this bug is. Naturally, this means
 debugging is a lot easier.
 
 The unit tests can be run by executing \texttt{make test} in the root folder of
-the project. This does require the \texttt{textrunner} module.
+the project. This does require the \texttt{testrunner} Python module.
 
 Also available is a coverage report. This report shows how much of the code has
 been unit tested. To make this report, the command \texttt{make coverage} can
@@ -297,15 +354,29 @@ somewhere in the code.
 The following results have been obtained:\\
 \begin{tabular}{|c|c|c|c|c|c|}
 \hline
-Benchmark & Original     & Optimized    & Original & Optimized & Performance \\
-        & Instructions & instructions   & cycles   & cycles    &  boost(cycles)\\
+Benchmark & Original     & Removed      & Original & Optimized & Performance \\
+          & Instructions & instructions & cycles   & cycles    & boost(cycles) \\
+\hline
+pi        &           94 &            2 &          &           &             \% \\
+acron     &          361 &           24 &          &           &             \% \\
+dhrystone &          752 &           52 &          &           &             \% \\
+whet      &          935 &           37 &          &           &             \% \\
+slalom    &         4177 &          227 &          &           &             \% \\
+clinpack  &         3523 &              &          &           &              \% \\
+\hline
+\end{tabular}
+
+\begin{tabular}{|c|c|c|c|c|c|}
+\hline
+Benchmark & Original     & Removed      & Original & Optimized & Performance \\
+          & Instructions & instructions & cycles   & cycles    & boost(cycles)\\
 \hline
 pi        &           94 &      2       &    1714468   &   1714362      &   0.006182676 \%       \\
 acron     &          361 &      19      &    4435687   &   4372825      &   1.417187462 \%       \\
-dhrystone &          752 &      36      &    2887710   &   2742720      &   5.020933542 \%       \\  
+dhrystone &          752 &      36      &    2887710   &   2742720      &   5.020933542 \%       \\
 whet      &          935 &      23      &    2864526   &   2840042      &   0.854731289 \%       \\
 slalom    &         4177 &      107     &    2879140   &   2876105      &   0.143480345 \%       \\
-clinpack  &         3523 &      49      &    1543746   &   1528406      &   1.353201887  \%       \\ 
+clinpack  &         3523 &      49      &    1543746   &   1528406      &   1.353201887  \%       \\
 \hline
 \end{tabular}
 
@@ -363,15 +434,14 @@ Code not writing $regB  ->  ...
 ...                         ...
 addu $regC, $regB, 4        move $regC, $regD
 
-
 # Constant folding
-li $regA, constA                ""       
-sw $regA, 16($fp)               ""
-li $regA, constB        ->      ""
-sw $regA, 20($fp)               ""	
-lw $regA, 16($fp)               "" 
-lw $regB, 20($fp)               ""
-addu $regA, $regA, $regA        $li regA, (constA + constB) at compile time
+li $2, 2                    $2 = 2
+sw $2, 16($fp)              16($fp) = 2
+li $2, 3                    $2 = 3
+sw $2, 20($fp)          ->  20($fp) = 3
+lw $2, 16($fp)              $2 = 16($fp) = 2
+lw $3, 20($fp)              $3 = 20($fp) = 3
+addu $2, $2, $3             change to "li $2, 0x00000005"
 
 # Copy propagation
 move $regA, $regB           move $regA, $regB