|
|
@@ -105,28 +105,28 @@ addu $2,$4,$3 addu = $t1, $4, $3
|
|
|
... mov = $2, $t1
|
|
|
... -> ...
|
|
|
... ...
|
|
|
-addu $5,$4,$3 mov = $4, $t1
|
|
|
+addu $5,$4,$3 mov = $4, $t1
|
|
|
|
|
|
\end{verbatim}
|
|
|
|
|
|
-
|
|
|
A standard method for doing this is the creation of a DAG or Directed Acyclic
|
|
|
Graph. However, this requires a fairly advanced implementation. Our
|
|
|
implementation is slightly less fancy, but easier to implement.
|
|
|
We search from the end of the block up for instructions that are eligible for
|
|
|
CSE. If we find one, we check further up in the code for the same instruction,
|
|
|
and add that to a temporary storage list. This is done until the beginning of
|
|
|
-the block or until one of the arguments of this expression is assigned.
|
|
|
+the block or until one of the arguments of this expression is assigned. The temporary storage is
|
|
|
|
|
|
We now add the instruction above the first use, and write the result in a new
|
|
|
variable. Then all occurrences of this expression can be replaced by a move
|
|
|
from the new variable into the original destination variable of the instruction.
|
|
|
|
|
|
-This is a less efficient method then the dag, but because the basic blocks are
|
|
|
+This is a less efficient method than the DAG, but because the basic blocks are
|
|
|
in general not very large and the execution time of the optimizer is not a
|
|
|
primary concern, this is not a big problem.
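The bottom-up search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: instructions are modelled as tuples such as \texttt{("addu", "\$2", "\$4", "\$3")}, the set \texttt{CSE\_OPS} of eligible operations and the temporary names \texttt{\$t0}, \texttt{\$t1}, ... are hypothetical, and any instruction whose first operand is one of the arguments is conservatively treated as an assignment to it.

```python
# Hypothetical set of operations considered eligible for CSE.
CSE_OPS = {"addu", "subu", "and", "or", "xor"}


def eliminate_common_subexpressions(block):
    """Return a copy of a basic block with repeated expressions computed once."""
    block = list(block)
    temp_count = 0
    i = len(block) - 1
    while i >= 0:
        ins = block[i]
        found = False
        if ins[0] in CSE_OPS:
            op, args = ins[0], ins[2:]
            occurrences = [i]
            # Search further up for the same expression.
            for j in range(i - 1, -1, -1):
                other = block[j]
                # Stop as soon as one of the arguments is (possibly) assigned.
                if len(other) > 1 and other[1] in args:
                    break
                if other[0] == op and other[2:] == args:
                    occurrences.append(j)
            if len(occurrences) > 1:
                found = True
                temp = "$t%d" % temp_count  # hypothetical fresh variable
                temp_count += 1
                # Replace every occurrence by a move out of the temporary.
                for k in occurrences:
                    block[k] = ("move", block[k][1], temp)
                # Compute the expression once, above its first use.
                block.insert(min(occurrences), (op, temp) + args)
        if not found:
            i -= 1
        # After an insertion the unvisited instruction above position i has
        # shifted down to index i, so i is deliberately left unchanged.
    return block
```

For a block containing \texttt{addu \$2,\$4,\$3} and later \texttt{addu \$5,\$4,\$3}, with no assignment to \texttt{\$4} or \texttt{\$3} in between, the sketch emits one \texttt{addu} into the temporary followed by two moves.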
|
|
|
|
|
|
\subsubsection*{Fold constants}
|
|
|
+Constant folding is an optimization where the outcome of arithmetic is calculated at compile time. If a variable \texttt{x} is assigned a constant value, say 10, then all subsequent occurrences of \texttt{x} are replaced by 10 until \texttt{x} is redefined, in other words until the current definition of \texttt{x} becomes dead. Therefore reaching definitions analysis is needed. An arithmetic instruction in Assembly can only be folded when both of its operands are known constants; if this is not the case, the calculation is not possible at compile time. See the example for a clearer explanation of constant folding (will come).
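As an illustration of the idea, the sketch below folds constants inside a single basic block. It is a simplification and not the implementation described in this report: it tracks constants locally instead of using reaching definitions analysis, it models instructions as tuples, and the table \texttt{FOLDABLE} and the function name are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical table of foldable operations with 32-bit wrap-around.
FOLDABLE = {
    "addu": lambda a, b: (a + b) & MASK32,
    "subu": lambda a, b: (a - b) & MASK32,
}


def fold_constants(block):
    """Replace arithmetic on two known constants by a load immediate."""
    known = {}  # register -> constant it currently holds
    result = []
    for ins in block:
        op = ins[0]
        if op == "li":
            known[ins[1]] = int(ins[2])
            result.append(ins)
        elif (op in FOLDABLE and len(ins) == 4
              and ins[2] in known and ins[3] in known):
            value = FOLDABLE[op](known[ins[2]], known[ins[3]])
            known[ins[1]] = value
            result.append(("li", ins[1], str(value)))
        else:
            # Any other write to a register ends the known constant value.
            if len(ins) > 1:
                known.pop(ins[1], None)
            result.append(ins)
    return result
```

Once \texttt{\$2} is overwritten by a non-constant value, later uses of \texttt{\$2} are no longer folded, which mirrors the "until a redefinition" condition in the text.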
|
|
|
|
|
|
|
|
|
|
|
|
@@ -168,7 +168,18 @@ removed by the dead code elimination.
|
|
|
|
|
|
\subsubsection*{Algebraic transformations}
|
|
|
|
|
|
+Some expressions can easily be replaced with simpler ones if you look at
|
|
|
+what they are saying algebraically. An example is the statement $x = y + 0$, or
|
|
|
+in Assembly \texttt{addu \$1, \$2, 0}. This can easily be changed into $x = y$
|
|
|
+or \texttt{move \$1, \$2}.
|
|
|
+
|
|
|
+Another case is multiplication by a power of two. This can be done much
|
|
|
+more efficiently by shifting left a number of times. An example:
|
|
|
+\texttt{mult \$regA, \$regB, 4 -> sll \$regA, \$regB, 2}. We perform this
|
|
|
+optimization for any multiplication with a power of two.
|
|
|
|
|
|
+There are a number of such cases, all of which are once again stated in
|
|
|
+appendix \ref{opt}.
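The two cases mentioned above can be sketched as a small rewrite function. This is only an illustration: the tuple representation and the name \texttt{simplify\_algebraic} are assumptions, and the three-operand \texttt{mult} form follows the notation used in the text.

```python
def simplify_algebraic(ins):
    """Return a cheaper equivalent of one instruction, or the instruction itself."""
    op = ins[0]
    # x = y + 0  ->  x = y
    if op == "addu" and len(ins) == 4 and ins[3] == "0":
        return ("move", ins[1], ins[2])
    # x = y * 2^k  ->  x = y << k
    if op == "mult" and len(ins) == 4 and ins[3].isdigit():
        n = int(ins[3])
        if n > 0 and (n & (n - 1)) == 0:  # exactly one bit set: a power of two
            shift = n.bit_length() - 1
            return ("sll", ins[1], ins[2], str(shift))
    return ins
```

A multiplication by a constant that is not a power of two, such as 6, is left untouched by this sketch.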
|
|
|
|
|
|
\section{Implementation}
|
|
|
|
|
|
@@ -205,7 +216,7 @@ The optimizations are done in two different steps. First the global
|
|
|
optimizations are performed, which are only the optimizations on branch-jump
|
|
|
constructions. This is done repeatedly until there are no more changes.
|
|
|
|
|
|
-After all possible global optimizations are done, the program is seperated into
|
|
|
+After all possible global optimizations are done, the program is separated into
|
|
|
basic blocks. The algorithm to do this is described earlier, and means all
|
|
|
jump and branch instructions are called leaders, as are their targets. A basic
|
|
|
block then goes from leader to leader.
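For illustration, the sketch below splits an instruction list into basic blocks. Note that it uses the common textbook choice of leaders (the first instruction, every jump target, and every instruction that follows a jump or branch), which may differ in detail from the variant used in this project. The tuple representation, the \texttt{("label", name)} convention and the set \texttt{JUMPS} are assumptions.

```python
# Hypothetical set of control-flow instructions.
JUMPS = {"j", "jal", "jr", "beq", "bne", "blez", "bgtz", "bltz", "bgez"}


def split_basic_blocks(instrs):
    """Split a list of instruction tuples into a list of basic blocks."""
    if not instrs:
        return []
    # The last operand of a jump or branch is taken to be its target.
    targets = {ins[-1] for ins in instrs if ins[0] in JUMPS}
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] == "label" and ins[1] in targets:
            leaders.add(i)
        if ins[0] in JUMPS and i + 1 < len(instrs):
            leaders.add(i + 1)
    bounds = sorted(leaders) + [len(instrs)]
    # Each block runs from one leader up to, but not including, the next.
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]
```

Concatenating the returned blocks in order gives back the original instruction list, so the blocks can be optimized independently and joined again afterwards.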
|
|
|
@@ -225,17 +236,57 @@ concatenated again into a list. After this is done, the list is passed on to
|
|
|
the writer, which writes the instructions back to Assembly and saves the file
|
|
|
so we can let xgcc compile it.
|
|
|
|
|
|
-\section{Results}
|
|
|
+\section{Testing}
|
|
|
+
|
|
|
+Of course, it has to be guaranteed that the optimized code still functions
|
|
|
+exactly the same as the non-optimized code. To do this, testing is an
|
|
|
+important part of our program. We have two stages of testing. The first stage
|
|
|
+is unit testing. The second stage is to test whether the compiled code has
|
|
|
+exactly the same output.
|
|
|
|
|
|
-\subsection{pi.c}
|
|
|
+\subsection{Unit testing}
|
|
|
|
|
|
-\subsection{acron.c}
|
|
|
+For almost every piece of important code, unit tests are available. Unit tests
|
|
|
+give the possibility to check whether each small part of the program, for
|
|
|
+instance each small function, is performing as expected. This way bugs are
|
|
|
+found early and located precisely. Otherwise, one would only see that there is a
|
|
|
+mistake in the program, not knowing where this bug is. Naturally, this means
|
|
|
+debugging is a lot easier.
|
|
|
|
|
|
-\subsection{whet.c}
|
|
|
+The unit tests can be run by executing \texttt{make test} in the root folder of
|
|
|
+the project. This does require the \texttt{textrunner} module.
|
|
|
|
|
|
-\subsection{slalom.c}
|
|
|
+Also available is a coverage report. This report shows how much of the code has
|
|
|
+been unit tested. To make this report, the command \texttt{make coverage} can
|
|
|
+be run in the root folder. The report is then added as a folder \emph{coverage}
|
|
|
+in which an \emph{index.html} can be used to see the entire report.
|
|
|
+
|
|
|
+\subsection{Output comparison}
|
|
|
+
|
|
|
+In order to check that the optimization does not change the functioning of
|
|
|
+the program, the output of the provided benchmark programs has to be compared
|
|
|
+to the output after optimization. If any of these outputs is not equal to the
|
|
|
+original output, our optimizations are too aggressive, or there is a bug
|
|
|
+somewhere in the code.
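Such a comparison could be automated along the following lines. This is a sketch only: the helper names are hypothetical, and the commands in the demonstration are stand-ins (two small Python one-liners) for the actual commands that run the original and the optimized benchmark binaries, which are not specified here.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return everything it wrote to standard output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def outputs_match(original_cmd, optimized_cmd):
    """True when both commands produce exactly the same standard output."""
    return run_and_capture(original_cmd) == run_and_capture(optimized_cmd)


if __name__ == "__main__":
    # Stand-in commands; replace them with the real benchmark invocations.
    original = [sys.executable, "-c", "print(3.14)"]
    optimized = [sys.executable, "-c", "print(3.14)"]
    print("identical" if outputs_match(original, optimized) else "DIFFERENT")
```

Comparing the complete standard output keeps the check independent of what each benchmark computes; a benchmark that prints timing information would need that part filtered out first.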
|
|
|
+
|
|
|
+\section{Results}
|
|
|
|
|
|
-\subsection{clinpack.c}
|
|
|
+The following results have been obtained:\\
|
|
|
+\begin{tabular}{|c|c|c|c|c|c|}
|
|
|
+\hline
|
|
|
+Benchmark & Original & Optimized & Original & Optimized & Performance \\
|
|
|
+ & instructions & instructions & cycles & cycles & boost (cycles) \\
|
|
|
+\hline
|
|
|
+pi & 134 & & 13011 & & \\
|
|
|
+acron & & & 4435687 & & \\
|
|
|
+dhrystone & & & 2887710 & & \\
|
|
|
+whet & & & 2864089 & & \\
|
|
|
+slalom & & & 27270 & & \\
|
|
|
+clinpack & & & 1547941 & & \\
|
|
|
+\hline
|
|
|
+\end{tabular}\\
|
|
|
+\\
|
|
|
+The input for slalom was 1000 seconds and a minimum of $n = 100$.
|
|
|
|
|
|
\section{Conclusion}
|
|
|
|