report.tex 15.7 KB
Newer Older
Jayke Meijer's avatar
Jayke Meijer committed
1 2
\documentclass[10pt,a4paper]{article}
\usepackage[latin1]{inputenc}
3 4
\usepackage{amsmath,amsfonts,amssymb,booktabs,graphicx,listings,subfigure}
\usepackage{float,hyperref}
Jayke Meijer's avatar
Jayke Meijer committed
5 6

\title{Peephole Optimizer}
7
\author{Jayke Meijer (6049885), Richard Torenvliet (6138861), Tadde\"us Kroes
Jayke Meijer's avatar
Jayke Meijer committed
8
    (6054129)}
Jayke Meijer's avatar
Jayke Meijer committed
9 10

\begin{document}
11

Jayke Meijer's avatar
Jayke Meijer committed
12
\maketitle
13
\tableofcontents
14

15
\pagebreak
Jayke Meijer's avatar
Jayke Meijer committed
16 17

\section{Introduction}
18

Jayke Meijer's avatar
Jayke Meijer committed
19
The goal of the assignment is to implement the optimization stage of the
20 21 22 23 24 25 26 27 28 29 30 31 32
compiler. To reach this goal the parser and the optimizer part of the compiler
have to be implemented.

The output of the xgcc cross compiler on a C program is our input. The output
of the xgcc cross compiler is in the form of Assembly code, but not optimized.
Our assignment includes a number of C programs. An important part of the
assignment is parsing the data. Parsing the data is done with Lex and Yacc. The
Lexer is a program that finds keywords that meets the regular expression
provided in the Lexer. After the Lexer, the Yaccer takes over. Yacc can turn
the keywords in to an action.

\section{Design}

Richard Torenvliet's avatar
Richard Torenvliet committed
33
There are two general types of optimizations of the assembly code, global
34
optimizations and optimizations on a so-called basic block. These optimizations
35
will be discussed separately
36 37 38 39 40 41

\subsection{Global optimizations}

We only perform one global optimization, which is optimizing branch-jump
statements. The unoptimized Assembly code contains sequences of code of the
following structure:
Jayke Meijer's avatar
Jayke Meijer committed
42
\begin{verbatim}
43 44
    beq ...,$Lx
    j $Ly
Jayke Meijer's avatar
Jayke Meijer committed
45 46
$Lx:   ...
\end{verbatim}
47 48 49 50 51 52 53 54
This is inefficient, since there is a jump to a label that follows this code.
It would be more efficient to replace the branch statement with a \texttt{bne}
(the opposite case) to the label used in the jump statement. This way the jump
statement can be eliminated, since the next label follows anyway. The same can
of course be done for the opposite case, where a \texttt{bne} is changed into a
\texttt{beq}.

Since this optimization is done between two series of codes with jumps and
Jayke Meijer's avatar
Jayke Meijer committed
55
labels, we can not perform this code during the basic block optimizations.
56 57 58 59 60 61 62 63 64 65

\subsection{Basic Block Optimizations}

Optimizations on basic blocks are a more important part of the optimizer.
First, what is a basic block? A basic block is a sequence of statements
guaranteed to be executed in that order, and that order alone. This is the case
for a piece of code not containing any branches or jumps.

To create a basic block, you need to define what is the leader of a basic
block. We call a statement a leader if it is either a jump/branch statement, or
66 67
the target of such a statement. Then a basic block runs from one leader until
the next leader.
68 69 70 71 72 73 74 75

There are quite a few optimizations we perform on these basic blocks, so we
will describe the types of optimizations here in stead of each optimization.

\subsubsection*{Standard peephole optimizations}

These are optimizations that simply look for a certain statement or pattern of
statements, and optimize these. For example,
Jayke Meijer's avatar
Jayke Meijer committed
76
\begin{verbatim}
77
mov $regA,$regB
78
instr $regA, $regA,...
Jayke Meijer's avatar
Jayke Meijer committed
79
\end{verbatim}
80
can be optimized into
Jayke Meijer's avatar
Jayke Meijer committed
81
\begin{verbatim}
82
instr $regA, $regB,...
Jayke Meijer's avatar
Jayke Meijer committed
83
\end{verbatim}
84 85 86 87 88 89 90 91 92 93 94 95
since the register \texttt{\$regA} gets overwritten by the second instruction
anyway, and the instruction can easily use \texttt{\$regB} in stead of
\texttt{\$regA}. There are a few more of these cases, which are the same as
those described on the practicum page
\footnote{\url{http://staff.science.uva.nl/~andy/compiler/prac.html}} and in
Appendix \ref{opt}.

\subsubsection*{Common subexpression elimination}

A more advanced optimization is common subexpression elimination. This means
that expensive operations as a multiplication or addition are performed only
once and the result is then `copied' into variables where needed.
Richard Torenvliet's avatar
Richard Torenvliet committed
96 97
\begin{verbatim}

98 99 100 101
addu	$2,$4,$3              addu = $t1, $4, $3
...                        mov = $2, $t1
...                   ->   ...
...                        ...
102
addu	$5,$4,$3              mov = $4, $t1
Richard Torenvliet's avatar
Richard Torenvliet committed
103 104

\end{verbatim}
105 106 107 108 109 110 111

A standard method for doing this is the creation of a DAG or Directed Acyclic
Graph. However, this requires a fairly advanced implementation. Our
implementation is a slightly less fancy, but easier to implement.
We search from the end of the block up for instructions that are eligible for
CSE. If we find one, we check further up in the code for the same instruction,
and add that to a temporary storage list. This is done until the beginning of
Jayke Meijer's avatar
Jayke Meijer committed
112
the block or until one of the arguments of this expression is assigned.
113

Jayke Meijer's avatar
Jayke Meijer committed
114 115 116 117
We now add the instruction above the first use, and write the result in a new
variable. Then all occurrences of this expression can be replaced by a move of
from new variable into the original destination variable of the instruction.

118
This is a less efficient method then the DAG, but because the basic blocks are
Jayke Meijer's avatar
Jayke Meijer committed
119 120
in general not very large and the execution time of the optimizer is not a
primary concern, this is not a big problem.
121

122
\subsubsection*{Fold constants}
Jayke Meijer's avatar
Jayke Meijer committed
123 124 125 126 127
Constant folding is an optimization where the outcome of arithmetics are
calculated at compile time. If a value x is assigned to a certain value, lets
say 10, than all next occurences of \texttt{x} are replaced by 10 until a
redefinition of x. Arithmetics in Assembly are always performed between two
variables or a variable and a constant. If this is not the case the calculation
128
is not possible. See \ref{opt} for an example. In other words until the current
Jayke Meijer's avatar
Jayke Meijer committed
129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
definition of \texttt{x} becomes dead. Therefore reaching definitions analysis
is needed. Reaching definitions is a form of liveness analysis, we use the
liveness analysis within a block and not between blocks.

During the constant folding, so-called algebraic transformations are performed
as well. Some expression can easily be replaced with more simple once if you
look at what they are saying algebraically. An example is the statement
$x = y + 0$, or in Assembly \texttt{addu \$1, \$2, 0}. This can easily be
changed into $x = y$ or \texttt{move \$1, \$2}.

Another case is the multiplication with a power of two. This can be done way
more efficiently by shifting left a number of times. An example:
\texttt{mult \$regA, \$regB, 4    ->  sll  \$regA, \$regB, 2}. We perform this
optimization for any multiplication with a power of two.

There are a number of such cases, all of which are once again stated in
appendix \ref{opt}.
146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171

\subsubsection*{Copy propagation}

Copy propagation `unpacks' a move instruction, by replacing its destination
address with its source address in the code following the move instruction.

This is not a direct optimization, but this does allow for a more effective
dead code elimination.

The code of the block is checked linearly. When a move operation is
encountered, the source and destination address of this move are stored. When
a normal operation with a source and a destination address are found, a number
of checks are performed.

The first check is whether the destination address is stored as a destination
address of a move instruction. If so, this move instruction is no longer valid,
so the optimizations can not be done. Otherwise, continue with the second
check.

In the second check, the source address is compared to the destination
addresses of all still valid move operations. If these are the same, in the
current operation the found source address is replaced with the source address
of the move operation.

An example would be the following:
\begin{verbatim}
172 173 174 175 176 177
move $regA, $regB           move $regA, $regB
...                         ...
Code not writing $regA, ->  ...
$regB                       ...
...                         ...
addu $regC, $regA, ...      addu $regC, $regB, ...
178 179 180 181 182
\end{verbatim}
This code shows that \texttt{\$regA} is replaced with \texttt{\$regB}. This
way, the move instruction might have become useless, and it will then be
removed by the dead code elimination.

183 184 185 186 187 188 189 190 191 192
\subsection{Dead code elimination}

The final optimization that is performed is dead code elimination. This means
that when an instruction is executed, but the result is never used, that
instruction can be removed.

To be able to properly perform dead code elimination, we need to know whether a
variable will be used, before it is overwritten again. If it does, we call the
variable live, otherwise the variable is dead. The technique to find out if a
variable is live is called liveness analysis. We implemented this for the
193
entire code, by analysing each block, and using the variables that come in the
194 195
block live as the variables that exit its predecessor live.

196 197 198 199 200 201
\section{Implementation}

We decided to implement the optimization in Python. We chose this programming
language because Python is an easy language to manipulate strings, work
object-oriented etc.
It turns out that a Lex and Yacc are also available as a Python module,
Jayke Meijer's avatar
Jayke Meijer committed
202
named PLY(Python Lex-Yacc). This allows us to use one language, Python, instead
203
of two, i.e. C and Python. Also no debugging is needed in C, only in Python
Jayke Meijer's avatar
Jayke Meijer committed
204
which makes our assignment more feasible.
Richard Torenvliet's avatar
Richard Torenvliet committed
205

206 207 208
The program has three steps, parsing the Assembly code into a datastructure we
can use, the so-called Intermediate Representation, performing optimizations on
this IR and writing the IR back to Assembly.
Richard Torenvliet's avatar
Richard Torenvliet committed
209

210
\subsection{Parsing}
Richard Torenvliet's avatar
Richard Torenvliet committed
211

212 213 214
The parsing is done with PLY, which allows us to perform Lex-Yacc tasks in
Python by using a Lex-Yacc like syntax. This way there is no need to combine
languages like we should do otherwise since Lex and Yacc are coupled with C.
Richard Torenvliet's avatar
Richard Torenvliet committed
215

216 217
The decision was made to not recognize exactly every possible instruction in
the parser, but only if something is for example a command, a comment or a gcc
218
directive. We then transform per line to an object called a Statement. A
219 220 221 222 223
statement has a type, a name and optionally a list of arguments. These
statements together form a statement list, which is placed in another object
called a Block. In the beginning there is one block for the entire program, but
after global optimizations this will be separated in several blocks that are
the basic blocks.
Richard Torenvliet's avatar
Richard Torenvliet committed
224

225
\subsection{Optimizations}
Richard Torenvliet's avatar
Richard Torenvliet committed
226

227 228 229 230
The optimizations are done in two different steps. First the global
optimizations are performed, which are only the optimizations on branch-jump
constructions. This is done repeatedly until there are no more changes.

231
After all possible global optimizations are done, the program is separated into
232 233 234
basic blocks. The algorithm to do this is described earlier, and means all
jump and branch instructions are called leaders, as are their targets. A basic
block then goes from leader to leader.
Richard Torenvliet's avatar
Richard Torenvliet committed
235

236 237 238
After the division in basic blocks, optimizations are performed on each of
these basic blocks. This is also done repeatedly, since some times several
steps can be done to optimize something.
Richard Torenvliet's avatar
Richard Torenvliet committed
239

240
\subsection{Writing}
Richard Torenvliet's avatar
Richard Torenvliet committed
241

242
Once all the optimizations have been done, the IR needs to be rewritten into
243 244
Assembly code. After this step the xgcc crosscompiler can make binary code from
the generated Assembly code.
Richard Torenvliet's avatar
Richard Torenvliet committed
245

246 247 248
The writer expects a list of statements, so first the blocks have to be
concatenated again into a list. After this is done, the list is passed on to
the writer, which writes the instructions back to Assembly and saves the file
Jayke Meijer's avatar
Jayke Meijer committed
249 250 251
so we can let xgcc compile it. The original statements can also written to a
file, so differences in tabs, spaces and newlines do not show up when checking
the differences between the optimized and non-optimized files.
Richard Torenvliet's avatar
Richard Torenvliet committed
252

Jayke Meijer's avatar
Jayke Meijer committed
253 254 255
\subsection{Execution}

To execute the optimizer, the following command can be given:\\
Jayke Meijer's avatar
Jayke Meijer committed
256 257 258 259 260
\texttt{./main.py <original file> <optimized file> <rewritten original file>}\\
There is also a script available that runs the optimizer and automatically
starts the program \emph{meld}. In meld it is easy to visually compare the
original file and the optimized file. The command to execute this script is:\\
\texttt{./run <benchmark name (e.g. whet)>}\\
261

Jayke Meijer's avatar
Jayke Meijer committed
262
\section{Testing}
263

Jayke Meijer's avatar
Jayke Meijer committed
264 265 266 267 268
Of course, it has to be guaranteed that the optimized code still functions
exactly the same as the none-optimized code. To do this, testing is an
important part of out program. We have two stages of testing. The first stage
is unit testing. The second stage is to test whether the compiled code has
exactly the same output.
269

Jayke Meijer's avatar
Jayke Meijer committed
270
\subsection{Unit testing}
271

Jayke Meijer's avatar
Jayke Meijer committed
272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295
For almost every piece of important code, unit tests are available. Unit tests
give the possibility to check whether each small part of the program, for
instance each small function, is performing as expected. This way bugs are
found early and very exactly. Otherwise, one would only see that there is a
mistake in the program, not knowing where this bug is. Naturally, this means
debugging is a lot easier.

The unit tests can be run by executing \texttt{make test} in the root folder of
the project. This does require the \texttt{textrunner} module.

Also available is a coverage report. This report shows how much of the code has
been unit tested. To make this report, the command \texttt{make coverage} can
be run in the root folder. The report is than added as a folder \emph{coverage}
in which a \emph{index.html} can be used to see the entire report.

\subsection{Ouput comparison}

In order to check whether the optimization does not change the functioning of
the program, the output of the provided benchmark programs has to be compared
to the output after optimization. If any of these outputs is not equal to the
original output, our optimizations are to aggressive, or there is a bug
somewhere in the code.

\section{Results}
296

Jayke Meijer's avatar
Jayke Meijer committed
297 298 299 300 301 302
The following results have been obtained:\\
\begin{tabular}{|c|c|c|c|c|c|}
\hline
Benchmark & Original     & Optimized    & Original & Optimized & Performance \\
        & Instructions & instructions & cycles   & cycles    &  boost(cycles)\\
\hline
303 304 305
pi        &           94 &              &  1714468 &           &             \\
acron     &          361 &              &  4435687 &           &             \\
dhrystone &          752 &              &  2887710 &           &             \\
306
whet      &          935 &              &          &           &             \\
307
slalom    &         4177 &              &  2879140 &           &             \\
308
clinpack  &         3523 &              &          &           &             \\
Jayke Meijer's avatar
Jayke Meijer committed
309 310
\hline
\end{tabular}
311

312
\pagebreak
313 314
\appendix

315
\section{List of all optimizations}
316 317 318 319 320

\label{opt}

\textbf{Global optimizations}

Jayke Meijer's avatar
Jayke Meijer committed
321 322 323 324 325 326 327 328 329 330
\begin{verbatim}
    beq ...,$Lx             bne ...,$Ly
    j $Ly               ->  $Lx:   ...
$Lx:   ...


    bne ...,$Lx             beq ...,$Ly
    j $Ly               ->  $Lx:   ...
$Lx:   ...
\end{verbatim}
Jayke Meijer's avatar
Jayke Meijer committed
331
\textbf{Standard basic block optimizations}
332

Jayke Meijer's avatar
Jayke Meijer committed
333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355
\begin{verbatim}
mov $regA,$regA         ->  --- // remove it


mov $regA,$regB         ->  instr $regA, $regB,...
instr $regA, $regA,...


instr $regA,...             instr $4,...
mov [$4-$7], $regA      ->  jal XXX
jal  XXX


sw $regA,XXX            ->  sw $regA, XXX
ld $regA,XXX


shift $regA,$regA,0     ->  --- // remove it


add $regA,$regA,X       ->  lw ...,X($regA)
lw ...,0($regA)
\end{verbatim}
356
\textbf{Advanced basic block optimizations}
Jayke Meijer's avatar
Jayke Meijer committed
357 358

\begin{verbatim}
359
# Common subexpression elimination
Jayke Meijer's avatar
Jayke Meijer committed
360 361 362 363 364
addu $regA, $regB, 4        addu $regD, $regB, 4
...                         move $regA, $regD
Code not writing $regB  ->  ...
...                         ...
addu $regC, $regB, 4        move $regC, $regD
Jayke Meijer's avatar
Jayke Meijer committed
365 366 367


# Constant folding
368 369 370 371 372 373 374
li $regA, constA                ""       
sw $regA, 16($fp)               ""
li $regA, constB        ->      ""
sw $regA, 20($fp)               ""	
lw $regA, 16($fp)               "" 
lw $regB, 20($fp)               ""
addu $regA, $regA, $regA        $li regA, (constA + constB) at compile time
375

Jayke Meijer's avatar
Jayke Meijer committed
376 377 378 379 380 381 382
# Copy propagation
move $regA, $regB           move $regA, $regB
...                         ...
Code not writing $regA, ->  ...
$regB                       ...
...                         ...
addu $regC, $regA, ...      addu $regC, $regB, ...
383 384 385 386 387 388 389 390 391 392 393 394


# Algebraic transformations
addu $regA, $regB, 0    ->  move $regA, $regB

subu $regA, $regB, 0    ->  move $regA, $regB

mult $regA, $regB, 1    ->  move $regA, $regB

mult $regA, $regB, 0    ->  li   $regA, 0

mult $regA, $regB, 2    ->  sll  $regA, $regB, 1
Jayke Meijer's avatar
Jayke Meijer committed
395
\end{verbatim}
396

Jayke Meijer's avatar
Jayke Meijer committed
397
\end{document}