ECE220: Computer Systems and Programming Machine Problem 11
Code Generation for an LC-3 Compiler
This assignment requires you to use recursion to translate a description of a program in a subset of the C language into LC-3 assembly code. Developing such code provides you with experience with the use ofrecursion and pointer-based data structures in C, both of which are important tools. You will also gain a better understanding of the methods and challenges involved in automating translations from one form of program to another. Most of the compiler will be provided to you, including both code that translates C code into an abstract syntax tree (AST) as well as C library routines to support basic arithmetic and I/O on LC-3. Your focus will be on turning an AST into a sequence of LC-3 instructions.
Please read this document in its entirety before you begin to program.
On Reading the Code
You are not expected to read the more than 3,300 lines of C, lexer, parser, and LC-3 code provided to you. You are welcome to do so, of course, but you do not need to read them to complete this assignment. All relevant material from headers, etc., is included in this document. Furthermore, you should be aware that a substantial amount of code (around 4,500 lines!) is automatically generated when you build your compiler, and none of the automatically generated code (such as ece220 lex.yy.c) is meant to be human-readable. Don’t try. If you want to remove the automatically-generated files so that you can see what was there in the original distribution, type make clear.
A Subset of C
To simplify your task, the compiler supports only a narrow subset of the full C programming language.
Support is included for if statements, for loops, return statements, and expressions using most of the C operators. Compound statements are required as opposed to simply being good practice. For example:
int i;
for (i = 0; 10 > i; i++) /* This code will generate a syntax error. */
printf (“%d\n”, i);
for (i = 0; 10 > i; i++) { /* To avoid, include braces around loop body. */
printf (“%d\n”, i);
}
Programs written for this compiler can declare global variables as well as a single subroutine (int main (),of course). Local variables can be declared within main, but variable declarations are not otherwise allowed (for example, within compound statements). Only two types are supported: int’s and one-dimensional arrays of integers. Pointer types are not supported, and you can use arrays only with bracketed expressions for indices. For example:
int a[1];
scanf (“%d”, a); /* NOT legal–will generate a syntax error */
scanf (“%d”, &a[0]); /* legal */
Constant strings can be used, but only as arguments to function calls, as shown by the format strings in the printf and scanf examples above.
Most C operators are supported, including assignment (=), basic arithmetic (+, -, *, /, %, and negation),pre- and post-increment and decrement (++, –), logical operators (!, &&, ||), and comparisons (<, <=, ==,>=, >, !=). Taking the address of a variable or array element is also possible (as is necessary for scanf), but almost no type checking is done for you.
Thus the compiler accepts the code below (as does gcc, but accompanied by warnings):
int i;
int j;
i = &j;
printf (“%d\n”, “Hello!\n”);
printf (“i at %d, j at %d\n”, &i, &j);
scanf (“%d”, i);
The output produced by the printf calls in the code above is unpredictable and of little value, but the scanf call passes the value of variable i instead of its address and thus could change memory unpredictably.
In the code above, i was set previously to the address of j, but such tricks are confusing and error-prone.
Examples of other missing functionality include pointer dereferencing, multi-dimensional arrays, structures,enumerations, defining new types, variable initialization within a declaration, loops other than for loops,switch statements, and bitwise operations. Anything not explicitly mentioned as a type of AST node in the AST section of this document is simply not included in the language, and you need not worry about implementing it. Of course, you won’t be able to use such functionality in C programs on which you want to use your compiler, either.
Code Generation
Your code must generate LC-3 assembly for the main subroutine and print it to stdout. As you know,a C function such as main consists of variable declarations and statements. Variable declarations will be translated for you into a symbol table that your code must use when generating assembly code for each of the statements. The stack frame will already be set up for you, and will be torn down when your code finishes.
Be sure to execute the stack frame teardown—do not insert RET instructions directly into the code that you generate.
This machine problem is not intended to require substantial LC-3 programming on your part. Each of the recursive components that you write should be no more complex than implementing a single step of systematic decomposition, and you may in fact want to review the LC-3 implementation of the conditional and iterative decompositions that we covered early in the course.
In addition to statements, much of the work involved in code generation involves computing the results of expressions. To simplify your task, you are required to use the stack to compute all expressions. As you may recall, any expression can be computed using a stack and some basic rules. To “execute” a literal operand such as a number is seen, push the operand onto the stack. To “execute” an operator, pop operands for the operator from the stack, apply the operator to those operands, and then push the operation result back onto the stack.
The C Library
In order to avoid your having to implement functionality in LC-3, we have written and provided you with a small library of functions to support you when generating the assembly code. The library includes four interfaces that can be used directly from C and another three interfaces intended for your use in implementing basic arithmetic. The calls visible in C are as follows:
int printf (const char* format, …); /* assembly label PRINTF */
int rand (); /* assembly label RAND */
int scanf (const char* format, …); /* assembly label SCANF */
void srand (int new seed); /* assembly label SRAND */
These calls have C-style interfaces: arguments must be put onto the stack in the correct order, and the return value will be on the stack. Except for R6 (decremented to store the return value) and R7 (changed by JSR), no register values are changed by these subroutines. As described in class, the caller is responsible for removing arguments from the stack after the call to any of these functions returns.