Last updated: Thu 30 Apr 2020 11:29:17 AEST.
COMP4403/COMP7402 - Compilers and
Interpreters
Assignment 2
Due date: 3pm (15:00) Friday 15th May, 2020
Weighting: 25% for both COMP4403 and COMP7402
Modify the LALR assignment 2 compiler for the PL0 language (provided on the course
web page with this assignment) to add array and enumeration types, operations on arrays
and enumerations, and a for statement.
Assignment Compiler Files
All sources for the assignment PL0 compiler are available as a2.zip (below). Please be
sure to use the version for this assignment and not the one used for the tutorials or
another assignment. There are differences (like the lexical tokens you need for this
assignment are only defined in the assignment version).
a2.zipThu 30 Apr 2020 11:28:08 AEST Save this .zip file and follow the instructions
for setting up a compiler project in IntelliJ
Setting up and running PL0 in IntelliJThu 30 Apr 2020 11:29:17 AEST
Brief documentation on assignment 2 filesThu 30 Apr 2020 11:29:17 AEST
Setting up Java-CUP and JFlex under EclipseThu 30 Apr 2020 11:02:22 AEST
Here is the documentation for
Java CUP [HTML]
JFlex [HTML]
For the most part you will not need these.
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:1/15
Please ensure you follow the course Piazza bulletin board for any updates and further
information on the assignment. Read all the fine print below in detail before you start!
And, most important, when you have finished implementing the assignment, come back
and re-read the fine print again.
Do not use imports for external packages other than those in java.util.*. Note that
IntelliJ may offer the option of importing an external package to resolve an issue;
please avoid accepting this option because it will often add an erroneous import that
you will not need. Such imports lead to the compilation failing in the environment in
which your compiler will be assessed because that environment does not include the
external libraries. Please check you are not importing external libraries before
submitting.
You must only modify the files that must be submitted (see below).
You must not modify any other files because we will be testing your implementation
using the existing other files with your submitted files.
Please do not reformat the files because we would like to just print the differences
between the originals and the versions you hand in.
Please keep the length of lines in your files below 100 characters, so that we can
print them sensibly.
Please avoid using non-standard characters, e.g. Chinese characters, including in
the comments. Non-standard characters are not accepted by the Java compiler
used to test your assignment and all comments should be readable by the person
assessing your assignment.
Your implementation should be in Java Project language level 8. Set the IntelliJ
preferences for the Java project language level 8 under Project structure then
Project (or use the "-source 1.8" option to the command line Java compiler).
Please remove any debugging output before your assignment is submitted
because debugging output will cause your program to fail our automated testing of
your assignment.
Either avoid using tabs or set your tabs stops to 4 spaces (this is the default for
IntelliJ/Eclipse) so that your files will print sensibly.
Overview
Enumerated types
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:2/15
An enumerated type may be declared in a type definition. Subranges of enumerated types
may form types.
type
Day = {Mon,Tue,Wed,Thu,Fri,Sat,Sun};
WeekDay = [Mon..Fri];
Day is an enumerated type with seven elements, Mon, Tue, Wed, Thu, Fri, Sat and Sun,
each of which is a constant of type Day. WeekDay is a subrange with five elements, Mon
through to Fri.
One may declare a variable to be of an enumerated type:
var
d: Day;
w: WeekDay;
These may be assigned values of the enumerated type.
d := Mon;
w := d;
Values of an enumerated type may be compared using all the usual relational operators:
the elements are in increasing order of their position in the definition of the (base)
enumerated type, e.g., Mon < Tue < Wed < Thu < Fri < Sat < Sun. The unary operators
pred and succ can be used to give the predecessor and successor values, respectively, in
the enumeration, e.g. succ Mon = Tue = pred Wed. Note that predecessor and successor
wrap around so that succ Sun = Mon and pred Mon = Sun and hence the following code
calls process for all values of d from Mon to Sun, inclusive, and leaves the final value of d
as Mon (but see the for statement below for a simpler version of this code).
d := Mon;
while d < Sun do
begin
call process();
d := succ d
end;
call process();
d := succ d
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:3/15
Array types
An array type may be declared in a type definition. The index type for an array may be
either a subrange or an enumerated type.
const N = 10;
type
S = [1..N];
V = array S of int;
V2 = array [-N..N] of int;
M = array S of V;
Bookings = array Day of int;
V is an array type with ten elements each of type integer and with indices 1 to 10. V2 is
declared using an explicit subrange, which happens to range over both positive and
negative values; it has 2N+1 elements. M is an array type with ten elements each of type
V, with indices 1 to 10. Each element of M is itself an array of 10 integers, i.e., M contains a
total of one hundred integers. The type Bookings defines an array with seven elements
indexed by elements of type Day.
One may declare a variable to be of an array type:
var
vec : V;
mat: M;
reserved: Bookings;
Elements of arrays may be assigned appropriate values.
vec[1] := 2;
mat[10][1] := 42;
mat[1] := vec; // array assignment
reserved[Mon] := 3;
The values of the elements may be accessed, for example, the following extracts the
diagonal elements of mat and places them in vec.
i := 1;
while i <= N do
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:4/15
begin
vec[i] := mat[i][i];
i := i + 1
end
For statement
You are also required to add a for statement to PL0. The above while loop can be
rewritten using a for statement as follows.
for i : 1..N do
vec[i] := mat[i][i]
od
And the earlier while loop over days can be rewritten as follows.
for d : Mon.. Sun do
call process()
od
The for loop is not quite the same as the while because the control variable of the for
loop is considered local to the loop (see static semantics below).
Syntax Changes
The following lexical tokens have been added to the lexical analyser, PL0.flex. Keywords
are case sensitive.
LCURLY → "{"
RCURLY → "}"
KW_ARRAY → "array"
KW_FOR → "for"
KW_OD → "od"
KW_OF → "of"
KW_PRED → "pred"
KW_SUCC → "succ"
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:5/15
The syntax is given here in extended BNF. You need to transform it to BNF for use in Java-
CUP. The syntax for type definitions (Type) is extended with alternatives for array types
and enumeration types. Be careful to distinguish the use of braces as part of the EBNF
notation (unquoted) from their use as lexical tokens (quoted).
Type → "array" Type "of" Type
Type → "{" EnumerationList "}"
EnumerationList → IDENTIFIER { "," IDENTIFIER }
A reference to an element of an array can be used as an LValue either within an expression
or on the left side of an assignment. Hence we add a new alternative for an LValue.
LValue → LValue "[" Condition "]"
Unary operators are extended to include predecessor and successor.
UnaryOperator → KW_PRED
UnaryOperator → KW_SUCC
Statement has an additional alternative for a for statement.
Statement → "for" IDENTIFIER ":" Condition ".." Condition "do" StatementList "od"
You need to add these productions and their associated actions to build the symbol table
entries and abstract syntax trees to PL0.cup.
The Parser Generator Java-CUP
The parser specification for the compiler is specified in the file PL0.cup. You will need to
add productions (and their associated actions) to the specification and then run the
parser generator Java-CUP (manually or automatically) to generate the files
CUPParser.java and CUPToken.java. Do not modify these two Java files directly (even if
you think you understand them (do you really?)) - remake them from PL0.cup. You should
make the compiler before you change anything just to see what forms of messages to
expect. When you make the compiler (before you modify it) there will be some warning
messages about the terminal symbols like ILLEGAL being declared but never used; these
are to be expected at this stage. Any new warning/error messages will be significant.
Beware that if there are Java-CUP errors reported, the Java files for the parser will not be
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:6/15
generated, so check for Java-CUP errors first. There is HTML documentation for Java-
CUP available from the class web page (with the assignment).
The Scanner Generator JFlex
All the lexical tokens for this version of the compiler have already been added to the lexical
analyser.
The file Lexer.java is automatically generated by the scanner generator JFlex from
PL0.flex; again, do not modify Lexer.java - remake Lexer.java from PL0.flex.
Both Java-CUP and JFlex are available with the assignment files on the course web page,
with instructions on how to run them in IntelliJ. Java archives for Java-CUP and JFlex are
part of the IntelliJ project for the assignment.
Static Semantic Restrictions
Enumerated types
Enumerated types are scalar types. For an enumerated type the elements are treated as
constants of the enumerated type within the scope in which the type is defined. The
elements cannot have names corresponding to other identifiers declared within the same
scope. Obviously, an enumerated type cannot have multiple elements with the same name
and two enumerated types in the same scope must have disjoint sets of elements. The
elements can be treated as Const entries in the symbol table. In the static semantics, an
enumeration consisting of a list of identifiers, ids, is treated as the type enumeration(ids).
Each enumeration type is considered a separate primitive scalar type that is incompatible
with all other types, except its subranges. Note that Type.java has already been extended
with enumerated types, so your main task is to handle the parsing of enumerated type
definitions to set up an EnumerationType.
Expressions of enumeration types may be compared with other elements of the same
enumeration (or a subrange of it) and assigned to variables of the enumeration type or a
subrange of it, provided the value is in the subrange. Note that in the compiler the types
of operators (like "_=_") are defined within the symbol table. You will need to extend the
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:7/15
types of comparison operators to handle comparison of values of an enumerated type.
Hint: adding operator entries is similar to the adding the predefined operators in the class
Predefined (except they do not get added to the predefined scope). The compiler
supports overloaded operators.
The pred and succ unary operators may be applied to an expression of an enumeration
type and yield a value of the same type. The widening and narrowing rules for subranges
apply to subranges of enumeration types in a manner similar to subranges of integers.
Array types
In an array type declaration, array S of T,
the index type, S, must be a subrange type or an enumeration and
the element type, T, can be any type including another array type, but cycles in type
definitions are not permitted, e.g. the following is not valid as the definitions of types
C and D form a cycle.
C = array S of D;
D = array S of C;
An array with index type S and element type T is treated as the type array(S,T). In
Type.java a class ArrayType has already been added to represent array types within the
compiler.
syms ⊢ typeof(S) = subrange(S1,v0,v1)
syms ⊢ typeof(T) = T1
syms ⊢ typeof(array S of T) = array(subrange(S1,v0,v1),T1)
and
syms ⊢ typeof(E) = enumeration(ids)
syms ⊢ typeof(T) = T1
syms ⊢ typeof(array E of T) = array(enumeration(ids),T1)
For a subscripted array reference, e1[e2], e1 must be an L-Value and have a type that is a
reference to an array type, and the type expression e2 used as the array index must be
compatible with the type of the index of the array. The type of a subscripted array, e1[e2]
is a reference to the element type of the array.
syms ⊢ e2 : T1
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:8/15
syms ⊢ e1 : ref(array(T1,T2))
syms ⊢ e1[e2] : ref(T2)
Note that the type of e1[e2] is ref(T2) rather than T2 so that the subscripted array
element can be used as an L-value, e.g., it can be used on the left side of an assignment.
Assignment of whole arrays is allowed but other operations on whole arrays (e.g.,
comparison, etc.) are not supported.
Note that, because the index type is subrange type or an enumeration, the bounds of the
array are fixed (constant) at compile time.
For statement
For a for statement with lower and upper bound expressions, lower and upper, there must
be some scalar type T such that the types of the two expressions are (coercible to) type T
(e.g. by dereferencing or widening - see optDereference and optWidenSubrange in
Type.java).
The control variable, id, is treated as a new variable (of type ref(T)) local to the for
statement. The name of the control variable, id, may be the same as an existing (local or
control) variable, but within the loop any references to id are to the control variable, rather
than the existing variable (unless there is a nested for loop with a control variable with the
same name, which then takes over for its body (and so on)). The control variable is not
allowed to be modified by the loop body.
Given that the predicate isScalarType(T) is true if and only if T is a scalar type, the rule for
a for follows.
isScalarType(T)
syms ⊢ lower : T
syms ⊢ upper : T
syms ⊕ { id ↦ VarEntry(ref(T)) } ⊢ WFStatement(sl)
syms ⊢ WFStatement(for id : lower .. upper do sl od)
In addition, the control variable is read only within sl. To facilitate the implementation of
the for statement, the symbol table entry for a variable (SymEntry.VarEntry) has a flag to
indicate whether or not the variable is read only. In addition, the symbol table (Scope) has
a method, extendCurrentScope, to extend the current scope with a new scope at the
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:9/15
same lexical level. The extended scope can be used to declare the control variable with a
scope which includes only the for statement. When space is allocated for variables
declared in the extended scope they are effectively allocated within the parent scope (or
for an extended scope within an extended scope within the grandparent scope, etc.).
Leaving the extended scope can be done using the method getParent in class Scope.
Dynamic Semantics
Enumerations
Variables and values of enumerated types act like other scalar types (because they are
represented by integer values) and hence nothing special should be required for code
generation except to implement predecessor and successor. The successor of the
greatest element of an enumeration type wraps around to give the least element of the
type and the predecessor of the least element wraps around to give the greatest element
of the type.
Arrays
The dynamic semantics of array accessing is conventional. An element of an array may be
used like a variable whose type is the same as the element type of the array. For example,
if the element type is int then it may be "assigned to", read, written and used in arithmetic
expressions and comparisons (but if you start worrying about each of these you are
making way more work for yourself than you need to).
Accesses to array elements should be bounds checked at run time, i.e., the value of the
index expression should be checked to make sure it is within the subrange defined by the
index type (see the discussion of the BOUND instruction below).
Variables of array type are local variables and hence are allocated on the stack just like
any other local variable (not on the heap (as in Java "new")). The main difference from
scalar variables is that their size is typically not one. The space allocated for an array
should be just enough to fit all of its elements. For example, the array vec requires 10
words (one for each integer) and mat requires 100 words.
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:10/15
For statement
The lower and upper bound expressions are evaluated once to v1 and v2, respectively, at
the start of the execution of the for, not on every iteration. If v1 is greater than v2, then
the subrange is empty, and hence the body is not executed at all. Otherwise, the for
statement executes the statements in its body for each value in the subrange from v1 up
to v2, inclusive, in increasing order. The control variable is assigned the corresponding
value at the start of each iteration. The control variable may not be modified (it is read
only) within the body of the for loop. It is only modified (incremented) by the for
statement itself.
Object Code
The PL0 compiler generates code for the Stack Machine. A description of the stack
machine is available in Inst.pdf. See also the file StackMachine.java (near the end) for
details of the instruction set.
Array indexing and bounds checking
The BOUND instruction expects that the following have been loaded (pushed) onto the
stack (in this order):
1. a value to be bounds checked,
2. a lower bound, and
3. an upper bound.
The BOUND instruction pops the upper and lower bounds and checks that the value (now
on top of the stack) is in the range. If it is in range, the value is left on the top of the stack
(but the upper and lower bounds have been removed) and execution continues with the
next instruction. If the value is out of bounds, the stack machine interpreter halts and
prints out an error message that the value is out of range.
Array copying
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:11/15
The code generation for the assignment statement currently handles assignments of
items larger than a single word by using LOAD_MULTI and STORE_MULTI. If you would
like to experiment with doing array assignments more efficiently, the stack machine has a
COPY instruction, which takes three arguments from the stack which should be pushed in
the following order:
the source (or from) address, and
the target (or to) address,
the size of the block of memory to be moved.
For statement
It is desirable to reduce the overhead for repeated iterations of a loop because it may be
executed a large number of times. You only need to generate code for the lower and upper
bound expressions of the for loop once. In fact, it will be incorrect to evaluate the upper
bound expression on every iteration because the body of the loop may change variables in
that expression.
Student Misconduct
Students are reminded of the University's policy on student misconduct, including
plagiarism. See the course profile and the School web page
http://www.itee.uq.edu.au/itee-student-misconduct-including-plagiarism
You are expected to protect your files so that they are not readable by other users.
Your assignment is expected to be your own individual work and must not include code
copied from other students, current or past. You are also reminded not to post your
(partial) solutions to assignments to any place accessible by others, including the bulletin
board or emailing to other students. If you need that sort of help consult the lecturer
and/or tutor. Note that Piazza allows private posts to the instructors.
This assignment compiler is provided solely for the purposes of doing this assignment and
your solutions must never be shared, especially publicly, even after completion of the
course. Such publication would be considered both student misconduct and a breach of
copyright.
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:12/15
Late Submission
A penalty of 5% of the maximum mark for an assignment will be deducted for each day
(24 hours) or part thereof late up to a limit of 7 days, after which time assignments will not
be accepted for assessment unless you have been granted an extension.
As we plan to hand back assignments a week or two after submission, requests for an
extension will not be accepted more than one week late, unless there are exceptional
circumstances.
Requests for extensions should be accompanied by suitable documentation (see the
course profile for details).
Personal hardware or computer failures are not grounds for extension. Assignments must
be backed up on the university system.
Submission
Please keep the length of lines in your files below 100 characters, so that we can print
them sensibly. You should avoid using tabs or set your tabs stops to 4 spaces so that
when we print them (with tab stops set to 4 spaces) they will print sensibly. Do not forget
to remove any code generating debugging output and any rogue external imports before
submission.
You must submit your completed assignment electronically through the assessment
section of the course BlackBoard site (the BlackBoard Assessment page rather than the
course web pages).
You need to submit the following list of individual files (not a .zip or any other form of
archive file) for evaluation and marking. Note that file names are case-sensitive.
PL0.cup
ExpNode.java
ExpTransform.java
StatementNode.java
StatementTransform.java
https://learn.uq.edu.au/bbcswebdav/pid-5316913-dt-content…_1/courses/COMP4403S_7020_21490/2020-assignment2.html 2020/5/7 9:07
⻚:13/15
StatementVisitor.java
StaticChecker.java
CodeGenerator.java
You can submit your assignment multiple times, but only the last copy submitted will be
retained for marking.
Assessment
The assignment is marked out of a total of 20 marks. Marks will be allocated as follows:
6 Syntax analysis and tree building
7 Static semantics checking
7 Code generation
Marks will be awarded for the correctness of the changes to each category. Readability,
modular structure, and structural and computational complexity are also criteria. For
readability, we expect that you follow good software engineering practice, such as
appropriate choices of variable names, consistent indentation, appropriate comments
where needed, etc. For modularity we expect you introduce new methods where it makes
sense to help structure the program and to avoid unnecessary duplication of code. Use of
generic Java utility interfaces (like Set, Map, List, Queue, ...) and their implementations
(like HashSet, ..., TreeMap, ..., LinkedList, ...) is encouraged. We expect you to produce
well structured programs that are not unnecessarily complex, both structurally (e.g.
complex control flow that is hard to follow), and in terms of execution time and space
requirements, (e.g. an O(n) algorithm is preferred to an O(n ) algorithm, and a O(log n)
algorithm is even better).
We will not be concerned with the quality of syntactic error recovery because the parser
generator CUP takes care of that for the most part, but you must handle semantic errors
appropriately, including handling the situation when there is a syntax error, i.e., your
compiler should not crash because of a syntax error.
Your assignment files will be compiled in the context of the remaining assignment files and
put through a sequence of tests. The total mark for this assignment will be limited by the