Last updated: Fri 14 Apr 2023 17:18:09 AEST.
COMP4403 - Compilers and Interpreters
Assignment 2
Due date: 15:00 Thursday 04 May 2023
This is an individual assignment which involves modifying the assignment 2
compiler for the language PL0 (provided on the course web page with this
assignment) to add record (struct) and pointer types.
Assignment Compiler files
All sources for the assignment PL0 compiler are available as a2.zip (below). Please
be sure to use the version for this assignment and not the one used for the
tutorials or another assignment. There are differences (like the lexical tokens you
need for this assignment are only defined in this version).
a2.zip Save this .zip file and follow the instructions for setting up a
compiler project in IntelliJ
Setting up and running PL0 in IntellliJ
Brief documentation on assignment 2 files
Please pay attention to course Blackboard announcments, and ensure you follow
the course discussion board (https://edstem.org/) for any updates and further
information on the assignment.
Do not use imports for external packages other than those in java.util.*.
Note that IntelliJ may offer the option of importing an external package to
resolve an issue; please avoid taking this option because it will often add
an erroneous import that you will not need. Such imports lead to the
compilation failing in the environment in which your compiler will be
assessed because that environment may not include the external libraries.
Please check you are not importing external libraries before submitted.
You must only modify the files that must be submitted (see below).
You must not modify any other files because we will be testing your
implementation using the existing other files with your submitted files.
Please do not reformat the files because we would like to just print the
differences between the originals and the versions you hand in.
Please keep the length of lines in your files below 100 characters, so that
we can print them sensibly.
Please avoid using non-standard characters, e.g. Chinese characters,
including in the comments. Non-standard characters are not accepted by
some Java compilers and all comments should be readable by the person
assessing your assignment.
Your implementation should be in Java 1.8. Set the IntelliJ preferences for
the Java compiler to 1.8 (or use the "-source 1.8" option to the Java
compiler).
Please remove any debugging output before your assignment is
submitted because debugging output will cause your program to fail our
automated testing of your assignment.
Either avoid using tabs or set your tabs stops to 4 spaces (this is the
default for IntelliJ) so that your files will print sensibly.
Read the fine print in detail before you start! And, most important, when you
have finished implementing the assignment, come back and reread the fine print
again.
Overview
Records and pointers
Records are similar to C structs (and very simple Java classes). They only have
fields (no methods). As well as records, we have a pointer type. Unlike Java
classes, we distinguish between a record object and a pointer to that object.
Records are accessed via value (rather than reference), i.e., assignment of records
copies all fields rather than assigning a reference. However, for a variable of
type pointer to a record we only assign the pointer.
Record and pointer types may be declared in a type definition, for example, we
can declare List as the type of a pointer to a value of type Node. The
type Node is a record that contains two fields: x of type integer, and next of
type List (i.e., pointer to a Node).
type
Node = record x: int; next: List end;
List = ^ Node;
For example, one may declare variables r and r2 to be of a record type,
and p and q to be pointers as follows.
var r: Node; r2: Node; p : List; q: List;
A pointer value may be assigned a dynamically allocated object of its base type
via a new expression.
p := new List
Fields of a record r may be assigned appropriate values:
r.x := 2; r.next := p
and the values of the fields may be accessed:
write r.x
Assignment of whole records is also allowed, e.g.,
r2 := r
assigns both fields of r2 the corresponding fields of r.
A record constructor allows a value of record type to be built, e.g., the following
assignment to r is equivalent to the two assignments to the fields of r above.
r := Node{ 2, p }
A record constructor consists of the type identifier of a record followed by a list of
expressions in braces that are assigned to the fields of the record. The order of
the expressions corresponds to the order of declaration of the fields in the record
type.
The value pointed to by a pointer is referenced by p^ and the fields of that object
by p^.x and p^.next. For example, we may set the x field of the object pointed
to by p to be 34 via the assignment
p^.x := 34
and we may dynamically allocate a new record object and set the next field of the
record pointed to by p to point to that object by
p^.next := new List
Note that a new List expression can be used anywhere an expression of
type List is expected. Arbitrary chains of dereferencing are allowed. For
example, p^.next^.next references the next field of the object pointed to by
the next field of the object pointed to by p. Such references may be used either
on the left side of an assignment or on the right side within an expression where
a value of that type (List in the example) is expected. For example, we may
increment the field x of the object p^.next by the statement:
p^.next^.x := p^.next^.x + 1
Pointers may be directly assigned. For example, after the assignment
q := p
both q and p both point to the same object. If we then perform the assignment
q^.next := nil
where nil is the special pointer constant for a null pointer, then because
both q and p point to the same object, the value of p^.next will also be nil.
Pointers may also be compared (but only for equality and inequality), so after the
above assignment the comparison
if p^.next = nil then ...
will return true and the if statement will take the then branch.
Syntax Changes
The reserved keywords "record" and "new" have already been added to the lexical
analyser as the tokens KW_RECORD and KW_NEW, and the symbols "{", "}", "."
and "^" have been added as tokens LCURLY, RCURLY, PERIOD and POINTER. They
have also been added to the terminals definitions in PL0.cup.
The syntax for record and pointer type definitions follows. It is given in EBNF, but
you won't be able to use the EBNF directly in the parser generator Java-CUP.
Type → SubrangeType | TypeIdentifier | RecordType | PointerType
RecordType → "record" Fields "end"
Fields → Field { ";" Field }
Field → IDENTIFIER ":" TypeIdentifier
PointerType → "^" TypeIdentifier
A reference to a field of a record can be used as an LValue either within an
expression or on the left side of an assignment. A pointer dereference can also be
used as an LValue. A "new" expression or a record expression can be used as a
Factor in an expression.
LValue → IDENTIFIER | LValue "." IDENTIFIER | LValue "^"
Factor → ... | "new" TypeIdentifier | TypeIdentifier "{" ExpList "}"
ExpList → Condition { "," Condition }
Note that a field of a record may itself be of type record. Hence the above syntax
has an LValue before the "." rather than an IDENTIFIER and similarly there is an
LValue before "^".
The Parser Generator Java-CUP
The parser specification for the compiler is specified in the file PL0.cup. You will
need to add productions (and their associated actions) to the specification and
then run the parser generator Java-CUP (manually or automatically) to generate
the files CUPParser.java and CUPToken.java. Do not modify these two Java files
directly (even if you think you understand them (do you really?)) - remake them
from PL0.cup. You should make the compiler before you change anything just to
see what forms of messages to expect. When you make the compiler (before you
modify it) there will be some warning messages about the terminal symbols
like ILLEGAL being declared but never used; these are to be expected at this
stage. Any new warning/error messages will be significant. Beware that if there
are Java-CUP errors reported, the Java files for the parser will not be generated,
so check for Java-CUP errors first. There is HTML documentation for Java-
CUP available from the class web page (with the assignment).
The Scanner Generator JFlex
All the lexical tokens for this version of the compiler have already been added to
the lexical analyser.
The file Lexer.java is automatically generated by the scanner
generator JFlex from PL0.flex; again, do not modify Lexer.java -
remake Lexer.java from PL0.flex.
Both Java-CUP and JFlex are available with the assignment files on the course
web page, with instructions on how to run them in IntelliJ. Java archives for Java-
CUP and JFlex are part of the IntelliJ project for the assignment.
Static Semantic Restrictions
Records
The names of fields of a record must be distinct.
All of the type identifiers used for fields must be defined type identifiers.
A field of a record may be of any type, including a pointer or record type but may
not include the same record type as being declared either directly or indirectly,
e.g. the following is invalid both because R directly includes R and because R
indirectly includes R, via Q.
type R = record x : int; p : Q; next: R end;
Q = record y : int; q : R end;
The rule for well formedness of a record type is given below.
i,j ∈ 1..n ? i ≠ j ? idi ≠ idj
j ∈ 1..n ? syms ? typeof(tj) = Tj
syms ? typeof(record id1: t1; id2: t2; ... idn: tn end) = RecordType([ (id1, T1), (id2,
T2), ... (idn, Tn) ])
A reference to a field of a record (e.g., r.x) consists of an LValue, (e.g., r), which
must be a record and a field name, (e.g., x), which must be one of the fields of
that record. Note that an LValue may appear on either the left or right side of an
assignment. Hence the two uses of ref in the following rule.
j ∈ 1..n
syms ? e: ref(RecordType([ (id1, T1), (id2, T2), ... (idn, Tn) ]))
syms ? e.idj: ref(Tj)
For example,
syms ? r: ref(RecordType([ (x, int), (b, boolean) ]))
syms ? r.x : ref(int)
For a record constructor, its type identifier must be a record type and it gives the
type of the constructed record. The expressions in the type constructor must
match the fields of the record in the order in which they were declared in the
record type declaration. Each expression must be assignment compatible with its
corresponding field.
id ∈ dom(syms)
syms(id) = TypeEntry(RecordType([ (id1, T1), (id2, T2), ... (idn, Tn) ]))
j ∈ 1..n ? syms ? ej : Tj
syms ? id{ e1, e2, ... en } : RecordType([ (id1, T1), (id2, T2), ... (idn, Tn) ])
Assignments between records are allowed, provided the left and right sides of the
assignment are of the same type. Two types are considered equivalent if they are
the same type identifier, or if one is a type identifier, B, that is defined as equal to
another type identifier, A, i.e.,
type
A = record x: int; y: int end;
B = A;
[This makes the implementation simpler than doing structural equivalence of
records.]
Comparison of records is not supported.
Pointers
The identifier in the pointer type definition must be a type identifier. It can be of
any type (not necessarily a record type as in the example).
The rule for well formedness of a pointer type is given below.
syms typeof(t) = T
syms typeof(^t) = PointerType(T)
To be dereferenced (using "^") an LValue, p, must be of type ref(PointerType(T)),
for some type T. The type of the result of the pointer dereference is ref(T).
syms e : ref(PointerType(T))
syms e^ : ref(T)
The identifier in a new expression must be a pointer type. Hence, for a type
identifier id,
id ∈ dom(syms)
syms(id) = TypeEntry(PointerType(B))
syms ? (new id) : PointerType(B)
Assignment of pointer values and comparison of pointer values, for equality and
inequality only, is allowed provided the types are equivalent. If T is declared as a
pointer type (i.e., PointerType(B) for some type B) then the equality and inequality
operators have the following additional types.
= : T x T → boolean
≠ : T x T → boolean
An exception here is the special constant nil which is a pointer value that is
compatible with all pointer types. The constant nil has already been defined. It is
of the special pointer type NIL_TYPE, which is compatible with any pointer type.
Hence for any type identifier id,
id ∈ dom(syms)
syms(id) = TypeEntry(PointerType(B))
syms ? nil : PointerType(B)
Two pointer types are considered equivalent if their base types are equivalent.
e.g.,
type
C = ^A; D = C; E = ^A;
Types C, D and E are all compatible.
Note that Type.java already includes record and pointer types.
Dynamic Semantics
Records
A record must be allocated enough space to contain values for all its fields, either
on the stack if it is a local variable, or on the heap if it is dynamically allocated
using new.
A field reference, r.x, acts like a variable of the type of the field and hence can be
used on either the left or right sides of assignments.
For the record assignment
r := p^
all the fields of r are assigned the corresponding fields of the record pointed to
by p.
Because we are copying values and not references, all the fields need to be
assigned (see the Object Code section for a way to do this simply).
Pointers
The dynamic semantics of the new expression and of accessing a pointer are
conventional.
If p is of type ref(PointerType(T)) then a pointer dereference, p^, corresponds to
the object (of type ref(T)) pointed to by the pointer p. If the value of the
pointer p being dereferenced is nil the dereference is invalid (a run time error).
The expression new List dynamically allocates space on the heap for an object of
the type pointed to by List and returns the address of that new object. Objects
dynamically allocated via a new have a life time that is not restricted to that of
the variable via which they were allocated. For example, an object may be created
within a procedure using a local variable within the procedure to perform
the new. Provided that variable's value (the pointer to the new object) is assigned
to a variable or field that is accessible via variables global to the procedure,
before the procedure exits, the object will remain accessible.
Although we dynamically allocate objects via the new statement, we won't
implement garbage collection of objects that are no longer accessible.
Object Code
The PL0 compiler generates code for the Stack Machine. A description of the
stack machine is available in StackMachineHandout.pdf. See also the
file StackMachine.java (near the end) for details of the instruction set.
Records
To implement record assignments the stack machine includes
the LOAD_MULTI and STORE_MULTI instructions, both of which take a count of
the number of words to load from or store to an address.
the source address for loading (or target address for storing) is in second
top of stack, and
the number of words to be loaded (or stored) is in the top of the stack.
Pointers
There is an instruction, ALLOC_HEAP, which assumes that there is an integer on
the top of the stack that represents the size of the object it is to allocate (new). It
pops that value from the stack and replaces it with the absolute address of a
newly allocated object of that size. The stack machine does not support disposing
objects or garbage collection.
If there is insufficient space then ALLOC_HEAP will fail with a "memory overflow"
message. In the implementation of the stack machine there is a single array
representing the whole of memory: the stack (the bottom end of the array) and
the heap of dynamically allocated objects (the top end). If either pushing onto the
stack reaches the lower limit of the heap, or allocating on the heap reaches the
top of the stack, then there is insufficient memory and the program aborts (with
the same error message in both cases).
As the instructions LOAD_FRAME and STORE_FRAME expect an address that is an
offset from the frame pointer for the current (procedure) scope, you will need to
use the instruction TO_LOCAL which converts an absolute address (the pointer
value) into an address relative to the current frame pointer.
The STOP instruction with a parameter of StackMachine.NIL_POINTER indicates a
nil pointer dereference error.
Tests
Some tests are available in the test-pgm directory (in a2.zip), and can be used to
help you to debug your code. All of the tests can be run together using
the Test_LALR configuration. You can also individually run a test
using PL0_LALR on a selected PL0 program. The test cases of the form test-
base*.pl0 are useful for regression testing, to make sure that you haven't broken
any existing functionality in the compiler, and the other tests can help you find
bugs in the new compiler features.
Student Misconduct
Students are reminded of the University's policy on student misconduct, including
plagiarism. See the course profile and the School web
page http://www.itee.uq.edu.au/itee-student-misconduct-including-plagiarism
You are expected to protect your files so that they are not readable by other
users.
Your assignment is expected to be your own individual work and must not
include code copied from other students, current or past. You are also reminded
not to post your (partial) solutions to assignments to any place accessible by
others (before or after the assignment deadline), including the course discussion
board or emailing to other students. If you need that sort of help consult the
lecturer and/or tutor. Note that course discussion board allows private posts to
the instructors.
This assignment compiler is provided solely for the purposes of doing this
assignment and your solutions must never be shared, especially publicly, even
after completion of the course. Such publication would be considered both
student misconduct and a breach of copyright.
Late Submission
See section 5.3 of the course profile for details.
If there are documented medical or exceptional circumstances that will affect your
ability to complete the assignment by the due date, then you can apply for an
extension. Extensions to non-exam assignments must be requested via my.UQ.
You can find instructions on how to submit your request online.
If the assignment is submitted after the deadline, without an approved extension,
a late penalty will apply. The late penalty shall be 10% of the maximum possible
mark for the assessment item will be deducted per calendar day (or part thereof),
up to a maximum of seven (7) days. After seven days, no marks will be awarded
for the item. A day is considered to be a 24 hour block from the assessment item
due time. Negative marks will not be awarded.
Submission
Please keep the length of lines in your files below 100 characters, so that we can
print them sensibly. You should avoid using tabs or set your tabs stops to 4
spaces so that when we print them (with tab stops set to 4 spaces) they will print
sensibly. Don't forget to remove any code generating debugging output and
rogue external imports before submission.
You must submit your completed assignment electronically through the
assessment section of the course BlackBoard site (the BlackBoard Assessment
page rather than the course web pages).
You need to submit the following list of individual files, not a .zip file (note that
file names are case-sensitive) for evaluation and marking.
PL0.cup
StaticChecker.java
CodeGenerator.java
ExpNode.java
ExpTransform.java
You can submit your assignment multiple times, but only the last copy submitted
will be retained for marking.
Assessment
The assignment is marked out of a total of 15 marks.
Marks will be allocated as follows:
5 Syntax analysis and tree building
6 Static semantics checking
4 Code generation
Marks will be awarded for the correctness of the changes to each category.
Readability and modular structure will also be criteria. For readability, we expect
that you follow good software engineering practice, such as appropriate choices
of variable names, consistent indentation, appropriate comments where needed,
etc. For modularity we expect you introduce new methods where it makes sense
to help structure the program and to avoid unnecessary duplication of code. Use
of generic Java utility interfaces (like Set, Map, List, Queue, ...) and their
implementations (like HashSet, ..., TreeMap, ..., LinkedList, ...) is encouraged. We
expect you to produce well structured programs that are not unnecessarily
complex, both structurally (e.g. complex control flow that is hard to follow), and in
terms of execution time and space requirements, (e.g. an O(n) algorithm is
preferred to an O(n2) algorithm, and a O(log n) algorithm is even better).
We will not be concerned with the quality of syntactic error recovery because the
parser generator CUP takes care of that for the most part, but you must handle
semantic errors appropriately, including handling the situation when there is a
syntax error, i.e., your compiler should not crash because of a syntax error.
Your assignment files will be compiled in the context of the remaining assignment
files and put through a sequence of tests. The total mark for this assignment will
be limited by the overall success of the development in the following way:
The program submitted does not compile: Maximum 8/15.
The program submitted will not correctly handle any test case with the
new facilities: Maximum 10/15.
You are not required to correct any bugs that may exist in the original compiler.
However, we would appreciate being informed of any such bugs that you detect,
so that we can correct them, or any improvements to the compiler you might like
to suggest.