Assignment 1: Micro Language Compiler
1 Introduction
In this assignment, you are required to design and implement a compiler frontend for the Micro
language. The frontend transforms a Micro program into the corresponding LLVM Intermediate Representation (IR), which is then translated into RISC-V assembly code and an executable with the help of
the LLVM optimizer and its RISC-V backend. After that, we can execute the compiled program in our
RISC-V docker container to verify the correctness of your compiler.
Since this is a senior major elective course, we do not want to impose any limitations on you. You are strongly
recommended to use Lex/Flex and Yacc/Bison, taught in tutorial 3, to design your compiler frontend,
but this is not mandatory. You can choose any programming language you like for this assignment,
but the RISC-V container we use only has the C/C++ toolchain installed, so for other languages you need to provide a
Dockerfile to run your compiler and execute the RISC-V program, which may require some extra effort.
Some languages also provide tools like Lex and Yacc, and you are free to use them. It is also OK
to design the scanner and parser by hand instead of using tools.
2 Micro Language
Before we move on to the compiler design, it is necessary to introduce the Micro language,
which serves as the input of our compiler. It is a very simple language, with a limited number
of tokens and production rules in its context-free grammar (CFG):
• Only integers (i32); no floating-point numbers
• No declarations
• Variable names consist of a-z, A-Z and 0-9, are at most 32 characters long, and must start with a letter;
variables are initialized to 0
• Comments begin with "--" and end at end-of-line (EOL)
• Three kinds of statements:
– assignments, e.g. a:=b+c
– read(list of IDs), e.g. read(a, b)
– write(list of Expressions), e.g. write (a, b, a+b)
• BEGIN, END, READ, WRITE are reserved words
• Tokens may not extend to the following line
2.1 Tokens & Regular Expression
Micro Language has 14 tokens, and the regular expression for each token is listed below. Since BEGIN
is already taken in Flex-generated C/C++ scanners (Flex defines it as a macro), we use the alternative BEGIN_ as the token class.
1. BEGIN_: begin
2. END: end
3. READ: read
4. WRITE: write
5. LPAREN: (
6. RPAREN: )
7. SEMICOLON: ;
8. COMMA: ,
9. ASSIGNOP: :=
10. PLUSOP: +
11. MINUSOP: -
12. ID: [a-zA-Z][a-zA-Z0-9_]{0,31}
13. INTLITERAL: -?[0-9]+
14. SCANEOF: <<EOF>>
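The ID and INTLITERAL patterns can be checked in isolation before they are wired into a scanner. The following is a minimal sketch using std::regex. The function names are hypothetical, the reserved words are assumed to be spelled in lowercase only, and the underscore is assumed to be allowed inside an ID.

```cpp
#include <regex>
#include <string>

// True if the lexeme is one of the four reserved words
// (assumption: only the lowercase spelling is reserved).
bool isReservedWord(const std::string& lexeme) {
    return lexeme == "begin" || lexeme == "end" ||
           lexeme == "read" || lexeme == "write";
}

// ID: a letter followed by up to 31 letters, digits or underscores.
// Reserved words also match the pattern, so they are excluded first.
bool isIdentifier(const std::string& lexeme) {
    static const std::regex pattern("[a-zA-Z][a-zA-Z0-9_]{0,31}");
    return !isReservedWord(lexeme) && std::regex_match(lexeme, pattern);
}

// INTLITERAL: an optional minus sign followed by one or more digits.
bool isIntLiteral(const std::string& lexeme) {
    static const std::regex pattern("-?[0-9]+");
    return std::regex_match(lexeme, pattern);
}
```

With Flex the same priority is obtained by listing the reserved-word rules before the ID rule, because Flex picks the earliest rule among matches of equal length.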
2.2 Context Free Grammar
Here is the extended context-free grammar (CFG) of Micro Language:
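1. <start> → <program> SCANEOF
2. <program> → BEGIN_ <statement list> END
3. <statement list> → <statement> {<statement>}
4. <statement> → ID ASSIGNOP <expression>;
5. <statement> → READ LPAREN <id list> RPAREN;
6. <statement> → WRITE LPAREN <expr list> RPAREN;
7. <id list> → ID {COMMA ID}
8. <expr list> → <expression> {COMMA <expression>}
9. <expression> → <primary> {<add op> <primary>}
10. <primary> → LPAREN <expression> RPAREN
11. <primary> → ID
12. <primary> → INTLITERAL
13. <add op> → PLUSOP
14. <add op> → MINUSOP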
Note: {} means the content inside can appear 0, 1 or multiple times.
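Each of the 14 rules maps naturally onto one function of a hand-written recursive-descent parser, where a {} repetition becomes a while loop. The sketch below is only a recognizer: it accepts or rejects a token sequence and builds no tree. The Tok enum, the Parser class and the accepts helper are hypothetical names chosen for illustration, and the non-terminal names in the comments are assumed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

enum class Tok { Begin, End, Read, Write, LParen, RParen, Semicolon, Comma,
                 AssignOp, PlusOp, MinusOp, Id, IntLiteral, ScanEof };

class Parser {
public:
    explicit Parser(std::vector<Tok> tokens) : toks_(std::move(tokens)) {}

    // Rule 1: program followed by SCANEOF.
    void parseStart() { parseProgram(); expect(Tok::ScanEof); }

private:
    std::vector<Tok> toks_;
    std::size_t pos_ = 0;

    Tok peek() const { return pos_ < toks_.size() ? toks_[pos_] : Tok::ScanEof; }

    void expect(Tok t) {
        if (pos_ >= toks_.size() || toks_[pos_] != t)
            throw std::runtime_error("unexpected token");
        ++pos_;
    }

    // Rule 2: BEGIN statement-list END.
    void parseProgram() { expect(Tok::Begin); parseStatementList(); expect(Tok::End); }

    // Rule 3: one statement, then zero or more further statements.
    void parseStatementList() {
        parseStatement();
        while (peek() == Tok::Id || peek() == Tok::Read || peek() == Tok::Write)
            parseStatement();
    }

    // Rules 4-6: assignment, read and write statements.
    void parseStatement() {
        switch (peek()) {
            case Tok::Id:
                expect(Tok::Id); expect(Tok::AssignOp); parseExpression();
                break;
            case Tok::Read:
                expect(Tok::Read); expect(Tok::LParen); parseIdList(); expect(Tok::RParen);
                break;
            case Tok::Write:
                expect(Tok::Write); expect(Tok::LParen); parseExprList(); expect(Tok::RParen);
                break;
            default:
                throw std::runtime_error("expected a statement");
        }
        expect(Tok::Semicolon);
    }

    // Rule 7: ID {COMMA ID}.
    void parseIdList() {
        expect(Tok::Id);
        while (peek() == Tok::Comma) { expect(Tok::Comma); expect(Tok::Id); }
    }

    // Rule 8: expression {COMMA expression}.
    void parseExprList() {
        parseExpression();
        while (peek() == Tok::Comma) { expect(Tok::Comma); parseExpression(); }
    }

    // Rules 9, 13, 14: primary {(PLUSOP | MINUSOP) primary}.
    void parseExpression() {
        parsePrimary();
        while (peek() == Tok::PlusOp || peek() == Tok::MinusOp) { ++pos_; parsePrimary(); }
    }

    // Rules 10-12: parenthesized expression, ID or INTLITERAL.
    void parsePrimary() {
        switch (peek()) {
            case Tok::LParen:
                expect(Tok::LParen); parseExpression(); expect(Tok::RParen);
                break;
            case Tok::Id: expect(Tok::Id); break;
            case Tok::IntLiteral: expect(Tok::IntLiteral); break;
            default: throw std::runtime_error("expected a primary");
        }
    }
};

// Returns true if the token sequence is a syntactically valid Micro program.
bool accepts(const std::vector<Tok>& tokens) {
    try { Parser(tokens).parseStart(); return true; }
    catch (const std::runtime_error&) { return false; }
}
```

A parser written with Yacc/Bison expresses the same rules declaratively; the hand-written form is shown only because it makes the correspondence between rules and code explicit.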
2.3 How to Run Micro Compiler
Here is a very simple Micro program that we are going to use as the sample program throughout these
instructions. SCANEOF marks the end of the file, so we do not need to write it explicitly in the program.
-- Expected Output: 30
begin
A := 10;
B := A + 20;
write (B);
end
We can use our compiler to compile, optimize and execute this program to get the expected output 30.
Note: The exact command to run your compiler is up to you. Just specify in your report how
to build and run your compiler to get the expected output.
118010200@c2d52c9b1339:~/A1$ ./compiler ./testcases/test0.m
118010200@c2d52c9b1339:~/A1$ llc -march=riscv64 ./program.ll -o ./program.s
118010200@c2d52c9b1339:~/A1$ riscv64-unknown-linux-gnu-gcc ./program.s -o ./program
118010200@c2d52c9b1339:~/A1$ qemu-riscv64 -L /opt/riscv/sysroot ./program
30
3 Compiler Design
Figure 1 shows the overall structure of a compiler. In this assignment, your task is to implement the
frontend only, which contains the scanner, parser and intermediate code generator, and to combine it
with LLVM to form a complete compiler for the Micro language.
Figure 1: Compiler Structure
3.1 Scanner
The scanner takes the input character stream and extracts a series of tokens. Your scanner should
be able to print both the token class and the lexeme for each token. The following shows the scanner output for the
sample Micro program.
118010200@c2d52c9b1339:~/A1$ ./compiler ./testcases/test0.m --scan-only
BEGIN begin
ID A
ASSIGNOP :=
INTLITERAL 10
SEMICOLON ;
ID B
ASSIGNOP :=
ID A
PLUSOP +
INTLITERAL 20
SEMICOLON ;
WRITE write
LPAREN (
ID B
RPAREN )
SEMICOLON ;
END end
SCANEOF
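For those who prefer to write the scanner by hand, the listing above can be produced by a loop of roughly the following shape. This is a sketch under several assumptions rather than a reference implementation: the scan function and the ERROR class are hypothetical, underscores are accepted inside identifiers, the class names follow the sample output, and a '-' directly before a digit is treated as part of a negative literal only when the previous token is not an operand (a Flex scanner using longest match would behave differently for input such as A-1).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One scanned token: (token class, lexeme).
using Token = std::pair<std::string, std::string>;

std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    auto isDigit = [](char ch) {
        return std::isdigit(static_cast<unsigned char>(ch)) != 0;
    };
    // True if the previous token can end an operand (so '-' is a binary minus).
    auto afterOperand = [&tokens]() {
        if (tokens.empty()) return false;
        const std::string& cls = tokens.back().first;
        return cls == "ID" || cls == "INTLITERAL" || cls == "RPAREN";
    };

    std::size_t i = 0;
    const std::size_t n = src.size();
    while (i < n) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '-' && i + 1 < n && src[i + 1] == '-') {  // comment up to EOL
            while (i < n && src[i] != '\n') ++i;
            continue;
        }
        if (std::isalpha(static_cast<unsigned char>(c))) {  // reserved word or ID
            const std::size_t start = i;
            while (i < n && isWordChar(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            if (word == "begin") tokens.emplace_back("BEGIN", word);
            else if (word == "end") tokens.emplace_back("END", word);
            else if (word == "read") tokens.emplace_back("READ", word);
            else if (word == "write") tokens.emplace_back("WRITE", word);
            else if (word.size() <= 32) tokens.emplace_back("ID", word);
            else tokens.emplace_back("ERROR", word);  // identifier too long
            continue;
        }
        if (isDigit(c) ||
            (c == '-' && i + 1 < n && isDigit(src[i + 1]) && !afterOperand())) {
            const std::size_t start = i++;
            while (i < n && isDigit(src[i])) ++i;
            tokens.emplace_back("INTLITERAL", src.substr(start, i - start));
            continue;
        }
        if (c == ':' && i + 1 < n && src[i + 1] == '=') {
            tokens.emplace_back("ASSIGNOP", ":=");
            i += 2;
            continue;
        }
        std::string cls;
        switch (c) {
            case '(': cls = "LPAREN"; break;
            case ')': cls = "RPAREN"; break;
            case ';': cls = "SEMICOLON"; break;
            case ',': cls = "COMMA"; break;
            case '+': cls = "PLUSOP"; break;
            case '-': cls = "MINUSOP"; break;
            default:  cls = "ERROR"; break;  // character outside the token set
        }
        tokens.emplace_back(cls, std::string(1, c));
        ++i;
    }
    tokens.emplace_back("SCANEOF", "");
    return tokens;
}
```

Printing each pair as the class followed by the lexeme, one per line, gives a listing in the same format as the one above.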
3.2 Parser
The parser receives the tokens extracted by the scanner and generates a parse tree (or concrete syntax tree)
and, furthermore, the abstract syntax tree (AST) based on the CFG. Your compiler should be able to
print both the parse tree and the abstract syntax tree, and visualize them with graphviz.
3.2.1 Parse Tree (Concrete Syntax Tree)
Figure 2 shows an example of the concrete syntax tree generated from the sample program:
Figure 2: Concrete Syntax Tree of Sample Program
3.2.2 Abstract Syntax Tree
Figure 3 shows an example of the abstract syntax tree generated from the sample program:
Figure 3: Abstract Syntax Tree of Sample Program
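One possible way to visualize either tree with graphviz is to dump it in the DOT language and render the file with the dot tool. The sketch below assumes a generic tree node with a text label; the Node struct and the toDot function are hypothetical, the exact tree shape expected in the figures is not reproduced here, and labels containing double quotes would need escaping, which this sketch omits. It requires C++14 for std::make_unique.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A generic tree node usable for both the parse tree and the AST.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string l) : label(std::move(l)) {}

    // Appends a child and returns a non-owning pointer to it.
    Node* add(std::string childLabel) {
        children.push_back(std::make_unique<Node>(std::move(childLabel)));
        return children.back().get();
    }
};

// Emits one DOT node per tree node (numbered in pre-order) and one edge
// per parent/child pair.
static void emitDot(const Node& node, int& nextId, std::ostringstream& out) {
    const int id = nextId++;
    out << "  n" << id << " [label=\"" << node.label << "\"];\n";
    for (const auto& child : node.children) {
        const int childId = nextId;  // id the child will receive
        emitDot(*child, nextId, out);
        out << "  n" << id << " -> n" << childId << ";\n";
    }
}

std::string toDot(const Node& root) {
    std::ostringstream out;
    out << "digraph Tree {\n";
    int nextId = 0;
    emitDot(root, nextId, out);
    out << "}\n";
    return out.str();
}
```

If the returned text is saved as tree.dot, running dot -Tpng tree.dot -o tree.png renders it as an image.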
3.3 Intermediate Code Generator
For all the assignments in this course, the intermediate representation (IR) of the compiler should be
LLVM IR. There are two main reasons why LLVM IR is chosen. First, LLVM IR lets us take
advantage of the powerful LLVM optimizer and makes up for the compiler backend that is not covered in
this course. Second, LLVM IR can easily be translated into assembly code for any target
machine, whether MIPS, x86_64, ARM or RISC-V, which makes our compiler
portable across machines with different architectures. The following shows the LLVM IR generated for
the sample Micro program:
; Declare printf
declare i32 @printf(i8*, ...)
; Declare scanf
declare i32 @scanf(i8*, ...)
define i32 @main() {
  %_ptr0 = alloca i32
  store i32 10, i32* %_ptr0
  %A = load i32, i32* %_ptr0
  %_1 = add i32 %A, 20
  store i32 %_1, i32* %_ptr0
  %B = load i32, i32* %_ptr0
  %_scanf_format0 = alloca [4 x i8]
  store [4 x i8] c"%d\0A\00", [4 x i8]* %_scanf_format0
  %_scanf_str0 = getelementptr [4 x i8], [4 x i8]* %_scanf_format0, i32 0, i32 0
  call i32 (i8*, ...) @printf(i8* %_scanf_str0, i32 %B)
  ret i32 0
}
3.4 Bonus (Extra Credits 10%)
If you are interested and want to make your compiler better, you may try the following options:
• Add robust syntax error reporting
• Add a symbol table to make your compiler more complete
• Generate LLVM IR with the LLVM C/C++ API instead of simply generating the IR as a string
• Optimize the LLVM IR generation plan to produce more efficient IR
• Any other improvement you can think of...
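As an illustration of what "more efficient IR" could mean, one common technique is constant folding: subexpressions made only of integer literals are evaluated at compile time, so no add or sub instruction has to be emitted for them. The sketch below is an assumption about how this might look on a small expression tree; the Expr struct and the helper functions are hypothetical, and arithmetic wraps around at 32 bits to mirror the behaviour of the LLVM add and sub instructions on i32.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

struct Expr {
    enum class Kind { Literal, Variable, Add, Sub };
    Kind kind = Kind::Literal;
    std::int32_t value = 0;          // used when kind == Literal
    std::string name;                // used when kind == Variable
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Add or Sub
};

std::unique_ptr<Expr> literal(std::int32_t v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Literal;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> variable(std::string name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> binary(Expr::Kind kind, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Folds bottom-up: a binary node whose operands are both literals after
// folding is replaced by a single literal node.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    if (e->kind != Expr::Kind::Add && e->kind != Expr::Kind::Sub) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Kind::Literal && e->rhs->kind == Expr::Kind::Literal) {
        // Unsigned arithmetic avoids undefined behaviour on signed overflow.
        const auto a = static_cast<std::uint32_t>(e->lhs->value);
        const auto b = static_cast<std::uint32_t>(e->rhs->value);
        const std::uint32_t r = (e->kind == Expr::Kind::Add) ? a + b : a - b;
        return literal(static_cast<std::int32_t>(r));
    }
    return e;
}
```

For example, folding the tree for A + (5 - 2) leaves an Add node whose right operand is the literal 3, while 10 + 20 collapses to the single literal 30. Note that the LLVM optimizer performs this kind of folding as well, so the gain is mainly in the size of the unoptimized IR.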
4 Submission and Grading
4.1 Grading Scheme
• Scanner: 20%
• Parser: 40% (20% for parse tree generator and 20% for AST generation)
• Intermediate Code Generator: 30%
We have prepared 10 test cases, and the points for each section will be graded according to the
number of testcases you passed.
• Technical Report: 10%
If your report properly covers the three required aspects and the format is clean, you will get 10
points.
• Bonus: 10%
Refer to section 3.4 for more details. The grading of this part will be very flexible and will depend
heavily on the TA's own judgement. Please specify clearly what you have done for the bonus
part so that the TA does not miss anything.
4.2 Submission with Source Code
If you want to submit a C/C++ source program that can be built and executed in our RISC-V docker container,
your submission should look like:
csc4180-a1-118010200.zip
|-
|--- csc4180-a1-118010200-report.pdf
|-
|--- testcases
|-
|--- src
     |-
     |--- Makefile
     |--- ir_generator.cpp
     |--- ir_generator.hpp
     |--- node.cpp
     |--- node.hpp
     |--- scanner.l
     |--- parser.y
     |--- Other possible files
4.3 Submission with Docker
If you want to submit your program in a docker, your submission should look like:
csc4180-a1-118010200.zip
|-
|--- csc4180-a1-118010200.Dockerfile
|-
|--- csc4180-a1-118010200-report.pdf
|-
|--- src
     |-
     |--- Makefile
     |-
     |--- run_compiler.sh
     |-
     |--- Your Code Files
4.4 Technical Report
Please answer the following questions in your report:
• How to execute your compiler to get expected output?
• How do you design the Scanner?
• How do you design the Parser?
• How do you design the Intermediate Code Generator?
• Some other things you have done in this assignment?
The report does not need to be very long, but the format should be clean. As long as the three design questions
are clearly answered and the format is OK, your report will get full marks.