LW1
DA53: COMPILERS AND LANGUAGE
THEORY
Lexical Analysis
by hand
00 | Goal
The goal of this on-computer tutorial session is to write a small lexical analyzer (or lexer) by hand.
The development could be done in Java, C# or Python.
Remember that the lexical analyzer takes a source program, and scans it, and replies the recognized
tokens when they are requested by the parser.
Definitions :
• Lexeme : Sequence of characters that is recognized by the parser (usually expressed by
regular expression)
• Token : Symbolic name of a lexeme
During this on-computer tutorial, we must :
Define the language of the source program, its syntax and its semantic.
Write a symbol table
Write the lexer
01 | Language Description
The language to implement in this Labwork (LW) is a part of the Tiny Basic dialect of the BASIC
language.
01.1 Statements
The language is composed of statements, one per line. If this ST we do not recognize
multiple statements on the same line. A line must follow one of the following formats :
statement
number statement
where number is the number of the line, and statement is the instruction of the language.
Lexical
Analyzer Parser
Symbol
Table
tokens
syntax
tree
source
program
getNextToken
The supported statements are :
PRINT expression
▪ Print the given expression on the standard output
IF expression relational_operator expression THEN statement
▪ The statement is executed only if the given relational operator on the two
given expressions replies true.
▪ The relational operators are :
= equality
< less than
> greater than
<= less or equal
>= greater or equal
<>, >< not equal
GOTO expression
▪ Run the statement at the line de)ned by the expression.
INPUTvariable [, variable ...]
▪ Read variable values from the standard input .
LET variable = expression
▪ Assign the value of the expression to the variable.
GOSUB expression
▪ Run the statement at the line de)ned by the expression, but in opposite
than GOTO, remember the line of the GOSUB and run the statement just
after that line when the statement RETURN is encountered.
RETURN
▪ Run the statement at the line following the last GOSUB.
END
▪ Stop the program.
REM text
▪ Insert a comment in the source code.
01.2 Expressions
Many of these statement types involve the use of expressions. An expression is the
combination of one or more numbers, strings or variables, joined by operators, and
possibly grouped by parentheses.
There are four operators: +, -, *, /
Note that the operators + and – have two versions : the binary (a+b) and the unary (+a).
A string of characters is enclosed by the quote characters (").
01.3 Variable Definition and Scope
The variables do not need to be declared. The scope of all the variables is global. So that
a variable, when it is used, could be undened.
All the variables are of numerical type or of string type, depending on the type of the
expression assigned to the variable.
01.4 Example
Let the following Tiny Basic program :
5 LET S = 0
10 INPUT NUM
15 INPUT V
20 LET N = NUM
30 IF N <= 0 THEN GOTO 99
40 LET S = S + V
50 LET N = N - 1
70 GOTO 30
99 PRINT S/NUM
100 END
Your lexer must reply the following tokens, in that order :