# coding: utf-8
# # Homework Three: Building a Calculator: Abstract Syntax Trees, Evaluation, and Memory for Variables
# ## General Homework Policies for CS 320
#
# Please provide answers to these problems by filling in this notebook and submitting it via websubmit by Tuesday 2/13 by midnight. If the lab for this week involved a separate notebook, it is due at the same time. For any problems involving diagrams or drawings, you may try your hand at ASCII art or you may submit an electronic version of your drawings, either using drawing software or by scanning or photographing hand drawings. A message has been posted to Piazza showing you how to automatically include these in your notebook (a nice touch). Make sure in the latter case that the result is legible; if we can not understand your drawing, it will receive no credit.
#
# You may submit by Wednesday midnight for a 10% penalty; after that no credit will be given.
#
# I will drop the lowest homework/lab combination at the end of the term.
#
# For coding problems any reasonable solution, in which you understand the spirit of the exercise and try to learn this week's material through your efforts, will be graded based solely on the test cases. In general, shortcuts to avoid learning the material will be unreasonable. For example, you may not use any Python libraries or functions that make your task dramatically easier (e.g., you may use split(), replace(...), int(...), and float(...), as mentioned below, and any other normal functions, but may NOT use the Python library for regular expressions). Good Python style, using list comprehensions, Numpy, creating helper functions in addition to what is required, etc., when appropriate, are quite reasonable! Check with me if you have questions about this.
#
# Lab problems may be worked on collaboratively at the lab and afterwards, with no need to indicate whom you collaborated with. (Of course, if you simply copy, then you will not learn what you need for homeworks and tests, and your grade will reflect your lack of effort.)
#
# Homework problems must be solved individually, and will be subject to the College rules on plagiarism. In particular, you are absolutely forbidden to copy code or answers from another student or from a source other than my slides and materials I provide to you. We will run the code from notebooks through an electronic service that checks for plagiarism, and any violation will be dealt with harshly, up to and including an automatic drop of one letter grade at the end of the term (this policy will be familiar to those who took CS 111 at BU). Significant offences will be referred to the College Academic Misconduct Committee. These policies are necessary to provide a fair environment for you to do your best work with the expectation that the quality and quantity of your efforts will be rewarded appropriately. Please take this as seriously as I do.
#
# Make sure to click Cell -> Run All before submitting notebooks!!
# ### Lexical analyzer code. DO NOT MODIFY!
# In[10]:
# Token definitions
# Token Categories -- slightly different than in HW01 and HW 02
ident = 0 # used in token as (indent, )
integer = 1 # (integer, )
floating = 2 # (floating, )
plus = 3 # (plus,) rest as this one, no attribute necessary
minus = 4
mult = 5
div = 6
lparen = 7
rparen = 8
equals = 9
let = 10
inTok = 11
error = 12 # (error, ) gives spelling of token that caused the lexical error
# Non-terminals are encoded as pairs using these codes:
S = 19 # (S,) etc.
E = 13
A = 14
L = 15
T = 16
F = 17
I = 20 # added this in the last step
# special token for end of input
end_of_input = 18 # (end_of_input,) will be pretty-printed as ($,)
# pretty-printing lists of tokens
tokenCats = ['ident','integer','floating','plus','minus','mult','div',
'lparen','rparen','=','let','in','error','E','A','L','T','F','$','S','I']
tokenSymbols = ['id','int','float','+','-','*','/',
'(',')','=','let','in','error','E','A','L','T','F','$','S','I']
def tokenToString(t):
if t == None:
return str(t)
elif t[0]
# We will avoid the pathologies explored in lab by only allowing assignments at the top level, so that in the interactive calculator at the end of the homework, you can bind a value into a variable, but you MAY NOT use an assignment as part of an expression. You can see in the CFG how this has been accomplished at the syntactic level. (This is consistent with Python; see the that I posted on the web site for examples of the weird and difficult behavior. that results when you allow this, as is the case in Java, C, and C++.)
#
#
# Recall that expressions evaluate to values, but the assignment statement e.g.,
#
# x = (3+4) # may only be used as a top-level expression and not as a subexpression!
#
# does not yield a value, but has an
# effect on the global memory: the first time a particular identifier is given a value, it is entered into the memory (for us, a Python dictionary); after that the value is updated (and the previous value destroyed).
#
# The let expression (which yields a value) allows you to make a local binding of a value to a local variable, valid only inside the expression to which is prefaced, e.g.,
#
# (let x = 3 in (3 * x) / 2)
#
# (The let expression must be enclosed in parentheses in order to be a subexpression of another expression, but in any case this reinforces the scope of the binding, i.e., that there is a local context inside the parentheses where the binding has effect.)
# This type of expression will require that a recursive evaluation must carry along a list of variable bindings as, essentially, a stack, so that the value of a local variable will be determined by the closest enclosing definition. Thursday's lecture will consider this in detail.
# ### The next two cells contain provided code for the homework; DO NOT MODIFY!
#
# In[11]:
# The CFG. Notice how S defines a top-level expression I, which can either be an assignment A or an expression E.
rules = [((S,),[(I,),(end_of_input,)]), # S -> I $
((I,),[(A,)]), # I -> A
((A,),[(ident,),(equals,),(E,)]), # A -> id = E
((I,),[(E,)]), # I -> E
((F,),[(lparen,),(let,),(A,),(inTok,),(E,),(rparen,)]), # F -> (let A in E)
((E,),[(E,),(plus,),(T,)]), # E -> E + T
((E,),[(E,),(minus,),(T,)]), # E -> E - T
((E,),[(T,)]), # E -> T
((T,),[(T,),(mult,),(F,)]), # T -> T * F
((T,),[(T,),(div,),(F,)]), # T -> T / F
((T,),[(F,)]), # T -> F
((F,),[(minus,),(F,)]), # F -> - F
((F,),[(lparen,),(E,),(rparen,)] ), # F -> ( E )
((F,),[(ident,)]), # F -> id
((F,),[(integer,)] ), # F -> integer
((F,),[(floating,)] ) ] # F -> floating
def toStringRule(r):
s = tokenCats[r[0][0]] + ' := '
for t in r[1]:
s = s + tokenSymbols[t[0]] + ' '
return s
def pprint_rules(lst):
for k in range(len(lst)):
print(str(k) + ": " + toStringRule(lst[k]))# all we really need to know about the rules (since they are embedded in the DFA) is the
pprint_rules(rules)
# In[31]:
# DFA to drive the shift-reduce parser
# The parser DFA will tell you what to do when the stack is
# in a given configuration. A positive number gives the
# state transition, a non-positive number k is a reduction by the
# rule -k, i.e., -4 means a reduction by rule 4 (t -> F) with no
# advance in the inputListOfTokens string. -100 is an error and
# 0 is accept.
err = -100
accept = 0
table = {1: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:2, 14:3, 15:5, 16:7, 17:8, 18:err, 19:err, 20:17},
2: {0:err, 1:err, 2:err, 3:16, 4:15, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-3, 19:err, 20:err},
3: {0:err, 1:err, 2:err, 3:err, 4:err, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-1, 19:err, 20:err},
4: {0:err, 1:err, 2:err, 3:-13, 4:-13, 5:-13, 6:-13, 7:err, 8:-13, 9:13, 10:err, 11:-13, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-13, 19:err, 20:err},
5: {0:err, 1:err, 2:err, 3:-4, 4:-4, 5:-4, 6:-4, 7:err, 8:-4, 9:err, 10:err, 11:-4, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-4, 19:err, 20:err},
6: {0:4, 1:err, 2:err, 3:err, 4:err, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:20, 15:err, 16:err, 17:err, 18:err, 19:err, 20:err},
7: {0:err, 1:err, 2:err, 3:-7, 4:-7, 5:18, 6:19, 7:err, 8:-7, 9:err, 10:err, 11:-7, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-7, 19:err, 20:err},
8: {0:err, 1:err, 2:err, 3:-10, 4:-10, 5:-10, 6:-10, 7:err, 8:-10, 9:err, 10:err, 11:-10, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-10, 19:err, 20:err},
9: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:27, 18:err, 19:err, 20:err},
10: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:6, 11:err, 12:err, 13:24, 14:3, 15:5, 16:7, 17:8, 18:err, 19:err, 20:err},
11: {0:err, 1:err, 2:err, 3:-15, 4:-15, 5:-15, 6:-15, 7:err, 8:-15, 9:err, 10:err, 11:-15, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-15, 19:err, 20:err},
12: {0:err, 1:err, 2:err, 3:-14, 4:-14, 5:-14, 6:-14, 7:err, 8:-14, 9:err, 10:err, 11:-14, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-14, 19:err, 20:err},
13: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:6, 11:err, 12:err, 13:14, 14:3, 15:5, 16:7, 17:8, 18:err, 19:err, 20:err},
14: {0:err, 1:err, 2:err, 3:16, 4:15, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:-2, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-2, 19:err, 20:err},
15: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:28, 17:8, 18:err, 19:err, 20:err},
16: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:29, 17:8, 18:err, 19:err, 20:err},
17: {0:err, 1:err, 2:err, 3:err, 4:err, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:29, 17:8, 18:0, 19:err, 20:err},
18: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:22, 18:err, 19:err, 20:err},
19: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:23, 18:err, 19:err, 20:err},
20: {0:err, 1:err, 2:err, 3:err, 4:err, 5:err, 6:err, 7:err, 8:err, 9:err, 10:err, 11:21, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:err, 19:err, 20:err},
21: {0:4, 1:12, 2:11, 3:err, 4:9, 5:err, 6:err, 7:10, 8:err, 9:err, 10:6, 11:err, 12:err, 13:26, 14:3, 15:5, 16:7, 17:8, 18:err, 19:err, 20:err},
22: {0:err, 1:err, 2:err, 3:-8, 4:-8, 5:-8, 6:-8, 7:err, 8:-8, 9:err, 10:-err, 11:-8, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-8, 19:err, 20:err},
23: {0:err, 1:err, 2:err, 3:-9, 4:-9, 5:-9, 6:-9, 7:err, 8:-9, 9:err, 10:err, 11:-9, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-9, 19:err, 20:err},
24: {0:err, 1:err, 2:err, 3:16, 4:15, 5:err, 6:err, 7:err, 8:25, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:err, 19:err, 20:err},
25: {0:err, 1:err, 2:err, 3:-12, 4:-12, 5:-12, 6:-12, 7:err, 8:-12, 9:err, 10:err, 11:-12, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-12, 19:err, 20:err},
26: {0:err, 1:err, 2:err, 3:16, 4:15, 5:err, 6:err, 7:err, 8:5, 9:err, 10:err, 11:err, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:err, 19:err, 20:err},
27: {0:err, 1:err, 2:err, 3:-11, 4:-11, 5:-11, 6:-11, 7:err, 8:-11, 9:err, 10:err, 11:-11, 12:err, 13:err, 14:err, 15:err, 16:err, 17:err, 18:-11, 19:err, 20:err},
28: {0:err, 1:err, 2:err, 3:-6, 4:-6, 5:18, 6:19, 7:err, 8:-6, 9:err, 10:err, 11:-6, 12:err, 13:err, 14:err, 15:err, 16:err, 17:-6, 18:-6, 19:err, 20:err},
29: {0:err, 1:err, 2:err, 3:-5, 4:-5, 5:18, 6:19, 7:err, 8:-5, 9:err, 10:err, 11:-5, 12:err, 13:err, 14:err, 15:err, 16:err, 17:-5, 18:-5, 19:err, 20:err}}
# this reads a list of tokens (i.e., the parsing stack) and returns the state or action where you end up;
# numbers > 0 represent states, so you must keep shifting on the stack; numbers
# Your task for this first problem is to modify this parser so that it constructs abstract-syntax trees as discussed in lecture on Tuesday 2/6. You can see from the test cases in the solution PDF what is required.
# In[26]:
# Code for shift-reduce parser
# pretty-printing parser configurations
def pprintParser(parsingStack,inputListOfTokens):
totalWidth = 80
smallestGap = 6
largestStackLength = int(totalWidth*0.6 - smallestGap/2) # most characters to display
largestInputLength = totalWidth - largestStackLength - smallestGap
p = '| '
for symbol in parsingStack:
p = p + tokenToString(symbol) + ' '
if len(p) > largestStackLength:
ind = int(len(p) - largestStackLength + 9)
p = '| ... ' + p[ind:]
q = ""
for tok in inputListOfTokens[:-1]:
q = q + tokenToString(tok) + ' '
if len(inputListOfTokens) > 0:
q = q + tokenToString(inputListOfTokens[-1])
if len(q) > largestInputLength:
ind = int(largestInputLength - 9)
q = q[:ind] + ' ... '
q = q + ' |'
p = p + (' ' * (totalWidth - len(p) - len(q)))
print(p+q)
# parse takes a list of tokens and determines if it is a
# well-formed arithmetic expression.
def parse(list_of_input_tokens,rules,table,verbose=False):
# add end of input marker
list_of_input_tokens.append((end_of_input,))
# input stack; use pop(0) to remove front item in list
input_stack = list_of_input_tokens
# parsing stack; use append to push onto end and pop() to pop from end
parsing_stack = []
if verbose:
print('Input Tokens: ' + tokenListToString(input_stack) + '\n')
pprintParser(parsing_stack,input_stack)
while( len(input_stack) > 0 ):
# Now we will "lookahead" at the next symbol on the input and ask the automaton what to do
# a positive number means to shift, and a negative number -n means to reduce by rule n
n = action(parsing_stack+[input_stack[0]],table)
if n == accept: # reduce by start rule, success
if verbose:
pprintParser(parsing_stack,input_stack)
return True
elif n == err: # problem on stack!
if verbose:
pprintParser(parsing_stack,input_stack)
return False
elif n > 0: # shift
parsing_stack.append( input_stack.pop(0))
else: # reduce by rule -n
LHS = rules[-n][0]
RHS = rules[-n][1]
lenRHS = len(RHS) # notice that no matching is necessary!
for k in range(lenRHS):
parsing_stack.pop()
parsing_stack.append(LHS)
if verbose:
pprintParser(parsing_stack,input_stack)
return None # this will never be executed
# In[32]:
# demonstrations of the parser -- you may remove these if you like
s = 'x + 3 / (flag_input - -3)'
print('Input String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = '(((6)))'
print('\nInput String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = 'x = x + 1'
print('\nInput String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = '(let x = 7 in (let y = 8 in x * y))'
print('\nInput String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = 'x + (let y = (let z = 4 in z) * 2 in (let z = (let b = 4 in b) in (let c = (let x = 4 in b) in x))) + 2'
print('\nInput String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = 'x + (y = 5)'
print('\nIllegal Input String: ' + s + '\n')
print(parse(lexer(s),rules,table))
s = 'let x = 4 in x*2'
print('\nIllegal Input String: ' + s + '\n')
print(parse(lexer(s),rules,table))
# In[15]:
# Pretty-printing code for testing your solution
def ASTToString(t):
if t[0] in [ident,integer,floating]:
return tokenToString(t)
elif len(t) == 2:
return '(' + tokenCats[t[0]] + ',' + ASTToString(t[1]) + ')'
elif len(t) == 3:
return '(' + tokenCats[t[0]] + ',' + ASTToString(t[1]) + ',' + ASTToString(t[2]) + ')'
#else
print("** " + str(len(t)))
print("** " + str(t))
return "ERROR!"
def pprintAST(t):
pprintASTHelper(t,'')
def pprintASTHelper(t,indent):
if (t[0] in [ident,integer,floating]):
print(indent + '(' + tokenCats[t[0]] + ',' + str(t[1]) + ')')
else:
print(indent + tokenCats[t[0]])
for k in range(1,len(t)):
pprintASTHelper(t[k],indent + '\t')
def testParserAST(s,verbose=False):
t = parse(lexer(s),rules,table,verbose)
if t == None:
print("Error: No AST generated!")
else:
print("\nAbstract Syntax Tree:\n")
print(ASTToString(t))
print()
pprintAST(t)
# In[ ]:
# Test 1
s = '(5+4) * (3.4 / -x) - 4'
testParserAST(s)
# In[ ]:
# Test 2
s = 'x = (x + 4 * (3))'
testParserAST(s)
# In[ ]:
# Test 3
s = '(let x = 7 * 2 in x / -2)'
testParserAST(s)
# In[ ]:
# Test 4
s = '(let x = 3 * (-(4.5)) + 8 + (2 + 4.2) in (7 + x))'
testParserAST(s)
# In[ ]:
# Test 5
s = 'x = (let x = 7 in (let y = 8 in x * y))'
testParserAST(s)
# ## Problem Two: Evaluator for Abstract Syntax Trees (10 pts)
#
# For this problem you must write an evaluator for the ASTs that you generated in the previous question.
# I suggest doing this in several stages, and, as usual, checking your solution at each step and making sure it is "bullet-proof" before proceeding.
#
# - Step One: Write a simple recursive evaluator for arithmetic expressions without variables or let expressions
# - Step Two: Add the ability to add variable bindings to the global memory: when evaluating an assignment, you will evaluate the expression on the RHS and add that binding to the global memory; when encountering a variable in an expression, you must look up the value in the memory. Assume that there are no unbound variables.
# - Step Three: Add the ability to raise an unbound variable exception (some details are provided below)
# - Step Four: Add the ability to evaluate let expressions by adding an env parameter to the eval: when you evaluate a let expression, recursively call eval on the expression with the local variable binding added to the environment.
#
# Test your program on the examples shown, which progress through these various steps.
#
# Note that this problem does not use assignments; we will add them in the final problem.
#
# In[ ]:
## Evaluator for ASTs:
# eval(t,env,memory) takes an abstract syntax tree, an environment, and a global memory and evaluates
# the term in the context of the bindings in env, which is a stack, so
# the first occurrence of the binding is the one that is used; if a variable
# is not found in the env, it is looked for in the global memory; if not there
# an unbound variable exception is raised
# global memory is a dictionary of bindings, e.g., { 'x': 4.5, 'y':-1 }
# bindings are in the form. of a tuple (str,value), e.g., ('x',4.5) so
# environment is a list of bindings, e.g., [ ('x',4.5), ('y',-1), ('x',4) ]
# for efficiency, env stack grows from left to right, so push binding on env for
# the recursive call by using env + [binding]
# Use the following for errors with unbound variables
class UnboundVariableException(Exception):
def __init__(self, name):
self.name = name
# To raise an exception involving a variable (ident, s), use raise UnboundVariableException(s)
# Don't worry about catching exceptions, we will do that in the next problem
# look up a variable id (the string representation, not the whole token) in the
# environment from end of list to beginning; if there return the value; if not there, look in memory, if
# not there, raise an exception: raise UnboundVariableException(name)
def lookup(name,env,memory):
pass
def eval(t,env,memory):
pass
# In[ ]:
# test 1
s = '3 * 4'
t = parse(lexer(s),rules,table)
print("Should be 12:")
print(eval(t,[],{}))
# In[ ]:
# test 2
s = 'x * x'
t = parse(lexer(s),rules,table)
print("Should be 64:")
print(eval(t,[('x',4), ('x',8)],{ 'x' : 1}))
# In[ ]:
# test 3
s = 'x * y / z'
t = parse(lexer(s),rules,table)
print("Should be 6.0:")
print(eval(t,[('y',4), ('x',3)],{ 'z' : 2}))
# In[ ]:
# test 4
s = '(let x = 4 in x + y * z)'
t = parse(lexer(s),rules,table)
print("Should be 12:")
print(eval(t,[('y',4), ('x',3)],{ 'z' : 2 }))
# In[ ]:
# test 5
s = '(let x = x+1 in (let x = x+2 in (let x = x+3 in x)))'
t = parse(lexer(s),rules,table)
print("Should be 8:")
print(eval(t,[],{ 'x' : 2 }))
# In[ ]:
# test 6
s = 'z * (let x = (let y = 4 in y - x) in (let y = x in z + x + y))'
t = parse(lexer(s),rules,table)
print("Should be 16:")
print(eval(t,[],{ 'z' : 2, 'x' : 1 }))
# ## Problem Three: Building a Simple Interactive Calculator (similar to Python) [5 pts]
#
# To see the whole motivation for all these code, let's finish by creating an interactive environment which can create bindings in a global environment using assignments and evaluate the expressions in our language. Use the solution PDF to see what the expected behavior. should be.
# In[35]:
# Basic Interpreter
# use the following as a framework for imitating what I provide in the solution pdf
# to catch an exception, use something like this:
# try:
# print("Out["+str(lineNum)+"]: " + str(eval(t,[])))
# except UnboundVariableException as n:
# print("Unbound Variable: " + str(n))
lineNum = 1
global_memory = { }
s = input("\nMyPython 1.0\nInput equation or type \"quit\" to quit:\n\nIn ["+str(lineNum)+"]: ")
while True:
# replace this with code to simulate the behavior. expected
if(s == 'quit'):
print("\nBye!")
break
else:
print("Out["+str(lineNum)+"]: " + s)
lineNum += 1
s = input("\nIn ["+str(lineNum)+"]: ")
# ## Extra for Experts : Error Reporting
#
# No credit, but I'll be impressed.... Rewrite the above code to report both lexical errors and syntax errors in a style. similar to Python. This is not actually that hard, and is obviously an important feature. There are also some clever things you can do with the parsing table to tell the user what token was expected. Talk to me about this if you are interested!