CMSC 216 Project #3 Spring 2021
Date due: Wednesday, March 3, 11:59:59 p.m.
1 Introduction and purpose
In this project you will write some functions to manipulate the instructions of a fictional simple processor in a family of
new CPUs being developed by the Banana computer company, to run all of its future devices. The CPU version used in
this project, named N2, has a limited set of instructions, is able to access just a small amount of memory, and is primarily
intended for use in embedded systems, like a toaster for example. (The computational needs of a toaster are not high.)
One purpose of the project is to use the features of C covered recently– bit operators, arrays, and pointers– but another
important objective is to introduce hardware and assembly language concepts, which will be revisited later in the course.
Much of this project description explains the assembly and machine language concepts involved, as well as the hypothetical
N2 CPU. The description of the functions to write is just two and a half pages of this assignment.
Although the amount of code to be written in this project is not large compared to some CMSC 132 projects most
students should find it significantly harder than Projects #1 and #2, so start working on it right away. Before starting to
code, look at the public tests to see how the functions are called, and look at the definitions in the header file machine.h
described below. Also first study bit masking from lecture, which you need to understand very well, and if needed, ask
about it in the TAs’ office hours before starting to code. Lastly, read the entire project completely and carefully before
starting to code. You may need to read it several times while writing the project.
Due to the size of the course it is not feasible for us to be able to provide project information or help via email/ELMS
messages, so we will be unable to answer such questions. You are welcome to ask any questions verbally during the TAs’
online office hours (or during, before, or after online discussion section or lecture if time permits). However, you cannot
wait to write projects in a course with nearly 550 students. If many students wait until the last few days to write or
finish projects– especially challenging ones– and have questions, the TAs will not be able to help everyone in office hours.
If you are in office hours near when a project is due you may have to wait a long time, and may not even be able to get help
at all. This would not be the TAs’ fault; it would be because you waited until too late to do the project, given the size of the
course. The only way to avoid this is starting on projects early, in case you do have to ask questions in office hours.
1.1 Extra credit and number of submissions
If you make only one submission for this project, it passes all the public tests, and it is made at least 48 hours before
the on–time deadline (i.e., at least two days early), we will give you 10 extra–credit bonus points on the project. In
order to make only one submission and have it pass all the public tests you will have to read this assignment very carefully,
and throughly test your functions yourself, so you are confident of doing well on the secret tests, because if you find a bug
later and want to submit again you won’t get the bonus points. You will also have to use good style all during your coding,
because your single submission is of course the one that is going to be graded for style.
However, as before, if you make too many submissions for this project you may again lose credit. The purpose of the
extra credit and the point deductions for excess submissions is to motivate you to test your code thoroughly yourself.
There really is little reason to make any submissions that don’t compile– you should be compiling your code yourself
and fixing problems before submitting. Other than right before the project deadline, there really is little reason to make any
submissions that do not pass all of the public tests– you should be running your code on them yourself, comparing your
output to the correct output as described in Appendix A.1, and fixing any problems, before submitting.
(If for some reason your program passes all the public tests on Grace, but doesn’t work on the submit server when
submitted, so you have to submit more than once– and this is not due to an error on your part– you can talk with me
verbally in office hours about receiving the extra credit despite having more than one submission.)
2 Machine details and background concepts
This section explains many background concepts, while the following one describes the functions to be written.
2.1 Hardware concepts
For a program written in any computer language to be executed on a particular machine it must be translated to the
instructions that the machine’s processor can execute. These instructions are referred to as machine instructions, but we
usually just say instructions. Instructions manipulate information in memory and also in the processor’s registers. Memory
© 2021 L. Herman; all rights reserved 1
is used to hold data, such as variables, and programs themselves are also stored in memory in modern systems. Registers
are locations that store values like memory except they are internal to the CPU, so they are accessed much more quickly.
Registers temporarily store values that instructions operate upon, as well as the results of instructions.
The fundamental operation of any CPU is to get an instruction from memory, figure out what it says to do, perform that
operation, and go on to get the next instruction; this is called the fetch–decode–execute cycle. (This is discussed in more
detail in the Bryant & O’Hallaron text.) Although the instruction sets of different processors vary widely, instructions can
be categorized by functionality, for example:
computation instructions: These perform various arithmetic and logical operations.
flow of control instructions: By default instructions are executed sequentially in memory, but there are instructions that
affect which instruction is executed next, for implementing conditionals and repetition, as well as function calls.
data movement instructions: These transfer data values between memory and registers, between different registers, and
sometimes between different memory locations.
invoking the operating system: These instructions are used to do things like halt a program or to access parts of the
hardware that user programs are not allowed to directly access, for example, I/O devices. Some instructions allow
user programs to make system calls to get the operating system to perform these types of tasks for them.
2.2 Machine specifications
The hypothetical N2 processor has a 32–bit (4 byte) word length, meaning that instructions, registers, and memory words
are all 32 bits. (As mentioned in lecture, processors manipulate information in multibyte quantities called words.) Machines
that use the N2 CPU are quite small, with very limited system resources. They have only 32768 bytes of memory, grouped
as 8192 four–byte words (8192 × 4 = 32768). The first byte in memory has address 0. Memory addresses always refer
to bytes. Memory is only word–addressable, meaning that although each byte in memory has its own unique address, the
CPU can only access memory in four–byte quantities, using the address of the first (or low–order) byte of a four–byte word.
Consequently, the addresses that can be used to access memory are 0, 4, 8, etc., up to 32764. The value of a memory word
can be interpreted as an instruction and executed, or treated as data.
The N2 processor has 27 different instructions. It has seven registers, each of which as mentioned holds 32 bits. These
are referred to using the names R0 through R6. The N2 has what is called a load/store architecture, which means that only
some data movement instructions access memory. One of them loads a value from a memory location into a register and
another one stores a value from a register into a memory location, while all other instructions, including computations,
operate upon values in registers and put their result into a register. So computation typically involves loading operands
from memory into registers, doing calculations with them, and storing the results from registers back into memory.
One N2 register, R6, has a special purpose. It is the program counter, which always contains the memory address of
the next instruction to be fetched and executed. A flow of control instruction can modify the program counter to cause
execution to jump to somewhere else in a program. All other instructions cause the program counter to be incremented
by 4. So unless specified by a flow of control instruction, the processor executes instructions sequentially in memory. R6
cannot be directly modified, for example, by storing the result of a computation in it. (Instructions can use the value in R6,
but not directly modify it.) The other registers may be used or modified by the various instructions.
2.3 Hardware, components of instructions, and instruction format
Since the N2’s processor has a 32–bit word length, all instructions, just like all data, must fit into a 32–bit word. When a
32–bit word is interpreted as an instruction, it is considered to consist of fields representing(a) an “opcode”, i.e., operation
of the instruction, (b) an extension, whose use will be described below, (c) up to three registers used as operands by some
instructions, and (d) a memory address or constant/immediate field that holds either a memory address or a constant (literal)
numeric operand for some instructions. The instruction format on the N2 processor is as follows:
5 bits 3 bits 3 bits 3 bits 3 bits 15 bits
opcode extension register1 register2 register3 memory address or constant/literal value
The fields have the following uses and functionality:
opcode: An opcode uniquely identifies the operation that an instruction performs. Short names are used in assembly
language to refer to each type of instruction. (Recall that assembly instructions are just human–readable versions of
machine instructions.) Since the N2 has 27 different opcodes, 5 bits are required to represent an opcode. Since 5 bits
can store 32 different values but the N2 only has 27 opcodes, five five–bit values do not correspond to valid opcodes.
© 2021 L. Herman; all rights reserved 2
extension: One instruction on the N2 has five different variants. For this instruction, a value in the extension field is used to
indicate which variant an instruction represents. Not all values that can be stored in 3 bits represent valid extensions.
register1, register2, register3: As Section 2.4 says, some instructions operate upon one register, others operate on two
registers, and some have three register operands. These three fields of instructions contain numbers indicating the
registers that instructions use as operands. Since the N2 CPU has 7 registers, 3 bits can indicate any of them.
Since 3 bits can store 8 different values but the N2 only has 7 registers, there is one value that can be in a register
field that does not correspond to a valid register.
address or constant: Some of the N2’s instructions have the address of a memory location as an operand. For example,
an instruction may copy the value in memory location 216 to one of the registers, so 216 would be stored in this
field of the instruction. One instruction (li) has a constant or literal value, typically called an immediate in assembly
language, as an operand. For example, an instruction may store the numeric value 250 in one of the registers, so 250
itself would be encoded in the instruction.
This field is 15 bits, which is why the N2 has 32768 bytes of memory (with addresses 0... 32767)– because 215 =
32768. This also means that the N2 CPU only has 32768 different literal (constant) values that can appear in the li
instruction, although values up to 4,294,967,295 can be stored in a 32–bit word.
For example, say an instruction has opcode value 3 and is an instruction that uses all three register operands, and this
particular instruction is going to operate upon registers R1, R2, and R4. Also suppose this instruction doesn’t use the
extension field or the address/constant field, and just has zeros in those 18 bits. Then this instruction would be represented
in a 32–bit memory word as 0x182a000016 , which is 40540569610. (Suggestion– convert the hexadecimal value to binary
and match the bits against the diagram of the instruction format above.) Or suppose an instruction has opcode value 24
and uses one register operand, which is R5, and uses the memory address field to store 21610. If it also had 0s in all of the
unused fields it would be represented in a memory word as 0xc0a000d816 (which is 323171144810 ).
2.4 Names, operands, and effects of the machine instructions
You need to have a basic understanding of the effects of the N2’s machine instructions to write the project, so after the table
below, which indicates which operands are used by each instruction, their effects are briefly described. Each instruction is
listed with the (decimal) number between 0 and 26 that represents its opcode, according to the enum Op_code defined in
machine.h. (To save space on the page the table is in two columns– “val” is the decimal value of the instruction opcodes,
“ext” refers to the extension field of instructions, reg1, reg2, and reg3 are the three register operand fields, and “addr./imm.”
refers to the memory address/immediate operand field.) A checkmark means that an instruction uses that field.
opcode val ext reg1 reg2 reg3 addr./imm.
Note that the sys instruction uses the first register operand only when the value of the extension field is 0...3, not 4.
Also note that different instructions use between 11 bits and 26 bits of their word (there are no instructions that use all 32
bits). Unused fields may just contain any bits at all. Also note that not every 32–bit word represents a valid N2 instruction.
The plus instruction adds the contents of register2 and register3 and puts the result in register1. Similarly, minus does
subtraction (computes the difference of register2 and register3), times does multiplication, div performs division, and mod
© 2021 L. Herman; all rights reserved 3
computes the remainder of a division (modulus). These all apply the operation to their second and third register operands
and put the result in their first register operand. In fact, all instructions that modify a register store their result into their
first register operand, so this is not repeated below. Since a register can only store one value at a time, any instruction that
modifies its first register operand’s value has the effect of overwriting the previous value in that register.
For instance, the instruction times R4 R3 R5 multiplies the values stored in registers R3 and R5, and puts the result in
register R4. And minus R5 R2 R4 subtracts the contents of R4 from the contents of R2 and puts the result into R5.
The neg instruction negates the value in its second register operand (if it is positive it becomes negative, and vice versa),
and abs computes the absolute value of its second operand. For instance, if register R1 contains 216, then after neg R5 R1,
the value in R5 would be −216.
The shl instruction shifts a value to the left and shr shifts it to the right; these both shift their second operand’s value
by the number of bits stored in their third register operand. band, bor, bxor, and bneg perform bitwise and, bitwise or,
bitwise exclusive or, and bitwise negation. The first three apply the operation to their second and third register operands
while bneg is applied to its second register operand.
and performs logical conjunction, or performs logical disjunction, and not performs logical negation. The first two of
these operate upon their second and third register operands while not is applied to its second register operand.
eq, neq, le, lt, ge, and gt compare their two register operands for equality, inequality, less than or equal, less than,
greater than or equal, and greater than, and allow execution to go to another location in a program. If the relationship is true
about the values in their register operands then execution jumps to the memory address in the address/constant field and
continues from there. If not, execution just drops down to continue with the next sequential instruction in memory instead.
move copies its second register operand’s value to its first register operand. lw and sw are the only instructions that
access or modify values in memory locations. lw copies a value from a memory location to a register. It uses the first
register operand and a memory address, and its effect is to copy the value of the word in memory that has that address into
the register specified by the register operand. For example, lw R3 200 would load the value in memory location 200 into
register R3. (Assuming 200 is in decimal, this instruction would be stored in a register or memory word as 0xb86000c816 ,
which is 309329940010 .) sw does the opposite, copying a value from a register to the indicated memory location.
li loads a constant or literal value (li stands for load immediate) into a register. For instance, li R4 236 would load
the numeric value 236 (not the contents of memory location 236) into register number R4. (Assuming 236 is in decimal,
this instruction would be stored in a word or register as 0xc08000ec16 , which is 322961431610 .)
Lastly, sys allows a program to invoke functionality in the operating system, which as mentioned is referred to as a
system call. The N2 operating system has five system calls, differentiated by the value of the extension field. System call
0 reads an integer, system call 1 writes an integer, system calls 2 and 3 read and write a character respectively, and system
call 4 halts the executing program. These are indicated by the values 0 through 4 respectively in the extension field. The
first four use the first register operand, while system call 4 (to halt a program) has no register operands. System calls 0
and 2, which read an integer and a character, modify their first register operand because whatever is read is stored there.
System calls 1 and 3 print the value in their first register operand without modifying the register.
All of the instructions are computation instructions except move, li, lw, and sw, which are data movement instructions,
and eq, neq, le, lt, ge, and gt, which are flow of control instructions. sys invokes the operating system.
The same register may be used for multiple operands of an instruction, such as in add R3 R2 R3 or add R1 R1 R1.
Any operand fields that aren’t indicated in the table above are ignored by that instruction. For example, move has two
register operands, therefore they will be contained in the first two register fields register1 and register2; the extension, the
third register field register3, and the address or immediate field are all not used by a move instruction.
Of the instructions that use the address or constant field, only li uses that field to store an immediate (i.e., constant)
operand; all others use that field to store a memory address.
3 Functions to be written
The header file machine.h contains definitions and the prototypes of the four functions you must write. The functions use
a typedef name Wrd to represent a N2 word. Wrd is an unsigned int, since on the Grace machines an int is four bytes.
3.1 unsigned short print_instruction(Wrd instruction)
This function should interpret its parameter as a N2 instruction, use the bit operators to extract its fields (see the description
and diagram in Section 2.4), and print the instruction’s components on a single output line. However, if the instruction is
© 2021 L. Herman; all rights reserved 4
invalid according to the criteria below, nothing at all should be printed and the function should just return 0. Otherwise, if
the instruction is valid, the function should return 1 after printing the instruction, in this format:
• The instruction’s opcode (the opcode field of the parameter, meaning its leftmost five bits) should be printed using
its name in the table in Section 2.4 (plus, minus, times, div, etc.), spelled exactly as shown there. For example,
if the value of the opcode field is the enum value PLUS (which has the value 0), then plus should be printed. If the
opcode field is MINUS then minus should be printed, etc.
• Following the opcode, the extension and register operands that are actually used by the instruction should be
printed, with the extension (if it is used) followed by the register operands that are used, in the order register1,
register2, and register3. For example, a not instruction uses the first two registers as operands, so those two should
be printed, with the first register operand followed by the second one. A sys instruction uses the extension (usually–
see the next sentence) and the first register as operands, so the extension should be printed first, then the register
operand. (If the value of the extension field is 4 in a sys instruction then the register field should not be printed,
because the system call to halt a program does not have a register operand.)
Register names should be printed in decimal with an R immediately preceding the register number. For example,
registers 0, 1, and 3 would be printed as R0, R1, and R3. For sys, the only instruction that uses the extension, the
extension should just be printed as a single decimal digit between 0 and 4.
• If an instruction uses a memory address operand or an immediate operand that operand should be printed last, in
decimal. If it is a memory address it should be printed using exactly five places, with addresses less than 1000010
printed using as many leading zeros as necessary so they occupy exactly four places. For example, the address
21610 should be printed as 00216. (This can trivially be performed with printf() formatting options covered in
discussion.) Note that immediate operands of li instructions should not be printed with any leading zeros, except
zero itself should be printed as 0. Only memory address operands should have leading 0s if needed.
The printed fields must be separated with one or more whitespace characters (meaning tabs or spaces); the exact
number of whitespace characters is up to you. The printed instruction should not have any whitespace before the opcode
or following the last field printed. A newline should not be printed after the instruction.
For example, the call print_instruction(0x38710000) should print something like shl R3 R4 R2, where the exact
number of spaces or tabs separating the four things printed is up to you (one or more).
As mentioned before, some values that can be stored in a 32–bit N2 word or register don’t represent valid N2 instructions,
and if its parameter represents an invalid instruction this function should just return 0 without printing anything. An
instruction would be invalid in any of these cases:
• The value in its opcode field is invalid. There are only 27 instructions (hence opcodes) on the N2 (with enum values
between PLUS and SYS), so any value outside of that range doesn’t represent an actual opcode.
• If a register operand that is actually used by the instruction has an invalid number. There are only seven registers
on the N2 with numbers 0–6 (and symbolic constant names R0 through R6 defined in machine.h), so the value 7
doesn’t represent a valid register number.
• If it is an instruction that uses the address or constant field to store a memory address and the value of the field is
not evenly divisible by 4. As will come up later, machines often have alignment requirements such that any data item
must begin at a memory address that is a multiple of the size in bytes of that item. Since the N2 has this requirement,
and everything on the N2 occupies a 4–byte word, everything must begin at an address that is divisible by 4.
Note: if the parameters represent an instruction that uses the address or constant field for storing a memory address
its value must be divisible by 4. But if the instruction is an li instruction, which is using the field to represent a
immediate (constant) value, it does not have to be divisible by 4.
• If the parameters represent an instruction that has the effect of modifying (storing a new value into) its first register
operand and reg1 is the special unmodifiable program counter register R6.
The instructions that use and would modify the first register operand are all instructions except for eq, neq, le, lt,
ge, gt, sw, as well as a sys instruction when the value of the extension field is 1, 3, or 4. (A sys instruction with
extension 0 or 2 does modify its first register operand.)
R6 can be used as the second or third register operand of any instructions that use two or three registers, but not as
the first operand of instructions that would modify their first register operand’s value.
© 2021 L. Herman; all rights reserved 5
• If the parameter represents a sys instruction and the extension field is anything other than 0 through 4 it is invalid.
The N2 determines which operating system call should be invoked based on the value of the extension field, and
there are only five possible system calls.
Note: the values of fields that are not used by machine instructions have no effect on validity, and it’s immaterial what
they contain. For instance, a not instruction may have anything at all in its rightmost 18 bits (the third register operand
and memory address fields) and in the 3 extension bits. As long as the opcode, register1, and register2 are correct, a not
instruction is valid. But instructions that do use fields, and have incorrect values in them, are invalid.
Also note: in projects where output has to be produced, students sometimes lose credit for minor spelling or formatting
mistakes. The submit server checks your output automatically, consequently even trivial typos would cause tests to fail.
Due to the size of the course we will not be able to give back any credit lost for failed tests due to spelling or formatting
errors, even if they are extremely minor, because that would necessitate manual checking of every output for hundreds of
students’ projects, which is not possible. Therefore it’s essential for you to check carefully that your output is correct,
register names and opcodes are spelled exactly, and that the output format described above is followed.
3.2 unsigned short disassemble(const Wrd program[], unsigned short num_instrs,
unsigned short *const bad_instr)
A disassembler converts machine language to assembly language. This function will do that for a N2 program stored in
its first parameter program, which is an array of words representing N2 instructions. The parameter num_instrs gives the
number of elements of the array (number of array elements, not number of bytes).
The function’s return value is described below. It should not produce any output at all (not even a newline) if either
program or bad_instr are NULL, or if any of the program’s instructions are invalid (see below). If the instructions in
program are all valid the function should print them all, in this format (see some of the public test outputs for examples):
• Preceding each instruction in the array, its starting N2 memory address should be printed in hexadecimal using
as many leading zeros as necessary so the address occupies exactly four places. A N2 program always begins at
memory address 0. A colon and blank space should follow the address. In printing the memory address, remember
that each array element corresponds to a four–byte word of the machine’s memory.
• Each element of the array should be printed exactly how print_instruction() prints instructions, with each one
followed by a newline (including the last one).
If its parameters are valid and all the instructions in program are valid the function should return 1 after printing the
array contents as described above. The function should not produce any output at all and just return 0 if its parameters
are invalid, which would be when:
• The array memory is NULL, or the pointer bad_instr is NULL.
• num_instrs is larger than the number of 4–byte words in the N2 (meaning larger than the maximum number of
4–byte instructions that can be in memory; note there is a symbolic constant with this value in machine.h).
• Any of the instructions in the array program are invalid, according to the criteria described above in Section 3.1.
In this case the function should also store the location (array index) of the first invalid instruction into the variable
that bad_instr points to. (If the instructions are all valid the variable bad_instr points to should not be modified.)
If the function’s third parameter is non–NULL it may assume that it is a valid pointer that points to a Wrd (unsigned
int) variable, in other words, it may assume that it is not an uninitialized or invalid pointer. This is something that the
function cannot check, and is the caller’s responsibility to ensure instead.
3.3 short where_set(const Wrd program[], unsigned short num_words, unsigned short reg_nbr)
As you should know from CMSC 131 and 132, it is a syntax error in Java to use a local variable in a method before giving it
a value. In C, a (nonstatic) local variable just has a garbage value if it is used before being set. It is commonly an indication
of an error to use something in a program before it’s given a value, and this could even apply in assembly programs to
using the values of registers before putting a value in them. Some systems might initialize all registers to zero before a
program starts, but in others, registers may have the equivalent of garbage values if they are used before being set. Since
using anything before its given a value may be a bug, an assembly programmer might be helped if a tool could perform a
similar type of analysis to what the Java compiler does, in detecting whether any register in an assembly program is used
before it is given a value. This function and the next one will analyze an N2 program to do this.
© 2021 L. Herman; all rights reserved 6
This function should return −1 if program is NULL, or if num_words is larger than the number of 4–byte words in the
N2, or if reg_nbr is not a valid register number for the N2, or if register number reg_nbr is not given a value in any of
the first num_words instructions of program (num_words indicates how many instructions of the array should be checked).
Otherwise the function should return the index or position in the array program of the first instruction where register
number reg_nbr is set or given a value. You should know from the explanation in Section 2.4 which instructions modify a
register and which ones don’t. For example, add modifies its first register operand but sw does not. (As mentioned, for the
instructions that do modify a register, it is always the first register operand that is modified.) Notice that the sys instruction
modifies its first register for two possible values of the extension (0 and 2), but not for the other three possible values. This
function will always return −1 if reg_nbr is R6, since the program counter register can never be explicitly modified by any
instructions. (This probably does not have to be handled as a special case.)
3.4 unsigned short is_safe(const Wrd program[], unsigned short pgm_size,
unsigned short *const bad_instr)
This function should return 1 if the program in program, which has pgm_size instructions, is valid. It should return 0 if
program or bad_instr are NULL, or pgm_size is larger than the number of 4–byte words in the N2 or if the program is
not safe. Safe means that none of the instru