讲解CSE1400 编程、C/C++讲解、调试C/C++程序

CSE1400 Lab Course Manual
Delft University of Technology
Edition 2018-20191
1Revision 20180912T1637
Contents
Acknowledgements
The CSE1400 lab course assignments were originally developed by Sidney Cadot for the PowerPC
architecture. The accompanying documents, two versions of the assignments and a tutorial on
PowerPC assembler, were also written by Sidney. Jonne Zutt maintained these documents over
the years after Sidney left the university. In 2004, the decision was made to change the target
architecture of the lab course from the somewhat obscure PowerPC to the ubiquitous Intel x86
platform. The lab course environment had to change accordingly: the carefully tweaked commer-
cial IDE/PowerPC emulator running on Microsoft Windows was abandoned in favour of the GNU
assembler and debugger, which are available on nearly every Linux distribution. Because of these
changes, the assignments and the accompanying reference material were rewritten by Denis de
Leeuw Duarte.
In 2005 a new curriculum has been started, which transformed the old lab into a more elaborate
project for Software Technology students, while Media and Knowledge Technology students only
had to do a trimmed down version. The manual was modi ed to match the new requirements by
Bas van der Doorn and Sander Koning.
In 2011 and 2012 the manual was further updated by Mihai Capot a and Alexandru Iosup and
in 2013 and 2014 the the manual received a refresh and expansion by Otto Visser. In 2014 the
decision was made to switch from the x86 to the x86-64 architecture, and the manual was edited by
Elvan Kula to re ect this. Finally, Maarten Sijm polished the manual in 2018, removing multiple
inconsistencies that had survived the previous rounds of editing.
3
Chapter 1
Introduction
1.1 Getting started
Given that this course is given in the rst quarter of the rst year, it is quite likely that you
have zero programming experience. There is however also a reasonably large group amongst you
that seems to have some programming experience. For those, a computer was usually \sketched"
as a machine that executes program statements in a line-by-line fashion, manipulating variables
and performing calculations as it goes along. This simple model allowed you to write your rst
computer programs in Java, Python, or some other language. Of course, the underlying mechanics
of executing programs are a bit more complicated. It is quite likely that you have already en-
countered the limitations of this naive model in the form. of incomprehensible error messages and
seemingly inexplicable program failures. Apparently, it is necessary to acquire a more thoroughly
detailed mental picture of the machine in order to solve many common programming problems.
The goal of this lab course is to ll in these details. Much of this knowledge is to be acquired
through practical assignments, but careful reading is also an important part of the educational
process.
You should read the remainder of this introductory section before starting on the rst assign-
ment in Section 2.1. Take special notice of Section 1.3, which summarises and refreshes the prior
knowledge that we assume on your part. The assignment texts will frequently refer to the reference
(found in Chapter 3), which contains essential information for completing the lab course.
1.2 Lab course rules and etiquette
Teaching assistants will be present during lab course hours to o er you advice and the possibility
to have your work reviewed. Please keep in mind that they are not there to deal with faulty lab
course equipment, problems with your laptop and software, problems with your login account,
disputes about deadlines and lab course rules, possible special exceptions (e.g., due to illness on
your part) or problems with lab course enrolment. We kindly ask you to bypass the teaching
assistants in such matters and to go straight to the proper authorities (see table below).
Lab course equipment: Service point
Login accounts: Service point
Deadlines, rules, exceptions: CSE1400 lab course coordinator (Otto Visser)
Enrolment: Director of education
Keep in mind that the work you hand in during the lab is subject to certain rules and quality
guidelines. The rules and guidelines are listed explicitly for this lab course in Appendix A. Your
are expected to check your work prior to handing it in for review. The lab course assistants will
only consider work that is in full compliance with the stated rules and guidelines.
4
1.2.1 Verifying your work
In order for you to get the credit that you passed a lab exercise, you need to have your work
checked by a teaching assistant. For this, you will have to submit your work to CPM1 and then
enqueue to the digital queue. One of the assistants will then come and visit you and give you the
oppurtunity to explain your work.
1.3 Assumed prior knowledge
This lab course is by no means an introductory course in computer programming. In this subsection
we will brie y describe the level of experience that we assume on your part. More importantly, we
will refresh and summarise the knowledge that you must have before you can start the assignments.
Background knowledge
We assume that you have followed and understood the lectures and that you have studied the
accompanying book and lecture notes in the necessary detail. In particular, we assume that
you know and understand what opcodes, instructions, subroutines, stacks, registers and program
counters are and that you have a general idea of what occurs during the compilation and linking
stage of an executable program. We further assume that you know the di erence between bits,
nibbles and bytes and that you can convert numbers between di erent number representation
systems (hexadecimal, binary, etc.) and we assume that you understand the concept of endianness.
These topics have all been treated during the lectures and instructions.
In this lab course you will learn how to program in the x86-64 assembly language. You should
know that \the x86-64 architecture" is the generic name for the architecture of CPUs found in
garden-variety personal computers2. We assume that you have studied the x86-64 architecture in
your lecture notes. You should know that x86-64 has six 64-bit general purpose registers (RAX,
RBX, RCX, RDX, RDI and RSI) which you can use freely when writing programs. You should know
that the x86-64 has a 64-bit stack pointer register (RSP) which contains the memory address of the
top of the current program stack and a base pointer register (RBP), which is used during subroutine
execution. The purpose of these registers will be explained to you during the assignments. There
is a second set of eight general purpose 64-bit registers named R8-R15.
Finally, you should know what we mean by the Von Neumann architecture, i.e., you should
know that a computer is roughly comprised of three subsystems: a CPU, a random access main
memory and an IO subsystem which is in turn comprised of IO devices such as mice, keyboards,
sound cards, hard disks, etc. You should know that the CPU is capable of executing instructions
which reside in the main memory and that instructions are simply binary codes of varying lengths.
In the next paragraph we refresh your knowledge of some important de nitions regarding computer
languages.
Essential concepts
It is likely that you do not yet have a very clear image in your mind as to what goes on inside
the bowels of your computer when it is executing a program. The main goal of this lab course
is to remedy this situation. To start o , we will eliminate any romantic preconceptions on your
part regarding assembly languages, machine language and high-level programming languages by
carefully restating their de nitions and their respective purposes. Much of this should already be
known to you, but it is essential that you read the following carefully.
Computer memory stores programs as sequences of instructions and data in binary format. To
a computer, there is no essential di erence between instructions and data. Here is a real example
of part of a computer program, as it would look in the main memory of a computer:
1https://cpm.ewi.tudelft.nl
2In the literature you may encounter the terms AMD64 and x64, which are synonyms for x86-64.
5
0 01001000
1 11000111
2 11000000
3 00000001
4 00000000
5 00000000
6 00000000
7 01001000
8 11000111
9 11000001
10 00000001
11 00000000
12 00000000
13 00000000
14 01001000
15 00000001
16 11000001
The line numbers are not part of the program but you could think of them as memory addresses,
as each byte has its own memory address. This type of \zeros and ones" program representation
is called the machine language representation of a program. The machine language representation
is of course the only representation that a computer can understand. Any higher level language
representation of a program (such as a Java or C program) needs to be translated into machine
language at some point, before it can be executed by the hardware. Machine languages are speci c
to the CPU architecture, e.g., there is a PowerPC machine language, an x86-64 machine language,
etc.
The little code snippet above is actual x86-64 machine code written in binary notation. As you
can see, programs take up a large amount of space when written in binary, so we will commonly
write such code in hexadecimal format. As you know, each nibble is represented by one hex digit
so we can rewrite these 17 bytes of code in 34 hex digits:
48 c7 c0 01 00 00 00 # move the number 1 into the RAX register
48 c7 c1 01 00 00 00 # move the number 1 into the RCX register
48 01 c1 # add the contents of RAX to RCX
We have regrouped the bytes in such a way that one instruction ts on one line and we have
added some explanatory comments, which are denoted by the ‘#’ character. Do not be afraid at
this point if you do not understand the code fragment completely, but please do take a close look.
In this piece of code you can see three x86-64 instructions with some operand data. The rst
instruction is the so called mov instruction. It moves a value to a register. In this case, the value
is the number 1 and the register is the x86-64’s RAX register. On the second line you see another
mov instruction, again with operand 1, but this time it moves the value to the RCX register. The
third line shows us the add instruction which sums the contents of RAX and RCX and stores the
result in RCX. As you can see, not all of the instructions in the x86-64 architecture are of the
same length. Usually the actual instruction is one to two bytes long. The mov instruction is only
two bytes long (48 c7) and the add instruction is three bytes long (48 01 c1). The operand data
of the mov instructions is four bytes long. Since x86-64 is a little-endian machine, the four byte
integer 1 is encoded as 01 00 00 00 as you can see. One nal thing to note is that the operand
registers are encoded as c0 and c1.
In ancient times3, programmers had to enter programs into the computer’s memory by means
of punch cards - small pieces of cardboard which contained zeros and ones encoded in the presence
or absence of holes in the cardboard. Obviously, writing computer programs in zeros and ones or
hexadecimal numbers like this was a very cumbersome, error-prone task and in modern times it
3In computing terms, \ancient" is roughly between 1920 and 1950.
6
would be next to impossible. Modern computer programs easily contain around 10 MB of machine
code, which would amount to 83 886 080 zeros and ones or 20 971 520 hex digits. You could imagine
the horror of having to type them by hand. Worse, you could imagine the horror of nding bugs
in such programs! For these reasons, people switched to assemblers in the 1950s. Assemblers are
special computer programs that translate text from a more humanly readable symbolic assembly
language (\assembly" for short) into machine code. In the assembly language, each instruction
code has a short mnemonic or nickname associated with it and each number can be represented
in decimal or hex, instead of in bits. Just as each architecture has its own machine code, each
machine code has its own assembly language. Below, we see the same program snippet as above,
but this time it is written in x86-64 assembly language:
movq $1, %rax # Move the number 1 into the RAX register
movq $1, %rcx # Move the number 1 into the RCX register
addq %rax, %rcx # Add the contents of RAX to RCX
As you can see, our cryptic piece of binary machine code is as simple as 1 + 1! Well, almost. The
most signi cant property of assembly language is that it closely resembles the raw machine code in
structure. If you have a table of opcodes and some knowledge of the elds in an x86-64 instruction
you could easily translate this assembly code directly into machine code - even by hand.
Writing large programs in assembly language still has its drawbacks unfortunately. As pro-
grammers, we still have to deal with millions upon millions of instructions and we still have to
concern ourselves with registers, stacks and memory locations, while we would rather like to focus
on solving our problems. If we want to add two numbers, we would like to tell the computer
something like y = a + b rather than having to explain the operation in register-level detail. In
short, we would like to program computers in a language that translates easily to mathematics or
English rather than machine language. This desire prompted the development of so-called 3GLs4.
You have probably heard of a lot of these 3GLs and in due time you will learn many of them:
C, C++, Java, Pascal, Prolog, Haskell, etc. In order to run a program that is written in a 3GL,
we need to translate it to assembly language rst5. The tools that do this are called compilers.
Unlike assemblers, compilers are amongst the most complex of computer programs in existence.
Compiler technology continues to evolve as it has done for over sixty years since admiral Grace
Hopper wrote the rst compiler in assembly language6. It is often di cult to predict exactly what
instructions a compiler will generate when given a particular snippet of 3GL code. Nonetheless,
below is a line of C/Java code that might cause a compiler to spit out something that looks like
our example fragment:
int x = 1 + 1;
And there it is! As an aside, it is in fact highly unlikely that a compiler will ever generate our
little snippet of assembly code due to optimisations like constant folding. Luckily for you, that is
entirely outside the scope of this lab course.
1.4 Schedule
During the lab course you will learn the basics of writing programs in assembly language. You
will learn how to call functions and what recursion looks like.
Assignment 0 + study of paragraph 3.1.1 (at home)
Learn programming environment and do assignment 1 (4 hours)
4This stands for \third generation language", with machine language being the rst- and assembly being the
second generation.
5We could also use an interpreter, but we ignore that option here.
6Hopper is also renowned for the discovery of the rst \bug": a dead moth in one of the relay switches of the
Mark II calculator.
7
Assignment 2 (4 hours)
Assignment 3 (8 hours)
Assignment 4 (optional) (6 hours)
Assignment 5 (optional) (8 hours)
Assignment 6 (optional) (at least 8 hours)
Assignment 7 (optional) (at least 8 hours)
You will need your time for this lab course, it is not easy. Many students underestimate the
lab, do not start when they should and nd out that they cannot complete it anymore too late.
Do not let this happen to you and make sure you visit every session, so you can talk about your
questions to the assistants.
1.5 Why assembly (still) matters
To round up this introductory chapter, it is time to answer an important and often asked question.
Why is it necessary that you learn how to program in assembly? Is Java or C not much more
convenient? There are two answers to this question. First and foremost, not all programs can be
written in high level languages like C or Java. Contrary to popular lore, new compilers, Virtual
Machines and operating system kernels are not passed down to us from the heavens. Instead, and
with an equal sense of drama, they have to be forged by the hands of engineers of esh and blood
through long toil and serious hardship. Engineers like you. If you, the computer scientists of the
future, do not know how to program in assembly language, who will? And who will port our
kernels to the latest 128-bit CPU or develop the next generation of embedded cellphone software
or the driver for your new video card? In a few years, people will be looking at you to perform
such feats, and you better be prepared.
There is a second, perhaps even more important reason for you to study assembler. In the
words of Donald E. Knuth, one of the most respected minds in our eld:
Expressing basic methods like algorithms for sorting and searching in machine language
makes it possible to carry out meaningful studies of the e ects of cache and RAM size
and other hardware characteristics (memory speed, pipelining, multiple issue, look
aside bu ers, the size of cache blocks, etc.) when comparing di erent schemes.
The point Knuth makes here is that you cannot ever expect to develop proper computer programs
if you do not have a basic understanding of how computers work on the lowest level and of how
programs are represented there. A point that Knuth even fails to mention is that we live in an on-
line world today and that malicious attackers use their knowledge of assembly language to exploit
the programs that you may one day write. Thus, learning something about assembly language is
a lesson that will be of essential value to you whether you aspire to be a kernel hacker, a systems
analyst, a game programmer, a web developer or a theoretical computer scientist. In fact, here is
another priceless quote from Knuth that says it all:
People who are more than casually interested in computers should have at least some
idea of what the underlying hardware is like. Otherwise the programs they write will
be pretty weird.
You should now be mentally prepared to start the assignments. Good luck!
8
Chapter 2
Assignments
Please note: all the assignments marked \extra" will only provide points i 1 you have assignment
1{3 checked (but you do not need to do Assignment 6 before you can do 7; no worries)! It is also
worth noting that you can not get points for doing more than one of the Assignments 4 a{c, you
have to choose one of them.
2.1 Assignment 0: a detailed example
In this assignment you will study the development process and the implementation of a simple,
non-trivial example program. In the subsequent exercises you can borrow ideas from this example
for your own programs, but for now, the most important thing is that you will learn:
what an assembly program looks like
how the basic programming constructs work in assembly (if/else, while, for, etc.)
how to transform. an idea into a good speci cation
how to transform. a speci cation into an assembly program
Since this is an introductory exercise, we will start with a very simple example problem. We are
going to develop a program that prints all prime numbers below 1000. The algorithm that we will
use to solve this problem was developed by an old scholar who went by the name of Eratosthenes
of Cyrene. Eratosthenes lived in northern Africa from 276 b.c. to 194 b.c. and he devised what is
probably the oldest known algorithm for nding prime numbers: the Sieve of Eratosthenes. The
algorithm is not terribly e cient but it is very easy to understand, which is exactly why we use it
here.
We will start by describing the Sieve algorithm in plain English. We will then write the
algorithm down more formally in pseudocode. Finally, we will translate the pseudocode into
working assembly code and we will discuss that code in line by line detail.
Step 1: description of the algorithm
The Sieve algorithm is quite simple. We start by constructing a list of all numbers between 2 and
some upper limit, which in our case is the number 1000:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ... 997 998 999 1000
We then take the rst element of the list (the number 2, which is a prime!) and remove all
multiples of this number from the list, except for 2 itself:
1no, this is not a spelling error; see http://en.wikipedia.org/wiki/If_and_only_if
9
2 3 . 5 . 7 . 9 . 11 . 13 . 15 . 17 . 19 . 21 ... 997 . 999 .
We then continue with the next element after 2 which is still in the list. This number is 3, again
a prime number, and we remove all multiples of 3 except 3 itself:
2 3 . 5 . 7 ... 11 . 13 ... 17 . 19 ... 997 ...
We continue this process with the next number in the list (5, again a prime), etc. until we reach
the end of the list. You should be able to convince yourself that the remaining numbers are all
the prime numbers below 1000.
Step 2: speci cation of the algorithm
Now that we are familiar with the basics of Eratosthenes’s Sieve, we will transform. the algorithm
into a formal speci cation. Why do we have to do this? Well, because there are many small,
practical details that we need to consider before we can actually implement the algorithm. One
such problem is the problem of representation: how will we represent the list of numbers in our
program? Will we use a linked list2 structure containing the numbers or is it better to store
the numbers in a list of xed size? What implications will it have for the complexity of our
program if we choose one representation over another? Resolving such questions is part of the
creative challenge of programming, so usually you will have to decide on these matters for yourself.
However, we will always expect you to formalise these decisions in the form. of a good speci cation,
before you start programming. As an example of what we consider to be a good speci cation, we
present the speci cation of our sieve program below.
Instead of maintaining a list of remaining numbers, we will maintain an array of Boolean values
to denote which numbers are still present and which have been crossed out. All entries in this
array are initialised to true, since all numbers are initially present. We remove a number from
the list by setting its corresponding table entry to false. Here is the pseudocode that describes