
Final Project
ECE/CS 472/572 Final Exam Project
Late submissions will be penalized 15 points per day.
Remember to check the errata section (at the very bottom of the page) for updates.
Your submission should consist of two items: a .pdf file containing your written report and a
.tar file containing a directory structure with your C or C++ source code. Your grade will be reduced if
you do not follow the submission instructions.
All written reports (for both 472 and 572 students) must be composed in MS Word, LaTeX, or some other
word processor and submitted as a PDF file.
Please take the time to read this entire document. If you have questions there is a high likelihood that
another section of the document provides answers.
Introduction
In this final project you will implement a cache simulator. Your simulator will be configurable and will be
able to handle caches with varying capacities, block sizes, levels of associativity, replacement policies,
and write policies. The simulator will operate on trace files that indicate memory access properties. All
input files to your simulator will follow a specific structure so that you can parse the contents and use the
information to set the properties of your simulator.
After execution is finished, your simulator will generate an output file containing information on the
number of cache misses, hits, and miss evictions (i.e. the number of block replacements). In addition,
the file will also record the total number of (simulated) clock cycles used during the simulation. Lastly, the
file will indicate how many read and write operations were requested by the CPU.
It is important to note that your simulator is required to make several significant assumptions for the sake
of simplicity.
1. You do not have to simulate the actual data contents. We simply pretend that we copied data from
main memory and keep track of the hypothetical time that would have elapsed.
2. Accessing a sub-portion of a cache block takes the exact same time as it would require to access the
entire block. Imagine that you are working with a cache that uses a 32 byte block size and has an
access time of 15 clock cycles. Reading a 32 byte block from this cache will require 15 clock cycles.
However, the same amount of time is required to read 1 byte from the cache.
3. In this project assume that main memory RAM is always accessed in units of 8 bytes (i.e. 64 bits at a
time).
When accessing main memory, it's expensive to access the first unit. However, DDR memory
typically includes a prefetch buffer which means that the RAM can provide access to the successive
memory (in 8 byte chunks) with minimal overhead. In this project we assume an overhead of 1
additional clock cycle per contiguous unit.
For example, suppose that it costs 255 clock cycles to access the first unit from main memory. Based
on our assumption, it would only cost 257 clock cycles to access 24 bytes of memory.
4. Assume that all caches utilize a "fetch-on-write" scheme if a miss occurs on a Store operation. This
means that you must always fetch a block (i.e. load it) before you can store to that location (if that
block is not already in the cache).
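The timing assumption in item 3 can be captured in a small helper. This is a hypothetical function (not part of the starter code) that rounds a request up to whole 8-byte units and charges 1 extra cycle per contiguous unit beyond the first; alignment effects are deliberately ignored in this sketch.

```cpp
#include <cstdint>

// Hypothetical helper: main-memory access time under the assumptions above.
// RAM is accessed in 8-byte units; the first unit costs `firstUnitCycles`,
// and each additional contiguous unit costs 1 extra cycle.
uint64_t memoryAccessCycles(uint64_t firstUnitCycles, uint64_t sizeBytes) {
    uint64_t units = (sizeBytes + 7) / 8;   // round up to whole 8-byte units
    if (units == 0) units = 1;              // even a 0-byte request touches one unit
    return firstUnitCycles + (units - 1);   // +1 cycle per extra unit
}
```

With a 255-cycle first-unit cost, `memoryAccessCycles(255, 24)` evaluates to 257, matching the 24-byte example above.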
Additional Resources
Sample trace files
Students are required to simulate the instructor-provided trace files (although you are welcome to
simulate your own files in addition).
Trace files are available on Flip in the following directory:

You should test your code with all three trace files in that directory (gcc, netpath, and openssl).
Starter Code
In order to help you focus on the implementation of the cache simulator, starter code is provided (written
in C++) to parse the input files and handle some of the file I/O involved in this assignment. You are not
required to use the supplied code (it's up to you). Note that you will need to adapt this code to work with
your particular design.
The starter code is available here:

Basic-Mode Usage (472 & 572 students)
L1 Cache Simulator
5/28/2021
All students are expected to implement the L1 cache simulator. Students who are enrolled in 472 can
ignore the sections that are written in brown text. Graduate students will be simulating a multiple-level
cache (an L1 cache, an L2 cache, and even an L3 cache).
Input Information
Your cache simulator will accept two arguments on the command line: the file path of a configuration file
and the file path of a trace file containing a sequence of memory operations. The cache simulator will
generate an output file containing the simulation results. The output filename will have “.out” appended
to the input filename. Additional details are provided below.
# example invocation of cache simulator
cache_sim ./resources/testconfig ./resources/simpletracefile
Output file written to ./resources/simpletracefile.out
The first command line argument will be the path to the configuration file. This file contains information
about the cache design. The file will contain only numeric values, each of which is on a separate line.
Example contents of a configuration file:
1 <-- this line will always contain a "1" for 472 students
230 <-- number of cycles required to write or read a unit from main memory
8 <-- number of sets in cache (will be a non-negative power of 2)
16 <-- block size in bytes (will be a non-negative power of 2)
3 <-- level of associativity (number of blocks per set)
1 <-- replacement policy (will be 0 for random replacement, 1 for LRU)
1 <-- write policy (will be 0 for write-through, 1 for write-back)
13 <-- number of cycles required to read or write a block from the cache (consider this to be the access time per block)
Here is another example configuration file specifying a direct-mapped cache with 64 entries, a 32 byte
block size, associativity level of 1 (direct-mapped), least recently used (LRU) replacement policy, write-
through operation, 26 cycles to read or write data to the cache, and 1402 cycles to read or write data to
the main memory. CS/ECE472 projects can safely ignore the first line.
1
1402
64
32
1
1
0
26
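A minimal sketch of reading such a configuration file follows. The struct and field names are assumptions for illustration, not a required design; the file simply contains one numeric value per line, in the order shown above.

```cpp
#include <fstream>
#include <string>

// Hypothetical config struct mirroring the line order of the file above.
struct CacheConfig {
    long numCaches = 0;          // always 1 for the 472 version
    long memoryCycles = 0;       // main-memory access time
    long numSets = 0;
    long blockSize = 0;          // in bytes
    long associativity = 0;      // blocks per set
    long replacementPolicy = 0;  // 0 = random, 1 = LRU
    long writePolicy = 0;        // 0 = write-through, 1 = write-back
    long cacheCycles = 0;        // cache access time
};

// Reads one value per line; returns false on I/O or parse failure.
bool readConfig(const std::string& path, CacheConfig& cfg) {
    std::ifstream in(path);
    if (!in) return false;
    return static_cast<bool>(in >> cfg.numCaches >> cfg.memoryCycles
                                >> cfg.numSets >> cfg.blockSize
                                >> cfg.associativity >> cfg.replacementPolicy
                                >> cfg.writePolicy >> cfg.cacheCycles);
}
```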
The second command line argument indicates the path to a trace file. This trace file will follow the format
used by Valgrind (a memory debugging tool). The file consists of comments and memory access
information. Any line beginning with the ‘=’ character should be treated as a comment and ignored.
==This is a comment and can safely be ignored.
==An example snippet of a Valgrind trace file
I 04010173,3
I 04010176,6
S 04222cac,1
I 0401017c,7
L 04222caf,8
I 04010186,6
I 040101fd,7
L 1ffefffd78,8
M 04222ca8,4
I 04010204,4
Memory access entries will use the following format in the trace file:
[space]operation address,size
Lines beginning with an ‘I’ character represent an instruction load. For this assignment, you can
ignore instruction read requests and assume that they are handled by a separate instruction cache.
Lines with a space followed by an ‘S’ indicate a data store operation. This means that data needs to
be written from the CPU into the cache or main memory (possibly both) depending on the write
policy.
Lines with a space followed by an ‘L’ indicate a data load operation. Data is loaded from the cache
into the CPU.
Lines with a space followed by an ‘M’ indicate a data modify operation (which implies a special case
of a data load, followed immediately by a data store).
The address is a 64 bit hexadecimal number representing the address of the first byte that is being
requested. Note that leading 0's are not necessarily shown in the file. The size of the memory operation
is indicated in bytes (as a decimal number).
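One possible way to parse a trace line in C++ is sketched below. The function name and approach are illustrative only; it rejects comment lines ('=') and instruction loads ('I'), which the simulator ignores, and extracts the operation, hex address, and decimal size from data lines.

```cpp
#include <cstdio>
#include <string>

// Hypothetical parser for one Valgrind-style trace line.
// Returns true only for data lines (S, L, M), which begin with a space.
bool parseTraceLine(const std::string& line, char& op,
                    unsigned long long& addr, unsigned long long& size) {
    if (line.empty() || line[0] == '=' || line[0] == 'I')
        return false;  // comment or instruction load: ignore
    // Data lines look like " S 04222cac,1": op, hex address, decimal size
    return std::sscanf(line.c_str(), " %c %llx,%llu", &op, &addr, &size) == 3;
}
```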
If you are curious about the trace file, you may generate your own trace file by running Valgrind on
arbitrary executable files:
valgrind --log-fd=1 --log-file=./tracefile.txt --tool=lackey --trace-mem=yes name_of_executable_to_trace
Cache Simulator Output
Your simulator will write output to a text file. The output filename will be derived from the trace filename
with “.out” appended to the original filename. E.g. if your program was called using the invocation
“cache_sim ./dm_config ./memtrace” then the output file would be written to “./memtrace.out”
(S)tore, (L)oad, and (M)odify operations will each be printed to the output file (in the exact order that they
were read from the Valgrind trace file). Lines beginning with “I” should not appear in the output since they
do not affect the operation of your simulator.
Each line will have a copy of the original trace file instruction. There will then be a space, followed by the
number of cycles used to complete the operation. Lastly, each line will have one or more statements
indicating the impact on the cache. This could be one or more of the following: miss, hit, or eviction.
Note that an eviction is what happens when a cache block needs to be removed in order to make space
in the cache for another block. It is simply a way of indicating that a block was replaced. In our
simulation, an eviction means that the next instruction cannot be executed until after the existing cache
block is written to main memory. An eviction is an expensive cache operation.
It is possible that a single memory access has multiple impacts on the cache. For example, if a particular
cache index is already full, a (M)odify operation might miss the cache, evict an existing block, and then
hit the cache when the result is written to the cache.
The general format of each output line (for 472 students) is as follows (and will contain one or more
cache impacts):
operation address,size L1 <...>
The final lines of the output file are special. They will indicate the total number of hits, misses, and
evictions. The last line will indicate the total number of simulated cycles that were necessary to simulate
the trace file, as well as the total number of read and write operations that were directly requested by the
CPU.
These lines should exactly match the following format (with values given in decimal):
L1 Cache: Hits:<# of hits> Misses:<# of misses> Evictions:<# of evictions>
Cycles:<# of total simulated cycles> Reads:<# of CPU read requests> Writes:<# of CPU write requests>
In order to illustrate the output file format let’s look at an example. Suppose we are simulating a direct-
mapped cache operating in write-through mode. Note that the replacement policy does not have any
effect on the operation of a direct-mapped cache. Assume that the configuration file told us that it takes
13 cycles to access the cache and 230 cycles to access main memory. Keep in mind that a hit during a
load operation only accesses the cache while a miss must access both the cache and the main
memory. For this scenario, assume that memory access is aligned to a single block and does not
straddle multiple cache blocks.
In this example the cache is operating in write-through mode so a standalone (S)tore operation takes
243 cycles, even if it is a hit, because we always write the block into both the cache and into main
memory. If this particular cache was operating in write-back mode, a (S)tore operation would take only
13 cycles if it was a hit (since the block would not be written into main memory until it was evicted).
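The cycle arithmetic in this paragraph can be written out as a small helper (a hypothetical name, shown only to make the two policies concrete):

```cpp
#include <cstdint>

// Cost of a store that HITS the cache, under each write policy:
// write-through always pays cache + memory; write-back pays only the
// cache access (the memory write is deferred until eviction).
uint64_t storeHitCycles(bool writeBack, uint64_t cacheCycles, uint64_t memoryCycles) {
    return writeBack ? cacheCycles : cacheCycles + memoryCycles;
}
```

With the 13-cycle cache and 230-cycle memory from the example, a write-through store hit costs 243 cycles and a write-back store hit costs 13.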
The exact details of whether an access is a hit or a miss are entirely dependent on the specific cache
design (block size, level of associativity, number of sets, etc.). Your program will implement the code to
determine whether each access is a hit, miss, eviction, or some combination.
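One common way to perform that check is to split each address into tag, set index, and block offset fields. The sketch below assumes power-of-two block sizes and set counts (which the configuration file guarantees); the struct and function names are illustrative.

```cpp
#include <cstdint>

struct AddressParts { uint64_t tag, index, offset; };

// Split a 64-bit address into tag / set index / block offset,
// assuming blockSize and numSets are powers of two.
AddressParts splitAddress(uint64_t addr, uint64_t blockSize, uint64_t numSets) {
    uint64_t offsetBits = 0, indexBits = 0;
    while ((1ull << offsetBits) < blockSize) ++offsetBits;  // log2(blockSize)
    while ((1ull << indexBits) < numSets) ++indexBits;      // log2(numSets)
    AddressParts p;
    p.offset = addr & (blockSize - 1);
    p.index  = (addr >> offsetBits) & (numSets - 1);
    p.tag    = addr >> (offsetBits + indexBits);
    return p;
}
```

For instance, with 16-byte blocks and 8 sets, address 0x04222cac splits into offset 0xc and index 2, and 0x04222ca8 lands in the same block (same tag, same index).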
Since the (M)odify operation involves a Load operation (immediately followed by a Store operation), it is
recorded twice in the output file. The first instance represents the load operation and the next line will
represent the store operation. See the example below...
==For this example we assume that addresses 04222cac, 04222caf, and 04222ca8 are all in the same block at index 2
==Assume that addresses 047ef249 and 047ef24d share a block that also falls at index 2.
==Since the cache is direct-mapped, only one of those blocks can be in the cache at a time.
==Fortunately, address 1ffefffd78 happens to fall in a different block index (in our hypothetical example).
==Side note: For this example a store takes 243 cycles (even if it was a hit) because of the write-through behavior.
==The output file for our hypothetical example:
S 04222cac,1 486 L1 miss <-- (243 cycles to fetch the block and write it to L1) + (243 cycles to update the cache & main memory)
L 04222caf,8 13 L1 hit
M 1ffefffd78,8 243 L1 miss <-- notice that this (M)odify has a miss for the load and a hit for the store
M 1ffefffd78,8 243 L1 hit <-- this line represents the Store of the modify operation
M 04222ca8,4 13 L1 hit <-- notice that this (M)odify has two hits (one for the load, one for the store)
M 04222ca8,4 243 L1 hit <-- this line represents the Store of the modify operation
S 047ef249,4 486 L1 miss eviction <-- 486 cycles for miss, no eviction penalty for write-through cache
L 04222caf,8 243 L1 miss eviction
M 047ef24d,2 243 L1 miss eviction <-- notice that this (M)odify initially misses, evicts the block, and then hits
M 047ef24d,2 243 L1 hit <-- this line represents the Store of the modify operation
L 1ffefffd78,8 13 L1 hit
M 047ef249,4 13 L1 hit
M 047ef249,4 243 L1 hit
L1 Cache: Hits:8 Misses:5 Evictions:3
Cycles:2725 Reads:7 Writes:6 <-- total sum of simulated cycles (from above), as well as the number of reads and writes
NOTE: The example above assumes that the cache has a block size of at least 8 bytes. Simulating a
cache with a smaller block size would affect the timing and could also lead to multiple evictions in a single
cache access. For example, if the block size was only 4 bytes, it's possible that an 8 byte load might evict
3 different blocks. This happens because the data might straddle two or more blocks (depending on the
starting memory address).
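That straddling behavior can be computed directly. The helper below is a hypothetical sketch: it counts the blocks from the one containing the first byte through the one containing the last byte.

```cpp
#include <cstdint>

// How many cache blocks does an access of `sizeBytes` starting at `addr` touch?
uint64_t blocksTouched(uint64_t addr, uint64_t sizeBytes, uint64_t blockSize) {
    if (sizeBytes == 0) return 0;
    uint64_t first = addr / blockSize;                  // block of the first byte
    uint64_t last  = (addr + sizeBytes - 1) / blockSize; // block of the last byte
    return last - first + 1;
}
```

For example, with a 4-byte block size, an 8-byte load starting 1 byte into a block touches 3 blocks, matching the note above.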
Implementation Details
You may use either the C or the C++ programming language. Graduate students will have an additional
component to this project.
In our simplified simulator, increasing the level of associativity has no impact on the cache access time.
Furthermore, you may assume that it does not take any additional clock cycles to access non-data bits
such as Valid bits, Tags, Dirty Bits, LRU counters, etc.
Your code must support the LRU replacement scheme and the random replacement scheme. For the
LRU behavior, a block is considered to be the Least Recently Used if every other block in the cache has
been read or written after the block in question. In other words, your simulator must implement a true
LRU scheme, not an approximation.
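One way to implement true LRU is to stamp every block with a monotonically increasing access counter on each read or write, then evict the block with the oldest stamp. The structure below is an assumption for illustration, not a required design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One way within a set: a timestamp per block, updated on every access.
struct Block { bool valid = false; uint64_t tag = 0; uint64_t lastUsed = 0; };

// Choose a victim way for replacement: prefer an empty way; otherwise the
// block whose lastUsed stamp is oldest (i.e. every other block in the set
// was touched after it -- true LRU, not an approximation).
size_t lruVictim(const std::vector<Block>& set) {
    size_t victim = 0;
    for (size_t i = 0; i < set.size(); ++i) {
        if (!set[i].valid) return i;
        if (set[i].lastUsed < set[victim].lastUsed) victim = i;
    }
    return victim;
}
```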
You must implement the write-through cache mode. You will receive extra credit if your code correctly
supports the write-back cache mode (specified in the configuration file).
Acceptable Compiler Versions
The flip server provides GCC 4.8.5 for compiling your work. Unfortunately, this version is from 2015 and
may not support newer C and C++ features. If you invoke "gcc" (or "g++") without a full path, this is the
version you will use by default.
If you wish to use a newer compiler version, I have compiled a copy of GCC 11.1 (released April 27,
2021). You may write your code using this compiler and you’re allowed to use any of the compiler
features that are available. The compiler binaries are available in the path:
/nfs/farm/classes/eecs/spring2021/cs472/public/gcc/bin
For example, in order to compile a C++ program with GCC 11.1, you could use the following command
(on a single terminal line):
/nfs/farm/classes/eecs/spring2021/cs472/public/gcc/bin/g++ -ocache_sim -Wl,-rpath,/nfs/farm/classes/eecs/spring2021/cs472/public/gcc/lib64 my_source_code.cpp
If you use the Makefile that is provided in the starter code, it is already configured to use GCC 11.1.
L2/L3 Cache Implementation (required for CS/ECE 572 students)
Implement your cache simulator so that it can support up to 3 layers of cache. You can imagine that
these caches are connected in a sequence. The CPU will first request information from the L1 cache. If
the data is not available, the request will be forwarded to the L2 cache. If the L2 cache cannot fulfill the
request, it will be passed to the L3 cache. If the L3 cache cannot fulfill the request, it will be fulfilled by
main memory.
It is important that the properties of each cache are read from the provided configuration file. As an
example, it is possible to have a direct-mapped L1 cache that operates in concert with an associative L2
cache. All of these details will be read from the configuration file. As with any programming project, you
should be sure to test your code across a wide variety of scenarios to minimize the probability of an
undiscovered bug.
Cache Operation
When multiple layers of cache are implemented, the L1 cache will no longer directly access main
memory. Instead, the L1 cache will interact with the L2 cache. During the design process, you need to
consider the various interactions that can occur. For example, if you are working with three write-through
caches, then a single write request from the CPU will update the contents of L1, L2, L3, and main
memory!
++++++++++++ ++++++++++++ ++++++++++++ ++++++++++++ +++++++++++++++
| | | | | | | | | |
| CPU | <----> | L1 Cache | <----> | L2 Cache | <----> | L3 Cache | <----> | Main Memory |
| | | | | | | | | |
++++++++++++ ++++++++++++ ++++++++++++ ++++++++++++ +++++++++++++++
Note that your program should still handle a configuration file that specifies an L1 cache (without any L2
or L3 present). In other words, you can think of your project as a more advanced version of the 472
implementation.
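One way to structure the multi-level lookup is a simple cascade over the configured levels. The sketch below is a design assumption, not the required implementation: it charges every checked level its access time, and falls through to main memory when all levels miss.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Level { uint64_t accessCycles; /* plus sets, blocks, policies... */ };

// Simulated cycle cost of a read that hits at `hitLevel` (0-based).
// hitLevel >= levels.size() means the request fell through to main memory.
uint64_t readCost(const std::vector<Level>& levels, size_t hitLevel,
                  uint64_t memoryCycles) {
    uint64_t cycles = 0;
    for (size_t i = 0; i <= hitLevel && i < levels.size(); ++i)
        cycles += levels[i].accessCycles;  // each level checked costs its access time
    if (hitLevel >= levels.size())
        cycles += memoryCycles;            // missed everywhere: go to RAM
    return cycles;
}
```

For example, with the 13/40/110-cycle caches and 230-cycle memory from the sample configuration below, an L1 hit costs 13 cycles, an L2 hit 53, and a full miss 393.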
572 Extra Credit
By default, your code is only expected to function with write-through caches. If you want to earn extra
credit, also implement support for write-back caches.
In this situation, you will need to track dirty cache blocks and properly handle the consequences of
evictions. You will earn extra credit if your write-back design works with simple L1 implementations. You
will receive additional extra credit if your code correctly handles multiple layers of write-back caches (e.g.
the L1 and L2 caches are write-back, but L3 is write-through).
Simulator Operation
Your cache simulator will use a similar implementation as the single-level version but will parse the
configuration file to determine if multiple caches are present.
Input Information
The input configuration file is as shown below. Note that it is backwards compatible with the 472 format.
The exact length of the input configuration file will depend on the number of caches that are specified.
3 <-- this line indicates the number of caches in the simulation (this can be set to a maximum of 3)
230 <-- number of cycles required to write or read a block from main memory
8 <-- number of sets in L1 cache (will be a non-negative power of 2)
16 <-- L1 block size in bytes (will be a non-negative power of 2)
4 <-- L1 level of associativity (number of blocks per set)
1 <-- L1 replacement policy (will be 0 for random replacement, 1 for LRU)
1 <-- L1 write policy (will be 0 for write-through, 1 for write-back)
13 <-- number of cycles required to read or write a block from the L1 cache (consider this to be the access time)
8 <-- number of sets in L2 cache (will be a non-negative power of 2)
32 <-- L2 block size in bytes (will be a non-negative power of 2)
4 <-- L2 level of associativity (number of blocks per set)
1 <-- L2 replacement policy (will be 0 for random replacement, 1 for LRU)
1 <-- L2 write policy (will be 0 for write-through, 1 for write-back)
40 <-- number of cycles required to read or write a block from the L2 cache (consider this to be the access time)
64 <-- number of sets in L3 cache (will be a non-negative power of 2)
32 <-- L3 block size in bytes (will be a non-negative power of 2)
8 <-- L3 level of associativity (number of blocks per set)
0 <-- L3 replacement policy (will be 0 for random replacement, 1 for LRU)
0 <-- L3 write policy (will be 0 for write-through, 1 for write-back)
110 <-- number of cycles required to read or write a block from the L3 cache (consider this to be the access time)
Cache Simulator Output
The output file will contain nearly the same information as in the single-level version (see the general
description provided in the black text). However, the format is expanded to contain information about
each level of the cache.
The general format of each output line is as follows (and can list up to 2 cache impacts for each level of
the cache):
operation address,size L1 <...> L2 <...> L3 <...>