首页 > > 详细

Dynamic Branch Prediction Computer Architecture and Design - Practical 2

 Dynamic Branch Prediction

Computer Architecture and Design - Practical 2
Due: 23 November 2023 at 12:00
This handout describes the second practical assignment of the Computer Architecture
and Design course, which represents 50% of the practical component of the course. It
consists of a programming exercise culminating in a brief written report. Assessment of
this practical will be based on the correctness of the code and the clarity of the report
as explained below. The practical is to be solved individually. Please bear in mind
the School of Informatics guidelines on Academic Misconduct. You must submit your
solutions before the due date shown above. Follow the instructions provided in Section 3
for submission details.
In this practical, you are required to explore Branch Prediction techniques using the
Intel Pin simulation tool. You are strongly advised to commence working on the simulator
as soon as possible.
1 Overview
A branch predictor anticipates the outcome of a branch, before it is executed, to
improve the instruction flow in the pipeline. In this practical you will investigate the
accuracy of various branch predictors. Your assignment is to implement and evaluate
three different branch predictors, which are:
1. Local branch predictor
2. Gshare global branch predictor
3. Tournament branch predictor that combines 1 and 2
1.1 The Pin Tool
Pin (Intel [2021]) is a dynamic binary instrumentation framework that enables the
creation of tools for analysing the dynamic behaviour of programs running natively on
an Intel x86 machine. In the context of this assignment we use Pin to create tools
for simulating branch predictors. Pin operates by instrumenting a program binary to
intercept program execution at certain points and make call-backs into the simulation
tool. Pin provides an API that allows you (the tool writer) to run high-level C++
functions in response to annotated runtime events, such as loads, stores, and branch
instructions. A Pin tool is a set of call-back functions that use the Pin API and are
1
compiled into a dynamic shared library that is linked with Pin when it runs. The tool
specifies what events are of interest to it, and specifies what tool function gets called in
response to each of those events. In this practical, you are given an example Pin tool
that intercepts all conditional branch instructions and provides a call-back function to
simulate the behaviour of an AlwaysTaken branch predictor. When run with a benchmark
program, each conditional branch instruction encountered during program execution will
trigger a call to the branch predictor’s call-back function, allowing it to simulate a branch
predictor processing the entire sequence of conditional branch instructions executed by
the benchmark. Your task is to extend this tool to simulate the three other branch
predictors listed above, and then to use the tool to evaluate those branch predictors on
three standard benchmark programs.
The directions below apply to DICE machines. For marking purposes, the
testing of your code will be performed on a DICE machine. Therefore you
are strongly encouraged to develop your code on a DICE machine.
The Pin tool and the benchmarks are available in a shared directory to which you
have read-only access from a DICE machine. You first step will be to create a working
directory called card_p2 in your home directory1 and then install the required Pin files
in that directory. This can be done with the following commands on a DICE machine:
mkdir -p ✩HOME/card p2
cd ✩HOME/card p2
/group/teaching/card/p2/install.sh
The installer copies various files, creates a soft link to a README file, and creates a shell
script called setup_pin.sh. It then runs that setup script, and tests the installation.
You should see output from the install script similar to the output below:
[minos]s1234567: /group/teaching/card/p2/install.sh
Installing Pin for CARD P2 in /home/s1234567/card_p2 on Sat 7 Oct 15:52:32 BST 2023
1. copying Pin files...OK
2. Testing the installation
- setting up the environment...OK
- compiling the BPExample Pin tool...OK
- running the BPExample Pin tool on the gromacs benchmark...OK
Testing completed successfully
Environment has been set for using Pin
Your P2_WORK directory is /home/s1234567/card_p2
The setup_pin.sh script defines a number of environment variables. For example,
$P2_WORK is the path to your Pin installation directory. See the README file installed in
your $P2_WORK directory for more details on how to run each benchmark. But remember,
that each time you start a new working session with your Pin tool you must source the
Pin setup script using the following command, from within your Pin working directory:
1You can choose a directory name other than card p2 if you wish. You can also install in as many
different directories as you wish. The setup pin.sh file created inside each installation directory will
setup environment variables for working in that directory alone, so if you create multiple installation
directories you must source the setup pin.sh script from the directory in which you want to work.
2
source setup pin.sh
When Pin is run with the tool created from branch predictor example.cpp it requires
a benchmark program binary, along with its arguments, to be given as the last com￾mand line argument after a double-dash --, as shown in the README file. The Pin
tool runs the benchmark, and while running it extracts information about all condi￾tional branches and passes that information (one branch at a time) to the simulated
branch predictor. The predictor implemented in branch predictor example.cpp is
called AlwaysTakenBranchPredictor and (not surprisingly) it predicts that all condi￾tional branches are Taken. Naturally, a good number of them will be Not Taken, so this
example predictor is rather inaccurate.
The Pin tool outputs a set of statistics for the simulated branch predictor to a file
just before it terminates. You can specify the output filename using one of the Pin
command-line arguments outlined below, otherwise it will be BP stats.out by default.
The branch predictor simulator, defined in the branch predictor example.cpp, is lo￾cated in the BPExample directory within your $P2_WORK directory. The simulator accepts
three command line arguments:
1. The kind of predictor to be simulated, which can be: Always Taken (code pro￾vided), Local, Gshare, Tournament. The default predictor is Always Taken. To run
the Always Taken branch predictor, use the command line argument: -BP type
always taken. The other option names are local, gshare, and tournament.
2. The number of entries in the Pattern History Table (PHT) of the predictor, which
defaults to 1024 entries. To run the simulator with, for example, 4096 PHT entries,
use the command line argument: -num BP entries 4096
3. The name of the output file (default BP stats.out) specified with -o .
For example to generate an output file called MyOutput.out, use the command line
argument: -o MyOutput.out
In the path ✩P2 BENCHMARKS there are 3 different benchmark programs (gromacs,
gobmk and sjeng), all from the SPEC2000 benchmark suite. These are pre-compiled
to run on DICE. You must run each of your branch predictor experiments on all 3
benchmarks; that means evaluating each of the the 3 predictors, with 3 different PHT
sizes, on 3 benchmarks – for a total of 27 distinct experiments 2
. The setup_pin.sh
script defines an environment variable for each benchmark containing the path to that
benchmark. These are: ✩GROMACS PATH, ✩GOBMK PATH, and ✩SJENG PATH.
The instructions given in the README text file contain full information about how
exactly to compile and run your branch prediction simulator, and the command line
arguments required for each benchmark. Please follow these instructions carefully.
1.2 Branch Predictor Parameters
The experiments must be run with the following three different sizes for the Pattern
History Tables (PHTs): 128, 1024, and 4096 entries. Note that you must assess all three
PHT sizes for each of the different branch predictors.
2You may find it helpful to write a bash script to iterate through the experiments.
3
Each PHT entry is a 2-bit saturating counter as explained in the lecture slides. Re￾member that you are writing a simulator in a high-level language; you are not designing
actual circuits. As such, you may use any data types for the counters (and other program
variables). Regardless of the data type you use, it must behave like a saturating 2-bit
counter.
Local Branch predictor
❼ The Local Branch predictor uses per-branch history to make a prediction, i.e., the
history of each branch is considered independently. The Predictor consists of two
structures:
1. an array of 128 distinct Local History Registers (LHRs), that store the branch
history of different branches
2. the PHT
The 7 least-significant bits (LSBs) of the Program Counter (PC) of the branch
instruction are used to index into the array of LHRs and select the relevant LHR
to be used for the prediction. Each LHR is composed of as many bits as necessary
to index the PHT. For example, if the PHT contains 1024 entries, then each LHR
should be 10 bits long. Each n-bit LHR contains the last n outcomes of all branches
that map to that LHR (i.e. all branch PCs whose 7 LSBs are identical), with 0
denoting not taken and 1 denoting taken. All LHRs should be initialized to 0. The
number of LHRs must be 128 and must stay constant across all experiments (i.e.
128 LHRs is a fixed parameter, independent of PHT size).
❼ The selected LHR is used to index into the PHT, which yields the prediction. All
PHT entries must be initialized to “11” (i.e., 3).
❼ To train the predictor after each prediction, (a) the relevant LHR must be updated
such that it reflects the new history, and (b) the PHT entry used to make the
prediction must be strengthened in case of a correct prediction and weakened in
case of misprediction.
– To strengthen a saturating counter means to increment it if its prediction bit
is ‘1’, and decrement it if the prediction bit is ‘0’.
– To weaken a saturating counter means to decrement it if its prediction bit is
‘1’, and increment it if the prediction bit is ‘0’.
❼ Figure 1 shows a pictorial view of the Local Branch Predictor, with 128 distinct
LHRs, each of which consists of 12 bits of branch history, that is used to index
into a 4096-entry PHT of 2-bit saturating counters. You are recommended to read
Section 4 of McFarling’s paper on combined branch predictors (McFarling [1993])
for more information.
4
12-bit Local History
Local History
Registers
0
127
PHT
0
4095
10
PC
7 LSBs
Predict 
Taken 
Figure 1: A Local Branch predictor with 128 Local History Registers and a PHT with
4096 2-bit saturating counters
Gshare Branch Predictor
❼ The Gshare Branch predictor combines a Global History Register (GHR) and the
LSBs of the branch’s PC to index into the PHT. The PC and the GHR are XORed
in order to create the index to the PHT.
❼ The Global History Register is composed of as many bits as necessary to index
the PHT. For example, if the PHT contains 1024 entries, then the GHR should be
10 bits long. These 10 bits contain the outcomes of the last 10 branches, with 0
denoting not taken and 1 denoting taken. The GHR should be initialized to 0. All
PHT entries must be initialized to “11” (i.e., 3).
❼ To train the predictor after each prediction, the GHR must be updated such that
it reflects the new global history and the used PHT entry must be strengthened in
case of a correct prediction and weakened in case of misprediction.
❼ Figure 2 shows a pictorial view of the Gshare Branch Predictor with a GHR of 10
bits that gets XORed with the 10 least-significant bits of the PC of the branch in
order to index into the PHT that contains 1024 entries of 2-bit saturating counters.
You are referred to Section 7 of McFarling [1993] for more information. For the
purpose of this practical, the number of LSBs used from the branch PC should be
equal to the number of bits of the GHR.
PHT
0
1023
01
PC
10 LSBs
Predict 
Not Taken 
10-bit Global History
Global History 
Register
XOR
Figure 2: Gshare Branch predictor with a 10-bit GHR and a 1024-entry PHT
Tournament Branch Predictor
❼ The Tournament predictor is composed of three different predictors: the Local
predictor, the Gshare predictor, and a meta-predictor that selects between the Local
5
or the Gshare predictor to provide the actual prediction of the branch outcome.
❼ The PHT of the meta-predictor is indexed with the LSBs of the Program Counter
(PC) of the branch instruction.
❼ In the meta-predictor, when the prediction bit (i.e. the MSB) of the saturating
counter is 0, the Local predictor is selected to supply the prediction. When the
MSB is 1, the Gshare predictor is selected. All PHT entries must be initialized to
“11” (i.e., 3).
❼ The Tournament predictor will follow a total update policy. Both the Local and
the Gshare predictors will be trained after each branch is resolved. After a correct
prediction the relevant meta-predictor entry must be strengthened. After a mis￾prediction, the relevant meta-predictor entry (the counter) must be weakened, but
only if the predictor that was not chosen was correct. If both the Local and the
Gshare predictors were incorrect, there is no update to the meta-predictor.
❼ Since the underlying behavior of the Local and Gshare predictors is unmodified in
the Tournament scheme, you may choose to use your existing implementations of
Local and Gshare predictors to implement the Tournament predictor.
❼ Figure 3 shows a pictorial view of the Tournament Branch Predictor with a PHT
that contains 1024 entries of 2-bit saturating counters. The 2-bit saturating counter
indicates which predictor should be used (Local or Gshare) for the branch given its
PC.
❼ As before, the experiments must be run with PHTs of 3 different sizes (128, 1024
and 4096 entries). In each experiment the PHTs of the Local Predictor and the
Gshare Predictor must all have the same size. I.e. when the meta-predictor PHT
is 1024 entries, then the Gshare PHT is also 1024 entries and the Local Predictor
PHT is also 1024 entries.
PHT
0
1023
01
PC
10 LSBs
Use Local
Predictor 
Local
Predictor
Gshare
Predictor
Final 
prediction
Figure 3: Tournament Branch predictor with a a 1024-entry PHT
1.3 Simulator infrastructure
The three requested predictors must all be implemented inside the branch predictor
example.cpp file. Inside this file, you must create a class with an appropriate name
for each predictor. The file branch predictor example.cpp includes comments and
6
guidelines on how you must create your classes and the appropriate member functions.
To help you get started, an implementation of the Always Taken predictor is supplied.
You are advised to start from the code provided for the Always Taken predictor, and
modify it to implement the functionality required for each specific predictor type.
For the purpose of this practical you can ignore the Branch Target Buffer (BTB) (i.e.
the implementation of a BTB is not needed). Hence, your simulated predictors need
only generate a prediction of the outcome of the branch condition (Taken or Not-taken),
without specifying the target address of the branch. Additionally the predictors do not
need to identify whether an instruction is a conditional branch; they can assume that
every instruction is a conditional branch – this is because the code you are working with
passes only conditional branches into the prediction functions.
The contents of the file branch predictor example.cpp should be modified
only as the above guidelines specify, and not in any other way.
2 Marking Scheme
Correctness (80 marks). Includes code functionality and quality for the following 3
predictors:
1. Local Branch Predictor (30 marks)
2. Gshare Global Branch Predictor (30 marks)
3. Tournament Branch Predictor (20 marks)
Report (20 marks). You should write a short report (max 2 pages) on how you have
implemented each predictor and where the predictor code is (source file, line numbers).
In the report you should answer the following questions. Please refrain from using more
than 3 to 4 lines for each question.
❼ Why and how, varying the size of the PHT impacts (or not) the prediction accuracy
for each of the predictors?
❼ What are the advantages and disadvantages of the Local predictor?
❼ What are the advantages and disadvantages of the Gshare predictor?
❼ What benefits did you expect from the addition of the Tournament predictor? Did
the results match your expectations?
Finally your report must contain 3 graphs (one for each benchmark) summarizing
your results by showing the prediction accuracy of all three predictors for the various
PHTs. The x axis should denote the PHT size and the y axis the prediction accuracy as
a percentage. In each of the graphs the performance of all 3 predictors must be plotted.
A sample chart is shown in figure 4.
7
Figure 4: A sample of the chart that must be submitted in the Report. One such chart
must exist in the report for each benchmark.
3 Format of your submission
Your submission should clearly indicate (in both the report and the code) which
branch prediction techniques you have simulated completely and which you have only
partially completed (if any). Please put comments in the code that you add to make it
easy for the markers to understand. Remember that your simulator will be compiled and
executed on a DICE machine, so you must ensure it works on DICE.
You are required to submit:
1. a .cpp source file containing the implementation of your Branch Predictors, and
2. a .pdf file containing your report.
Note that both files should be named using your student id. For example, a student with
student-id s1234567 would submit s1234567.cpp and s1234567.pdf.
Submission is via the LEARN page for CARD. Navigate to the Gradebook
tab, where you will see an item called Coursework 2 through which you can submit
your report and code for this assignment. You may submit multiple times if you need to
revise your submission, up to the deadline, after which your last submission will be the
one that is marked.
4 Similarity Checking and Academic Misconduct
You must submit your own work. Any code that is not written by you must be
clearly identified and explained through comments at the top of your files. Failure to
do so is plagiarism. Note that you are required to take reasonable measures to protect
your assessed work from unauthorized access. Detailed guidelines on what constitutes
plagiarism and other issues of academic misconduct can be found at:
http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct
All submitted code is checked for similarity with other submissions using software tools
such as MOSS. These tools have been very effective in the past at finding similarities and
are not fooled by name changes and reordering of code blocks.
8
5 Reporting Problems
Please post questions to Piazza if you have any issues regarding the practical and
cannot find the answer in the README file.
References
Intel. Pin – A Dynamic Binary Instrumentation Tool, 2021. URL
https://software.intel.com/content/www/us/en/develop/articles/
pin-a-dynamic-binary-instrumentation-tool.html.
Scott McFarling. Combining Branch Predictors. Technical Report TN-36, Digital
Equipment Corporation, June 1993. URL https://www.hpl.hp.com/techreports/
Compaq-DEC/WRL-TN-36.pdf.
9
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!