Dynamic Branch Prediction Computer Architecture and Design - Practical 2

Dynamic Branch Prediction

Computer Architecture and Design - Practical 2

Due: 23 November 2023 at 12:00

This handout describes the second practical assignment of the Computer Architecture

and Design course, which represents 50% of the practical component of the course. It

consists of a programming exercise culminating in a brief written report. Assessment of

this practical will be based on the correctness of the code and the clarity of the report

as explained below. The practical is to be solved individually. Please bear in mind

the School of Informatics guidelines on Academic Misconduct. You must submit your

solutions before the due date shown above. Follow the instructions provided in Section 3

for submission details.

In this practical, you are required to explore Branch Prediction techniques using the

Intel Pin simulation tool. You are strongly advised to commence working on the simulator

as soon as possible.

1 Overview

A branch predictor anticipates the outcome of a branch, before it is executed, to

improve the instruction flow in the pipeline. In this practical you will investigate the

accuracy of various branch predictors. Your assignment is to implement and evaluate

three different branch predictors, which are:

1. Local branch predictor

2. Gshare global branch predictor

3. Tournament branch predictor that combines 1 and 2

1.1 The Pin Tool

Pin (Intel [2021]) is a dynamic binary instrumentation framework that enables the

creation of tools for analysing the dynamic behaviour of programs running natively on

an Intel x86 machine. In the context of this assignment we use Pin to create tools

for simulating branch predictors. Pin operates by instrumenting a program binary to

intercept program execution at certain points and make call-backs into the simulation

tool. Pin provides an API that allows you (the tool writer) to run high-level C++

functions in response to annotated runtime events, such as loads, stores, and branch

instructions. A Pin tool is a set of call-back functions that use the Pin API and are

compiled into a dynamic shared library that is linked with Pin when it runs. The tool

specifies what events are of interest to it, and specifies what tool function gets called in

response to each of those events. In this practical, you are given an example Pin tool

that intercepts all conditional branch instructions and provides a call-back function to

simulate the behaviour of an AlwaysTaken branch predictor. When run with a benchmark

program, each conditional branch instruction encountered during program execution will

trigger a call to the branch predictor’s call-back function, allowing it to simulate a branch

predictor processing the entire sequence of conditional branch instructions executed by

the benchmark. Your task is to extend this tool to simulate the three other branch

predictors listed above, and then to use the tool to evaluate those branch predictors on

three standard benchmark programs.

The directions below apply to DICE machines. For marking purposes, the

testing of your code will be performed on a DICE machine. Therefore you

are strongly encouraged to develop your code on a DICE machine.

The Pin tool and the benchmarks are available in a shared directory to which you

have read-only access from a DICE machine. You first step will be to create a working

directory called card_p2 in your home directory1 and then install the required Pin files

in that directory. This can be done with the following commands on a DICE machine:

mkdir -p ✩HOME/card p2

cd ✩HOME/card p2

/group/teaching/card/p2/install.sh

The installer copies various files, creates a soft link to a README file, and creates a shell

script called setup_pin.sh. It then runs that setup script, and tests the installation.

You should see output from the install script similar to the output below:

[minos]s1234567: /group/teaching/card/p2/install.sh

Installing Pin for CARD P2 in /home/s1234567/card_p2 on Sat 7 Oct 15:52:32 BST 2023

1. copying Pin files...OK

2. Testing the installation

- setting up the environment...OK

- compiling the BPExample Pin tool...OK

- running the BPExample Pin tool on the gromacs benchmark...OK

Testing completed successfully

Environment has been set for using Pin

Your P2_WORK directory is /home/s1234567/card_p2

The setup_pin.sh script defines a number of environment variables. For example,

$P2_WORK is the path to your Pin installation directory. See the README file installed in

your $P2_WORK directory for more details on how to run each benchmark. But remember,

that each time you start a new working session with your Pin tool you must source the

Pin setup script using the following command, from within your Pin working directory:

1You can choose a directory name other than card p2 if you wish. You can also install in as many

different directories as you wish. The setup pin.sh file created inside each installation directory will

setup environment variables for working in that directory alone, so if you create multiple installation

directories you must source the setup pin.sh script from the directory in which you want to work.

source setup pin.sh

When Pin is run with the tool created from branch predictor example.cpp it requires

a benchmark program binary, along with its arguments, to be given as the last command line argument after a double-dash --, as shown in the README file. The Pin

tool runs the benchmark, and while running it extracts information about all conditional branches and passes that information (one branch at a time) to the simulated

branch predictor. The predictor implemented in branch predictor example.cpp is

called AlwaysTakenBranchPredictor and (not surprisingly) it predicts that all conditional branches are Taken. Naturally, a good number of them will be Not Taken, so this

example predictor is rather inaccurate.

The Pin tool outputs a set of statistics for the simulated branch predictor to a file

just before it terminates. You can specify the output filename using one of the Pin

command-line arguments outlined below, otherwise it will be BP stats.out by default.

The branch predictor simulator, defined in the branch predictor example.cpp, is located in the BPExample directory within your $P2_WORK directory. The simulator accepts

three command line arguments:

1. The kind of predictor to be simulated, which can be: Always Taken (code provided), Local, Gshare, Tournament. The default predictor is Always Taken. To run

the Always Taken branch predictor, use the command line argument: -BP type

always taken. The other option names are local, gshare, and tournament.

2. The number of entries in the Pattern History Table (PHT) of the predictor, which

defaults to 1024 entries. To run the simulator with, for example, 4096 PHT entries,

use the command line argument: -num BP entries 4096

3. The name of the output file (default BP stats.out) specified with -o .

For example to generate an output file called MyOutput.out, use the command line

argument: -o MyOutput.out

In the path ✩P2 BENCHMARKS there are 3 different benchmark programs (gromacs,

gobmk and sjeng), all from the SPEC2000 benchmark suite. These are pre-compiled

to run on DICE. You must run each of your branch predictor experiments on all 3

benchmarks; that means evaluating each of the the 3 predictors, with 3 different PHT

sizes, on 3 benchmarks – for a total of 27 distinct experiments 2

. The setup_pin.sh

script defines an environment variable for each benchmark containing the path to that

benchmark. These are: ✩GROMACS PATH, ✩GOBMK PATH, and ✩SJENG PATH.

The instructions given in the README text file contain full information about how

exactly to compile and run your branch prediction simulator, and the command line

arguments required for each benchmark. Please follow these instructions carefully.

1.2 Branch Predictor Parameters

The experiments must be run with the following three different sizes for the Pattern

History Tables (PHTs): 128, 1024, and 4096 entries. Note that you must assess all three

PHT sizes for each of the different branch predictors.

2You may find it helpful to write a bash script to iterate through the experiments.

Each PHT entry is a 2-bit saturating counter as explained in the lecture slides. Remember that you are writing a simulator in a high-level language; you are not designing

actual circuits. As such, you may use any data types for the counters (and other program

variables). Regardless of the data type you use, it must behave like a saturating 2-bit

counter.

Local Branch predictor

❼ The Local Branch predictor uses per-branch history to make a prediction, i.e., the

history of each branch is considered independently. The Predictor consists of two

structures:

1. an array of 128 distinct Local History Registers (LHRs), that store the branch

history of different branches

2. the PHT

The 7 least-significant bits (LSBs) of the Program Counter (PC) of the branch

instruction are used to index into the array of LHRs and select the relevant LHR

to be used for the prediction. Each LHR is composed of as many bits as necessary

to index the PHT. For example, if the PHT contains 1024 entries, then each LHR

should be 10 bits long. Each n-bit LHR contains the last n outcomes of all branches

that map to that LHR (i.e. all branch PCs whose 7 LSBs are identical), with 0

denoting not taken and 1 denoting taken. All LHRs should be initialized to 0. The

number of LHRs must be 128 and must stay constant across all experiments (i.e.

128 LHRs is a fixed parameter, independent of PHT size).

❼ The selected LHR is used to index into the PHT, which yields the prediction. All

PHT entries must be initialized to “11” (i.e., 3).

❼ To train the predictor after each prediction, (a) the relevant LHR must be updated

such that it reflects the new history, and (b) the PHT entry used to make the

prediction must be strengthened in case of a correct prediction and weakened in

case of misprediction.

– To strengthen a saturating counter means to increment it if its prediction bit

is ‘1’, and decrement it if the prediction bit is ‘0’.

– To weaken a saturating counter means to decrement it if its prediction bit is

‘1’, and increment it if the prediction bit is ‘0’.

❼ Figure 1 shows a pictorial view of the Local Branch Predictor, with 128 distinct

LHRs, each of which consists of 12 bits of branch history, that is used to index

into a 4096-entry PHT of 2-bit saturating counters. You are recommended to read

Section 4 of McFarling’s paper on combined branch predictors (McFarling [1993])

for more information.

12-bit Local History

Local History

Registers

127

PHT

4095

7 LSBs

Predict

Taken

Figure 1: A Local Branch predictor with 128 Local History Registers and a PHT with

4096 2-bit saturating counters

Gshare Branch Predictor

❼ The Gshare Branch predictor combines a Global History Register (GHR) and the

LSBs of the branch’s PC to index into the PHT. The PC and the GHR are XORed

in order to create the index to the PHT.

❼ The Global History Register is composed of as many bits as necessary to index

the PHT. For example, if the PHT contains 1024 entries, then the GHR should be

10 bits long. These 10 bits contain the outcomes of the last 10 branches, with 0

denoting not taken and 1 denoting taken. The GHR should be initialized to 0. All

PHT entries must be initialized to “11” (i.e., 3).

❼ To train the predictor after each prediction, the GHR must be updated such that

it reflects the new global history and the used PHT entry must be strengthened in

case of a correct prediction and weakened in case of misprediction.

❼ Figure 2 shows a pictorial view of the Gshare Branch Predictor with a GHR of 10

bits that gets XORed with the 10 least-significant bits of the PC of the branch in

order to index into the PHT that contains 1024 entries of 2-bit saturating counters.

You are referred to Section 7 of McFarling [1993] for more information. For the

purpose of this practical, the number of LSBs used from the branch PC should be

equal to the number of bits of the GHR.

PHT

1023

10 LSBs

Predict

Not Taken

10-bit Global History

Global History

XOR

Figure 2: Gshare Branch predictor with a 10-bit GHR and a 1024-entry PHT

Tournament Branch Predictor

❼ The Tournament predictor is composed of three different predictors: the Local

predictor, the Gshare predictor, and a meta-predictor that selects between the Local

or the Gshare predictor to provide the actual prediction of the branch outcome.

❼ The PHT of the meta-predictor is indexed with the LSBs of the Program Counter

(PC) of the branch instruction.

❼ In the meta-predictor, when the prediction bit (i.e. the MSB) of the saturating

counter is 0, the Local predictor is selected to supply the prediction. When the

MSB is 1, the Gshare predictor is selected. All PHT entries must be initialized to

“11” (i.e., 3).

❼ The Tournament predictor will follow a total update policy. Both the Local and

the Gshare predictors will be trained after each branch is resolved. After a correct

prediction the relevant meta-predictor entry must be strengthened. After a misprediction, the relevant meta-predictor entry (the counter) must be weakened, but

only if the predictor that was not chosen was correct. If both the Local and the

Gshare predictors were incorrect, there is no update to the meta-predictor.

❼ Since the underlying behavior of the Local and Gshare predictors is unmodified in

the Tournament scheme, you may choose to use your existing implementations of

Local and Gshare predictors to implement the Tournament predictor.

❼ Figure 3 shows a pictorial view of the Tournament Branch Predictor with a PHT

that contains 1024 entries of 2-bit saturating counters. The 2-bit saturating counter

indicates which predictor should be used (Local or Gshare) for the branch given its

PC.

❼ As before, the experiments must be run with PHTs of 3 different sizes (128, 1024

and 4096 entries). In each experiment the PHTs of the Local Predictor and the

Gshare Predictor must all have the same size. I.e. when the meta-predictor PHT

is 1024 entries, then the Gshare PHT is also 1024 entries and the Local Predictor

PHT is also 1024 entries.

PHT

1023

10 LSBs

Use Local

Predictor

Local

Predictor

Gshare

Predictor

Final

prediction

Figure 3: Tournament Branch predictor with a a 1024-entry PHT

1.3 Simulator infrastructure

The three requested predictors must all be implemented inside the branch predictor

example.cpp file. Inside this file, you must create a class with an appropriate name

for each predictor. The file branch predictor example.cpp includes comments and

guidelines on how you must create your classes and the appropriate member functions.

To help you get started, an implementation of the Always Taken predictor is supplied.

You are advised to start from the code provided for the Always Taken predictor, and

modify it to implement the functionality required for each specific predictor type.

For the purpose of this practical you can ignore the Branch Target Buffer (BTB) (i.e.

the implementation of a BTB is not needed). Hence, your simulated predictors need

only generate a prediction of the outcome of the branch condition (Taken or Not-taken),

without specifying the target address of the branch. Additionally the predictors do not

need to identify whether an instruction is a conditional branch; they can assume that

every instruction is a conditional branch – this is because the code you are working with

passes only conditional branches into the prediction functions.

The contents of the file branch predictor example.cpp should be modified

only as the above guidelines specify, and not in any other way.

2 Marking Scheme

Correctness (80 marks). Includes code functionality and quality for the following 3

predictors:

1. Local Branch Predictor (30 marks)

2. Gshare Global Branch Predictor (30 marks)

3. Tournament Branch Predictor (20 marks)

Report (20 marks). You should write a short report (max 2 pages) on how you have

implemented each predictor and where the predictor code is (source file, line numbers).

In the report you should answer the following questions. Please refrain from using more

than 3 to 4 lines for each question.

❼ Why and how, varying the size of the PHT impacts (or not) the prediction accuracy

for each of the predictors?

❼ What are the advantages and disadvantages of the Local predictor?

❼ What are the advantages and disadvantages of the Gshare predictor?

❼ What benefits did you expect from the addition of the Tournament predictor? Did

the results match your expectations?

Finally your report must contain 3 graphs (one for each benchmark) summarizing

your results by showing the prediction accuracy of all three predictors for the various

PHTs. The x axis should denote the PHT size and the y axis the prediction accuracy as

a percentage. In each of the graphs the performance of all 3 predictors must be plotted.

A sample chart is shown in figure 4.

Figure 4: A sample of the chart that must be submitted in the Report. One such chart

must exist in the report for each benchmark.

3 Format of your submission

Your submission should clearly indicate (in both the report and the code) which

branch prediction techniques you have simulated completely and which you have only

partially completed (if any). Please put comments in the code that you add to make it

easy for the markers to understand. Remember that your simulator will be compiled and

executed on a DICE machine, so you must ensure it works on DICE.

You are required to submit:

1. a .cpp source file containing the implementation of your Branch Predictors, and

2. a .pdf file containing your report.

Note that both files should be named using your student id. For example, a student with

student-id s1234567 would submit s1234567.cpp and s1234567.pdf.

Submission is via the LEARN page for CARD. Navigate to the Gradebook

tab, where you will see an item called Coursework 2 through which you can submit

your report and code for this assignment. You may submit multiple times if you need to

revise your submission, up to the deadline, after which your last submission will be the

one that is marked.

4 Similarity Checking and Academic Misconduct

You must submit your own work. Any code that is not written by you must be

clearly identified and explained through comments at the top of your files. Failure to

do so is plagiarism. Note that you are required to take reasonable measures to protect

your assessed work from unauthorized access. Detailed guidelines on what constitutes

plagiarism and other issues of academic misconduct can be found at:

http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct

All submitted code is checked for similarity with other submissions using software tools

such as MOSS. These tools have been very effective in the past at finding similarities and

are not fooled by name changes and reordering of code blocks.

5 Reporting Problems

Please post questions to Piazza if you have any issues regarding the practical and

cannot find the answer in the README file.

References

Intel. Pin – A Dynamic Binary Instrumentation Tool, 2021. URL

https://software.intel.com/content/www/us/en/develop/articles/

pin-a-dynamic-binary-instrumentation-tool.html.

Scott McFarling. Combining Branch Predictors. Technical Report TN-36, Digital

Equipment Corporation, June 1993. URL https://www.hpl.hp.com/techreports/

Compaq-DEC/WRL-TN-36.pdf.

联系我们

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

热点文章

data程序辅导、讲解 c/c++编程... 2024-05-17
data程序辅导、讲解 python编... 2024-05-17
program讲解、c/c++，python程... 2024-05-17
辅导 math 3333 3.0 - winter ... 2024-05-17
讲解 seng6110 programming as... 2024-05-17
辅导 seng6110 object oriente... 2024-05-17
辅导 comp828: statistical pr... 2024-05-17
讲解 culture and society调试... 2024-05-17
讲解 comp 4911 winter 2024 a... 2024-05-17
讲解 lh physical iiib / 03 3... 2024-05-17
讲解 3032ict big data analyt... 2024-05-17
辅导 comp4702 report辅导留学... 2024-05-17
辅导 fin2020 hw6辅导 c/c++编... 2024-05-17
讲解 civ4100: autonomous veh... 2024-05-17
辅导 feeg6008 advanced photo... 2024-05-17
讲解 acc207 econometrics in ... 2024-05-17
辅导 npsc1003 writing portfo... 2024-05-17
讲解 math 3333 3.0 section a... 2024-05-17
辅导 busn 37103 data-driven ... 2024-05-17
辅导 218.203 measurement pri... 2024-05-17

热点标签

联系我们 - QQ: 99515681 微信：codinghelp

程序辅导网！