SP Assessed Exercise 2
Concurrent Dependency Discoverer
1 Requirement
Large-scale systems developed in C and C++ tend to include a large number of .h files, both of the
system variety (enclosed in < >) and non-system (enclosed in “ ”). The make utility and
Makefiles are a convenient way to record dependencies between source files, and to minimize the
amount of work that is done when the system needs to be rebuilt. Of course, the work will only be
minimized if the Makefile exactly captures the dependencies between source and object files.
Some systems are extremely large, and it is difficult to keep the dependencies in the Makefile
correct as many people make changes at the same time. Therefore, there is a need for a program
that can crawl over source files, note any #include directives, recurse through the files those
directives specify, and finally generate the correct dependency specifications.
#include directives for system files (enclosed in < >) are normally NOT specified in
dependencies. Therefore, our system will focus on generating dependencies between source files
and non-system #include directives (enclosed in “ ”).
2 Specification
For very large software systems, a single-threaded application to crawl the source files may take a
long time. The purpose of this assessed exercise is to develop a concurrent include file crawler in
C++.
On Moodle you are provided with a sequential C++17 include file crawler
dependencyDiscoverer.cpp. The main() function may take the following arguments:
-Idir indicates a directory to be searched for any include files encountered
file.ext source file to be scanned for #include directives; ext must be c, y, or l
The usage string is: ./dependencyDiscoverer [-Idir] file1.ext [file2.ext …]
The crawler uses the following environment variables when it runs:
CRAWLER_THREADS – if this is defined, it specifies the number of worker threads that the
application must create; if it is not defined, then two (2) worker threads should be created.
CPATH – if this is defined, it contains a list of directories separated by ‘:’; these directories are to be
searched for files specified in #include directives; if it is not defined, then no additional
directories are searched beyond the current directory and any specified by -Idir flags.
NOTE: You can set an environment variable in shell with the following command:
% export CRAWLER_THREADS=3
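For illustration only, the program might read these variables along the following lines. This is a minimal sketch; the helper names are our own and do not appear in the provided code:

    #include <cstdlib>
    #include <string>
    #include <vector>

    // Number of worker threads: CRAWLER_THREADS if set, otherwise 2.
    static int crawlerThreads() {
        const char *s = std::getenv("CRAWLER_THREADS");
        return (s != nullptr) ? std::atoi(s) : 2;
    }

    // Directories listed in CPATH, split on ':'; empty if CPATH is unset.
    static std::vector<std::string> cpathDirs() {
        std::vector<std::string> dirs;
        const char *s = std::getenv("CPATH");
        if (s == nullptr) return dirs;
        std::string path(s);
        std::string::size_type start = 0;
        while (start <= path.size()) {
            auto end = path.find(':', start);
            if (end == std::string::npos) end = path.size();
            if (end > start) dirs.push_back(path.substr(start, end - start));
            start = end + 1;
        }
        return dirs;
    }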
For example, if CPATH is “/home/user/include:/usr/local/group/include” and
if “-Ikernel” is specified on the command line, then when processing
#include “x.h”
x.h will be located by searching for it in the following order:
./x.h
kernel/x.h
/home/user/include/x.h
/usr/local/group/include/x.h
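A minimal sketch of that lookup using C++17 std::filesystem follows; the function name and its parameters are illustrative, and the provided code has its own way of doing this:

    #include <filesystem>
    #include <optional>
    #include <string>
    #include <vector>

    // Try the current directory first, then each -Idir directory (in
    // command-line order), then each CPATH directory (in CPATH order).
    // Returns the first existing path, or nothing if the file is not found.
    static std::optional<std::string>
    resolveInclude(const std::string &name,
                   const std::vector<std::string> &searchDirs) {
        namespace fs = std::filesystem;
        if (fs::exists(name))                    // ./x.h
            return name;
        for (const auto &dir : searchDirs) {     // kernel/x.h, /home/user/include/x.h, ...
            fs::path candidate = fs::path(dir) / name;
            if (fs::exists(candidate))
                return candidate.string();
        }
        return std::nullopt;
    }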
3 Design and Implementation
The key data structures, data flows, and threads in the concurrent version are shown in the figure
below. This is a common leader/worker concurrency pattern. The main thread (leader) places file
names to be processed in the work queue. Worker threads select a file name from the work queue,
scan the file to discover dependencies, add these dependencies to the result Hash Map and, if new,
to the work queue.
It should be possible to adjust the number of worker threads that process the accumulated work
queue in order to speed up the processing. Since the Work Queue and the Hash Map are shared
between threads, you will need to use concurrency control mechanisms to implement thread-safe
access.
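To make the data flow concrete, the sketch below shows a single worker step against shared structures guarded by one mutex. All names here are hypothetical, and scanFile() is a stub standing in for the scanning logic already present in dependencyDiscoverer.cpp:

    #include <list>
    #include <mutex>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>

    struct Crawler {
        std::mutex m;                          // guards the three structures below
        std::list<std::string> workQueue;      // file names still to be scanned
        std::unordered_set<std::string> seen;  // every file ever queued
        std::unordered_map<std::string, std::vector<std::string>> results;
    };

    // Stand-in for the provided logic that collects #include dependencies.
    static std::vector<std::string> scanFile(const std::string &file) {
        (void)file;
        return {};
    }

    // One worker step: scan a file, record its dependencies, and queue
    // any dependency that has not been seen before.
    static void workerStep(Crawler &c, const std::string &file) {
        std::vector<std::string> deps = scanFile(file);  // no lock held while scanning
        std::lock_guard<std::mutex> lock(c.m);
        c.results[file] = deps;
        for (const auto &d : deps)
            if (c.seen.insert(d).second)       // first time this file appears?
                c.workQueue.push_back(d);      // yes: schedule it for scanning
    }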
3.1 How to proceed
You are provided with a working, sequential C++17 program called dependencyDiscoverer.
Read the extensive comments in dependencyDiscoverer.cpp that explain the design of the
application. Use the documentation at en.cppreference.com to check that you understand how the
standard C++ containers are used in dependencyDiscoverer.cpp.
Build the program with the provided Makefile; you can then test it by running:
% cd test
% ../dependencyDiscoverer *.y *.l *.c
This should produce output identical to the provided output file, so the following command
should yield nothing when your output is correct:
% ../dependencyDiscoverer *.y *.l *.c | diff - output
NOTE: The university servers might throw an error saying that C++17 is not available. You need to
use a more recent version of Clang. To obtain it, run the following in the command shell on one of
the stlinux servers (not ssh or sibu):
% source /usr/local/bin/clang9.setup
Start to make the code concurrent by creating new thread-safe Work Queue and Hash Map data
structures that encapsulate the existing C++ standard containers. Create a struct that stores the
container as a member alongside the synchronization utilities and provides a similar interface to the
container, but with appropriate synchronization.
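For example, a thread-safe Work Queue might be sketched as follows; the names are illustrative, and a thread-safe Hash Map can be built the same way around std::unordered_map:

    #include <condition_variable>
    #include <list>
    #include <mutex>
    #include <string>

    // A std::list plus the synchronization needed to share it safely.
    struct SafeQueue {
        std::list<std::string> queue;
        std::mutex m;
        std::condition_variable cv;

        void push(std::string value) {
            {
                std::lock_guard<std::mutex> lock(m);
                queue.push_back(std::move(value));
            }
            cv.notify_one();                   // wake one waiting worker
        }

        // Blocks until an item is available, then removes and returns it.
        std::string pop() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return !queue.empty(); });
            std::string value = std::move(queue.front());
            queue.pop_front();
            return value;
        }
    };

Note that this pop() blocks forever if no more work arrives; a complete solution also needs a way to tell idle workers that the crawl has finished, which is exactly the termination question discussed below.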
Once you have thread-safe data structures, create a single thread to operate on them. Test the
resulting program and keep a working copy in case the next stage goes wrong.
Once the single-threaded version works correctly, it should be straightforward to read the desired
number of worker threads from the CRAWLER_THREADS environment variable and create that many
workers. A key technical challenge is to design a solution in which the main thread can determine
that all the worker threads have finished (without busy waiting), so that it can harvest the
information in the Hash Map.
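One common approach, sketched below, is to count the files that are queued or being scanned, and have the last worker signal a condition variable when that count drops to zero. The type and method names are our own, and this is one possible design rather than the required one:

    #include <condition_variable>
    #include <mutex>

    struct WorkTracker {
        std::mutex m;
        std::condition_variable done;
        int pending = 0;             // files queued or currently being scanned

        void added() {               // call whenever a file enters the work queue
            std::lock_guard<std::mutex> lock(m);
            ++pending;
        }

        void finished() {            // call when a worker finishes a file
            std::lock_guard<std::mutex> lock(m);
            if (--pending == 0)
                done.notify_all();   // last file done: wake the main thread
        }

        void waitUntilIdle() {       // main thread blocks here, no busy waiting
            std::unique_lock<std::mutex> lock(m);
            done.wait(lock, [this] { return pending == 0; });
        }
    };

Provided a worker calls added() for each newly discovered file before calling finished() on the file it has just scanned, pending cannot reach zero until every reachable file has been processed.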
3.2 Submission Options
As with Assessed Exercise 1, you have the option of submitting a less than complete
implementation of this exercise. Your options are as follows:
1. You may submit a sequential implementation of the crawler; it must use thread-safe data
structures. If you select this option, you are constrained to 50% of the total marks.
2. You may submit an implementation that supports a single worker thread in addition to the
main/manager thread. If you select this option, you are constrained to 75% of the total
marks.
3. You may submit an implementation that completely conforms to the full specification in
Section 2 above. If you select this option, you have access to 100% of the total marks.
The marking scheme is appended to this document.
4 Submission
You will submit your solutions electronically by submitting the following files on Aropa2:
http://aropa2.gla.ac.uk/aropa/
• dependencyDiscoverer.cpp – the source file as described above
• report.pdf – a report in PDF as specified below.
NOTE: You are expected to submit only one cpp file. We provide the space to submit a second file
just in case anyone has split their solution over two files. Doing so is not expected.
Your source file must start with an “authorship statement” stating either:
• “This is my own work as defined in the Academic Ethics agreement I have signed.” or
• “This is my own work except that …”, as appropriate.
You must complete an “Own Work” form via https://studentltc.dcs.gla.ac.uk/.
Assignments will be checked for collusion; it is better to turn in an incomplete solution that is your own
than a copy of someone else’s work. There are very good tools for detecting software plagiarism,
e.g. JPLAG or MOSS.
Do not:
• upload your code to a public repository, like Github, as fellow students may copy it, or
• use code from a public repository.
In either case you may find yourself in a plagiarism investigation.
4.1 Report Contents
Your report should contain the following sections.
1. Status. A brief report outlining the state of your solution, and indicating which of the single-
threaded, two-threaded, or multithreaded solutions you have provided. It is important that
the report is accurate. For example, it is offensive to report that everything works when it
won’t even compile.
2. Build, and sequential (i.e., original) & 1-thread runtimes. A screenshot showing:
(a) the path where you are executing the program (i.e. pwd)
(b) your crawler being compiled either manually by a sequence of commands, or by the
Makefile
(c) the time to run the sequential crawler on all .c, .l and .y files in the test directory.
(d) the time to run your threaded crawler (if you have one) with one thread.
You’ll need to use the time command to obtain these times. We are interested in the real
time, i.e. the wall-clock time to run the program. Remember to pipe the output to a file
to keep the screenshot manageable, e.g.
% time ./dependencyDiscoverer -Itest test/*.c test/*.l test/*.y > temp
real 0m0.030s
user 0m0.007s
sys 0m0.017s
3. Runtime with Multiple Threads.
3a. Screenshot. A screenshot showing the path where you are executing the program (i.e.
pwd), and the times to run the crawler with 1, 2, 3, 4, 6, and 8 threads on all .c, .l and .y
files in the test directory.
3b. Experiment with your multithreaded program, completing the following table of
elapsed times for 3 executions with different numbers of threads on one of the School
stlinux servers. Compute the median elapsed time. To get reproducible results you
should run on a lightly loaded machine.

    CRAWLER_THREADS     1        2        3        4        6        8
    Execution 1
    Execution 2
    Execution 3
    Median

    (record the elapsed time of each execution in the corresponding cell)

3c. Discussion. Briefly describe what you conclude from your experiment about (a) the
benefits of additional cores for this input data and (b) the variability of elapsed times.
5 Marking Scheme
Your submission will be marked on a 100 point scale. There is substantial emphasis on
WORKING submissions, and you will note that a large fraction of the points is reserved for this
aspect. It is to your advantage to ensure that whatever you submit compiles, links, and runs
correctly.
You must be sure that your code works correctly with clang++ on the School stlinux servers,
regardless of which platform you use for development and testing. Leave enough time in your
development to fully test on the servers before submission.
Points  Description
10      Your report: accurately, clearly and honestly describes the state of your
        submission.
90      dependencyDiscoverer (+ other classes, if provided):
          workable solution (looks like it should work):
            16 marks if sequential
            25 marks if 1-worker thread
            40 marks if multiple workers
          4   for successful compilation with no warnings
          15  for appropriate thread safety implemented for the Work Queue and Hash Map
          5   for an efficient mechanism for determining when worker threads have
              finished (0 if sequential)
          8   if it works correctly with the files in the test folder and an unseen
              folder of files
          8   if runtime performance with 1 worker on the test folder is shown to be
              similar to the single-threaded implementation (0 if sequential)
          10  for sound experimentation with different numbers of threads, and analysis
              of the results (0 if sequential or 1-worker thread)
Some things should be noted about the marking scheme:
• If your solution does not look workable, then the marks associated with successful
compilation and lack of compilation errors are not available to you.
• The marks associated with “workable solution” are the maximum number of marks that can
be awarded. If only part of the solution looks workable, then you will be awarded a portion
of the points in that category.