2024.1 Multicore Computing, Project #3
(Due: 11:59 pm, May 26)
Submission Rules
1. Create a directory {studentID#}_proj3 (example: 20203601_proj3). In that directory, create the
subdirectories ‘prob1’ and ‘prob2’.
2.a For problem 1, write (i) ‘C with OpenMP’ source code prob1.c and (ii) a document that reports
the parallel performance of your code. Insert (i) and (ii) into the subdirectory ‘prob1’.
2.b For problem 2, write (i) ‘C with OpenMP’ source code prob2.c and (ii) a document that
reports the parallel performance of your code. Insert (i) and (ii) into the subdirectory ‘prob2’.
2.c For problem 3, insert the demo video files (.mp4) into the directory {studentID#}_proj3.
3. Zip the directory {studentID#}_proj3 and submit the zip file to the eClass homework board.
※ If possible, use a quad-core, hexa-core, or octa-core CPU (or a CPU with more cores) rather than a dual-core
CPU for your experiments; more cores will better show the performance gains from parallelism.
[Problem 1] In project 1, we looked at a Java program that computes the number of prime numbers between 1
and 200000. A static parallel implementation based on poor work decomposition (i.e. simply dividing the
entire range of numbers into k consecutive sub-ranges, where k is the number of threads) may not perform
well because (i) higher ranges contain fewer primes and (ii) larger numbers are harder (i.e. take longer)
to test for primality. As a result, thread workloads can become uneven and hard to predict. For better
performance, project 1 implemented a dynamic load-balancing approach in which each thread takes numbers
one at a time and tests whether each one is prime.
(i) Write ‘C with OpenMP’ code that computes the number of prime numbers between 1 and 200000. Your program
should take two command-line arguments: the scheduling type number (1 = “static with default chunk size”,
2 = “dynamic with default chunk size”, 3 = “static with chunk size 10”, 4 = “dynamic with chunk size 10”)
and the number of threads (1, 2, 4, 6, 8, 10, 12, 14, 16). Use schedule(static), schedule(dynamic),
schedule(static, 10) and schedule(dynamic, 10), respectively. Your code should print the execution time
as well as the number of prime numbers between 1 and 200000.
command line execution: > a.out scheduling_type# #_of_thread
execution example> a.out 1 8 <---- this means the program uses “schedule(static)” with 8 threads.
(ii) Write a document (in PDF format) that reports the parallel performance of your code. It should include
graphs that show the execution time when using 1, 2, 4, 6, 8, 10, 12, 14, and 16 threads; there should be
at least four graphs, covering the static and dynamic scheduling policies. The document should contain
(a) the environment in which the experiments were performed (e.g. CPU type, memory size, OS type ...),
(b) tables and graphs that show the execution time (unit: millisecond) for thread number =
{1,2,4,6,8,10,12,14,16}, and (c) an explanation of the results and why such results are obtained.
 
Execution time (unit: ms)
                                  number of threads
schedule   chunk size     1    2    4    6    8   10   12   14   16
static     default
dynamic    default
static     10
dynamic    10

Performance (1/execution time)
                                  number of threads
schedule   chunk size     1    2    4    6    8   10   12   14   16
static     default
dynamic    default
static     10
dynamic    10

[Problem 2] Parallelize prob2.c (see our class webpage project 3 announcement to access prob2.c) using
OpenMP. Your program should take three command-line arguments: the scheduling type number (1 = static,
2 = dynamic, 3 = guided), the chunk size, and the number of threads. Your code should print the execution
time and the computed value of PI. Assume the number of steps num_steps = 10000000.
command line execution: > a.out scheduling_type# chunk_size #_of_thread
execution example> a.out 2 4 8 <---- this means dynamic scheduling (chunk size = 4) using 8 threads.
 
(i) Submit the OpenMP source code prob2.c.
(ii) Write a document (in PDF format) that reports the parallel performance of your code. Your report should
contain (a) the following tables, and graphs that show the information in the tables, and (b) a brief
explanation and interpretation of the results (including why such results are obtained).
Execution time (unit: ms)
                                  number of threads
schedule   chunk size     1    2    4    6    8   10   12   14   16
static     1
dynamic    1
guided     1
static     5
dynamic    5
guided     5
static     10
dynamic    10
guided     10
static     100
dynamic    100
guided     100

Performance (1/execution time)
                                  number of threads
schedule   chunk size     1    2    4    6    8   10   12   14   16
static     1
dynamic    1
guided     1
static     5
dynamic    5
guided     5
static     10
dynamic    10
guided     10
static     100
dynamic    100
guided     100
[Problem 3] Create a demo video file (.mp4 format) that shows compilation and execution of your source files 
(prob1.c, prob2.c). The size of the demo video file should be less than 50MB.