首页 >
> 详细

DFT Parallelization in OpenMP

The FFT is the most famous library to solve a Fourier Transform (FT) using

the Fast Fourier Transform algorithm. The goal of this exercise is to develop a

fast DFT (Discrete Fourier Transform, a different algorithm to calculate FT) by

parallelizing the serial C code that is provided as DFT_seq.c.

The code calculates direct and inverse DFTs, timing the total time taken for the

computation of the two DFTs. The computational core of the program is the

DFT subroutine that takes idft argument as 1 for the direct DFT and -1 for the

inverse DFT.

The code also prints the value of the first element of the DFT computation. This

is equal to the input size (N). You can use this number to quickly check the

correctness of your implementation.

You will focus on parallelizing the DFT function:

int DFT(int idft, double *xr, double *xi, double *Xr_o, double *Xi_o, int N)

{

// Region 1: main DFT function

for (int k = 0; k < N; k++)

{

for (int n = 0; n < N; n++)

{

// Real part of X[k]

Xr_o[k] += xr[n] * cos(n * k * PI2 / N) + idft * xi[n] * sin(n*k*PI2/N);

// Imaginary part of X[k]

Xi_o[k] += -idft * xr[n] * sin(n * k * PI2 / N) + xi[n] * cos(n*k*PI2/N);

}

}

// Region 2: Normalizing IDFT

if (idft == -1)

{

for (int n = 0; n < N; n++)

{

Xr_o[n] /= N;

Xi_o[n] /= N;

}

}

return 1;

}

In this exercise, N=10000. Please provide a comprehensive report to answer the

following questions. You should plot figures to show your testing results in the

report.

Questions:

1

1. (1 point) Using OpenMP to parallelize the given sequential code in

DFT_seq.c . The goal is to achieve the best possible performance for DFT

with the correct result and no false sharing. Note that the scheduling policy

and where to parallelize will affect your final code performance. Your plot

should reflect the execution time for the runs on 1, 2, 4, 8 and 16 threads.

The performance measurement should be an average of 5 repeated runs.

Note that you need to parallelize all the code regions that are necessary

for performance improvement.

2. (1 point) Zoom into Region 1, which is the main DFT function nested

loops, please write the two OpenMP parallelization versions for outer and

inner loops in the report and discuss which design choice is better for

performance and why. After this, using the one design above that has

the better performance as the baseline, please discuss how the scheduling

policy choices (default, static and dynamic) may impact its final execution

time. Plot them out in a figure to reflect the runs on 1, 2, 4, 8 and 16

threads. Note that please design your scheduling policy that avoids false

sharing. Also, it is possible that they don’t present significant performance

differences. You have to summarize your findings.

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-21:00
- 微信：codinghelp

- 代写 Lab 2: Threads 2022-05-10
- 辅导assessment 1. Present Your Client ... 2022-05-10
- 5Cce2sas辅导、Python，Java程序辅导 2022-05-10
- 代写brae Webb编程 2022-05-09
- 辅导csci 3110 Assignment 1 2022-05-09
- Mth2222 Assignment 2代写 2022-05-09
- Cse3bdc Assignment 2022辅导 2022-05-08
- 辅导cis 468、辅导java，Python编程 2022-05-08
- Comp Sci 4094/4194/7094 Assignment 3 D... 2022-05-07
- Cs 178: Machine Learning & Data Mining... 2022-05-07
- Data7703 Assignment 4 2022-05-07
- 讲解assignment 2: Databases 2022-04-25
- 辅导ait681 Static Analysis 2022-04-25
- Cse121 & Cse121l 编程辅导、辅导c++程序语言 2022-04-25
- 辅导iti1120 Bject-Oriented Programming 2022-04-25
- Cmt304语言辅导、辅导c++，Python编程 2022-04-25
- 辅导comp/Engn4528 Computer Vision 2022-04-24
- 辅导fin 2200 Bloomberg Investment Proj... 2022-04-24
- 辅导bism 7255 Uml Assignment 2022-04-23
- 讲解comp202 Programming Assignment 2022-04-23