首页 > > 详细

辅导KIT308/408、Programming讲解、讲解Java,c++,Python程序语言讲解R语言编程|讲解留学生Prolog

KIT308/408 Multicore Architecture and Programming
Assignment 1 — Multithreading
Aims of the assignment
The purpose of this assignment is to give you experience at writing a multithreaded program for the CPU. This assignment will
give you an opportunity to demonstrate your understanding of:
creating and handling multiple threads;
partitioning work;
dynamic scheduling; and
data-sharing between threads.
Due Date
11:55pm Friday 21st of August (Week 6 of semester)
Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have been
followed (see the School Office or this link for details) assignments which are submitted late without good reason will be
subject to mark penalties if they are accepted at all (see the School office or this link for details on this as well).
Forms to request extensions of time to submit assignments are available from the School of Engineering and ICT office.
Requests must be accompanied by suitable documentation and should be submitted before the assignment due date.
Due to the tight schedule of this unit (and the reliance of future assignments on a solution to this one), extensions will be unable
to be granted and late assignments will not be accepted. If exception circumstances occur for an individual student they must
contact the lecture-in-charge before the assignment due date to arrange alternative methods of assessment in this unit.
Assignment Submission
Your assignment is to be submitted electronically via MyLO and should contain:
An assignment cover sheet;
A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the
format provided in the downloadable materials provided below).
A document containing:
A table of timing information comparing the original single-threaded times against each of three stages on each
scene file.
An analysis of the above timing data.
You do not need to (and shouldn't) submit executables, temporary object files, or images. In particular, you
must delete the ".vs" diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however
delete the "Scenes" folder or the "Outputs" folder (but do delete the images within this one).
Task/Topic Marks
1. Basic Multithreaded CPU Implementation 30%
Correct (5%) and elegant (5%) thread management
(i.e. the correct image for any number of threads (even >64 threads)
10%
Correct use of "-thread" command-line argument 5%
Successful use of "-testMode" and "-colourise" command-line arguments 5%
Correct partitioning of render for images with height equally divisible by thread count
(i.e. the correct image, with exactly equal work area for each thread, and with the correct number of samples
rendered for: Tests 1, 2, 4, 5, and 7 with 8 threads)
5%
Correct partitioning of render for images with height NOT equally divisible by thread count
(i.e. the correct image, with work area allocated "evenly" across threads, and with the correct number of
samples rendered for: Tests 3 and 6 with 8 threads, and also Test 1 with 3, 5, 6, 7, and 513 threads)
5%
2. Dynamically Balanced Line-Based Multithreaded CPU Implementation 20%
Correct (5%) and elegant (5%) thread management 10%
Correct (5%) and elegant (5%) data-sharing between threads
(i.e. no possibility of threads "fighting" over shared memory locations)
10%
3. Dynamically Balanced Square-Based Multithreaded CPU Implementation 30%
Correct and elegant thread management 5%
Correct and elegant data-sharing between threads 5%
Correct use of "-blockSize" command-line argument 5%
Correct partitioning of render into squares for images with size that is equally divisible by block size
(i.e. the correct image, with exactly equal work area for each block, rendered for: Tests 1, 2, 5, and 7 with
block size 8 and 64)
5%
Correct partitioning of render into squares for images with size that is NOT equally divisible by block size
(i.e. the correct image, with work area for each block either at block size or smaller at edges, rendered for:
Tests 3, 4, and 6 with block size 8 and 64)
5%
Correct partitioning of render with no extra samples generated
(i.e. the correct image, with work area for each block either at block size or smaller at edges, and with the
correct number of samples rendered for: Tests 3, 4, and 6 with block size 8 and 64)
5%
Documentation 20%
Outputs showing timing information for each stage on all applicable scene files with all thread counts
(to get full marks for this part your code needs to pass all tests for all stages above)
10%
Analysis of data with respect to CPU used
(to get full marks for this part your code needs to pass all tests for all stages above)
10%
Penalties
Failure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, submission of
unnecessary temporary or image files, abnormal solution/project structure, etc.)
-10%
Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated
code without documentation, compiler warnings, etc.)
up to -20%
Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days)
Late assignments will not be accepted
up to -100%
Marking
This assignment will be marked out of 100. The following is the breakdown of marks:
Programming Style
This assignment is not focussed on programming style, but you should endeavour to follow good programming practices. You
should, for example:
comment your code;
use sensible variables names;
use correct and consistent indenting; and
internally document (with comments) any notable design decisions.
[NOTE: any examples in the provided assignment materials that don't live up to the above criteria, should be considered to be
deliberate examples of what not to do and are provided to aid your learning ;P]
The Assignment Task
You are to implement a "simple" raytracer in a multithreaded fashion (for a quick definition of raytracing see the wikipedia page).
From the provided single-threaded raytracer implementation, you will create multiple subsequent versions as follows:
1. A basic multithreaded CPU implementation.
2. A dynamically balanced line-based multithreaded CPU implementation.
3. A dynamically balanced square-based multithreaded CPU implementation.
Implementation
1. Basic Multithreaded CPU Implementation
This stage involves splitting up the render across multiple threads by splitting up the render into equal sized chunks (or almost
equal sized chunks when the number of threads doesn't divide evenly into the image height) with each chunk being rendered
entirely by an individual thread.
In order to complete this step you will need to:
Write code to partition the rendering job into chunks (so that each thread generates a different part of the final image).
Write code for the to create and manage multiple threads.
At the end of this stage the program should be able to handle any image size (any dimensions that are multiples of four, at least,
up to the maximum of 2048x2048).
To be eligible for full marks, your assignment should also use the command-line option "-threads" to set the number of threads to
any value that is less than or equal to the height of the image being rendered and also respect the "-colourise" option to tint each
section of the image with a set of colours (colour re-use is expected for a large number of threads).
2. Dynamically Balanced Line-Based Multithreaded CPU Implementation
This stage involves splitting up the render into lines which are dynamically allocated to threads from the thread pool as they
request more work.
In order to complete this step you will need to:
Start as many threads as required (specified from the command-line arguments) for the thread pool.
Manage some shared data between the threads so that each do the correct work.
3. Dynamically Balanced Square-Based Multithreaded CPU Implementation
This stage involves splitting up the render into squares which are dynamically allocated to thread from the thread pool as they
request more work.
In order to complete this step you will need to:
Partition the render into squares (requires more carefulness when writing results to the image).
NOTE: this implementation is (slighty) harder than Stage 2, but might not result in an increase in performance.
Hints / Tips
The techniques required to complete each stage rely heavily on work done in the tutorials 2 and 3 — refer to them often.
Documentation
For each stage of the assignment you attempt you should provide:
average (of 3 runs) timing information for each scene file for the total time taken for a render for particular thread counts.
See the next section for an example format for this timing table that specifies which tests are required for which scenes;
and
an explanation of the results (e.g. why there's no difference between the performance of stages 1, 2, and 3 (NOTE: this is
a made up example and isn't necessarily what to expect), or why a particular type of implementation works well (or
poorly) on a particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the
tests, and you should discuss how the architectural features of the CPU explain the results.
Tests / Timing
The following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests that
need to be performed in order to fully complete the documentation section of the assignment.
In order to confirm your images match the images created by the base version of the assignment code, it's strongly
recommended you use a image comparison tool. For part of the marking for this, Image Magick will be used:
Image Magick: download for Windows/Mac/Linux.
e.g. running this from the command-line to test:
magick.exe compare -metric mae Outputs\cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp
Outputs_REFERENCE\cornell.txt_1024x1024x4_RayTracerAss1.exe.bmp diff.bmp
Test
Average Time (Milliseconds)
Single
Threaded Multithreaded Technique / (Stage)
Threads
1 2 3 4 5 6 7 8
1. -input
Scenes/cornell.txt -size
1024 1024 -samples 1
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
2. -input
Scenes/cornell.txt -size
1024 1024 -samples 4
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
3. -input
Scenes/cornell.txt -size
500 300 -samples 1
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
4. -input
Scenes/allmaterials.txt
-size 1000 1000 -
samples 1
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
5. -input Scenes/cornell-
199lights.txt -size 256
256 -samples 1
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
6. -input
Scenes/5000spheres.txt
-size 480 270 -samples
1
Static Chunks (Stage 1) X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X
Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X
7. -input Scenes/dudes.txt
-size 256 256 -samples
1
Static Chunks (Stage 1)
Dynamic Lines (Stage 2)
Dynamic Squares (Stage 3) -- blocksize 8
Dynamic Squares (Stage 3) -- blocksize 64
This produces an image ("diff.bmp") showing the differences between the two source images, and also a numeric summary
of the difference, here these images are 0.13266% different):
0.33827 (0.0013266)
(this example deliberately compares the 1x1 sampled to the 4x4 sampled image to show there is a difference).
Note: to fully complete the empty cells in the table below, around 180 total images will need be calculated (each test needs 3
runs, and the first six tests need to be run with the base code, and with 8 threads for each stage of the assignment, and test 7
needs to be run for 1–8 threads for stage 1, stage 2, and stage 3 with two different block sizes), so plan your time accordingly.
Provided Materials
The materials provided with this assignment contain:
the source code of the base single-threaded version of the raytracer;
a set of scene files to be supplied to the program;
a set of reference output files created from the single-threaded version of the program;
four batch files that will run all 7 timing tests for the base code, and each of the three stages; and
a further set of five batch files that will run comparison tests (assuming Image Magick is installed in its default location) for
each of the three stages.
Download the materials as an ZIP file.
Source Code
The provided code consists of 19 source files.
Raytracing logic:
Raytrace.cpp: this file contains the main function which reads the supplied scene file, begins the raytracing, and
writes the output BMP file. The main render loop, ray trace function, and handling of reflection and refraction is also
in this file.
Intersection.h and Intersection.cpp: these files define a datastructure for capturing relevant information at the point
of intersection between a ray and a scene object and functions for testing for individual ray-object collisions and rayscene
collisions.
Lighting.h and Lighting.cpp: these files provide functions to apply a lighting calculation at a single intersection point.
Texturing.h and Texturing.cpp: these files provide functions for the reading points from 3D procedural textures.
Constants.h: this header provide constant definitions used in the raytracing.
Basic types:
Primitives.h: this header contains definitions for points, vector, and rays. It also provides functions and overloaded
operators for performing calculations with vectors and points.
SceneObjects.h: this header file provides definitions for scene objects (ie. materials, lights, spheres, and boxes).
Colour.h: this header defines a datastructure for representing colours (with each colour component represented as a
float) and simple operations on colours, including conversions to/from the standard BGR pixel format.
Scene definition and I/O:
Scene.h and Scene.cpp: the header file contains the datastructure to represent a scene and a single function that
initialises this datastructure from a file. The scene datastructure itself consists of properties of the scene and lists of
the various scene objects as described above. The implementation file contains many functions to aide in the scene
loading process. Scene loading relies upon the functionality provided by the Config class.
Config.h and Config.cpp: this class provide facilities for parsing the scene file.
SimpleString.h: this is helper string class used by the Config class.
Image I/O:
ImageIO.h and ImageIO.cpp: these files contain the definitions of functions to read and write BMP files.
Miscellaneous:
Timer.h: this class provides a simple timer that makes use of different system functions depending on whether
TARGET_WINDOWS, TARGET_PPU, or TARGET_SPU is defined (we don't use the latter two, but I left this file unchanged in
case anyone wanted to see how such cross-platform stuff can be handled).
Executing
The program has the following functionality:
By default it will attempt to load the scene "Scenes/cornell.txt" and render it at 1024x1024 with 1x1 samples.
By default it will output a file named "Outputs/[scenefile-name]_[width]x[height]x[sample-level]_[executablefilename].bmp"
(e.g. with all the default options, "Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp")
It takes command line arguments that allow the user to specify the width and height, the anti-aliasing level (must be a
power of two), the name of the source scene file, the name of the destination BMP file, and the number of times to perform
the render (to improve the timing information).
Additionally it accepts some arguments (that are currently unused) for setting the number of threads, whether each thread
will tint the area that it renders, and the size of the block to render (only applicable for stage 3).
Further it accepts an arguments "-testMode" that will fill the output with white. When executing with multiple threads this
should be a greyscale colour that represents the thread number.
It loads the specified scene.
It renders the scene (as many times as requested).
It produces timing information for the average time taken to produce a render ignoring all file IO, and the number of
samples generated per run.
It outputs the rendered scene as a BMP file.
For example, running the program at the command line with no arguments would perform the first test (as described in the scene
file section):
On execution this would produce output similar to the following (as well as writing the resultant BMP file to
Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp):
average time taken (1 run(s)): 3672ms
Testing Batch Files
A number of batch files are provided that are intended to be executed from the command line, e.g.
For timing:
Stage1Timing.bat 3 8 will perform all the timing tests required for Stage 1 (i.e. 3 runs with 8 threads one each Test
scene).
Stage2Timing.bat 3 8 will perform all the timing tests required for Stage 2 (i.e. 3 runs with 8 threads one each Test
scene).
Stage3Timing.bat 3 8 8 will perform all the timing tests required for Stage 3 at a particular block size (i.e. 3 runs
with 8 threads and block size 8 on each Test scene).
For testing (requires Image Magick installation), e.g.:
Stage1Tests1.bat will perform all the comparison required for Stage 1 Tests where the thread count evenly divides
into the image height.
Stage1Tests2.bat will perform all the comparison required for Stage 1 Tests where the thread count doesn't evenly
divide into the image height.
Ray Tracing
The materials provided with this assignment include an implementation of a simple raytracer based on the raytracing tutorial at
codermind.com (links currently broken, but still available via the wayback machine here). This tutorial also provides a pretty good
introduction to the concepts of ray tracing.
Apart from a general structure rewrite and simplification of the code, the supplied implementation has the following similarities
and differences to the codermind tutorial:
it retains support for:
spheres;
diffuse (Lambert) and specular (Blinn-Phong) lighting;
shadows;
super-sampled anti-aliasing;
simple exposure;
a conic perspective camera model; and
reflection and refraction.
it is extended, in that is provides additional support for:
(axis-aligned) boxes;
(simple) procedural 3D textures;
setting the camera's position and (xz) rotation;
reading and writing BMP files.
it is simplified, in that it doesn't provide support for:
blobs;
gamma calculations;
perlin noise based procedural textures;
bump mapping;
auto exposure estimation;
bitmap texturing;
cubemap texturing;
an environment cubemap;
depth of field; and
support for reflection and refraction in the same material.
For those of you who want more information (as it's not entirely necessary to understand these concepts to be able to complete
the assignment) on some of the general and/or mathematical aspects of raytracing, see:
General explanations:
Codermind Tutorial (archived mirror): A step-by-step guide and the original source of the codebase of the
assignment materials.
Project Amiga Juggler: A step-by-step guide to the maths and implementation of a raytracer (handling only spheres
and a ground plane) in Java:
Step 1: vectors, rays, dot-product, and cross-product.
Step 5: camera model.
Step 8: ray-sphere intersections and Phong shading.
Step 9: ray-plane intersections.
Step 13: reflection.
Wikipedia's ray tracing page: a basic outline of the concepts.
Princeton Ray Casting Lectures: concepts, maths, and some pseudo code.
Princeton Illumination lectures: concepts, maths, and some pseudo code.
Softsurfer Algorithms: geometry algorithms.
Intersections:
Ray-sphere intersections: wikipedia, codermind (part 1), Project Amiga Juggler (step 8), Princeton Ray Casting
Lectures (Slide 11–14).
Ray-AABB (axis-aligned bounding box) intersections: medium.
Ray-plane intersections: wikipedia, Princeton Ray Casting Lectures (Slide 17), Softsurfer.com Algorithms.
Ray-triangle intersections: wikipedia, source paper, lighthouse3d, one more explanation.
Ray-something intersections: Real-time Rendering.
Lighting:
Diffuse lighting: codermind (part 1), Princeton Illumination Lectures (Slide 19–21).
Specular lighting: codermind (part 2), Princeton Illumination Lectures (Slide 23–25).
Shadowing: Project Amiga Juggler (step 8).
Perlin Noise:
2D: wikipedia.
3D: Understanding Perlin Noise.
One of the scenes uses voxel characters created by miklovesrobots originally in the .VOX format:
Voxel character / objects: Mini Mike's Metro Minis.
Magica Voxel editor and resources: MagicaVoxel.
Voxel file format: voxel-model.

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!