Lab 3: A Simple MapReduce-Style Wordcount Application
CMPSC 473, SUMMER 2022
Released on July 21, 2022, due on August 04, 2022, 11:59:59pm
Raj Pandey and Bhuvan Urgaonkar
1 Purpose and Background
This project is designed to give you experience in writing multi-threaded programs by
implementing a simplified MapReduce-style wordcount application. By working on this
project:
• You will learn to write multi-threaded code that correctly deals with race conditions.
• You will carry out a simple performance evaluation to examine the performance
impact of (i) the degree of parallelism in the mapper stage and (ii) the size of the
shared buffer which the two stages of your application will use to communicate.
[Figure 1 diagram: Input File -> (read) -> Mappers -> (produce) -> Buffer -> (consume) -> Reducer -> (write) -> Output File]
Figure 1: Overview of our MapReduce-style multi-threaded wordcount application.
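As a rough illustration of the pipeline in Figure 1, the sketch below runs several mapper threads that produce (word, count) pairs into a bounded buffer and one reducer thread that consumes them. It is written in Python for brevity and is not the lab's required design: `queue.Queue` stands in for the shared buffer and does its own locking, whereas a hand-built buffer would need explicit synchronization to avoid race conditions. The names `mapper`, `reducer`, `wordcount`, `num_mappers` and `buffer_size`, the sentinel convention, and the way input lines are divided among mappers are all assumptions made for this example.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # assumed convention: each mapper puts this once when it is done


def mapper(lines, buffer):
    """Count the words in this mapper's share of the input, then produce the pairs."""
    local_counts = Counter()
    for line in lines:
        local_counts.update(line.split())
    for pair in local_counts.items():
        buffer.put(pair)          # blocks while the bounded buffer is full
    buffer.put(SENTINEL)


def reducer(buffer, num_mappers, totals):
    """Consume pairs until every mapper has signalled completion."""
    finished = 0
    while finished < num_mappers:
        item = buffer.get()       # blocks while the buffer is empty
        if item is SENTINEL:
            finished += 1
        else:
            word, count = item
            totals[word] += count


def wordcount(lines, num_mappers=4, buffer_size=8):
    """Return (word, count) pairs sorted by word."""
    buffer = queue.Queue(maxsize=buffer_size)   # stand-in for the shared buffer
    totals = Counter()                          # only the reducer thread writes to it
    chunks = [lines[i::num_mappers] for i in range(num_mappers)]
    producers = [threading.Thread(target=mapper, args=(chunk, buffer))
                 for chunk in chunks]
    consumer = threading.Thread(target=reducer,
                                args=(buffer, num_mappers, totals))
    consumer.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    consumer.join()
    # Python sorts strings by code point, so digits come before letters.
    return sorted(totals.items())


print(wordcount(["b a b", "a c", "b"], num_mappers=2, buffer_size=1))
# prints [('a', 2), ('b', 3), ('c', 1)]
```

Here `num_mappers` and `buffer_size` correspond to the two quantities the performance evaluation varies: the degree of parallelism in the mapper stage and the size of the shared buffer.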
The wordcount application takes a text file as input and produces as output the count
of each unique word in the input file, sorted in ascending alphabetical
order. We will assume that the words within our input files will only contain letters
of the English alphabet and the digits 0-9 (i.e., no punctuation marks or other special
characters). Our wordcount will consist of two stages. The first stage, called "mapper,"
DPBS1110 Evidence-Based Problem Solving
Assessment 2b: Assessment Guide and Marking Rubric
You have been asked to investigate further and provide a recommendation to the NSW
government. In particular, you have been asked to recommend whether the NSW
government needs to intervene to address the housing stress problem and ease
the pressure of rising interest rates. Recommend at least one policy that the NSW
government could consider.
1. Solve It! 'Statistical Toolbox'
This section of the report is approximately 500 words (guide only, not a word limit).
After your initial investigation and feedback from various stakeholders, it has been
advised that there are other important factors which affect housing stress (e.g. low
income, age and financial literacy).
To understand whether these factors play an important role, you will need to
(i) formulate a multiple linear regression model and (ii) explain whether the
relationship is statistically and/or economically significant.
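One possible way to write such a model down is sketched here. The choice of regressors is an assumption made for illustration, and `finlit` is a hypothetical short name for the Financial Literacy variable; the other names come from the data note at the end of this guide. In the formulas, n is the number of observations and k is the number of regressors.

```latex
\begin{align*}
\text{hcost}_i &= \beta_0 + \beta_1\,\text{lowinc}_i + \beta_2\,\text{age}_i
                  + \beta_3\,\text{finlit}_i + u_i \\
H_0 &: \beta_j = 0 \quad \text{against} \quad H_1 : \beta_j \neq 0 \\
t_j &= \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} \\
\text{95\% CI} &: \hat{\beta}_j \pm t_{0.975,\,n-k-1}\,\operatorname{se}(\hat{\beta}_j)
\end{align*}
```

A coefficient is usually called statistically significant when the test rejects the null hypothesis, or equivalently when the interval excludes zero. Whether it is economically significant depends on whether its size matters in practice.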
The use of a multiple linear regression, confidence intervals and hypothesis testing
would help you address this and thus provide evidence for your arguments.
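The sketch below shows, under stated assumptions, how coefficient estimates, standard errors and 95% confidence intervals come out of a multiple linear regression. The data are synthetic and generated inside the example (they are not the assessment data set), the "true" coefficients are made up, `finlit` is a hypothetical name for the Financial Literacy variable, and the intervals use a normal approximation to the t distribution, which is reasonable only because the sample is large. In practice you would likely use a statistics package rather than hand-written code.

```python
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares fit with an intercept; returns (beta, se, ci95)."""
    x = [[1.0] + list(r) for r in rows]          # prepend the constant term
    n, k = len(x), len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve(xtx, xty)                       # normal equations (X'X) b = X'y
    resid = [yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - k) # residual variance estimate
    # Diagonal of (X'X)^{-1}: solve against each unit vector and keep entry j.
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j]
            for j in range(k)]
    se = [(sigma2 * d) ** 0.5 for d in diag]
    z = NormalDist().inv_cdf(0.975)              # normal approximation (assumption)
    ci95 = [(b - z * s, b + z * s) for b, s in zip(beta, se)]
    return beta, se, ci95


# Synthetic data with made-up coefficients, for illustration only.
rng = random.Random(0)
rows, y = [], []
for _ in range(2000):
    lowinc = rng.randint(0, 1)
    age = rng.uniform(25, 70)
    finlit = rng.randint(1, 5)
    rows.append((lowinc, age, finlit))
    y.append(0.30 + 0.08 * lowinc - 0.001 * age - 0.02 * finlit
             + rng.gauss(0, 0.05))

beta, se, ci95 = ols(rows, y)
for name, b, s, (lo, hi) in zip(["const", "lowinc", "age", "finlit"], beta, se, ci95):
    print(f"{name:8s} coef={b: .4f}  se={s:.4f}  95% CI=({lo: .4f}, {hi: .4f})")
```

An interval that excludes zero points to statistical significance at the 5% level. Judging economic significance means asking whether a coefficient of that size would matter for a household's housing cost ratio.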
In addition to interpreting the results of your analysis, you will also need to draw
attention to issues of causality and confounding, which can affect the conclusions
drawn from the analysis.
As part of this assessment, assumptions and limitations need to be explicitly
identified. For example, what variable would you want to have in an ideal situation to
measure income in this analysis? Do you have this variable in the dataset?
Word limit: A maximum of 1,500 words (no minimum word limit; graphs, figures and
reference list are excluded from the word count). A 10% penalty will apply if you exceed
the word count.
Structure and format: No introduction or executive summary is required. You are
required to write in a business report style (i.e. formal language, etc.).
Referencing style: Harvard (see The 'In-Text' or Harvard method for more information).
2. Solve It! 'Ethics Toolbox'
This section of the report is approximately 500 words (guide only, not a word limit).
Your client, the NSW government, is especially concerned with the ethical issues that
housing stress presents. Apply the 7-step ethical decision-making framework from Unit 3
to one of the following ethical dilemmas:
• Should the NSW government provide food stamps to low-income earners to alleviate
the rising cost of living problem?
• Should the NSW government ask borrowers to undertake a compulsory financial
literacy course before borrowing money from the bank?
For the ethical dilemma you are writing up, apply the 7-step ethics decision-making
framework to formulate a position on the ethical issue selected. Make sure your position
on the ethical issue (step 7 in the framework) is clearly presented.
3. Solve It! 'Information Toolbox'
This section of the report is approximately 500 words (guide only, not a word limit).
After compiling all of the above evidence, you are now ready to consolidate this and other
evidence that you researched, and provide your recommendations to your Team Leader
for inclusion in the main report.
Your instructions are to:
• Write your report with clear problem-solving and decision-making logic.
• Tell a compelling story with a clearly articulated report argument structure.
• You MUST include either a grouping or an argument structure logic tree. (The logic
tree is not included in the word count.)
• Ensure that your recommendations are persuasive and include specific courses of
action for the government.
• Be consistent with the statistical and ethical analysis conducted in the report.
• Comment on the reliability and validity of this report.
Note about the data
These data are realistic but fictitious; they have been constructed for the sole purpose of use in
teaching.
The data set contains 2,000 observations that were collected over a one-month period during
2021. Each observation refers to a different household residing in Sydney that had purchased
a residential property with a mortgage at the time. The variables from the full survey that have
been selected for your use are:
Variable                    Description
lowinc                      Indicator variable =1 if the household is in the lowest two income
                            quintiles, =0 otherwise
lowSEIFA                    Indicator variable =1 if the property is located in an area that is in the
                            lowest two SEIFA quintiles, =0 otherwise
age                         Age of the household head (in years)
comtime                     Normal weekly commuting time (in hours) from the property to work
Financial Literacy          1 = Very Poor; 2 = Poor; 3 = Satisfactory; 4 = Good; 5 = Very Good
Access to financial advice  0 = no; 1 = yes
hcost                       Individual housing cost-income ratios calculated as the ratio of weekly
                            mortgage repayments and weekly gross household income*
* Disposable income is defined as equivalised after-tax income where equivalence scales are used to
adjust for households of different sizes and hence to allow for economies of scale that arise from income
sharing within households. Those households reporting zero or negative income have been excluded
from the analysis.