UNSW Sydney Business School
School of Risk and Actuarial Studies
ACTL1101
Introduction to Actuarial Studies
Main Assignment
(Due date: 28 July, Friday 4pm)
T2 2023
June 2023
Contents
1 Part One: Taxation Data by Postcode
1.1 Context
1.2 Your Tasks
2 Part Two: Optimal Investments
2.1 Context
2.2 Your Tasks
3 Format Requirements
4 Marking Criteria
4.1 Part One
4.2 Part Two
4.3 Plagiarism Awareness
4.4 Late Penalties
5 Answering Students’ Questions
A Variable Description for Part One
1. Part One: Taxation Data by Postcode
1.1 Context
In this part of the assignment, you will perform an analysis (and create visualisations) of
a dataset which contains variables mostly related to income and taxation within different
Australian postcodes for tax year 2018-2019. Some information about the schools (elementary
and secondary) within those postcodes is also present in the dataset (last 4 columns).
Warning: because some postcodes do not have any schools in them, those last 4 variables
contain many ‘NA’ values. Other variables may also contain some ‘NA’ values.
Each of your datasets consists of 800 randomly generated records and can be downloaded
as a CSV file. At the end of this document, you can find "Appendix A: Variable Description",
which provides a brief description of each variable and its meaning. It is important to
understand the representation and meaning of the variables used in order to interpret the
results accurately.
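As an illustration of the warning about ‘NA’ values, the sketch below shows one possible way to skip missing entries before computing a summary statistic. It uses Python's standard library only. The rows are made up for the example (only the column names come from Appendix A), and the helper names to_float and column_without_na are hypothetical.

```python
import csv
import io
import statistics

# Made-up rows for illustration only: the column names come from Appendix A,
# the values are NOT taken from the real dataset.
SAMPLE_CSV = """Postcode,State,Private.Health.Proportion,ICSEA
2000,NSW,0.61,NA
2031,NSW,0.72,1105
3000,VIC,NA,1050
4000,QLD,0.58,NA
"""


def to_float(text):
    """Convert a CSV cell to float, mapping the literal 'NA' (or blank) to None."""
    text = text.strip()
    return None if text in ("NA", "") else float(text)


def column_without_na(rows, name):
    """Return the non-missing values of one column as floats."""
    values = (to_float(row[name]) for row in rows)
    return [v for v in values if v is not None]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
health = column_without_na(rows, "Private.Health.Proportion")
icsea = column_without_na(rows, "ICSEA")

# Number of records versus number of usable values in each column.
print(len(rows), len(health), len(icsea))
# Mean computed over the non-missing values only.
print(round(statistics.mean(health), 4))
```

In R, the analogous idea is usually expressed with the na.rm = TRUE argument of functions such as mean() and sd().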
For your information, those datasets are ‘real’ (and publicly available). The taxation and
income data is available here¹ (with its license found here). The school data is available
here.
¹ We must disclose that, compared to the original income/tax data found here, we have modified many
variables to obtain averages by individual (as opposed to the total amounts by postcode found in the original
data). We have also removed many variables from the original dataset, and deleted one postcode for which
the average tax rate was strongly negative (we consider this postcode to be an outlier which we do not want to
include in our analysis).
ACTL1101 Introduction to Actuarial Studies 2023 Main Assignment
1.2 Your Tasks
1. (1pt) Produce a visualisation of the distribution of variable Private.Health.Proportion
across all postcodes. Briefly describe this distribution.
2. (1pt) Produce a table containing, for each State:
– the number of postcodes within that state
– the mean of variable Private.Health.Proportion (across postcodes within that state)
– the standard deviation of variable Private.Health.Proportion (across postcodes
within that state)
3. (1pt) Create a new variable called Avg.Gross.Rent, which is simply:
Avg.Gross.Rent = Gross.Rent.Amt / Total.Nb.
Then, compute the sample correlation between Avg.Gross.Rent and Avg.Tax.Rate.
Report and briefly interpret your result.
4. (2pts) Add a variable called Tax.Bracket to this dataset. This new variable should be
based on variable Avg.Tax.Rate, and be equal to:
– ‘Low’ if Avg.Tax.Rate is below its 25% quantile.
– ‘Medium’ if Avg.Tax.Rate is equal to or above its 25% quantile and below its 75% quantile.
– ‘High’ if Avg.Tax.Rate is equal to or above its 75% quantile and below its 99% quantile.
– ‘Very High’ if Avg.Tax.Rate is equal to or above its 99% quantile.
Then, report the average Avg.Income within each Tax.Bracket.
Hint: consider using the R function quantile().
5. (2pts) Produce a visualisation which illustrates the relationship between variable
Private.Health.Proportion and the new variable Tax.Bracket. Briefly discuss what
you observe.
6. (3pts) Open Question: use any variable(s) you want in this dataset to tell a brief story
about the data. This can be anything you find relevant, but you must include at
least one visualisation to support your ‘story’. Example: an interesting/surprising link
between variables, an insight that could help set a new public policy (or improve an
existing one), a finding that is the starting point for new research, etc.
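To illustrate the quantile-based bracket rule stated in Task 4, here is a minimal sketch using Python's standard library. It runs on synthetic numbers, not on the assignment data, and the names tax_rate, income and tax_bracket are hypothetical. The "inclusive" method of statistics.quantiles interpolates linearly between order statistics, which should correspond to the default behaviour of R's quantile(); treat that correspondence as an assumption to verify against your own data.

```python
import random
import statistics
from collections import defaultdict

random.seed(1)
# Synthetic stand-ins for Avg.Tax.Rate and Avg.Income (NOT the assignment data).
tax_rate = [random.uniform(0.05, 0.40) for _ in range(800)]
income = [30000 + 200000 * t + random.gauss(0, 5000) for t in tax_rate]

# n=100 returns 99 cut points at 1%, 2%, ..., 99%.
cuts = statistics.quantiles(tax_rate, n=100, method="inclusive")
q25, q75, q99 = cuts[24], cuts[74], cuts[98]


def tax_bracket(rate):
    """Map a tax rate to its bracket; each lower bound is inclusive."""
    if rate < q25:
        return "Low"
    if rate < q75:
        return "Medium"
    if rate < q99:
        return "High"
    return "Very High"


brackets = [tax_bracket(t) for t in tax_rate]

# Group the income values by bracket, then report count and mean per bracket.
by_bracket = defaultdict(list)
for label, value in zip(brackets, income):
    by_bracket[label].append(value)

for label in ("Low", "Medium", "High", "Very High"):
    values = by_bracket[label]
    print(label, len(values), round(statistics.mean(values)))
```

Because the ‘Very High’ bracket only holds the top 1% of postcodes, its average rests on very few records (8 out of 800 here), which is worth keeping in mind when interpreting it.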
2. Part Two: Optimal Investments
2.1 Context
In this second part of the assignment (which is totally unrelated to the first part), you will
work on an investment problem that would be difficult to tackle without programming. Use
the R Shiny app to retrieve the numerical values of r, μ0, and w0 that correspond to your
zID, and incorporate them into your analysis. The context is as follows.
You want to invest your money and you have two investment Options. Both of them will
yield a random rate of return. However, Option B is substantially riskier than Option A. Their
dynamics are as follows:
• An amount of 1 invested in Option A will yield a random amount A, with
$$A = 1 + U \cdot r, \qquad (2.1)$$
where U is a Uniform(0,1) random variable and r is a constant.
• An amount of 1 invested in Option B will yield a random amount B, with
$$B = \exp\left(\mu_0 + \sqrt{0.1^2 - c^2}\, Z + c\, \Phi^{-1}(U)\right), \qquad (2.2)$$
where U is the same Uniform(0,1) random variable as in Equation (2.1), Z is a N(0,1) random
variable (independent of U), $\Phi^{-1}(\cdot)$ is the quantile function of the standard Normal
distribution, and μ0, c are constants (with 0 ≤ c ≤ 0.1).
You make financial decisions using a utility function given by:
$$v(w) = 1 - \exp(-w), \quad \text{for } w \in \mathbb{R},$$
and you invest all your wealth w0 in some proportion to Option A, and in some proportion to
Option B. Call γ the proportion of your wealth you invest in Option B (where 0 ≤ γ ≤ 1). Your
final (and random) wealth W is then equal to:
$$W = w_0 \left[(1 - \gamma) A + \gamma B\right].$$
Note 1: Don’t worry about the function $\Phi^{-1}(\cdot)$; simply know that you can get this function in R as
qnorm(). So, to get $\Phi^{-1}(U)$ you would write qnorm(U).
Note 2: We stress that the same U is used in the calculation of Option A and Option B. This
induces a correlation between A and B.
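The sketch below illustrates how the shared U in Equations (2.1)-(2.2) ties A and B together, using Python's standard library (NormalDist().inv_cdf plays the role of qnorm()). The function name generate_ab, its extra arguments, and the default values of r and mu0 are placeholders chosen for the example; they are not the values the Shiny app will give you. One observation that may help interpretation: since Φ⁻¹(U) is itself a standard Normal variable, log B has variance (0.1² − c²) + c² = 0.1² whatever the value of c, so c changes the dependence between A and B but not the marginal distribution of B.

```python
import math
import random
from statistics import NormalDist


def generate_ab(n, c=0.0, r=0.05, mu0=0.03, seed=None):
    """Simulate n pairs (A, B); r and mu0 are placeholder values, not app values."""
    if not 0.0 <= c <= 0.1:
        raise ValueError("c must satisfy 0 <= c <= 0.1")
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf  # standard Normal quantile function
    sigma_z = math.sqrt(max(0.0, 0.1 ** 2 - c ** 2))
    pairs = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # the quantile function is undefined at exactly 0
            u = rng.random()
        z = rng.gauss(0.0, 1.0)  # independent of u
        a = 1.0 + u * r
        b = math.exp(mu0 + sigma_z * z + c * inv_cdf(u))  # same u as in a
        pairs.append((a, b))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation between the A and B components."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


# With c = 0 the two options share no randomness; with larger c they co-move.
print(round(correlation(generate_ab(2000, c=0.0, seed=123)), 3))
print(round(correlation(generate_ab(2000, c=0.08, seed=123)), 3))
```

This loop-based version is written for readability; in R, the same logic is naturally vectorised with runif(n), rnorm(n) and qnorm().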
2.2 Your Tasks
1. (3pt) Create a function called generate.AB. This function has two arguments: n (no
default value) and c (default value of 0). This function does the following:
– It generates n random pairs of (A,B) under the dynamic given by Equations
(2.1)-(2.2), with constant c specified via the second argument ‘c’ of the function.
– It returns this sample as a matrix of size n × 2 (n rows, 2 columns). The first
column is the sample (A1, . . . , An); the second column is the sample (B1, . . . , Bn).
Then, use your function to create a scatter plot of a sample (A1, B1), . . . , (An, Bn) for
n = 2000 and c = 0.08 (the A values should be on the x axis, while the B values should
be on the y axis) and briefly comment on the relationship between A and B.
Hint: In this question, vectorization is your friend. Remember that a single command
like rnorm(n) can generate a vector of n random variables in one go.
2. (2pt) Use a visualisation of your choice to illustrate the relationship between:
– the correlation ρ(A,B);
– the constant c.
Then, briefly analyse and interpret this relationship.
Hint: We do not know the theoretical correlation between A and B, but we can use
function generate.AB() to obtain a sample from (A,B) and then use the R function
cor() to estimate the correlation, from that sample.
3. (3pt) Following this investment strategy, your Expected Utility is:
$$E[v(W)] = E[1 - \exp(-W)] = E\left[1 - \exp\left(-w_0 (1 - \gamma) A - w_0 \gamma B\right)\right].$$
Assume that c = 0.05. Find the γ which maximises E[v(W )]. [Recall that 0 ≤ γ ≤ 1.]
Hint 1: It would be hard to compute E[v(W )] with pen and paper, but here again you
can use generate.AB() to obtain a sample v(W1), . . . , v(Wn). You can then compute
the mean of that sample, which is a good estimate of E[v(W )]. You can repeat that for
many values of γ, so as to find the (approximate) γ that maximises Expected Utility.
Hint 2: The ‘computational burden’ here is quite high, as you need a fairly large sample
(we suggest n > 200,000) to approximate E[v(W )] by the sample mean of v(W ). That
said, in this process you only need to generate ONE sample (A1, B1), . . . , (An, Bn).
Indeed, using this ONE sample you can then obtain samples v(W1), . . . , v(Wn) for
different values of γ. In other words, you do not need to generate a new random
sample for every different value of γ.
4. (1pt) Assume that c = 0.05. If you follow this investment strategy with the γ value
as derived in the previous question, what is: Pr[W < w0] (i.e. the probability you
experience a loss)? [Here again, we expect an approximate answer based on R
simulations, not a pen-and-paper calculation.]
5. (1pt) Include in an Appendix the pseudocode of all the tasks you performed in R/Python
for questions 1 to 4. [You are encouraged to write your pseudocode before the actual
implementation in R/Python, as this helps structure the coding process and organise
your thoughts.]
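The idea in Hint 2 of Task 3 (simulate ONE sample, then reuse it for every candidate γ) can be sketched as follows in standard-library Python. Everything numeric here is an assumption made for the example: R, MU0 and W0 are placeholders rather than values from the Shiny app, r is assumed positive, the grid step of 0.01 is arbitrary, and the sample size is kept at 20,000 only so that this pure-Python loop runs quickly (the handout suggests n > 200,000, which is more practical with vectorised R or NumPy code). The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

# Placeholder parameters for illustration only (NOT values from the Shiny app).
R, MU0, W0, C = 0.05, 0.04, 2.0, 0.05


def simulate_pairs(n, c, r, mu0, seed=0):
    """Draw n pairs (A, B) following Equations (2.1)-(2.2)."""
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf
    sigma_z = math.sqrt(max(0.0, 0.1 ** 2 - c ** 2))
    pairs = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # the quantile function is undefined at exactly 0
            u = rng.random()
        z = rng.gauss(0.0, 1.0)
        pairs.append((1.0 + u * r, math.exp(mu0 + sigma_z * z + c * inv_cdf(u))))
    return pairs


def expected_utility(pairs, gamma, w0):
    """Sample mean of v(W) = 1 - exp(-W) with W = w0 * ((1 - gamma) A + gamma B)."""
    return fmean([1.0 - math.exp(-w0 * ((1.0 - gamma) * a + gamma * b))
                  for a, b in pairs])


def loss_probability(pairs, gamma):
    """Estimate Pr[W < w0], i.e. the share of draws with (1 - gamma) A + gamma B < 1."""
    losses = sum(1 for a, b in pairs if (1.0 - gamma) * a + gamma * b < 1.0)
    return losses / len(pairs)


# One sample, reused for every gamma on the grid 0.00, 0.01, ..., 1.00.
pairs = simulate_pairs(20000, C, R, MU0)
grid = [i / 100 for i in range(101)]
best_gamma = max(grid, key=lambda g: expected_utility(pairs, g, W0))

print(best_gamma)
print(expected_utility(pairs, best_gamma, W0))
print(loss_probability(pairs, best_gamma))
```

Reusing the same sample across the grid means every γ is compared on identical random draws, so differences in the estimated Expected Utility reflect γ rather than simulation noise.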
3. Format Requirements
• You must submit your assignment on Turnitin (under the "Main Assignment" section in
Moodle).
• You must submit two files:
– A .pdf file: contains your answers to all questions.
– A .R/py file: contains all the R or Python code you used to produce your answers.
• About the .pdf file:
– It includes a title page with your student name and student zID.
– The page format is A4 (the standard Australian format).
– The minimum font size used is equivalent to "Times New Roman" size 11.
– The minimum line spacing used is 1.15.
– The margins should not be narrower than the "narrow" option in Word (0.5 inches
on every side).
– The answers (including sub-parts) are numbered in the same way they are numbered
in the statement of the questions.
– Your answers to Part One (including plots) must fit on 2 pages.
– Your answers to Part Two (including plots) must fit on 1 page.
– The main body of the pdf (i.e., the 3 pages) need not contain R or Python code
but must include everything that is asked in the questions (e.g., visualizations,
tables, numbers, explanations, etc.).
– All R or Python code necessary to produce your results must be placed in an
Appendix to your .pdf file. There is no page limit on this Appendix, but the
efficiency of your code will be graded (see Marking Criteria). To be clear: we
want the entirety of your R or Python code to be present in your Appendix (as well
as in your .R/py file).
– This R or Python code Appendix must be made of text (not images).
• Specifically about your .R/py file:
– Your R or Python code must run as it is (and produce exactly the results in your
assignment). If we cannot run your code, you will lose ALL marks associated with
the R or Python code (C1 and C2 in the Marking Criteria).
– Your R or Python code must contain ALL steps necessary to answer the questions
in this assignment. To be specific: you are NOT allowed to do any data manipulation
in Excel or any software other than R or Python.
4. Marking Criteria
Each individual Question is allocated a fixed number of marks. To assess your answers,
we will use a series of criteria. Those criteria are stated below, with a brief description that
corresponds to a ‘HD mark’. Not all criteria are relevant to every sub-question: find a detailed
mapping below.
C1: Code Correctness: Your code, functions and algorithms produce exactly the
desired results, and do not produce any irrelevant/superfluous results.
C2: Code Efficiency: Your code is extremely efficient, without sacrificing readability.
Your R code is extremely well organised and easy to follow.
C3: Analysis: Your analysis is insightful and accurate. Your interpretation of your
results is correct, clear, precise and shows a great depth of understanding and critical
thinking. Your writing is concise, fluent and devoid of typos, grammatical and syntactical
mistakes.
C4: Choice of Visualisation: Your choice of which visualisation to use is excellent: it
conveys all (and only) the appropriate information.
C5: Presentation: The formatting and presentation of your results and/or visualisations
is impeccable: clear, readable and aesthetic.
C6: Pseudocode: Your pseudocode is clearly written in a neutral syntax and contains
all steps necessary to reproduce your results.
4.1 Part One
For each sub question in Part One, the relevant marking criteria are:
Q1: C1 (20%), C3 (30%), C4 (30%), C5 (20%)
Q2: C1 (50%), C2 (30%), C5 (20%)
Q3: C1 (60%), C3 (40%)
Q4: C1 (60%), C2 (40%)
Q5: C1 (20%), C3 (30%), C4 (30%), C5 (20%)
Q6: C3 (50%), C4 (25%), C5 (25%)
4.2 Part Two
For each sub question in Part Two, the relevant marking criteria are:
Q1: C1 (40%), C2 (20%), C3 (20%), C5 (20%)
Q2: C1 (20%), C2 (20%), C3 (20%), C4 (20%), C5 (20%)
Q3: C1 (60%), C2 (40%)
Q4: C1 (60%), C2 (40%)
Q5: C6 (100%)
4.3 Plagiarism Awareness
This is an individual assignment. While we have no problem with students discussing
assignment problems if they wish, the material each student submits must be their own
individual work. Students should make sure they understand what plagiarism is.
In particular, any R or Python code you present must be from your own computer, and
developed by you alone. With ≈360 students performing the same task, some small elements
of code are likely to be similar. However, big patches of identical code (even with different
variable names, layout, or comments) will be considered suspicious and investigated for
plagiarism. Turnitin picks this up easily, so cases of plagiarism have a very high probability
of being discovered. The best strategy to avoid any problem is to never share bits and pieces
of code with other students.
4.4 Late Penalties
Penalties for late assignments are as indicated in the course outline:
Late submission will incur a penalty of 5% per day or part thereof (including weekends)
from the due date and time. An assessment will not be accepted after 5 days (120 hours)
of the original deadline unless special consideration has been approved.
Hence, be careful: 4.99 days of lateness gives you a penalty of 25%, but 5.01 days of
lateness gives you a penalty of 100%.
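The arithmetic behind this warning can be sketched as a small Python function. This reflects one reading of the rule quoted above (5% per day or part thereof, nothing accepted beyond 5 days); the function name late_penalty_percent is hypothetical, and the course outline remains the authoritative source.

```python
import math


def late_penalty_percent(days_late):
    """Penalty in percent under the quoted rule, for lateness measured in days."""
    if days_late <= 0:
        return 0  # submitted on time
    if days_late > 5:
        return 100  # beyond 120 hours the assessment is not accepted
    return 5 * math.ceil(days_late)  # 5% per day or part thereof


for days in (0, 0.2, 1.0, 4.99, 5.01):
    print(days, late_penalty_percent(days))
```

The jump from 25% to 100% comes from the acceptance cut-off, not from the per-day rate.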
5. Answering Students’ Questions
Questions or clarification about the assignment must be posted on the Ed Forum. We do not
plan to give out many additional hints, but if we were to do so, we want everyone to benefit
from them.
Important Note: The deadline for submission of this assignment is 28 July at 16:00. However,
we will stop answering any questions about the assignment on Wednesday 26 July at 16:00.
The rationale for this is twofold:
• we want to incentivise students to start the assignment early;
• we want to be fair to assiduous students who decide to submit their assignment ahead
of time. Were we to give hints right before the deadline, those students would be
penalised for their earliness.
A. Variable Description for Part One
Postcode: Identifier of an Australian postcode (for which all other variables are recorded)
State: State in Australia where the Postcode is located
Total.Nb: Total number of individuals (with tax returns) in that postcode
Total.Income: Total taxable income (all sources) for that postcode
Net.Tax.Amt: Total net tax paid
Avg.HELP: Average student HECS-HELP Debt repayment
Avg.Salary: Average salary or wages
Total.Income.Amt: Total income or loss (including non taxable income)
Avg.Income: Average total income or loss
Avg.Tax: Average tax paid
Avg.Tax.Rate: Average tax rate paid (Net.Tax.Amt divided by Total.Income.Amt)
Avg.Work.Expenses: Average work related expenses (all expenses)
Employer.Super.Contributions.Amt: Total reportable employer superannuation contributions
Net.Capital.Gain.Amt: Total net capital gain
Tax.Net.Capital.Gain.Amt: Total estimated tax on net capital gains
Avg.Foreign.Income: Average assessable foreign source income
Gross.Rent.Amt: Total gross rent (income)
Personal.Super.Contributions.Amt: Total personal superannuation contributions made
Total.Business.Income.Amt: Total business income
Total.Business.Exp.Amt: Total business expenses
Net.Business.Income.Amt: Total net income or loss from business
Business.Net.Tax.Amt: Total estimated business net tax
Private.Health.Proportion: Number of individuals with private health insurance divided
by Total.Nb
ICSEA: Average Index of Community Socio-Educational Advantage (see link for details)
LBOTE: Average proportion of students with a ‘Language Background other than English’
Indigenous: Average proportion of indigenous students
Teaching.Ratio: Average number of teaching staff per student