
MA9070 Simulation and Machine Learning Project 2022
1 Methods for Asian options (50% of project credit)
1.1 Overview
An Asian option is an option where the payoff is not determined by the underlying price at
maturity, but by the average underlying price over some preset time interval. Asian options
were originated in Asian markets to prevent option traders from attempting to manipulate the
price of the underlying on the exercise date.
There are a variety of Asian options. We will consider one with the following payoff:
$$\max\left(\frac{1}{N}\sum_{n=1}^{N} S_n - K,\ 0\right),$$
where Sn are daily closing prices of the underlying and K is the fixed strike price. The option
corresponding to this payoff function is called a Fixed Strike Asian Call Option with Discrete
Arithmetic Average.
For the underlying process we will use the geometric Brownian motion
$$dS_t = r S_t\,dt + \sigma(S_t, t)\,S_t\,dW_t, \qquad (1)$$
allowing for the possibility that the volatility can depend on the current time t and current
value of underlying asset St . We refer to this as the local volatility model.
The aim of Part 1 of the project is to price Asian options by Monte-Carlo simulations,
employing different variance reduction techniques.
1.2 Particulars
Unless otherwise specified, use the following parameters:
The strike price is K = 110.
The interest rate is r = 0.05.
The local volatility is given by the function
$$\sigma(S, t) = \sigma_0\,\bigl(1 + \sigma_1 \cos(2\pi t)\bigr)\bigl(1 + \sigma_2 \exp(-S/50)\bigr), \qquad (2)$$
where σ0 = 0.2, σ1 = 0.3 and σ2 = 0.5. Time t is in years.
Assume there are 260 (working) days in a year.
Fix the number of sample paths to be N_paths = 1000.
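As a minimal sketch, the local volatility of Eq. (2) could be coded as a vectorised function as follows. The function name `local_vol` and the module-level constant names are illustrative choices, not part of the brief; only the parameter values come from the text above.

```python
import numpy as np

# Parameter values stated in Sec. 1.2; the constant names are illustrative.
SIGMA0, SIGMA1, SIGMA2 = 0.2, 0.3, 0.5


def local_vol(S, t):
    """Local volatility of Eq. (2).

    S may be a scalar or a NumPy array of asset values; t is the time in years.
    """
    seasonal = 1.0 + SIGMA1 * np.cos(2.0 * np.pi * t)   # time-dependent factor
    level = 1.0 + SIGMA2 * np.exp(-S / 50.0)            # price-dependent factor
    return SIGMA0 * seasonal * level


# Volatility at the strike level, at the start and middle of a year.
print(local_vol(110.0, 0.0), local_vol(110.0, 0.5))
```

Accepting arrays for S lets the same function be applied to all sample paths at once inside a time-stepping loop.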
1.3 Computational Tasks
Programme the local volatility Eq. (2) in a Python function and then write separate
functions to price an Asian option:
– without variance reduction (naive method);
– with antithetic variance reduction;
– with control variates (see below).
Use Euler time stepping with a time step of one day. Each function should return
option price and variance.
After you have fully tested your code, compare the different methods you have implemented.
For this, fix the time to maturity (expiry) to 3 years, i.e., T = 3. Then price
the option for three values of the spot price S0 = S(t = 0):
S0 < K, S0 = K, S0 > K
You are free to choose sensible values of S0 to give a good assessment of how the
methods are performing under different situations. For each method you have implemented,
evaluate the option at three values of S0. From the variances you can obtain
95% confidence intervals for each case.
Write code to plot option price as a function of spot price over the range S0 = 10
to S0 = 180. You only need to plot the option price using the method that gives the
smallest variance.
Using a method of your choice, programme a function to compute the delta for the
Asian option. You only need to implement one method, but ideally it should be a
method with small variance. Write code to plot the delta over the same range of spot
prices as the previous item.
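The naive and antithetic pricing tasks above might be sketched as follows. This is one possible layout under stated assumptions, not a model solution: the function names, the seeding, and the choice to return the variance of the estimator (sample variance divided by the number of independent samples) are all assumptions made here. The average runs over the N daily closes after t = 0, matching the payoff in Sec. 1.1.

```python
import numpy as np


def local_vol(S, t, s0=0.2, s1=0.3, s2=0.5):
    """Local volatility of Eq. (2); t in years."""
    return s0 * (1 + s1 * np.cos(2 * np.pi * t)) * (1 + s2 * np.exp(-S / 50))


def discounted_payoffs(S0, Z, K=110.0, r=0.05, days_per_year=260):
    """Euler paths of Eq. (1) driven by normal draws Z of shape (n_paths, n_days)."""
    n_paths, n_days = Z.shape
    dt = 1.0 / days_per_year
    S = np.full(n_paths, float(S0))
    total = np.zeros(n_paths)
    for n in range(n_days):
        S = S + r * S * dt + local_vol(S, n * dt) * S * np.sqrt(dt) * Z[:, n]
        total += S  # accumulate the daily closes S_1, ..., S_N
    T = n_days * dt
    return np.exp(-r * T) * np.maximum(total / n_days - K, 0.0)


def price_naive(S0, T=3.0, n_paths=1000, seed=0, days_per_year=260):
    """Plain Monte Carlo: returns (price, variance of the price estimator)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_paths, round(T * days_per_year)))
    X = discounted_payoffs(S0, Z, days_per_year=days_per_year)
    return X.mean(), X.var(ddof=1) / n_paths


def price_antithetic(S0, T=3.0, n_paths=1000, seed=0, days_per_year=260):
    """Antithetic variates: n_paths/2 draws Z, each paired with its mirror -Z."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_paths // 2, round(T * days_per_year)))
    X = 0.5 * (discounted_payoffs(S0, Z, days_per_year=days_per_year)
               + discounted_payoffs(S0, -Z, days_per_year=days_per_year))
    return X.mean(), X.var(ddof=1) / len(X)  # pair averages are the i.i.d. samples


def ci95(price, var):
    """Normal-approximation 95% confidence interval from the estimator variance."""
    half = 1.96 * np.sqrt(var)
    return price - half, price + half


for S0 in (90.0, 110.0, 130.0):  # illustrative spots with S0 < K, S0 = K, S0 > K
    for name, pricer in (("naive", price_naive), ("antithetic", price_antithetic)):
        price, var = pricer(S0)
        low, high = ci95(price, var)
        print(f"S0={S0:5.1f} {name:10s} price={price:7.3f} CI=({low:.3f}, {high:.3f})")
```

Using half as many independent draws in the antithetic version keeps the total number of simulated paths at 1000, so the two variances can be compared at roughly equal cost.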
1.4 Report contents
See general discussion of report contents in Sec. 3 and 4. The report should follow the
structure of the computational tasks with the aim to produce a report that leads the reader
clearly through the tasks undertaken. The report should summarize the overall picture of how
the various variance reduction methods perform and the dependence of option prices and deltas on S0.
A few specific things to consider for this part of the project are:
Your Python code should be commented so that it is clear how you have implemented
each variance reduction technique.
In addition, to make the report understandable independently of the Python code, you
should include markdown cells that briefly state which variance reduction methods you have
implemented. These can be short explanations of a few sentences. You do not need to give
an analysis of the variance reduction.
Report the results of your runs for different methods and different S0 in a clear
and understandable form. Discuss the benefits and/or disadvantages of the different
methods. Taking into account additional cost of variance reduction computations,
determine which method is the most efficient for this problem.
The plots of option price and corresponding delta should be clear. You can be creative
here and plot prices and deltas for a few values of the time to maturity T to show
the evolution with time to maturity. You could also contrast Asian option prices with
the European counterparts. You should summarize and discuss your plots, possibly
including from a financial perspective.
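For the delta that accompanies the price plots, one candidate method is a central finite difference with common random numbers, sketched below. The brief leaves the method open, so this is only one option; the bump size `h`, the function names and the seeding are assumptions made for illustration. Reusing the same normal draws for the bumped-up and bumped-down spots is what keeps the variance of the difference small.

```python
import numpy as np


def local_vol(S, t, s0=0.2, s1=0.3, s2=0.5):
    """Local volatility of Eq. (2); t in years."""
    return s0 * (1 + s1 * np.cos(2 * np.pi * t)) * (1 + s2 * np.exp(-S / 50))


def discounted_payoffs(S0, Z, K=110.0, r=0.05, days_per_year=260):
    """Euler paths of Eq. (1) driven by normal draws Z of shape (n_paths, n_days)."""
    n_paths, n_days = Z.shape
    dt = 1.0 / days_per_year
    S = np.full(n_paths, float(S0))
    total = np.zeros(n_paths)
    for n in range(n_days):
        S = S + r * S * dt + local_vol(S, n * dt) * S * np.sqrt(dt) * Z[:, n]
        total += S
    return np.exp(-r * n_days * dt) * np.maximum(total / n_days - K, 0.0)


def delta_central(S0, h=1.0, T=3.0, n_paths=1000, seed=0, days_per_year=260):
    """Central-difference delta; both bumped prices share the same draws Z."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_paths, round(T * days_per_year)))
    up = discounted_payoffs(S0 + h, Z, days_per_year=days_per_year)
    down = discounted_payoffs(S0 - h, Z, days_per_year=days_per_year)
    diff = (up - down) / (2.0 * h)          # per-path finite difference
    return diff.mean(), diff.var(ddof=1) / n_paths


# Delta on a coarse grid of spot prices (a finer grid would be used for plotting).
for S0 in (10.0, 60.0, 110.0, 180.0):
    delta, var = delta_central(S0)
    print(f"S0={S0:6.1f}  delta={delta:.4f}  std.err={np.sqrt(var):.4f}")
```

The bump size trades bias against noise; a short experiment with a few values of `h` would be a reasonable way to justify the choice in the report.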
1.5 Control variates
There are three possible control variates one can consider:
1. $Z_T$,
2. $e^{-rT}\max(Z_T - K,\ 0)$,
3. $\max\left(\left(\prod_{n=0}^{N} Z_n\right)^{\frac{1}{N+1}} - K,\ 0\right)$,
where Zt is governed by the geometric Brownian motion
$$dZ_t = r Z_t\,dt + \sigma Z_t\,dW_t, \qquad (3)$$
with constant r and σ.
The volatility in our model varies, but not too much, so one can expect that the discounted
payoff for the Asian option computed along a geometric Brownian path in the local volatility
model will be highly correlated with a corresponding constant-volatility geometric Brownian
path. In practice, one simulates (3) alongside the simulation of (1). From these simulations,
the different control variates depending on Zt are available. A simple choice for σ is σ(S0,0),
(why?). Other choices are possible and might be better.
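A minimal sketch of control variate 1 follows, assuming Z is driven by the same Brownian increments as S and started at Z0 = S0 with σ = σ(S0, 0). Two choices here are assumptions rather than requirements of the brief: Z is advanced with the exact log-normal step, so that its mean S0 exp(rT) holds without discretisation error, and the coefficient b is estimated from the same sample, which introduces a small bias. The function name is hypothetical.

```python
import numpy as np


def local_vol(S, t, s0=0.2, s1=0.3, s2=0.5):
    """Local volatility of Eq. (2); t in years."""
    return s0 * (1 + s1 * np.cos(2 * np.pi * t)) * (1 + s2 * np.exp(-S / 50))


def price_control_variate_ZT(S0, T=3.0, K=110.0, r=0.05, n_paths=1000,
                             days_per_year=260, seed=0):
    """Asian call price using Z_T as control variate.

    Returns (price, estimator variance with control, estimator variance without).
    """
    rng = np.random.default_rng(seed)
    n_days = round(T * days_per_year)
    dt = 1.0 / days_per_year
    sig_c = local_vol(S0, 0.0)              # constant volatility used for Z
    S = np.full(n_paths, float(S0))
    Z = np.full(n_paths, float(S0))
    total = np.zeros(n_paths)
    for n in range(n_days):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)   # shared increments
        S = S + r * S * dt + local_vol(S, n * dt) * S * dW            # Euler, Eq. (1)
        Z = Z * np.exp((r - 0.5 * sig_c**2) * dt + sig_c * dW)        # exact, Eq. (3)
        total += S
    X = np.exp(-r * T) * np.maximum(total / n_days - K, 0.0)  # Asian payoffs
    mean_Z = S0 * np.exp(r * T)                               # known E[Z_T]
    cov = np.cov(X, Z)                                        # 2x2 sample covariance
    b = cov[0, 1] / cov[1, 1]                                 # variance-minimising b
    Y = X - b * (Z - mean_Z)                                  # controlled samples
    return Y.mean(), Y.var(ddof=1) / n_paths, X.var(ddof=1) / n_paths


price, var_cv, var_plain = price_control_variate_ZT(110.0)
print(f"price={price:.3f}  variance ratio (plain/controlled)={var_plain / var_cv:.2f}")
```

The same loop could return the other two control variates by also accumulating the running product (or sum of logarithms) of Z.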
The first control variate is just the value of ZT at the final time, and hence has a known
expectation (mean), just as was used for European options. The second is the discounted
payoff for a European call option, and hence the expectation is given by the Black-Scholes
formula. The third is the discounted payoff for a geometrically averaged Asian option, for
which there is also a formula for the expectation:
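$$Z_0 \exp\bigl((r_g - r)T\bigr)\,N(d_1) - K \exp(-rT)\,N(d_2),$$
where N(·) denotes the cumulative distribution function of the standard normal distribution and
$$\sigma_g = \sigma\sqrt{\frac{2N+1}{6(N+1)}},$$
$$r_g = \frac{1}{2}\left(r - \frac{1}{2}\sigma_g^2\right),$$
$$d_1 = \frac{\log(Z_0/K) + \left(r_g + \frac{1}{2}\sigma_g^2\right)T}{\sigma_g\sqrt{T}},$$
$$d_2 = d_1 - \sigma_g\sqrt{T}.$$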
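The known expectations needed for control variates 2 and 3 could be coded as below. This is a sketch that transcribes the formulas as stated in this section; the function names are illustrative, and the standard-library `NormalDist` is used for N(·) purely to avoid extra dependencies.

```python
from math import exp, log, sqrt
from statistics import NormalDist

norm_cdf = NormalDist().cdf  # N(.) in the formulas above


def black_scholes_call(Z0, K, r, sigma, T):
    """Black-Scholes price of a European call: expectation of control variate 2."""
    d1 = (log(Z0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return Z0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def geometric_asian_call(Z0, K, r, sigma, T, N):
    """Expectation of the geometric-average Asian call, as given in this section."""
    sigma_g = sigma * sqrt((2 * N + 1) / (6 * (N + 1)))
    r_g = 0.5 * (r - 0.5 * sigma_g**2)
    d1 = (log(Z0 / K) + (r_g + 0.5 * sigma_g**2) * T) / (sigma_g * sqrt(T))
    d2 = d1 - sigma_g * sqrt(T)
    return Z0 * exp((r_g - r) * T) * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Illustrative values: at-the-money, three years of daily steps, sigma = 0.2.
print(black_scholes_call(110.0, 110.0, 0.05, 0.2, 3.0))
print(geometric_asian_call(110.0, 110.0, 0.05, 0.2, 3.0, N=780))
```

A quick sanity check before using either value as a control mean is to compare it with a Monte-Carlo average of the corresponding payoff along the constant-volatility paths.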
This part of the project is challenging. You might not succeed at correctly implementing
all three methods. It is strongly recommended that you focus on control variate 1. Only after
you have completed other parts of the project should you attempt the other control variates.
2 Machine Learning: Credit Approval Data (50% of project credit)
2.1 Overview
A popular use of machine learning is predicting credit risk. This talk1 by Soledad Galli
of Zopa provides an excellent overview of the steps and procedures involved in an actual
deployment. While it would be far too much to attack all these steps in this project, we will
consider a limited set of tasks using a pre-processed dataset for credit card approvals.
The aim of Part 2 of the project is to train, test, and evaluate the performance of different
classifiers in predicting credit card approval.
2.2 Particulars
A popular dataset used to examine machine learning classifiers is the Australian Credit
Approval Dataset2 hosted on the UC Irvine Machine Learning Repository. "This file concerns
credit card applications. All attribute names and values have been changed to meaningless
symbols to protect confidentiality of the data. This dataset is interesting because there is a
good mix of attributes – continuous, nominal with small numbers of values, and nominal
with larger numbers of values. There are also a few missing values."
We consider the dataset with all categorical values replaced by numerical values. Missing
values have been replaced by the mode of the attribute (categorical values) or by the mean of
the attribute (continuous values). The dataset contains 690 examples. The credit card approval
information is contained in the last column and encoded as 0 for “not approved” and +1 for
“approved”. This column is the label vector. The remaining columns contain the features. The
dataset will be posted on my.wbs as a comma-separated-values file, australian.csv, along
with a description of the dataset.
1 https://www.youtube.com/watch?v=KHGGlozsRtA&ab_channel=PyData
2 https://archive-beta.ics.uci.edu/ml/datasets/statlog+australian+credit+approval
Scikit-learn will be used for all machine learning tasks. Pandas and seaborn will be useful
for importing, inspecting and visualizing the data.
2.3 Tasks
Using pandas, read the dataset and verify that it is sensible. Using pandas and/or
seaborn, provide a summary of the dataset. (See "Report contents" below.)
Extract the design matrix X and vector of labels y from the data. Create a train-test
split. Scale the data appropriately.
Perform a sanity check of the training data by running a cross validation score of the
SVC classifier with default parameters. Report the mean cross validation score. This
will provide a baseline score of what one can expect from a basic classifier without
any tuning of hyperparameters.
Now consider the linear and rbf kernels for the SVC classifier and tune the hyperparameters
for the two kernels. Standard tuning of hyperparameters would mean tuning
the regularisation parameter C for the linear kernel and C and the scale parameter gamma
for the rbf kernel. You do not need to tune more than these hyperparameters, although
you may consider more if they do not require a large amount of computer time to tune.
Based on mean cross validation scores, decide final hyperparameter values for the two
kernels.
Test and compare the two classifiers using the tuned hyperparameters. (See "Report
contents" for suggestions on what you might compare.)
Now consider other classifiers from the scikit-learn library. You must consider the
MLP classifier but should in addition consider the Decision Tree and Random Forest
Classifiers.
For the MLP classifier, you should investigate tuning the hidden layers, but this can
result in large computation times, and you should not leave code in the notebook that
would take a long time to run. (See "Report contents" below.)
For the Decision Tree, Random Forest, or any other classifiers that you investigate,
you may briefly investigate different hyperparameters.
Finally, it is possible to investigate which features are most important in determining
the classification. You are encouraged to investigate this. One useful approach
is to look at permutation_importance in the scikit-learn library. Also, if you run
a Decision Tree with a small depth and output the tree, you can see what features
are important. You might want to use seaborn to visualize the connection between
important features and the label.
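The split, scaling, baseline and SVC tuning steps above might be organised as in the following sketch. Synthetic data from `make_classification` stands in for australian.csv so that the example runs on its own; the 14-feature shape, the grid values, the 75/25 split and the 5-fold cross validation are all assumptions chosen for illustration. With the real file, X and y would instead be taken from the DataFrame read with pandas (last column as the label).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the real dataset: 690 examples, binary labels 0/1.
X, y = make_classification(n_samples=690, n_features=14, n_informative=6,
                           random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Baseline: default SVC. Scaling sits inside the pipeline so that each
# cross-validation fold is scaled using its own training part only.
baseline = make_pipeline(StandardScaler(), SVC())
baseline_cv = cross_val_score(baseline, X_train, y_train, cv=5).mean()
print(f"baseline mean CV accuracy: {baseline_cv:.3f}")

# Small illustrative grids; make_pipeline names the SVC step "svc".
grids = {
    "linear": {"svc__C": [0.1, 1, 10]},
    "rbf": {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
}

results = {}
for kernel, grid in grids.items():
    search = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel=kernel)),
                          grid, cv=5)
    search.fit(X_train, y_train)            # refits the best model on all training data
    results[kernel] = (search.best_params_, search.best_score_,
                       search.score(X_test, y_test))
    print(kernel, results[kernel])
```

The test set is touched only once per kernel, after the hyperparameters have been fixed from the cross-validation scores.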
2.4 Report contents
See general discussion of report contents in Sec. 3 and 4. The report should follow the
structure of the computational tasks with the aim to produce a report that leads the reader
clearly through the tasks undertaken and then summarizes the overall picture of how the
various classifiers perform and possibly connects this with the structure of the dataset.
A few specific points to consider are:
After reading the dataset, you need to briefly summarize its contents to the reader
using pandas and/or seaborn. At a minimum you want to use the .describe() method,
but ideally you should include some useful plots.
The Python code for tuning the hyperparameters for the SVC classifier should be
included in your submission. Make sure you explain, or print, or plot results from the
tuning of hyperparameters so that the final choice is clear from reading the report.
While you are strongly encouraged to investigate different choices for the hidden layers
in the MLP classifier, there are too many possibilities here for you to include Python
code for this tuning in your submission. You should briefly summarize in words in
the report what you tried. The Python code should contain only the final MLP that you
decided on. (Other code can stay in as long as it is commented out and does not execute
when the notebook is run.)
For any other classifiers you run, please be succinct.
When evaluating classifiers, you will surely want to generate confusion matrices
and classification reports. Since the goal is to predict credit card approval, false
positives (incorrectly predicting 1) are considered worse than false negatives (incorrectly
predicting 0). This means that the precision of predicting 1 and the recall (sensitivity)
of predicting 0 are especially important.
The complexity of models is also something that can be discussed when comparing
classifiers. This is a relatively small dataset and so there is some danger of overfitting.
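The two metrics singled out above, precision for class 1 and recall for class 0, can be read off with scikit-learn as in this small sketch. The label vectors are made up for illustration and are not results from the credit dataset.

```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels: 0 = "not approved", 1 = "approved".
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

# For binary labels the flattened confusion matrix is (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision_1 = precision_score(y_true, y_pred, pos_label=1)  # tp / (tp + fp)
recall_0 = recall_score(y_true, y_pred, pos_label=0)        # tn / (tn + fp)

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"precision(1)={precision_1:.3f}  recall(0)={recall_0:.3f}")
print(classification_report(y_true, y_pred,
                            target_names=["not approved", "approved"]))
```

Both quantities have the false-positive count in the denominator, which is why they are the ones that degrade when a classifier approves applications it should have rejected.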
3 Report Notebooks
Your project work will be reported in two separate JupyterLab notebooks, one for each part
of the project. Each notebook should run without errors and produce your report.
Each notebook should begin with a concise introduction. These can typically be one or
at most two paragraphs and should describe what the notebook contains and/or give some
motivation to the work.
You should:
Use section headings and possibly horizontal lines to give your report structure.
Explain to the reader the purpose or goal of each section. Be brief, focusing on what is
being done and why.
Python code should be commented. You want to communicate concisely at the top of
code cells what task is being performed in the cell. You also need to include comments
for blocks of code that perform specific tasks. You should assume that the reader
understands Python. Do not comment line-by-line what is obvious.
Clearly label all plots!
Explain parameter choices you have made. Describe and interpret your results. It is
important that you interpret your findings. Findings will often be in the form of a
plot. End individual sections and/or whole notebooks with a brief summary of your
findings.
A very useful guide to constructing a clear notebook is the following. Run the notebook
and then collapse all code cells. The introduction, results, plots, and any discussion should
be readable as a short report.
Further points:
There is no specific guidance for length other than to include all the material in the
descriptions above. It is better to produce a shorter report that clearly and concisely addresses
all the required points.
– Do Not include numerous non-illustrative plots.
– Do Not explain the Python code line-by-line.
Do Not include irrelevant material and discussion.
In developing and testing your codes you will surely need some Python code that does
not belong in your final report. This is normal. However, such things should not be
included in your submitted report. A useful way to approach this is to leave all code in
place until you have finalised your work. Then remove any code cells unnecessary to
the final report.
It is not necessary to include citations in your report to numerical methods or to
example Python code covered in the module lectures and labs. You are permitted to
use sections of code directly from the examples in the scikit-learn documentation or
User Guide. If you do, include a simple comment line in the code saying where the
code is from. For example:
# This follows the examples section of the
# sklearn.svm.SVC documentation.
In the unlikely event that you use methods or Python code examples not covered in the
module, then you must cite the source.
Write in passive voice or use the editorial we (as in “We see that ...”). Do not use
contractions, e.g., “don’t”, “haven’t”, etc.
4 Further details
4.1 Marks
Marks will be awarded for the project in line with the Generic WBS Marking Criteria
with technical capability, found at the bottom of this page3. Specifically, the criteria are:
Technical Capability [40%]. This includes using appropriate and correct methods
and algorithms, implementing correct Python coding, and using appropriate external
libraries. Correctly completing all tasks is of primary importance.
Academic Writing [20%]. Results should not only be accurate, but they must also be
presented in a clear, structured, and understandable form. Plots and other outputs must
be labelled and described. The use of relevant literature; referencing and citation are
not normally significant factors for the project assessment.
Analysis and Critical evaluation [20%]. WBS considers these to be separate criteria, but
we will treat them as a single criterion. Results must be interpreted. Justification
must be given for the various choices made in the project work. Both parts of the
project should contain a concise introduction and concise and informative discussion
of the findings.
Comprehension [20%]. Showing deep knowledge & understanding of the subject
matter and its context. Originality will also be assessed here.
As already emphasised, satisfying these criteria does not require lengthy reports.
3 https://my.wbs.ac.uk/-/teaching/216161/resources/in/870142/item/690223/
4.2 Project Submission
The project must be submitted electronically through my.wbs:
The submission will consist of a single zip file named uxxxxxxx.zip, where xxxxxxx
are the digits of your University ID. The zip file will contain two Jupyter notebooks,
plus any modules and data needed to run the notebooks, e.g. you should include the
australian.csv file.
The marker should be able to unzip your submission and run each notebook without
error and without any additional input or files.
Important: before submitting, you should restart the kernel and run all cells in
each notebook. You should then save the notebooks in the run state. This way
your submission contains two notebooks exactly in the state that you last ran
them.
It is the student’s responsibility to ensure that the zip file is not corrupt.
Marks will be deducted for not following these procedures.
4.3 Rules and Regulations
This project is to be completed by individuals only and is not a group exercise. Plagiarism is
taken extremely seriously and any student found to be guilty of plagiarism of fellow students
will be severely punished.
4.3.1 Plagiarism
Please ensure that any work submitted by you for assessment has been correctly referenced
as WBS expects all students to demonstrate the highest standards of academic integrity at all
times and treats all cases of poor academic practice and suspected plagiarism very seriously.
You can find information on these matters on my.wbs, in your student handbook and on the
library pages here.
It is important to note that it is not permissible to reuse work which has already been
submitted for credit either at WBS or at another institution (unless explicitly told that you
can do so). This would be considered self-plagiarism and could result in significant mark
reductions.
Upon submission of your assignment, you will be asked to sign a plagiarism declaration.