Faculty of Engineering and Information Technology
The University of Melbourne
COMP90073 Security Analytics,
Semester 2, 2023
Assignment 2: Blue Team & Red Team Cybersecurity
Release: Fri 1 Sep 2023
Due: Tue 17 Oct 2023
Marks: The Project will contribute 25% of your overall mark for the subject.
You will be assigned a mark out of 25, according to the criteria below.
Overview
You’ve recently been brought in to head up the cybersecurity team at FashionMarketData.com,
a major player in providing platforms for online fashion retailers to leverage their
data through machine learning. However, after recently witnessing high profile data
breaches, including at Medibank and AHM, leadership of the company are concerned that
the business might face existential financial, legal, and reputational risks stemming from
hackers potentially manipulating their data, or exploiting their machine learning models.
The CEO has tasked you with heading up the newly formed Blue and Red Team
cybersecurity groups inside the company, and developing a report for the board that outlines
both risks and opportunities for the company. The Blue Team are concerned about users
potentially uploading images that do not match their labels, through either mistaken use of
the platform, or to potentially actively manipulate the company's systems. As such, the Blue
Team are working on designing and implementing systems that ensure only genuine,
fashion-related images are processed and ingested into the company’s crowd sourced
datasets. In practice, this will involve reliably detecting and distinguishing both anomalous
and out of distribution samples.
The Red Team are taking a different bent. Rather than actively defending the company’s
systems, they’re more concerned with understanding the scope of vulnerabilities in machine
learning models that have rapidly become a core part of the company’s business practices.
As such, the team plans to construct evasion and data poisoning attacks against exemplar,
non-production models, and to use these results to build a picture of the vulnerabilities
present within the company’s systems and processes.
Finally, you will need to present a report for the non-technical leadership of
FashionMarketData.com, based upon your insights from working with both the Blue and Red
teams. Due to the critical nature of understanding the risk that the company may face from
its data management and machine learning practices, it is crucial that you deliver this report
at the next meeting of the company's board, which will be on Tuesday the 17th of October,
2023.
Datasets
To understand these vulnerabilities, you have been provided with images encompassing 10
distinct fashion categories, which have primarily been drawn from the Fashion-MNIST
dataset. This dataset consists of 28*28 grayscale images in 10 distinct fashion categories.
This compilation serves as your in-distribution (normal) dataset, representative of the core
and expected content within the fashion domain. Examples of the first 3 fashion categories in
the given dataset are shown below.
Your primary task is to devise and refine multiple algorithms with the specific aim of
identifying anomalies and out-of-distribution (OOD) images. OOD samples refer to images
that do not belong to the fashion categories defined w.r.t. the in-distribution data, such as
airplanes, animals, and hand-written letters/characters, etc. Meanwhile, anomaly samples
pertain to fashion images that diverge from typical fashion items. While they remain
categorically fashion images, they differ from the familiar images in the dataset due to
distortions, rotations, cropping, and similar alterations.
To facilitate this objective, five separate datasets will be made available to you. Each dataset
will play a crucial role in training, validating, and testing the efficiency and accuracy of the
algorithms you develop.
Dataset Description

Training set
Files: [train_data.npy], [train_labels.npy]
This dataset features images from 10 unique fashion categories, labelled from 0 to 9. It acts
as the principal guide to discern the standard content in the fashion domain. You will employ
this dataset for both anomaly and OOD detection tasks.

Validation set (for Anomaly Detection)
Files: [anomaly_validation_data.npy], [anomaly_validation_labels.npy]
This set comprises both original and distorted fashion items. Importantly, items labelled '1'
indicate an anomaly status, while those labelled '0' represent normal data. This validation set
is primarily intended for tuning your model's hyperparameters and reporting its performance
using the relevant metrics in your analysis.

Test set (for Anomaly Detection)
Files: [anomaly_test_data.npy]
The test set comprises original and distorted fashion items, with a similar proportion found in
the validation set. However, unlike the validation set, this dataset contains no labels. As
such, you are required to use your trained model to predict their anomaly statuses.

Validation set (for OOD Detection)
Files: [ood_validation_data.npy], [ood_validation_labels.npy]
This set contains a blend of both fashion and non-fashion items. Notably, items labelled as '1'
signify Out-Of-Distribution (OOD) status, indicating they do not align with the standard
fashion categories. On the other hand, samples labelled '0' represent in-distribution data.
Primarily, this validation set is intended for tuning your model's hyperparameters and for
reporting performance using the relevant metrics in your analysis.

Test set (for OOD Detection)
Files: [ood_test_data.npy]
The test set includes both fashion-centric items and potential non-fashion items, mirroring the
proportion observed in the validation set. Unlike the validation set, this dataset lacks
pre-labelled OOD statuses. As such, you will be required to use your trained model to predict
these OOD statuses.
Note that the NumPy array file (.npy) can be loaded via: data = np.load('input.npy').
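As a minimal sketch of the loading step, the helper below wraps np.load and flattens each image into a vector, which is the shape most shallow detectors expect. The function name load_images is illustrative only, and the flattening assumes each sample is a single grayscale image as described above.

```python
import numpy as np


def load_images(path):
    """Load a .npy image array and return it as float32 with shape (N, pixels)."""
    data = np.load(path)
    # Assumption: the first axis indexes samples; everything else is flattened.
    return data.reshape(len(data), -1).astype(np.float32)
```

For example, load_images('train_data.npy') would return an array with one row per training image.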
Blue Team Tasks
You will create anomaly detection and OOD detection algorithms with the provided training
and validation sets. Following the development phase, these algorithms will be tested on the
given separate test sets. You need to annotate these test sets with anomaly and OOD statuses
derived from each of the detectors.
For anomaly detection, you will develop two distinct detection algorithms:
1) Shallow model
Use a shallow (non neural network) model (e.g., OCSVM, LOF) to develop a detector
for identifying non-fashion items. It might be beneficial to utilize dimensionality
reduction techniques before inputting the data into the detection model.
2) Deep learning model
Develop a deep learning model, such as an autoencoder, to detect whether an item
belongs to the category of fashion items or not.
For OOD detection, you are required to develop a single algorithm.
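One possible shape for the shallow detector is sketched below: dimensionality reduction with PCA followed by a one-class SVM, both from scikit-learn. This is an illustration under stated assumptions rather than a prescribed solution. The function names and the default values of n_components and nu are placeholders that would need tuning on the validation set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_shallow_detector(train_x, n_components=32, nu=0.1):
    """Fit scaling, PCA and a one-class SVM on in-distribution data only."""
    # Assumption: train_x is a 2-D array of flattened images, shape (N, pixels).
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        OneClassSVM(kernel="rbf", gamma="scale", nu=nu),
    )
    model.fit(train_x)
    return model


def predict_anomaly(model, x):
    """Return 1 for samples flagged as anomalous and 0 for normal samples."""
    # OneClassSVM.predict returns +1 for inliers and -1 for outliers,
    # so outliers are mapped to the label 1 used in the validation labels.
    return (model.predict(x) == -1).astype(np.int64)
```

LOF could be swapped in for the one-class SVM in the same pipeline, and the validation labels give a way to compare the two.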
Deliverables
1) The predicted labels for the test sets (submit in a zip archive of .npy files)
• After running each of the three detection algorithms on the test set, the annotated
results (the non-fashion statuses determined by each detector) should be prepared in
the same structured format as the validation set labels.
• For your Blue Team results you will need to generate 3 result files corresponding to
each of the Blue Team approaches. The filenames will be 1.npy (anomaly detection:
shallow model), 2.npy (anomaly detection: deep learning model), and 3.npy (OOD
detection).
2) Python code or Jupyter Notebook (submit as zip archive)
• This should contain the complete code for all three detection algorithms, starting from
data import and preprocessing, progressing to algorithm implementation, and ending
with an appropriate analysis of your results. This may include visualisations to help
emphasise your points.
• It is important that if your code is a Jupyter Notebook, the notebook must contain the
evaluated results of all cells; and if you are using Python, it must be able to be run
completely. In both cases, you must include a supplementary README file that
lists the versions of all libraries used.
• When utilizing any pre-existing code sourced from online resources, you need to
clearly annotate that within your codebase using comments. Furthermore, please
ensure that you provide a comprehensive reference to these sources in your report,
detailing their origins and contributions to your work.
• Ensure the code is well-structured with clear function definitions, variable names, and
comments that explain the purpose of each section or step. Complex or nuanced
sections of the code should have accompanying comments for clarity.
− If submitting the Python code (.py), please provide comments in the code for
major procedures and functions. Also, please include a README file (.txt)
showing instructions on how to run each script. You need to submit a zip archive
containing all scripts (.py) and README.txt.
− If submitting a Jupyter Notebook (.ipynb), incorporate markdown cells to segment
the code and provide explanatory notes or observations. Before submission,
restart the kernel and run the notebook from the beginning to ensure all cells
execute in order and produce the expected outputs. Ensure that all outputs,
especially visualizations or essential printed results, are visible and saved in the
notebook.
• Please include all data preprocessing or visualisation steps you may have
undertaken, even those not included in the report. Use comments to specify the
intent behind each result/graph or what insights were derived from them.
3) Report (submit as PDF)
• Your report should be targeted towards your intended audience and should use
qualitative and quantitative methods to describe your techniques, results, insights,
and understandings. This should include an appropriate description of your choice of
detection algorithms, evaluation methods, and a discussion regarding the
ramifications of these choices. As a part of this, it is important to include any
challenges that you faced, and the decisions or assumptions you made throughout
the process. Your results should be presented in a fashion that is readily
comprehensible to a non-technical audience.
• Your report should include both an introductory executive summary and a conclusion.
The executive summary should provide an overview of the underlying task, offering a
snapshot for readers to understand the context and objective of the report. Following
the body of your report, the conclusion
should encapsulate the primary findings of your investigation or study. Additionally,
this section should present recommendations for potential enhancements or
alternative strategies that might be considered in the future.
• The word limit for the Blue Team (Task I) report is 1500 words. Your main report should not
exceed 7 pages in length. However, any supplementary diagrams, plots, and
references that you wish to include can be added after the main report. These
additional materials will not be considered as part of the word or page count limits.
• You should evaluate your model with at least three appropriate metrics for each
detection algorithm. Some commonly used metrics include AUROC and false positive
(FP) rate. However, depending on the context of your algorithm, other metrics might
be equally or even more relevant. Ensure that your chosen metrics provide a well-rounded
view of the algorithm's performance in its intended application.
• You could also evaluate samples that the model misclassified. For instance, if
certain types of anomalies are consistently missed, what valuable insights into
patterns or consistencies could be gained from these failures? Additionally, if there
are any extreme cases that cause your model to fail, what measures could
be taken in future training? You could also discuss how these inaccuracies might
manifest in real-world scenarios.
• To make your findings more accessible to readers, tables are recommended
for structured presentation of numerical results and comparisons. Meanwhile,
visualizations like bar graphs, scatter plots, or heat maps can offer intuitive insights
into the data, helping to convey relationships, distributions, or anomalies that might
be less apparent in raw numbers alone.
• The creativity marks will be allocated based upon both how you extend and present
your work, and the insights that you provide. When we refer to extensions, this may
be in terms of the techniques taken, insights gained from your experiments, comparison
of model parameters, or your comprehensive analysis – even if your tested novel
ideas are not successful. The amount of time you spend pursuing the creativity
marks should be commensurate with the marks allocated.
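The evaluation metrics named in the report guidelines above (AUROC and false positive rate) can be computed directly from validation labels and detector outputs. The NumPy sketch below is one way to do so. The function names are illustrative, and both functions assume that label 1 marks the anomalous or OOD class, that a higher score means "more anomalous", and that both classes are present.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos, neg = scores[labels], scores[~labels]
    # Pairwise comparison uses O(len(pos) * len(neg)) memory; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def confusion_rates(labels, preds):
    """False positive rate, true positive rate and precision for 0/1 predictions."""
    labels = np.asarray(labels).astype(bool)
    preds = np.asarray(preds).astype(bool)
    tp = np.sum(preds & labels)
    fp = np.sum(preds & ~labels)
    fn = np.sum(~preds & labels)
    tn = np.sum(~preds & ~labels)
    # Assumption: every denominator is non-zero (both classes present,
    # and at least one sample predicted positive).
    return {"fpr": fp / (fp + tn), "tpr": tp / (tp + fn), "precision": tp / (tp + fp)}
```

For large validation sets, scikit-learn's roc_auc_score computes the same AUROC without the pairwise memory cost.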
Red Team Tasks
Because the company’s leadership is cautious about the Red Team attempting to attack a
production model, you will instead need to train a similar model that you will use as a proxy
to attack. To ensure that the model closely matches what is used in production, the
trained architecture that you produce should incorporate at least 3 linear layers, with a
dropout layer (with probability 0.25) preceding the last two linear layers. You should train on
the training set to an accuracy of at least 85%, using a cross entropy loss, an Adam
optimizer, and a learning rate of 10^-4. All your input samples should be normalised to within
[0,1] before being passed into the model.
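A model meeting these constraints might look like the PyTorch sketch below. The class name ProxyClassifier, the hidden width of 256, and the ReLU activations are assumptions made for illustration. The sketch also reads the dropout requirement as one dropout layer in front of each of the last two linear layers, which is only one possible interpretation.

```python
import torch
from torch import nn


class ProxyClassifier(nn.Module):
    """Three linear layers with dropout (p=0.25) before each of the last two."""

    def __init__(self, in_features=28 * 28, hidden=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(p=0.25),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p=0.25),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        # Returns raw logits; CrossEntropyLoss applies log-softmax itself.
        return self.net(x)


def train_one_epoch(model, loader, optimizer):
    """Run one pass over the loader and return the mean cross entropy loss."""
    model.train()
    loss_fn = nn.CrossEntropyLoss()
    total = 0.0
    for images, labels in loader:
        # Assumption: images are already normalised to [0, 1].
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(images)
    return total / len(loader.dataset)


model = ProxyClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Training would repeat train_one_epoch until the training accuracy reaches the required 85%.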
After training this model, you will need to design an iterative gradient based attack. The code
for this should be flexible enough that you would be able to take any model and input sample
and attack it for up to a specified number of iterations with a fixed step size, while ensuring
that the attack image always remains within [0,1]. While the maximum number of iterations
should be fixed, consider if your attack can be modified to stop early if necessary. Your
attack should be able to produce both targeted and untargeted attacks. Because the Red
Team is trying to build up their own internal capabilities, your attack should avoid
unnecessary use of off-the-shelf libraries that implement adversarial attacks themselves.
Your focus should be on employing basic machine learning and mathematical libraries, as
well as autograd.
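The requirements above can be sketched with PyTorch autograd alone, as below. This is an illustrative sketch, not the required design: the function name iterative_attack is hypothetical, and stepping along the sign of the gradient is an assumed choice, since the specification only asks for a fixed step size. The label argument holds the true labels for an untargeted attack and the desired labels for a targeted one.

```python
import torch
from torch import nn


def iterative_attack(model, x, label, step_size, max_iters=100, targeted=False):
    """Iterative signed-gradient attack; returns (adversarial images, success mask)."""
    model.eval()  # disable dropout so predictions are deterministic
    loss_fn = nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(max_iters):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        pred = logits.argmax(dim=1)
        success = (pred == label) if targeted else (pred != label)
        if success.all():
            break  # early stopping: every sample already fools the model
        loss = loss_fn(logits, label)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Untargeted: ascend the loss on the true label.
        # Targeted: descend the loss on the target label.
        direction = -1.0 if targeted else 1.0
        # Freeze samples that have already succeeded.
        mask = (~success).view(-1, *([1] * (x.dim() - 1))).to(x.dtype)
        update = direction * step_size * grad.sign()
        # Clamp so the attack image always remains within [0, 1].
        x_adv = (x_adv.detach() + mask * update).clamp(0.0, 1.0)
    x_adv = x_adv.detach()
    with torch.no_grad():
        pred = model(x_adv).argmax(dim=1)
    success = (pred == label) if targeted else (pred != label)
    return x_adv, success
```

Because successful samples are frozen, the distance between each adversarial image and its original reflects only the steps that were needed to change the prediction.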
To test the performance of your ability to attack the model, you should attack every 20th
sample of the test set. As you do so, vary the step size of your iterative attack between 10^-5
and 10^1 for at most 100 steps, and perform an analysis on the untargeted performance of the
attack, and the performance when targeted towards forcing the model to predict the 0th class.
You will need to perform an appropriate analysis of the attack performance, which should
include an analysis of the success rates and l2 norm distance between your tested images
and your successful attacks (this l2 norm should be calculated using sum, square-root and
power operations).
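The evaluation described above can be sketched in NumPy as follows. The helper names are illustrative. The sketch assumes that "every 20th sample" starts counting at index 0, and that seven logarithmically spaced step sizes are enough to cover the range from 10^-5 to 10^1; both choices are assumptions.

```python
import numpy as np


def every_nth(data, n=20):
    """Select every n-th sample, starting from index 0 (an assumption)."""
    return data[::n]


def l2_distance(original, adversarial):
    """Per-sample L2 norm built from power, sum and square-root operations."""
    diff = np.asarray(adversarial, dtype=np.float64) - np.asarray(original, dtype=np.float64)
    diff = diff.reshape(len(diff), -1)
    return np.sqrt(np.sum(np.power(diff, 2), axis=1))


def summarise_attack(original, adversarial, success):
    """Success rate over all samples and mean L2 distance over successful ones."""
    success = np.asarray(success, dtype=bool)
    dist = l2_distance(original, adversarial)
    mean_l2 = float(dist[success].mean()) if success.any() else float("nan")
    return {"success_rate": float(success.mean()), "mean_l2_successful": mean_l2}


# Step sizes from 10^-5 to 10^1, one per decade.
step_sizes = np.logspace(-5, 1, num=7)
```

Running the attack once per entry in step_sizes, for both the untargeted and the targeted setting, gives one summary per configuration that can be tabulated or plotted.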
Given that you know that Blue Team are working on techniques that could be used to detect
attacks, your report to the company's leadership should consider how the techniques
implemented by the Blue Team could be used to defend your model from adversarial attack.
You may also wish to consider how changes to the model architecture, training process, or
data handling procedures may influence the level of adversarial risk faced by your model, or
how you might attack a model that incorporates defensive stratagems from the Blue Team.
Deliverables
1. Python code or Jupyter Notebook (submit as zip archive)
• This should contain the complete code for (1) training the underlying network, (2)
performing adversarial attacks, and (3) evaluating their performance.
• The requirements/guidelines are identical to those outlined for Blue Team (Task I).
2. Report (submit as PDF)
• Please prepare a separate report for the Red Team (Task II). The word limit for this task is
1000 words. Your main report should not exceed 4 pages in length. However, any
supplementary diagrams, plots, and references that you wish to include can be
added after the main report. These additional materials will not be considered as part
of the word or page count limits.
• You should evaluate your model with appropriate metrics for your implemented
attack. Some commonly used metrics include accuracy drops and perturbation
size/distribution. However, other metrics might be equally or even more relevant.
• You could include visualizations of the adversarial noise and the perturbed images in
the report, such as side-by-side comparisons that illuminate the slight alterations that
result in significant prediction deviations, helping readers discern the vulnerabilities
exposed by your implemented attack. Meanwhile, the plot of loss/performance changes versus
iterations could be used to provide a visual representation of the model's training
dynamics, making it easier to diagnose issues, compare solutions, and communicate
the model's behaviour to both technical and non-technical stakeholders.
• The creativity marks will be allocated based on both how you extend and present
your work and the insights that you provide. This may be in terms of the network
structure, training techniques, adversarial attack techniques, evaluation and analysis,
or to present interesting findings – even if your tested ideas are not successful. The
amount of time you spend pursuing the creativity marks should be commensurate
with the marks allocated.
Assessment Criteria – Blue Team (Task I)
Code quality and README (2 marks)
Technical report (13 marks)
1. Methodology: (4 marks)
2. Critical Analysis: (5 marks)
3. Report Quality (4 marks)
Creativity: (2 marks, as Bonus)
Assessment Criteria – Red Team (Task II)
Code quality and README (2 marks)
Technical report (8 marks)
1. Methodology: (2 marks)
2. Critical Analysis: (3 marks)
3. Report Quality (3 marks)
Creativity: (1 mark, as Bonus)
Changes/Updates to the Project Specifications
If we require any changes or clarifications to the project specifications, they will be posted on
Canvas. Any addenda will supersede information included in this document. If you
have assignment-related questions, you are welcome to post them on the discussion board.
Academic Misconduct
For most people, collaboration will form a natural part of the undertaking of this project.
However, it is still an individual task, and so reuse of ideas or excessive influence in
algorithm choice and development will be considered cheating. We will be checking
submissions for originality and will invoke the University’s Academic Misconduct policy
(http://academichonesty.unimelb.edu.au/policy.html) where inappropriate levels of collusion
or plagiarism are deemed to have taken place.
Late Submission Policy
You are strongly encouraged to submit by the time and date specified above, but if
circumstances do not permit this, the marks will be adjusted as follows. For each day (or part
thereof) that this project is submitted after the due date (and time) specified above, 10% will
be deducted from the marks available, up until 5 days have passed, after which regular
submissions will no longer be accepted.
Extensions
If you require an extension, please email Mark Jiang