Advanced (Business) Data Analytics – 2023 S1 – Assignment 2
Advanced (Business) Data Analytics
ASSIGNMENT 2
Summary
• Type: Project report, individual assignment
• Deliverable:
o Report in the form of a Python notebook (.ipynb file) only
(you must use the provided A2 template)
The aim is to provide experience in the steps involved in text preparation, text feature generation,
topic modeling and profiling, and using text features in model building and evaluation. Feel free to
discuss concepts and ideas with peers, but remember that your submission must be your own work. Be
careful not to allow anyone to copy your work. You will need to research text analytics and the relevant
Python functionality if you aim to achieve excellent marks.
Specification
The focus of this ML project is to predict the discharge decision after hospitalization. The hospital
setting is challenging because the severity of each patient's illness changes constantly and because
multiple independent measurement devices often produce conflicting readings and false alarms, which
harms the quality of care. Previous work on discharge decision models aimed to consolidate data from
these devices and transform the information streams into knowledge, but this approach overlooked a
valuable source of medical information: free-text clinical notes and reports.
Clinical notes provide health care staff with a quick overview of the most important aspects of a patient's
health conditions. Integrating features extracted from these notes with standard health measurements
yields a more comprehensive representation of a patient's health state, resulting in improved outcome
prediction. However, free-text data is challenging to incorporate into predictive models due to its lack
of structure. To overcome this challenge, latent variable models such as topic models can be used to
infer intermediary representations that can be used as structured features for a prediction task.
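As a minimal sketch of this idea, the snippet below turns a few free-text notes into per-note topic proportions that could sit alongside structured measurements. It assumes scikit-learn's LatentDirichletAllocation as the topic model; the example notes and the choice of two topics are invented for illustration and are not taken from the A2 dataset.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy notes (assumption); the real A2 notes are not reproduced here.
notes = [
    "patient reports chest pain and shortness of breath",
    "chest pain resolved, breathing normal, plan discharge home",
    "wound infection treated with antibiotics, fever persists",
    "fever and infection improving on antibiotics",
]

# Bag-of-words counts: one row per note, one column per vocabulary term.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(notes)

# Fit LDA with an assumed number of topics (2 here, purely illustrative).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)

# Each row is one note's topic distribution: a fixed-length numeric vector
# that can be used as structured features in a downstream classifier.
print(topic_features.shape)  # (4, 2)
```

In practice the number of topics is a modeling choice that would need to be justified for the data at hand.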
In this project, you aim to demonstrate the value of incorporating information from clinical notes, via
latent topic features, in predicting the discharge decision after hospitalization.
Dataset
The A2 dataset consists of clinical notes along with some structured data. It is drawn from
hospitalization data, which includes electronic medical records (EMRs) for patients: patient
information and health metrics along with clinical notes. In this data, the discharge decision after
hospitalization indicates which patients died in hospital, required extended care, and so on.
Your task is to:
• prepare text,
• generate text features,
• apply topic modeling and generate topic profiles, and
• develop a predictive model and evaluate it.
You will need to analyze clinical notes, extract their features and then develop and evaluate predictive
model(s).
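The first of these steps, text preparation, could be sketched as follows using only the Python standard library. The stop-word list, the `prepare_text` helper and the example note are all hypothetical; a real solution would likely use a fuller stop-word list and handle clinical abbreviations more carefully.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "with", "was", "is", "on", "in"}

def prepare_text(note: str) -> list:
    """Lowercase a note, keep alphabetic tokens, and drop stop words."""
    # Keep runs of letters only, so digits and punctuation are discarded.
    tokens = re.findall(r"[a-z]+", note.lower())
    # Remove stop words and single-letter leftovers.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]

example = "Pt. was admitted with CHEST pain; BP 140/90 on arrival."
print(prepare_text(example))
# ['pt', 'admitted', 'chest', 'pain', 'bp', 'arrival']
```

The cleaned tokens can then be joined back into strings and passed to whichever feature generator is chosen for the next step.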
Note: In your final notebook, you should use only one classification technique (e.g., SVM)
along with 6-fold cross-validation to show that the extracted features can predict the target
variable.
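One possible shape for this evaluation is sketched below, assuming scikit-learn and an SVM as the single classifier. The data is synthetic: the structured measurements, topic proportions and binary target are all made up for illustration, and the real target may well have more than two classes.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 60 patients, 3 structured measurements
# plus 2 topic proportions, and a made-up binary target.
rng = np.random.default_rng(0)
structured = rng.normal(size=(60, 3))
topics = rng.dirichlet([1.0, 1.0], size=60)
X = np.hstack([structured, topics])
y = (topics[:, 0] + 0.1 * rng.normal(size=60) > 0.5).astype(int)

# One classifier (SVM). Scaling lives inside the pipeline so that each fold
# is scaled using only its own training portion.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Six stratified folds, matching the 6-fold requirement in the note above.
cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(len(scores))  # 6
print(scores.mean())
```

Comparing the cross-validated score with and without the topic columns is one way to show what the text features contribute.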
Deliverables
A notebook template is provided to show how you can structure your work. You need to use the template
and strictly follow its format which is designed based on the provided A2 rubric.
It is useful to add brief comments next to your code to explain it. Text analytics can be
challenging, and you will need to do your own research. Your report should be delivered as an .ipynb
file.
You will get higher marks if your approach is innovative. An innovative method should be customized
to the provided context, and you should elaborate on it with reference to that context. A novel method
is usually unique: no one else, or only a few others, have used it, and then with some differences. You
are strongly advised not to share your creative work with anyone else. You can still discuss ideas and
help each other.
Submission
Submit through Blackboard Assignment Submission, as indicated in Learn.UQ. The only acceptable
submission format is an .ipynb file. The file should be named in the format YourStudentID.ipynb.
You need to submit only one .ipynb file. Before submission, make sure that all important outputs are
shown in your notebook. Avoid showing trivial outputs, such as the contents of df1, df2, etc., in the
notebook; remove code such as head() before submission.
Note: Your marker will first look at your generated output as a reference without running your notebook.
So your significant outputs should be generated and the elaboration should be provided in the notebook,
as shown in the template.
Your marker will then use the “Restart & Run All” option from the Kernel menu. If running your
notebook raises an error, you will not receive any marks for the parts after the cell that raised the error.