Statistics and Data Analysis for Bioinformatics Assessment Information
Summative Assessment
The course will be assessed by in-course assessment consisting of two components: an MCQ test (20%) and a class report (80%).
MCQ test (20%): 22/10/2024
The multiple choice question (MCQ) test will take place online within Moodle. It will assess all course content and intended learning outcomes (ILOs). You will have the opportunity to take a practice MCQ test in preparation.
Report (80%): due 30/10/2024
Your written report aligns to the course ILOs, particularly “Present statistical and graphical results from analyses of bioinformatic data within a written scientific report”. Using the dataset provided as your data source, answer the research question set out in the “Report assignment details” appendix below. You will be expected to demonstrate your learning of the course materials, i.e. include data visualisations and statistics relevant to answering your research question. Your report should be similar in format to a scientific paper of 1000-1500 words (+33% allowance, excluding figures and references) and include the following sections:
Introduction: A brief introduction to the biological context of your research question, drawing clear aims and hypotheses. Utilise the scientific literature to justify your research question.
Methods: Your statistical approach (no need to detail how the data were collected unless this is of particular use). Methods should be concise while giving enough information for a reader to understand and repeat the analyses carried out.
Results: Likely the largest section of your report, where you present visual and statistical results to help answer your research question. You will gain practice at reporting results throughout the course (practicals/lectures/papers).
Discussion: Briefly place your findings in the wider biological context and critique the chosen dataset.
References: Use consistent formatting.
Appendix: Can be useful for supplementary information that is important but not key to answering the research question, e.g. model diagnostics.
You should also submit your annotated R script, in .R format. Your R script is not marked but can provide proof/support of your work.
Formative Assessment and Feedback
Formative individual feedback will be provided to students during each computer laboratory practical session (asking questions and checking practical answers is encouraged), at drop-in sessions and through the Q&A forum. Generic and individual written feedback will be provided for both in-course summative assessments.
Appendix: Report assignment details
Background
In this assignment you will explore a new dataset. The data come from whole blood samples from humans who are healthy (Healthy), have gout (Gout) or have septic arthritis (SA).
In Gout, a build-up of uric acid crystals triggers an immune response in patients’ joints. The immune system mistakenly treats it as a bacterial infection. It is extremely painful and can damage joints over time, but is not otherwise harmful.
In SA, there is a real bacterial infection in the joint, which quickly spreads to the blood. This is fatal within a few days if not diagnosed and treated.
Our issue is that Gout and SA present clinically in a very similar way, and the body’s reaction is also very similar. This makes it hard to distinguish one from the other. We want to know if this similarity extends to the transcriptional level in blood. If it does not, we might be able to use blood to distinguish SA from Gout.
Dataset
Our dataset is RNA-seq data. We have three groups (Healthy, Gout and SA), with 14 samples in each. We provide:
The table of expression values by Gene (row) and Sample (column)
A sample information sheet, listing important clinical information about each sample
An annotation file – linking gene ID to gene name.
Two differential expression files (log2 fold change, p-value, adjusted p-value), comparing Healthy to Gout and Healthy to SA.
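As a starting point, the sketch below shows one way the provided tables might be lined up in R before any analysis. The object names (expr, samples, annot) and column names (SAMPLE, GROUP, ID, SYMBOL) are assumptions made for illustration and may not match the real files; the toy data stands in for the provided tables, which you would read in yourself.

```r
# Toy stand-ins for the provided files; all names and values are hypothetical.
set.seed(1)
sample_ids <- paste0("S", 1:6)
gene_ids   <- paste0("ENSG", 1:4)

# Expression table: genes in rows, samples in columns
expr <- matrix(rnorm(24, mean = 8), nrow = 4,
               dimnames = list(gene_ids, sample_ids))

# Sample information sheet: one row per sample
samples <- data.frame(SAMPLE = sample_ids,
                      GROUP  = rep(c("Healthy", "Gout", "SA"), each = 2),
                      stringsAsFactors = FALSE)

# Annotation: gene ID to gene name
annot <- data.frame(ID     = gene_ids,
                    SYMBOL = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                    stringsAsFactors = FALSE)

# Check that every expression column has a matching row in the sample sheet
stopifnot(setequal(colnames(expr), samples$SAMPLE))

# Reorder the sample sheet so that row i describes expression column i
samples <- samples[match(colnames(expr), samples$SAMPLE), ]

# Look up the gene name for each expression row
gene_names <- annot$SYMBOL[match(rownames(expr), annot$ID)]

# One gene in long format, convenient for plotting expression by group
one_gene <- data.frame(group      = samples$GROUP,
                       expression = unname(expr["ENSG2", ]))
```

Matching on identifiers rather than relying on row or column order guards against the sample sheet and the expression table being sorted differently.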
Task
Your task is to provide some insight into our main research question (see Background). There are also additional aims:
Are our groups well matched clinically? I.e. what will a table of summary statistics for the clinical information show? What do p-values between the groups at each clinical measurement show?
What are the most significantly differential genes between Healthy and Gout, and between Healthy and SA? What distributions do they show when we plot them? Are the genes similar in the two comparisons?
Are these genes affected by any of the clinical measurements, such as Age, Sex, Neutrophils, Monocytes?
Are there any genes that are significantly different between Gout and SA? If so, what are they? What do they look like when plotted?
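The aims above could be approached in many ways. The following is only a minimal sketch using base R on simulated data, not a model answer: the column names (GROUP, AGE, SEX, ID, log2fold, p.adj), the choice of tests and the 0.05 threshold are all assumptions made for illustration.

```r
# Simulated stand-in data; column names and values are hypothetical.
set.seed(42)
clin <- data.frame(
  GROUP = factor(rep(c("Healthy", "Gout", "SA"), each = 14),
                 levels = c("Healthy", "Gout", "SA")),
  AGE   = round(rnorm(42, mean = 55, sd = 10)),
  SEX   = factor(sample(c("F", "M"), 42, replace = TRUE))
)

# Summary statistics per group for a continuous clinical measurement
age_summary <- aggregate(AGE ~ GROUP, data = clin,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Continuous variable across three groups: Kruskal-Wallis test
p_age <- kruskal.test(AGE ~ GROUP, data = clin)$p.value

# Categorical variable across groups: Fisher's exact test on the count table
p_sex <- fisher.test(table(clin$GROUP, clin$SEX))$p.value

# Simulated differential expression tables (hypothetical layout)
make_de <- function(n = 100) {
  data.frame(ID       = paste0("ENSG", seq_len(n)),
             log2fold = rnorm(n),
             p.adj    = runif(n),
             stringsAsFactors = FALSE)
}
de_gout <- make_de()
de_sa   <- make_de()

# Keep genes below an adjusted p-value threshold, most significant first
top_sig <- function(de, alpha = 0.05) {
  sig <- de[de$p.adj < alpha, ]
  sig[order(sig$p.adj), ]
}
sig_gout <- top_sig(de_gout)
sig_sa   <- top_sig(de_sa)

# Genes significant in both comparisons
shared <- intersect(sig_gout$ID, sig_sa$ID)
```

Note that the provided differential files compare each disease group to Healthy only, so a direct Gout versus SA comparison would need its own per-gene test, followed by a multiple-testing correction such as p.adjust(p, method = "BH").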
Report
You are to provide a report of 1,000 – 1,500 words in length. It must include the tables and plots you feel are appropriate. The report should have a brief introduction, methods, a results section, and a discussion / conclusions. All analysis should be performed in R. You must include your R script with the report.
Hints and Tips
Your report cannot (and should not) be exhaustive. Just do the best you can in the time that you have. Prioritise the things that you think are important. We are not looking for an exact “right” answer. We are looking at how you approach the problem.
Your report should demonstrate your understanding of the material covered in the course. You don’t have to use every test and method you have learnt, just those you believe are most appropriate.