
Actuarial Data and Analysis, T2 2020

Assignment Part A

Due time: Week 5 Wednesday, 1 July 2020, 11.55 am (sharp)

1 Skills developed

This assignment gives you an opportunity to become familiar with the given datasets before applying the modeling techniques you are learning in the course lectures to a business task involving data. In addition, this assignment will develop your skills in understanding and applying data manipulation and analysis methods (from the course materials and any additional reference material you consider). Communicating the results of your investigations and analysis is another important skill developed.

2 Task

You are a fresh actuarial graduate who has just joined the US Medicare Fraud Department as an analyst. Your team is in charge of analyzing Medicare data to detect Medicare fraud committed by providers. Your manager has tasked you with providing a preliminary report on the attached datasets, so that you become familiar with the data and the Medicare provider characteristics and are ready for further analysis. Your main tasks involve data manipulation and analysis, as well as a report and a recommendation for further analysis (i.e. modeling).

Note that all relevant data manipulation steps, as well as the data analysis results, should be included in the report or appendix.

3 Additional information and mark allocation

3.1 Data manipulation and analysis (17 marks)

You should manipulate the data you have to prepare it for analysis. This includes (but is not limited to) data exploration, data cleaning (if necessary), combining all the datasets, and aggregating the data per provider (see the Resources section for documentation).
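As a starting point, the combining and per-provider aggregation steps above might look like the following minimal sketch. It uses only the Python standard library so that every step is visible (in R, dplyr's `bind_rows`, `left_join`, `group_by` and `summarise` play the same roles). The function names and the summary features (claim count, share of inpatient claims, total and mean reimbursed amount, number of distinct beneficiaries) are illustrative assumptions, not a required set. The sketch also assumes the column names listed in Section 4 and treats a blank InscClaimAmtReimbursed as 0, an assumption you would need to justify.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts (one per row; all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def combine_claims(inpatient, outpatient):
    """Stack inpatient and outpatient claims, tagging each row with its claim type."""
    combined = []
    for claim_type, rows in (("inpatient", inpatient), ("outpatient", outpatient)):
        for row in rows:
            combined.append({**row, "ClaimType": claim_type})
    return combined


def attach_beneficiary(claims, beneficiaries):
    """Left join: copy each claim's beneficiary details onto it via BeneID."""
    by_id = {b["BeneID"]: b for b in beneficiaries}
    return [{**by_id.get(c["BeneID"], {}), **c} for c in claims]


def aggregate_per_provider(claims, providers):
    """Summarise claims per ProviderID and attach each provider's Fraud label."""
    fraud = {p["ProviderID"]: p["Fraud"] for p in providers}
    totals = defaultdict(lambda: {"n": 0, "n_inpatient": 0, "amount": 0.0, "benes": set()})
    for c in claims:
        t = totals[c["ProviderID"]]
        t["n"] += 1
        t["n_inpatient"] += int(c["ClaimType"] == "inpatient")
        # Assumption: a blank (missing) reimbursed amount is counted as 0.
        t["amount"] += float(c["InscClaimAmtReimbursed"] or 0)
        t["benes"].add(c["BeneID"])
    return {
        pid: {
            "n_claims": t["n"],
            "share_inpatient": t["n_inpatient"] / t["n"],
            "total_reimbursed": t["amount"],
            "mean_reimbursed": t["amount"] / t["n"],
            "n_beneficiaries": len(t["benes"]),
            "Fraud": fraud.get(pid),  # None if the ProviderID is absent from the provider file
        }
        for pid, t in totals.items()
    }
```

For example, `aggregate_per_provider(combine_claims(load_rows("Medicare_Inpatient.csv"), load_rows("Medicare_Outpatient.csv")), load_rows("Medicare_Provider.csv"))` returns one summary per provider that has at least one claim; providers without any claims would need to be added back from the provider file.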

Your analysis should give a good sense of the datasets, provide insights into beneficiary, claim and provider characteristics, and motivate further analysis. You may find interesting insights by analysing both the combined and the aggregated datasets.

This task does not involve modeling, but you should keep in mind that the question your team will ultimately be looking at is which providers are likely to submit fraudulent claims.
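Keeping the fraud question in mind, a simple descriptive (non-modeling) check is to compare the fraud rate and per-provider features across the two Fraud groups. The sketch below assumes a hypothetical dictionary of per-provider summaries keyed by ProviderID, each holding a "Fraud" entry ("yes" or "no") and numeric features; the function names are illustrative only.

```python
from statistics import mean


def fraud_rate(per_provider):
    """Proportion of providers whose Fraud label is 'yes'."""
    labels = [p["Fraud"] for p in per_provider.values()]
    return labels.count("yes") / len(labels)


def compare_by_fraud(per_provider, feature):
    """Mean of one per-provider feature within each Fraud group."""
    groups = {}
    for p in per_provider.values():
        groups.setdefault(p["Fraud"], []).append(p[feature])
    return {label: mean(values) for label, values in groups.items()}
```

If reimbursement amounts turn out to be heavily skewed, swapping `mean` for `statistics.median` may give a more robust comparison between the groups.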

See the section on data for details.

Mark allocation for the assignment can be found in the rubrics (on the course Moodle webpage).

3.2 Presentation Format (3 marks)

Communicating quantitative results in a concise and easy-to-read manner is a vital skill in practice. As such, marks will be given for the presentation of your results. To maximize your presentation marks, you may wish to consider issues such as table size and readability, figure axes and formatting, ease of reading, grammar and spelling, and report structure. You may also wish to use executive summaries and appendices where appropriate. Provide sufficient detail for the reader to judge what you are doing, using appendices as necessary for results that are useful but not essential to the report.

Note that sufficient detail must be provided (in the report body and/or appendices) so that the reviewer can follow all the steps and derivations required in your work.

Note that a maximum page limit of 2 pages (excluding tables and graphs) applies to the main body of the report (see footnote 1). You should also consider the rubric for the presentation component (on the course Moodle webpage). There is no limit on the length of the appendix.

3.3 Software

You may choose which software package to use (e.g. R, Python or other); however, nearly every function you will need for this task is available in R. Note also that code enabling you to perform most of the computing can be found in the course learning activities and in the Resources section. Any assumptions must be clearly identified and justified (if used).

4 Data

The data relates to US Medicare claims and beneficiary details for 4436 providers from 2008 to 2009, and consists of 4 datasets:

1. Medicare_Provider.csv

2. Medicare_Inpatient.csv

3. Medicare_Outpatient.csv

4. Medicare_Beneficiary.csv

Similar (but not identical) datasets are provided here. You may wish to check that webpage for further information about the context, data and problem (see footnote 2).

You may also wish to look at the following exploratory data analysis based on the Kaggle datasets to get an idea of why and how to start the data analysis: Healthcare Fraud Detection With Python: The importance of exploratory data analysis (weblink here). This analysis is only a brief example and is not based on your datasets; different or additional variables may be of interest for your analysis.

4.1 Medicare_Provider.csv (Provider Data)

This dataset provides each provider's ID and whether or not that provider is fraudulent.

Variable Description

ProviderID: A unique ID assigned to each provider (chr)

Fraud: Whether the provider is fraudulent (categorical: “no”, “yes”)
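Before joining, it is worth confirming that each ProviderID appears only once and that Fraud takes only the two documented values. A minimal sketch, assuming the rows come from `csv.DictReader` (so all values are strings); the function name `parse_fraud_labels` is hypothetical.

```python
def parse_fraud_labels(providers):
    """Map each ProviderID to True/False, checking IDs are unique and labels valid."""
    labels = {}
    for row in providers:
        pid, fraud = row["ProviderID"], row["Fraud"].strip().lower()
        if pid in labels:
            raise ValueError(f"duplicate ProviderID: {pid}")
        if fraud not in ("yes", "no"):
            raise ValueError(f"unexpected Fraud value for {pid}: {row['Fraud']!r}")
        labels[pid] = fraud == "yes"  # True for fraudulent providers
    return labels
```

Raising an error (rather than silently dropping rows) makes any data quality issue visible so that it can be documented in the report.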

1 Please note that this is a maximum; feel free to use fewer pages if that is sufficient!

2 Optional readings for extra information and context on Medicare fraud in the US can be found here: link 1 and link 2.

4.2 Medicare_Inpatient.csv (Inpatient Data)

This dataset provides details about the claims filed for patients who were admitted to hospital. It also provides additional details such as admission and discharge dates and diagnosis codes.

Variable Description

BeneID: A unique ID assigned to each beneficiary (chr)

ClaimID: A unique ID assigned to each claim (chr)

ClaimStartDt: Start date of the claim (date)

ClaimEndDt: End date of the claim (date)

InscClaimAmtReimbursed: Claim amount reimbursed (num)

AttendingPhysician: Attending physician (chr)

OperatingPhysician: Operating physician (chr)

OtherPhysician: Other physician (chr)

AdmissionDt: Admission date (date)

ClmAdmitDiagnosisCode: Claim admission diagnosis code (chr)

DeductibleAmtPaid: Deductible amount paid (num)

DischargeDt: Discharge date (date)

DiagnosisGroupCode: Diagnosis group code (chr)

ClmDiagnosisCode_1: Claim diagnosis code 1 (chr)

ClmProcedureCode_1: Claim procedure code 1 (num)

ProviderID: A unique ID assigned to each provider (chr)
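The date variables above can be turned into numeric features such as claim duration and length of stay. This sketch assumes dates are stored as ISO strings (YYYY-MM-DD), which you should verify against the actual files; the helper names are hypothetical. It returns the plain difference in days (a same-day discharge gives 0), and negative values would point to data errors worth cleaning.

```python
from datetime import date


def days_between(start, end):
    """Whole days from start to end (ISO YYYY-MM-DD strings); None if either is missing."""
    if not start or not end:
        return None
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def add_duration_features(claim):
    """Return a copy of an inpatient claim with claim-duration and length-of-stay columns."""
    out = dict(claim)
    out["ClaimDurationDays"] = days_between(claim.get("ClaimStartDt"), claim.get("ClaimEndDt"))
    out["LengthOfStayDays"] = days_between(claim.get("AdmissionDt"), claim.get("DischargeDt"))
    return out
```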

Important remark: Variables ClmAdmitDiagnosisCode, DiagnosisGroupCode, ClmDiagnosisCode_1 and ClmProcedureCode_1 correspond to specific international or national codifications (see footnote 3). You don’t need to know or understand the details of the codification. You can treat these variables as categorical and investigate only the most significant levels.

• ClmAdmitDiagnosisCode represents the diagnosis code on the institutional encounter indicating the beneficiary’s initial diagnosis at admission. This diagnosis may not be confirmed after the patient is evaluated; it may differ from the eventual diagnoses.

• DiagnosisGroupCode represents the diagnostic group to which a hospital claim belongs. It is a unique identifier of a hospital case type based on similar clinical problems.

• ClmDiagnosisCode_1 represents the diagnosis code in the 1st position, identifying the condition(s) for which the beneficiary is receiving care.

• ClmProcedureCode_1 indicates the principal procedure performed during the period covered by the institutional claim.
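One way to investigate only the most significant levels of these code variables is to keep the most frequent codes and lump the rest together. A minimal sketch; the helper name, the cut-off `top_k` and the labels "Other" and "Missing" are arbitrary choices for illustration, and frequency is only one possible notion of "significant".

```python
from collections import Counter


def lump_rare_levels(values, top_k=10, other="Other", missing="Missing"):
    """Keep the top_k most frequent codes; recode the rest as `other`, blanks as `missing`."""
    cleaned = [v if v else missing for v in values]
    counts = Counter(v for v in cleaned if v != missing)
    keep = {code for code, _ in counts.most_common(top_k)}
    return [v if v in keep or v == missing else other for v in cleaned]
```

Keeping missing codes as a separate level (rather than merging them into "Other") lets you check whether missingness itself differs between fraudulent and non-fraudulent providers.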

4.3 Medicare_Outpatient.csv (Outpatient Data)

This dataset provides details about the claims filed for those patients who visited hospitals as outpatients.

Variable Description

BeneID: A unique ID assigned to each beneficiary (chr)

ClaimID: A unique ID assigned to each claim (chr)

ClaimStartDt: Start date of the claim (date)

ClaimEndDt: End date of the claim (date)

InscClaimAmtReimbursed: Claim amount reimbursed (num)

AttendingPhysician: Attending physician (chr)

3 Reference: Research Data Assistance Center, weblink here.


OperatingPhysician: Operating physician (chr)

OtherPhysician: Other physician (chr)

ClmDiagnosisCode_1: Claim diagnosis code 1 (chr)

ClmProcedureCode_1: Claim procedure code 1 (num)

DeductibleAmtPaid: Deductible amount paid (num)

ClmAdmitDiagnosisCode: Claim admission diagnosis code (chr)

ProviderID: A unique ID assigned to each provider (chr)
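The outpatient file shares its keys (ClaimID, BeneID, ProviderID) with the inpatient file, so before stacking the two it is sensible to check that claim IDs are unique and that every claim links to a known provider and beneficiary. A sketch under the assumption that claims are a list of dicts and the known IDs are given as sets; `integrity_report` and its keys are hypothetical names.

```python
from collections import Counter


def integrity_report(claims, provider_ids, bene_ids):
    """Count duplicated ClaimIDs and claims whose ProviderID/BeneID has no match."""
    claim_counts = Counter(c["ClaimID"] for c in claims)
    return {
        "duplicate_claim_ids": sum(1 for n in claim_counts.values() if n > 1),
        "unknown_provider": sum(1 for c in claims if c["ProviderID"] not in provider_ids),
        "unknown_beneficiary": sum(1 for c in claims if c["BeneID"] not in bene_ids),
    }
```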

4.4 Medicare_Beneficiary.csv (Beneficiary Details Data)

This dataset contains individual beneficiary details (e.g. date of birth, date of death, health conditions, state, etc.).

Variable Description

BeneID: A unique ID assigned to each beneficiary (chr)

DOB: Date of birth (date)

DOD: Date of death (date)

Gender: Gender 1 or 2 (categorical)

Race: Race 1 to 5 (categorical)

RenalDiseaseIndicator: Renal disease indicator “0” (No) or “Y” (Yes) (chr)

State: US state number (num)

County: County (num)

NoOfMonths_PartACov: Number of months Medicare Part A covered (num)

NoOfMonths_PartBCov: Number of months Medicare Part B covered (num)

ChronicCond_Alzheimer: Chronic condition Alzheimer 1 (Yes) or 2 (No) (num)

ChronicCond_Heartfailure: Chronic condition Heart failure 1 (Yes) or 2 (No) (num)

ChronicCond_KidneyDisease: Chronic condition Kidney Disease 1 (Yes) or 2 (No) (num)

ChronicCond_Cancer: Chronic condition Cancer 1 (Yes) or 2 (No) (num)

ChronicCond_ObstrPulmonary: Chronic condition Obstructive Pulmonary 1 (Yes) or 2 (No) (num)

ChronicCond_Depression: Chronic condition Depression 1 (Yes) or 2 (No) (num)

ChronicCond_Diabetes: Chronic condition Diabetes 1 (Yes) or 2 (No) (num)

ChronicCond_IschemicHeart: Chronic condition Ischemic Heart 1 (Yes) or 2 (No) (num)

ChronicCond_Osteoporasis: Chronic condition Osteoporosis 1 (Yes) or 2 (No) (num)

ChronicCond_rheumatoidarthritis: Chronic condition Rheumatoid arthritis 1 (Yes) or 2 (No) (num)

ChronicCond_stroke: Chronic condition Stroke 1 (Yes) or 2 (No) (num)

IPAnnualReimbursementAmt: Inpatient annual reimbursement amount (num)

IPAnnualDeductibleAmt: Inpatient annual deductible amount (num)

OPAnnualReimbursementAmt: Outpatient annual reimbursement amount (num)

OPAnnualDeductibleAmt: Outpatient annual deductible amount (num)
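Several beneficiary variables use codings that are awkward for analysis (1 = Yes, 2 = No for chronic conditions; “0”/“Y” for renal disease). The sketch below recodes them to 0/1 indicators and derives an age and a death flag. The reference date of 31 December 2009 (the end of the data period), computing age at death for deceased beneficiaries, and ISO-formatted (YYYY-MM-DD) dates are all assumptions to adjust and justify in your report; `recode_beneficiary` is a hypothetical helper.

```python
from datetime import date

# Assumed reference date for the age of living beneficiaries (end of the 2008-2009 data period).
REFERENCE_DATE = date(2009, 12, 31)


def recode_beneficiary(row, reference=REFERENCE_DATE):
    """Return a copy of a beneficiary row with 0/1 indicators, Age and a Deceased flag."""
    out = dict(row)
    for col, value in row.items():
        if col.startswith("ChronicCond_"):
            # Source coding 1 = Yes, 2 = No; here "1" becomes 1 and anything else becomes 0.
            out[col] = 1 if str(value).strip() == "1" else 0
    # Source coding "Y" = Yes, "0" = No; here "Y" becomes 1 and anything else becomes 0.
    out["RenalDiseaseIndicator"] = 1 if str(row["RenalDiseaseIndicator"]).strip() == "Y" else 0
    dob = date.fromisoformat(row["DOB"])
    # Age at death if DOD is present, otherwise at the reference date.
    end = date.fromisoformat(row["DOD"]) if row.get("DOD") else reference
    out["Age"] = end.year - dob.year - ((end.month, end.day) < (dob.month, dob.day))
    out["Deceased"] = int(bool(row.get("DOD")))
    return out
```

Recoding to 0/1 means the mean of an indicator across beneficiaries is directly the prevalence of that condition, which is convenient when aggregating per provider.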

5 Resources

• Data manipulation with R: dplyr (weblink here)

• Merging with R (weblink here)

• Tidy data in R (weblink here)

• Exploratory Data Analysis with R (weblink here)

• Data visualisation in R with ggplot2 for fancy plots (weblink here)

• For any code-related question, google.com or stackoverflow.com are very helpful!

• As usual you can ask your questions on the course Ed forum.

6 Assignment submission procedure

6.1 Turnitin submission

Your assignment report must be uploaded as a single document, with all parts in portrait format. As long as the due date has not passed, you can resubmit your work; the previous version of your assignment will be replaced by the new version.

Assignments must be submitted via the Turnitin submission box available on the course Moodle website. Turnitin reports any similarities between your cohort’s assignments, and also with other sources (such as the internet or all assignments submitted worldwide via Turnitin). More information is available at: [click]. Please read this page, as we will assume that you are familiar with its content. You can also find the Turnitin Similarity Report Interpretation Guide (2019) on the Moodle webpage.

Please also submit any programming code used in your analysis as a separate file in the dedicated “Code only” Moodle assignment box on the course webpage. The marker will refer to this code only if needed; in particular, the report (with appendix) should be self-contained.

You need to check your document on-screen once it has been submitted. We will not mark assignments that cannot be read on screen.

Students are reminded of the risk that technical issues (such as internet connection problems and/or computer breakdowns) may delay or even prevent their submission. Students should allow enough time (at least 24 hours is recommended) between their submission and the due time. The Turnitin module will not let you submit a late report. Paper copies will be neither accepted nor graded.

6.2 Late submission

Please note that it is School policy that late submission of assignments will incur a penalty.

A penalty of 25% of the mark the student would otherwise have obtained applies for each full (or part) day of lateness (e.g. 0 days 1 minute = 25% penalty, 2 days 21 hours = 75% penalty). Students who are late must submit their assignment to the LIC via e-mail. The LIC will then upload documents to the relevant submission boxes. The date and time at which the e-mail is received determine the submission time for the purposes of calculating the penalty.
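The penalty rule amounts to 25% per started day of lateness. A small sketch that reproduces the two examples above; capping the total penalty at 100% is an assumption, not something stated in this document.

```python
import math
from datetime import timedelta


def late_penalty(lateness: timedelta) -> float:
    """Fraction of the mark deducted: 25% per full or part day late (capped at 100%, assumed)."""
    if lateness <= timedelta(0):
        return 0.0
    days_started = math.ceil(lateness / timedelta(days=1))  # any part day counts as a whole day
    return min(0.25 * days_started, 1.0)
```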

More information on late submissions, extensions and special consideration is available in the Moodle course webpage section Additional resources from UNSW (at the bottom).

6.3 Plagiarism awareness

Students are reminded that the work they submit must be their own. While we have no problem with students working together on the assignment problems, the material each student submits for assessment must be their own.

Students should make sure they understand what plagiarism is; cases of plagiarism have a very high probability of being discovered. Where work has been done collectively, having different people mark the assignments does not decrease this probability.

More information on academic integrity and plagiarism is available in the Moodle course webpage section Additional resources from UNSW (at the bottom).
