Assignment Case: German credit

The German Credit data set contains observations on 30 variables for 1000 past applicants for credit. Each applicant was rated as "good credit" (700 cases) or "bad credit" (300 cases). New applicants for credit can also be evaluated on these 30 predictor variables. We want to develop a credit scoring rule that determines whether a new applicant is a good credit risk or a bad credit risk, based on the values of one or more of the predictor variables. The data are organized in the spreadsheet GermanCredit.xlsx. All the variables are explained in the 'Codelist' worksheet of the data file.

The consequences of misclassification have been assessed as follows: with "good credit" as the positive class, the cost of a false positive (incorrectly saying an applicant is a good credit risk) outweighs the cost of a false negative (incorrectly saying an applicant is a bad credit risk) by a factor of five. This is summarized in Table 1.

Table 1 Opportunity Cost

            Predicted (Decision)
Actual      Good (Accept)    Bad (Reject)
Good        0                $100
Bad         $500             0

The opportunity cost table was derived from the average net profit per loan as shown below:

Table 2 Average Net Profit

            Predicted (Decision)
Actual      Good (Accept)    Bad (Reject)
Good        $100             0
Bad         -$500            0

We will use Table 2 to assess the performance of the logistic regression model, because net profit is simpler to explain to decision-makers, who are used to thinking about their decisions in those terms.
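The link between Table 1 and Table 2 can be sketched in a few lines of Python. This is one reading of the derivation, not something the assignment spells out: the opportunity cost of a decision is taken to be the best net profit available for that applicant's actual class minus the net profit of the decision made. The names `NET_PROFIT`, `opportunity_cost`, and `break_even_probability` are hypothetical and chosen only for illustration. The break-even calculation further assumes that the Table 2 averages apply to every applicant and that the predicted probabilities are well calibrated.

```python
# Average net profit per loan (Table 2), keyed by (actual class, decision).
NET_PROFIT = {
    ("good", "accept"): 100,
    ("good", "reject"): 0,
    ("bad", "accept"): -500,
    ("bad", "reject"): 0,
}


def opportunity_cost(net_profit):
    """Assumed rule: best profit for the actual class minus profit of the decision."""
    cost = {}
    for actual in ("good", "bad"):
        best = max(net_profit[(actual, d)] for d in ("accept", "reject"))
        for decision in ("accept", "reject"):
            cost[(actual, decision)] = best - net_profit[(actual, decision)]
    return cost


def break_even_probability(net_profit):
    """Probability of 'good' at which the expected profit of accepting is zero.

    Solves p * gain + (1 - p) * loss = 0 for p.
    """
    gain = net_profit[("good", "accept")]
    loss = net_profit[("bad", "accept")]
    return -loss / (gain - loss)


OPPORTUNITY_COST = opportunity_cost(NET_PROFIT)
BREAK_EVEN = break_even_probability(NET_PROFIT)

for key, value in OPPORTUNITY_COST.items():
    print(key, value)
print("break-even probability of good:", round(BREAK_EVEN, 4))
```

Under these assumptions the computed costs reproduce Table 1, and accepting an applicant has positive expected profit only when the predicted probability of "good" exceeds 500 / 600, about 0.83. The empirical cutoff found from the test data may differ from this theoretical value.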

1. Review the predictor variables and guess, from their definitions, what role each might play in a credit decision. Are there any surprises in the data?

2. Divide the data randomly into training (60%) and test (40%) partitions. Develop a classification model using logistic regression in Python, and evaluate the model using the confusion matrix and the ROC curve.

3. Based on the confusion matrix and the payoff matrix, what is the net profit on the test data?

4. Let's see if we can improve our performance by changing the cutoff. Rather than accepting the above classification of everyone's credit status, let's use the "predicted probability of finding a good applicant" from the logistic regression as a basis for selecting the best credit risks first, followed by poorer-risk applicants.

a. Sort the test data on "predicted probability of finding a good applicant," from highest to lowest.

b. For each test case, calculate the actual cost/gain of extending credit.

c. Add another column for cumulative net profit.

d. How far into the test data do you go to get maximum net profit? (Often this is specified as a percentile or rounded to deciles.)

e. If this logistic regression model is used to score future applicants, what "probability of success" cutoff should be used in extending credit?
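Questions 2 to 4 can be sketched end to end as follows. This is a minimal outline under stated assumptions, not a model solution: the data here are a synthetic stand-in generated with scikit-learn's `make_classification`, because the column names in GermanCredit.xlsx are not given in this handout, and the response is assumed to be coded 1 for good credit and 0 for bad credit. Variable names such as `p_good`, `cum_profit`, and `cutoff` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for GermanCredit.xlsx (1000 rows, 30 predictors,
# roughly 70% good). Replace X and y with the real predictors and response.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.3, 0.7], random_state=0)  # 1 = good, 0 = bad

# Question 2: random 60/40 split, logistic regression, confusion matrix, ROC AUC.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_good = model.predict_proba(X_test)[:, 1]  # predicted probability of "good"
accept = p_good >= 0.5                      # default cutoff of 0.5
# Rows are actual (good, bad); columns are predicted (good, bad), as in Table 2.
cm = confusion_matrix(y_test, accept.astype(int), labels=[1, 0])
auc = roc_auc_score(y_test, p_good)

# Question 3: net profit from Table 2 (+$100 per good accepted, -$500 per bad accepted).
good_accepted, bad_accepted = cm[0, 0], cm[1, 0]
net_profit = 100 * good_accepted - 500 * bad_accepted

# Question 4: rank test cases by predicted probability and accumulate profit.
order = np.argsort(-p_good)                     # a. best risks first
gain = np.where(y_test[order] == 1, 100, -500)  # b. gain if credit is extended
cum_profit = np.cumsum(gain)                    # c. cumulative net profit
best = int(np.argmax(cum_profit))               # d. row with maximum profit
depth = (best + 1) / len(y_test)                #    fraction of test cases accepted
cutoff = p_good[order][best]                    # e. lowest probability still accepted

print("confusion matrix:\n", cm)
print("AUC:", round(auc, 3))
print("net profit at cutoff 0.5:", net_profit)
print("max cumulative profit:", cum_profit[best])
print("depth:", round(depth, 3), "cutoff:", round(cutoff, 3))
```

The ROC curve itself can be drawn by passing `y_test` and `p_good` to `sklearn.metrics.roc_curve` and plotting the returned rates. If the maximum cumulative profit is negative, extending credit to no one is the better choice, which this sketch does not handle.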

Submission Guidelines

Please submit your work via Blackboard by the deadline. Your submission should be your Jupyter notebook, including code and comments.
