

Points available: 170
This assignment uses glm() (logistic regression) and ctree(), each for
two-category classification. The data file “churn_mod.txt” is available through
UTDbox>BUAN6356_2020Summer>data.
You will need the “partykit” package for this assignment. The “data.table”
package is suggested but not required. Do not use any other “require()” or
“library()” statement in your code. Use of the install.packages() command in the
code you submit will result in a score of zero.
If multiple submissions from a single student are waiting to be graded, only the
last (most recent) will be run and graded.
The first commands of your submitted code MUST be:
setwd("c:/data/BUAN6356/HW_5"); source("prep.txt", echo=T)
and the last command of your submitted code MUST be:
source("validate.txt", echo=T)
Be careful with the quote characters: they must ALL be the same at the
beginning and end of a string. (Use the single or double quote character from the
key next to “Enter”.) These lines must be included BEFORE your code will
be tested.
Submit the code to eLearning as an ASCII file which can be copied directly into R.
You may submit this assignment as many times as needed until you get full credit.
At that point you should stop since only the last score counts.
Background: Your task in this assignment is to explore 3 strategies for use in
two-category classification: logistic regression via glm() using AIC minimization,
logistic regression via glm() using predictor elimination through Coefficient
Estimate t-values, and ctree() from “partykit”. You will want to assess the final
models in each strategy with a single 10% testing sample from the original
“churn_mod” dataset. You will use 379546790 as the RNG seed. Your objective
is to classify customers by “churn” (canceled subscription, labeled 1, or continued
subscription, labeled 0) and to be ready to assess these 3 strategies through the
Expected Bayes Risk associated with each strategy together with their individual
overall Accuracy. You should be ready to perform these assessments for both the
training and testing data. The variable “ID” should be excluded from the analysis.
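The split and the two glm() strategies described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required solution: the data are simulated stand-ins for “churn_mod” (the predictor names x1, x2, x3 and the helper names trn and tst are hypothetical), step() is one common way to minimize AIC, and the 1.96 cutoff in the elimination loop is an assumed threshold. Note that for a binomial glm(), summary() labels the statistic “z value” rather than “t value”.

```r
# Simulated stand-in for churn_mod; x1..x3 are hypothetical predictors
seed  <- 379546790
tstPc <- 0.10
set.seed(seed)
n   <- 500
raw <- data.frame(ID = seq_len(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
raw$churn <- rbinom(n, 1, plogis(-0.5 + 1.2 * raw$x1))

# Exclude ID from the analysis data
wk <- raw[, setdiff(names(raw), "ID")]

# Single 10% testing sample (the exact sampling call is an assumption)
nTst   <- round(tstPc * nrow(wk))
tstIdx <- sample(nrow(wk), nTst)
trn    <- wk[-tstIdx, ]   # hypothetical name for the training rows
tst    <- wk[ tstIdx, ]   # hypothetical name for the testing rows

# NULL model and baseline (all predictors) model on the training set
m0 <- glm(churn ~ 1, family = binomial, data = trn)
m1 <- glm(churn ~ ., family = binomial, data = trn)

# Strategy 1: AIC minimization, searching between m0 and m1
m1s <- step(m1, scope = list(lower = formula(m0), upper = formula(m1)),
            direction = "both", trace = 0)

# Strategy 2: drop the weakest predictor until every |z| is at least 1.96
# (assumes numeric predictors, so coefficient names match term names)
m1t <- m1
repeat {
  ct <- coef(summary(m1t))[-1, , drop = FALSE]  # coefficient table, no intercept
  if (nrow(ct) == 0) break
  z <- abs(ct[, "z value"])
  if (min(z) >= 1.96) break
  weakest <- rownames(ct)[which.min(z)]
  m1t <- update(m1t, as.formula(paste(". ~ . -", weakest)))
}

# Predicted probabilities on the testing set
pred1s <- predict(m1s, newdata = tst, type = "response")
pred1t <- predict(m1t, newdata = tst, type = "response")
```

Because step() starts from m1 and only accepts moves that lower AIC, AIC(m1s) cannot exceed AIC(m1); the elimination loop offers no such guarantee and may settle on a different predictor set.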
Deliverables (all names with case as shown):
1. seed (vector) Random Number Seed value
2. tstPc (vector) Proportion of sample in testing set (single value)
3. raw (data.frame) data as read
4. wk (data.frame) data prepared for analysis
5. nTst (vector) Number of obs in the testing set (single value).
6. tstIdx (vector) Index values for testing set obs
7. m0 NULL model (logistic regression) for training set (“em-zero”)
8. m1 Baseline logistic regression model for training set (“em-one”)
9. m1s AIC minimization strategy final model
10. m1t t-statistic strategy final model
11. pred1s Predicted values from test set using final AIC model.
12. pred1t Predicted values from test set using t-statistic final model.
13. class1s Classification results from test set using final AIC model.
14. class1t Classification results from test set using t-statistic final model.
15. tree ctree() model for training data
16. predCt Predicted values for test set using ctree() model.
17. classCt Classification results from test set using ctree() model.
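
The classification and assessment steps behind the class* deliverables might look like the sketch below. It rests on several assumptions that the assignment text does not state: a 0.5 probability cutoff, a hypothetical loss matrix, and Expected Bayes Risk taken as the average loss over the confusion matrix. The probabilities and labels are made-up values for illustration; in practice they would come from predict() on a fitted model (for a partykit tree, predict() with type = "prob" returns class probabilities).

```r
# Hypothetical true labels and predicted churn probabilities (10 test obs)
actual <- c(0, 0, 1, 1, 0, 1, 0, 0, 1, 0)
pred   <- c(0.10, 0.40, 0.80, 0.30, 0.60, 0.90, 0.20, 0.05, 0.70, 0.35)

# Classify at a 0.5 cutoff (the cutoff is an assumption)
cls <- as.integer(pred > 0.5)

# Confusion matrix; fixed factor levels guarantee all four cells exist
cm <- table(actual    = factor(actual, levels = 0:1),
            predicted = factor(cls,    levels = 0:1))

# Overall accuracy: share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)          # 0.8

# Hypothetical loss matrix, rows = actual, columns = predicted:
# a missed churner costs 5, a false alarm costs 1
loss <- matrix(c(0, 1,
                 5, 0), nrow = 2, byrow = TRUE)

# Expected risk taken here as the average loss per observation
risk <- sum(loss * cm) / sum(cm)             # 0.6

# With 0-1 loss the same formula reduces to 1 - accuracy
risk01 <- sum((1 - diag(2)) * cm) / sum(cm)  # 0.2
```

Running the same calculation on both the training and testing rows for each of the three models gives the side-by-side comparison the background section asks for.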
