
Exercise 1:  SLR Analytics & Assessment

This exercise is designed to give you practice running simple linear regression (SLR) models (econometric models with a single explanatory, or independent/RHS, variable).  In the first SLR analysis, you’ll be looking at the relationship between the SAT's component ERW scores (Evidence-Based Reading and Writing) and overall SAT scores. In the subsequent SLR analyses you’ll be looking first at the relationship between weekly Spotify streams and iTunes sales in the US, and then at the relationship between annual %wins and runs scored (RS) and runs allowed (RA) in the Korean Baseball Organization. The data are as current as possible.

Predicting SAT Scores

In your first SLR analysis, you’ll be using SAT Evidence-Based Reading and Writing (ERW) scores to predict combined SAT scores.  The data in this exercise are built from summary statistics for the 2024 SAT results for college-bound students. Links to the data sources have been posted.


1.   Log in to the Exercise #1 Answers/Updates page and copy and paste your SAT dataset into Excel. You will be using this data to estimate the relationship between the 2024 SAT's Evidence-Based Reading and Writing scores (erw) and overall SAT scores (sat), with a sample of five observations.

2. Working in Excel:  Let’s look at the relationship between erw and sat in your sample:

a.   Plot the data points using Excel's XY scatter chart … putting erw (x) on the horizontal axis and sat (y) on the vertical axis.

Estimated coefficients: $\hat{\beta}_0$ and $\hat{\beta}_1$

b.   Use Excel's “add trendline” feature (right click on a data point in the scatterplot to see this option) to add a linear trendline to your XY scatter plot.  Be sure to specify (under “Options”) that the equation and $R^2$ be displayed along with the trendline.  Record the trendline slope, intercept and $R^2$ on the Answers/Updates page.

i.    Take a snapshot of your figure with the trendline results, save it as a jpg, png or pdf file and upload that file to the Exercise #1 Canvas Quiz Answers (it is not a Quiz, of course, but it is a convenient way to submit your snapshot).

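c.   In class we derived the OLS formulas for the SLR intercept and slope estimates:

$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = r_{xy}\,\dfrac{S_y}{S_x}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}$

Working towards these coefficients and continuing in Excel, compute the sample statistics used in these formulas ($\bar{x}$, $\bar{y}$, $S_x$, $S_{xx}$, $S_y$, $S_{yy}$, $S_{xy}$ and $r_{xy}$) and record those on the Answers/Updates page.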

d.   Use the sample statistics that you just derived to compute the SLR intercept and slope estimates, and record your slope and intercept coefficients on the Answers/Updates page. Do you get the same estimates of the slope and intercept that you saw with the trendline equation?  Record your Yes/No answer on the Answers/Updates page.

Use the SLR slope and intercept coefficients that you just derived to generate predicted SAT scores ($\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$) and associated residuals (actuals minus predicteds: $\hat{u}_i = y_i - \hat{y}_i$), for each observation.

e.   Compute the means and variances of the predicteds (the $\hat{y}_i$'s) and the residuals (the $\hat{u}_i$'s) and record those figures on the Answers/Updates page.

i.   You should have found that the mean of the predicteds is the mean of the actuals, $\bar{y}$, and that the mean of the residuals is 0.  Did you?  Record your answer.
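As a cross-check on the Excel work in 2.c through 2.e, the same calculations can be sketched in Python. This is only an illustration under stated assumptions: the five erw/sat pairs are made-up numbers (not the course dataset), the variable names are hypothetical, and the variances and covariance use an n − 1 denominator (as Excel's VAR.S and COVARIANCE.S do), which is an assumption about how the sample statistics are defined in class.

```python
# Hypothetical five-observation sample (made-up numbers, NOT the course data).
erw = [480, 520, 560, 600, 640]      # x: ERW scores
sat = [950, 1040, 1100, 1210, 1250]  # y: overall SAT scores


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance with an n-1 denominator; cov(a, a) is the sample variance."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


# Sample statistics (step 2.c)
x_bar, y_bar = mean(erw), mean(sat)
s_xx, s_yy, s_xy = cov(erw, erw), cov(sat, sat), cov(erw, sat)
s_x, s_y = s_xx ** 0.5, s_yy ** 0.5
r_xy = s_xy / (s_x * s_y)

# OLS slope and intercept (step 2.d)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Predicteds and residuals (actuals minus predicteds)
y_hat = [b0 + b1 * xi for xi in erw]
u_hat = [yi - yh for yi, yh in zip(sat, y_hat)]

# Means and variances of predicteds and residuals (step 2.e)
print("slope:", b1, "intercept:", b0)
print("mean of predicteds:", mean(y_hat), "mean of actuals:", y_bar)
print("mean of residuals:", mean(u_hat))
print("variance of predicteds:", cov(y_hat, y_hat))
print("variance of residuals:", cov(u_hat, u_hat))
```

When you run something like this on your own data, the mean of the residuals may print as a tiny nonzero number rather than exactly 0; that is floating-point rounding, not a violation of the result in e.i.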

Goodness-of-Fit: MSE, RMSE and $R^2$

f.   Working with the residuals that you computed, derive the SSR (Sum Squared Residuals) and calculate the MSE (Mean Squared Error): $MSE = \dfrac{SSR}{n-2}$.  Take the square root of this to calculate the Root MSE (RMSE).  Record your answers on the Answers/Updates page.

g.   Use the variance of the actuals, $S_{yy}$, to compute SST (Sum Squared Totals), and record your answer on the Answers/Updates page.

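i.    Recall that SST, the sum of squared deviations of the actual values from their mean, is related to the variance of the y's (taking $S_{yy}$ to be the sample variance, with an n − 1 denominator):

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1)\,S_{yy}$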

h.   Use the calculated SSR and SST to calculate $R^2$ and verify that this agrees with the ratio of the variances of the predicteds and actuals:

$R^2 = 1 - \dfrac{SSR}{SST} = \dfrac{S_{\hat{y}\hat{y}}}{S_{yy}}$

Record your answers on the Answers/Updates page.
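The goodness-of-fit steps (f through h) can be sketched the same way. Again, this is a hedged illustration: the data are made-up (not the course dataset), the names are hypothetical, and the variances are assumed to use an n − 1 denominator, so that SST = (n − 1) times the variance of the actuals.

```python
# Hypothetical five-observation sample (made-up numbers, NOT the course data).
erw = [480, 520, 560, 600, 640]
sat = [950, 1040, 1100, 1210, 1250]
n = len(erw)

x_bar = sum(erw) / n
y_bar = sum(sat) / n

# OLS slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(erw, sat))
      / sum((x - x_bar) ** 2 for x in erw))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in erw]
u_hat = [y - yh for y, yh in zip(sat, y_hat)]

# f. SSR, MSE (SSR divided by n - 2) and RMSE
ssr = sum(u ** 2 for u in u_hat)
mse = ssr / (n - 2)
rmse = mse ** 0.5

# g. SST from the sample variance of the actuals (n - 1 denominator assumed)
s_yy = sum((y - y_bar) ** 2 for y in sat) / (n - 1)
sst = (n - 1) * s_yy

# h. R-squared two ways: 1 - SSR/SST, and variance of predicteds over variance of actuals
y_hat_bar = sum(y_hat) / n
s_yhat = sum((yh - y_hat_bar) ** 2 for yh in y_hat) / (n - 1)
r2_from_ss = 1 - ssr / sst
r2_from_variances = s_yhat / s_yy

print("SSR:", ssr, "MSE:", mse, "RMSE:", rmse)
print("SST:", sst)
print("R2 (1 - SSR/SST):", r2_from_ss)
print("R2 (variance ratio):", r2_from_variances)
```

The two R-squared figures should agree up to rounding, which is the check that step h asks for.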

OK, let's move to Stata!

3.   Import your data into Stata (use the File\Import command, or just copy and paste your data   from Excel into Stata) and use Stata to run the regression of sat on erw (so: reg sat erw). Record your answers.

a.   You should get the same estimated intercept and slope coefficients, $R^2$, MSE, and RMSE as above (with your work in Excel).  Do you?   Record your answer.

b.   Take a snapshot of your Stata regression results, save it in whatever file format you like, and upload that file to the Exercise #1 Canvas Quiz Answers.

4.   Continue working in Stata, and with the sat-erw regression:

a.   After running your SLR model, use Stata’s predict command to generate the predicted values (the $\hat{y}_i$'s) and the residuals (the $\hat{u}_i = y_i - \hat{y}_i$'s). Your Stata code might look something like:

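* fit the SLR model of sat on erw
reg sat erw

* predicted values (the default for predict after regress)
predict yhat

* residuals: actuals minus predicteds
predict uhat, residual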

b.   Use Stata to calculate the sample correlation of the actuals (the $sat_i$'s, the $y_i$'s) with the explanatory variable (erw, the $x_i$'s), $r_{xy}$, and the sample correlation of the predicteds (the $\hat{y}_i$'s) with the actuals, $r_{y\hat{y}}$.  Record your answer on the Answers/Updates page.

i.    You can do this with a single Stata correlation command:   corr sat erw yhat

ii.  You should have found that those two correlations were the same.  Did you?  Record your answer on the Answers/Updates page.

c.   Square these sample correlations and verify that $R^2 = r_{xy}^2 = r_{y\hat{y}}^2$ … so now you know why $R^2$ is called $R^2$!  Record your verification on the Answers/Updates page.

d.   Finally, verify that the sample correlation of the predicteds (the $\hat{y}_i$'s) and the residuals (the $\hat{u}_i$'s) is zero: $r_{\hat{y}\hat{u}} = 0$.  Record your verification on the Answers/Updates page.
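The three correlation facts in item 4 can also be checked outside Stata. The sketch below is illustrative only: it uses made-up data (not the course dataset) and a hypothetical helper named corr. Note that the correlation of the actuals with the predicteds equals $r_{xy}$ only when the estimated slope is positive; with a negative slope it equals the absolute value of $r_{xy}$.

```python
# Hypothetical five-observation sample (made-up numbers, NOT the course data).
erw = [480, 520, 560, 600, 640]
sat = [950, 1040, 1100, 1210, 1250]
n = len(erw)


def mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Sample (Pearson) correlation of two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / (s_aa * s_bb) ** 0.5


x_bar, y_bar = mean(erw), mean(sat)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(erw, sat))
      / sum((x - x_bar) ** 2 for x in erw))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in erw]
u_hat = [y - yh for y, yh in zip(sat, y_hat)]

# 4.b: correlation of actuals with erw, and of actuals with predicteds
r_xy = corr(erw, sat)
r_y_yhat = corr(sat, y_hat)

# 4.c: R-squared from the sums of squares, to compare with the squared correlations
ssr = sum(u ** 2 for u in u_hat)
sst = sum((y - y_bar) ** 2 for y in sat)
r2 = 1 - ssr / sst

# 4.d: correlation of predicteds with residuals (zero up to rounding)
r_yhat_uhat = corr(y_hat, u_hat)

print("r_xy:", r_xy, "r_y_yhat:", r_y_yhat)
print("r_xy squared:", r_xy ** 2, "R2:", r2)
print("corr(predicteds, residuals):", r_yhat_uhat)
```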

Meaningfulness (Economic Significance)

5.   Continue working in Stata, and with the sat-erw regression:

a.   Beta Regression

i.   Use the , beta option of the regress command to run a beta regression of sat on erw, and record the estimated intercept and slope coefficients on the Answers/Updates page.

1.   Stata syntax:  reg sat erw, beta

ii.  You should find that the estimated beta regression slope coefficient is also the correlation between sat and erw, which you generated in 4.b above.  Do those two values in fact agree?  Record your answer on the Answers/Updates page.

iii. Would you say that this (the estimated slope coefficient) suggests a meaningful (non-trivial) relationship between sat and erw?  Explain why you say what you say.

Hint:  Meaningfulness is very much in the eye of the beholder... it's a judgement call. But don’t ignore the issue!  You've spent a lot of time looking at correlation coefficients in Stats and elsewhere, no doubt.  Use that experience!
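Why should the beta regression slope match the correlation? A beta (standardized) regression rescales both variables to have unit standard deviation, so the slope becomes the ordinary slope times $S_x / S_y$. A short derivation, assuming the OLS slope is $\hat{\beta}_1 = S_{xy}/S_{xx}$ and that $S_{xx} = S_x^2$ (the variance notation is an assumption based on the sample statistics listed in 2.c; the superscript label "beta" is just a hypothetical name for the standardized slope):

```latex
\hat{\beta}_1^{\,\text{beta}}
  = \hat{\beta}_1 \cdot \frac{S_x}{S_y}
  = \frac{S_{xy}}{S_x^{2}} \cdot \frac{S_x}{S_y}
  = \frac{S_{xy}}{S_x \, S_y}
  = r_{xy}
```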

b. Elasticity @ the means

i.   Using the sample statistics that you've already computed and the SLR estimated coefficients, predict the SAT score at the mean of erw scores (evaluate the SRF at the mean of erw).  Record your answer on the Answers/Updates page.

ii.  Your answer should agree with the mean sat score, since $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ implies $\hat{\beta}_0 + \hat{\beta}_1\bar{x} = \bar{y}$.  Does it?  Record your answer on the Answers/Updates page.

1.   This reflects the fact that the SRF passes through the sample means.

iii. Calculate the point elasticity of predicted SAT scores wrt (with respect to) changes in the erw score, evaluated at the means.  Record your answer on the Answers/Updates page.

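(Recall that if the SRF is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, then $\dfrac{d\hat{y}}{dx} = \hat{\beta}_1$ and elasticity $= \dfrac{d\hat{y}}{dx}\cdot\dfrac{x}{\hat{y}} = \hat{\beta}_1\,\dfrac{x}{\hat{y}}$.  When this elasticity is evaluated at the means, it is $\hat{\beta}_1\,\dfrac{\bar{x}}{\bar{y}}$.)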

iv. Would you say that this estimated elasticity (at the means) suggests a meaningful (non-trivial) relationship?  Explain why you say what you say.  Record your answer on the Answers/Updates page.

1.   Hint 1:  As mentioned above, meaningfulness is very much in the eye of the beholder... it's a judgement call.  But don’t ignore the issue!

2.   Hint 2:  Most would agree that elasticities larger than 0.4 or 0.5 or so in magnitude are large and meaningful (economically significant), and that elasticities less than 0.1, or even 0.05, in magnitude are small, and not so meaningful or economically significant.

3.   Hint 3:  But where do you draw the line in-between?  I have two answers:

a.   Do you have to draw a line?  (See what the magnitudes are before you worry about this.), and

b.   Do you really have to draw a line?  I tend to say that below 0.1 is not so meaningful, above 0.3 is, and between 0.1 and 0.3, who knows?  But that's just my opinion!
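The two calculations in 5.b (the SRF evaluated at the mean of erw, and the point elasticity at the means) are short enough to sketch directly. As before, this is an illustration with made-up data (not the course dataset) and hypothetical variable names; the elasticity is computed as the slope times the ratio of the x mean to the y mean.

```python
# Hypothetical five-observation sample (made-up numbers, NOT the course data).
erw = [480, 520, 560, 600, 640]
sat = [950, 1040, 1100, 1210, 1250]
n = len(erw)

x_bar = sum(erw) / n
y_bar = sum(sat) / n

# OLS slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(erw, sat))
      / sum((x - x_bar) ** 2 for x in erw))
b0 = y_bar - b1 * x_bar

# 5.b.i-ii: the SRF evaluated at the mean of erw should return the mean of sat
y_at_mean_x = b0 + b1 * x_bar

# 5.b.iii: point elasticity of predicted sat with respect to erw, at the means
elasticity_at_means = b1 * x_bar / y_bar

print("SRF at mean erw:", y_at_mean_x, "mean sat:", y_bar)
print("elasticity at the means:", elasticity_at_means)
```

An elasticity read this way says: near the sample means, a 1% increase in erw goes with roughly that many percent increase in predicted sat.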

6. Skip for now: Return to your erw/sat Excel work.  For each observation, compute the square of the x-distances from the mean and show that the OLS slope estimate is indeed a weighted average of the slopes of the lines joining each datapoint to the sample means point, where the weights are proportional to the square of the x-distances from the x-mean.

You should have found that the OLS slope estimate is indeed a weighted average of the slopes of the lines joining each datapoint to the sample means point.  Did you?  Record your answer.

Take a snapshot of your Excel worksheet results and upload that to the Exercise #1 Canvas dropbox.

a.   Hint:  This is discussed early in the OLS/SLR Analytics handout and slideshow … where I walk you through the calculation for a sample of four observations from the bodyfat dataset.
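The weighted-average result in item 6 can be sketched as follows. This is not the handout's worked bodyfat example, only an illustration with made-up erw/sat data and hypothetical variable names. One practical wrinkle: an observation whose x value equals the x-mean has an undefined slope to the means point, but its weight is zero, so it is skipped.

```python
# Hypothetical five-observation sample (made-up numbers, NOT the course data).
erw = [480, 520, 560, 600, 640]
sat = [950, 1040, 1100, 1210, 1250]
n = len(erw)

x_bar = sum(erw) / n
y_bar = sum(sat) / n

dx = [x - x_bar for x in erw]   # x-distances from the x-mean
dy = [y - y_bar for y in sat]   # y-distances from the y-mean

# Weights proportional to the squared x-distances; they sum to 1
sum_dx2 = sum(d ** 2 for d in dx)
weights = [d ** 2 / sum_dx2 for d in dx]

# Weighted average of the slopes of the lines joining each point to (x_bar, y_bar).
# Points with dx == 0 have an undefined slope but zero weight, so they are skipped.
weighted_avg_slope = sum(
    w * (dyi / dxi) for w, dxi, dyi in zip(weights, dx, dy) if dxi != 0
)

# The usual OLS slope, for comparison
b1 = sum(dxi * dyi for dxi, dyi in zip(dx, dy)) / sum_dx2

print("weights:", weights)
print("weighted average of point slopes:", weighted_avg_slope)
print("OLS slope:", b1)
```

The two slopes agree because each term, weight times point slope, simplifies to the product of the x- and y-distances divided by the sum of squared x-distances, which is exactly one term of the OLS slope formula.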

For the remainder of this Exercise, you'll be repeating 2. (Excel)  and 3. (Stata) above, working  with different datasets. Warning: You’ll need to spend some time building these datasets.  But that time will be time well spent!  At some point we’ll review in class how to merge datasets in  Excel and in Stata … which will be the most challenging part of constructing your final datasets. I have posted a handout with some tips to Canvas.



