ETB1100
Assignment: A Regression Analysis
The Value of Linear Relationships for Decision Making in Business
Learning Objectives (LO):
LO1: Understand how to use Excel to draw a random sample of data
LO2: Develop a simple linear regression model using EXCEL
LO3: Understand simple linear regression analysis including assessing the validity of the model and interpreting the findings.
LO4: Develop the ability to analyze and interpret multiple linear regression models using ChatGPT as a collaborative tool for extending statistical knowledge.
LO5: Describe the business implications of your multiple linear regression analysis.
Submission Details:
• This assignment is marked out of 72 and worth 10% of the assessment for this unit.
• It is designed to test Learning Objective 4: “Interpret and evaluate relationships between variables for business decision-making, using the concept of correlation and simple linear regression.”
Due Date: 11:55pm, Sunday 13th October, 2024.
• You must submit your completed assignment (including the Assignment Coversheet, correctly filled in AND signed), on-line via the Moodle site for this unit.
• Name the soft copy of your assignment as follows:
Student ID_Surname_Initial.doc
(this should include all tables, charts, exhibits etc produced using EXCEL)
• DO NOT submit any EXCEL files (You should have already copied any relevant EXCEL output and pasted it into your Word document).
• SUBMIT ONLY ONE FILE.
• Upload this file on Moodle any time PRIOR to the deadline. (After this time, the upload link will be closed).
• You will find the upload link in the ASSESSMENTS section on Moodle.
• Click on the “Click Here to Upload Assignment” link to upload.
• Once you have uploaded and saved, the following message will appear momentarily, “File uploaded successfully.”
To confirm your upload was successful, you will then see your uploaded file’s name.
• A penalty of up to 5% of the marks earned may apply for each day an assignment is late unless an extension of time has been sought. Extensions will only be granted for substantive reasons at the discretion of the Chief Examiner and must be applied for before the assignment is due.
• Please retain your own copy of the assignment until after the publication of final results for this unit.
Beyond the Haze: Decoding the True Impacts of Vaping
Using Regression Analysis to Investigate the Consequences of Vaping.
https://theconversation.com/vaping-now-more-common-than-smoking-among-young-people-and-the-risks-go-beyond-lung-and-brain-damage-223125
The Assignment Brief:
With the increasing popularity of vaping comes the need for a critical review of the real health consequences hidden behind the enticing clouds of vapour. To this end, you have been provided with some relevant data and are tasked with analysing it using correlation and regression analysis.
There are SIX parts to this analysis:
1. Background
(AI and Generative AI tools are required to be used in this Part).
2. Sample Acquisition [LO1]
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
3. Model Development - Correlation Analysis & Simple Linear Regression [LO2]
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
4. Model Validation and Interpretation-Simple Linear Regression [LO3]
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
5. Analysis Extension to Multiple Linear Regression [LO4]
(AI and Generative AI tools may be used selectively within this Part as per explanation provided).
6. Conclusions and Business Implications [LO5]
(AI and Generative AI tools may be used selectively within this Part as per explanation provided).
The data comprises 100 observations across six variables that provide a comprehensive view of individual vaping habits and their potential impacts on lung health in Australia.
The variables are labelled
Lung Function Score (Y):
• Definition: A numerical score ranging from about 40 – 150 that represents the lung health of the individual, with higher scores indicating poorer lung function.
• Unit: Score (no specific unit, higher scores indicate worse health)
(Scores closer to 40 suggest better lung health; scores approaching or exceeding 100 suggest significantly impaired lung function.)
Nicotine Concentration (mg/mL) (X1):
• Definition: The concentration of nicotine in the e-liquid used in the vaping device.
• Unit: Milligrams per millilitre (mg/mL)
Years of Vaping (X2):
• Definition: The total number of years the individual has been using vaping products.
• Unit: Years
Daily Usage Frequency (X3):
• Definition: The average number of times per day the individual uses vaping products.
• Unit: Times per day
Number of Flavors Used Regularly (X4):
• Definition: The number of different flavours of e-liquid that the individual uses on a regular basis.
• Unit: Count (no specific unit)
Age of Vaping Initiation (X5):
• Definition: The age at which the individual first started using vaping products.
• Unit: Years
and can be found in the data file labelled Vaping Health Impact.xlsx under the ASSIGNMENTS heading in the ASSESSMENTS section on Moodle, and is to be used to answer the questions listed here.
Assume that the population from which this data was drawn was approximately normally distributed.
NOTE: All relevant EXCEL output must be copied and pasted into a single document (.docx) for submission.
DATA PREPARATION AND ‘EXCEL HYGIENE’
In all lecture examples involving the use of EXCEL, as well as the solutions to tutorial questions, I have been very particular about how to format the data and clean up the output (e.g. adjust to four decimal places, label everything, edit the charts, etc.). This is because this ‘EXCEL hygiene’ is essential in the workplace and also highly valued. Generating output is easy; consistently ensuring that it is clearly labelled and easy to identify, understand and track is more difficult, simply because it takes more time. This time is worth the investment and will be expected in your report.
PART ONE: Background (4 marks)
(AI and Generative AI tools are required to be used in this Part).
Write an introductory paragraph about vaping in Australia with the objective of providing some context for this assignment. You are required to use generative AI software, such as ChatGPT.
The paragraph must be strictly no more than 200 words, and you must provide the prompt and prompt refinements you used, and of course, footnote your source. Screenshots of ChatGPT output are acceptable.
Guideline for footnoting source:
“ChatGPT's Explanation of … … … … … … .," Generated by ChatGPT-3.5, OpenAI, September 15, 2021, [URL of the Chat or Platform]
Q1 NOTE 1: Prompts will be assessed using the 3C’s criteria (0.5 mark each):
1. Clarity: Clear prompts prevent confusion and guide AI effectively
2. Context: Rich context avoids incomplete or inaccurate outputs
3. Creativity: Open-ended prompts encourage diverse and innovative content
Q1 NOTE 2: Response will be assessed using the following criteria (0.5 mark each):
1. Relevance: Does the response directly address the prompt's intent and context?
2. Coherence: Is the response logically organized and structured, ensuring easy comprehension?
3. Completeness: Does the response cover all relevant aspects of the prompt or leave critical gaps?
4. Accuracy: Are the facts, information, and details presented in the response correct and reliable?
5. Appropriateness: Is the tone, style, and language of the response suitable for the intended audience?
PART TWO [LO1]: Sample Acquisition: (5 marks)
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
Begin your analysis by using the Random Sampling procedure demonstrated in both the lecture and tutorials in Week 10 to select a RANDOM SAMPLE of 80 observations from your data. Copy and paste all EIGHT variables (Observation Number, Lung Function, Nicotine, Years of Vaping, Daily Usage, Flavours, Initiation Age, Random Number) into a separate worksheet labelled ‘Sample_80’, in columns B - I respectively. In column A, you are to number the rows (1 - 80) and label this column ‘Count’.
Include a screenshot of this ‘Sample_80’ worksheet here to demonstrate you have sampled correctly. Label this as EXHIBIT 1 and include a relevant title.
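For readers who want to see the sampling logic outside EXCEL, the Python sketch below shows one common approach: attach a random number to every observation, sort by that number, and keep the first 80 rows. It is an assumption that this resembles the Week 10 procedure, which is not reproduced in this brief. The function name `draw_random_sample`, the seed and the placeholder population are hypothetical, and the assignment itself must still be completed in EXCEL.

```python
import random


def draw_random_sample(rows, sample_size, seed=None):
    """Tag each row with a random number, sort by it, keep the first sample_size rows."""
    rng = random.Random(seed)
    # Pair every row with its own random number in [0, 1).
    tagged = [(rng.random(), row) for row in rows]
    # Sorting by the random number shuffles the rows.
    tagged.sort(key=lambda pair: pair[0])
    sample = []
    for count, (rand, row) in enumerate(tagged[:sample_size], start=1):
        # 'Count' numbers the sampled rows; the random number is kept as the last column.
        sample.append({"Count": count, **row, "Random Number": rand})
    return sample


# Placeholder population: observation numbers only, NOT the real data file.
population = [{"Observation Number": i} for i in range(1, 101)]
sample_80 = draw_random_sample(population, 80, seed=1)
print(len(sample_80), "rows sampled")
```

Because each observation receives exactly one random number, no observation can be selected twice.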
PART THREE [LO2]: Model Development-Correlation Analysis & Simple Linear Regression (22 marks)
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
(a) Use EXCEL to produce a correlation matrix for all variables (dependent and independent), remembering to follow the approach demonstrated in Lecture 10. (3 marks)
(b) Now use this correlation matrix to identify which independent variable has the strongest relationship with Lung Function (Y) to be used later in a regression model. State which variable this is and what evidence led you to choose it. (3 marks)
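As an illustration of what "strongest relationship" means numerically, the Python sketch below computes Pearson correlation coefficients and selects the candidate with the largest absolute value, so a strong negative correlation can outrank a weaker positive one. The data values and the names `X_a` and `X_b` are made up for illustration and are not taken from the assignment data file.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Toy values only (hypothetical), chosen to show why the absolute value matters.
y = [50, 62, 71, 85, 90]
candidates = {
    "X_a": [1, 3, 2, 5, 4],  # moderately strong positive relationship
    "X_b": [5, 4, 3, 2, 1],  # stronger, but negative, relationship
}

correlations = {name: pearson_r(x, y) for name, x in candidates.items()}
# Strength is judged by |r|, not by the sign of r.
strongest = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    print(f"{name}: r = {r:.4f}")
print("Strongest relationship with Y:", strongest)
```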
(c) To investigate if a linear relationship is a reasonable assumption, use EXCEL’s scatterplot option to produce a graph of these two variables. Include the line of best fit (DO NOT INCLUDE R²; it is not to be discussed here).
Label this graph as EXHIBIT 3 with a relevant title and remember to optimise its presentation via the various formatting options available. (4 marks)
(d) Based ONLY on the scatterplot you produced as EXHIBIT 3, does a linear relationship seem reasonable? If so, is it a positive or negative slope? Provide evidence for your answer and interpret what this means in context of this question. (4 marks)
Regardless of your answer in (d), now assume that a linear relationship is reasonable.
(e) Using the Regression Analysis procedure in EXCEL, produce a simple linear regression model of Lung Function (Y) against the independent variable (X1) you selected in (b), with the following requirements:
• Select 99% Confidence Level in the Output Options.
• Report all values to 4 decimal places where relevant.
• Provide the Summary Output labelled as EXHIBIT 4 with an appropriate title. (4 marks)
(f) Based on this output, state the equation of this regression model (correct to 4 decimal places), remembering to define the variables. (4 marks)
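The intercept and slope that EXCEL reports come from the standard least-squares formulas, b1 = Sxy / Sxx and b0 = mean(Y) - b1 × mean(X). The Python sketch below applies them to a handful of made-up values and formats the resulting equation to 4 decimal places. The function name `fit_simple_regression` and the data are hypothetical; this is a minimal sketch, not a substitute for the EXCEL output required in EXHIBIT 4.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line: predicted Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx            # slope
    b0 = mean_y - b1 * mean_x   # Y-intercept
    return b0, b1


# Toy values only (hypothetical), not the assignment data.
x = [1, 2, 3, 4, 5]
y = [50, 62, 71, 85, 90]

b0, b1 = fit_simple_regression(x, y)
equation = f"Predicted Y = {b0:.4f} + {b1:.4f} * X"
print(equation)
```

When stating the equation in a report, each symbol still needs a definition in words, for example what Y and X measure and in which units.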
PART FOUR [LO3]: Model Validation and Interpretation-Simple Linear Regression (18 marks)
(AI and Generative AI tools must NOT be used in this Part because it requires students to demonstrate human knowledge and skill in using EXCEL).
Before interpreting this model, it is first essential to determine whether or not it is a true representation of the relationship that exists in the population between Lung Function (Y) and the independent variable (X1) you selected in (b). To do this, a hypothesis test of significance is required.
(a) Using a 5% level of significance, determine whether or not this relationship between Lung Function (Y) and the independent variable (X1) you selected in (b), is a statistically significant, linear relationship. Ensure that you clearly state your hypothesis, show ALL steps, ALL working AND interpret your conclusion IN CONTEXT of this question. (6 marks)
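A significance test for a simple linear regression slope is commonly set up as H0: β1 = 0 against H1: β1 ≠ 0, with test statistic t = b1 / se(b1) on n - 2 degrees of freedom. The Python sketch below computes that statistic from first principles for made-up values. It assumes this standard two-tailed t-test is the one intended here; the unit's lecture notes may lay out the steps differently, and the function name `slope_t_statistic` is hypothetical.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for testing H0: slope = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    mse = sse / (n - 2)          # residual variance estimate
    se_b1 = sqrt(mse / s_xx)     # standard error of the slope
    return b1 / se_b1, n - 2


# Toy values only (hypothetical), not the assignment data.
x = [1, 2, 3, 4, 5]
y = [50, 62, 71, 85, 90]

t_stat, df = slope_t_statistic(x, y)
print(f"t = {t_stat:.4f} on {df} degrees of freedom")
```

The decision step then compares |t| with the two-tailed critical value for the chosen significance level and degrees of freedom, or equivalently compares the p-value in the regression output with 0.05.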
Assuming now that the model you have identified is statistically significant, it is time to interpret the model.
(b) State and provide an interpretation of the Y-intercept, b0, and the slope coefficient, b1. (7 marks)
(c) State and interpret the coefficient of determination for this model, in context of this question. (5 marks)
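For a simple linear regression, the coefficient of determination equals the square of the Pearson correlation between X and Y, which is the same value as 1 - SSE/SST. The Python sketch below computes it that way for made-up values; the function name `r_squared_simple` and the data are hypothetical.

```python
def r_squared_simple(x, y):
    """R-squared for a simple linear regression, computed as Sxy^2 / (Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)  # total variation in Y (SST)
    return s_xy ** 2 / (s_xx * s_yy)


# Toy values only (hypothetical), not the assignment data.
x = [1, 2, 3, 4, 5]
y = [50, 62, 71, 85, 90]

r2 = r_squared_simple(x, y)
print(f"R-squared = {r2:.4f}")
print(f"About {r2:.2%} of the variation in Y is explained by X in this toy example.")
```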
PART FIVE [LO4]: Analysis Extension to Multiple Linear Regression (15 marks)
(AI and Generative AI tools may be used selectively within this Part as per free text explanation provided).
(a) Using the Regression Analysis procedure in EXCEL, now include ALL FIVE independent variables in the regression against Lung Function (Y) to produce a Multiple Linear Regression model.
Label the regression output as EXHIBIT 5 with a relevant title, and remember to optimise its presentation via the various formatting options available (TIP: it is not ‘user-friendly’ for management if you leave any scientific notation in the output). (3 marks)
(b) From what you have learned about Simple Linear Regression analysis, discuss what the Multiple Linear Regression output you produced in EXHIBIT 5 tells you.
SPECIAL INSTRUCTIONS:
Through our examination of Simple Linear Regression, we covered most of what you need to know to understand a Multiple Linear Regression model – however, not everything.
Use ChatGPT to help you fill in the gaps for this Multiple Linear Regression part of the assignment. This does not mean you should use ChatGPT to do everything – if that was my intention, I would have said that. Instead, I want to see how you can work WITH ChatGPT as your assistant, not boss.
For this part, you will be assessed on how you interact with ChatGPT, what prompts you use and then how you refine those prompts, review and add to the responses, draw your own conclusions from the responses etc. You will be assessed less on the accuracy of your discussion and more on your intellectual engagement with the ChatGPT process and output.
If you choose poorly and use ChatGPT to do everything, with little to no involvement from yourself, you will be penalised heavily and likely score zero for this Multiple Linear Regression part of the assignment.
My whole intention is to provide the opportunity for you to harness the power of ChatGPT whilst remaining in the driver’s seat. This will be a critically important experience for you, should you choose to accept it with integrity.
GUIDANCE TO FOLLOW:
1. Even before calling on help from ChatGPT, you should comment on the things you already know about: R-Square and the p-values for each of the coefficients.
2. Next, you should be curious about how to interpret each of the coefficients, now that there is more than one – ask for help from ChatGPT.
3. And what about the ‘Adjusted R-Square’ – is that relevant and why? Ask for help from ChatGPT – remember, the better your question, the better the answer!
4. And, is the ‘Significance F’ value relevant? Ask for help from ChatGPT.
Be sure to state in your answer whether or not the multiple linear regression model is better than the simple linear regression model, and explain why and how you reached that conclusion. (12 marks)
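One reason Adjusted R-Square matters when comparing models is that ordinary R-Square never decreases when more independent variables are added, whereas the adjusted version penalises each extra variable: Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k is the number of independent variables. The Python sketch below applies this formula to two hypothetical R-Square values. These numbers are invented purely for illustration and are not results from the assignment data.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k independent variables and n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 80  # sample size used in this assignment

# Hypothetical R-squared values, NOT taken from any real output.
simple_adj = adjusted_r_squared(0.60, n, k=1)    # one independent variable
multiple_adj = adjusted_r_squared(0.62, n, k=5)  # five independent variables

print(f"Simple model:   R2 = 0.6000, adjusted R2 = {simple_adj:.4f}")
print(f"Multiple model: R2 = 0.6200, adjusted R2 = {multiple_adj:.4f}")
```

In this invented case the multiple model has the higher R-Square but the slightly lower Adjusted R-Square, which shows why the two measures can lead to different conclusions about which model is better.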
PART SIX [LO5]: Conclusions and Business Implications (5 marks)
(AI and Generative AI tools may be used selectively within this Part as per explanation provided).
Time To Deliver Your Expert Opinion on This Matter
Now, referring ONLY to the Multiple Linear Regression analysis, that is, what you found and discussed in PART FIVE, describe in 200 words or less what the business implications of your findings might be. There will be many possible correct answers to this question, but the ones you present must be consistent with your findings and context.
If you choose to get some help from ChatGPT, as always, you must provide the prompt and any prompt refinements you used, and of course, footnote your source.
PRESENTATION: (3 marks)
There are 3 marks available for presentation. These marks will be awarded for things such as: Easy to read; logical flow of answers; cohesive report; answers clearly labelled; appropriate font size, borders, colour choice, labelling of graphs, care in spelling, grammar and punctuation.
ASSIGNMENT TOTAL = 4 + 5 + 22 + 18 + 15 + 5 + 3 = 72 MARKS