Description of Assignment
SDS 475/5210
Homework 4: Temperature Prediction Model for Saint Louis
Due Date: 11/15/2024
Class: Statistical Computation SDS 475/5210
Overview
In this assignment, you will build, evaluate, and interpret a predictive model for TLML, the temperature in degrees Fahrenheit, at Saint Louis. Using a training dataset (data_training) containing various meteorological variables, you will develop and fit a model to predict Saint Louis temperatures. Once your model is developed and validated, you will test its accuracy on a separate prediction dataset (data_prediction), in which the Saint Louis temperature values have been removed so that prediction error can be assessed.
You are encouraged to apply statistical techniques learned in class to ensure model robustness and interpretability.
Dataset Description
The datasets (data_training and data_prediction) are provided on Canvas under the “Assignment Data” module. Both datasets contain meteorological data across a spatial grid.
data_training
This dataset provides temperature data (TLML) for Saint Louis, along with other meteorological variables across a grid of locations. Key variables include:
• TLML (target variable): Temperature in degrees Fahrenheit.
• SWGDN: Surface incoming shortwave flux (watts per square meter).
• PRECTOT: Total precipitation (kg per square meter per second).
• QLML: Specific humidity (ratio between 0 and 1).
• SPEED: Wind speed (meters per second).
• PRECSNO: Snowfall rate (kg per square meter per second).
• Longitude (lon) and Latitude (lat): Grid coordinates (not used for model building, but can be used for visualization).
data_prediction
This dataset mirrors the structure of data_training but is intended for testing your model. In data_prediction, Saint Louis temperature (TLML) has been set to NA across all time points, and your goal is to predict these missing values.
Saint Louis Coordinates
• Latitude: 38.63
• Longitude: -90.20
Instructions for Locating Saint Louis in the Data
To locate the grid indices closest to Saint Louis, use the latitude and longitude vectors in data_training and data_prediction. Here’s an example of how to find the grid indices in R:
# Define Saint Louis coordinates
stl_lat <- 38.63
stl_lon <- -90.20
# Find the closest latitude and longitude indices for Saint Louis in the data
lat_index <- which.min(abs(data_training$lat - stl_lat))
lon_index <- which.min(abs(data_training$lon - stl_lon))
This code will give you:
• lat_index: the index of the latitude closest to 38.63.
• lon_index: the index of the longitude closest to -90.20.
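As a rough illustration of how these indices might be used, the sketch below pulls out the time series for the grid cell nearest Saint Louis. It runs on a small simulated object named toy, not on the assignment data, and it assumes TLML is stored as a three-dimensional array ordered [lon, lat, time]; the real data_training object may be laid out differently, so check its dimensions before indexing.

```r
# Toy stand-in for data_training (hypothetical layout; the real object may differ).
set.seed(1)
toy <- list(
  lon = seq(-95, -85, by = 0.625),
  lat = seq(35, 42, by = 0.5)
)
n_time <- 10
toy$TLML <- array(
  rnorm(length(toy$lon) * length(toy$lat) * n_time, mean = 60, sd = 10),
  dim = c(length(toy$lon), length(toy$lat), n_time)
)

# Saint Louis coordinates
stl_lat <- 38.63
stl_lon <- -90.20

# Indices of the grid cell closest to Saint Louis
lat_index <- which.min(abs(toy$lat - stl_lat))
lon_index <- which.min(abs(toy$lon - stl_lon))

# Assumed dimension order [lon, lat, time]: extract the full time series
stl_series <- toy$TLML[lon_index, lat_index, ]
```

On this toy grid the nearest cell is at latitude 38.5 and longitude -90; inspecting dim(data_training$TLML) on the real data would confirm or correct the assumed dimension order.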
—
Assignment Instructions
Your goal is to build the simplest, most interpretable model that accurately predicts Saint Louis temperatures in the prediction set. You have flexibility in model selection, but every choice must be justified both statistically and physically. Ensure thorough interpretations of all decisions, assumptions, and results.
1. Model Development Using Training Data: (40 points total)
– Propose a model to predict TLML (Saint Louis temperature) using one or more of the available variables (SWGDN, PRECTOT, QLML, SPEED, PRECSNO, temperature at other locations, etc.). Fit your model using the provided data_training dataset.
– Clearly explain your choice of model and the variables you select. Simplify where possible, and explain why this approach provides an accurate yet interpretable model.
– You may explore various models, including but not limited to regression models or time-series models, and apply optimization techniques such as Newton’s method, Quasi-Newton methods, etc., for parameter estimation.
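One possible way to connect regression modeling with quasi-Newton parameter estimation is sketched below. It fits a linear model by minimizing the residual sum of squares with BFGS (via optim) and compares the estimates with lm. The data are simulated; the variable names swgdn, qlml, and tlml only mimic the assignment's variables, and the coefficients used to generate them are arbitrary assumptions.

```r
set.seed(42)
n <- 200

# Simulated predictors standing in for SWGDN and QLML (not the assignment data)
swgdn <- runif(n, 0, 800)
qlml  <- runif(n, 0.001, 0.02)
tlml  <- 30 + 0.05 * swgdn + 1500 * qlml + rnorm(n, sd = 3)

# Standardize predictors so the optimizer works on comparable scales
z1 <- as.vector(scale(swgdn))
z2 <- as.vector(scale(qlml))
X  <- cbind(1, z1, z2)

# Objective: residual sum of squares, with its analytic gradient
rss      <- function(beta) sum((tlml - X %*% beta)^2)
rss_grad <- function(beta) as.vector(-2 * t(X) %*% (tlml - X %*% beta))

# Quasi-Newton (BFGS) minimization
fit_bfgs <- optim(
  par = rep(0, ncol(X)), fn = rss, gr = rss_grad,
  method = "BFGS", control = list(reltol = 1e-12)
)

# Closed-form least squares for comparison
fit_lm <- lm(tlml ~ z1 + z2)

cbind(bfgs = fit_bfgs$par, lm = unname(coef(fit_lm)))
```

For a linear model the optimizer is unnecessary, since least squares has a closed form; the comparison is useful mainly as a check before applying the same optimizer to a model without one.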
2. Model Validation and Prediction: (15 points total)
• Apply your model to the data_prediction dataset, predicting the temperature for Saint Louis at each time point where values are missing.
• Calculate and interpret the prediction error by comparing the model’s predictions in data_prediction with the actual temperature values for Saint Louis from data_training.
• Temperature Map: Generate a temperature map that visualizes the predicted and actual temperature distributions across the grid, with Saint Louis marked clearly. This plot is mandatory to provide spatial context to your prediction.
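The error calculation and the map could look roughly like the following. This is a minimal sketch using base R graphics on simulated temperature fields; the helper names rmse and mae, the grid, and the fields actual and predicted are all illustrative assumptions rather than part of the assignment data.

```r
set.seed(7)

# Hypothetical grid and simulated temperature fields (not the assignment data)
lon <- seq(-95, -85, by = 0.625)
lat <- seq(35, 42, by = 0.5)
actual <- outer(lon, lat, function(x, y) 90 - 1.5 * (y - 35)) +
  rnorm(length(lon) * length(lat), sd = 1)
predicted <- actual + rnorm(length(actual), sd = 2)

# Two common summaries of prediction error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mae  <- function(obs, pred) mean(abs(obs - pred))

grid_rmse <- rmse(actual, predicted)
grid_mae  <- mae(actual, predicted)

# Side-by-side maps on a shared color scale, with Saint Louis marked
stl_lat <- 38.63
stl_lon <- -90.20
zlim <- range(actual, predicted)
op <- par(mfrow = c(1, 2))
image(lon, lat, actual, zlim = zlim, xlab = "Longitude", ylab = "Latitude",
      main = "Actual TLML (simulated)")
points(stl_lon, stl_lat, pch = 4, lwd = 2)
text(stl_lon, stl_lat, "Saint Louis", pos = 3)
image(lon, lat, predicted, zlim = zlim, xlab = "Longitude", ylab = "Latitude",
      main = "Predicted TLML (simulated)")
points(stl_lon, stl_lat, pch = 4, lwd = 2)
text(stl_lon, stl_lat, "Saint Louis", pos = 3)
par(op)
```

Sharing zlim across both panels keeps the color scales comparable, so visual differences reflect temperature differences rather than rescaling.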
3. Bootstrap Analysis for Uncertainty Quantification: (30 points total)
• Implement at least three of the following bootstrap techniques for uncertainty analysis:
– Bootstrap Estimation of Standard Error
– Bootstrap Estimation of Bias
– Standard Normal Bootstrap Confidence Interval
– Basic Bootstrap Confidence Interval
– Percentile Bootstrap Confidence Interval
– Bootstrap t Interval
• Explain each method’s purpose and interpret the results, discussing any biases or patterns in the model predictions for Saint Louis.
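Several of the listed techniques can be illustrated with one set of bootstrap replicates. The sketch below uses the mean prediction error as the statistic of interest and computes the bootstrap standard error, the bootstrap bias estimate, and the standard normal, basic, and percentile intervals. The vector errors is simulated; in practice it would be replaced by whatever statistic and data your model produces.

```r
set.seed(123)

# Simulated prediction errors (observed minus predicted); not the assignment data
n <- 100
errors <- rnorm(n, mean = 0.5, sd = 2)

# Statistic of interest: mean prediction error
theta_hat <- mean(errors)

# Bootstrap replicates: resample with replacement and recompute the statistic
B <- 2000
theta_boot <- replicate(B, mean(sample(errors, size = n, replace = TRUE)))

# Bootstrap estimates of standard error and bias
se_boot   <- sd(theta_boot)
bias_boot <- mean(theta_boot) - theta_hat

# 95% confidence intervals
alpha <- 0.05
z <- qnorm(1 - alpha / 2)
q <- unname(quantile(theta_boot, c(alpha / 2, 1 - alpha / 2)))

ci_normal     <- c(theta_hat - z * se_boot, theta_hat + z * se_boot)
ci_basic      <- c(2 * theta_hat - q[2], 2 * theta_hat - q[1])
ci_percentile <- q
```

The basic interval reflects the bootstrap quantiles around theta_hat, while the percentile interval uses them directly; the two differ noticeably only when the bootstrap distribution is skewed.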
4. Gibbs Sampler for MCMC Analysis: (15 points total)
• Propose a model for predicting the temperature at Saint Louis where the Gibbs sampler can be used to estimate the model parameters. Your model should be structured in a way that naturally allows for the application of Gibbs sampling, with appropriate conditional distributions for each parameter.
• The primary goal of this exercise is to demonstrate a correct implementation of the Gibbs sampler within your chosen model, not necessarily to achieve accurate temperature predictions. You are encouraged to explore a variety of models that allow for Gibbs sampling, even if they may not provide the most accurate predictions.
• Submit a working code implementation that uses the Gibbs sampler to generate parameter samples and produce temperature predictions. While high prediction accuracy is not required, the model should be functional and capable of making predictions. Failure to implement a model where the Gibbs sampler is applied correctly and generates predictions will result in a penalty.
• Provide a clear explanation of why you chose to use the Gibbs sampler in this model and interpret the results of the parameter samples and predictions. Discuss the convergence of the sampler and the interpretation of the distributions of the parameters in the context of temperature prediction at Saint Louis.
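One model that lends itself to Gibbs sampling is a Bayesian linear regression with a normal prior on the coefficients and an inverse-gamma prior on the error variance, because both full conditionals are then standard distributions. The sketch below is one such setup on simulated data, not a prescribed solution: the predictor x (imagined as the temperature at a neighboring grid cell), the prior settings tau2, a0, and b0, and the chain length are all illustrative assumptions.

```r
set.seed(2024)

# Simulated data: x mimics a neighboring-cell temperature (not the assignment data)
n <- 150
x <- rnorm(n, mean = 60, sd = 10)
y <- 5 + 0.9 * x + rnorm(n, sd = 2)
X <- cbind(1, x - mean(x))          # centered predictor
p <- ncol(X)

# Assumed priors: beta ~ N(0, tau2 * I), sigma2 ~ Inverse-Gamma(a0, b0)
tau2 <- 100^2
a0 <- 2
b0 <- 2

n_iter <- 5000
burn   <- 1000
beta_draws   <- matrix(NA_real_, n_iter, p)
sigma2_draws <- numeric(n_iter)
sigma2 <- 1                          # initial value
XtX <- crossprod(X)
Xty <- crossprod(X, y)

for (iter in seq_len(n_iter)) {
  # beta | sigma2, y ~ N(m, V)
  V <- solve(XtX / sigma2 + diag(p) / tau2)
  m <- V %*% (Xty / sigma2)
  beta <- as.vector(m + t(chol(V)) %*% rnorm(p))

  # sigma2 | beta, y ~ Inverse-Gamma(a0 + n/2, b0 + RSS/2)
  rss <- sum((y - X %*% beta)^2)
  sigma2 <- 1 / rgamma(1, shape = a0 + n / 2, rate = b0 + rss / 2)

  beta_draws[iter, ] <- beta
  sigma2_draws[iter] <- sigma2
}

# Discard burn-in, then summarize
keep <- (burn + 1):n_iter
post_mean_beta   <- colMeans(beta_draws[keep, ])
post_mean_sigma2 <- mean(sigma2_draws[keep])

# Posterior predictive draws at a hypothetical new predictor value
x_new <- 65
pred_draws <- beta_draws[keep, 1] + beta_draws[keep, 2] * (x_new - mean(x)) +
  rnorm(length(keep), sd = sqrt(sigma2_draws[keep]))
pred_interval <- quantile(pred_draws, c(0.025, 0.975))
```

Trace plots of beta_draws and sigma2_draws (for example with plot(..., type = "l")) and runs from several starting values are simple ways to assess convergence before interpreting the posterior summaries.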
5. Additional Analysis Options (encouraged but optional): (7 points total)
• Bias Reduction using Jackknife: You may apply the Jackknife method to further understand or reduce bias in your model’s predictions.
• Prediction Error and Cross-Validation: If applicable, use cross-validation techniques to check for overfitting and ensure that the model generalizes well to unseen data.
• Other Methods Covered in Class: Feel free to incorporate any additional techniques discussed in class to enhance your analysis, improve prediction accuracy, or address model robustness.
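The first two options can be sketched briefly as follows. The first part applies the jackknife bias estimate to a statistic known to be biased (the plug-in variance of the prediction errors), and the second part estimates out-of-sample error for a simple linear model with 5-fold cross-validation. All data are simulated, and the names plug_in_var, dat, and fold are illustrative assumptions.

```r
set.seed(99)
n <- 60

# --- Jackknife bias estimate for the plug-in variance of simulated errors ---
errors <- rnorm(n, sd = 3)
plug_in_var <- function(v) mean((v - mean(v))^2)   # divides by n, so it is biased

theta_hat <- plug_in_var(errors)
theta_loo <- sapply(seq_len(n), function(i) plug_in_var(errors[-i]))
bias_jack  <- (n - 1) * (mean(theta_loo) - theta_hat)
theta_jack <- theta_hat - bias_jack                # bias-corrected estimate

# --- 5-fold cross-validation for a simple linear model on simulated data ---
x <- runif(n, 0, 800)
y <- 40 + 0.04 * x + rnorm(n, sd = 3)
dat <- data.frame(x = x, y = y)

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))    # random fold assignment
cv_sq_err <- numeric(n)
for (j in seq_len(k)) {
  test_rows <- fold == j
  fit  <- lm(y ~ x, data = dat[!test_rows, ])
  pred <- predict(fit, newdata = dat[test_rows, ])
  cv_sq_err[test_rows] <- (dat$y[test_rows] - pred)^2
}
cv_rmse <- sqrt(mean(cv_sq_err))
```

For the plug-in variance, the jackknife correction recovers the usual unbiased sample variance, which makes it a convenient check that the jackknife code is working. For temperature data with strong time dependence, random folds may be optimistic, and blocked or time-ordered folds might be a better fit.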
Requirements and Deliverables
Report
• Clearly document your approach, including any assumptions and simplifications made in your model.
• Explain the rationale for each method chosen, with a particular emphasis on the interpretability and statistical relevance of each decision.
• Discuss both statistical and physical interpretations of your results, particularly how meteorological factors relate to temperature variations in Saint Louis.
• Justify any optimization techniques used (e.g., Newton’s or Quasi-Newton methods) and discuss how they impacted the model’s efficiency and accuracy.
• Describe your bootstrap and Gibbs sampling results, interpreting confidence intervals and uncertainty in predictions.
Code
• Submit well-documented code, with clear comments on each step of the process, from data loading and model building to analysis and visualization.
• Ensure that your code is reproducible and functions as expected, including all necessary libraries and parameters.
• Include the temperature map visualization. Failure to include this map will result in a deduction of points.
—
Key Points to Remember
• The primary goal is to predict Saint Louis temperature (TLML) using an interpretable and well-justified model.
• Simplicity and clarity are highly valued in this assignment. Every decision regarding model choice, variable selection, and analysis technique must be well-explained.
• A choice of a model without a thorough explanation will be penalized.
• For uncertainty quantification, at least three bootstrap methods and the Gibbs sampler for MCMC are mandatory.
• Other analyses (like bias reduction, cross-validation, and optimization methods) are encouraged but optional. However, thoughtful application of these methods, when justified, will be rewarded.
Good luck! Make sure each step is guided by a clear purpose and grounded in solid statistical and physical reasoning.