Assignment 1 Reading Rectangular/Tabular data
STA141B Spring 2025
Due: April 9, 11pm
Submit via Canvas
The first 2 assignments are related and similar. The first is due Wednesday, April 9. The second is due April 16.
The task is to read solar and climate-related data from a variety of locations in California. The data are related to solar performance for buildings and simulation models for understanding this solar performance.
We have data for all of the USA and also for the entire world.
We will focus on just 5 locations and 5 ZIP files:
USA_CA_UC-Davis-University.AP.720576_TMYx.2009-2023.zip
USA_CA_San.Diego-MCAS.Miramar.722930_TMYx.2009-2023.zip
USA_CA_Mount.Shasta.725957_TMYx.2009-2023.zip
USA_CA_Mammoth.Yosemite.AP.723894_TMYx.2009-2023.zip
USA_CA_Point.Arguello.994210_TMYx.2009-2023.zip
A zip file containing these 5 zip files is available in the Files section of the course’s Canvas portal, named Solar1.zip. In each ZIP archive, we have files with different file extensions such as clm, ddy, epw, stat, e.g.,
USA_CA_Point.Arguello.994210_TMYx.2009-2023.clm
USA_CA_Point.Arguello.994210_TMYx.2009-2023.ddy
USA_CA_Point.Arguello.994210_TMYx.2009-2023.epw
USA_CA_Point.Arguello.994210_TMYx.2009-2023.pvsyst
USA_CA_Point.Arguello.994210_TMYx.2009-2023.rain
USA_CA_Point.Arguello.994210_TMYx.2009-2023.stat
USA_CA_Point.Arguello.994210_TMYx.2009-2023.wea
There is a basic description of the formats of these different files in the ZIP archive. However, you will have to explore the details of sample files to understand the general structure.
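One way to start exploring is to list what an archive contains and look at the first few lines of one file without unpacking it. The sketch below is only an illustration: the function name peekZip() and its arguments are made up for this example, and it assumes each archive holds exactly one file matching the pattern.

```r
# Hypothetical helper: list the files in one ZIP archive and return the
# first n lines of the single file whose name matches 'pattern'.
peekZip <- function(zipfile, pattern = "\\.clm$", n = 20) {
    info <- unzip(zipfile, list = TRUE)   # data.frame with Name, Length, Date
    target <- grep(pattern, info$Name, value = TRUE)
    if (length(target) != 1)
        stop("expected exactly one matching file, found ", length(target))

    con <- unz(zipfile, target)           # connection to a file inside the archive
    on.exit(close(con))                   # release the connection even on error
    list(files = info$Name, head = readLines(con, n = n))
}
```

Calling peekZip() on one of the five archives and printing the result gives a quick view of any header lines and of how the values are laid out, which is what you need before deciding how to parse the file.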
For this assignment, you are to write functions that will read the data in the .clm (climate) file in each zip file into a data.frame. The resulting data.frame should have 9 columns – the 6 columns in the data set and also the day, month and hour of each observation.
Write functions (rather than one or more R commands) to read each of these tables, as you need to apply these to the contents of the 5 ZIP files. Also, you will most likely need to run the code multiple times to iteratively modify it and verify it is correct. Use functions rather than repeating the same code for each file and table!
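The advice above might be organized roughly as follows. This is a sketch under stated assumptions, not a required design: applyToZips() and combineSites() are hypothetical names, and fun stands for whatever function you write to turn one ZIP file into a data.frame.

```r
# Hypothetical driver: apply 'fun' to every ZIP file in a directory.
# 'fun' takes the path to one ZIP file and returns a data.frame.
applyToZips <- function(dir, fun, pattern = "\\.zip$") {
    zips <- list.files(dir, pattern = pattern, full.names = TRUE)
    if (length(zips) == 0)
        stop("no ZIP files found in ", dir)
    ans <- lapply(zips, fun)
    names(ans) <- basename(zips)          # remember which archive each came from
    ans
}

# Hypothetical helper: stack the per-archive data.frames into one,
# adding a 'site' column that records the source archive.
combineSites <- function(dfs) {
    withSite <- Map(function(d, nm)
                        data.frame(site = rep(nm, nrow(d)), d),
                    dfs, names(dfs))
    do.call(rbind, withSite)
}
```

Keeping the per-file reader separate from the loop means you can rerun and debug the reader on a single archive, then apply it unchanged to all five.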
Verifying Results
It is vital to verify that the results are correct. You need to check by
• manually comparing individual values in the files and the results,
• computing summary statistics from the results,
• visualizing the results, and
• programmatically verifying the results to ensure they make sense and are correct.
Describe the approaches and processes by which you verified the results.
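As an illustration of the programmatic checks, the sketch below runs a few sanity tests on one resulting data.frame. The function name checkCLM() and the column names day, month and hour are assumptions made for this example; the expectation that every day has exactly 24 observations is also an assumption you would need to confirm against the files.

```r
# Hypothetical sanity checks for one data.frame read from a .clm file.
# Assumes columns named 'day', 'month' and 'hour'; adapt to your own names.
checkCLM <- function(df) {
    stopifnot(is.data.frame(df),
              all(c("day", "month", "hour") %in% names(df)))
    perDay <- table(df[["month"]], df[["day"]])   # observations per calendar day
    c(nineColumns      = ncol(df) == 9,
      noMissing        = !anyNA(df),
      allNumeric       = all(sapply(df, is.numeric)),
      monthsValid      = all(df[["month"]] %in% 1:12),
      daysValid        = all(df[["day"]] %in% 1:31),
      hours24          = length(unique(df[["hour"]])) == 24,
      sameHoursEachDay = all(perDay[perDay > 0] == 24))
}
```

The result is a named logical vector, so any FALSE entry points to the check that failed. Applying it to each of the five data.frames with sapply() gives a small table of checks by site that can go straight into the writeup.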
Identify Assumptions
State any assumptions you are making about the structure and order of the data, and show how you verified these were true.
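For example, if you assume the observations in a file appear in chronological order, that can be tested rather than taken on faith. The sketch below is one possible check; isChronological() is a hypothetical name, and the column names month, day and hour are assumed.

```r
# Hypothetical check: are the rows already sorted by month, then day, then hour?
# order() is stable, so sorted input yields exactly 1, 2, ..., nrow(df).
isChronological <- function(df) {
    o <- order(df[["month"]], df[["day"]], df[["hour"]])
    identical(o, seq_len(nrow(df)))
}
```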
Useful Functions
• strsplit()
• lapply(), sapply()
• list.files()
• readLines(), read.csv(), read.table()
• textConnection()
• substring(), substr()
• trimws()
• grep(), grepl(), gsub()
• cumsum()
• which.min(), min(), max()
• data.frame(), as.data.frame()
• unlist()
• rep()
• split(), tapply(), by()
• strptime(), as.POSIXct(), as.Date()
• sprintf(), paste(), paste0()
• close(), on.exit()
• %in%
• unzip()
• system(), system2()
• rbind(), do.call()
• aggregate()
The essential functions for checking that results correspond to what you expect, and for debugging code, include:
• length(), names(), dim(), nrow(), ncol(), class(), typeof(), is.na()
• debug()
• browser()
• options(error = recover)
• summary(), plot()
What to Submit
• A description of your approach and the important steps.
• The file(s) with the R functions and the code to use those functions to read the data.
• A description of how you verified that the results are correct, with numerical summaries, tables, and plots that “prove” this, along with a description of any manual verification you did.
• The R code you used to verify the results.
The writeup must be a PDF document containing the text and the plots, tables, etc.
The code must be submitted separately.
If you use Rmarkdown to create the PDF, you may also submit the Rmarkdown file, but you must still submit the R code separately from both the PDF and the Rmarkdown.