AM05 AUT24 Final Project Assignment: Outfit Of The Day Recommendation System
Introduction
Welcome to your final project for the Data Management course. This project is designed to integrate and apply the skills you've acquired throughout the course, including data acquisition, web scraping, ETL processes, SQL database management, automation with Bash scripts, and API development using R.
You will create an Outfit Of The Day Recommendation System (Outfit RecSys) that recommends daily outfits based on the current weather in London. The system will scrape clothing items from websites, store them in a database, retrieve weather data from a public API, and provide outfit recommendations through an API endpoint.
Project Overview
The Outfit RecSys should:
Database Contain a database of at least 25 clothing items, scraped from
an appropriate fashion website - including:
5 pairs of shoes
5 bottoms (e.g., pants, skirts)
5 tops (e.g., shirts, blouses)
5 coats or jackets
5 accessories (e.g., umbrellas, sunglasses)
API Endpoint Be accessible through an API endpoint using Plumber in R. Functionality:
When the API is called, it should:
Check the current weather in London.
Generate an outfit from the closet database using simple rules.
Create a plot showing the weather forecast and images of the recommended outfit.
Project Components
This project consists of several interconnected components:
 Data Acquisition and Scraping: Scrape at least 25 clothing items, including 5 from each category (shoes, bottoms, tops, coats, and accessories).
 Data Processing and ETL: Clean and store the scraped data into a SQL database using the provided schema.
 Weather Data Integration: Use the Weatherstack API to get current weather data for London and integrate it into your recommendation system.
 Recommendation System: Build a simple rules-based recommender system that generates an outfit based on the weather conditions in London.
 API Development: Implement an API using Plumber in R with two endpoints: /ootd to get the outfit recommendation and /rawdata to return all product data.
 Automation: Automate the entire workflow using Bash scripts.
Detailed Instructions
Use the following names for your scripts:
 product_scraping.R
 weatherstack_api.R
 etl.R
 ootd_api.R
 run_ootd_api.R
 run_pipeline.sh
1. Data Collection and Web Scraping
Objective Scrape product images and information to populate your closet database.
AM05 AUT24 Final Project Assignment: Outfit Of The Day Recommendation System 2

Instructions:
Choose a Website: Select an online clothing retailer that allows web scraping (ensure compliance with the website's terms of service).
Scrape Data: Collect at least 25 items covering the categories mentioned above. For each item, collect the following information:
Product Name
Category (e.g., shoes, tops)
Image URL
Download Images: Save the product images locally in a folder named images, located in your project folder.
Example Code Snippet:
# product_scraping.R
library(rvest)  # rvest re-exports the %>% pipe
# Example: Scraping product names and image URLs
url <- "https://www.example.com/clothing"
# Category of the listing page being scraped (example value; e.g. scrape one page per category)
category <- "tops"
webpage <- read_html(url)
product_names <- webpage %>% html_nodes(".product-name") %>% html_text(trim = TRUE)
image_urls <- webpage %>% html_nodes(".product-image") %>% html_attr("src")
# Make sure the images folder exists before downloading
dir.create("images", showWarnings = FALSE)
image_paths <- paste0("images/", product_names, ".jpg")
# Download images
for (i in seq_along(image_urls)) {
  download.file(image_urls[i], destfile = image_paths[i], mode = "wb")
}
# Create a data frame
products <- data.frame(
  name = product_names,
  category = category,
  image_path = image_paths,
  stringsAsFactors = FALSE
)
# Save data frame for ETL process
write.csv(products, "products_raw.csv", row.names = FALSE)
Note Replace selectors like ".product_name" with the actual CSS selectors from the chosen website.
2. Weather Data Acquisition
Objective Retrieve current weather data for London using the Weatherstack API.
Instructions:
You should already have a Weatherstack API account and API key from Assignment #1. Otherwise, follow the instructions below:
Sign Up: Register for a free API key at Weatherstack.
Store API Key: Save your API key in an environment variable named YOUR_ACCESS_KEY.
Access Weather Data:
Example Code Snippet:
# weatherstack_api.R
library(httr)
library(jsonlite)
# Retrieve API key from environment variable
api_key <- Sys.getenv("YOUR_ACCESS_KEY")
if (api_key == "") {
  stop("API key not found. Please ensure YOUR_ACCESS_KEY is set as an environment variable.")
}
# Construct API request
response <- GET(
  url = "http://api.weatherstack.com/current",
  query = list(
    access_key = api_key,
    query = "London"
  )
)
# Stop on HTTP-level errors (4xx/5xx status codes)
stop_for_status(response)
# Parse response (httr does not export %>%, so call fromJSON directly)
weather_data <- fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE)
# Extract relevant information
current_temperature <- weather_data$current$temperature
weather_descriptions <- weather_data$current$weather_descriptions
# Save weather data for use in recommendation logic
saveRDS(weather_data, "weather_data.rds")
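Because the recommender depends on these fields, it may help to validate the parsed response before saving it. The sketch below assumes that Weatherstack reports failures in the response body with success set to false and an error object carrying an info message (check this against the Weatherstack documentation); extract_weather is a hypothetical helper name:

```r
# Hypothetical helper: validate a parsed Weatherstack response and return the
# fields the recommender needs. Assumes the list shape that jsonlite::fromJSON()
# produces for the /current endpoint.
extract_weather <- function(weather_data) {
  # Assumed error format: success = false plus error$info in the body
  if (isFALSE(weather_data$success)) {
    stop("Weatherstack error: ", weather_data$error$info)
  }
  current <- weather_data$current
  if (is.null(current$temperature)) {
    stop("Response has no current$temperature field")
  }
  list(
    temperature = as.numeric(current$temperature),
    # weather_descriptions is an array (e.g. "Light Rain"); join it into one string
    description = paste(unlist(current$weather_descriptions), collapse = ", ")
  )
}
```

Calling extract_weather(weather_data) before saveRDS() would stop the pipeline with a readable message instead of silently saving an error response.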
3. ETL Process and Database Management
Objective Clean and store product data into a SQL database. Instructions:
Create a Database: Use SQLite for simplicity (no server setup required).
Define Schema: Ensure all students use the same schema.
Schema:
CREATE TABLE closet (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
category TEXT,
image_path TEXT
);
ETL Process:
Read the raw product data from products_raw.csv.
Clean the data (e.g., handle missing values).
Insert the cleaned data into the closet table.
Example Code Snippet:
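# etl.R
library(DBI)
library(RSQLite)
library(dplyr)
# Read raw data
products <- read.csv("products_raw.csv", stringsAsFactors = FALSE)
# Data cleaning (example): drop rows with missing or empty fields
products_clean <- products %>%
  filter(!is.na(name), !is.na(category), !is.na(image_path),
         name != "", category != "", image_path != "") %>%
  select(name, category, image_path)
# Connect to SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Recreate the table with the required schema; dbWriteTable(overwrite = TRUE)
# would instead create a table without the id INTEGER PRIMARY KEY column
dbExecute(conn, "DROP TABLE IF EXISTS closet")
dbExecute(conn, "CREATE TABLE closet (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT,
  category TEXT,
  image_path TEXT
)")
# Write data to database (id values are generated automatically)
dbWriteTable(conn, "closet", products_clean, append = TRUE, row.names = FALSE)
# Disconnect
dbDisconnect(conn)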
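Scraped text often carries stray whitespace, inconsistent capitalisation and duplicates, and read.csv() reads empty fields as "" rather than NA. A slightly fuller cleaning step might look like the sketch below; the five category labels are an assumption and should match whatever your scraper writes, and clean_products is a hypothetical name:

```r
# Hypothetical cleaning step for etl.R: standardise text fields, keep only the
# required categories (labels assumed), and drop duplicate products.
clean_products <- function(products,
                           allowed = c("shoes", "bottoms", "tops", "coats", "accessories")) {
  cols <- c("name", "category", "image_path")
  for (col in cols) {
    # Trim whitespace left over from scraped HTML and treat "" as missing
    x <- trimws(as.character(products[[col]]))
    x[!is.na(x) & x == ""] <- NA_character_
    products[[col]] <- x
  }
  products$category <- tolower(products$category)
  keep <- complete.cases(products[, cols]) & products$category %in% allowed
  out <- products[keep, cols]
  # Remove repeated products (same name within the same category)
  out <- out[!duplicated(out[, c("name", "category")]), ]
  rownames(out) <- NULL
  out
}
```

In etl.R this could stand in for the filter() pipeline: products_clean <- clean_products(products).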
4. Outfit Recommendation Logic
Objective Implement rules-based logic to recommend outfits based on weather conditions.
Instructions: Define Rules:
Temperature  25°C Light clothing (e.g., t-shirts, shorts, sandals).
AM05 AUT24 Final Project Assignment: Outfit Of The Day Recommendation System 6

Temperature 15°C  25°C Comfortable clothing (e.g., long-sleeve tops, jeans, sneakers).
Temperature  15°C Warm clothing (e.g., jackets, sweaters, boots). Rain Forecast Include a raincoat or umbrella.
Sunny Suggest sunglasses.
Implement Logic Use R to query the database and select items matching the rules.
Example Code Snippet (within ootd_api.R ):
# ... within the /ootd endpoint function
# (ootd_api.R loads DBI and RSQLite at the top of the file)
# Load weather data
weather_data <- readRDS("weather_data.rds")
temperature <- weather_data$current$temperature
weather_desc <- weather_data$current$weather_descriptions
# Connect to database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Initialize outfit list
outfit <- list()
# Apply rules (category values must match those stored in the closet table)
if (temperature > 25) {
  # Select light clothing
  outfit$top <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 't-shirt' LIMIT 1")
  outfit$bottom <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'shorts' LIMIT 1")
  outfit$shoes <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'sandals' LIMIT 1")
} else if (temperature >= 15 && temperature <= 25) {
  # Select comfortable clothing
} else {
  # Select warm clothing
}
# Check for rain (weather_descriptions can hold several strings, so use any())
if (any(grepl("rain", weather_desc, ignore.case = TRUE))) {
  outfit$accessory <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'umbrella' LIMIT 1")
}
# Disconnect
dbDisconnect(conn)
# Proceed to create the plot with selected items
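Keeping the rules in a small pure function, separate from the database queries, makes the thresholds easy to test without a database or an API call. A minimal sketch, assuming the thresholds listed above; treating "drizzle" and "shower" as rain and matching "sunny" for sunglasses are illustrative assumptions, and outfit_rules is a hypothetical name:

```r
# Hypothetical helper encoding the weather rules from this section.
# Returns a warmth level plus any weather-driven accessories.
outfit_rules <- function(temperature, description) {
  # weather_descriptions may contain several strings; combine and lower-case them
  desc <- tolower(paste(description, collapse = " "))
  warmth <- if (temperature > 25) {
    "light"        # e.g. t-shirts, shorts, sandals
  } else if (temperature >= 15) {
    "comfortable"  # 15 to 25 degrees C: long-sleeve tops, jeans, sneakers
  } else {
    "warm"         # below 15 degrees C: jackets, sweaters, boots
  }
  accessories <- character(0)
  # Treating drizzle and showers as rain is an assumption
  if (grepl("rain|drizzle|shower", desc)) accessories <- c(accessories, "umbrella")
  if (grepl("sunny", desc)) accessories <- c(accessories, "sunglasses")
  list(warmth = warmth, accessories = accessories)
}
```

Inside the endpoint, outfit_rules(temperature, weather_desc) could then decide which category queries to run.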
5. API Development with Plumber
Objective Develop two API endpoints using Plumber in R. Endpoints:
/ootd  Returns a plot showing the outfit recommendation.
/rawdata  Returns all product data as a JSON object. Instructions:
Setup Plumber Install and load the plumber package.
Define Endpoints: Example Code Snippet:
# ootd_api.R
library(plumber)
library(DBI)
library(RSQLite)

#* @apiTitle Outfit Recommendation API

#* Get Outfit of the Day
#* @serializer png
#* @get /ootd
function() {
  # Implement recommendation logic (as per previous section)
  weather_data <- readRDS("weather_data.rds")
  weather_desc <- paste(weather_data$current$weather_descriptions, collapse = ", ")
  # Create a plot; the png serializer returns whatever is drawn here as an image
  plot.new()
  # Example plot code:
  plot.window(xlim = c(0, 1), ylim = c(0, 1))
  text(0.5, 0.9, paste("Date:", Sys.Date()), cex = 1.5)
  text(0.5, 0.8, paste("Weather:", weather_desc), cex = 1.2)
  # Add images (this is a placeholder; use functions like rasterImage)
}

#* Get Raw Product Data
#* @get /rawdata
function() {
  conn <- dbConnect(SQLite(), dbname = "closet.db")
  on.exit(dbDisconnect(conn))
  # Return the data frame; Plumber's default JSON serializer converts it,
  # whereas returning toJSON(data) would encode the JSON a second time
  dbGetQuery(conn, "SELECT * FROM closet")
}
6. Guidance on the Outfit of the Day Format
You are required to generate an outfit recommendation output that presents the selected items in a clear and visually appealing manner. This output will be a key component of your project's deliverables, particularly when testing your /ootd API endpoint. Below are the guidelines to help you create an effective recommendation output.
Essential Components
Your /ootd recommendation output is an image that must include the following elements:
 Date and Weather Forecast:
Today's Date: Display the current date prominently.
Weather Forecast: Include a brief description of the weather conditions, such as temperature, weather descriptions (e.g., sunny, rainy), and any other relevant details retrieved from the Weatherstack API.
 Outfit Images:
Clothing Categories: The outfit must consist of images representing each of the following categories:
Shoes
Bottom (e.g., trousers, jeans, skirts)
Top (e.g., shirts, sweaters, blouses)
Outerwear (e.g., jackets, coats)
Accessory (e.g., sunglasses, umbrella, bag)
Image Quality: Ensure that the images are clear and of high quality so that the details of each item are visible.
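Because every outfit needs one image per category, the selection step can be written against the closet table as a data frame (for example, the result of SELECT * FROM closet). A sketch under the assumption that the category column uses the five labels below; passing n = 2 would give the multiple outfits mentioned in the bonus criteria. pick_outfits is a hypothetical helper:

```r
# Hypothetical helper: build n outfits, each with one randomly chosen item
# from every required category. `closet` is a data frame such as the result
# of dbGetQuery(conn, "SELECT * FROM closet"); category labels are assumed.
pick_outfits <- function(closet, n = 1,
                         categories = c("shoes", "bottoms", "tops", "coats", "accessories")) {
  # Fail early if a category has no items, rather than returning a partial outfit
  missing <- setdiff(categories, closet$category)
  if (length(missing) > 0) {
    stop("No items in closet for: ", paste(missing, collapse = ", "))
  }
  lapply(seq_len(n), function(i) {
    picks <- lapply(categories, function(cat) {
      items <- closet[closet$category == cat, , drop = FALSE]
      # sample.int() draws one row index from 1..nrow(items)
      items[sample.int(nrow(items), 1), , drop = FALSE]
    })
    # One row per category, in the order given by `categories`
    do.call(rbind, picks)
  })
}
```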
Layout and Presentation
You have creative freedom in how you present the outfit images, but your layout should adhere to the following guidelines:
Clarity and Visibility:
Arrange the images in a way that each item is fully visible and not obscured by other elements.
Avoid overlapping images unless it enhances the presentation without compromising clarity.
Layout Options:
Mosaic/Grid Layout: Place the images in a grid format, aligning them neatly in rows and columns. This approach ensures that each item has its own space.
Stylistic Overlay: If you prefer a more creative approach, you can overlay the images to mimic how the outfit would look when worn together. Ensure that this method still allows each item to be distinctly identified.
Labels and Annotations (Optional):
You may include labels or brief descriptions next to each item to indicate the category or any special features.
Use legible fonts and colours that contrast well with the background and images.
Example Approaches
Here are some ideas on how you might structure your output:
 Mosaic/Grid Example: [Figure: mosaic/grid layout example]
 Stylistic Overlay Example: [Figure: stylistic overlay example]
Technical Implementation Tips
Image Processing with magick:
Use the magick package in R to manipulate and combine images.
Ensure that all images are resized proportionally to maintain aspect ratios.
Use image_append() or image_montage() functions to arrange images in a grid.
For overlays, use image_composite() with appropriate gravity and offsets.
Adding Text Annotations:
Use image_annotate() to add the date and weather information at the top or bottom of the output image.
Choose font sizes and styles that are readable and professional.
File Formats and Sizes:
Save the final output as a PNG or JPEG file.
Optimise the image size to balance quality and file size.
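The magick functions named above can be combined into a grid layout roughly as follows. This is a sketch: the tile size, three tiles per row and the 60 px header band are arbitrary choices, and make_ootd_image is a hypothetical name. It uses image_scale() to fit each image inside a square without distortion, image_extent() to pad it to the full square, image_append() to build and stack rows, and image_annotate() for the header:

```r
library(magick)

# Hypothetical sketch: lay out outfit images in a grid with a header line.
make_ootd_image <- function(images, header, tile_px = 300, per_row = 3) {
  geom <- paste0(tile_px, "x", tile_px)
  # Fit each image inside a tile_px square (aspect ratio preserved),
  # then pad it onto a white square so all tiles are the same size
  tiles <- image_extent(image_scale(images, geom), geom, color = "white")
  # Group tile indices into rows of per_row and append each row horizontally
  idx <- seq_len(length(tiles))
  rows <- lapply(split(idx, ceiling(idx / per_row)),
                 function(i) image_append(tiles[i]))
  # Stack the rows vertically
  grid <- Reduce(function(a, b) image_append(c(a, b), stack = TRUE), rows)
  # Add a 60 px white band above and below, then write the header at the top
  grid <- image_border(grid, "white", "0x60")
  image_annotate(grid, header, size = 28, gravity = "north",
                 location = "+0+15", color = "black")
}
```

For example, img <- make_ootd_image(image_read(outfit$image_path), paste(Sys.Date(), weather_desc)) followed by image_write(img, "ootd_plot.png", format = "png") would save the result as a PNG.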
Testing Your Output
Visual Inspection:
Open the generated image to ensure that all elements are displayed correctly.
Check for any distortions or misalignments.
Consistency with Recommendation Logic:
Verify that the selected items align with your recommendation logic based on the weather data.
Ensure that accessories like umbrellas are included on rainy days.
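The two consistency checks above can also be automated, so that a quick run confirms the selection before you inspect the image visually. A sketch, assuming the outfit is a data frame with name and category columns and that an umbrella can be recognised by its name; check_outfit is a hypothetical helper:

```r
# Hypothetical self-check: returns a character vector of problems
# (empty when the outfit is consistent with the weather).
check_outfit <- function(outfit, description,
                         required = c("shoes", "bottoms", "tops", "coats", "accessories")) {
  problems <- character(0)
  # Every required category should appear in the outfit
  missing <- setdiff(required, outfit$category)
  if (length(missing) > 0) {
    problems <- c(problems, paste("missing category:", missing))
  }
  # On rainy days an umbrella should be included (matched by item name)
  raining <- any(grepl("rain", description, ignore.case = TRUE))
  if (raining && !any(grepl("umbrella", outfit$name, ignore.case = TRUE))) {
    problems <- c(problems, "rain forecast but no umbrella")
  }
  problems
}
```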
7. Automation with Bash Scripts
Objective Automate the entire pipeline so that the assessor can run your Bash script and retrieve the outfit recommendation.
Instructions:
Create a Bash Script: Name it run_pipeline.sh.
Script Requirements:
Accept an input variable for the Weatherstack access key. Example:
#!/bin/bash
# Usage: ./run_pipeline.sh YOUR_ACCESS_KEY
# Stop early if no access key was given
if [ -z "$1" ]; then
  echo "Usage: $0 YOUR_ACCESS_KEY"
  exit 1
fi
YOUR_ACCESS_KEY=$1
export YOUR_ACCESS_KEY
# Run R scripts
Rscript product_scraping.R
Rscript weatherstack_api.R
Rscript etl.R
Rscript run_ootd_api.R &
# Wait for API to start
sleep 5
# Call the /ootd endpoint (the API listens on port 8000)
curl "" --output ootd_plot.png
echo "Outfit of the Day plot saved as ootd_plot.png"
Run OOTD API: The run_ootd_api.R script should start the Plumber API on port 8000.
Example Code Snippet:
# run_ootd_api.R
library(plumber)
# Load the API
r <- plumb("ootd_api.R")
# Run the API on port 8000
r$run(port = 8000)
Deliverables
 Project Folder: A zipped folder named win-123456.zip or mac-123456.zip, where 123456 is your student number.
 Bash Script:
A script named run_pipeline.sh that:
Takes an input variable for the Weatherstack access key (YOUR_ACCESS_KEY).
Loads and runs all relevant R scripts.
Makes a call to the /ootd endpoint using curl to produce the plot of the Outfit of the Day.
 R Scripts:
product_scraping.R
weatherstack_api.R
etl.R
ootd_api.R
run_ootd_api.R
 Product Images:
A folder named images containing the product images associated with your closet database.
 Example Outfit Image:
An outfit generated from your /ootd endpoint, with the file name ootd_plot.png.
 Readme File:
README Updates:
In your README.md, include a section that explains how the recommendation output is generated.
Provide any instructions necessary to reproduce the output.
Important Notes
Environment Variables Ensure your API key is retrieved from an environment variable that is passed to the bash script from the command line.
Example:
#!/bin/bash
# example_script.sh
# Usage: ./example_script.sh YOUR_API_KEY
# Check if the API key is provided
if [ -z "$1" ]; then
  echo "Usage: $0 YOUR_API_KEY"
  exit 1
fi
# Get the API key from the command-line argument
YOUR_API_KEY=$1
# Export the API key as an environment variable
export YOUR_API_KEY
# Now you can use the API key in this script or in any script it launches
echo "API key has been set as an environment variable."
# Example usage within the script
echo "Using the API key in the script:"
echo "The API key is: $YOUR_API_KEY"
# Example of running another script that uses the API key
# Assuming you have a script called api_call_script.sh
# ./api_call_script.sh
# Alternatively, run an R script that uses the API key
# Rscript my_r_script.R
Suppose your API key is abcd1234. You would run the script as follows:
./example_script.sh abcd1234
Using the API Key in an R Script (e.g., my_r_script.R):
# my_r_script.R
# Retrieve the API key from the environment variable
api_key <- Sys.getenv("YOUR_API_KEY")
if (api_key == "") {
  stop("API key not found. Please ensure YOUR_API_KEY is set as an environment variable.")
}
# Use the API key in your API calls
# For example (the Authorization header below is illustrative; use whatever your API expects):
library(httr)
response <- GET("https://api.example.com/data", add_headers(Authorization = paste("Bearer", api_key)))
# Process the response as needed
Port Configuration The API should run on port 8000 . Ensure no other services are using this port.
Dependencies List any R packages required in a README.md file. Testing Verify that your entire pipeline works on a different machine to
ensure it runs outside of your development environment.
Assessment Criteria (Total: 100 points)
 Data Collection and Scraping (15 points)
Quality and completeness of the web scraping script (10 points).
Variety and coverage of items across different categories (5 points).
 Database Design and Implementation (10 points)
Correct SQL database design according to the specified schema (5 points).
Successful population of the database with scraped items (5 points).
 Weather Integration (10 points)
Successful integration and automation of weather data retrieval (5 points).
Correct usage and storage of weather data in the system (5 points).
 Outfit Recommender System (20 points)
Effectiveness of the recommendation logic (10 points).
Proper implementation using R (10 points).
 Automation and Workflow (15 points)
Use of Bash scripts to automate tasks (10 points).
Correct execution of the entire pipeline from the script (5 points).
 Code Quality and Documentation (10 points)
Code readability and adherence to best practices (5 points).
Clear documentation and instructions in a README.md file (5 points).
 OOTD Endpoint Functionality (20 points)
/ootd endpoint returns a plot showing date, weather forecast, and outfit images (10 points).
/rawdata endpoint returns all products in the closet database as JSON (10 points).
 Bonus steps / functionality (10 bonus points)
50 products are added to your closet database (5 points).
/ootd endpoint has additional functionality to produce two or more outfit choices for each call rather than 1 outfit (5 points).
Submission Instructions
Deadline See canvas assignment page.
Submission Method Upload your zipped project folder to canvas.
File Naming Ensure your zipped folder follows the naming convention ( win- 123456.zip or mac-123456.zip ).
Tips and Best Practices
Testing Run your Bash script from start to finish to ensure all components work seamlessly.
Error Handling Include error checks in your scripts to handle potential issues (e.g., missing data, API errors).
Comments Comment your code to explain the logic and flow. Dependencies Use renv or list your packages to ensure the assessor can
install them easily.
Security Do not hardcode your API keys in the scripts; always use environment variables.
Data Privacy Ensure compliance with data scraping regulations and respect website terms of service.
AM05 AUT24 Final Project Assignment: Outfit Of The Day Recommendation System 18

Getting Started
 Set Up Your Environment:
Install necessary R packages:
rvest
httr
jsonlite
DBI
RSQLite
plumber
dplyr
magick
Ensure you have curl installed for making HTTP requests in the Bash script.
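Before running the pipeline, it can help to confirm that the packages used by the example code are installed. A small sketch using requireNamespace(); missing_packages is a hypothetical helper, and it only reports missing packages rather than installing them:

```r
# Packages used by the example code in this brief
required_pkgs <- c("rvest", "httr", "jsonlite", "DBI", "RSQLite",
                   "plumber", "dplyr", "magick")

# Hypothetical helper: return the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "),
          "\nInstall them with install.packages(to_install)")
}
```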
 Plan Your Approach:
Review the requirements and plan each step. Start by setting up your database schema.
 Incremental Development:
Test each component individually before integrating. Use print statements or logs to debug.
 Consult Course Materials:
Revisit workshops and assignments related to each component.
Support
If you have any questions or need clarification, please reach out during office hours or via email at jfrancis@london.edu.
Good luck with your project!
APPENDIX 1.0 - Guidelines for Your README File
Your README.md file is a crucial part of your project submission. It should provide clear instructions and information to help others understand and run your project without any confusion. Below are some key points you should include:
Project Title and Description:
Clearly state the name of your project.
Provide a brief overview of the project's purpose and functionality.
Table of Contents Optional for Longer READMEs):
If your README is extensive, include a table of contents to help readers navigate the document.
Prerequisites and Dependencies:
List all software, packages, and libraries required to run your project. For example: R (version X.X.X), SQLite, Bash shell, rvest, httr, jsonlite, etc.
Include any system requirements or platform-specific instructions.
Provide commands or steps to install these dependencies.
Installation and Setup Instructions:
Step-by-step guidance on how to set up the project environment:
Cloning or downloading the project repository.
Setting up directories and files.
Instructions on obtaining and setting up the Weatherstack API key.
How to export the API key as an environment variable if needed.
Project Structure Overview:
Briefly describe the purpose of each major script and file in your project.
product_scraping.R: Scrapes product data and images from the web.
weatherstack_api.R: Fetches current weather data using the Weatherstack API.
etl.R: Cleans data and populates the SQLite database.
ootd_api.R: Defines the API endpoints using Plumber.
run_ootd_api.R: Runs the API server.
run_pipeline.sh: Bash script that automates the entire pipeline.
images/: Directory containing product images.
closet.db: SQLite database file containing the closet data.
Mention any additional files or directories, such as logs or outputs.
Usage Instructions:
How to run the entire pipeline using the Bash script.
Example command: ./run_pipeline.sh YOUR_ACCESS_KEY
Instructions on how to start the API server independently if needed.
Example command: Rscript run_ootd_api.R
How to access the API endpoints.
Accessing /ootd and /rawdata via a web browser or using curl.
Example: curl "" --output ootd_plot.png
Any additional steps required to generate the outputs.
Recommendation Logic Explanation:
Describe how the weather data influences the outfit recommendation:
Temperature thresholds and corresponding clothing choices.
Handling of specific weather conditions (e.g., rain).
Any additional logic or rules implemented.
Output Description:
Details about the generated outputs, such as the outfit plot image.
Explain the contents and format of ootd_plot.png.
Includes date, weather forecast, and images of the outfit items.
Mention any other output files and their purposes.
Additional Features (Bonus Implementations):
Describe any extra items added to the closet beyond the required 25.
Detail any additional API endpoints you have created.
Their purposes and how to access them.
Explain if you have implemented multiple outfit suggestions.
Troubleshooting and FAQs:
Common issues that might arise and their solutions:
API key errors.
Missing dependencies.
Port conflicts if the API server doesn't start.
Tips for ensuring the scripts run smoothly.
Dependencies and Package Installation:
Provide a list of R packages and how to install them. Example:
install.packages(c("rvest", "httr", "jsonlite", "DBI", "RSQLite",
"plumber", "dplyr", "magick"))
Instructions for installing any system-level dependencies if applicable.
License Information Optional):
Specify any licenses if you are using third-party code or resources.
Contact Information Optional):
Your name and email address for any questions or feedback.
Acknowledgments Optional):
Credit any resources, tutorials, or individuals that helped you.
Formatting Tips:
Use Markdown syntax to structure your README
Headings ( # , ## , ### ) for sections and subsections.
Bullet points and numbered lists for clarity.
Backticks for inline code (`code`) and triple backticks for code blocks.
Hyperlinks for referencing external resources or documentation.
Example of a Command:
./run_pipeline.sh YOUR_ACCESS_KEY
Example of Inline Code:
To install packages: install.packages("package_name")
Final Checklist:
Clarity and Conciseness:
Ensure instructions are easy to follow and free of jargon.
Keep sentences and paragraphs short and to the point.
Completeness:
Double-check that all required sections are included.
Verify that all instructions are accurate and up-to-date.
Proofreading:
Check for spelling and grammatical errors.
Ensure consistent formatting throughout the document.
APPENDIX 2.0 - Passing Variables, Data, and Files Between Scripts in a Pipeline
In a data processing pipeline, it's essential to pass variables, data, and files from one script to another to ensure seamless execution and maintain modularity. This practice allows different components of the pipeline to communicate and share necessary information without tightly coupling the scripts. Below are various methods to achieve this, along with explanations of their importance and examples based on the Final Project Assignment: Personal Outfit Recommendation System.
1. Command-Line Arguments
Explanation:
Scripts can accept input parameters directly from the command line when they are executed.
This method allows you to pass variables, such as API keys or file paths, dynamically.
Why It's Important:
Flexibility: Users can specify different inputs without modifying the script code.
Security: Sensitive information like API keys can be passed at runtime instead of hardcoding them.
Example from the Project:
Passing the Weatherstack API Key:
In the Bash script run_pipeline.sh, the API key is passed as a command-line argument:
./run_pipeline.sh YOUR_ACCESS_KEY
Within run_pipeline.sh, the API key is captured and exported:
#!/bin/bash
# Check if the API key is provided
if [ -z "$1" ]; then
  echo "Usage: $0 YOUR_ACCESS_KEY"
  exit 1
fi
# Export the API key as an environment variable
export YOUR_ACCESS_KEY=$1
Each R script can then access the API key from the environment variable.
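R scripts can also read command-line arguments directly with commandArgs(trailingOnly = TRUE), which returns only the arguments that follow the script name. A sketch, assuming you want a script to accept the key either as its first argument (for example, Rscript weatherstack_api.R abcd1234) or from the exported YOUR_ACCESS_KEY variable; get_api_key is a hypothetical helper:

```r
# Hypothetical helper: prefer the first command-line argument, otherwise fall
# back to the environment variable exported by run_pipeline.sh.
get_api_key <- function(args = commandArgs(trailingOnly = TRUE),
                        env_var = "YOUR_ACCESS_KEY") {
  if (length(args) >= 1 && nzchar(args[1])) {
    return(args[1])
  }
  key <- Sys.getenv(env_var)
  if (!nzchar(key)) {
    stop("No API key: pass it as an argument or set ", env_var)
  }
  key
}
```

Passing args explicitly, as the default argument allows, also makes the function easy to test without launching Rscript.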
2. Environment Variables
Explanation:
Environment variables are key-value pairs available to all processes in the shell session.
Scripts can read environment variables to obtain necessary information.
Why It's Important:
Security: Keeps sensitive data out of the codebase.
Consistency: Ensures that all scripts access the same variable values.
Portability: Environment variables can be easily configured on different systems.
Example from the Project:
Accessing the API Key in R Scripts:
In weatherstack_api.R, the API key is retrieved from the environment:
# Retrieve the API key from the environment variable
api_key <- Sys.getenv("YOUR_ACCESS_KEY")
if (api_key == "") {
  stop("API key not found. Please ensure YOUR_ACCESS_KEY is set as an environment variable.")
}
3. Reading and Writing Files
Explanation:
Scripts can write data to files, which subsequent scripts read and process.
Common file formats include CSV, JSON, RDS (R's binary format), and databases.
Why It's Important:
Data Persistence: Stores intermediate results that can be reused or inspected.
Decoupling: Allows scripts to operate independently, focusing on specific tasks.
Debugging: Facilitates troubleshooting by examining intermediate files.
Example from the Project:
Sharing Scraped Data:
product_scraping.R: Scrapes product data and saves it to a CSV file.
# Save raw product data
write.csv(products, "products_raw.csv", row.names = FALSE)
etl.R: Reads the CSV file for data cleaning and loading into the database.
# Read the raw product data
products <- read.csv("products_raw.csv", stringsAsFactors = FALSE)
Storing Weather Data:
weatherstack_api.R: Fetches weather data and saves it as an RDS file.
# Save weather data to an RDS file
saveRDS(weather_data, "weather_data.rds")
ootd_api.R: Reads the weather data for generating outfit recommendations.
# Load weather data
weather_data <- readRDS("weather_data.rds")
4. Using Databases
Explanation:
Databases provide a structured way to store and retrieve data.
Scripts can insert data into a database, which other scripts can query as needed.
Why It's Important:
Data Integrity: Enforces data types and constraints.
Concurrency: Allows multiple scripts to access data without conflicts.
Scalability: Handles larger datasets efficiently.
Example from the Project:
Centralized Data Storage:
etl.R: Inserts cleaned product data into a SQLite database.
# Connect to the SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Write data to the 'closet' table
dbWriteTable(conn, "closet", products_clean, append = TRUE, row.names = FALSE)
ootd_api.R: Queries the database to select items for the outfit.
# Connect to the SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Query for outfit items based on category
outfit_item <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'tops' ORDER BY RANDOM() LIMIT 1")
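When the category comes from a variable (for example, from the weather rules in Section 4), binding it as a query parameter is safer than pasting it into the SQL string. A sketch using DBI's params argument with RSQLite's ? placeholders; it builds a throwaway in-memory database with made-up items purely for illustration, and random_item is a hypothetical helper:

```r
library(DBI)
library(RSQLite)

# In-memory database with the assignment's schema (illustration only)
conn <- dbConnect(SQLite(), ":memory:")
dbExecute(conn, "CREATE TABLE closet (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT, category TEXT, image_path TEXT)")
dbWriteTable(conn, "closet", data.frame(
  name = c("White Tee", "Linen Shirt", "Chelsea Boots"),
  category = c("tops", "tops", "shoes"),
  image_path = c("images/tee.jpg", "images/shirt.jpg", "images/boots.jpg"),
  stringsAsFactors = FALSE
), append = TRUE)

# Hypothetical helper: one random item from a category, bound via `params`
random_item <- function(conn, category) {
  dbGetQuery(conn,
             "SELECT * FROM closet WHERE category = ? ORDER BY RANDOM() LIMIT 1",
             params = list(category))
}

top <- random_item(conn, "tops")
dbDisconnect(conn)
```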
5. Standard Input and Output (Pipes)
Explanation:
Scripts can read from standard input (stdin) and write to standard output (stdout).
Allows chaining commands using pipes (|), where the output of one command serves as input to another.
Why It's Important:
Stream Processing: Useful for processing data streams or large datasets.
Flexibility: Enables quick data transformations without intermediate files.
Example from the Project:
Chaining Commands (Hypothetical):
While not explicitly used in the project, you could use pipes in the command line:
# Pass the output of one script to another
Rscript script1.R | Rscript script2.R
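On the receiving end of such a pipe, an R script reads the piped data from file("stdin"); under Rscript, stdin() refers to the R console rather than the process's standard input. A sketch of a hypothetical script2.R, assuming script1.R writes one name,category pair per line (for example with writeLines(paste(products$name, products$category, sep = ","))). Taking the connection as an argument means the function can be tried with textConnection() instead of a real pipe:

```r
# script2.R (hypothetical): summarise "name,category" lines arriving on stdin
count_by_category <- function(con = file("stdin")) {
  lines <- readLines(con)
  close(con)
  # Take the second comma-separated field of each line as the category
  categories <- vapply(strsplit(lines, ","),
                       function(parts) trimws(parts[2]), character(1))
  table(categories)
}

# When run as script2.R at the end of a pipe, call:
# print(count_by_category())
```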
6. Function Calls Between Scripts (Sourcing)
Explanation:
One script can source another, effectively importing its functions and variables.
In R, source("script.R") runs the code from the sourced script in the current environment.
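In a project the calling script would simply run source("utils.R"). The sketch below shows the same mechanism end to end by writing a tiny, hypothetical utils file to a temporary path and sourcing it into its own environment via the local argument, which keeps the helper functions out of the caller's global workspace:

```r
# Write a small hypothetical utils file to a temporary location
utils_path <- tempfile(fileext = ".R")
writeLines(c(
  "# utils.R (hypothetical)",
  "c_to_f <- function(celsius) celsius * 9 / 5 + 32"
), utils_path)

# Run utils.R inside a dedicated environment instead of the global one
utils <- new.env()
source(utils_path, local = utils)

utils$c_to_f(20)  # 68
```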
Why It's Important:
Code Reusability: Share common functions without duplicating code.
Organisation: Keep code modular and maintainable.
Example from the Project:
Shared Functions (Hypothetical):
If you have utility functions used across scripts:
# In 'utils.R'
calculate_temperatu