
The goal of the assignment is to get familiar with different types of recommender systems. Specifically, we are going to build a system that can recommend Yelp businesses to users.

Dataset

The dataset that we will be using contains information about Yelp businesses. More precisely, for 14397 active users and 1000 popular businesses on Yelp, we know whether a given user has visited and rated a given business (for example, a restaurant).

The folder contains:

• user-business.csv - This is the ratings matrix R, where each row corresponds to a user and each column corresponds to a business. $R_{ij} = 1$ if user i has visited and rated business j; otherwise $R_{ij} = 0$. (To simplify the question, we ignore the exact ratings.) The columns are separated by a space.

• business.csv - This is a file containing the names of the businesses, in the same order as the columns of R.
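A minimal sketch of loading the two files, assuming `user-business.csv` holds only space-separated 0/1 values and `business.csv` holds one business name per line (possibly wrapped in double quotes). The function names and the `ALEX` constant are illustrative and not part of the assignment.

```python
import numpy as np

ALEX = 3  # the 4th user (1-based) as a zero-based row index


def load_ratings(path="user-business.csv"):
    """Load the space-separated 0/1 ratings matrix R (rows: users, columns: businesses)."""
    # np.loadtxt splits on whitespace by default, which matches the stated format.
    return np.loadtxt(path, dtype=int, ndmin=2)


def load_business_names(path="business.csv"):
    """Read business names, assuming one name per line in the column order of R."""
    with open(path, encoding="utf-8") as f:
        # Assumption: names may be wrapped in double quotes; strip them if present.
        return [line.strip().strip('"') for line in f if line.strip()]
```

With the real files, `load_ratings().shape` should be `(14397, 1000)` according to the dataset description above.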

Overview

In this assignment, we are going to implement three types of recommender systems, namely:

● User-user recommender system
● Item-item recommender system
● Latent factor model recommender system

CS/INFO 5304 Assignment 2

We are then going to compare the results of these systems for the 4th user (index starting from 1) of the dataset. Let’s call him Alex. In order to do so, we have erased the first 100 entries of Alex’s row in the matrix and replaced them with 0s. This means that we don’t know which of the first 100 businesses Alex has visited. Based on Alex’s behavior on the other businesses, you need to give Alex recommendations among the first 100 businesses. We will then see if our recommendations match what Alex had in fact visited.

To help you verify the output of the recommenders below, the erased first 100 entries that are 1s correspond to the following businesses:

● Piece of Cake

● Papi's Cuban & Caribbean Grill

● Loca Luna

● Farm Burger

● Little Rey

● Seven Lamps

● Vatica Indian Cuisine

● Shake Shack

● Truva Turkish Kitchen

● Yoi Yoi Japanese Steakhouse & Sushi
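One simple way to compare a recommender's output with this list is to count how many recommended names appear in it. The sketch below assumes the recommender returns a list of business names; `count_hits` and `ALEX_VISITED` are hypothetical helper names.

```python
# The ten businesses listed above as Alex's erased 1s.
ALEX_VISITED = [
    "Piece of Cake",
    "Papi's Cuban & Caribbean Grill",
    "Loca Luna",
    "Farm Burger",
    "Little Rey",
    "Seven Lamps",
    "Vatica Indian Cuisine",
    "Shake Shack",
    "Truva Turkish Kitchen",
    "Yoi Yoi Japanese Steakhouse & Sushi",
]


def count_hits(recommended, visited=ALEX_VISITED):
    """Return how many recommended business names are in the visited list."""
    visited_set = set(visited)
    return sum(1 for name in recommended if name in visited_set)
```

For example, `count_hits(["Shake Shack", "Some Other Place"])` counts one hit, where "Some Other Place" is a made-up name used only for illustration.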

Part A: user – user recommender system [10 points]

In a user-user recommender system, you need to find users who have visited and rated similar businesses. Then, among these users, you can recommend the most visited businesses. For all businesses b, compute

$$r_{\text{Alex},b} = \sum_{x \in \text{users}} \text{cos-sim}(x, \text{Alex}) \cdot R_{x,b}$$

where cos-sim(x, Alex) is the cosine similarity between user x and Alex (computed excluding the entries of the first 100 businesses), and R is the ratings matrix. In the above equation, you first find the similarity between each user and Alex, and then multiply it by that user's rating for each business. The businesses with a higher $r_{\text{Alex},b}$ are therefore the ones that are popular among the users similar to Alex.

Let S denote the set of the first 100 businesses (the first 100 columns of the matrix). From all the businesses in S, which are the five that have the highest similarity scores ($r_{\text{Alex},b}$) for Alex? What are their similarity scores? In case of ties between two businesses, choose the one with the smaller index. Do not write the indices of the businesses; write their names using the file business.csv.
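The Part A computation can be sketched with NumPy as follows. This is one possible reading of the formula, not a reference solution: it uses the standard cosine similarity (dot product divided by the product of the vector norms), assigns similarity 0 to users with no visits outside S, and sums over all users. Alex's own term contributes nothing because his first 100 entries are 0. The names `user_user_scores` and `top_k` are illustrative.

```python
import numpy as np


def user_user_scores(R, user, n_hidden=100):
    """Score the first n_hidden businesses for `user` with user-user similarity."""
    R = np.asarray(R, dtype=float)
    known = R[:, n_hidden:]                  # similarities ignore the erased columns
    norms = np.linalg.norm(known, axis=1)
    dots = known @ known[user]               # dot product of every user with `user`
    denom = norms * norms[user]
    # Cosine similarity; users with an all-zero vector get similarity 0.
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    return sims @ R[:, :n_hidden]            # sum_x cos-sim(x, user) * R[x, b]


def top_k(scores, k=5):
    """Return (index, score) pairs of the k largest scores, smaller index first on ties."""
    order = np.argsort(-np.asarray(scores), kind="stable")[:k]
    return [(int(i), float(scores[i])) for i in order]
```

For Alex, `top_k(user_user_scores(R, user=3))` would give zero-based column indices that can be mapped to names through business.csv. The stable sort on the negated scores is what implements the tie-breaking rule.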

Part B: item – item recommender system [10 points]

In an item-item recommender system, you need to find items that have similar ratings and recommend them to Alex. For all businesses b, compute

$$r_{\text{Alex},b} = \sum_{x \in \text{businesses}} \text{cos-sim}(x, b) \cdot R_{\text{Alex},x}$$

where R is the ratings matrix and cos-sim(x, b) is the cosine similarity of each pair of businesses (computed excluding Alex’s entries). Here you find similar businesses and then multiply the similarities by Alex’s ratings for those businesses. So the businesses that are similar to the businesses already visited by Alex will have a higher score $r_{\text{Alex},b}$.

From all the businesses in S (first 100 businesses), which are the five that have the highest similarity scores for Alex? In case of ties between two businesses, choose the one with the smaller index. Again, hand in the names of the businesses and their similarity scores.
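A comparable sketch for Part B, under the same assumptions as before: standard cosine similarity, similarity 0 for any business that no other user has visited, and Alex's row removed before the business-business similarities are computed. The function name `item_item_scores` is illustrative.

```python
import numpy as np


def item_item_scores(R, user, n_hidden=100):
    """Score the first n_hidden businesses for `user` with item-item similarity."""
    R = np.asarray(R, dtype=float)
    others = np.delete(R, user, axis=0)      # exclude the target user's row
    norms = np.linalg.norm(others, axis=0)   # one norm per business column
    dots = others.T @ others                 # business-by-business dot products
    denom = np.outer(norms, norms)
    # Cosine similarity; businesses with no visits get similarity 0.
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    scores = sims @ R[user]                  # sum_x cos-sim(x, b) * R[user, x]
    return scores[:n_hidden]
```

Ranking the result with `np.argsort(-scores, kind="stable")[:5]` keeps the smaller index first when two scores are equal.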

Part C: Latent hidden model recommender system [15 points]

Latent factor models are one of the most popular types of recommender systems in use today. Here we perform a matrix factorization of the ratings matrix R into two matrices U and V, where U is considered the user features matrix and V is the business features matrix. Note that the features are ‘hidden’ and need not be understandable to users. Hence the name latent hidden model. (Refer to the slides for more information.)

The latent model can be implemented by performing a singular value decomposition (SVD) that factors the matrix into three matrices

$$R = U \Sigma V^T$$

where R is the user ratings matrix, U is the user “features” matrix, Σ is the diagonal matrix of singular values (essentially weights), and $V^T$ is the business “features” matrix. U and $V^T$ are orthogonal, and represent different things. U represents how much users “like” each feature and $V^T$ represents how relevant each feature is to each business.

To get the lower rank approximation, we take these matrices and keep only the top k features (k factors), which we think of as the k most important underlying taste and preference vectors.

With k set to 10, perform SVD to identify the U and V matrices. You can then multiply the truncated matrices to estimate the following

$$R^* = U \Sigma V^T$$

From the $R^*$ matrix, select the top 5 businesses for Alex in S (first 100 businesses). In case of ties between two businesses, choose the one with the smaller index. Again, hand in the names of the businesses and their similarity scores.

Hint: You can use the SVD implementation in the surprise package, numpy, or scipy.
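A minimal sketch of the rank-k approximation using `numpy.linalg.svd`, which is only one of the options named in the hint. It computes the full (reduced) decomposition and then keeps the k largest singular values and their vectors. The function name `low_rank_approximation` is illustrative.

```python
import numpy as np


def low_rank_approximation(R, k=10):
    """Return R* = U_k Sigma_k V_k^T built from the k largest singular values of R."""
    # full_matrices=False returns the reduced SVD; singular values come sorted descending.
    U, s, Vt = np.linalg.svd(np.asarray(R, dtype=float), full_matrices=False)
    # Scaling the columns of U_k by s_k is the same as multiplying by diag(s_k).
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```

Alex's scores for S would then be `low_rank_approximation(R, k=10)[3, :100]`, with row 3 being the zero-based index of the 4th user.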

Part D: bonus [10 points]

Your goal is to build a good recommendation system for Yelp with an ensemble of predictors. You can use any individual predictor and any method to combine them (for example, a linear weighted combination or a vote).

- Test set:
  - 5 new users x the same 1000 businesses. Their records for the first 100 businesses are also erased.
- Submission:
  - The predictions for the erased records, as 1s and 0s.
  - Submission format: sample_bonus_submission.csv (5 rows, 100 columns, separator = comma, integers). Please make sure your raw text exactly matches the sample format. Otherwise you might get 0 points, since we run auto-grading.
- Evaluation metrics:
  - Since the test set is sparse, i.e. most entries are 0s, we use the F1 score as our evaluation metric.
  - You can split a validation set out of the training set (for example, users 5-9) if you want to test your model.
- Code and write-up:
  - Write your code for the test set in a separate Jupyter notebook. At the top of the notebook, add brief write-ups to explain each predictor you used and how you combined them.
  - Your bonus points = $\max\left(10 \cdot \min\left(\dfrac{\text{your F1} - 0.12}{0.6 - 0.12},\ 1\right),\ 0\right)$

  - This means that you will get some points as long as you attempt! For reference, a random guess (all 1s) scores 0.12, and 0.6 is pretty accurate.
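The pieces of Part D can be sketched as below, under stated assumptions: F1 is computed over all entries pooled together (the grader's exact aggregation is not specified here), the ensemble is a plain majority vote (only one of the permitted combination methods), and the output file is written with `numpy.savetxt`. All function names are illustrative.

```python
import numpy as np


def f1_score_binary(y_true, y_pred):
    """F1 over all entries pooled together (assumed aggregation)."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_pred = np.asarray(y_pred).ravel().astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def bonus_points(f1, baseline=0.12, target=0.6):
    """Bonus formula from the assignment: max(10 * min((F1 - 0.12) / (0.6 - 0.12), 1), 0)."""
    return max(10 * min((f1 - baseline) / (target - baseline), 1), 0)


def majority_vote(predictions):
    """Combine several 0/1 prediction arrays; an entry is 1 if more than half vote 1."""
    stacked = np.stack([np.asarray(p) for p in predictions])
    return (stacked.sum(axis=0) * 2 > len(predictions)).astype(int)


def save_submission(pred, path="bonus_submission.csv"):
    """Write integer predictions as comma-separated rows."""
    np.savetxt(path, np.asarray(pred, dtype=int), fmt="%d", delimiter=",")
```

Before submitting, it is worth comparing the written file with sample_bonus_submission.csv, since the exact sample format is not reproduced here.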

Turn in:

#A2

a) A Jupyter notebook a2.ipynb with the code and answers (if you work in a study group, write the group members' names at the top to avoid any trouble in the plagiarism check).

b) An a2.py exported from your .ipynb

#A2-bonus

c) bonus_submission.csv

d) bonus.ipynb

e) A bonus.py exported from your .ipynb
