4/6/2020 COMP3311 20T1 - Assignment 2
https://cgi.cse.unsw.edu.au/~cs3311/20T1/assignments/a2/index.php 1/7
COMP3311 20T1 Assignment 2
SQL, Python, SQLite
Database Systems
Last updated: Thursday 2nd April 11:32pm (most recent updates are in [..])
Downloads: a2.tgz, a2.zip
Note that a2.tgz and a2.zip contain the same material.
Each archive contains the IMDB database dump a2.db, plus a sample Python code file called sample.
Aims
This assignment aims to give you practice in
use of SQL in SQLite
writing scripts in Python that interact with a database
The goal is to complete the functionality of some command-line tools via a combination of database code and Python code.
Summary
Submission: Login to Course Web Site > Assignments > Assignment 2 > Assignment 2 Specification > Make Submission > upload required files > [Submit]
Required Files (total 4 files): acting, title, toprank, similar
Deadline: Friday 17 April 2020 @ 17:00
Late Penalty: Late submissions will have marks deducted from the maximum achievable mark at the rate of 0.5% of the total mark per hour that they are
late (i.e., 12% per day).
This assignment contributes 30 marks toward your total mark for this course.
The mark for each question indicates (roughly) its level of difficulty.
The total marks for the questions sum to 15, and will be scaled to 30.
How to do this assignment:
read this specification carefully and completely
login to grieg or a CSE linux machine
create a directory for this assignment
unpack the supplied zip file into this directory
get familiar with the schema and data by exploring and querying the provided database using the command: sqlite3 a2.db
familiarise (read the code) yourself with the provided sample Python code file called sample
try out the sample code by running: ./sample YEAR where YEAR is a number representing a year, e.g., 1989
make sure sample is executable (by chmod u+x sample) and run it
complete the tasks using sample as a reference template
submit all these files (the Required Files) via WebCMS3 as described above
Details of the above steps are given below. You can also edit and run the Python files on hosts other than grieg.
Introduction
A successful movie (or TV show) not only entertains audience, but also enables film companies to gain tremendous profit. A lot of factors (such as good
directors, experienced actors) are important for creating good movies. Nevertheless, famous directors and actors usually bring an attractive box-office income,
but they do not necessarily guarantee a highly rated imdb score. This assignment is based on an IMDB dataset to build several small Python commands to
show interesting results.
The dataset itself contains around 5000 movies, spanning across 100 years in 66 countries. There are more than 2000 movie directors, and thousands of
actors/actresses. It also contains the IMDB rating score, numbers of votes and various facebook likes. To let you feel the kind of data that you are dealing
with, a (unordered) glimpse of the dataset is included below:
cs3311@grieg:~/sqlite/a2$ sqlite3 a2.db
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite>
sqlite> select * from movie limit 10;
1|Avatar|2009|PG-13|178|English|USA|760505847|237000000|735
2|Pirates of the Caribbean: At World's End|2007|PG-13|169|English|USA|309404152|300000000|555
3|Spectre|2015|PG-13|148|English|UK|200074175|245000000|1776
4|The Dark Knight Rises|2012|PG-13|164|English|USA|448130642|250000000|1078
6|John Carter|2012|PG-13|132|English|USA|73058679|263700000|2217
7|Spider-Man 3|2007|PG-13|156|English|USA|336530303|258000000|625
8|Tangled|2010|PG|100|English|USA|200807262|260000000|922
9|Avengers: Age of Ultron|2015|PG-13|141|English|USA|458991599|250000000|1410
10|Harry Potter and the Half-Blood Prince|2009|PG|153|English|UK|301956980|250000000|1117
11|Batman v Superman: Dawn of Justice|2016|PG-13|183|English|USA|330249062|250000000|2180
sqlite>
sqlite> .headers on
sqlite>
sqlite> select * from movie limit 10;
id|title|year|content_rating|duration|lang|country|gross|budget|director_id
1|Avatar|2009|PG-13|178|English|USA|760505847|237000000|735
2|Pirates of the Caribbean: At World's End|2007|PG-13|169|English|USA|309404152|300000000|555
3|Spectre|2015|PG-13|148|English|UK|200074175|245000000|1776
4/6/2020 COMP3311 20T1 - Assignment 2
https://cgi.cse.unsw.edu.au/~cs3311/20T1/assignments/a2/index.php 2/7
4|The Dark Knight Rises|2012|PG-13|164|English|USA|448130642|250000000|1078
6|John Carter|2012|PG-13|132|English|USA|73058679|263700000|2217
7|Spider-Man 3|2007|PG-13|156|English|USA|336530303|258000000|625
8|Tangled|2010|PG|100|English|USA|200807262|260000000|922
9|Avengers: Age of Ultron|2015|PG-13|141|English|USA|458991599|250000000|1410
10|Harry Potter and the Half-Blood Prince|2009|PG|153|English|UK|301956980|250000000|1117
11|Batman v Superman: Dawn of Justice|2016|PG-13|183|English|USA|330249062|250000000|2180
sqlite>
sqlite> select * from rating limit 10;
movie_id|num_critic_for_reviews|num_user_for_reviews|num_voted_users|movie_facebook_likes|cast_total_facebook_likes|imdb_score
1|723|3054|886204|33000|4834|7.9
2|302|1238|471220|0|48350|7.1
3|602|994|275868|85000|11700|6.8
4|813|2701|1144337|164000|106759|8.5
6|462|738|212204|24000|1873|6.6
7|392|1902|383056|0|46055|6.2
8|324|387|294810|29000|2036|7.8
9|635|1117|462669|118000|92000|7.5
10|375|973|321795|10000|58753|7.5
11|673|3018|371639|197000|24450|6.9
sqlite>
sqlite> .mode column
sqlite>
sqlite> select * from director limit 10;
id name facebook_likes
---------- ---------- --------------
1 JK Youn 2
2 David S. W 42
3 James Fraw 21
4 Kar-Wai Wo 0
5 Brian Tren 53
6 Perry Lang 17
7 Jeff Burr 155
8 Al Sillima 0
9 Morten Tyl 77
10 Hue Rhodes 0
sqlite>
sqlite> select * from actor limit 10;
id name facebook_likes
---------- ----------------- --------------
1 Maureen McCormick 458
2 Andrew Fiscella 137000
3 Brittany Daniel 861
4 Michael Smiley 177
5 Najarra Townsend 538
6 Gustaf Skarsgrd 908
7 Laila Haley 1000
8 Isaac C. Singleto 312
9 Veronica Ferres 30000
10 Chris 'Wonder' Sc 0
sqlite>
sqlite> select * from acting limit 10;
movie_id actor_id
---------- ----------
407 2024
3699 1841
3016 11
2846 195
3421 738
3645 1186
2430 211
4823 1299
1737 786
2282 866
sqlite>
sqlite> select * from genre limit 10;
movie_id genre
---------- ----------
407 Adventure
407 Comedy
407 Family
3699 Drama
3699 Horror
3699 Mystery
3699 Sci-Fi
3699 Thriller
3016 Drama
2846 Comedy
sqlite>
sqlite> select * from keyword limit 10;
movie_id keyword
---------- ----------
4/6/2020 COMP3311 20T1 - Assignment 2
https://cgi.cse.unsw.edu.au/~cs3311/20T1/assignments/a2/index.php 3/7
407 dog
407 parole
407 parole off
407 prison
407 puppy
3699 alien
3699 bunker
3699 car crash
3699 kidnapping
3699 minimal ca
sqlite>
sqlite> .width 10 30
sqlite>
sqlite> select * from keyword limit 10;
movie_id keyword
---------- ------------------------------
407 dog
407 parole
407 parole officer
407 prison
407 puppy
3699 alien
3699 bunker
3699 car crash
3699 kidnapping
3699 minimal cast
sqlite>
You may also explore the schema and any table information using the sqlite system catalog table called sqlite_master (as shown in the Week 7 lecture), or the
sqlite PRAGMA functions (please see the sqlite documentation for details), as shown in the examples below:
cs3311@grieg:~/sqlite/a2$ sqlite3 a2.db
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> .tables
acting actor director genre keyword movie rating
sqlite> pragma table_info(movie);
0|id|integer|1||1
1|title|character varying(256)|1||0
2|year|integer|0||0
3|content_rating|character varying(9)|0||0
4|duration|integer|0||0
5|lang|character varying(10)|0||0
6|country|character varying(20)|0||0
7|gross|bigint|0||0
8|budget|bigint|0||0
9|director_id|integer|0||0
sqlite>
sqlite> select * from sqlite_master where name="movie";
table|movie|movie|6|CREATE TABLE movie (
id integer NOT NULL,
title character varying(256) NOT NULL,
year integer,
content_rating character varying(9),
duration integer,
lang character varying(10),
country character varying(20),
gross bigint,
budget bigint,
director_id integer,
PRIMARY KEY (id),
FOREIGN KEY (director_id) REFERENCES director(id)
)
sqlite>
The sample Python program
By following the steps above, you can experience the provided sample program and study its code as follows:
cs3311@grieg:~/sqlite/a2$ ./sample
Usage: ./sample YEAR
cs3311@grieg:~/sqlite/a2$ ./sample 1989
A Nightmare on Elm Street 5: The Dream Child (1989)
Back to the Future Part II (1989)
Batman (1989)
Bill Ted's Excellent Adventure (1989)
Black Rain (1989)
Born on the Fourth of July (1989)
Dead Poets Society (1989)
Do the Right Thing (1989)
Driving Miss Daisy (1989)
Friday the 13th Part VIII: Jason Takes Manhattan (1989)
Glory (1989)
4/6/2020 COMP3311 20T1 - Assignment 2
https://cgi.cse.unsw.edu.au/~cs3311/20T1/assignments/a2/index.php 4/7
Halloween 5 (1989)
Henry V (1989)
Indiana Jones and the Last Crusade (1989)
Licence to Kill (1989)
Major League (1989)
New York Stories (1989)
Pet Sematary (1989)
Road House (1989)
Roger Me (1989)
Sea of Love (1989)
Sex, Lies, and Videotape (1989)
Star Trek V: The Final Frontier (1989)
Tango Cash (1989)
The Abyss (1989)
The Blood of Heroes (1989)
The Toxic Avenger Part II (1989)
Troop Beverly Hills (1989)
UHF (1989)
Warlock (1989)
We're No Angels (1989)
When Harry Met Sally... (1989)
cs3311@grieg:~/sqlite/a2$ ls
a2.db sample
cs3311@grieg:~/sqlite/a2$
cs3311@grieg:~/sqlite/a2$ cat sample
#!/usr/bin/python3
"