Database Systems & Applications
Course Project 2022
1. Goal
⚫ Build a database driven web application from the ground up.
⚫ To develop the project, you are encouraged to use the Docker platform.
⚫ If you wish, you may use any languages, tools, microservice architecture,or web application frameworks, and run servers on your own machines orcloud servers.
⚫ Submit a report, together with your data and code.
2. Project Requirements
The project is to build a database driven web application. Specifically, you willneed to complete the following tasks over the course of this semester.1) Pick your favorite data management application. It should berelatively substantial, but not too enormous. Several project ideas aredescribed at the end of this document, but you are encouraged to comeup with your own. When picking an application, keep the followingquestions in mind:
⚫ How do you plan to acquire the data to populate your database?Use of real datasets is highly recommended. You may use program
generated “fake” datasets if real ones are too difficult to obtain.
⚫ How are you going to use the data? What kind of queries do youwant to ask?
⚫ How is the data updated? Your application should support both
queries and updates
2) Design the database schema. Start with an E/R diagram and convertit to a relational schema. Identify any constraints that hold in yourapplication domain, and code them as database constraints. If you planto work with real datasets, it is important to go over some samples ofreal data to validate your design (in fact, you should start Task 7 belowas early as possible, in parallel to Tasks 3~6). Do not forget to applydatabase design theory and check for redundancies.
3) Create a sample database using a small dataset. You may generatethis small dataset by hand. You will find this sample database veryuseful in testing, because large datasets make debugging difficult. It isa good idea to write some scripts to create/load/destroy the sampledatabase automatically; they will save you lots of typing whendebugging.
4) Design a web-based user interface for your application. Thinkabout how a typical user would use your application. Optionally, it mightbe useful to build a “canned” demo version of the site first (i.e., withhardcoded rather than dynamically generated responses), while youbrush up your application design skills at the same time. Do not spendtoo much time on refining the look of your interface; you just need tounderstand the basic “flow” in order to figure out what databaseoperations are needed in each step of the user interaction.
5) Write SQL queries that will supply dynamic contents for the web pagesyou designed for Task 4. Also write SQL code that modifies thedatabase on behalf of the user. You may hardcode the query andupdate parameters. Test these SQL statements in the sample database.
6) Choose an appropriate platform for your application. Python orPHP? JavaScript or plain HTML? Start by implementing a “hello world”type of simple database driven web application, deploy it in yourdevelopment environment, and make sure that all parts are workingtogether correctly.
7) Acquire the large “production” dataset, either by downloading it
from a real data source or by generating it using a program. Make sure
the dataset fits your schema. For real datasets, you might need to write
programs/scripts to transform them into a form that is appropriate for
loading into a database. For program generated datasets, make sure
they contain enough interesting “links” across rows of different tables,
or else all your join queries may return non-empty results.
8) Test the SQL statements you developed for Task 5 in the large
database. Do you run into any performance problems? Try creating
some additional indexes to improve performance.
9) Implement and debug the application and the web interface. Test
your application with the smaller sample database first. You may need
to iterate the design and implementation several times in order to
correct any unforeseen problems.
10) Test your application with the production dataset. Resolve any
performance problems.
11) Polish the web interface. You may add as many bells and whistles as
you like, though they are optional because they are not the main focus
of this course.
3. Milestone 1
You should have completed Tasks 1~5 and have started thinking about 6 and
7. If you plan to work with real data, you should also have made significant
progress on Task 7 (you should at least ensure that it is feasible to obtain the
real dataset, transform it, and load it into your database).
4. Milestone 2
You should have completed Tasks 1~8 and have made good progress on 9.
5. Project Demo
At the end of the semester, you will need to present a working demo of your
system. Prior to your demo, submit the following:
1) A final project report, including
⚫ A brief description of your application.
⚫ A plan for getting the data to populate your database, as well as
some sample data.
⚫ A list of assumptions that you are making about the data being
modeled.
⚫ An E/R diagram for your database design.
⚫ A list of database tables with keys declared.
⚫ A description of the Web interface. You can write a brief
description of how users interact with the interface (e.g., “the user
selects a car model from a pulldown menu, clicks on the ‘go’ button,
and a new page will display all cars of this model that are available
for sale”).
⚫ A brief description of the platform you chose in Task 6.
⚫ Changes you made to the database during performance tuning in
Task 8, e.g., additional indexes created.
2) A .zip or .tar.gz archive of your source code. The source code directory
should at least contain:
⚫ A file describing how to create and load your sample database.
⚫ Files containing the SQL code used for creating tables, constraints,
stored procedures and triggers (if any).
⚫ A file test-sample.sql containing the SQL statements you wrote for
Task 5.
⚫ A file test-sample.out showing the results of running testsample.sql over your sample database.
⚫ If applicable, any code for downloading/scraping/transforming real
data that you have written for Task 7.
⚫ A file describing how to generate the “production” dataset and load
it into your database. Do not submit the production dataset itself
through if it is too big; instead, submit the URL where you
download/scrape the raw data (if applicable), and the code that
extracts and transforms (or generates) the production dataset.
⚫ A file test-production.sql containing the SQL statements you
wrote for Task 5. You may wish to modify some queries to return
only the top 10 result rows instead of all result rows (there might be
lots for large datasets).
⚫ A file test-production.out showing the results of running testproduction.sql over the production dataset.
⚫ All your source code.
⚫ a README file describing how to set up your servers and database,
and how to compile and deploy your application.
6. Project Ideas
Below is a list of possible project ideas for which high quality datasets exist. Of
course, you are welcome to come up with your own.
Examples include those that allow visitors to explore information about movies,
music, sports, stocks, etc. There are already many commercial offerings for
such purposes. While there is less room for innovation, there are plenty of
examples of what a good web application would look like, as well as high quality,
well formatted datasets.
⚫ IMDb Datasets (http://www.imdb.com/interfaces );
⚫ Sports Statistics (https://sports-statistics.com/sports-data/);
⚫ Million Song Dataset (http://millionsongdataset.com/);
⚫ Finance datasets available on data.world
(https://data.world/datasets/finance);
⚫ Airbnb public datasets (http://insideairbnb.com/get-the-data.html);
⚫ Tianchi Data Sets (https://tianchi.aliyun.com/dataset/);
⚫ TPC-C benchmark (http://www.tpc.org/tpcc/);
⚫ TPC-H benchmark (http://www.tpc.org/tpch/);
These projects are well suited for those who just want to learn how to build
database backed web applications as beginners. You can always spice things
up by adding features that you wish those websites had (e.g., different ways for
summarizing, exploring, and visualizing the data).
Your task would be to take one of such datasets, design a good relational schema,
clean up/restructure the data, and build a web application for the public to explore
the dataset.