Database Systems & ApplicationsCourse Project 2022
1. Goal
⚫ Build a database driven web application from the ground up.
⚫ To develop the project, you are encouraged to use the Docker platform.
⚫ If you wish, you may use any languages, tools, microservice architecture,or web application frameworks, and run servers on your own machines orcloud servers.
⚫ Submit a report, together with your data and code.
2. Project Requirements
The project is to build a database driven web application. Specifically, you willneed to complete the following tasks over the course of this semester.
1) Pick your favorite data management application. It should berelatively substantial, but not too enormous. Several project ideas are
described at the end of this document, but you are encouraged to comeup with your own. When picking an application, keep the followingquestions in mind:
⚫ How do you plan to acquire the data to populate your database?Use of real datasets is highly recommended. You may use program
generated “fake” datasets if real ones are too difficult to obtain.
⚫ How are you going to use the data? What kind of queries do youwant to ask?
⚫ How is the data updated? Your application should support bothqueries and updates
2) Design the database schema. Start with an E/R diagram and convertit to a relational schema. Identify any constraints that hold in yourapplication domain, and code them as database constraints. If you planto work with real datasets, it is important to go over some samples ofreal data to validate your design (in fact, you should start Task 7 belowas early as possible, in parallel to Tasks 3~6). Do not forget to applydatabase design theory and check for redundancies.
3) Create a sample database using a small dataset. You may generatethis small dataset by hand. You will find this sample database veryuseful in testing, because large datasets make debugging difficult. It isa good idea to write some scripts to create/load/destroy the sampledatabase automatically; they will save you lots of typing whendebugging.
4) Design a web-based user interface for your application. Thinkabout how a typical user would use your application. Optionally, it might
be useful to build a “canned” demo version of the site first (i.e., withhardcoded rather than dynamically generated responses), while you
brush up your application design skills at the same time. Do not spendtoo much time on refining the look of your interface; you just need tounderstand the basic “flow” in order to figure out what database
operations are needed in each step of the user interaction.
5) Write SQL queries that will supply dynamic contents for the web pagesyou designed for Task 4. Also write SQL code that modifies thedatabase on behalf of the user. You may hardcode the query andupdate parameters. Test these SQL statements in the sample database.
6) Choose an appropriate platform for your application. Python orPHP? JavaScript or plain HTML? Start by implementing a “hello world”type of simple database driven web application, deploy it in yourdevelopment environment, and make sure that all parts are workingtogether correctly.
7) Acquire the large “production” dataset, either by downloading itfrom a real data source or by generating it using a program. Make surethe dataset fits your schema. For real datasets, you might need to writeprograms/scripts to transform them into a form that is appropriate forloading into a database. For program generated datasets, make surethey contain enough interesting “links” across rows of different tables,or else all your join queries may return non-empty results.
8) Test the SQL statements you developed for Task 5 in the largedatabase. Do you run into any performance problems? Try creatingsome additional indexes to improve performance.
9) Implement and debug the application and the web interface. Testyour application with the smaller sample database first. You may needto iterate the design and implementation several times in order tocorrect any unforeseen problems.
10) Test your application with the production dataset. Resolve anyperformance problems.
11) Polish the web interface. You may add as many bells and whistles asyou like, though they are optional because they are not the main focusof this course.
3. Milestone 1
You should have completed Tasks 1~5 and have started thinking about 6 and7. If you plan to work with real data, you should also have made significantprogress on Task 7 (you should at least ensure that it is feasible to obtain thereal dataset, transform it, and load it into your database).
4. Milestone 2
You should have completed Tasks 1~8 and have made good progress on 9.
5. Project Demo
At the end of the semester, you will need to present a working demo of yoursystem. Prior to your demo, submit the following:
1) A final project report, including
⚫ A brief description of your application.
⚫ A plan for getting the data to populate your database, as well assome sample data.
⚫ A list of assumptions that you are making about the data beingmodeled.
⚫ An E/R diagram for your database design.
⚫ A list of database tables with keys declared.
⚫ A description of the Web interface. You can write a briefdescription of how users interact with the interface (e.g., “the user
selects a car model from a pulldown menu, clicks on the ‘go’ button,and a new page will display all cars of this model that are availablefor sale”).
⚫ A brief description of the platform you chose in Task 6.
⚫ Changes you made to the database during performance tuning inTask 8, e.g., additional indexes created.
2) A .zip or .tar.gz archive of your source code. The source code directoryshould at least contain:
⚫ A file describing how to create and load your sample database.
⚫ Files containing the SQL code used for creating tables, constraints,stored procedures and triggers (if any).
⚫ A file test-sample.sql containing the SQL statements you wrote for
Task 5.
⚫ A file test-sample.out showing the results of running testsample.sql over your sample database.
⚫ If applicable, any code for downloading/scraping/transforming realdata that you have written for Task 7.
⚫ A file describing how to generate the “production” dataset and loadit into your database. Do not submit the production dataset itselfthrough if it is too big; instead, submit the URL where youdownload/scrape the raw data (if applicable), and the code thatextracts and transforms (or generates) the production dataset.
⚫ A file test-production.sql containing the SQL statements youwrote for Task 5. You may wish to modify some queries to returnonly the top 10 result rows instead of all result rows (there might belots for large datasets).
⚫ A file test-production.out showing the results of running testproduction.sql over the production dataset.
⚫ All your source code.
⚫ a README file describing how to set up your servers and database,and how to compile and deploy your application.
6. Project Ideas
Below is a list of possible project ideas for which high quality datasets exist. Ofcourse, you are welcome to come up with your own.
Examples include those that allow visitors to explore information about movies,music, sports, stocks, etc. There are already many commercial offerings forsuch purposes. While there is less room for innovation, there are plenty ofexamples of what a good web application would look like, as well as high quality,well formatted datasets.
⚫ IMDb Datasets (http://www.imdb.com/interfaces );
⚫ Sports Statistics (https://sports-statistics.com/sports-data/);
⚫ Million Song Dataset (http://millionsongdataset.com/);
⚫ Finance datasets available on data.world(https://data.world/datasets/finance);
⚫ Airbnb public datasets (http://insideairbnb.com/get-the-data.html);
⚫ Tianchi Data Sets (https://tianchi.aliyun.com/dataset/);
⚫ TPC-C benchmark (http://www.tpc.org/tpcc/);
⚫ TPC-H benchmark (http://www.tpc.org/tpch/);These projects are well suited for those who just want to learn how to build
database backed web applications as beginners. You can always spice thingsup by adding features that you wish those websites had (e.g., different ways forsummarizing, exploring, and visualizing the data).
Your task would be to take one of such datasets, design a good relational schema,clean up/restructure the data, and build a web application for the public to explorethe dataset.