

DTS106TC: Introduction to Database

School of AI and Advanced Computing

Assessment Task 002 (CW)

Due: 3rd June 2024 at 17:00 (GMT +8)

Weight: 40%

Maximum Marks: 100

This coursework will be assessed against the following learning outcomes:

A. Demonstrate a basic understanding of the design of databases.

B. Show a fundamental grounding in the operation and usage of database management systems, including "hands-on" experience of a basic database management system.

C. Demonstrate in-depth knowledge of the database language, SQL.

Overview:

To evaluate learning outcomes A, B, and C, this assessment aims to replicate a practical scenario in database development: constructing a professional database system based on a given dataset. This assessment will involve creating and normalizing a database to hold the data from a dataset and constructing queries against the data once it is in a suitable form.

This coursework consists of the following five parts:

The Relational Model (20%)

Normalization (25%)

Database construction (15%)

Querying (35%)

Visualization (5%)

In detail, Step 1 (Q1: The Relational Model) requires students to analyze the provided dataset, identifying data types and determining candidate keys and primary keys for the relation. Because the original relation suffers from potential issues such as data-integrity and storage-efficiency problems, Step 2 (Q2: Normalization) is undertaken to mitigate redundancy and dependency, thereby ensuring data integrity and enhancing database performance. Subsequently, Steps 3 and 4 (Q3: Database Construction, Q4: Querying) cover populating and exporting data in the Database Management System and executing manipulations on it using SQL. Lastly, Step 5 (Q5: Visualization) helps users comprehend complex datasets quickly and easily by presenting information in a visual format.

You will be required to write a brief report to answer the questions and note down your process, thoughts, and assumptions made as well as answers. You will also need to construct a database using Oracle Application Express (APEX) or SQLite and produce a set of queries that can be run against that database.

Marking Criteria

This coursework will be graded out of 100 marks and contributes 40% of the overall credit for the module. There are 20 questions, each worth 5 marks. Marks will be awarded based on the correctness of each answer: 100% for a fully correct answer with the required explanation/justification, and partial marks depending on the level of incorrectness or missing required details. Marks will be awarded as follows:

100% marks: complete and correct answer with rationale

80% marks: answer correct but rationale/justification only partially correct

60% marks: partially attempted/correct

40% and 20% marks: based on the level of attempt/correctness

0 marks: not attempted or wrong

Dataset

The dataset to be used for this assessment is randomly generated virus data. It records the distribution of virus case counts across different regions.

The dataset and data dictionary can be downloaded from LMO under the Assessment 002-Coursework.

Setup

You can complete this coursework using either Oracle Application Express (APEX) or SQLite. You need only submit the SQL scripts for constructing and querying the database, along with the report.

Q1: The Relational Model (4*5=20 Marks)

Q1(a): Write down the relations that directly map the dataset file into a mega table/relation. You don’t have to write SQL for transforming the dataset into a table; just list the relations and attributes with appropriate data types and justification. Here is an example:

R1 (A1 [data type], A2[data type]…An[data type])

where R1 refers to the relation and A1 to An are its set of attributes.

Note: You must use the attribute names exactly as they appear in the dataset.

Types: To ease the modeling process, you may use either SQLite data types (INTEGER, TEXT, BLOB, REAL, and NUMERIC); please refer to https://www.sqlite.org/datatype3.html for more information,

or Oracle data types (NUMBER, VARCHAR2, CHAR, CLOB, BLOB, etc.); see https://docs.oracle.com/cd/B12037_01/server.101/b10759/sql_elements001.htm

Q1(b): List the minimal set of Functional Dependencies (FDs) and provide an explanation.

· Every FD must be minimal on its LHS (left-hand side)

§ There can be more than one attribute on the LHS, but there should be no attributes on the LHS that add no further information (e.g. If A -> C, then A, B -> C would not be minimal as the B is not adding anything further)

· There should be no trivial FDs (A -> A adds nothing of value)

· There must be no redundant FDs (those which are already implied by other FDs)

· Tip: If A -> B and C -> B and C -> A then C -> B is redundant (as this is implied by C -> A -> B)

Explain any assumptions you make, applying what you know of the domain to the data, and also consider future data and the impact it may have.

You will need to determine whether values are 'blank' (a known value of blank) or NULL (an as-yet-unknown value), as this may have an impact on your dependencies. Explain any assumptions and decisions you make in the report.
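The FD rules above can be checked mechanically against sample data: X -> Y holds exactly when no two rows agree on X but differ on Y. A minimal sketch of such a check (toy rows with hypothetical values, not the coursework dataset):

```python
# Check whether a functional dependency X -> Y holds in a set of rows:
# no two rows may agree on the LHS attributes but differ on the RHS.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same LHS, different RHS: FD violated
        seen[key] = val
    return True

# Toy rows for illustration only (not real dataset values)
rows = [
    {"country": "X", "continent": "Europe", "cases": 5},
    {"country": "X", "continent": "Europe", "cases": 9},
    {"country": "Y", "continent": "Asia",   "cases": 5},
]

print(fd_holds(rows, ["country"], ["continent"]))  # True: country -> continent holds
print(fd_holds(rows, ["country"], ["cases"]))      # False: country does not determine cases
```

Note that such a check only shows an FD is consistent with the sample; deciding whether it should hold for future data is the domain-modelling judgment the question asks you to justify.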

Q1(c): List all potential candidate keys with justification

Q1(d): Identify a suitable primary key, and justify your decision

Q2: Normalization (5*5=25 Marks)

In this exercise, you will decompose the mega relation in Q1 into a set of normalized schemas and explain the role of each schema in the database.

You should apply a step-by-step procedure to transform the unnormalized relation created in Q1 into a normalized database. Each schema must be in 3rd Normal Form. Please consider the following guidelines while answering the questions below:

Keys: Introduce new surrogate keys only where needed to avoid anomalies; explain and justify this in your report.

Attributes: You may introduce new attributes if you wish, but you must not rename, remove, or change the values of any of the attributes in the original relation. All must appear, as originally named, in your decomposed relations.

NULL values: NULL values are not values in themselves but represent unknown values in the dataset (you cannot treat all NULL values as the same 'null' value). NULLs can remain throughout the normalization process; you do not need to remove them. However, you may need to introduce a surrogate key where a NULL is or could be present in an attribute you would want to use as a key or split into a relation.

Q2(a): List any partial-key dependencies with justification from the mega relation as it stands in the dataset file and any resulting additional relations you should create as part of the decomposition.

Q2(b): Convert the relation into 2nd Normal Form using your answer to the above. List the new relations and their fields, types, and keys. Explain the process you took.

Q2(c): List any transitive dependencies in your new relations and justify your answer.

Q2(d): Convert the relation into 3rd Normal Form using your answers to the above. List the new relations and their fields, types, and keys. Explain the process you took.

Note: Depending on assumptions you have made in earlier processes, tables may already be in 3NF - in that case, demonstrate why a table is in 3NF.
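As a purely illustrative sketch (hypothetical surrogate keys and table names, not a model answer): if, say, countriesAndTerritories determined continent transitively, a 3NF split would move the country-level attributes into their own relation, referenced by a foreign key:

```python
import sqlite3

# Illustrative 3NF decomposition in SQLite. The schema below is a toy
# stand-in; your own decomposition must follow from the FDs you found in Q2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (
    country_id INTEGER PRIMARY KEY,      -- surrogate key (hypothetical)
    countriesAndTerritories TEXT,
    continent TEXT                       -- moved here: depends on the country
);
CREATE TABLE DailyRecord (
    record_id INTEGER PRIMARY KEY,
    country_id INTEGER REFERENCES Country(country_id),
    year INTEGER, month INTEGER, day INTEGER,
    cases INTEGER, deaths INTEGER
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['Country', 'DailyRecord']
```

The surrogate key here is only justified if a natural key (e.g. the country name) could be NULL or otherwise unsuitable; that justification is what your report needs to supply.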

Q2(e): Is your relation in Boyce-Codd Normal Form? Justify your answer.

Q3: Database Construction (3*5=15 Marks)

Q3(a): Import the raw dataset into a single table called 'CW2_dataset'.

Tip: To import the dataset in SQLite DB Browser, you should create a database called virus.db and then use File -> Import -> Table from CSV file.

Tip: To import the dataset in Oracle APEX, you should apply for a workspace on the APEX website, then use SQL Workshop -> Utilities -> Data Workshop, select "Load Data" -> "Choose File," and upload the local CW2_dataset.csv file to the workspace.

Note: You should not change the CSV file - it must be the original provided dataset file. The attribute names must be the same as in the original file.

Export this table as dataset.sql (including CREATE and INSERT statements), such that running it will import the full dataset into a fresh database.

In the report write down the process you took and submit dataset.sql along with the report.

Q3(b): Based on Q2, write the SQL to create the full normalized representation, including all tables as a result of the decomposition process you took for Q2, excluding the mega dataset table.

The SQL should contain CREATE statements to create all new tables. You should include foreign keys where appropriate, and list and justify these in your answer.

Q3(c): Write INSERT statements using SELECT to populate the new tables from the 'CW2_dataset' table.

Save both the CREATE and INSERT statements in a single file, Normalized_Data.sql, and run them to populate the normalized database.

Each SQL statement should be written in the report, as well as saved. You should explain the steps for each of the above questions. Briefly write down the process you went through to go with it - enough that a person with just your report could reproduce what you have done.
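The INSERT-using-SELECT pattern that Q3(c) asks for looks like this in outline. The schema here is a toy stand-in; your actual tables and columns come from your own Q2 decomposition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CW2_dataset (countriesAndTerritories TEXT, continent TEXT, cases INTEGER);
INSERT INTO CW2_dataset VALUES ('X','Europe',5), ('X','Europe',9), ('Y','Asia',2);

CREATE TABLE Country (countriesAndTerritories TEXT PRIMARY KEY, continent TEXT);
-- Populate the normalized table from the mega table, one row per country:
INSERT INTO Country
SELECT DISTINCT countriesAndTerritories, continent FROM CW2_dataset;
""")
count = conn.execute("SELECT COUNT(*) FROM Country").fetchone()[0]
print(count)  # 2: the duplicate ('X','Europe') pair collapses to one row
```

SELECT DISTINCT is what removes the redundancy the normalization was meant to eliminate; without it the INSERT would fail on the primary key.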

Submit dataset.sql (containing original dataset)  and Normalized_Data.sql along with your report.

Q4: Querying (7*5=35 Marks)

For each exercise in this question, you will need to write a SQL query against the newly created normalized tables in your database. Queries written against the unnormalized mega dataset table will be considered wrong.

Each SQL statement, explanation, and snapshot of results should be written in the report. All queries should also be saved, numbered 4a, 4b, and so on, in a single file named sqlScript.sql that can be run against your normalized database as it stands at the end of Q3. You should also briefly describe your approach to each in the report.

Write SQL statement, explanation, and snapshot of results for each of the following questions:

4(a): Find the ‘continents’, ‘countriesAndTerritories’, ‘year’, ‘month’ and ‘day’ where both the number of cases and deaths are higher than the average number of daily cases and daily deaths across all records in the table. Sort the results in descending order by the number of cases and deaths.

4(b): Find the top 3 ‘countriesAndTerritories’ within each continent with the highest number of daily ‘cases’.
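Top-n-per-group queries like this are commonly written with a window function (supported in SQLite 3.25+ and Oracle). A sketch with made-up data, keeping the top 2 per continent rather than the top 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DailyRecord (continent TEXT, country TEXT, cases INTEGER);
INSERT INTO DailyRecord VALUES
 ('Europe','A',10), ('Europe','B',30), ('Europe','C',20),
 ('Asia','D',5),    ('Asia','E',50);
""")
# Rank countries by cases within each continent, then keep the top 2.
rows = conn.execute("""
SELECT continent, country, cases FROM (
    SELECT continent, country, cases,
           ROW_NUMBER() OVER (PARTITION BY continent
                              ORDER BY cases DESC) AS rn
    FROM DailyRecord
) WHERE rn <= 2
ORDER BY continent, cases DESC
""").fetchall()
print(rows)
# [('Asia', 'E', 50), ('Asia', 'D', 5), ('Europe', 'B', 30), ('Europe', 'C', 20)]
```

Whether you first aggregate daily cases per country before ranking depends on how you read the question; state your interpretation in the report.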

4(c): The number of cases and deaths by date, in descending date order, for each continent (with continent, date, number of cases, and number of deaths as columns).

Tip: You may choose whether to organize by date first or by continent first

4(d): For each continent, find the monthly average ‘cases’ and ‘deaths’ of every ‘countriesAndTerritories’. Then display the monthly ‘cases’ and ‘deaths’ of the ‘countriesAndTerritories’ that are higher than the monthly average. The columns of the output should be ‘Continent’, ‘countriesAndTerritories’, ‘MonthlyAverageCases’, ‘MonthlyAverageDeath’, ‘MonthlyCases’ and ‘MonthlyDeath’.

4(e): The total number of cases and deaths as a percentage of the population, for each country, with ‘country’, ‘% cases of population’, and ‘% deaths of population’ as columns.

4(f): A descending list of the top 10 countries, by percentage total deaths out of total cases in that country (with country name and % deaths of country cases as columns)

4(g): The date against a cumulative running total of the number of deaths by day and cases by day for the United States of America (with date, USA_daily_cases, USA_cumulative_cases, USA_daily_deaths, and USA_cumulative_Deaths as columns)

Note: Cumulative deaths refers to the sum of all deaths up to a specific date. For example, given Deaths: 1, 6, 10, 20, the Cumulative Deaths would be: 1, 7, 17, 37.
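A running total like this can be computed with SUM() over an ordered window (SQLite 3.25+ and Oracle both support it). A sketch using the note's own numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deaths (day INTEGER, deaths INTEGER);
INSERT INTO Deaths VALUES (1,1),(2,6),(3,10),(4,20);
""")
# SUM(...) OVER (ORDER BY day) defaults to a running total: the sum of
# all deaths up to and including each day.
rows = conn.execute("""
SELECT day, deaths,
       SUM(deaths) OVER (ORDER BY day) AS cumulative_deaths
FROM Deaths
ORDER BY day
""").fetchall()
print([r[2] for r in rows])  # [1, 7, 17, 37]
```

The full 4(g) answer would carry two such windowed sums (cases and deaths) filtered to the USA, but the windowing idea is the same.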

Q5: Visualization (5 Marks)

Q5(a): Write a SQL script to plot the top 10 countries in terms of overall cumulative deaths only. Include an explanation of your script in the report [3 marks].

Q5(b): Produce a graph that has a date on the horizontal axis and the cumulative number of deaths by country on the vertical axis. You should represent the top 10 countries in terms of overall cumulative deaths only [2 marks].

The full script and the resulting screenshot of the graph should be included in the report.

Report Submission Guidelines

All students must download their file and check that it is viewable after submission. Documents may become corrupted during the uploading process (e.g., due to slow internet connections), but students themselves are responsible for submitting a functional and correct file for assessment. Only electronic submission is accepted; no hard-copy submissions.

You should submit a single final report (PDF) and a zip file of scripts at LMO. Therefore, you should submit two files:

1. REPORT in PDF. A document named [‘your student name_Id’].pdf, which is your final report. The assignment must be typed in an MS Word document and submitted as a PDF via Learning Mall.

2. ZIP file of scripts. You should submit a zip file including all the relevant script files, e.g.:

1. dataset.sql: the create and insert statements of the unnormalized dataset after importing the dataset

2. Normalized_Data.sql: the full database after creating and populating the normalized tables (CREATE and INSERT statements of the normalized dataset)

3. sqlScript.sql: SQL statements to run queries against the normalized tables

4. …

Generic Marking Criteria and Assessment Rubric

Below are the generic marking criteria and assessment rubric.

Support

Any questions and answers will be added to the FAQ. Please use the coursework discussion channel at LMO as the first port of call for any questions, problems, clarifications, or anything you would like us to go over. If you prefer to ask privately, please email the relevant instructor.

Good Luck!

Generic Marking Criteria

Grade A (81+, First)

· Outstanding work that is at the upper limit of performance.

· Work would be worthy of dissemination under appropriate conditions.

· Mastery of advanced methods and techniques at a level beyond that explicitly taught.

· Ability to synthesise and employ in an original way ideas from across the subject.

· In group work, there is evidence of an outstanding individual contribution.

· Excellent presentation.

· Outstanding command of critical analysis and judgment.

Grade B (70-80, First)

· Excellent range and depth of attainment of intended learning outcomes.

· Mastery of a wide range of methods and techniques.

· Evidence of study and originality clearly beyond the bounds of what has been taught.

· In group work, there is evidence of an excellent individual contribution.

· Excellent presentation.

· Able to display a command of critical thinking, analysis and judgment.

Grade C (60-69, Upper Second)

· Attained all the intended learning outcomes for a module or assessment.

· Able to use well a range of methods and techniques to come to conclusions.

· Evidence of study, comprehension, and synthesis beyond the bounds of what has been explicitly taught.

· Very good presentation of material.

· Able to employ critical analysis and judgement.

· Where group work is involved there is evidence of a productive individual contribution.

Grade D (50-59, Lower Second)

· Some limitations in attainment of learning objectives, but has managed to grasp most of them.

· Able to use most of the methods and techniques taught.

· Evidence of study and comprehension of what has been taught.

· Adequate presentation of material.

· Some grasp of issues and concepts underlying the techniques and material taught.

· Where group work is involved there is evidence of a positive individual contribution.

Grade E (40-49, Third)

· Limited attainment of intended learning outcomes.

· Able to use a proportion of the basic methods and techniques taught.

· Evidence of study and comprehension of what has been taught, but grasp insecure.

· Poorly presented.

· Some grasp of the issues and concepts underlying the techniques and material taught, but weak and incomplete.

Grade F (0-39, Fail)

· Attainment of only a minority of the learning outcomes.

· Able to demonstrate a clear but limited use of some of the basic methods and techniques taught.

· Weak and incomplete grasp of what has been taught.

· Deficient understanding of the issues and concepts underlying the techniques and material taught.

· Attainment of nearly all the intended learning outcomes deficient.

· Lack of ability to use at all, or to use the right, methods and techniques taught.

· Inadequately and incoherently presented.

· Wholly deficient grasp of what has been taught.

· Lack of understanding of the issues and concepts underlying the techniques and material taught.

· Incoherence in presentation of information that hinders understanding.

Grade G (0, Fail)

· No significant assessable material, absent, or assessment missing a "must pass" component.


