首页 > > 详细

CS5481 Data Engineering Assignment 2

CS5481 Data Engineering Assignment 2

Deadline: 25‐OCT‐2019 (Friday), noon (no late submission will be accepted)

Points to note:

 Different books may have slightly different descriptions of concepts, algorithms and

terminologies. To ensure fair assessment and uniformity in marking, you must follow the

convention used in the lecture slides or our textbook (Database System Concepts). Other

conventions will not be accepted.

 Students are expected to generalize the concepts they have learnt during the lecture in

order to finish the assignment.

 You must show the steps clearly. The marker will not give you marks if he cannot

understand your work.

 This is an individual assignment. You must work on your own. Check

http://www6.cityu.edu.hk/ah/plagiarism.htm for “The Problem of Plagiarism”.

 Submit the file to Canvas on or before the deadline.  The file type must be either .docx file or .pdf file.

 Use your student ID(s) to name the file, such as 5xxxxxxx.docx or 5xxxxxxx.pdf.

Query Processing and Optimization

Consider the following database schema.

Category (category-no, category-name, …)

Reader (reader-id, gender, age, grade-no, …)

Book (book-id, c-no, …)

Loan-record (r-id, b-id, loan-date, …)

The attributes r-id and b-id in Loan-record are foreign keys referencing reader-id in Reader

and book-id in Book, respectively. The attribute c-no in Book is a foreign key referencing

category-no in Category.

Suppose ONLY the following information and statistics are available and assume that all

distributions are uniform and independent of each other.

 Number of categories (nCategory): 280

 Number of readers (nReader): 50,000

 Number of books (nBook): 4,500

 Number of loan records (nLoan-record): 60,000

 Blocking factor of Category (fCategory): 8  Blocking factor of Reader (fReader): 12

 Blocking factor of Book (fBook): 9  Blocking factor of Loan-record (fLoan-record): 10

 Proportion of category names containing ‘science’: 1/7

 Number of female readers: 24,950

 Range of age: from 12 to 72

 Values of grade-no: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

 Category, Reader and Book are sequential files in sorted order of their primary key

and Loan-record is a sequential file in sorted order of loan-date. Records do not span

across blocks.

CS5481 Data Engineering Assignment 2

Consider the following SQL query that retrieves all information about female readers whose

age is less than 30 with grade number above 6 and have loan records of any non-sciencerelated books. select * from Category c, Reader r, Book b, Loan-record l where c.category-name not like ’%science%’

and c.category-no=b.c-no and b.book-id=l.b-id

and r.reader-id=l.r-id and r.gender=’female’

and r.age<30 and r.grade-no>6;

(a) Write an unoptimized relational-algebra expression of the above SQL query that might

be initially generated from an SQL translator.

(b) Estimate the number of records the query returns. Explain your estimation.

(c) Assume that only 30 memory blocks are available and consider the cost of disk block

transfers (only). If left-deep join orders are considered, suggest the best evaluation plan for

the query. Particularly, you need to do the followings.

(i) Transform the unoptimized relational-algebra expression in Part (a) it into an

equivalent optimized relational-algebra expression for producing your suggested

evaluation plan. Show all the steps and state clearly the rule number of the equivalence

rule you used in each step.

(ii) Draw the fully annotated evaluation plan (including exactly what algorithm is used

for each operation and how the execution of the operations is coordinated).

(iii) Indicate in the evaluation plan the memory allocation (number of memory blocks

allocated to each input and output (bb), but the total cannot exceed the available

memory at one time).

(iv) Find the cost of each operation and the total cost of the whole plan in number of

disk block transfers.

(v) State clearly any justifiable assumptions you have made in your estimation.

(d) If any join orders can be considered, is it possible to have a better evaluation plan for the

query? If yes, you need to follow the steps in Part (c) to explain your answer.

(e) If a B+-tree index can be created on one attribute for one of the base relations, is it

possible to have a better evaluation plan for the query? If yes, which attribute and which

relation will you choose? Again, if yes, you need to follow the steps in Part (c) to explain

your answer.

联系我们

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

热点文章

更多

mgt202辅导、讲解 java/pytho... 2025-06-28
讲解 pbt205—project-based l... 2025-06-28
辅导 comp3702 artificial int... 2025-06-28
辅导 cs3214 fall 2022 projec... 2025-06-28
辅导 turnitin assignment讲解... 2025-06-28
辅导 finite element modellin... 2025-06-28
讲解 stat3600 linear statist... 2025-06-28
辅导 problem set #3讲解 matl... 2025-06-28
讲解 elen90066 embedded syst... 2025-06-28
讲解 automatic counting of d... 2025-06-28
讲解 ct60a9602 functional pr... 2025-06-28
辅导 stat3600 linear statist... 2025-06-28
辅导 csci 1110: assignment 2... 2025-06-28
辅导 geography调试r语言 2025-06-28
辅导 introduction to informa... 2025-06-28
辅导 envir 100: introduction... 2025-06-28
辅导 assessment 3 - individu... 2025-06-28
讲解 laboratory 1讲解留学生... 2025-06-28
辅导 ct60a9600 renewable ene... 2025-06-28
辅导 economics 140a homework... 2025-06-28

热点标签

engn4536/engn6536

comp(2041|9044)

litr1-uc6201.200

int2067/int5051

csci-ua.0480-003

cs247—assignment

msinm014/msing014/msing014b

联系我们 - QQ: 99515681 微信：codinghelp

© 2024 www.7daixie.com

程序辅导网！