首页 > > 详细

CS5481 Data Engineering Assignment 2

 CS5481 Data Engineering Assignment 2

Deadline: 25‐OCT‐2019 (Friday), noon (no late submission will be accepted)
Points to note:
 Different books may have slightly different descriptions of concepts, algorithms and 
terminologies. To ensure fair assessment and uniformity in marking, you must follow the 
convention used in the lecture slides or our textbook (Database System Concepts). Other 
conventions will not be accepted.
 Students are expected to generalize the concepts they have learnt during the lecture in 
order to finish the assignment.
 You must show the steps clearly. The marker will not give you marks if he cannot 
understand your work.
 This is an individual assignment. You must work on your own. Check 
http://www6.cityu.edu.hk/ah/plagiarism.htm for “The Problem of Plagiarism”.
 Submit the file to Canvas on or before the deadline.  The file type must be either .docx file or .pdf file. 
 Use your student ID(s) to name the file, such as 5xxxxxxx.docx or 5xxxxxxx.pdf. 
Query Processing and Optimization
Consider the following database schema.
Category (category-no, category-name, …)
Reader (reader-id, gender, age, grade-no, …)
Book (book-id, c-no, …)
Loan-record (r-id, b-id, loan-date, …)
The attributes r-id and b-id in Loan-record are foreign keys referencing reader-id in Reader
and book-id in Book, respectively. The attribute c-no in Book is a foreign key referencing 
category-no in Category.
Suppose ONLY the following information and statistics are available and assume that all 
distributions are uniform and independent of each other.
 Number of categories (nCategory): 280
 Number of readers (nReader): 50,000
 Number of books (nBook): 4,500
 Number of loan records (nLoan-record): 60,000
 Blocking factor of Category (fCategory): 8  Blocking factor of Reader (fReader): 12
 Blocking factor of Book (fBook): 9  Blocking factor of Loan-record (fLoan-record): 10
 Proportion of category names containing ‘science’: 1/7
 Number of female readers: 24,950
 Range of age: from 12 to 72
 Values of grade-no: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
 Category, Reader and Book are sequential files in sorted order of their primary key
and Loan-record is a sequential file in sorted order of loan-date. Records do not span 
across blocks.
CS5481 Data Engineering Assignment 2
Consider the following SQL query that retrieves all information about female readers whose 
age is less than 30 with grade number above 6 and have loan records of any non-science￾related books. select * from Category c, Reader r, Book b, Loan-record l where c.category-name not like ’%science%’
and c.category-no=b.c-no and b.book-id=l.b-id
and r.reader-id=l.r-id and r.gender=’female’ 
and r.age<30 and r.grade-no>6;
(a) Write an unoptimized relational-algebra expression of the above SQL query that might 
be initially generated from an SQL translator.
(b) Estimate the number of records the query returns. Explain your estimation.
(c) Assume that only 30 memory blocks are available and consider the cost of disk block 
transfers (only). If left-deep join orders are considered, suggest the best evaluation plan for 
the query. Particularly, you need to do the followings.
(i) Transform the unoptimized relational-algebra expression in Part (a) it into an 
equivalent optimized relational-algebra expression for producing your suggested 
evaluation plan. Show all the steps and state clearly the rule number of the equivalence 
rule you used in each step.
(ii) Draw the fully annotated evaluation plan (including exactly what algorithm is used 
for each operation and how the execution of the operations is coordinated).
(iii) Indicate in the evaluation plan the memory allocation (number of memory blocks 
allocated to each input and output (bb), but the total cannot exceed the available 
memory at one time).
(iv) Find the cost of each operation and the total cost of the whole plan in number of 
disk block transfers.
(v) State clearly any justifiable assumptions you have made in your estimation.
(d) If any join orders can be considered, is it possible to have a better evaluation plan for the 
query? If yes, you need to follow the steps in Part (c) to explain your answer.
(e) If a B+-tree index can be created on one attribute for one of the base relations, is it 
possible to have a better evaluation plan for the query? If yes, which attribute and which 
relation will you choose? Again, if yes, you need to follow the steps in Part (c) to explain 
your answer.
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!