Coursework Specification
Late submissions will be penalised at 10% per working day.
No work can be accepted after feedback has been given.
You should expect to spend up to 37.5 hours on this assignment.
Please note the University regulations regarding academic integrity.
Module: COMP3211 Advanced Databases
Assignment: Database Programming Exercise
Weighting: 25%
Deadline: 16:00 Wed 8 May 2024
Feedback: Fri 17 May 2024
Instructions
In this assignment, you will build a query optimiser for SJDB, a simple RDBMS. Your optimiser should accept a
canonical query plan (a project over a series of selects over a cartesian product over the input named
relations) and aim to construct a left-deep query plan which minimises the sizes of any intermediate relations.
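To make "left-deep" and "intermediate relation size" concrete, here is a minimal, self-contained sketch. It is not SJDB code: the class LeftDeepOrderSketch and its methods are hypothetical names, and it deliberately ignores predicates, looking only at a chain of cartesian products. Under that assumption, taking the smallest relations first keeps every intermediate result as small as possible; a real optimiser would also have to account for selections and joins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names come from the sjdb package.
final class LeftDeepOrderSketch {

    // Sizes of the intermediate results of the left-deep chain
    // ((r0 x r1) x r2) x ... taken in the given order. Needs at least one relation.
    static List<Long> intermediateSizes(long[] tupleCounts) {
        List<Long> sizes = new ArrayList<>();
        long running = tupleCounts[0];
        for (int i = 1; i < tupleCounts.length; i++) {
            running *= tupleCounts[i]; // T(R x S) = T(R)T(S)
            sizes.add(running);
        }
        return sizes;
    }

    // With products only (no predicates), ascending order minimises every prefix product.
    static long[] smallestFirst(long[] tupleCounts) {
        long[] sorted = tupleCounts.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        long[] asGiven = {150, 100, 10};
        System.out.println(intermediateSizes(asGiven));                // [15000, 150000]
        System.out.println(intermediateSizes(smallestFirst(asGiven))); // [1000, 150000]
    }
}
```

The final result has the same size in both orders; only the first intermediate relation shrinks, which is exactly the quantity the optimiser is asked to minimise.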
Part 1: Estimator.java
Before implementing an optimiser for query plans, you must first estimate the cost of the query plans.
In the first phase, you must create a class Estimator that implements the PlanVisitor interface and performs
a depth-first traversal of the query plan. On each operator, the Estimator should create an instance of Relation
(bearing appropriate Attribute instances and tuple counts) and attach it to the operator as its output.
Some operators may require you to revise the value counts for the attributes on the newly created output
relations (for example, a select of the form attr=val will change the number of distinct values for that
attribute to 1). Note also that an attribute on a relation may not have more distinct values than there are
tuples in the relation.
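The two rules in the paragraph above (a select of the form attr=val leaves one distinct value for that attribute, and no attribute may have more distinct values than there are tuples) can be sketched in isolation as follows. This is a minimal illustration and not SJDB code: SelectEstimateSketch and its methods are hypothetical names, and the use of integer division is an assumption, since the specification does not say how fractional tuple counts should be rounded.

```java
// Hypothetical stand-alone sketch; none of these names come from the sjdb package.
final class SelectEstimateSketch {

    // T(select attr=val) = T(R) / V(R, attr).
    // Integer division is an assumption: the specification does not state how to round.
    static int selectEqualsValueTuples(int tupleCount, int valueCount) {
        return tupleCount / valueCount;
    }

    // An attribute cannot have more distinct values than the relation has tuples.
    static int capValueCount(int valueCount, int tupleCount) {
        return Math.min(valueCount, tupleCount);
    }

    public static void main(String[] args) {
        int tuples = 150;  // T(R)
        int vSelected = 5; // V(R, A), where A is the attribute in attr=val
        int vOther = 100;  // V(R, C) for some other attribute C

        int outTuples = selectEqualsValueTuples(tuples, vSelected);
        System.out.println("T    = " + outTuples);                         // T    = 30
        System.out.println("V(A) = " + 1);                                 // V(A) = 1
        System.out.println("V(C) = " + capValueCount(vOther, outTuples));  // V(C) = 30
    }
}
```

Note how the other attribute C starts with 100 distinct values but is capped at 30 once the output relation only has 30 tuples.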
Page 5 of this coursework specification lists the formulae that you should use to calculate the sizes of the
output relations, and to revise the attribute value counts. The supplied distribution of SJDB includes a
skeleton for Estimator, including an implementation of the visit(Scan) method.
Part 2: Optimiser.java
Once you have an estimator, you must create a class Optimiser that will take a canonical query plan as input,
and produce an optimised query plan as output. The optimised plan should not share any operators with the
canonical query plan; all operators should be created afresh.
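The requirement that the optimised plan shares no operators with the canonical plan amounts to building a fresh tree rather than rearranging existing nodes. The following minimal sketch shows that idea on a hypothetical two-node-type plan; Node, Leaf, Pair and copy() are illustrative names only and are not part of the sjdb Operator hierarchy.

```java
// Hypothetical stand-alone sketch; none of these names come from the sjdb package.
final class FreshPlanSketch {

    static abstract class Node {
        // Returns a structurally identical tree built entirely from new objects.
        abstract Node copy();
    }

    static final class Leaf extends Node {
        final String relation;
        Leaf(String relation) { this.relation = relation; }
        @Override Node copy() { return new Leaf(relation); }
    }

    static final class Pair extends Node {
        final Node left;
        final Node right;
        Pair(Node left, Node right) { this.left = left; this.right = right; }
        @Override Node copy() { return new Pair(left.copy(), right.copy()); }
    }

    public static void main(String[] args) {
        Pair original = new Pair(new Leaf("A"), new Leaf("B"));
        Pair fresh = (Pair) original.copy();
        System.out.println(fresh != original);           // true
        System.out.println(fresh.left != original.left); // true
    }
}
```

An optimiser would of course build a differently shaped tree rather than a plain copy, but the same principle applies: every operator in the output is newly constructed.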
In order to demonstrate your optimiser, you should be able to show your cost estimation and query
optimisation classes in action on a variety of inputs. The SJDB zip file contains a sample catalogue and
queries. In addition, the SJDB class (see page 3) contains a main() method with sample code for reading a
serialised catalogue from file and a query from stdin.
Part 3: Report
In addition to your estimator and optimiser, you should produce a short (maximum 500 word) report that
describes the optimisation strategy that you’ve adopted.
Note
You should not need to modify any of the provided classes or interfaces as part of your submission (aside
from Estimator), but if you think that you have a justifiable reason for doing so, please contact Nick for
permission first.
2
Submission
Please submit your files (Estimator.java, Optimiser.java and report.pdf) using the electronic hand-in system
(http://handin.ecs.soton.ac.uk/) by 4pm on the due date.
Late submissions will be penalised at 10% per working day and no work can be accepted after feedback has
been given.
You should expect to spend up to 37.5 hours on this assignment, and you should note the University
regulations regarding academic integrity:
http://www.calendar.soton.ac.uk/sectionIV/academic-integrity-statement.html
Relevant Learning Outcomes
1. The internals of a database management system
2. The issues involved in developing database management software
3. Demonstrate how a DBMS processes, optimises and executes a query
4. Implement components of a DBMS
Marking Scheme
Criterion        Description                                       Outcomes   Total
Cost Estimator   Implementation of the cost estimator              1,2,3,4    40 %
Optimiser        Implementation of the query optimiser             1,2,3,4    40 %
Report           Description of your query optimisation strategy   1,2,3      20 %
Note that partial credit will be given for incomplete solutions; for example, an optimiser that moves some
(but not all) selections down the query plan will still receive part of the total mark for the optimiser
component.
3
SJDB – A Simple Java Database
SJDB supports a limited subset of the relational algebra, consisting of the following operators only:
• cartesian product
• select with a predicate of the form attr=val or attr=attr
• project
• equijoin with a predicate of the form attr=attr
• scan (an operator that reads a named relation as a source for a query plan)
In addition, all attributes on all relations will be strings; there are no other datatypes available. Attributes also
have globally unique names (there may not be two attributes of the same name on different relations), and
self-joins on relations are not permitted.
The sjdb package contains the following classes and interfaces:
Relation             an unnamed relation, contains attributes
NamedRelation        a named relation
Attribute            an attribute on a relation
Predicate            a predicate for use with a join or select operator
Operator             abstract superclass for all operators
UnaryOperator        abstract superclass for all operators with a single child
Scan                 an operator that feeds a named relation into a query plan
Select               an operator that selects certain tuples in its input, via some predicate
Project              an operator that projects certain attributes from its input
BinaryOperator       abstract superclass for all operators with two children
Product              an operator that performs a cartesian product over its inputs
Join                 an operator that joins its inputs, via some predicate
Catalogue            a directory and factory for named relations and their attributes
CatalogueException   a failure to retrieve relations or attributes from the catalogue
CatalogueParser      a utility class that reads a serialised catalogue from file
QueryParser          a utility class that reads a query and builds a canonical query plan
PlanVisitor          an interface that, when implemented, performs a depth-first plan traversal
Inspector            a utility class that traverses an annotated plan and prints out the estimates
SJDB                 class containing main()
Test                 an example of the test harnesses used for marking
The SJDB class contains a main() method with skeleton code for reading catalogues and queries.
The system provides basic statistical information about the relations and attributes in the database, as below.
These are stored on the relations and attributes themselves, and not in the catalogue.
• the number of tuples in each relation
• the value count (number of distinct values) for each attribute
A sample serialised catalogue (cat.txt) and queries (q1.txt, etc) are available in sjdb/data.
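The depth-first traversal mentioned for PlanVisitor in the class list above can be illustrated with a miniature visitor. This is a sketch under stated assumptions, not the SJDB API: MiniVisitor, MiniOperator, MiniScan and MiniProduct are hypothetical types, and the children-before-parent order shown here is an assumption about how such a traversal is typically arranged so that the inputs' estimates exist before the parent is visited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the visitor pattern; these types are NOT the sjdb classes.
final class VisitOrderSketch implements MiniVisitor {
    final List<String> order = new ArrayList<>();

    @Override public void visit(MiniScan op) { order.add("Scan " + op.name); }
    @Override public void visit(MiniProduct op) { order.add("Product"); }

    // Runs the traversal and returns the order in which operators were visited.
    static List<String> trace(MiniOperator plan) {
        VisitOrderSketch visitor = new VisitOrderSketch();
        plan.accept(visitor);
        return visitor.order;
    }

    public static void main(String[] args) {
        MiniOperator plan = new MiniProduct(new MiniScan("A"), new MiniScan("B"));
        System.out.println(trace(plan)); // [Scan A, Scan B, Product]
    }
}

interface MiniVisitor {
    void visit(MiniScan op);
    void visit(MiniProduct op);
}

abstract class MiniOperator {
    abstract void accept(MiniVisitor visitor);
}

final class MiniScan extends MiniOperator {
    final String name;
    MiniScan(String name) { this.name = name; }
    @Override void accept(MiniVisitor visitor) { visitor.visit(this); }
}

final class MiniProduct extends MiniOperator {
    final MiniOperator left;
    final MiniOperator right;
    MiniProduct(MiniOperator left, MiniOperator right) { this.left = left; this.right = right; }

    @Override void accept(MiniVisitor visitor) {
        // Children first, so their results are available when the parent is visited.
        left.accept(visitor);
        right.accept(visitor);
        visitor.visit(this);
    }
}
```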
4
Test Harness Notes
The file Test.java in the SJDB distribution contains an example of the test harness that I will be using to mark
your submissions. This example test harness manually constructs both plans and catalogues as follows:
package sjdb;

import java.io.*;
import java.util.ArrayList;
import sjdb.DatabaseException;

public class Test {
    private Catalogue catalogue;

    public Test() {
    }

    public static void main(String[] args) throws Exception {
        Catalogue catalogue = createCatalogue();
        Inspector inspector = new Inspector();
        Estimator estimator = new Estimator();

        // Estimate and print the canonical plan.
        Operator plan = query(catalogue);
        plan.accept(estimator);
        plan.accept(inspector);

        // Optimise, then estimate and print the optimised plan.
        Optimiser optimiser = new Optimiser(catalogue);
        Operator planopt = optimiser.optimise(plan);
        planopt.accept(estimator);
        planopt.accept(inspector);
    }

    // Builds a two-relation catalogue: (relation, tuple count) and (relation, attribute, value count).
    public static Catalogue createCatalogue() {
        Catalogue cat = new Catalogue();
        cat.createRelation("A", 100);
        cat.createAttribute("A", "a1", 100);
        cat.createAttribute("A", "a2", 15);
        cat.createRelation("B", 150);
        cat.createAttribute("B", "b1", 150);
        cat.createAttribute("B", "b2", 100);
        cat.createAttribute("B", "b3", 5);
        return cat;
    }

    // Builds the canonical plan: project [a2, b1] over select a2=b3 over (A x B).
    public static Operator query(Catalogue cat) throws Exception {
        Scan a = new Scan(cat.getRelation("A"));
        Scan b = new Scan(cat.getRelation("B"));
        Product p1 = new Product(a, b);
        Select s1 = new Select(p1, new Predicate(new Attribute("a2"), new Attribute("b3")));
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        atts.add(new Attribute("a2"));
        atts.add(new Attribute("b1"));
        Project plan = new Project(s1, atts);
        return plan;
    }
}
As can be seen in this test harness, I use the Inspector class (provided with the SJDB sources) to print out a
human-readable version of your query plans – your query plans must be able to accept this visitor without
throwing exceptions. Your estimator and optimiser need not (and should not) produce any data on stdout
(you should use the Inspector for this when testing).
Note also that you should manually construct plans that contain joins in order to test your Estimators.
Estimators and Optimisers that do not run without errors will be marked by inspection only, and will
consequently receive a reduced mark.
5
Cost Estimation
As described in lectures, the following parameters are used to estimate the size of intermediate relations:
• T(R), the number of tuples of relation R
• V(R,A), the value count for attribute A of relation R (the number of distinct values of A)
Note that, for any relation R, V(R, A) ≤ T(R) for all attributes A on R.
Scan
T(R) (the same number of tuples as in the NamedRelation being scanned)
Product
$T(R \times S) = T(R)\,T(S)$
Projection
$T(\pi_A(R)) = T(R)$ (assume that projection does not eliminate duplicate tuples)
Selection
For predicates of the form attr=val:
$T(\sigma_{A=c}(R)) = T(R)/V(R,A)$, $V(\sigma_{A=c}(R),A) = 1$
For predicates of the form attr=attr:
$T(\sigma_{A=B}(R)) = T(R)/\max(V(R,A),V(R,B))$, $V(\sigma_{A=B}(R),A) = V(\sigma_{A=B}(R),B) = \min(V(R,A),V(R,B))$
Join
$T(R \bowtie_{A=B} S) = T(R)\,T(S)/\max(V(R,A),V(S,B))$, $V(R \bowtie_{A=B} S,A) = V(R \bowtie_{A=B} S,B) = \min(V(R,A),V(S,B))$
(assume that A is an attribute of R and B is an attribute of S)
Note that, for an attribute C of R that is not a join attribute, $V(R \bowtie_{A=B} S,C) = V(R,C)$
(similarly for an attribute of S that is not a join attribute)
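As a worked check of the formulae above, the sketch below applies them to the figures from the example test harness catalogue (A with 100 tuples and V(A,a2) = 15; B with 150 tuples and V(B,b3) = 5). The class SizeFormulaSketch and its method names are hypothetical and not part of SJDB, and integer division is an assumption, since the rounding of fractional estimates is not specified.

```java
// Hypothetical stand-alone sketch; none of these names come from the sjdb package.
final class SizeFormulaSketch {

    // T(R x S) = T(R)T(S)
    static long product(long tR, long tS) {
        return tR * tS;
    }

    // T(select A=B over R) = T(R) / max(V(R,A), V(R,B)); integer division is an assumption.
    static long selectAttrEqAttr(long tR, long vA, long vB) {
        return tR / Math.max(vA, vB);
    }

    // T(R join S on A=B) = T(R)T(S) / max(V(R,A), V(S,B)); integer division is an assumption.
    static long join(long tR, long tS, long vA, long vB) {
        return tR * tS / Math.max(vA, vB);
    }

    // Value count shared by both attributes of an attr=attr predicate: min of the two inputs.
    static long matchedValueCount(long vA, long vB) {
        return Math.min(vA, vB);
    }

    public static void main(String[] args) {
        long tA = 100, tB = 150; // tuple counts of A and B
        long vA2 = 15, vB3 = 5;  // value counts of a2 and b3

        long afterProduct = product(tA, tB);
        System.out.println(afterProduct);                              // 15000
        System.out.println(selectAttrEqAttr(afterProduct, vA2, vB3));  // 1000
        System.out.println(join(tA, tB, vA2, vB3));                    // 1000
        System.out.println(matchedValueCount(vA2, vB3));               // 5
    }
}
```

The select-over-product and the equivalent join give the same final estimate of 1000 tuples; the difference is that the join never materialises the 15000-tuple intermediate product.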
Further Reading
For further information on cost estimation, see §16.4 of Database Systems: The Complete Book