INFO20003 S1 2022 A1 Page 1 of 5
INFO20003 Semester 1, 2022
Assignment 1: ER Modelling
Due: 1st of April 6:00pm
Submission: Via LMS https://canvas.lms.unimelb.edu.au/
Google scholar
Google scholar is a free accessible web search engine that provides a broad range of information on scholarly
publications. It also provides information on researchers, relations among scholarly publications and researchers,
and different metrics to measure impact of publications.
Your team has been asked to design a MySQL database to store information about the researchers, the details of
the publications, and relations among researchers and publications. With your design, Google scholar hopes to
improve access and searching of the data. The following specifications have been provided to your team to assist
in your design.
Researchers
For each researcher profile, Google scholar records the researcher’s details such as first name, last name, a unique
email address tied to the researcher, and the institution(s) the researcher is affiliated with. A researcher can be
affiliated with one or multiple institutions. For each institution, the name of the department, if exists (e.g.,
Computing and Information Systems), and the name of the institution (e.g., The University of Melbourne) are stored.
Each researcher can also be associated with a few keywords representing their ‘research area’, such as Databases,
Machine learning, Psychology, Medicine, etc.
Each researcher is also associated with multiple metrics providing insight on their research quality. A researcher
has a total citation count number (e.g., 203) that denotes the total number of times their publications have been
cited by other publications (see ‘Publications’ for the details on citations), an h-index number (e.g., 15) and an i10-
index number (e.g., 12). An h-index value of 15 means that this researcher has 15 publications that have each been
cited at least 15 times (similarly an h-index of 14 implies that the researcher has 14 publications that have each
been cited at least 14 times). An i10-index value is the number of publications with at least 10 citations.
Publications
Google scholar maintains three types of publications: conference papers, journals, and book chapters. For each
publication, Google scholar stores its title, year of publication, publication venue name, a list of authors (there can
be multiple authors of a publication, where each author is a researcher), start page number (e.g., 475), end page
number (e.g., 500), a short description (which is denoted as its abstract), and publisher company (assume only one
publisher company per publication). Google scholar may also store a few keywords such as Databases, Machine
learning, Psychology, etc for the ‘research area’ of each publication. Each publication has a list of references (i.e.,
it “cites” other publications), where each reference is another publication. If publication A is in the reference list of
another publication B, then B is “cited by” A. For each publication, Google scholar also shows the total number of
its citations, which denotes the number of times that the publication was cited by other publications. The following
figure shows an example publication with its basic information and its list of references.
É
INFO20003 S1 2022 A1 Page 2 of 5
Fig 1: An example of publication with authors, abstract and the list of references
Google scholar database will not store the actual publications, but rather a link to the document objects. Each
publication is linked to one document object. For each document object, Google scholar stores the URL link and
the document size in KB.
For each journal, the volume number and the issue number of the journal publication are also stored. Each book
chapter has an edition number. Each conference paper is associated with a conference event where one of the
authors (researchers) of the paper needs to make an oral presentation of the paper. Multiple conference papers
can be presented in the same conference event, but a conference paper can be presented in only one conference
event. Google scholar stores the following information for each conference event: the location (e.g., Melbourne,
Australia) of the conference, the start date, the end date of the conference, and the URL to the conference event
website. For each conference paper, the author who presents that paper in the conference, is also stored.
Research activities
For each researcher, Google scholar stores their list of publications. The researchers who author a publication
together are called ‘co-authors’. A researcher can create a list of publications as ‘bookmark’ that they find
important. A publication can be added or removed from the ‘bookmarked’ list of a researcher, but Google scholar
does not record the history of bookmarks. Only the publications that are currently in the ‘bookmarked’ list for a
researcher, are stored.
Featured publications
Google scholar manually curates a list of top 10 publications every fortnight depending on the number of citations
of the publications. A publication can make it to top 10 more than once over time. We must be able to determine
INFO20003 S1 2022 A1 Page 3 of 5
how many times a publication has been in the top 10 (but we do not need to know the rationale why this is the
current ranking, i.e. we do not need to track publication citations throughout time). A publication can stay in the
top 10 at the same or different rank. For example, the publication entitled “Learning to index” can be number 1
from January 1st – January 14th, 2022 but the rank drops to number 3 from January 15th-January 31st, 2022. For
each such rank, we must keep a record of start date when a publication reached that position and end date when
it was removed from the position (and potentially from the list altogether).
Business Requirements
Your database design needs to be able to meet the business' needs to answer questions as:
1. How many researchers have at least 10 publications?
2. How many publications of a researcher “Farhana Choudhury” have at least 5 citations?
3. Which researcher has the highest number of publications in the entire Google scholar?
4. How many researchers are associated with the institute “The University of Melbourne”?
5. When did the researcher “Farhana Choudhury” and “Renata Borovica-Gajic” have their first co-authored
publication?
6. Who are the researchers who published in any journal in 2021, where the journal has the research area
“Databases”?
7. Which researcher has the highest h-index at “The University of Melbourne”?
8. Which publication has referenced (cited) the highest number of other publications?
9. Are there any publications whose document size exceeds 500KB?
10. Which keyword is the most popular (meaning that the highest number of researchers have stated it as
one of their research areas)?
11. Which publication is appearing in the most bookmarked lists for the researchers?
12. Which publication stayed the longest in the top 10 list?
13. Which researcher has the highest cumulative number of co-authors?
14. Which researchers have presented more than one conference paper in the same conference event?
INFO20003 S1 2022 A1 Page 4 of 5
Instructions
Assignment 1 is worth 10% of your final mark. The assignment will be graded out of 100 marks as described in the
table below:
ER Physical Model with assumptions 80%
Conceptual Model in Chen’s Notation 20%
.mwb Physical Model File submitted Assignment Hurdle
x You are to analyse this business case and design a Conceptual ER Model in Chen’s notation (can be hand
drawn) as taught in class and a Physical ER Model for a MySQL Relational Database in Crow’s foot
notation (modelled with MySQL Workbench).
x You may list any assumptions you have made about the model on a separate page. There is a 200-word
limit for assumptions. Assumptions must not be used to simplify the assignment, but only to justify your
decision about any ambiguity in the study.
x As part of your submission you must submit your final MySQL Workbench .mwb file of your Physical ER
Model.
NOTE: Items 1 (Conceptual Model & Physical ER model) and 2 (Assumptions) must be submitted as a
single PDF. The title of the PDF document must be your student id (e.g. 987654.pdf). Similarly, the title of
your .mwb file should be your studentid.mwb (e.g. 987654.mwb).
INFO20003 S1 2022 A1 Page 5 of 5
Assignment Submission
You are to submit the assignment under the Assignments tab on Canvas LMS. The submission will require you to
submit two files:
1. ONE PDF document named as your student id (e.g. 987654.pdf) containing:
i. a legible (hand drawn, or otherwise modelled) picture of a conceptual model in Chen’s notation
ii. a screenshot/export of your Physical ER Model done in MySQL workbench. Ensure that all attributes
are readable, and tables are fully expanded.
iii. any assumptions you made (limit this to a maximum of 200 words)
2. A copy of your final .mwb MySQL Workbench file of the Physical ER model.
Note: This model will only be used by staff in circumstances where the screenshot/export of your model in the
PDF is unreadable (i.e. if we have to open this file, a penalty of 10% will occur), so remember to include the
export of your physical model in the PDF!
Note: If you fail to submit clear and legible models your assignment will be penalised
Unless you have an approved extension (see below), you will be penalised -10% of the total number of marks in
the assignment per day that your submission is late. For instance, if you received a 78% raw score, but submitted
2 days late, you'd receive a 58% for the assignment.
Requesting a Submission Deadline Extension
If you need an extension due to a valid (medical) reason, you will need to provide evidence to support your
request by 5pm, 31st of March. Medical certificates need to be at least two days in length.
To request an extension:
x Email Farhana Choudhury (farhana.choudhury@unimelb.edu.au) from your university email address,
supplying your student ID, the extension request and supporting evidence.
x If your submission deadline extension is granted you will receive an email reply granting the new
submission date. Do not lose this email!
Reminder: INFO20003 Hurdle Requirements
To pass INFO20003, you must pass two hurdles:
x Hurdle 1: Obtain at least 50% (15/30) or higher for the three assignments (each worth 10%)
x Hurdle 2: Obtain at least 50% (35/70) or higher for the combination of quizzes and end of semester exam
It is our recommendation to students that you attempt every assignment and every question in the exam.
GOOD LUCK!