IFN647 ASSIGNMENT2.201 cont/…
IFN647 – Assignment 2 Requirements
Weighting: 35% of the assessment for IFN647
Items required to be submitted through IFN647 Blackboard:
1. A PDF or Word file that includes both:
• A cover page with a statement of completeness, your name(s) and student ID(s).
• Solutions to questions Q1, Q2, Q4 and Q7, and a README paragraph describing
how to execute your Python code in a terminal or in IDLE, the structure of your
data folder, and the packages you import.
2. Your source code for all other questions, containing all files necessary to run the solutions
and perform the evaluation (source code only, no executables), plus a main Python file
(“script.py”) that runs all the source code you defined for all questions. Put these files
together in a zip file named “code.zip”.
3. A zip file “result.zip” containing all “result” data files (in text format).
Please note that you do not need to include the dataset folder generated by “dataset101-150.zip” in
your submission. Zip all the above files as “student ID_Surname_Asm2.zip” and submit the archive
on Blackboard before 11:59pm on 29 May 2020.
Due date of Blackboard Submission: Friday week 12 (29th May 2020)
Individual or pair work: You may work on this assignment individually or in a pair (please
note the different requirements for individuals and pairs as indicated in the questions).
Currently, a major challenge is to build communication between users and Web search systems.
However, most Web search systems use user queries rather than user information needs due to
the difficulty of automatically acquiring user information needs. The first reason for this is that
users may not know how to represent their topics of interest. The second reason is that users
may not wish to invest a great deal of effort to dig out relevant pages from hundreds of
thousands of candidates provided by a Web search system.
In this assignment, you are expected to design a system, the “Weak Supervision Model (WSM)”, to
provide a solution to this challenging issue. The system is broken up into three parts: Part I
(Training Set Discovery), Part II (IF model) and Part III (Evaluation). In Part I, the major task is
to present an approach that automatically discovers a training set for a specified topic (we
will provide you with 50 topics). The training set includes both positive documents (e.g., labelled
as “1”) and negative documents (e.g., labelled as “0”). For this part you may need to use the topic
title, description or narrative, a pseudo-relevance feedback technique (or a clustering technique)
and an IR model to find a training set D, consisting of D+ (positive, i.e. likely relevant documents)
and D- (negative, i.e. likely irrelevant documents), in a given unlabelled document set U. Part II is
to select more terms from D and discover weights for them, and then to use the selected terms and
their weights to rank the documents in U. Part III is the evaluation: you are required to show that
your solution performs better than the query-based method (“the baseline model”), which uses only
the topic titles to rank U.
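As a rough illustration of how Parts I and II fit together, the following minimal sketch uses pseudo-relevance feedback on a toy collection. It is not the required or recommended solution. The four documents, the cut-off k, the tokenizer and the document-frequency-difference weighting are all assumptions made for this example; a real submission would use the provided dataset, a proper IR model (for example BM25) and a justified term-weighting scheme.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def baseline_scores(title, docs):
    """Baseline model: count how many tokens of each document are title terms."""
    query = set(tokenize(title))
    return {doc_id: sum(1 for t in tokenize(text) if t in query)
            for doc_id, text in docs.items()}


def discover_training_set(title, docs, k=2):
    """Part I (assumed approach): the top-k baseline results form D+, the rest D-."""
    scores = baseline_scores(title, docs)
    ranked = sorted(docs, key=lambda d: (-scores[d], d))  # ties broken by doc id
    return set(ranked[:k]), set(ranked[k:])


def term_weights(docs, d_pos, d_neg):
    """Part II (assumed weighting): share of D+ documents containing a term
    minus the share of D- documents containing it."""
    pos_df, neg_df = Counter(), Counter()
    for d in d_pos:
        pos_df.update(set(tokenize(docs[d])))
    for d in d_neg:
        neg_df.update(set(tokenize(docs[d])))
    n_pos, n_neg = max(len(d_pos), 1), max(len(d_neg), 1)  # avoid division by zero
    return {t: pos_df[t] / n_pos - neg_df[t] / n_neg
            for t in set(pos_df) | set(neg_df)}


def rank(docs, weights):
    """Rank U by the summed weights of the distinct terms in each document."""
    scores = {d: sum(weights.get(t, 0.0) for t in set(tokenize(text)))
              for d, text in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))


# Toy unlabelled set U (hypothetical documents, not taken from the real dataset).
docs = {
    "d1": "convicts released on parole became repeat offenders",
    "d2": "repeat offenders and convicts face longer prison sentences",
    "d3": "the stock market rallied on strong earnings",
    "d4": "parole boards review prison sentences",
}
title = "Convicts, repeat offenders"

d_pos, d_neg = discover_training_set(title, docs, k=2)
weights = term_weights(docs, d_pos, d_neg)
ranking = rank(docs, weights)
print(ranking)
```

Part III would then compare `ranking` with the ranking implied by `baseline_scores` against the relevance judgements, using whichever effectiveness measures the questions specify.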
Example of topic102 - “Convicts, repeat offenders” is described as follows:
Number: R102