IFN647 ASSIGNMENT2.201 cont/… 
IFN647 – Assignment 2 Requirements 
Weighting: 35% of the assessment for IFN647 
 
 
 
Items required to be submitted through IFN647 Blackboard: 
 
1. A PDF or Word file that includes both: 
• A statement of completeness, with your name(s) and student ID(s), on a cover page. 
• Solutions to questions Q1, Q2, Q4 and Q7, and a README paragraph describing 
how to execute your Python code in a terminal or in IDLE, the structure of your 
data folder, and the packages you import. 
2. Your source code for all other questions, containing all files necessary to run the solutions 
and perform the evaluation (source code only, no executables), and a main Python file 
(“script.py”) that runs all the source code you defined for all questions. Put these files 
together in a zip file, “code.zip”. 
3. A zip file, “result.zip”, that contains all “result” data files (as text). 
 
Please note that you do not need to include the dataset folder generated by “dataset101-150.zip” in 
your submission. Zip all of the above files as “student ID_Surname_Asm2.zip” and submit the archive 
on Blackboard before 11:59 pm on 29 May 2020. 
 
 
Due date of Blackboard Submission: Friday week 12 (29th May 2020) 
 
Individual or pair work: You may work on this assignment individually or in a pair (please 
note the different requirements for individuals and pairs, as indicated in the questions). 
 
 
Currently, a major challenge is to build communication between users and Web search systems. 
However, most Web search systems use user queries rather than user information needs due to 
the difficulty of automatically acquiring user information needs. The first reason for this is that 
users may not know how to represent their topics of interest. The second reason is that users 
may not wish to invest a great deal of effort to dig out relevant pages from hundreds of 
thousands of candidates provided by a Web search system. 
 
In this assignment, you are expected to design a system, “Weak Supervision Model (WSM)”, to 
provide a solution for this challenging issue. The system is broken up into three parts: Part I 
(Training Set Discovery), Part II (IF model) and Part III (Evaluation). In Part I, the major task is 
to present an approach that automatically discovers a training set for a specified topic (we 
will provide you with 50 topics). The training set includes both positive documents (e.g., labelled 
as “1”) and negative documents (e.g., labelled as “0”). For this part you may need to use the topic 
title, description or narrative, a Pseudo-Relevance Feedback technique (or a clustering technique) 
and an IR model to find, in a given unlabelled document set U, a training set D that includes both 
D+ (positive, i.e. likely relevant, documents) and D- (negative, i.e. likely irrelevant, documents). 
In Part II, you select more terms from D and discover weights for them, and then use the selected 
terms and their weights to rank the documents in U. Part III is the evaluation: you are required to 
show that your solution is better than the query-based method (“the baseline model”), which uses 
only the topic titles to rank U. 
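
The flow of Parts I and II can be illustrated with a minimal sketch. It is not the required solution. All function names, the top_k cutoff, the toy documents and the smoothed log-ratio term weighting are assumptions made for illustration; the assignment leaves the choice of IR model and weighting scheme to you. Here the baseline scores each document by counting title terms, the top-ranked documents become D+ and the rest become D- (a simple form of pseudo-relevance feedback), and terms from D+ are then weighted and used to re-rank U.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def baseline_scores(title, docs):
    """Baseline model: sum of title-term frequencies in each document."""
    query_terms = set(tokenize(title))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(tokenize(text))
        scores[doc_id] = sum(tf[term] for term in query_terms)
    return scores


def discover_training_set(title, docs, top_k):
    """Part I (assumed approach): top_k baseline results form D+, the rest D-."""
    scores = baseline_scores(title, docs)
    ranked = sorted(docs, key=lambda d: (-scores[d], d))
    return set(ranked[:top_k]), set(ranked[top_k:])


def term_weights(docs, d_pos, d_neg):
    """Part II (assumed weighting): log-ratio of smoothed document
    frequencies in D+ versus D-, for every term that occurs in D+."""
    df_pos, df_neg = Counter(), Counter()
    for doc_id, text in docs.items():
        target = df_pos if doc_id in d_pos else df_neg
        target.update(set(tokenize(text)))
    weights = {}
    for term in df_pos:
        p = (df_pos[term] + 0.5) / (len(d_pos) + 1.0)
        q = (df_neg[term] + 0.5) / (len(d_neg) + 1.0)
        weights[term] = math.log(p / q)
    return weights


def rank(docs, weights):
    """Rank documents in U by the summed weights of the terms they contain."""
    scores = {
        doc_id: sum(weights.get(term, 0.0) for term in set(tokenize(text)))
        for doc_id, text in docs.items()
    }
    return sorted(docs, key=lambda d: (-scores[d], d))


# Toy unlabelled set U (invented for illustration, not from the dataset).
docs = {
    "d1": "convicts repeat offenders prison sentence",
    "d2": "repeat offenders return to prison",
    "d3": "stock market prices rise",
    "d4": "football match results",
}
title = "Convicts, repeat offenders"

d_pos, d_neg = discover_training_set(title, docs, top_k=2)
weights = term_weights(docs, d_pos, d_neg)
ranking = rank(docs, weights)
```

In this toy run the term “prison”, which is not in the topic title, receives a positive weight because it occurs only in D+. That is the effect Part II is after: terms beyond the title contribute to the ranking of U.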
 
For example, topic 102 (“Convicts, repeat offenders”) is described as follows: 
 
 
Number: R102 
