CE306-6-SP

UNIVERSITY OF ESSEX

Undergraduate Examinations 2019



INFORMATION RETRIEVAL



Time allowed: TWO hours


Candidates are permitted to bring into the examination room:
Calculator – Casio FX-83GT Plus or Casio FX-85GT Plus ONLY


Candidates must answer ALL questions.


The paper consists of THREE questions.


The percentages shown in brackets provide an indication of the proportion of the total marks for
the PAPER which will be allocated.






Please do not leave your seat unless you are given permission by an invigilator.
Do not communicate in any way with any other candidate in the examination room.
Do not open the question paper until told to do so.
All answers must be written in the answer book(s) provided.
All rough work must be written in the answer book(s) provided. A line should be
drawn through any rough work to indicate to the examiner that it is not part of the
work to be marked.
At the end of the examination, remain seated until your answer book(s) have been
collected and you have been told you may leave.



Question 1


Basics


(a) Briefly explain the motivation for using the inverse document frequency (idf) in the
weighting formula tf.idf. [5%]


(b) Discuss the implications of Zipf’s law on the distribution of words in a document collection
and in the queries submitted to search this collection. Briefly discuss how increasing the
corpus size might or might not address these implications. [10%]


(c) Briefly discuss three different reasons that might explain the popularity of Elasticsearch
over alternative search engines when applying it to a local Web site. [10%]


(d) Briefly discuss the problems a tokenizer might encounter when processing texts which
contain the period character (‘.’). [5%]



Question 2


Applications and Evaluation


(a) Outline the typical steps that need to be performed by an enterprise search engine to match
a user request against the documents stored in the system's database. Discuss how
enterprise search might differ from Web search. [10%]


(b) Several evaluation metrics have been developed to assess the quality of results returned by
search engines. Two such measures are precision and recall. What can you say about
precision and recall for queries for which no relevant documents exist in the collection?
Discuss whether discounted cumulative gain or mean reciprocal rank might or might not
be suitable alternative measures for the given scenario. [15%]


(c) Discuss the applicability of the PageRank algorithm in an enterprise search setting. [10%]


(d) Outline a search scenario in which you would apply A/B testing to evaluate a search system
within an enterprise search setting. [5%]



Question 3


Advanced Concepts


(a) Separating fake news from real news is one of the major search engine challenges that have
emerged in recent years. One step in that direction is automated fact-checking. Imagine
you are tasked with developing a system for automated fact-checking. Assume that your
system will be incorporated in a Web search engine and is called whenever a user submits
a query that is classified as a claim. Outline a possible processing pipeline that could
confirm or reject a claim. Discuss important design decisions. [10%]


(b) Contextual information is frequently being used in modern Web search engines. Discuss
the difficulties in contextualising a query in the result ranking stage of the retrieval process. [10%]


(c) Log analysis can be used to personalize a search engine. Present three possible motivations
for this approach. Discuss how one might integrate log analysis in the query submission
stage of an information retrieval system. [10%]



END OF PAPER CE306-6-SP
