
Homework 5 Collect some sample data

 Homework 5 (15 points)

Collect at least 20 sample records that contain the keywords “Fitness” and “Wearables” (or keywords of your own choosing) from the following sources.
Sources: Google News, Bing News, Twitter (give Twint a try), Weibo (Chinese-language data is fine), or any other source you like.
Collecting data manually from Twitter is not recommended, but it is allowed: no points will be deducted if you do.
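Whichever source you use, a small filtering step helps confirm that every record actually mentions the keywords and that at least 20 distinct records remain. This is a minimal sketch, assuming the raw texts have already been gathered into a Python list (for example, exported from Twint or copied by hand); `select_records` and `matches_keywords` are hypothetical helper names, not part of any library. Substring matching is used instead of word boundaries so the same check also works for Chinese text, which has no spaces between words.

```python
# Keywords from the assignment; replace them if you picked your own topic.
KEYWORDS = ("fitness", "wearables")


def matches_keywords(text, keywords=KEYWORDS, require_all=False):
    """Case-insensitive substring test; also works for Chinese text without spaces."""
    lowered = text.casefold()
    hits = [k.casefold() in lowered for k in keywords]
    # require_all=True demands every keyword; otherwise any single one suffices.
    return all(hits) if require_all else any(hits)


def select_records(records, keywords=KEYWORDS, minimum=20, require_all=False):
    """Keep matching records, drop duplicates, and enforce the minimum count."""
    seen = set()
    selected = []
    for text in records:
        cleaned = " ".join(text.split())  # collapse newlines and repeated spaces
        key = cleaned.casefold()          # duplicates are detected case-insensitively
        if not cleaned or key in seen:
            continue
        if matches_keywords(cleaned, keywords, require_all):
            seen.add(key)
            selected.append(cleaned)
    if len(selected) < minimum:
        raise ValueError(
            f"only {len(selected)} matching records; need at least {minimum}")
    return selected


if __name__ == "__main__":
    sample = [
        "New wearables push fitness tracking further",
        "Stock markets closed higher today",            # no keyword: dropped
        "new WEARABLES push fitness tracking further",  # duplicate: dropped
    ]
    print(select_records(sample, minimum=1))
```

Setting `require_all=True` keeps only records that mention both keywords, which is the stricter reading of the task.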
Calculate the sentiment of each record (or token) in the data you have collected. Identify the theme of twenty of the records, and report the results in an Excel file. Your Excel file should include three columns:
• Tweet text or news entry.
• Sentiment scores from one of the baselines covered in class, e.g. AFINN, NRC, or Bing (3 points), and from one state-of-the-art library that was not covered in class, e.g. FastText, BERT, Word2Vec, or GloVe (7 points).
• Theme: a keyword you have extracted from each entry. Perform this theme analysis “manually”, not with an algorithm (5 points).
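Baseline lexicons such as AFINN score text by looking each word up in a fixed list and adding up the values. The sketch below shows that idea under a few assumptions: the lexicon is a tab-separated `term<TAB>score` file (the format AFINN is distributed in), `TOY_LEXICON` holds illustrative stand-in values rather than real AFINN entries, and all function names are hypothetical. It returns both per-token scores and a record-level sum, matching the "record (or token)" wording of the task. A Bing-style list can be used the same way by mapping positive words to +1 and negative words to -1; the `afinn` package on PyPI (`Afinn().score(text)`) is an off-the-shelf alternative.

```python
import re

# Illustrative stand-in entries only (AFINN uses integers from -5 to +5).
# Load the real word list with load_lexicon() instead.
TOY_LEXICON = {"love": 3, "great": 3, "helpful": 2,
               "expensive": -2, "broken": -2, "hate": -3}


def load_lexicon(path):
    """Parse a tab-separated 'term<TAB>score' file into a dict."""
    lexicon = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # rsplit keeps multi-word terms such as "can't stand" intact.
            term, score = line.rsplit("\t", 1)
            lexicon[term] = int(score)
    return lexicon


def tokenize(text):
    """Lower-case English word tokens; apostrophes are kept (e.g. "don't")."""
    return re.findall(r"[a-z']+", text.lower())


def token_scores(text, lexicon=TOY_LEXICON):
    """Per-token scores; tokens missing from the lexicon count as 0.

    Note: single-token lookup ignores multi-word lexicon entries.
    """
    return [(tok, lexicon.get(tok, 0)) for tok in tokenize(text)]


def record_score(text, lexicon=TOY_LEXICON):
    """Record-level sentiment: the sum of the token scores."""
    return sum(score for _, score in token_scores(text, lexicon))
```

For example, `record_score("I love my new watch but the strap is broken")` returns 1 with the toy lexicon (+3 for "love", -2 for "broken").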
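Word embeddings such as GloVe, Word2Vec, or FastText do not output sentiment on their own. One simple way to turn them into a score, sketched here, is to average a record's word vectors and compare the result (by cosine similarity) with the averages of a few positive and negative seed words. The 3-dimensional `TOY_VECTORS`, the seed words, and the helper names are hypothetical; real pretrained vectors have 50 to 300 dimensions and can be read from a GloVe text file with one `word v1 ... vn` entry per line. Training a classifier on top of the embeddings, or using a BERT model already fine-tuned for sentiment (for example through Hugging Face `transformers.pipeline("sentiment-analysis")`), are stronger alternatives.

```python
import math
import re

# Hypothetical toy vectors standing in for real pretrained embeddings.
TOY_VECTORS = {
    "love": [0.9, 0.1, 0.0], "great": [0.8, 0.2, 0.1], "good": [0.85, 0.15, 0.0],
    "hate": [-0.9, 0.1, 0.0], "bad": [-0.8, 0.2, 0.1], "broken": [-0.7, 0.3, 0.2],
    "watch": [0.0, 0.9, 0.3], "strap": [0.05, 0.8, 0.4],
}
POSITIVE_SEEDS = ("good", "great", "love")  # hypothetical seed words
NEGATIVE_SEEDS = ("bad", "hate")


def load_glove_text(path):
    """Read a GloVe-style text file: one 'word v1 v2 ... vn' entry per line."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def mean_vector(words, vectors):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_score(text, vectors=TOY_VECTORS, pos=POSITIVE_SEEDS, neg=NEGATIVE_SEEDS):
    """Similarity to the positive seed centroid minus similarity to the negative one."""
    doc = mean_vector(re.findall(r"[a-z']+", text.lower()), vectors)
    if doc is None:
        return 0.0  # no known words: treat as neutral
    pos_centroid = mean_vector(pos, vectors)
    neg_centroid = mean_vector(neg, vectors)
    if pos_centroid is None or neg_centroid is None:
        raise ValueError("seed words are missing from the vocabulary")
    return cosine(doc, pos_centroid) - cosine(doc, neg_centroid)
```

Scores above 0 lean positive and scores below 0 lean negative; the exact values depend entirely on the embeddings and seed words chosen.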
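Once every record has its scores and a hand-assigned theme, the rows can be written to a file that Excel opens directly. This sketch uses the standard `csv` module; the column names are an assumption, with the sentiment column split into one column per method so both scores stay numeric. The `utf-8-sig` encoding prepends a byte-order mark so Excel recognises UTF-8, which keeps Chinese Weibo text readable. If a native `.xlsx` file is required, pandas `DataFrame.to_excel(path, index=False)` (with `openpyxl` installed) is one alternative.

```python
import csv

# Assumed column layout: "sentiment scores" is split into one column per method.
COLUMNS = ["text", "sentiment_baseline", "sentiment_sota", "theme"]


def write_results(rows, path):
    """Write result rows (dicts keyed by COLUMNS) to a CSV file Excel can open.

    'utf-8-sig' adds a byte-order mark so Excel detects UTF-8 encoding.
    The 'theme' value is the keyword you assigned by hand, not computed here.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```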
Prepare a report on your tasks and findings, along with a video describing what you have done. You can paste your code, its results, and your description into a Word document, a Python notebook, or an R notebook.
The deadline for this homework is posted on Blackboard.
Please feel free to ask questions, and prepare your work for presentation in the next
