
STAT 7008 - Assignment 3


Note: A3 counts for 20% of the overall assessment. The 100 points in A3 will be rescaled 
to 20% of the final score.
Web Scraping
1. (25 points) Crawl information from https://www.sciencedirect.com
(1) (13 points) Crawl key information about every article published in 2022 from the 
website https://www.sciencedirect.com/journal/journal-of-econometrics/issues, including 
year, volume, article content, title, authors and pages. Restrict the crawl to volumes 
226 to 230.
(2) (6 points) Remove “\xa0” from volume_name and store the crawled data in a pandas 
DataFrame.
(3) (6 points) Filter out the articles whose author value is null, then find the top 10 
authors who published the most articles.
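Parts (2) and (3) can be sketched on a few made-up rows, without touching the website. This is only an illustration under assumptions: the rows are invented, the column names other than volume_name are not fixed by the assignment, and it assumes the authors of each article are stored as one comma-separated string.

```python
import pandas as pd

# Invented rows standing in for crawled results (NOT real journal data).
# Only "volume_name" is named in the assignment; the other columns are assumed.
records = [
    {"year": 2022, "volume_name": "Volume\xa0226, Issue\xa01",
     "title": "Paper A", "authors": "Alice Smith, Bob Lee", "pages": "1-20"},
    {"year": 2022, "volume_name": "Volume\xa0226, Issue\xa02",
     "title": "Paper B", "authors": "Alice Smith", "pages": "21-40"},
    {"year": 2022, "volume_name": "Volume\xa0227, Issue\xa01",
     "title": "Editorial Board", "authors": None, "pages": "ii"},
    {"year": 2022, "volume_name": "Volume\xa0228, Issue\xa01",
     "title": "Paper C", "authors": "Bob Lee, Carol Wu, Alice Smith", "pages": "1-15"},
]

df = pd.DataFrame(records)

# (2) Replace the non-breaking space (\xa0) with an ordinary space.
df["volume_name"] = df["volume_name"].str.replace("\xa0", " ", regex=False)

# (3) Drop rows with a null author, then split into one author per row.
with_authors = df.dropna(subset=["authors"])
per_author = with_authors["authors"].str.split(",").explode().str.strip()

# Count articles per author and keep the ten most frequent.
top_authors = per_author.value_counts().head(10)
print(top_authors)
```

Splitting before counting matters here: counting the raw authors column would rank author lists rather than individual authors. If the crawler stores authors as a Python list instead of a string, the split step can be dropped and explode applied directly.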
Hint:
i. Click the button for the target item
ii. Pass the html to BeautifulSoup and get all links
iii. Use requests to get article content, title, authors and pages for each block
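Hint ii can be illustrated on a static HTML string. The sketch below is an assumption-heavy example, not the page's actual structure: the markup in sample_html and the "/vol/<number>/" URL pattern are hypothetical, and issue_links is a helper name introduced here. On the real site the HTML would come from the page after the clicks in hint i.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.sciencedirect.com"

# Hypothetical markup; the real issues page may be structured differently.
sample_html = """
<div>
  <a href="/journal/journal-of-econometrics/vol/226/issue/1">Volume 226, Issue 1</a>
  <a href="/journal/journal-of-econometrics/vol/230/issue/2">Volume 230, Issue 2</a>
  <a href="/journal/journal-of-econometrics/vol/231/issue/1">Volume 231, Issue 1</a>
  <a href="/journal/journal-of-econometrics/about">About</a>
</div>
"""


def issue_links(html, first_volume=226, last_volume=230):
    """Return absolute links whose volume number lies in the given range."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Assumed URL pattern: the volume number follows "/vol/".
        match = re.search(r"/vol/(\d+)/", anchor["href"])
        if match and first_volume <= int(match.group(1)) <= last_volume:
            links.append(urljoin(BASE_URL, anchor["href"]))
    return links


links = issue_links(sample_html)
print(links)
```

Each returned link could then be fetched, as in hint iii, to read the article blocks on that issue page; that step is left out here because it depends on the live page layout.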