
Analyzing the File for Data Visualization

2.4 Task 3: Analyzing the File for Data Visualization

In the last task, based on the class defined in Section 2.3 (Task 2), you will implement two functions that visualise the statistics as graphs. The implementation of these two functions should make use of external Python packages, including NumPy, SciPy, Pandas, and/or Matplotlib, to create suitable graphs for comparing the statistics collected for posts. The two functions should follow the requirements below:

• visualizeVocabularySizeDistribution(inputFile, outputImage): Given the input file “data.xml”, you should count the vocabulary size of each post. Then you should draw a bar chart in Python to visualize the distribution of the vocabulary size over all posts. The x-axis is the vocabulary size, and the y-axis is the number of posts with that vocabulary size. Note that on the x-axis the vocabulary size interval is 10, and any vocabulary size larger than or equal to 100 goes into “others”, i.e., the bins are 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, others (left inclusive). You should save your visualization figure into a png file named “vocabularySizeDistribution.png”.

• visualizePostNumberTrend(inputFile, outputImage): This function displays the trend of the post number in the Q&A site. Given the input file “data.xml”, you should first get the number of questions and the number of answers in each quarter. Then, following the time order, you should draw a line chart showing the number of posts in each quarter. Note that you should draw two lines, one for the question number and one for the answer number, and add a legend to the figure to tell which line is for which type of post. You should save your visualization figure into a png file named “postNumberTrend.png”.

Note: Please import the class defined in Section 2.3 (Task 2).
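The left-inclusive binning rule for the vocabulary size chart can be sketched as follows. This is a minimal illustration, not the required solution: it assumes the vocabulary sizes have already been computed as a list of integers (for example with the Task 2 class, whose interface is not shown here), and the helper names bin_label, count_posts_per_bin and plot_vocabulary_distribution are hypothetical. The counting is done in plain Python; the same result could also be obtained with NumPy or Pandas.

```python
from collections import Counter

BIN_WIDTH = 10
UPPER_LIMIT = 100
# Left-inclusive labels: "0-10" covers sizes 0..9, "90-100" covers 90..99.
BIN_LABELS = [f"{lo}-{lo + BIN_WIDTH}"
              for lo in range(0, UPPER_LIMIT, BIN_WIDTH)] + ["others"]


def bin_label(vocab_size):
    """Return the x-axis label for one post's vocabulary size."""
    if vocab_size >= UPPER_LIMIT:
        return "others"
    lo = (vocab_size // BIN_WIDTH) * BIN_WIDTH
    return f"{lo}-{lo + BIN_WIDTH}"


def count_posts_per_bin(vocab_sizes):
    """Count posts per bin, keeping empty bins so the x-axis stays complete."""
    counts = Counter(bin_label(size) for size in vocab_sizes)
    return [counts.get(label, 0) for label in BIN_LABELS]


def plot_vocabulary_distribution(vocab_sizes, output_image):
    """Draw the bar chart and save it as a PNG (hypothetical helper)."""
    # Imported here so the counting helpers above work without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(BIN_LABELS, count_posts_per_bin(vocab_sizes))
    ax.set_xlabel("Vocabulary size")
    ax.set_ylabel("Number of posts")
    ax.set_title("Vocabulary size distribution")
    fig.tight_layout()
    fig.savefig(output_image)
    plt.close(fig)
```

With made-up sizes such as [0, 9, 10, 15, 42, 99, 100, 250], a size of 10 falls into “10-20” rather than “0-10”, and both 100 and 250 fall into “others”. Calling plot_vocabulary_distribution with that list and “vocabularySizeDistribution.png” would write the chart to disk.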
Apart from defining these two functions, you should also call them to produce the png files. You should put your code for this final task into the template file “dataVisualization_studentID.py”, and rename the file to include your own ID.

© 2019, Faculty of IT, Monash University
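The per-quarter counting behind visualizePostNumberTrend can be sketched as follows. This is only an illustration under stated assumptions: posts are taken to be already parsed into (post_type, datetime) pairs, which is a hypothetical in-memory format and not the layout of “data.xml”, and the helper names quarter_label, count_posts_per_quarter and plot_post_number_trend are made up for this example.

```python
from collections import Counter
from datetime import datetime


def quarter_label(timestamp):
    """Map a datetime to a sortable label such as '2015Q3'."""
    quarter = (timestamp.month - 1) // 3 + 1
    return f"{timestamp.year}Q{quarter}"


def count_posts_per_quarter(posts):
    """Count questions and answers per quarter.

    posts: iterable of (post_type, datetime) pairs, where post_type is
    'question' or 'answer' (an assumed format for this sketch).
    Returns (quarters, question_counts, answer_counts) in time order.
    """
    questions, answers = Counter(), Counter()
    for post_type, timestamp in posts:
        label = quarter_label(timestamp)
        if post_type == "question":
            questions[label] += 1
        elif post_type == "answer":
            answers[label] += 1
    # Labels with four-digit years sort chronologically as plain strings.
    quarters = sorted(set(questions) | set(answers))
    return (quarters,
            [questions[q] for q in quarters],
            [answers[q] for q in quarters])


def plot_post_number_trend(posts, output_image):
    """Draw one line per post type and save the chart as a PNG."""
    # Imported here so the counting helpers above work without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    quarters, question_counts, answer_counts = count_posts_per_quarter(posts)
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(quarters, question_counts, marker="o", label="Questions")
    ax.plot(quarters, answer_counts, marker="o", label="Answers")
    ax.set_xlabel("Quarter")
    ax.set_ylabel("Number of posts")
    ax.tick_params(axis="x", rotation=45)
    ax.legend()
    fig.tight_layout()
    fig.savefig(output_image)
    plt.close(fig)
```

One limitation of this sketch is that a quarter containing no posts at all does not appear on the x-axis. If gaps should be shown as zeros, the quarter list would need to be generated from the first to the last quarter instead of from the observed labels.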