Twitter is a social networking website where users can post very short messages known as "tweets".

Each Twitter user can choose to "follow" other users, which means that they see those users' tweets.

A Twitter user sees the tweets of users they are "following", and their tweets are seen by their

"followers" (the users who follow them).

All the "follow" connections define a network among Twitter users, and it's quite interesting to look for

patterns in the connections. Tools like Twiangulate let you explore questions like "what connections do

my two friends have in common?". In this assignment, you'll write a program that lets you ask

questions (or "queries") about a Twitter dataset.

Any tool for exploring the Twitterverse must get its data from Twitter itself. Twitter provides an API to

allow programmers to write programs that interact with Twitter and extract data from it. In general, an

API is a module that defines functions for accessing underlying data and performing other tasks

without having to know how that data is actually stored and retrieved.

To make this assignment more manageable for you, we will assume that the information we need has

already been extracted from Twitter and stored in a file.

How to tackle this assignment

This is your first experience designing a program of this size. We are providing detailed advice to help

you break the task down into manageable pieces.

Make sure your twitterverse_functions.py module runs without error before submitting. If you have

syntax errors in your module, comment them out before submitting, or we will not be able to test your

functions!

The Twitter Data File

A Twitter data file contains a series of one or more user profiles, one after the other. Each user profile

has the following elements, in this order:

A line containing a non-blank, non-empty username. You may assume that usernames are unique; that is, a single username will not occur more than once in the file, and that usernames do not contain any whitespace.

A line for the user's actual name. If they did not provide a name, this line will be blank.

A line for the user's location, or a blank line if they did not provide one.

A line for the URL of a website, or a blank line if they did not provide one.

Zero or more lines for the user's bio, then a line with nothing but the keyword ENDBIO on it. This marks the end of the bio, and is not considered part of it. (You may assume that no bio has the string ENDBIO within it.) If the user did not provide a bio, the ENDBIO line will come immediately after the website line, with no blank line in between.

Zero or more lines each containing the username of someone that this user is following, then a line with the keyword END on it. (You may assume that no one has END as their username.) A user cannot be on his or her own following list. You may assume that every user on a following list has a user profile in the Twitter data file.

Notice that the keywords act as separators in this file. All of their letters are capitalised, and the

keywords contain no punctuation.

Examples

Here is a sample user profile that might occur among many in a file:

tomCruise
Tom Cruise
Los Angeles, CA
http://www.tomcruise.com
Official TomCruise.com crew tweets. We love you guys!
Visit us at Facebook!
ENDBIO
katieH
NicoleKidman
END

The file data.txt is a smallish example of a complete Twitter data file (and was made by hand) and the

file rdata.txt (see starter code) is a much larger example (and is made from real data extracted from

Twitter). These should help you confirm your understanding of the file format and will also be useful in

testing your program.
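The file format described above can be read with a single pass over the lines. The following is only a sketch, not the required solution: the function name `read_profiles` and the dictionary layout (keys 'name', 'location', 'web', 'bio', 'following') are assumptions made for illustration, and skipping stray blank lines between profiles is a leniency the format description does not ask for.

```python
import io


def read_profiles(data_file):
    """Return {username: profile dict} read from an open Twitter data file.

    The dictionary layout is an assumption made for this sketch.
    """
    lines = iter(data_file)
    profiles = {}
    for first in lines:
        username = first.strip()
        if not username:
            continue  # assumption: tolerate stray blank lines between profiles
        name = next(lines).strip()
        location = next(lines).strip()
        web = next(lines).strip()

        # Bio: zero or more lines, ended by a line holding only ENDBIO.
        bio_lines = []
        for line in lines:
            if line.strip() == 'ENDBIO':
                break
            bio_lines.append(line.rstrip('\n'))

        # Following list: zero or more usernames, ended by END.
        following = []
        for line in lines:
            if line.strip() == 'END':
                break
            following.append(line.strip())

        profiles[username] = {
            'name': name,
            'location': location,
            'web': web,
            'bio': '\n'.join(bio_lines),
            'following': following,
        }
    return profiles


SAMPLE = """tomCruise
Tom Cruise
Los Angeles, CA
http://www.tomcruise.com
Official TomCruise.com crew tweets. We love you guys!
Visit us at Facebook!
ENDBIO
katieH
NicoleKidman
END
"""

profiles = read_profiles(io.StringIO(SAMPLE))
print(profiles['tomCruise']['following'])
```

Because the inner loops share one iterator with the outer loop, each profile picks up exactly where the previous one stopped. The same function should work on a real file object, for example one opened on data.txt.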

Cycles in the data

Although a user cannot be on their own following or followers lists, there can be "loops" (we call them

"cycles") such as this: user A can be following B who is following A. This is the shortest possible cycle.

Of course, cycles can be longer.
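To make the shortest cycle concrete, here is a small sketch that finds pairs of users who follow each other. The mapping from username to following list and the name `mutual_follows` are assumptions for illustration only; the assignment does not ask you to detect cycles.

```python
def mutual_follows(following):
    """Return sorted (u, v) pairs in which u follows v and v follows u."""
    pairs = set()
    for user, followed in following.items():
        for other in followed:
            if user in following.get(other, []):
                # Store each pair once, regardless of which side we saw first.
                pairs.add(tuple(sorted((user, other))))
    return sorted(pairs)


# Hypothetical data: A and B follow each other; B also follows C.
following = {'A': ['B'], 'B': ['A', 'C'], 'C': []}
print(mutual_follows(following))
```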

The Query File

Note that the word "query" just means "question". In computer science, we use it to mean a request

for information. For this assignment, a query will be provided in a file. Below we will review the high

level parts of the query, look at an example, and then describe the format of the query file.

Overview

A query has three components: a search specification, a filter specification, and a presentation

specification.

The search specification describes how to generate a list of Twitter usernames, starting with an initial

username (a list of length one) and then finding their followers or people they are following, then

people that are those people's followers or who they are following, and so on. When processing the

search specification, don't try to do anything to avoid cycles. For instance, if the search specification

says to find the people who user A is following, and from there the people they are following, you

could find yourself back at user A. Don't try to avoid that.

After processing the search specification, we have a list of Twitter usernames. Its length could be

zero. For example, if the initial username is 'adalovelace' and the search specification contains a

single 'followers' keyword, then the length of the list will be zero if 'adalovelace' has no followers.
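The data file stores only who each user is following, so a followers list has to be derived by looking through everyone else's following list. A minimal sketch, assuming a hypothetical mapping from username to following list and a hypothetical helper named `get_followers`:

```python
def get_followers(following, username):
    """Return the users whose following list contains username."""
    return [user for user, followed in following.items()
            if username in followed]


# Hypothetical data: 'adalovelace' follows 'b', but nobody follows her.
following = {'adalovelace': ['b'], 'b': []}
print(get_followers(following, 'adalovelace'))
print(get_followers(following, 'b'))
```

The first call produces an empty list, which is the zero-length case described above.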

The filter specification describes how to filter the list of usernames produced by the search

specification. The filtering can be based on

whether or not they are following a particular user,

whether or not a particular user is their follower,

whether their name contains a particular string (case-insensitive), or

whether their location contains a particular string (case-insensitive).

After processing the filter specification, we have a possibly reduced list of usernames.
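The four filter criteria can each be written as a test that a username must pass in order to be kept. The sketch below is one possible shape, not the required design: the function name, its parameter names, and the profile dictionary layout are all assumptions, and the data is made up. A criterion left as None is simply not applied.

```python
def filter_usernames(profiles, usernames, following=None, follower=None,
                     name_includes=None, location_includes=None):
    """Return the usernames that satisfy every criterion that is not None."""
    kept = []
    for user in usernames:
        profile = profiles[user]
        # Keep only users who are following the given user.
        if following is not None and following not in profile['following']:
            continue
        # Keep only users who have the given user as a follower.
        if follower is not None and user not in profiles[follower]['following']:
            continue
        # Case-insensitive substring tests on name and location.
        if (name_includes is not None
                and name_includes.lower() not in profile['name'].lower()):
            continue
        if (location_includes is not None
                and location_includes.lower() not in profile['location'].lower()):
            continue
        kept.append(user)
    return kept


# Hypothetical profiles, reduced to the fields the filters need.
PROFILES = {
    'tomCruise': {'name': 'Tom Cruise', 'location': 'Los Angeles, CA',
                  'following': ['c']},
    'c': {'name': 'C Example', 'location': 'Toronto', 'following': []},
    'h': {'name': 'H Example', 'location': 'Ottawa, ON', 'following': ['c']},
    'i': {'name': 'I Example', 'location': 'San Diego, ca', 'following': []},
}

print(filter_usernames(PROFILES, ['tomCruise', 'h', 'i'],
                       following='c', location_includes='CA'))
```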

Once the search results have been found and filtered, the presentation specification describes how

the output should be presented. It specifies on what basis the results should be sorted, and whether

the results should be presented in a short or long format.

Example query

Here is an example query:

tomCruise
following
following
following
FILTER
following c
location-includes CA
PRESENT
sort-by popularity
format long

The search specification in this particular query has four steps.

Start with a list containing the username to start the search from; i.e., ['tomCruise']. Let's call that list L1.

The search keyword 'following' says to replace each username p in L1 with the usernames of the

users who p is following. This yields a new list, L2.

For the next 'following' keyword, we start with L2 and repeat the same operation as in the previous

step, yielding another list, L3.

For the final 'following' keyword, we start with L3 and repeat that operation one last time, yielding list

L4.

Notice that each step yields a list of zero or more usernames that is the input to the next step. There

should be no duplicates in the final results list. Duplicates should be removed after each step.
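The steps above can be sketched as a loop that feeds each list into the next operation and removes duplicates every time. This is an illustration under assumptions: the function names and the small graph are hypothetical (the graph is not the one in diagram_data.txt), and keeping usernames in first-seen order is a choice of this sketch rather than something the handout specifies.

```python
def search_step(following, usernames, keyword):
    """Apply one search keyword to a list of usernames, without duplicates."""
    result = []
    for p in usernames:
        if keyword == 'following':
            result.extend(following[p])
        elif keyword == 'followers':
            # Followers are derived: everyone whose following list contains p.
            result.extend(user for user, followed in following.items()
                          if p in followed)
        else:
            raise ValueError('unknown search keyword: ' + keyword)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(result))


def run_search(following, start, keywords):
    """Start from [start] and apply each keyword in turn."""
    current = [start]
    for keyword in keywords:
        current = search_step(following, current, keyword)
    return current


# Hypothetical graph: username -> list of users they are following.
GRAPH = {
    'tomCruise': ['a', 'b'],
    'a': ['c'],
    'b': ['c', 'd'],
    'c': ['tomCruise'],
    'd': ['e'],
    'e': [],
}

print(run_search(GRAPH, 'tomCruise', ['following', 'following', 'following']))
```

In this made-up graph the lists are ['tomCruise'], then ['a', 'b'], then ['c', 'd'] (the second 'c' is dropped), then ['tomCruise', 'e']. The search arrives back at 'tomCruise' through a cycle, and nothing is done to prevent that.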

The Twitter data file diagram_data.txt (see starter code) contains the follower/following relationships

as represented by this diagram. For those relationships, the search specification above would yield

this list of usernames: ['i', 'j', 'h', 'k', 'tomCruise']. Make sure that you can see how the four lists, ending

with this final one, are generated. Notice that the final list contains the users you can get to in three

"steps" of the "following" relationship, starting from 'tomCruise'.

The final list generated by the search specification becomes the input to the filter specification. For our current example, the filter specification says that the list should be filtered in this way: a user should be kept only if they are following user 'c' and have a location that includes the string 'CA'. Notice that the resulting list of usernames is just ['tomCruise'].

The presentation specification says to present the results in long format and to order the users

according to their popularity.
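The three parts of the example query can be separated by looking for the FILTER and PRESENT keywords. The sketch below is inferred from the example alone, so treat the details as assumptions: the function name `parse_query`, the shape of the returned dictionary, and the idea that every filter and presentation line is a keyword followed by a value. The query text is written with the three 'following' keywords that the walkthrough describes.

```python
import io


def parse_query(query_file):
    """Split a query into its search, filter and presentation parts."""
    # Assumption: blank lines carry no meaning in a query file.
    lines = [line.strip() for line in query_file if line.strip()]
    filter_at = lines.index('FILTER')
    present_at = lines.index('PRESENT')

    search = {'username': lines[0], 'operations': lines[1:filter_at]}
    # Each remaining line is assumed to be "<keyword> <value>".
    filters = dict(line.split(None, 1)
                   for line in lines[filter_at + 1:present_at])
    present = dict(line.split(None, 1)
                   for line in lines[present_at + 1:])
    return {'search': search, 'filter': filters, 'present': present}


QUERY = """tomCruise
following
following
following
FILTER
following c
location-includes CA
PRESENT
sort-by popularity
format long
"""

query = parse_query(io.StringIO(QUERY))
print(query['search'])
print(query['filter'])
print(query['present'])
```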
