COMP3310 - Assignment 2: Indexing a Gopher.
Background:
This assignment is worth 15% of the final mark.
It is due by 23:55 Wednesday 26 April AEST.
Late submissions will not be accepted, except in special circumstances.
o Extensions must be requested as early as possible before the due date, with suitable
evidence or justification.
If you would like feedback on particular aspects of your submission, please note that in the
README file within your submission.
This is a coding assignment, to enhance and check your network programming skills. The main focus is
on native socket programming, and your ability to understand and implement the key elements of an
application protocol from its RFC specification.
Please note that this is an experiment for the course: it is the first time we’ve used gopher for this
assignment. We may discover some additional challenges as we go that require some adjustments to
the assignment activities, or a swap of server. Any adjustments will be noted via a forum Announcement.
Assignment 2 outline
An Internet Gopher server was one of the precursors to the web, combining a simple query/response
protocol with a reasonably flexible content server, and a basic model for referencing and describing
resources on different machines. The name comes from the (Americanised) idea to “go-for” some
content… and also alludes to the complexity of a gopher’s interconnected burrows [1].
For this assignment, you need to write your own gopher client in C, Java or Python [2][3], without the use of
any external gopher-related libraries. The client will need to ‘spider’ or ‘crawl’ or ‘index’ a specified
server, do some simple analysis and reporting of what resources are there, as well as detect, report and
deal with any issues with the server or its content.
Your code MUST open sockets in the standard socket() API way, as per the tutorial exercises. Your code
MUST make appropriate and correctly-formed gopher requests on its own, and capture/interpret the
results on its own. You will be handcrafting gopher protocol packets, so you’ll need to understand the
structures of requests/responses as per the gopher RFC 1436.
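As a rough illustration of the request/response shape, here is a minimal Python sketch. It assumes the RFC 1436 basics: a request is the selector string followed by CR LF, the server replies and then closes the connection, and each directory line is a type character plus display string, then selector, host and port separated by tabs. The names `build_request`, `fetch`, `parse_menu_line` and `MenuItem` are hypothetical, and the UTF-8 encoding is an assumption (the RFC predates it). Treat this as a starting point, not a reference implementation.

```python
import socket
from typing import NamedTuple, Optional

CRLF = b"\r\n"  # explicit bytes, independent of the OS default line ending


class MenuItem(NamedTuple):
    item_type: str
    display: str
    selector: str
    host: str
    port: int


def build_request(selector: str) -> bytes:
    """Selector followed by CR LF; an empty selector asks for the root menu."""
    # Encoding is an assumption: RFC 1436 predates UTF-8.
    return selector.encode("utf-8") + CRLF


def fetch(host: str, port: int, selector: str, timeout: float = 5.0) -> bytes:
    """Open a TCP socket, send one request, and read until the server closes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))
        sock.sendall(build_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_menu_line(line: str) -> Optional[MenuItem]:
    """Parse 'Tdisplay<TAB>selector<TAB>host<TAB>port'; return None if malformed."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or not fields[0]:
        return None
    try:
        port = int(fields[3])
    except ValueError:
        return None
    return MenuItem(fields[0][0], fields[0][1:], fields[1], fields[2], port)
```

Against a locally installed server, something like `fetch("localhost", 70, "")` would request the root menu; the host and port here are placeholders.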
We will provide a gopher server to run against, with a mix of content – text and binary files, across some
folder structure, along with various pointers to resources.
[1] https://en.wikipedia.org/wiki/Gopher
[2] Most high-performance networking servers and kernel networking modules are written in C, with other
languages a distant second, so it is worth learning. But time is short, and everyone has a different background.
[3] If you want to use another language (outside of C/Java/Python), discuss with your tutor. It has to have native
socket access, and somebody on the tutoring team has to be able to mark it.
In the meantime, you SHOULD install a gopher server on your computer for local access, debugging and
wiresharking. A number are available, including pygopherd, which is perhaps more recently updated but
more complex, and Motsognir, which is a bit older but simpler.
Wireshark will be very helpful for debugging purposes. A common trap is getting the line ending wrong
on requests; the default line ending varies by OS and language. Remember to be conservative in what you
send and reasonably liberal in what you accept.
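On the sending side, being conservative means writing the bytes `b"\r\n"` explicitly rather than relying on `print` or `os.linesep`. On the receiving side, the sketch below shows one way to be liberal: it accepts both CR LF and bare LF line endings, and stops at the lone full stop that RFC 1436 uses to end a menu or text listing. The function name `menu_lines` and the choice to decode as UTF-8 with replacement characters are assumptions. It is meant for menus and text only; binary downloads should be kept as raw bytes.

```python
from typing import List


def menu_lines(raw: bytes) -> List[str]:
    """Split a menu/text response into lines, tolerating CRLF or bare LF."""
    # Decoding choice is an assumption; undecodable bytes become U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    lines = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # accept CRLF as well as bare LF
        if line == ".":           # a lone full stop marks the end of the listing
            break
        lines.append(line)
    while lines and lines[-1] == "":
        lines.pop()               # drop empty trailing lines if no terminator was sent
    return lines
```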
What your successful and highly-rated indexing client will need to do:
1. Connect to the class gopher server, and get the initial response.
a. Wireshark this initial-response conversation in both directions, from the starting TCP
connection to its closing, and include that in your report.
b. The class gopher site is not yet fully operational (as of 31/3); an announcement will be
made when it is.
2. Starting with the initial response, scan through the directories on the server, following links to
any other directories on the same server, and download any text and binary files you find. Keep
scanning till you run out of directories to visit. Note that there will be items linked more than
once, so beware of getting stuck in a loop.
3. Count, store, and, at the end of the scan, report:
a. The number of Gopher directories on the server.
b. The number, and a list of all simple text files (full path)
c. The number, and a list of all non-text (i.e. binary) files (full path)
d. The contents of the smallest and largest text files.
e. The size of the smallest and largest binary files.
f. The number of invalid references (those with an “error” type)
g. The number of external references (those on another server)
i. Doesn’t matter if they’re valid or not. Don’t follow them, no matter how
tempting.
4. While running, print to STDOUT:
a. The timestamp of each request
b. The client-request you are sending.
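Steps 2 to 4 can be sketched as a breadth-first walk that remembers which directory selectors it has already queued, so items linked more than once cannot cause a loop. This is only a sketch under several assumptions: `fetch_menu(selector)` is a hypothetical callable that returns already-parsed `(item_type, selector, host, port)` tuples for one directory; the set of item types treated as binary is a guess; the root directory is counted; and error and external references are counted per reference rather than per unique target. Downloading files and tracking their sizes are left out.

```python
from collections import deque
from datetime import datetime

BINARY_TYPES = {"9", "5", "g", "I"}  # assumption: which item types count as binary


def crawl(host, port, fetch_menu, log=print):
    """Walk every directory on (host, port) once and tally what is found."""
    stats = {"directories": 0, "text": set(), "binary": set(),
             "errors": 0, "external": 0}
    seen = {""}            # directory selectors already queued
    queue = deque([""])    # the empty selector asks for the root menu
    while queue:
        selector = queue.popleft()
        # Step 4: timestamp plus the request being sent.
        log(f"{datetime.now().isoformat(timespec='seconds')} request: {selector!r}")
        stats["directories"] += 1
        for item_type, item_sel, item_host, item_port in fetch_menu(selector):
            if item_type == "i":
                continue  # informational line (a common extension, not in RFC 1436)
            if item_type == "3":
                stats["errors"] += 1
            elif (item_host.lower(), item_port) != (host.lower(), port):
                stats["external"] += 1  # counted, never followed
            elif item_type == "1":
                if item_sel not in seen:  # loop protection
                    seen.add(item_sel)
                    queue.append(item_sel)
            elif item_type == "0":
                stats["text"].add(item_sel)
            elif item_type in BINARY_TYPES:
                stats["binary"].add(item_sel)
    return stats
```

Passing the fetch function in as an argument keeps the traversal logic testable without a live server: a dictionary of canned menus can stand in for the network.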
You will need to keep an eye on your client while it runs, as some items might be a little challenging if
you’re not careful. Identify any such items you find on the gopher server in your README or code
comments, and how you dealt with them.
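The text does not say which items are challenging. Two plausible candidates, offered here purely as assumptions, are a resource that never finishes sending and one that is far larger than expected. The sketch below guards against both with a socket timeout and a byte cap; `read_limited`, its default limits and its status strings are hypothetical and should be tuned to what you actually observe.

```python
import socket


def read_limited(sock, max_bytes=1_000_000, timeout=5.0):
    """Read until EOF, timeout, or max_bytes; return (data, status)."""
    sock.settimeout(timeout)
    chunks, total = [], 0
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: normal end of response
                return b"".join(chunks), "ok"
            chunks.append(data)
            total += len(data)
            if total > max_bytes:  # stop reading oversized responses
                return b"".join(chunks)[:max_bytes], "too_large"
    except socket.timeout:
        # The server stopped sending without closing; keep what arrived.
        return b"".join(chunks), "timeout"
```

Returning a status alongside the data lets the caller report the problem item in its final summary instead of silently dropping it.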
We will test your code against the specified gopher, and validate its outputs.
Submission and Assessment
You need to submit your source code, and an executable (where appropriate). Please provide any additional
comments and insights, and any instructions for running the code, in a README text file. Your
submission must be a zip file, packaging everything as needed, and submitted through the appropriate
link on Wattle.
There are a number of existing gopher clients, servers and libraries out there, many of them with source.
While these may be educational for you, the assessors know they exist and will be checking your code
against them, and against other submissions from this class.
Your code will be assessed on the following [with % of marks available]:
1. Output correctness [45%]
o Does the gopher server correctly respond to all of your queries?
o Does your code report the right numbers?
o Does your code deal well with issues it encounters?
o Does your code provide the running log of requests as above?
2. Performance [10%]
o A great indexer should run as fast as the server allows, and not consume vast amounts
of memory or total CPU time.
3. Code “correctness, clarity, and style” [45%]
o Use of native sockets, writing own gopher requests correctly.
o Documentation, i.e. comments in the code and the README - how easily can somebody
else pick this code up and, say, modify it.
o How easy the code is to run, using a standard Linux environment (like the CS Labs, WSL).
During marking your tutor may ask you to explain some particular coding decisions.
Reminder: Wireshark is very helpful to check behaviours of your code by comparing against existing
gopher clients (some are preinstalled in Linux distributions, or are easily added). There are also a number of
YouTube videos on gopher that show, for example, how the clients work. Your tutors can help you with
advice (direct or via the forum) as can fellow students. It’s fine to work in groups, but your submission
has to be entirely your own work.