Network Programming Tutoring: Debugging the EE450 MapReduce Model in C/C++

MapReduce, Linux.

Problem Statement

In this project you will implement a simple model of computational offloading, where a single client offloads some computation to a server, which in turn distributes the load over 3 backend servers. The server facing the client then collects the results from the backend servers and communicates them to the client in the required format. This is an example of how a cloud-computing service such as Amazon Web Services might implement MapReduce to speed up a large computation task offloaded by the client.

The server communicating with the client is called AWS (Amazon Web Server) and the three backend servers are named BackServer A, BackServer B and BackServer C. The client and the AWS communicate over a TCP connection, while the communication between AWS and BackServers A, B & C is over UDP.

Input Files Used

The files specified below will be used as inputs in your programs in order to dynamically configure the state of the system. The contents of the files should NOT be “hardcoded” in your source code, because during grading, the input files will be different, but the formats of the files will remain the same.

If you are working in an environment other than UNIX, pay particular attention to line endings (newlines). For this project, it is assumed that all files follow the UNIX line ending convention. This is particularly important while handling the input file(s). See the articles here and here for more information.

nums.csv: An ASCII file that contains a single column of integers. Each row consists of a single integer and ends with a newline. You may assume that each integer is within the range of a long signed integer type. The number of rows in the file will be a multiple of 3. This file will always reside in the same directory as the client.
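The file format described above can be read with a short loop. The following is a minimal sketch, not a required implementation: the helper name read_nums and its return convention are assumptions made for illustration. It relies on strtol, which stops at the first character that is not a digit, so a stray carriage return from a non-UNIX editor does not corrupt the parsed value.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one integer per line from `path` into a
 * heap-allocated array. Returns the number of integers read, or -1 on
 * error. On success the caller owns *out and must free() it. */
long read_nums(const char *path, long **out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    size_t cap = 64, n = 0;
    long *nums = malloc(cap * sizeof *nums);
    if (nums == NULL) {
        fclose(fp);
        return -1;
    }

    char line[64];
    while (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        errno = 0;
        long v = strtol(line, &end, 10);
        if (end == line)          /* blank line or no digits: skip it */
            continue;
        if (errno == ERANGE) {    /* value does not fit in a long */
            free(nums);
            fclose(fp);
            return -1;
        }
        /* strtol stops before a trailing "\n" or "\r\n", so both UNIX
         * and Windows line endings yield the same value. */
        if (n == cap) {           /* grow the array when it is full */
            cap *= 2;
            long *tmp = realloc(nums, cap * sizeof *tmp);
            if (tmp == NULL) {
                free(nums);
                fclose(fp);
                return -1;
            }
            nums = tmp;
        }
        nums[n++] = v;
    }
    fclose(fp);
    *out = nums;
    return (long)n;
}
```

Because nothing here depends on the number of rows, the same code works for any input file that follows the stated format, which is what the "no hardcoding" requirement asks for.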

Source Code Files

Your implementation should include the source code files described below, for each component of the system.

AWS: You must name your code file: aws.c or aws.cc or aws.cpp (all small letters). Also you must call the corresponding header file (if you have one; it is not mandatory) aws.h (all small letters).
BackServer A, B and C: You must use one of these names for this piece of code: server#.c or server#.cc or server#.cpp (all small letters except for #). Also you must call the corresponding header file (if you have one; it is not mandatory) server#.h (all small letters, except for #). The “#” character must be replaced by the server identifier (i.e. A or B or C), depending on the server it corresponds to.
Note: In case you are using one executable for all four servers (i.e. if you choose to make a “fork” based implementation), you should call the file servers.c or servers.cc or servers.cpp. Also you must call the corresponding header file (if you have one; it is not mandatory) servers.h (all small letters). In order to create four servers in your system using one executable, you can use the fork() function inside your server’s code to create 4 child processes. You must follow this naming convention! This piece of code basically handles the server functionalities.
Client: The name of this piece of code must be client.c or client.cc or client.cpp (all small letters) and the header file (if you have one; it is not mandatory) must be called client.h (all small letters).

More Detailed Explanations

Phase 1

All four server programs (AWS, BackServer A, B, & C) boot up in this phase. While booting up, the servers must display a boot message on the terminal. The format of the boot message for each server is given in the onscreen messages tables at the end of the document. As the boot message indicates, each server must listen on the appropriate port for incoming packets/connections.

Once the server programs have booted up, the client program is run. The client displays a boot message as indicated in the onscreen messages table. Note that the client code takes an input argument from the command line, that specifies the computation that is to be run. The format for running the client code is

./client <function_name>

where function_name can take a value from {min, max, sum, sos}. As an example, to find the sum of all the numbers in the input file, the client should be run as follows:

./client sum
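A client would normally check the command-line argument before opening any connection. The sketch below is one possible check; the helper name is_valid_function is hypothetical, and matching the lowercase names exactly is an assumption based on the set listed above.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is one of the four
 * operations accepted on the command line, 0 otherwise.
 * Assumes an exact, case-sensitive match. */
int is_valid_function(const char *name)
{
    static const char *const ops[] = { "min", "max", "sum", "sos" };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        if (strcmp(name, ops[i]) == 0)
            return 1;
    }
    return 0;
}
```

In main, this would typically be called as is_valid_function(argv[1]) after confirming that argc is 2, printing a usage message and exiting otherwise.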

After booting up, the client establishes a TCP connection with AWS. After successfully establishing the connection, the client first sends the function_name to AWS. Once the function_name is sent, the client should print a message in the format given in the table. The client then reads all integers from nums.csv and proceeds to send them to AWS over the same TCP connection. After successfully sending the integers, the client should print the number of integers sent to AWS. This ends Phase 1 and we now proceed to Phase 2.
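TCP is a byte stream, so a single send() call may transmit fewer bytes than requested, which matters when the list of integers is large. The sketch below shows one common way to handle this; the helper name send_all is hypothetical, and the text above does not fix a wire format, so how the integers are encoded (raw bytes, text, or otherwise) is left open here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keeps calling send() until all `len` bytes of
 * `buf` have been written to the connected socket `fd`.
 * Returns 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t sent = send(fd, p, len, 0);
        if (sent < 0) {
            if (errno == EINTR)   /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += sent;                /* advance past the bytes written */
        len -= (size_t)sent;
    }
    return 0;
}
```

The receiving side needs the mirror-image loop around recv(), since a read may also return fewer bytes than were asked for.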

Phase 2

In Phase 1, you read the numbers from the file and sent them to the AWS server over a TCP connection. Now in Phase 2, the AWS server will divide the data into 3 non-overlapping parts and send them to the 3 backservers. If there are N numbers in the file, then the first N/3 numbers must be sent to backserver A, the next N/3 to backserver B and the last N/3 to backserver C. The TAs will make sure that the number N is divisible by 3. The function to be performed also needs to be communicated to the backservers.
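The split described above reduces to simple index arithmetic. A minimal sketch follows, with a hypothetical helper name and the assumption that the backservers are indexed 0 (A), 1 (B) and 2 (C):

```c
#include <stddef.h>

/* Hypothetical helper: for backserver index k (0 = A, 1 = B, 2 = C),
 * computes the offset of its first number and how many numbers it
 * receives. Assumes n is a multiple of 3, as the project guarantees. */
void chunk_for_server(size_t n, int k, size_t *start, size_t *count)
{
    *count = n / 3;
    *start = (size_t)k * (*count);
}
```

For N = 9, backserver A gets indices 0 to 2, backserver B gets 3 to 5, and backserver C gets 6 to 8.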

The communication between the AWS server and the backservers happens over UDP. The AWS server will send the function_name along with the actual numbers. Note that the function_name can be MIN, MAX, SUM or SOS (sum of squares). The port numbers for backservers A, B and C are specified in table 2. Since all the servers will run on the same machine in our project, all have the same IP address (the IP address of localhost is usually 127.0.0.1).
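Sending a datagram to a backserver on localhost could look like the sketch below. The helper name send_to_backserver is hypothetical, the port is passed in rather than hardcoded because the port table is not reproduced here, and the message layout is left to the implementer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends one UDP datagram to a backserver on
 * localhost. `port` is in host byte order and would come from the
 * project's port table. Returns bytes sent, or -1 on error. */
ssize_t send_to_backserver(int udp_fd, unsigned short port,
                           const void *msg, size_t len)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);          /* network byte order */
    if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1)
        return -1;
    return sendto(udp_fd, msg, len, 0,
                  (const struct sockaddr *)&addr, sizeof addr);
}
```

Unlike TCP, each sendto() call produces one self-contained datagram, and a UDP payload over IPv4 cannot exceed 65,507 bytes, so a large share of numbers may have to be spread over several datagrams.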

Once a backserver receives the actual numbers (a total of N/3 numbers) and the function to be performed, it computes the function value. Let this value for server i be X(i). This step is also called map in MapReduce. If the numbers received by backserver i are n(1), n(2), ..., n(N/3), then the Map operations it performs are as follows.
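The formulas themselves are not reproduced in this text, so the sketch below goes only by the operation names given earlier: MIN and MAX pick the smallest and largest received number, SUM adds the numbers, and SOS adds their squares. The function name map_compute and its error convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical map step for one backserver: computes X(i) over the
 * n numbers in v. Returns 0 on success, -1 for an unknown operation
 * or an empty input. Assumes the result fits in a long; overflow is
 * not checked in this sketch. */
int map_compute(const char *op, const long *v, size_t n, long *result)
{
    if (n == 0)
        return -1;

    long acc;
    if (strcmp(op, "MIN") == 0) {
        acc = v[0];
        for (size_t i = 1; i < n; i++)
            if (v[i] < acc)
                acc = v[i];
    } else if (strcmp(op, "MAX") == 0) {
        acc = v[0];
        for (size_t i = 1; i < n; i++)
            if (v[i] > acc)
                acc = v[i];
    } else if (strcmp(op, "SUM") == 0) {
        acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += v[i];
    } else if (strcmp(op, "SOS") == 0) {
        acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += v[i] * v[i];   /* square each number, then add */
    } else {
        return -1;
    }
    *result = acc;
    return 0;
}
```

For the input 3, -7, 4 this yields MIN = -7, MAX = 4, SUM = 0 and SOS = 74.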

Phase 3

At the end of Phase 2, all backend servers have their answers ready. Let's call the value calculated by backend server i X(i). This value is sent to the AWS server using UDP. The final answer is then calculated by the front-end server (AWS) in the reduce step and handed over to the user.

The front-end server (server D) looks at the type of reduction operation and calculates the final answer, which we call X_final, based on the answers it receives from backservers A, B and C. This step is also called reduce in MapReduce. Depending on the operation requested by the user, we have the following.
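The reduction formulas are not reproduced in this text. A natural reading, assumed in the sketch below, is that X_final is the minimum of the three partial results for MIN, the maximum for MAX, and their sum for both SUM and SOS (each backserver has already squared its own numbers). The function name reduce_results is hypothetical.

```c
#include <string.h>

/* Hypothetical reduce step at AWS: combines the three partial results
 * X(A), X(B), X(C) in x[0..2] into X_final. Returns 0 on success,
 * -1 for an unknown operation. Overflow is not checked. */
int reduce_results(const char *op, const long x[3], long *x_final)
{
    long acc = x[0];
    if (strcmp(op, "MIN") == 0) {
        for (int i = 1; i < 3; i++)
            if (x[i] < acc)
                acc = x[i];
    } else if (strcmp(op, "MAX") == 0) {
        for (int i = 1; i < 3; i++)
            if (x[i] > acc)
                acc = x[i];
    } else if (strcmp(op, "SUM") == 0 || strcmp(op, "SOS") == 0) {
        /* Partial sums (or partial sums of squares) simply add up. */
        for (int i = 1; i < 3; i++)
            acc += x[i];
    } else {
        return -1;
    }
    *x_final = acc;
    return 0;
}
```

This works because all four operations are associative: applying them to the three partial results gives the same answer as applying them to the full list of N numbers.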