Data File Lab – Assignment 1
COMP202 Revised 1 August 2017
Len Hamey
This lab is the first of three assignments in COMP202.
Commences: Week 1.
Progress: Weeks 2, 3, 4.
Due: 11:59pm, Monday Aug 28 (Week 5)
Value: 15% (12% for task, 3% weekly progress)
Overview of the Lab
Data files exist in various formats. In Unix, text files are common for simple data, but large data files
are stored as binary data. In this lab, you will be developing programs to read, write, modify, and
reformat the data in binary data files. The lab consists of a sequence of stages which build on each
other. The first three stages develop your skills. In the final stage you will reverse engineer a data
file using your knowledge of data representations. You have the choice between an easier final
stage (stage 4) that can earn you at most 2 marks for the stage, or a more difficult final stage (stage
5) worth up to 3 marks for the stage. You may attempt both stages 4 and 5, but only the maximum
of the two marks will count towards your total.
The marking outline is:
Section                                           Value
Stage 1                                           2
Stage 2                                           2
Stage 3                                           2
Stage 4 (max 2 marks) or Stage 5 (max 3 marks)    3
Code style                                        3
Progress                                          3
Total                                             15
Learning Outcomes
This lab will involve you in developing the following specific skills and capabilities.
• Able to write programs that use C data structures, pointers and arrays.
• Able to read and write binary data files, and write data in text format.
• Able to convert between different data representations.
• Able to use malloc and free to construct data structures using the heap.
• Able to implement simple command line parameters.
• Able to interpret and recognise binary data representations.
• Able to research Unix library and system calls.
Individual work and information resources
The stages of the lab are based on data files that will be provided to you. Each student will have
their own specific data files to work with, with their own unique data format.
This lab must be your own work. However, you may use resources on the Internet to obtain general
information including information about the C language and libraries, information about binary and
text data formats, and information about the operating system. If you obtain useful information
from the Internet, you must include comments at the relevant points in your code acknowledging
the source of the information (URL) and briefly describing the key idea(s) that you are using.
(Exception: information from the Unix manual pages does not require citation in your program).
The Unix manual pages are available online on ash and iceberg – use the man command. You can
also find Unix manual pages online through Google. For example, to find out about the printf
library call, use the command “man 3 printf” or Google “man printf” and to find out about the
directory listing command ‘ls’, use the command “man ls” or Google “man ls”. However, you
should be wary of using information found online because sometimes there are differences between
different Unix systems and our systems may not behave exactly the same as described in some
online documentation.
The manual pages on the system (man command) are divided into sections:
1. System commands such as ls, wc, etc.
2. Unix system calls such as read(), open(), etc.
3. Unix library such as printf(), fopen(), etc.
4. Sections 4-8 contain other information.
For more information on the man command, use the command “man man” to read the manual
pages about the man command.
Fetching your lab
The lab files are accessed through the lab command which can be found at
/home/unit/group/comp202/lab
There is no Unix man page for the lab command (it is not a Unix system command) but there is
documentation on iLearn and if you don’t give it any command-line parameters or options then it
will print out some brief documentation itself. This feature is common in many Unix programs. To
see how this works, try the following command (where the $ symbol represents the Unix command-line
prompt; type only the command that follows it).
$ /home/unit/group/comp202/lab
The option -g is used to get a lab stage. For example, to get lab 1 stage 1, do:
$ /home/unit/group/comp202/lab -g 1.1
For stage 2, the option would be -g 1.2 instead. Please see the Lab Command Manual in iLearn
for more information about the lab command, including options for submitting assignments, getting
marking reports, checking due dates and claiming your free extension days. Also, you can abbreviate
the command so that you can type “lab” instead of the full path name
“/home/unit/group/comp202/lab”. For the rest of this document, we will use the
abbreviated name.
The lab get command downloads your lab data as a tar file. For stage 1, the tar file is stage1.tar.
Tar is an archive utility (like zip) – it stores many files packed into one file. Use tar to extract the
contents of this file. You can read all about tar in the Unix man page:
$ man 1 tar
Here is the command to extract the contents of stage1.tar:
$ tar xvf stage1.tar
This will create a directory called stage1 and put the downloaded files in that directory.
Submitting your lab solution
Your lab solution can be submitted using the lab command. The option -s is used to submit a
solution to a lab stage. After the option, list all the files that you want to submit. Each time you
submit, it is treated as a fresh submission, so you must list all the files that you want to submit every
time. (If you find that tedious, learn about wildcards in the bash shell.) For example:
$ lab -s 1.1 stage1.c sub.c defs.h
The lab utility sends your submitted files to a server which compiles the C files together into a
program, runs it, and tests that it works correctly for your particular lab assignment. The server
records information about your submission and also sends back information to you through the lab
command.
You can submit as many times as you like. As a matter of personal achievement, you should aim to
achieve a really good score on your initial submit, having checked that your program compiles
without errors and performs correctly on the provided sample data files. However, if there are
problems identified by the auto marker, you can resubmit without penalty.
You must download each stage before you attempt to submit a solution to that stage. You should
download each stage because the download provides you with the input and output data files that
you need in order to test your program yourself.
Progress marks [3 marks]
Each lab assignment includes marks that are awarded for progress on the task each week. The lab
assignments are to be done both during lab sessions (with the assistance of lab supervisors) and in
your own time. Each week that the lab is out, you earn a progress mark if you achieve the specified
milestone by 11:59pm on the specified date. You can earn the progress marks early, but you cannot
earn them late.
If you do not achieve the milestone for a progress mark by the specified date then you lose that
week’s progress mark and the milestone “slips” and becomes due on the next progress date. All the
later milestones also slip back by one week, but the last milestone is lost. If you achieve the slipped
milestone by the new progress date then you receive the progress mark for that date, but you have
lost the progress mark for the missed date and you cannot make it up later.
Milestones
• Monday of Week 2: Stage 1 achievement mark of at least 1.5/2.0
• Monday of Week 3: Stage 2 achievement mark of at least 1.5/2.0
• Monday of Week 4: Stage 3 achievement mark of at least 1.5/2.0
• Monday of Week 5: Lab closes
Detailed marking guides for each stage
When you extract the files for a stage you will also find a file called marking-guide.txt in the
extracted files. This text file contains a detailed marking rubric for the stage. The auto marker uses
this rubric to mark your submission for the stage. The marking guide includes detailed notes that
describe how each mark is calculated and what is being marked.
In later stages, some auto marker checks are thresholds. Threshold conditions may not earn you
marks, but are required for your program to be eligible to earn other marks. The marking report will
display if any threshold has failed, and it will indicate which marks are suppressed due to the failed
threshold. Thresholds and marks that require thresholds are indicated in the marking guide
(marking-guide.txt).
Stage 1: Initialising a C struct and printing it out as text [2 marks]
In this stage you will declare a C data structure, create an instance of it and statically initialise it
(declare it as a static or global variable and initialise it in one statement using braces). You will then
print out the instance. This stage develops the following specific skills:
• Declaring a C struct.
• Initialising a C struct.
• Printing various data types using printf.
Note: Do not use bit fields in your struct. All the data types that are specified correspond to
ordinary C data types.
Resources
The following documents on iLearn may be helpful:
• Compile, Run, Make C Programs on Linux
• C Programming Notes for Data File Lab
Your downloaded stage1.tar file contains the following files:
• filestruct-description.txt: A simple description of the fields that are in your
struct: their names and type descriptions.
• initialisation-specification.txt: Specifies the initial value for each field of
your struct. The initial value has to be formatted in a specific way in your source code. This
may mean that you have to convert one representation to another. See the lab note
Decimal, Binary, Octal and Hex. Note: It makes no difference to the data that is stored inside
the computer whether you initialise the field with decimal or the equivalent hexadecimal or
octal. However, as an exercise, we require you to make the appropriate type conversions
and the automarker will check your code.
• expected-output.txt: Stage 1 expected output file. Use the example in this file to
work out what formatting options to use in printf.
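The note above about equivalent constants can be illustrated with a small sketch. The value 255 and the names used here are arbitrary choices for illustration; they are not taken from any specification file.

```c
#include <stdio.h>

/* The same value written three ways; the stored bits are identical. */
static const unsigned int as_decimal = 255;
static const unsigned int as_hex     = 0xFF;  /* hexadecimal: 0x prefix */
static const unsigned int as_octal   = 0377;  /* octal: leading zero    */

/* Returns 1 when all three constants hold the same value. */
int constants_agree(void)
{
    return as_decimal == as_hex && as_hex == as_octal;
}

/* Prints one value in decimal, hexadecimal and octal notation. */
void print_three_ways(unsigned int value)
{
    printf("%u 0x%X 0%o\n", value, value, value);
}
```

Only the source text differs between the three forms, which is why the automarker has to inspect the code rather than the program's output to check the notation.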
Useful Unix commands
You might find the following Unix system commands helpful.
• cat
• diff
Task
Write a C program that declares your particular data structure as described in the C structure
description file. Statically initialise an instance of the data structure to the initial values as specified
in the file – use the data formats as specified in the file such as hexadecimal, decimal or octal
constants. In the main program, print out the data structure using printf formatting to make it
exactly match the provided sample output file. Note that you may need to use various formatting
options with printf to control the appearance of the output. You are expected to read about
printf and work out how to format the data so that it exactly matches the expected output.
Submit your program for automatic assessment using the lab command. Your program style may
be assessed according to the coding standards in the documents Some Important Comments on Code
Style and Systems Programming Style, which are available on iLearn.
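As a rough sketch of the Stage 1 task, the code below declares a struct, statically initialises a global instance with braces, and formats it with printf-style conversions. The field names, types, initial values and output layout are all hypothetical; the real ones come from filestruct-description.txt, initialisation-specification.txt and expected-output.txt.

```c
#include <stdio.h>

/* Hypothetical record layout, for illustration only. */
struct record {
    char         code;     /* single character                     */
    short        count;    /* small signed integer                 */
    unsigned int flags;    /* printed in hexadecimal below         */
    double       ratio;    /* floating point                       */
    char         name[8];  /* fixed-size, NUL-terminated string    */
};

/* Global instance, initialised in one statement using braces. */
struct record sample = { 'A', -12, 0x1F, 2.5, "lab" };

/* Formats one record into buf; returns the value returned by snprintf. */
int format_record(char *buf, size_t size, const struct record *r)
{
    return snprintf(buf, size,
                    "code: %c\ncount: %d\nflags: %#x\nratio: %.2f\nname: %s\n",
                    r->code, r->count, r->flags, r->ratio, r->name);
}

/* Prints one record to standard output. */
void print_record(const struct record *r)
{
    char buf[128];
    format_record(buf, sizeof buf, r);
    fputs(buf, stdout);
}
```

A main function would simply call print_record(&sample). Keeping the formatting in its own function makes it easy to reuse when later stages print many records.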
Stage 2: Reading a binary data file and printing it out [2 marks]
In this stage you will read a binary data file in a known format, storing the information into instances
of a C data structure which you will then print out. This stage develops the following specific skills:
• Reading binary data.
• Opening and closing files.
• Printing various data types using printf.
• Using a simple command-line parameter.
Resources
• filestruct-description.txt: Describes the members of the C data struct which
correspond to fields in the records of the data file.
• input-.bin: Sample binary input files.
• output-.txt: Sample text output files corresponding to the input files.
Useful Unix commands
You might find the following Unix system commands helpful.
• “more” or “less”
• diff
• od
Task
Write a C program that reads a file of binary data records as described in the structure description
file. The program will read and print all the records in a binary data file where each record has the
format described in filestruct-description.txt. You already developed code to print out
a single record in stage 1, so the focus of this stage is reading a binary data file into memory.
The output formatting requirements for this stage are the same as in stage 1. However, it is possible
that you may need to modify your record printing code – it could be that your printf call worked
correctly for the single initialised record in stage 1 but it may not be correct for all the data records
in the files. You should check the output against the expected output using diff, and improve your
printf statement in whatever way is needed to get the correct output.
Your program must accept one command-line parameter which is the name of the input file.
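One possible way to handle the single command-line parameter is sketched below. The function name open_input and the wording of the usage message are illustrative assumptions, not requirements of the lab.

```c
#include <stdio.h>

/* Opens the input file named by the first command-line parameter.
   Returns NULL, after printing a message to stderr, on any error. */
FILE *open_input(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s input-file\n",
                argc > 0 ? argv[0] : "program");
        return NULL;
    }

    FILE *fp = fopen(argv[1], "rb");  /* "rb": read, binary mode */
    if (fp == NULL) {
        perror(argv[1]);              /* reports why the open failed */
    }
    return fp;
}
```

A main function could call open_input(argc, argv), exit with a failure status if it returns NULL, and call fclose on the returned stream once all records have been read.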
The fields of the records are stored using the types specified in the data file description. The fields
are stored packed next to each other in the data file. You cannot read the entire record directly into
a C struct in one call because C inserts additional unused space between some of the fields in the
struct (this is called alignment padding; we will discuss it later in COMP202 lectures). You must
read the data record one field at a time. It is suggested to use fread to read each field.
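The sketch below shows reading one packed record field by field with fread. The struct is hypothetical (a char, an int and a double), and the code assumes the file stores each field with the same size and byte order as the machine running the program. On a typical Linux system the three fields occupy 13 bytes in the file, while sizeof(struct record) is usually larger because of alignment padding.

```c
#include <stdio.h>

/* Hypothetical record, for illustration only. */
struct record {
    char   code;
    int    count;
    double ratio;
};

/* Reads one packed record, one field at a time.
   Returns 1 on success, 0 on end of file or a short read. */
int read_record(FILE *fp, struct record *r)
{
    if (fread(&r->code,  sizeof r->code,  1, fp) != 1) return 0;
    if (fread(&r->count, sizeof r->count, 1, fp) != 1) return 0;
    if (fread(&r->ratio, sizeof r->ratio, 1, fp) != 1) return 0;
    return 1;
}
```

A caller can loop with while (read_record(fp, &r)) and print each record as it arrives, so Stage 2 does not need to hold the whole file in memory.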
Each record that you read should be printed out as text. Your output should exactly match the
sample output files.
Remember that coding style is important: use good modularisation, and use header files
appropriately. Your program’s style may be assessed according to the coding standards in the
documents Some Important Comments on Code Style and Systems Programming Style.
Submit your program for marking using the lab command. We may use additional data files for
testing, including files that are larger than the samples provided to you.
Stage 3: Sorting a binary data file [2 marks]
In this stage you will sort files of binary data in a known format. This stage develops the following
specific skills:
• Reading and writing binary data files.
• Opening and closing files.
• Working with pointers to structures.
• Memory allocation, dynamically sizing an array.
• Using system library routines (specifically, a system library sort routine).
• Writing code to compare structures with a lexical sort order.
• Using a function pointer in C.
Resources
• filestruct-description.txt: Describes the members of the C data structure which
correspond to fields in the records of the data file.
• filestruct-sort.txt: Specifies the sorting order.
• input-.bin: Sample binary input files.
• output-.bin: Sample binary output files corresponding to the input files. The output
files contain the same data as the input files, but the records are sorted.
Useful Unix commands
You might find the following Unix system commands helpful.
• od
• cmp
Task
Modify your program from stage 2 so that it reads the input file (parameter 1), storing all the records
into a dynamic array in memory. The program should then sort the data records and write the
output file (parameter 2) in sorted order.
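Since the output files hold the same data as the input files, it seems reasonable to assume the output uses the same packed layout, which suggests writing each field separately with fwrite. The sketch below takes that assumption; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical record, for illustration only. */
struct record {
    char   code;
    int    count;
    double ratio;
};

/* Writes one record in packed form, one field at a time.
   Returns 1 on success, 0 if any write fails. */
int write_record(FILE *fp, const struct record *r)
{
    if (fwrite(&r->code,  sizeof r->code,  1, fp) != 1) return 0;
    if (fwrite(&r->count, sizeof r->count, 1, fp) != 1) return 0;
    if (fwrite(&r->ratio, sizeof r->ratio, 1, fp) != 1) return 0;
    return 1;
}

/* Writes n records in array order; returns how many were written. */
size_t write_all(FILE *fp, const struct record *recs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (!write_record(fp, &recs[i])) break;
    }
    return i;
}
```

The output stream would be opened with fopen(argv[2], "wb"). Comparing the result against a sample output file with cmp shows whether the bytes match exactly.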
Use the Linux library sort routine qsort to perform the sorting. Use the Unix manual (section 3) to
find out how to call the qsort library routine. Hint: you must write a comparison routine that can
compare two structures according to the sort order specified for your lab.
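A minimal sketch of a comparison routine for qsort follows. The struct and the sort order (name ascending, then count descending) are invented for illustration; the real order is given in filestruct-sort.txt.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record, for illustration only. */
struct record {
    char name[8];
    int  count;
};

/* Hypothetical order: name ascending, then count descending.
   Returns negative, zero or positive, as qsort requires. */
int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    int c = strncmp(ra->name, rb->name, sizeof ra->name);
    if (c != 0) return c;

    /* Compare rather than subtract, so large values cannot overflow. */
    if (ra->count != rb->count) {
        return (ra->count > rb->count) ? -1 : 1;
    }
    return 0;
}

/* Sorts n records in place; compare_records is passed as a function pointer. */
void sort_records(struct record *recs, size_t n)
{
    qsort(recs, n, sizeof recs[0], compare_records);
}
```

Each later key is consulted only when all earlier keys compare equal, which is what makes the order lexical.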
Your program will need to store all the records in memory in order to sort them. The program will
allocate a dynamic array of structs (or some other data structure), and read the data file into the
array. You do not know how large the file may be, so you must accommodate different file sizes.
Here are two possible approaches (there are others).
1. Dynamic sized array: Allocate an initial array of some size (e.g. 100 records) and then if
(while reading the file) you find that the array is not large enough then use realloc to
double the size of it. If it cannot grow the array in place, realloc allocates a new larger array
in memory and copies the data from the existing array to it, before freeing the original array. Repeatedly
doubling the size allows you to accommodate arbitrarily large data files without copying the
data too many times. See the Unix manual pages for malloc and realloc.
2. Compute the number of records from the file size: This is a systems approach that will
require some reading to find out how to do it. There is a system call stat that can tell you
the total number of bytes in a file. There are also other ways to find out how many bytes are
in a file but you should NOT read the entire file just to find out how big it is! Your file
description gives you the information about how long each record is, so you can compute
the number of records in the file from the number of bytes. You can then allocate an array
of struct to the exact correct size using malloc. See the Unix manual pages for stat
and malloc.