首页 > > 详细

COMP 3430 - Operating systems Assignment 4

 COMP 3430 - Operating systems

Assignment 4
 
Franklin Bristow
 
Summer 2022
 
Description
General submission requirements
Implementing a File System reader
exFAT documentation
Volumes
Required commands
info
list
get
Implementation notes
Sectors ↔︎ clusters
Root directory and cluster number offsets
Directory entries
Unicode and ASCII
Boot sector
Binary files
Evaluation
Implementation
Submitting your assignment
General Advice
Description
For assignment 4, we’re looking for you to be able to write software that works with a file system.
 
Specifically, you’re going to be writing code that can read an exFAT file system, and extract files from that file system.
 
Assignment 4 is due Wednesday (NOT Friday), August 10th, 2022 at 11:59pm.
 
General submission requirements
Submissions that violate any of the following will not be evaluated (i.e., you will receive a score of 0 for the assignment):
 
All solutions must be written in C. No other languages are accepted.
 
All solutions must include a working Makefile.
 
Forget how to make a Makefile? Never knew how to make a Makefile?
 
Use one of the Makefiles that have been distributed to you in class (you have permission to use this!), or
You can go to https://makefiletutorial.com and find some very easy to use basic examples.
Your Makefile should contain a clean rule. When
 
make clean
is run in your directory, all the temporary files, including the executable should be removed (e.g., rm -f my_prog).
 
All solutions must be compiled by issuing the command make in the directory containing the Makefile. No other build commands are acceptable, even if documented in the README.md.
 
All solutions must include a Markdown-formatted README.md file that minimally describes how to run your submission.
 
Forget how to make a README.md? Never knew how to make README.md? Thankfully, you can go to https://readme.so/ and build a nice, Markdown-formatted README.md with a nice preview and some handy buttons to help you build it.
All solutions must run to successful completion. Premature termination for any reason (no obvious output, Segmentation Fault, Bus Error, etc) is considered to be a solution implementation that does not run.
 
For programs that are expected to terminate on their own (without your intervention), programs should complete in a reasonable amount of time (e.g., < 30 seconds).
 
All solutions must compile. Code that does not compile will not be evaluated.
 
Programs must produce no errors when compiled with the flags
 
-Wall -Wpedantic -Wextra -Werror
Note that -Werror prevents your code from being compiled when warnings are present.
 
If these flags are not in your Makefile, your submission will be treated as though it does not compile.
 
Your code must compile and run on a machine on aviary.cs.umanitoba.ca.
 
No late submissions will be accepted. The assignment deadline is enforced electronically. Submissions will not be accepted by e-mail.
 
Reminder: All submitted code will be evaluated using an automated similarity testing tool, alongside commonly available online solutions for problems and students submissions from past terms.
 
Implementing a File System reader
🎵 Butterfly in the sky…
🎵 Butterfly in the sky…
In this question, you will write a program that reads an exFAT-formatted volume. For this assignment you are provided with a couple of volume images, but you are encouraged to read a USB flash, SDHC, or SDXC drive as a raw device (e.g., /dev/sda3), provided that you have physical access to a Linux machine.
 
exFAT documentation
The general structure of the FAT or exFAT file systems were discussed in class, so you should take the time to review this general structure if you haven’t already.
 
You should also open Microsoft’s exFAT file system specification. This document comprehensively describes the exFAT format, and this should be your universal source of truth for exFAT.
 
Volumes
Two volumes have been prepared for you to use to test your program. The two volumes are distributed as a packaged zip file that contains a README.md describing the two volumes, a text file with a list of all files that are in the volumes, and the two volumes themselves.
 
The graders may not use these volumes to evaluate your work.
 
Required commands
You should implement 3 commands that are all part of the same program: info, list, and get.
 
info
The info command will print information about the volume. The command, assuming your program is named exfat, would be
 
./exfat imagename info
Print out the following:
 
Volume label.
Volume serial number.
Free space on the volume in KB.
The cluster size, both in sectors and in bytes OR KB.
Note: 1KB ↔︎ 1024 bytes.
 
Volume label
The volume label is encoded as a directory entry in the root directory. You can find information about the volume label in section 7.3 Volume Label Directory Entry.
 
The Volume label is a Unicode-formatted string, please see Unicode and ASCII at the bottom for some additional information.
 
Volume serial number
The volume serial number is in the boot sector, and you can find information about it both in sections 3.1 Main and Backup Boot Sector Sub-regions, and in section 3.1.11 VolumeSerialNumber Field.
 
You should print this out as a numeric value (i.e., "%u", unsigned).
 
Free space
exFAT uses allocation structures (bitmaps) to represent used and unused clusters. You can use these allocation bitmaps to calculate the free space in the volume (the number of unset/0 bits is the number of unallocated clusters).
 
The allocation bitmap is encoded as a directory entry in the root directory. You can find information about the allocation bitmap in section 7.1 Allocation Bitmap Directory Entry.
 
You should use code that you wrote for lab 4 to determine this value.
 
Cluster size
The cluster size in sectors and the sector size in bytes can be found in the boot sector. You can find more information about both of these fields in 3.1 Main and Backup Boot Sector Sub-regions.
 
Remember: These fields are called ___Shift because you can use the left shift operator (<<) to quickly compute powers of two:
 
uint8_t x = 0x1 << 3; // 2^3 -> 8
list
The list command will recursively print all files and directories in the volume. The output should look roughly like the output from tree (try running tree $YOUR_A2 on aviary.cs.umanitoba.ca).
 
You don’t have to print out fancy Extended ASCII characters, but instead can use the - symbol to denote depth, where the number of - characters indicates the depth of the file/folder. Using the Extended ASCII set would be a nice touch, though.
 
The command, assuming your program is named exfat, would be
 
./exfat imagename list
Output should look something like:
 
egret.cs.umanitoba.ca 101% ./exfat image list
File: file.txt
Directory: folder
- Directory: folder2
-- Directory: folder3
--- File: thefile.txt
File: file2.txt
In this example, the root directory contains references to two files: file.txt and file2.txt, and one directory: folder. The folder directory contains a reference to one folder: folder2. The folder2 directory contains a reference to one folder: folder3. The folder3 directory contains a reference to one file: thefile.txt.
 
Order of files is not important (you don’t have to print all files before directories, for example). The recommended strategy here is to just print out files and folders in the same order that they are in in terms of the sequence of DirectoryEntry in the folders. Likewise, the sequence of directories that you follow is also not important. You’re going to have to do this recursively, but it’s up to you if you want to do a breadth-first or a depth-first implementation.
 
get
The get command will extract a complete file from the volume. The command, assuming your program is named exfat, would be
 
./exfat imagename get path/to/file.txt
The path you pass to your program is assumed to be an absolute path (it starts at the root directory), even if it doesn’t have a leading /.
 
The path you pass to your program should be written out as a file to the same directory as your program with the same name as the file in the exFAT image. If the example above is executed, you should write the contents of the file at path/to/file.txt from the disk image to a file named file.txt in the same directory as the ./exfat binary executable.
 
Implementation notes
The exFAT file system specification is fairly comprehensive and generally very good, but some parts of the documentation are not straightforward. Additionally, exFAT supports Unicode characters, and while that’s an excellent property of a file system, working with Unicode characters in C isn’t a great experience.
 
Below is some additional information that you can use to help guide you through working with exFAT.
 
Sectors ↔︎ clusters
The first chunk of the file system (3.1 Main and Backup Boot Sector Sub-regions) are organized in sectors, regardless of the number of sectors per cluster.
 
The FATs are also organized in sectors, but a single FAT may span multiple sectors.
 
The data heap itself is organized in clusters, which may or may not be the same size as a sector. Make sure that you’re using the right units in the right regions! You should consider writing a utility function or macro that quickly converts from cluster number to bytes or from sector number to bytes.
 
Root directory and cluster number offsets
An easily overlooked statement in 5.1 Cluster Heap Sub-region is
 
Importantly, the first cluster of the Cluster Heap has index two, which directly corresponds to the index of FatEntry[2].
 
That means that cluster indexing is effectively a 2-based array (what kind of a maniac came up with this?). This means that if, for example, the value in the boot sector for the root directory is 5, then the root directory is actually stored starting at cluster number 3 (5 − 2 = 3).
 
Directory entries
Files, directories, and some FS metadata are encoded as directory entries in the data heap region. The description of how to interpret types of directory entries is not great (what the heck does “Benign primary” even mean???).
 
While you actually can figure out the specific EntryType values by building up sequences of bits for each type of DirectoryEntry, it’s easier to just look up these values somewhere else. An admittedly spammy looking page selling some software called “Active UNDELETE” has a nice list of the exFAT directory entry types that includes actual numeric values that you can use for comparing against the EntryType field in the directory entry.
 
The directory entries that correspond to one single file will exist as three separate, but sequential directory entries:
 
A file directory entry.
Exactly 1 stream extension directory entry.
At least 1 file name directory entry.
You need to read all of those directory entries to be able to get the name of the file that you’re looking for.
 
Note: Individual directory entries are contained entirely within one cluster, but an entry set for a single file may span multiple clusters. In other words, if the cluster chain for a directory has more than one cluster (let’s say cluster 3 and cluster 10), the file and stream extension entries for a file may be in cluster 3, but the file name directory entry/entries may be in cluster 10.
 
Unicode and ASCII
Many fields in this file system are defined as 16-bit Unicode strings. That means that characters are 16 bits wide instead of 8 bits wide. Provided that your Unicode string consists only of ASCII characters (so unfortunately لا أتحدث اللغة العربية, 沒有中文, ਕੋਈ ਪੰਜਾਬੀ ਨਹੀਂ, and no emoji 🦖), you can trivially convert a Unicode string into a regular C string by stripping off the top 8 bits:
 
/**
 * Convert a Unicode-formatted string containing only ASCII characters
 * into a regular ASCII-formatted string (16 bit chars to 8 bit 
 * chars).
 *
 * NOTE: this function does a heap allocation for the string it 
 *       returns, caller is responsible for `free`-ing the allocation
 *       when necessary.
 *
 * uint16_t *unicode_string: the Unicode-formatted string to be 
 *                           converted.
 * uint8_t   length: the length of the Unicode-formatted string (in
 *                   characters).
 *
 * returns: a heap allocated ASCII-formatted string.
 */
static char *unicode2ascii( uint16_t *unicode_string, uint8_t length )
{
    assert( unicode_string != NULL );
    assert( length > 0 );
 
    char *ascii_string = NULL;
 
    if ( unicode_string != NULL && length > 0 )
    {
        // +1 for a NULL terminator
        ascii_string = calloc( sizeof(char), length + 1); 
 
        if ( ascii_string )
        {
            // strip the top 8 bits from every character in the 
            // unicode string
            for ( uint8_t i = 0 ; i < length; i++ )
            {
                ascii_string[i] = (char) unicode_string[i];
            }
            // stick a null terminator at the end of the string.
            ascii_string[length] = '\0';
        }
    }
 
    return ascii_string;
}
You are permitted to use this code fragment in your submission.
 
Boot sector
We provided this to you for lab 4, but here’s the complete boot sector struct for your convenience:
 
#pragma pack(1)
#pragma pack(push)
typedef struct MAIN_BOOT_SECTOR
{
    uint8_t jump_boot[3];
    char fs_name[8];
    uint8_t must_be_zero[53];
    uint64_t partition_offset;
    uint64_t volume_length;
    uint32_t fat_offset;
    uint32_t fat_length;
    uint32_t cluster_heap_offset;
    uint32_t cluster_count;
    uint32_t first_cluster_of_root_directory;
    uint32_t volume_serial_number;
    uint16_t fs_revision;
    uint16_t fs_flags;
    uint8_t bytes_per_sector_shift;
    uint8_t sectors_per_cluster_shift;
    uint8_t number_of_fats;
    uint8_t drive_select;
    uint8_t percent_in_use;
    uint8_t reserved[7];
    uint8_t bootcode[390];
    uint16_t boot_signature;
} main_boot_sector;
#pragma pack(pop)
As with lab 4, you are permitted to use this code in your submission.
 
Binary files
Remember that the volumes that have been shared with you are binary files. With that in mind, you can (and should!) use all of the same tools that you used for parsing an ELF-formatted binary file in A1. As a brief reminder, those tools include
 
The uintX_t family of integers from stdint.h.
The hexdump command-line tool.
In assignment 1 we were working with binary files where the layout of the structures in the file conditionally changed based on the type of file that it was (e.g., 32-bit vs 64-bit files). The exFAT file system does not have any structures that are sized conditionally. That means that you can actually use (and are recommended to use) the “read the entire struct” strategy. Remember: this means that you need to use preprocessor directives (#pragma pack) to make sure that the compiler doesn’t change the layout of your struct.
 
Evaluation
Implementation
5 points are awarded for code quality and design:
 
Level Description
0 The code is very poor quality (e.g., no comments at all, no functions, poor naming conventions for variables, etc).
1–3
The code is low quality, while some coding standards are applied, their uses is inconsistent (e.g., inconsistent use of comments, some functions but functions might do too much, code is repeated that should be in a function, etc).
 
This is the maximum level you can earn if the implementation of your program is substantially incomplete.
 
4–5 The code is high quality, coding standards are applied consistently throughout the code base.
Each command (info, list, and get) will be graded out of 5 points, for a total of 15 points (5 × 3):
 
Level Criteria
0 No attempt is made to answer this question or code does not run or compile.
1–2 The submitted code is substantially incomplete. Many of the requirements for the assignment are not fully implemented.
3–4 The submitted code is substantially complete, but some minor parts are still missing.
5 The submitted code is complete, all major functionality works as expected.
Submitting your assignment
Submissions will be made using the handin command available on all CS UNIX systems.
 
You should be submitting at least the following files:
 
A Makefile.
A README.md that includes a summary of how to compile and run your program (compiling should be “run make”, but you should tell the grader how to run your program, e.g., what arguments to pass to the program on the command line).
Your solution for question 1 (probably just 1 .c file).
Please do not include any disk images in your submission.
 
If your files are in a folder named my_a4, you would run the command:
 
handin 3430 a4 my_a4
If you need more information, you can see man handin.
 
General Advice
Here’s some general advice that you can choose to follow or ignore:
 
The debugger is your best friend. Don’t even start trying to solve this problem until you’re comfortable with a debugger (either lldb or gdb). The first question I’m going to ask you when you come to ask me for help is: “Have you tried running it through the debugger yet?” If you answer “no”, I’ll ask you (politely!) to do that first, then come back.
Source control is your second best friend. The department hosts an instance of GitLab that you can use to store private projects. You’re also welcome to use private repositories on GitHub.
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!