首页 > > 详细

Systems Programming Lab 01: Malloc编程讲解

Systems Programming Summer 2023
Lab 01: Malloc

Changelog
If any changes must be made to the lab, they will be posted here (and probably other places as well).

Goals
● Learn about how Malloc works under the hood
● Get used to systems programming environment

Deadlines
Part 1
The Part 1 Checkpoint of this lab is due Monday January 22nd, 2024, 11:59pm . Your submissions will be checked in your lab that week, but they are due on Sunday.
There will be no late submissions accepted for Part 1. If you choose not to submit a Part 1, your most recent submission before the deadline will be pulled and whatever grade you receive for that will count as 10% of your final grade.
Final Submission
This lab will be due Monday January 29th, 2024, 11:59pm. Each day late will receive a penalty of 10pts of the lab grade. The lab will be graded during your lab time that week.

Diagram Legend
This handout makes use of numerous diagrams to assist in the understanding of the lab. The key for those diagrams is thus:

Introduction
As you no doubt know by now, a major part of C programming is the management of memory. You have used malloc() and its kin before, but now it is time to delve into how malloc() works.
In this lab, you will implement a memory allocator, which allows users to malloc() and free() memory as needed.
Your allocator will request large chunks of memory from the OS, and manage all the bookkeeping and memory efficiently.
The allocator you will implement is inspired by the DLMalloc allocator designed by Doug Lea.
Also inspired by the DLMalloc allocator is the PTMalloc allocator, which is currently used by GLibC. Our allocator is a simplified version, but if you look through the approach taken by DLMalloc you will notice many similarities.

Background
UNIX provides a rudimentary API for managing heap memory. Using the system call sbrk(intptr_t increment), we can extend (given a positive integer) and shrink (given a negative integer) the size of the heap. Unfortunately, sbrk() is a very limited syscall: it can only deallocate memory in the order it was allocated. This means that if we used sbrk() to manage memory, in order to be able to free any memory we would need to free the memory in the reverse order it was allocated. This implementation is too coarse-grained for our needs.
Consider the situation below:
// extend the heap to allocate the big array int * big_array = malloc(BIG_NUMBER); ...
// extend the heap to allocate a string char * name = malloc(32); ...
// big_array cannot be freed until name is freed if we can only // deallocate by moving the top of the heap back. free(big_array);
If we managed the memory from the OS ourselves, we could allow the freeing of variables like big_array, and then could reallocate them later when a user requests more memory.

Essentially, all you get from the OS is a way to extend and shrink the heap
How do we implement a more fine grained allocator?
We know that we somehow need to manage the blocks of memory that we are allocating and freeing. A simple way to do this is to add metadata to each block. The metadata will allow us to maintain the size of each block and whether or not it is allocated. We can request a large chunk from the OS and split it up into smaller blocks to service users’ requests. When a user frees a block we can mark it as unallocated so that we can reallocate it for another request. This approach is much more versatile than if we were using sbrk() alone.

How do we find blocks which have been freed?
We still need a way to find the blocks that are unallocated. To do this, we use a linked list data structure we call a free list. We add a next and previous pointer to each block’s metadata so that we can iterate over the unallocated blocks. This allows us to perform a simple list search to find a block which can satisfy our requests.

Dealing with memory fragmentation
Now we can allocate memory, and to free it we can return the blocks to the free list.
Unfortunately, because we are splitting up large blocks into smaller blocks, when the user requests memory we will end up with many small blocks, but some of those blocks may be able to be merged back into a larger block. To address this issue we could iterate over the free list and try to find if the block we are trying to free is adjacent to another already free block. If the neighboring blocks are free, we can coalesce them into a single larger block.

Fragmentation leads to many small chunks in the free list, which cannot satisfy a large request even if the sum of the small chunks could. Coalescing reverses the fragmentation. The fragmented list diagram (top) has a set of three fragmented blocks. The bottom diagram is the same except the fragmented blocks have been coalesced.
Dealing with the edges of the chunks
One detail we must consider is how to handle the edges of the chunks from the OS. If we simply start the first allocable block at the beginning of the memory chunk, then we may run into problems when trying to free the block later. This is because a block at the edge of the chunk is missing a neighbor.
A simple solution to this is to insert a pair of fenceposts at either end of the chunk. The fencepost is a dummy header containing no allocable memory, but which serves as a neighbor to the first and last allocable blocks in the chunk. Now we can look up the neighbors of those blocks and don’t have to worry about accidentally coalescing outside of the memory chunk allocated by the OS, because anytime one of the neighbors is a fencepost we cannot coalesce in that direction.
Conceptual Question: What would happen if we tried to free and coalesce the first block in a chunk without fenceposts? (Hint: the result is nondeterministic)

This design allows us to allocate and free memory without having memory fragmentation problems, but there are some optimizations we can add to improve space and time complexity

Slowly building up the data structures required to handle memory allocation. a: The heap supporting only an increase decrease size operation.
b: Adding block metadata to maintain information about different blocks of heap memory.
c: Adding linked list pointers to maintain all free nodes in the free list. d: Adding fencepost tags to mark the beginning and end of the heap chunks retrieved by sbrk().

Optimizations
Constant Time Coalesce
Naive solution
Above we mention that we can iterate over the free list to find blocks that are next to each other, but unfortunately that makes the free operation O(n), where n is the number of blocks in the list. Optimized solution
The solution we will use is to add another data structure called Boundary Tags, which allow us to calculate the location of the right and left blocks in memory. To calculate the location of the block to the right all we need to know is the size of the current block. To calculate the location of the block to the left we must also maintain the size of the block to the left in each block’s metadata. Now we can find the neighboring blocks in O(1) time instead of O(n). Conceptual Question: Are boundary tags useful when allocating?

Using the size fields in the boundary tags, we can calculate the location of the neighbor blocks without needing the free list at all and in O(1) time.
Multiple Free Lists
Naive solution
Above we have a single free list containing all of the free blocks. To find a block large enough to satisfy a request, we must iterate over all of the blocks to find a block large enough to satisfy the request.
Optimized solution
We can use multiple free lists. We create n free lists one for each allocation size (16, …, 8*(n-1), 8*n bytes.) That way when a user requests memory, we can jump directly to the list representing blocks that are the correct size, instead of having to look through a general list. In the case that list is empty, the next nonempty list will contain the block best fitting the allocation request. However, we only have n lists, so if the user requests 8*n bytes of memory or more, we fall back to the naive approach and scan the final list for blocks which can satisfy the request. This optimization cannot guarantee an O(1) allocation time for all allocations, but for any allocation under 8*n the allocation time is O(number of free lists) as opposed to O(length of the free list).
Conceptual Question: What is the time complexity (Big-O notation) of allocations over 8*n in this optimization? Can you think of a situation when the running time of both solutions would be the same?

With the single free list we must look at all of the small blocks before finally finding the large block. If we maintain size-based free lists, we can simply jump straight to the large enough block.
Reducing Our Metadata Footprint
Naive solution
In our solution above, the memory allocated to the user begins after all of the block’s metadata. This is because we must maintain the metadata like size and allocation status for

when we free the object.
Optimized solution
While we need to maintain the size and allocation status we only use the free list pointers when the object is free. If the object has been allocated it is no longer in a free list and thus the memory that used to store the pointers can be used for other purposes. By placing the next and previous pointers at the end of the metadata we can save an additional 2 * sizeof(pointer) bytes and add that to the memory allocated to the user.
Conceptual Question: What is the minimum number of bytes an allocation can take up with this solution, including both the metadata and the memory requested? (Assume 64-bit architecture)
More Optimizations
The allocated flag that tells if a block is allocated or not uses only one bit. Also, we will later see that the sizes are rounded up to the next 8 bytes and therefore the last three bits are not used. Instead of using an integer to store the allocated flag we will use one of the bits are an unused in the size. That will save an additional 8 bytes.
Coalescing chunks from the OS
Naive solution
When additional memory is required from the OS we can simply add the new chunk to the appropriate free list, but this limits our allocator to only service allocation requests which are less than the arena size.
Optimized solution
Recall that when calls are made to sbrk(), they are retrieved in ascending order. As long as the user program does not call sbrk() in between calls we make to sbrk(), we can coalesce the new chunk with the most recently allocated chunk. This allows us to form larger chunks which can service requests larger than a single arena’s size.
Conceptual Question: Is it possible to coalesce chunks that are not contiguous?

When chunks from the OS are adjacent, we should coalesce them with the last block in the previous chunk.

Lab Specification
General Malloc Spec
The malloc interface is specified in the malloc man page, but leaves many details up to the library’s authors. Consider the numerous optimizations mentioned above; all versions of malloc would be correct by the specification in the man page, but some are better than others, as we’ve shown.
Our Implementation Design Spec
In the background and optimization sections above, we have described the basic implementation we will follow with a set of optimizations. Here we will cover the technical specification of the required design. Some of the requirements are in place to enforce conformance to the design, but others are purely to guarantee determinism between our reference allocator and your allocator for the purposes of testing. The specification below should contain all of the details necessary to make sure your implementation is consistent with the reference implementation. You should read the specification, but it is more for reference when working on the lab and below the specification is a more detailed description of the various tasks to be completed in the lab.
Terminology
Term Definition
chunk ARENA_SIZE sized buffer of memory retrieved from the OS using the sbrk() call. The size of a chunk can be increased by coalescing two chunks together
block Piece of memory containing both metadata and allocable memory. Freed blocks make up all the nodes of the free list excluding the sentinels
metadata A general term for pieces of information malloc uses to manage blocks
allocated metadata Metadata maintained when a block is allocated. This excludes metadata that concerns the free list. It only includes:
- size
- left_size
- allocated
unallocated metadata Metadata maintained when a block is not allocated. This includes all the allocated metadata and the free list pointers:
- next
- prev
header An implementation of metadata. Can either include the free list pointers or not (see the header struct definition in myMalloc.h)
sentinel A header that indicates the start of its corresponding free list. This includes unallocated metadata.
freelistSentinels An array of sentinels initialized at the start of malloc that orders blocks by the allocable size. The first list at index 0 is never used.
fencepost A header with only allocated metadata that denotes the edges of a chunk. When coalescing chunks, the two fence posts between the chunks must be removed
ARENA_SIZE The number of bytes of memory requested of the OS using the sbrk() call. This is a parameterized value which is set for each test case individually (see the tests Makefile: -D ARENA_SIZE={size})
actual size The number of bytes a block takes up in memory. This includes both the amount requested by the user program and the metadata.
allocable size The size that’s available for allocation in the block. This does not include the size of the allocated metadata. This is the size used for indexing into the freelistSentinels array. This may not be equal to the request size.
request size The size the user program has requested. In allocate_object, this is the raw_size.
Data Structures
● The size and left_size fields refer to the “actual size” of a block in memory, which includes the allocable memory and metadata
● All insertions into a free list are made to the end of the list.
● All free lists are implemented as circular doubly linked lists with a sentinel node.

An empty linked list containing only the sentinel node (which contains no data) and a list with three data nodes
● The array of free list sentinels represent our multiple free lists. The array of free lists should be organized as follows. (Allocation sizes are just allocable memory excluding the required size for metadata)
Index Min Allocation Max Allocation
0 1 16
1 17 24
... ... ...
N_LISTS - 2 (N_LISTS - 2) * 8 + 1 (N_LISTS - 1) * 8
N_LISTS - 1 (N_LISTS - 1) * 8 + 1 and up
● The first index in the array of free lists will contain objects for malloc of 16 bytes or less. This is due to the minimum allocation size being 16 bytes for 64-bit allocators. For 64-bit this is caused by the n.
● .
● pointers taking up 16 bytes of space.
Allocation
● An allocation of 0 bytes should return the NULL pointer for determinism
● All chunks requested from the OS should be of size ARENA_SIZE defined in myMalloc.h
● All requests from the user are rounded up to the nearest multiple of 8 bytes
● The minimum request size is the size of the full header struct. Even though the pointer fields at the end of the header are not used when the block is allocated, they are necessary when the block is free and if space is not reserved for them it could lead to memory corruption when freeing the block.
● When allocating from the final free list (N_LISTS - 1), the blocks are allocated in first-fit order: you will iterate the list and look for the first block large enough to satisfy the request size. Given that all other lists are multiples of 8, and all blocks in each list are the same size, this is not an issue with the other lists.
● When allocating, the block allocated should come from the list containing the smallest blocks large enough to satisfy the request.
● When allocating a block there are a few cases to consider:
○ If the block is exactly the request size the block is simply removed from the free list.
○ If the block is larger than the request size, but the remainder is too small to be allocated on its own, the extra memory is included in the memory allocated to the user and the full block is still allocated just as if it had been exactly the right size.
○ If the block is larger than the request size and the remainder is large enough to be allocated on its own, the block is split into two smaller blocks. We could allocate either of the blocks to the user, but for determinism the user is allocated the block which is lower in memory (i.e. the leftmost block).
○ When splitting a block, the remaining block should be removed and inserted into the appropriate free list.
● When no available block can satisfy the user’s request, we must request another chunk of memory from the OS and retry the allocation. When the library is initialized the allocator requests a chunk from the OS and inserts it into the freelist.
○ When allocating another chunk there are two cases we must consider:
■ If the new chunk from the OS is adjacent to the last chunk allocated from the OS, the chunks should be coalesced into a single chunk.
● When coalescing chunks from the OS they may need to be coalesced with existing blocks. In that case follow the specification for coalescing blocks when freeing (listed below.)
■ If the new chunk from the OS is not adjacent to the last chunk allocated from the OS, the chunk should simply be inserted into the free list.
○ In operating systems, you can never simply expect a call to the OS to work all the time. If allocating a new chunk from the OS fails, the NULL pointer should be returned and errno should be set appropriately (see the man page.)
○ New chunks should be allocated lazily. That means that more memory is requested only when servicing a request that cannot be satisfied by any of the available free blocks.
Deallocation
● Freeing a NULL pointer is a no-op (don’t do anything).
● When freeing a block, there are a few cases to consider:
○ Neither the right nor the left blocks are unallocated. In this case, simply insert the block into the appropriate free list
○ Only the right block is unallocated. Then coalesce the current and right blocks together. The newly coalesced block should remain where the right block was in the free list
○ Only the left block is unallocated. Then coalesce the current and left blocks and the newly coalesced block should remain where the left block was in the free list.
○ Both the right and left blocks are unallocated. We must coalesce with both neighbors. In this case the coalesced block should remain where the left block (lower in memory) was in the free list.
● When coalescing a block, if the size of the coalesced block is no longer appropriate for the current list, the newly formed block should be removed and inserted into the appropriate free list. (Note: This applies even to cases above where it is mentioned to leave the block where it was in the free list. Writing the cases in this way makes for a smooth translation into writing code) Conceptual Question: When coalescing blocks in the last free list (N_LISTS - 1), should you ever remove the block from that list?

Tasks
● Task 1: Allocation
● Task 2: Deallocation
● Task 3: Managing multiple chunks
Task 1: Allocation
1. Calculate the required block size (actual request size) by adding the size of the required metadata and rounding the allocation size up to the next 8-byte modulo. The actual request size is either the size calculated in this step or the size of the header struct, whichever is larger.
2. Find the appropriate free list to look for a block to allocate (recall that the block size per each list is based on the rounded request size, not including the metadata) size of the head
a. As long as there are no blocks in the current list, check the next list until there is a block, which can satisfy the request
b. When checking the final list, instead of just checking if the list is empty, iterate over all blocks and check if any is large enough to satisfy the request.
3. Depending on the size of the block, either allocate the full block or split the block and allocate the leftmost block (lower in memory) portion to the user.
a. When splitting, the size and left_size fields maintained in the neighboring blocks will need to be updated.
4. When allocating a block, update its allocation status to ALLOCATED
5. Finally, return to the user a pointer to the data field of the header.
Task 2: Deallocation (Freeing)
1. Free is called on the same pointer that malloc returned, which means we must calculate the location of the header by pointer arithmetic.
2. Once we have the header of the block being freed we must calculate the locations of its right and left neighbors, also using pointer arithmetic and the block’s size fields.
3. Based on the allocation status of the neighboring blocks, we must either insert the block or coalesce with one or both of the neighboring blocks
Task 3: Managing Additional Chunks
Above we don’t specify how to handle the case where the user’s request cannot be fulfilled by any of the available blocks.
1. If no available block can satisfy an allocation request then we must allocate more memory from the OS. To do this we must call sbrk() again.
2. Two consecutive calls to sbrk() should allocate contiguous regions in memory, therefore when we allocate another chunk we can coalesce it with the most recently allocated chunk. Unfortunately, the user of our library could have called sbrk() between our calls and thus we must check that our two chunks are in fact contiguous in memory.
3. There are a few cases for managing adding a new chunk to the free lists:
a. If the new chunk is not adjacent to the previous chunk, then it can simply be inserted into the appropriate free list like any other block.
b. If the new chunk is adjacent to the previous chunk, then the fence posts between the two chunks should be removed and the chunk should be coalesced with any unallocated memory at the end of the previous chunk.
4. This process may need to be repeated multiple times to get enough memory to satisfy a user’s requests.
a. Example: Say a user makes a malloc request for 1000 bytes of memory, but our arena size is 400 bytes. To satisfy the request, we must make 2 consecutive calls to sbrk, leaving us with a total of 1200 bytes in a single contiguous block, from which we can then allocate the necessary 1000 bytes. All new chunks must be the arena size. It is important for you to manage additional chunks this way, as this is how the tests expect you to handle new chunks.
Implementation Notes
● Whenever referencing the size of a variable or structure, you must use sizeof instead of hard coding a constant. One of the tests uses the -m32 flag in gcc to compile the test as a 32-bit binary. For instance, use: x = sizeof(size_t);
Don’t use:
x = 8;
● Segregated Free List

The previous pointers and final pointers back to the sentinel node have been omitted for clarity. The segregated list is made up of an array of freelist sentinels. For each list at index 0..N_LISTS-2, the size of all blocks are the same. For the final list the blocks may vary in size. The sizes listed in the diagram are just the allocable sizes of each block. ● Data Structures together
The previous pointers and pointers back to the sentinel have been omitted for clarity.

Coding
Skeleton code
Log into a lab machine and type:
mkdir cs252 cd cs252 git clone /homes/cs252/sourcecontrol/work/$USER/lab1-src.git cd lab1-src
Note: Before you start working on the lab read through the provided code. Make sure you understand how the data structures work and how the initialization code works. Taking time to understand the skeleton will save you a lot of time in the end.
myMalloc.c
The main implementation file for the project. Provided is an initialization function which allocates the first chunk from the OS and inserts it into the final free list. malloc, calloc, realloc, and free are implemented using the internal functions allocate_object and deallocate_object which are provided as function stubs. Primarily you will be implementing these two functions, but may need to modify others as well depending on your specific implementation approach. It is recommended that you implement your solution in a modular way, for both your clarity and that of your TAs. Try to piece out individual steps or procedures into smaller functions, so that it is easier to pinpoint where issues may occur, and understand the flow of your code. Additionally, modularity lends itself very well to reducing repetitive code, of which there can be a lot in this project, if implemented poorly. Several functions in the skeleton code also exist to verify the correctness of your data structures as you implement your lab. It is heavily suggested that you read through these functions and understand how they work, as that will greatly assist you in understanding how to work with the data structures, and how to understand where issues with your implementation may arise.
printing.c
Helper functions for printing the free list and boundary tags
testing.c
Helper functions for writing test cases
tests/testsrc/test_*.c Test source files.
Using "make" and GIT
To make your code type
cd lab1-src make
The make command will build your malloc sources and it will push the latest changes into your git repository. It is important that you use "make" instead of directly typing the compiler command to make sure that a git log is built for your project. Projects without a git log will not be graded.
To run the tests type
./runtest.py
You can run an individual test by typing
./tests/test_simple0
To see the expected output of a test you may run
./tests/expected/test_simple0
When you are starting your implementation we recommend you try to run test_simple0 to test_simple6 before you try other tests.
To debug a specific test try:
gdb ./tests/test_simple1
(gdb) break main (gdb) run

Testing
runtest.py is the provided test runner. running ./runtest.py (or python3 runtest.py if your environment variables are not set) will run all of the tests and provide a score out of 80 points. There will be an additional 10 points of tests used during the final grading. The remaining 10 points on the final grade are from the checkpoint score.
./runtest.py -t will run only the specified test.
To run only the tests that will be included in the Part 1 Checkpoint, you can run
./runtest.py part1
Testing is done by the test cases compiled with your allocator and the reference implementation and diffing the output. The binaries compiled with the reference implementation can be found in tests/expected
Update the CFLAGS variable in the tests Makefile to include -DDEBUG when dealing with multiple chunks from the OS. This is due to printf allocating memory with sbrk when -DDEBUG is not set (see the init() code.)
Note: If you receive a bad interpreter error when trying to run the runtest.py script then you may need to update the shebang at the top of the script. Instead of #!env python3 it should be #!/usr/bin/env python3.
Output is simple text by default, but color printing can be enabled by setting the
MALLOC_DEBUG_COLOR environment variable to 1337_CoLoRs
To enable: export MALLOC_DEBUG_COLOR=1337_CoLoRs To disable: export MALLOC_DEBUG_COLOR=false

Turning In Your Work
You will use git tags to turn in your project. This allows us to return the state the project was in on the date it was turned in. You must push the tags to the CS departments repo before the deadline for the lab.
For checkpoint 1 your implementation is supposed to pass at least test_simple0 to test_simple4. Since the second part requires that all the tests should pass, we encourage you to finish part 1 as soon as possible.
Git add, commit, and push commands added to turnin instructions.
This checkpoint will test the functionality of allocate memory, and the most basic requirement for requesting new chunks from the OS.
1. Login to a CS department machine
2. Navigate to your lab1-src directory
3. Run make clean
4. Run make and check that the tests look correct
5. Run git add -A
6. Run git commit -m “part1 done”
7. Run git push origin master
8. Run git tag -f part1
9. Run git push -f origin part1
10. Run git show part1
11. The show command should show the diff from the most recent commit
Final Turnin
To turn in the full lab:
1. Login to a CS department machine
2. Navigate to your lab1-src directory
3. Run make clean
4. Run make and check that the tests look correct
5. Run git add -A
6. Run git commit -m “final done”
7. Run git push origin master
8. Run git tag -f final
9. Run git push -f origin final
10. Run git show final
11. The show command should show the diff from the most recent commit

Extra Points
You could earn extra points by implementing the following features. Make sure that the new features do not interfere with the testing of the lab.
1. Make the memory allocator work with programs such as Google Chrome. You may write a different version of the allocator than the one implemented in the lab. You will also need to implement memalign(), valloc(), posix_memalign(), and pvalloc(). See `man malloc` as well as the Linux manual pages for valloc().
(IMPORTANT: To debug with gdb and LD_PRELOAD in gdb type:
set exec-wrapper env 'LD_PRELOAD=./uafo.so'
}
2. Optimize realloc(). The current implementation always allocates a new object, copies the old object to the new one, and then frees the old object. This can be optimized by checking if the block beyond the old object is free, and then extending the old object into this space. Remember to modify the block/header that the old object is being extended into, if this is the case.
The code you will submit for the regular lab1 submission and the lab1 extra credit are different. Make sure that your lab1 regular submission done in your git repository passes all the tests before you work on the extra credit.
Assuming that your extra credit code is in a directory lab1-src-extra/ add a README file that includes
● The extra credit (realloc and/or LD_PRELOAD) ● For realloc:
○ Tests you wrote that test your realloc and what they test ○ Describe your implementation and issues.
● For LD_PRELOAD:
○ What projects you can run with malloc (ls, vim, chrome) ○ Describe your implementation and issues. Then turn in your extra credit with the command:
turnin -c cs252 -p lab1-extra lab1-src-extra
You will also show your implementation during lab time next week.

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!