Proceedings of the San Francisco USENIX Conference, pp. 295-303, June 1988.

Design of a General Purpose Memory Allocator for the 4.3BSD UNIX‡ Kernel

Marshall Kirk McKusick
Michael J. Karels

Computer Science Division
University of California, Berkeley
Berkeley, California 94720

ABSTRACT

The 4.3BSD UNIX kernel uses many memory allocation mechanisms, each designed for the particular needs of the utilizing subsystem. This paper describes a general purpose dynamic memory allocator that can be used by all of the kernel subsystems. The design of this allocator takes advantage of known memory usage patterns in the UNIX kernel and a hybrid strategy that is time-efficient for small allocations and space-efficient for large allocations. This allocator replaces the multiple memory allocation interfaces with a single easy-to-program interface, results in more efficient use of global memory, and is efficient enough that no performance loss is observed relative to the current implementations. The paper concludes with a discussion of our experience in using the new memory allocator, and directions for future work.

1. Kernel Memory Allocation in 4.3BSD

The 4.3BSD kernel has at least ten different memory allocators. Some of them handle large blocks, some of them handle small allocations.
Often the allocations are for small pieces of memory that are only needed for the duration of a single system call. In a user process such short-term memory would be allocated on the run-time stack. Because the kernel has a limited run-time stack, it is not feasible to allocate even moderate blocks of memory on it. Consequently, such memory must be allocated through a more dynamic mechanism. For example, when the system must translate a pathname, it must allocate a one kilobyte buffer to hold the name. Other blocks of memory must be more persistent than a single system call and really have to be allocated from dynamic memory. Examples include protocol control blocks that remain throughout the duration of the network connection.

Demands for dynamic memory allocation in the kernel have increased as more services have been added. Each time a new type of memory allocation has been required, a specialized memory allocation scheme has been written to handle it. Often the new memory allocation scheme has been built on top of an older allocator. For example, the block device subsystem provides a crude form of memory allocation through the allocation of empty buffers [Thompson78]. The allocation is slow because of the implied semantics of finding the oldest buffer, pushing its contents to disk if they are dirty, and moving physical memory into or out of the buffer to create the requested size. To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD for name translation that allocates a pool of empty buffers. It keeps them on a free list so they can be quickly allocated and freed [McKusick85].

‡UNIX is a registered trademark of AT&T in the US and other countries.

Summer USENIX '88  295  San Francisco, June 20-24

This memory allocation method has several drawbacks.
First, the new allocator can only handle a limited range of sizes. Second, it depletes the buffer pool, as it steals memory intended to buffer disk blocks to other purposes. Finally, it creates yet another interface of which the programmer must be aware.

A generalized memory allocator is needed to reduce the complexity of writing code inside the kernel. Rather than providing many semi-specialized ways of allocating memory, the kernel should provide a single general purpose allocator. With only a single interface, programmers do not need to figure out the most appropriate way to allocate memory. If a good general purpose allocator is available, it helps avoid the syndrome of creating yet another special purpose allocator.

To ease the task of understanding how to use it, the memory allocator should have an interface similar to the interface of the well-known memory allocator provided for application programmers through the C library routines malloc() and free(). Like the C library interface, the allocation routine should take a parameter specifying the size of memory that is needed. The range of sizes for memory requests should not be constrained. The free routine should take a pointer to the storage being freed, and should not require the size of the piece being freed.

2. Criteria for a Kernel Memory Allocator

The design specification for a kernel memory allocator is similar to, but not identical to, the design criteria for a user level memory allocator. The first criterion for a memory allocator is that it make good use of the physical memory. Good use of memory is measured by the amount of memory needed to hold a set of allocations at any point in time. Percentage utilization is expressed as:

	utilization = requested / required

Here, ``requested'' is the sum of the memory that has been requested and not yet freed. ``Required'' is the amount of memory that has been allocated for the pool from which the requests are filled. An allocator requires more memory than requested because of fragmentation and a need to have a ready supply of free memory for future requests. A perfect memory allocator would have a utilization of 100%. In practice, having a 50% utilization is considered good [Korn85].

Good memory utilization in the kernel is more important than in user processes. Because user processes run in virtual memory, unused parts of their address space can be paged out. Thus pages in the process address space that are part of the ``required'' pool that are not being ``requested'' need not tie up physical memory. Because the kernel is not paged, all pages in the ``required'' pool are held by the kernel and cannot be used for other purposes. To keep the kernel utilization percentage as high as possible, it is desirable to release unused memory in the ``required'' pool rather than to hold it as is typically done with user processes. Because the kernel can directly manipulate its own page maps, releasing unused memory is fast; a user process must do a system call to release memory.

The most important criterion for a memory allocator is that it be fast.
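The utilization measure defined above can be made concrete with a small helper. This is only an illustration of the formula; the function name and the numbers in the usage below are not from the paper's measurements.

```c
/* Percentage utilization: memory requested (and not yet freed) divided
 * by memory required (held in the pool), expressed as a percentage. */
double utilization(double requested, double required)
{
    return 100.0 * requested / required;
}
```

For example, a pool holding 100 kilobytes of which 50 kilobytes are currently requested has 50% utilization, the figure the text cites as good in practice.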
Because memory allocation is done frequently, a slow memory allocator will degrade the system performance. Speed of allocation is more critical when executing in the kernel than in user code, because the kernel must allocate many data structures that user processes can allocate cheaply on their run-time stack. In addition, the kernel represents the platform on which all user processes run, and if it is slow, it will degrade the performance of every process.

Another problem with a slow memory allocator is that programmers of frequently-used kernel interfaces will feel that they cannot afford to use it as their primary memory allocator. Instead they will build their own memory allocator on top of the original by maintaining their own pool of memory blocks. Multiple allocators reduce the efficiency with which memory is used. The kernel ends up with many different free lists of memory instead of a single free list from which all allocation can be drawn. For example, consider the case of two subsystems that need memory. If they have their own free lists, the amount of memory tied up in the two lists will be the sum of the greatest amount of memory that each of the two subsystems has ever used. If they share a free list, the amount of memory tied up in the free list may be as low as the greatest amount of memory that either subsystem used. As the number of subsystems grows, the savings from having a single free list grow.

3. Existing User-level Implementations

There are many different algorithms and implementations of user-level memory allocators. A survey of those available on UNIX systems appeared in [Korn85]. Nearly all of the memory allocators tested made good use of memory, though most of them were too slow for use in the kernel.
The fastest memory allocator in the survey by nearly a factor of two was the memory allocator provided on 4.2BSD originally written by Chris Kingsley at California Institute of Technology. Unfortunately, the 4.2BSD memory allocator also wasted twice as much memory as its nearest competitor in the survey.

The 4.2BSD user-level memory allocator works by maintaining a set of lists that are ordered by increasing powers of two. Each list contains a set of memory blocks of its corresponding size. To fulfill a memory request, the size of the request is rounded up to the next power of two. A piece of memory is then removed from the list corresponding to the specified power of two and returned to the requester. Thus, a request for a block of memory of size 53 returns a block from the 64-sized list. A typical memory allocation requires a roundup calculation followed by a linked list removal. Only if the list is empty is a real memory allocation done. The free operation is also fast; the block of memory is put back onto the list from which it came. The correct list is identified by a size indicator stored immediately preceding the memory block.

4. Considerations Unique to a Kernel Allocator

There are several special conditions that arise when writing a memory allocator for the kernel that do not apply to a user process memory allocator. First, the maximum memory allocation can be determined at the time that the machine is booted. This number is never more than the amount of physical memory on the machine, since a machine whose memory is entirely dedicated to the operating system is uninteresting to use. Thus, the kernel can statically allocate a set of data structures to manage its dynamically allocated memory. These data structures never need to be expanded to accommodate memory requests; yet, if properly designed, they need not be large.
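The 4.2BSD rounding rule described in Section 3 can be sketched in a few lines. The function name is illustrative, not from any BSD source.

```c
#include <stddef.h>

/* Round a request up to the next power of two; the result identifies
 * which free list (4.2BSD-style) the block would come from. */
size_t roundup_pow2(size_t size)
{
    size_t n = 1;
    while (n < size)
        n <<= 1;
    return n;
}
```

As in the text's example, a request of size 53 maps to the 64-sized list.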
For a user process, the maximum amount of memory that may be allocated is a function of the maximum size of its virtual memory. Although it could allocate static data structures to manage its entire virtual memory, even if they were efficiently encoded they would potentially be huge. The other alternative is to allocate data structures as they are needed. However, that adds extra complications such as new failure modes if it cannot allocate space for additional structures and additional mechanisms to link them all together.

Another special condition of the kernel memory allocator is that it can control its own address space. Unlike user processes that can only grow and shrink their heap at one end, the kernel can keep an arena of kernel addresses and allocate pieces from that arena which it then populates with physical memory. The effect is much the same as a user process that has parts of its address space paged out when they are not in use, except that the kernel can explicitly control the set of pages allocated to its address space. The result is that the ``working set'' of pages in use by the kernel exactly corresponds to the set of pages that it is really using.

A final special condition that applies to the kernel is that all of the different uses of dynamic memory are known in advance. Each one of these uses of dynamic memory can be assigned a type. For each type of dynamic memory that is allocated, the kernel can provide allocation limits. One reason given for having separate allocators is that no single allocator could starve the rest of the kernel of all its available memory and thus a single runaway client could not paralyze the system. By putting limits on each type of memory, the single general purpose memory allocator can provide the same protection against memory starvation.‡

Figure 1 shows the memory usage of the kernel over a one day period on a general timesharing machine at Berkeley.
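The per-type limits just described can be modeled in miniature. The type names and limits below are illustrative placeholders, not the kernel's.

```c
#include <stddef.h>

/* Each use of dynamic memory is assigned a type; each type carries a
 * limit, so a runaway client cannot starve the rest of the kernel. */
enum { M_MBUF, M_SOCKET, M_NAMEI, M_NTYPES };

static size_t limit[M_NTYPES] = { 65536, 32768, 16384 };
static size_t inuse[M_NTYPES];

/* Charge a request against its type; refuse it if the limit is exceeded. */
int charge_type(int type, size_t size)
{
    if (inuse[type] + size > limit[type])
        return 0;
    inuse[type] += size;
    return 1;
}
```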
The ``In Use'', ``Free'', and ``Mem Use'' fields are instantaneous values; the ``Requests'' field is the number of allocations since system startup; the ``High Use'' field is the maximum value of the ``Mem Use'' field since system startup. The figure demonstrates that most allocations are for small objects. Large allocations occur infrequently, and are typically for long-lived objects such as buffers to hold the superblock for a mounted file system. Thus, a memory allocator only needs to be fast for small pieces of memory.

‡One might seriously ask the question what good it is if ``only'' one subsystem within the kernel hangs if it is something like the network on a diskless workstation.

[Figure 1: kernel memory statistics, by bucket size (Size, In Use, Free, Requests for bucket sizes 128 through 16385-32768) and by type (Type, In Use, Mem Use, High Use, Requests for the types mbuf, devbuf, socket, pcb, routetbl, fragtbl, zombie, namei, ioctlops, superblk, and temp).]

5. Implementation of the Kernel Memory Allocator

In reviewing the available memory allocators, none of their strategies could be used without some modification. The kernel memory allocator that we ended up with is a hybrid of the fast memory allocator found in the 4.2BSD C library and a slower but more-memory-efficient first-fit allocator.

Small allocations are done using the 4.2BSD power-of-two list strategy; the typical allocation requires only a computation of the list to use and the removal of an element if it is available, so it is quite fast.
Macros are provided to avoid the cost of a subroutine call. Only if the request cannot be fulfilled from a list is a call made to the allocator itself. To ensure that the allocator is always called for large requests, the lists corresponding to large allocations are always empty. Appendix A shows the data structures and the implementation of the macros.

Similarly, freeing a block of memory can be done with a macro. The macro computes the list on which to place the request and puts it there. The free routine is called only if the block of memory is considered to be a large allocation. Including the cost of blocking out interrupts, the allocation and freeing macros generate respectively only nine and sixteen (simple) VAX instructions.

Because of the inefficiency of power-of-two allocation strategies for large allocations, a different strategy is used for allocations larger than two kilobytes. The selection of two kilobytes is derived from our statistics on the utilization of memory within the kernel, which showed that 95 to 98% of allocations are of size one kilobyte or less. A frequent caller of the memory allocator (the name translation function) always requests a one kilobyte block. Additionally the allocation method for large blocks is based on allocating pieces of memory in multiples of pages. Consequently the actual allocation sizes for requests of two kilobytes‡ or less are identical.

‡In 4.3BSD on the VAX, the (software) page size is one kilobyte, so two kilobytes is the smallest logical cutoff.

Large allocations are first rounded up to be a multiple of the page size. The allocator then uses a first-fit algorithm to find space in the kernel address arena set aside for dynamic allocations. Thus a request for a five kilobyte piece of memory will use exactly five pages of memory rather than eight kilobytes as with the power-of-two allocation strategy. When a large piece of memory is freed, the memory pages are returned to the free memory pool, and the address
space is returned to the kernel address arena, where it is coalesced with adjacent free pieces.

Another technique used to improve both the efficiency of memory utilization and the speed of allocation is to cluster same-sized small allocations on a page. When a list for a power-of-two allocation is empty, a new page is allocated and divided into pieces of the needed size. This strategy speeds future allocations as several pieces of memory become available as a result of the call into the allocator.

[Figure 2: calculation of allocation size. An array indexed by page number, starting at kmembase, records the allocation size used on each kernel memory page (for example 1024, 256, 512, 3072, cont, cont, 128, 128, free, cont, 128, 1024, free, cont, cont, ...); ``cont'' marks the continuation of a previous page's multi-page allocation, ``free'' an unused page. PAGESIZE is 1024.]

Because the size is not specified when a block of memory is freed, the allocator must keep track of the sizes of the pieces it has handed out. The 4.2BSD user-level allocator stores the size of each block in a header just before the allocation. However, this strategy doubles the memory requirement for allocations that require a power-of-two-sized block. Therefore, instead of storing the size of each piece of memory with the piece itself, the size information is associated with the memory page. Figure 2 shows how the kernel determines the size of a freed piece of memory, by calculating the page in which it resides and looking up the size associated with that page. Eliminating the cost of the overhead per piece improved utilization far more than expected. The reason is that many allocations in the kernel are for blocks of memory whose size is exactly a power of two. These requests would be nearly doubled if the user-level strategy were used. Now they can be accommodated with no wasted memory.

The allocator can be called both from the top half of the kernel, which is willing to wait for memory to become available, and from the interrupt routines in the bottom half of the kernel that cannot wait for memory to become available.
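Two points from the discussion above can be shown in miniature: large requests rounded to page multiples rather than powers of two, and the per-page size lookup used when a block is freed. All names here are illustrative sketches, not the kernel's code.

```c
#include <stddef.h>

#define PAGESIZE 1024  /* the (software) VAX page size cited in the text */

/* Pages consumed by a first-fit allocator that rounds to page multiples. */
size_t pages_firstfit(size_t size)
{
    return (size + PAGESIZE - 1) / PAGESIZE;
}

/* Pages consumed if the same request were rounded to a power of two. */
size_t pages_pow2(size_t size)
{
    size_t n = PAGESIZE;
    while (n < size)
        n <<= 1;
    return n / PAGESIZE;
}

/* Find the allocation size of a freed piece by indexing a per-page table
 * with the page number of its address, relative to the arena base. */
size_t size_of_piece(const char *base, const size_t page_sizes[],
                     const char *addr)
{
    return page_sizes[(addr - base) / PAGESIZE];
}
```

A five kilobyte request thus takes five pages under first-fit where power-of-two rounding would take eight, matching the text's example.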
Clients indicate their willingness (and ability) to wait with a flag to the allocation routine. For clients that are willing to wait, the allocator guarantees that their request will succeed. Thus, these clients need not check the return value from the allocator. If memory is unavailable and the client cannot wait, the allocator returns a null pointer. These clients must be prepared to cope with this (hopefully infrequent) condition (usually by giving up and hoping to do better later).

‡To understand why this cutoff is two kilobytes, one observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, ... pages while the large block algorithm that allocates in multiples of pages yields sizes of 1, 2, 3, 4, ... pages. Thus for allocations of sizes between one and two pages both algorithms use two pages; it is not until allocations of sizes between two and three pages that a difference emerges, where the power-of-two algorithm will use four pages while the large block algorithm will use three pages.

6. Results of the Implementation

The new memory allocator was written about a year ago. Conversion from the old memory allocators to the new allocator has been going on ever since. Many of the special purpose allocators have been eliminated. This list includes calloc(), wmemall(), and zmemall(). Many of the special purpose memory allocators built on top of other allocators have also been eliminated. For example, the allocator that was built on top of the buffer pool allocator geteblk() to allocate pathname buffers in namei() has been eliminated. Because the typical allocation is so fast, we have found that none of the special purpose pools are needed. Indeed, the allocation is about the same as the previous cost of allocating buffers from the network pool (mbufs).
Consequently applications that used to allocate network buffers for their own uses have been switched over to using the general purpose allocator without increasing their running time.

Quantifying the performance of the allocator is difficult because it is hard to measure the amount of time spent allocating and freeing memory in the kernel. The usual approach is to compile a kernel for profiling and then compare the running time of the routines that implemented the old abstraction versus those that implement the new one. The old routines are difficult to quantify because individual routines were used for more than one purpose. For example, the geteblk() routine was used both to allocate one kilobyte memory blocks and for its intended purpose of providing buffers to the filesystem. Differentiating these uses is often difficult. To get a measure of the cost of memory allocation before putting in our new allocator, we summed up the running time of all the routines whose exclusive task was memory allocation. To this total we added the fraction of the running time of the multi-purpose routines that could clearly be identified as memory allocation usage. This number showed that approximately three percent of the time spent in the kernel could be accounted to memory allocation.

The new allocator is difficult to measure because the usual case of the memory allocator is implemented as a macro. Thus, its running time is a small fraction of the running time of the numerous routines in the kernel that use it. To get a bound on the cost, we changed the macro always to call the memory allocation routine. Running in this mode, the memory allocator accounted for six percent of the time spent in the kernel. Factoring out the cost of the statistics collection and the subroutine call overhead for the cases that could normally be handled by the macro, we estimate that the allocator would account for at most four percent of time in the kernel.
These measurements show that the new allocator does not introduce significant new run-time costs.

The other major success has been in keeping the size information on a per-page basis. This technique allows the most frequently requested sizes to be allocated without waste. It also reduces the amount of bookkeeping information associated with the allocator to four kilobytes of information per megabyte of memory under management.

7. Future Work

Our next project is to convert many of the static kernel tables to be dynamically allocated. Static tables include the process table, the file table, and the mount table. Making these tables dynamic will have two benefits. First, it will reduce the amount of memory that must be statically allocated at boot time. Second, it will eliminate the arbitrary upper limits on the sizes of these tables (although limits will be retained to constrain runaway clients). Other researchers have already shown the memory savings achieved by this conversion [Rodriguez88].

Under the current implementation, memory is never moved from one size list to another. With the 4.2BSD memory allocator this causes problems, particularly for large allocations where a process may use a quarter megabyte piece of memory once, which is then never available for any other size request. In our hybrid scheme, memory can be shuffled between large requests so that large blocks of memory are never stranded as they are with the 4.2BSD allocator. However, pages allocated to small requests are allocated once to a particular size and never changed thereafter. If a burst of requests came in for a particular size, that size would acquire a large amount of memory that would then not be available for other future requests.

In practice, we do not find that the free lists become too large. However, we have been investigating ways to handle such problems if they occur in the future. Our current investigations involve a routine that can run as part of the idle loop that would sort the elements on each of the free lists into order of increasing address.
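The idle-loop idea above can be sketched as follows: once a free list is sorted by address, all pieces from one page sit together, and a page whose pieces are all free could be returned to the pool. Offsets here are relative to the arena base; the names and layout are illustrative, not from the kernel.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGESIZE 1024

/* Order free-list entries by address (offset). */
static int cmp_offset(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Sort the free offsets, then report whether page `pg` is entirely free,
 * i.e. every piece of that page appears on the free list. */
int page_reclaimable(size_t *free_offs, size_t nfree,
                     size_t piece_size, size_t pg)
{
    size_t found = 0, i;

    qsort(free_offs, nfree, sizeof(size_t), cmp_offset);
    for (i = 0; i < nfree; i++)
        if (free_offs[i] / PAGESIZE == pg)
            found++;
    return found == PAGESIZE / piece_size;
}
```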
Since any given page has only one size of elements allocated from it, the effect of the sorting would be to sort the list into distinct pages. When all the pieces of a page became free, the page itself could be released back to the free pool so that it could be allocated to another purpose. Although there is no guarantee that all the pieces of a page would ever be freed, most allocations are short-lived, lasting only for the duration of an open file descriptor, an open network connection, or a system call. As new allocations would be made from the page sorted to the front of the list, return of elements from pages at the back would eventually allow pages later in the list to be freed.

Two of the traditional UNIX memory allocators remain in the current system. The terminal subsystem uses clists (character lists). That part of the system is expected to undergo major revision within the next year or so, and it will probably be changed to use mbufs as it is merged into the network system. The other major allocator that remains is getblk(), the routine that manages the filesystem buffer pool memory and associated control information. Only the filesystem uses getblk() in the current system; it manages the constant-sized buffer pool. We plan to merge the filesystem buffer cache into the virtual memory system's page cache in the future. This change will allow the size of the buffer pool to be changed according to memory load, but will require a policy for balancing memory needs with filesystem cache performance.

8. Acknowledgments

In the spirit of community support, we have made various versions of our allocator available to our test sites. They have been busily burning it in and giving us feedback on their experiences. We acknowledge their invaluable input. The feedback from the Usenix program committee on the initial draft of our paper suggested numerous important improvements.

9. References

Korn85	David Korn, Kiem-Phong Vo, ``In Search of a Better Malloc'', Proceedings of the Portland Usenix Conference, pp. 489-506, June 1985.

McKusick85	M. McKusick, M. Karels, S. Leffler, ``Performance Improvements and Functional Enhancements in 4.3BSD'', Proceedings of the Portland Usenix Conference, pp. 519-531, June 1985.

Rodriguez88	Robert Rodriguez, Matt Koehler, Larry Palmer, Ricky Palmer, ``A Dynamic UNIX Operating System'', Proceedings of the San Francisco Usenix Conference, June 1988.

Thompson78	Ken Thompson, ``UNIX Implementation'', Bell System Technical Journal, volume 57, number 6, part 2, pp. 1931-1946, July-August 1978.

Appendix A

/*
 * Constants for setting the parameters of the kernel memory allocator.
 *
 * 2 ** MINBUCKET is the smallest unit of memory that will be
 * allocated. It must be at least large enough to hold a pointer.
 *
 * Units of memory less or equal to MAXALLOCSAVE will permanently
 * allocate physical memory; requests for these size pieces of memory
 * are quite fast. Allocations greater than MAXALLOCSAVE must
 * always allocate and free physical memory; requests for these size
 * allocations should be done infrequently as they will be slow.
 * Constraints: CLBYTES <= MAXALLOCSAVE <= 2 ** (MINBUCKET + 14)
 * and MAXALLOCSIZE must be a power of two.
 */
#define	MINBUCKET	4		/* 4 => min allocation of 16 bytes */
#define	MAXALLOCSAVE	(2 * CLBYTES)

/*
 * Maximum amount of kernel dynamic memory.
 * Constraints: must be a multiple of the pagesize.
 */
#define	MAXKMEM		(1024 * PAGESIZE)

/*
 * Arena for all kernel dynamic memory allocation.
 * This arena is known to start on a page boundary.
 */
extern char kmembase[MAXKMEM];

/*
 * Array of descriptors that describe the contents of each page
 */
struct kmemsizes {
	short	ks_indx;	/* bucket index, size of small allocations */
	u_short	ks_pagecnt;	/* for large allocations, pages allocated */
} kmemsizes[MAXKMEM / PAGESIZE];

/*
 * Set of buckets for each size of memory block that is retained
 */
struct kmembuckets {
	caddr_t	kb_next;	/* list of free blocks */
} bucket[MINBUCKET + 16];

/*
 * Macro to convert a size to a bucket index. If the size is constant,
 * this macro reduces to a compile time constant.
 */
#define	MINALLOCSIZE	(1 << MINBUCKET)
#define	BUCKETINDX(size) \
	(size) <= (MINALLOCSIZE * 128) \
		? (size) <= (MINALLOCSIZE * 8) \
			? (size) <= (MINALLOCSIZE * 2) \
				? (size) <= (MINALLOCSIZE * 1) \
					? (MINBUCKET + 0) \
					: (MINBUCKET + 1) \
				: (size) <= (MINALLOCSIZE * 4) \
					? (MINBUCKET + 2) \
					: (MINBUCKET + 3) \
			: (size) <= (MINALLOCSIZE * 32) \
				? (size) <= (MINALLOCSIZE * 16) \
					? (MINBUCKET + 4) \
					: (MINBUCKET + 5) \
				: (size) <= (MINALLOCSIZE * 64) \
					? (MINBUCKET + 6) \
					: (MINBUCKET + 7) \
		/* etc ... */

/*
 * Macro versions for the usual cases of malloc/free
 */
#define	MALLOC(space, cast, size, flags) { \
	register struct kmembuckets *kbp = &bucket[BUCKETINDX(size)]; \
	long s = splimp(); \
	if (kbp->kb_next == NULL) { \
		(space) = (cast)malloc(size, flags); \
	} else { \
		(space) = (cast)kbp->kb_next; \
		kbp->kb_next = *(caddr_t *)(space); \
	} \
	splx(s); \
}

#define	FREE(addr) { \
	register struct kmembuckets *kbp; \
	register struct kmemsizes *ksp = \
		&kmemsizes[((addr) - kmembase) / PAGESIZE]; \
	long s = splimp(); \
	if (1 << ksp->ks_indx > MAXALLOCSAVE) { \
		free(addr); \
	} else { \
		kbp = &bucket[ksp->ks_indx]; \
		*(caddr_t *)(addr) = kbp->kb_next; \
		kbp->kb_next = (caddr_t)(addr); \
	} \
	splx(s); \
}