Dynamic Partitioning of Large Multicomputer Systems
Hans-Ulrich Heiss


Department of Informatics, University of Karlsruhe, Germany
Proc. Int. Conf. on Massively Parallel Computing Systems (IEEE MPCS94), Ischia, May 2-6, 1994

We consider multiprogramming operation in large-scale grid-connected multicomputer systems where the grid is spatially partitioned among the programs. We regard the problem as a two-dimensional resource allocation problem for which different algorithms are proposed. Results from simulation experiments indicate the performance that can be achieved in terms of fragmentation and throughput.

1 The Problem

Programming for multicomputer systems is still uncomfortable and inefficient. We often observe monoprogramming operation, which inevitably leads to poor utilization and uneconomic machine usage. When multiprogramming is available, the machine is usually partitioned manually and in a rather static way, without the ability to adjust the partitioning to the requests of the parallel programs. The reason for this situation is a lack of operating system support. We therefore claim that operating systems for those machines have to provide a dynamic processor management facility comparable to state-of-the-art storage management.

The dynamic partitioning problem in parallel systems has been the subject of a number of papers. Among the regular architectures, the hypercube has been the most attractive so far [1, 3, 6]. Partitioning of mesh-connected systems is addressed by Chuang and Tzeng [2], who present a general rectangular partitioning. Li and Cheng [7] use 2D-buddy schemes, but their main interest is in scheduling, i.e. in the problem of how to schedule a given sequence of requests to achieve minimal total processing time, which leads to a three-dimensional problem. Our point of view, however, is that of an operating system that has no foreknowledge of future requests and has to make local allocation decisions based on the current allocation situation and the current request.

The machine is assumed to consist of identical processor nodes. Each node consists of a processor and a private memory. Nodes communicate with each other by sending messages over an interconnection network which forms a two-dimensional grid as processor connection graph. For communication with non-adjacent processors, the messages have to be forwarded along a shortest ('Manhattan') path.

Programs are assumed to consist of a set of tasks of the same length (i.e. execution time) which run concurrently and interchange messages. Generally, the allocation problem can be addressed at two levels, the task level and the program level. While allocation at the task level leads to the so-called mapping problem (which task is mapped to which processor), we consider the allocation at the program level, which addresses the dynamic partitioning case where several parallel programs share the machine in space. Each program obtains one or more disjoint partitions of the multicomputer. For the sake of efficient management, partitions are of rectangular shape.

Because in multicomputers the communication delay is a monotonically increasing function of the distance, communicating tasks should be placed close together to keep communication delays low. So it is usual and useful, yet not necessary, that a program resides in exactly one partition, which is called contiguous allocation. However, we will also consider non-contiguous allocation, which can provide a better utilization of the machine. With non-contiguous allocation, a program is distributed across several partitions.

Programs arrive at random points in time and request a processor subset of some size. We consider two types of requests: a request may be shaped, i.e. it consists of a rectangle of some width and height, or it may be unshaped (scalar), i.e. it consists only of a number of processors.
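To make the model concrete, the following minimal Python sketch (our illustration, not code from the paper; all names are ours) represents the two request types and the Manhattan cost model. It shows why the shape of a partition matters: for the same number of processors, a square-like rectangle has a much smaller diameter, i.e. worst-case communication distance, than an elongated one.

    # Minimal sketch of the model (our illustration, not the paper's code).
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Request:
        # Shaped request: width and height given; unshaped (scalar): only size.
        width: Optional[int] = None
        height: Optional[int] = None
        size: Optional[int] = None

    def manhattan(a, b):
        """Length of a shortest ('Manhattan') path between grid nodes a and b."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def diameter(w, h):
        """Worst-case distance between two nodes of a w x h rectangular partition."""
        return (w - 1) + (h - 1)

    # 64 processors as a square vs. as a long strip:
    print(diameter(8, 8), diameter(64, 1))   # 14 63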
In the latter case, only the size of the partition is specified, leaving the requested rectangle malleable according to the availability of free partitions. If a suitable partition is free, the request can be served immediately; otherwise it has to wait.

External fragmentation: Dynamic allocation and release operations may leave some areas of the network unused, since their size is too small to fit an incoming request.

Internal fragmentation: If we restrict the possible sizes of partitions to facilitate their management, we may be forced to allocate a larger partition than requested. In this case, some of the allocated processors stay idle.

(Figure: External (light grey) and internal (grey) fragmentation. If a new request of size 5 arrives, it cannot be allocated although there are enough processors available; the free partition of size 4 is too small and must be considered external fragmentation.)

Thus a processor management scheme has to deal with internal fragmentation, meaning processors that are allocated yet unused, and external fragmentation, meaning processor partitions that are free but cannot be allocated since they are too small.

An operating system component responsible for processor management has to meet several goals. Firstly, it has to avoid fragmentation to maximize the utilization of the resource. Secondly, even if we do not know anything about the communication patterns of the parallel programs that occupy the partitions, we have to assume that arbitrary tasks of a program communicate with each other. For the partitioning problem this means that the diameter of the partitions should be small to minimize the communication delays. Thirdly, since all allocation requests are processed at run-time, the management algorithms have to be fast. So each processor management scheme is a compromise between
- minimization of fragmentation,
- minimization of communication delays, and
- minimization of allocation overhead.

3 Fitted allocation

If a program requests a subset of the processors, it is rather natural to allocate exactly what has been requested. So if the request is a rectangle (w, h), we look for a sufficiently large free partition and customize it to the request so that it fits exactly. This is called fitted allocation. Consequently, there is no internal fragmentation.

A usual way to manage memory partitions in operating systems is a bit-map: each unit of the resource has an indicator saying whether the unit is free or occupied. Applied to the management of a 2D grid of processors, these indicators form a bit-matrix (0 = free, 1 = occupied). For a request of size (w, h), we have to find a submatrix of zeros with the corresponding dimensions. Several variants of search procedures are known:

First fit: Starting at one corner, the rows and columns of the matrix are scanned until a sufficiently large piece is found. The required number of units is allocated by setting the corresponding bits, and the address of the first unit is returned. This approach leads to a denser allocation at the starting corner of the matrix.

Next fit: The disadvantage of first fit can be overcome by starting a new search from the position where the last piece was found. The matrix is scanned cyclically.

Best fit: Instead of taking the first possible piece, the entire matrix is scanned to find the smallest piece large enough to fit the request, which reduces external fragmentation.

A position-oriented search is useful if two or more pieces should be close together: the request indicates a position where it wishes the new allocation to take place, and starting at the desired position the search is applied in four directions.
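The following Python sketch (ours; the paper describes the strategies only in prose) shows the bit-matrix variant of fitted allocation with a naive first-fit scan. Next fit would additionally remember the position of the last hit and scan cyclically, and best fit would scan the whole matrix and keep the smallest sufficiently large free rectangle.

    def fits(grid, top, left, w, h):
        """True if the w x h submatrix with upper-left corner (top, left) is all free (0)."""
        return all(grid[top + i][left + j] == 0
                   for i in range(h) for j in range(w))

    def first_fit(grid, w, h):
        """Scan rows and columns from one corner; return the first free w x h spot or None."""
        rows, cols = len(grid), len(grid[0])
        for top in range(rows - h + 1):
            for left in range(cols - w + 1):
                if fits(grid, top, left, w, h):
                    return top, left
        return None

    def allocate(grid, top, left, w, h):
        """Mark the chosen rectangle as occupied by setting the corresponding bits."""
        for i in range(h):
            for j in range(w):
                grid[top + i][left + j] = 1

    # Example: 8 x 8 grid, request of width 3 and height 2.
    grid = [[0] * 8 for _ in range(8)]
    pos = first_fit(grid, 3, 2)
    if pos is not None:
        allocate(grid, pos[0], pos[1], 3, 2)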
If the size of the allocation units is small compared to an average allocation, it may be more efficient to describe each partition explicitly by its coordinates rather than by individual units. Sorted according to the x- and y-coordinates of the allocated partitions, the free space consists of sets of contiguous, but possibly overlapping, free partitions (Figure 2). If we require the free partitions to be disjoint, we obtain a rather simple management scheme that, for an allocation, simply selects a suitable partition according to one of the strategies described above.

Figure 2: Allocated partitions and free regions (a)-(c); with variant (a) the entire potentiality of the free space is preserved.

Even if a request is unshaped, the processor management should allocate a compact partition, since a square-like partition causes less communication delay than a long and narrow one. A rectangle whose area equals the requested number of processors exactly would cause no internal fragmentation at all, but such a rectangle may be long and narrow and thus have a very bad diameter. Selecting the rectangle for a scalar request is therefore a compromise between compactness and fragmentation.

Figure 5: Selecting a minimal compact rectangle (width w on one axis, height h on the other; lines h = w, h = 2s - w and h = 2s - w - 1).

The hyperbola in Figure 5 shows the pairs (width, height) having the same area. Due to the discrete nature of the resource, allocations are possible only at the grid points. All grid points beyond, i.e. right above, the hyperbola represent sufficiently large rectangles. The line L2 indicates rectangles close to the square that have low internal fragmentation; for the scalar request in the example of Figure 5, a rectangle of size 14x6 is selected.
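One way to turn the geometric argument of Figure 5 into code is sketched below in Python (this is our reading of the figure, not the paper's exact selection rule; compact_rectangle and max_stretch are hypothetical names): starting at the square side ceil(sqrt(s)), candidate widths are widened step by step, the matching minimal height is computed, and the candidate with the least internal fragmentation and, among equals, the smallest diameter is chosen.

    from math import ceil, sqrt

    def compact_rectangle(s, max_stretch=2):
        """Pick (w, h) with w * h >= s, near the square, with little internal fragmentation."""
        side = ceil(sqrt(s))
        best = None
        for w in range(side, max_stretch * side + 1):
            h = ceil(s / w)                   # smallest height that still covers the request
            waste = w * h - s                 # internal fragmentation
            diam = (w - 1) + (h - 1)          # compactness (diameter of the rectangle)
            cand = (waste, diam, (w, h))
            if best is None or cand < best:
                best = cand
        return best[2]

    print(compact_rectangle(80))   # (10, 8): no waste and nearly square

In practice the chosen rectangle must of course also fit into some free partition, so such a selection rule would be intertwined with the search strategies of Section 3.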
It can be advantageous in some cases, however, to trade the variety of possible partition sizes for a simpler and faster management. The well-known buddy systems have been adapted to hypercube and grid architectures of parallel machines [7]. All partitions are powers of two and are obtained by recursive bisection; the quartering scheme starts with a square that is recursively divided into four quadrants. The price is a considerable internal fragmentation resulting from rounding requests up to the next permissible size; for uniformly distributed scalar requests, calculations yield a loss of about 25%. Besides quartering, two further recursive variants, independent bisection and area bisection, are considered in the results below.

By distributing a program across several partitions, we expect a considerably lower external fragmentation, since otherwise not utilizable pieces can be used. At the same time the communication delays increase, so it is reasonable to use non-contiguous allocation only as a fallback which takes place if a contiguous allocation fails. If the program gives no guidelines on how to split the request, we have to cut it according to the sizes of the available pieces. To limit the fragmentation, we should use large partitions, preferably those that have one matching dimension with the request, and select free partitions carefully, e.g. by using the best-fit strategy. Non-contiguous allocation can be combined with both fitted and non-fitted partitions. When applied extensively, internal fragmentation can be completely avoided.

A simulation study compared the contiguous and non-contiguous variants of the schemes (experiments with different programs produced similar results). The table summarizes internal fragmentation, external fragmentation, and the resulting utilization:

                                          int. fragm.   ext. fragm.   utilization
    contiguous allocation
      fitted partitions (disj.)      86       0%            24%           76%
      fitted partitions (overlap.)   83       0%            22%           78%
      quartering                    224      60%            11%           29%
      independent bisection         118      37%             7%           56%
      area bisection                159      52%             7%           41%
    non-contiguous allocation
      fitted partitions              69       0%             5%           95%
      independent bisection          71       0%             7%           93%

In the experiments, the fitted partitioning schemes were superior to the recursive partitioning schemes. Due to their compatible partition sizes, the external fragmentation of the recursive schemes is usually below 10%; the freedom from internal fragmentation of the fitted schemes, however, more than compensates for their worse external fragmentation. Non-contiguous allocation did not significantly pay off yet: while fitted non-contiguous partitions achieve the highest utilization, we do not know the amount of communication delay that accompanies the scattering of a program across several partitions, which may still make contiguous fitted partitions superior. Favorable conditions for non-contiguous allocation are those where communication plays only a minor role in the behavior of the parallel programs. We simplifyingly assumed that the processor demand of a program is constant; growing and shrinking demands are left for future work. Since contiguous partitions keep communicating tasks close together, fitted contiguous allocation appears to be the favorable partitioning scheme because of its significantly better fragmentation behavior.

References
[1] Chen, M.-S.; Shin, K.G.
[2] Chuang, P.-J.; Tzeng, N.-F.
[3] Ercal, F.; Ramanujam, J.; Sadayappan, P.
[4] Heiss, H.-U.
[5] Heiss, H.-U.
[6] Kim, J.; Das, C.R.; Lin, W.
[7] Li, K.; Cheng, K.-H.