Selecting the median

Dorit Dor, Uri Zwick

Department of Computer Science, School of Mathematical Sciences, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, ISRAEL. E-mail addresses: {ddorit,zwick}@math.tau.ac.il.

Abstract

Improving a long standing result of Schönhage, Paterson and Pippenger, we show that the median of a set containing $n$ elements can be found using at most $2.95n$ comparisons.

1 Introduction

The selection problem is defined as follows: given a set $X$ containing $n$ distinct elements drawn from a totally ordered domain, and given a number $1 \le i \le n$, find the $i$-th order statistic of $X$, i.e., the element of $X$ larger than exactly $i-1$ elements of $X$ and smaller than the other $n-i$ elements of $X$. The median of $X$ is the $\lceil n/2 \rceil$-th order statistic of $X$.

The selection problem is one of the most fundamental problems of computer science and it has been extensively studied. Selection is used as a building block in the solution of other fundamental problems such as sorting and finding convex hulls. It is somewhat surprising therefore that only in the early 70's it was shown, by Blum, Floyd, Pratt, Rivest and Tarjan [BFP+73], that the selection problem can be solved in $O(n)$ time. As $\Omega(n)$ time is clearly needed to solve the selection problem, the work of Blum et al. completely solves the problem. Or does it?

A very natural setting for the selection problem is the comparison model. An algorithm in this model can access the input elements only by performing pairwise comparisons between them. The algorithm is only charged for these comparisons. The comparison model is one of the few models in which exact complexity results may be obtained. What is then the exact comparison complexity of finding the median?

The comparison complexity of many comparison problems is exactly known. It is clear, for example, that exactly $n-1$ comparisons are needed, in the worst case, to find the maximum or minimum of $n$ elements. Exactly $n + \lceil \log n \rceil - 2$ comparisons are needed to find the second largest (or second smallest) element (Schreier [Sch32], Kislitsyn [Kis64]). Exactly $\lceil 3n/2 \rceil - 2$ comparisons are needed to find both the maximum and the minimum of $n$ elements (Pohl [Poh72]). Exactly $2n-1$ comparisons are needed to merge two sorted lists each of length $n$ (Stockmeyer and Yao [SY80]). Finally, $n \log n + O(n)$ comparisons are needed to sort $n$ elements (e.g., Ford and Johnson [FJ59]).

A relatively large gap, considering the fundamental nature of the problem, still remains however between the known lower and upper bounds on the exact complexity of finding the median. After presenting a basic scheme by which a selection algorithm can be obtained, Blum et al. [BFP+73] try to optimize their algorithm and present a selection algorithm that performs at most $5.43n$ comparisons. They also obtain the first non-trivial lower bound and show that $3n/2$ comparisons are required, in the worst case, to find the median. The result of Blum et al. is subsequently improved by Schönhage, Paterson and Pippenger [SPP76] who present a beautiful algorithm for the selection of the median, or any other element, using at most $3n + o(n)$ comparisons. In this work we improve the long standing result of Schönhage et al. and present a selection algorithm that uses at most $2.95n$ comparisons.

Bent and John [BJ85] (see also John [Joh88]), improving previous results of Kirkpatrick [Kir81], Munro and Poblete [MP82] and Fussenegger and Gabow [FG78], obtained a $(1 + H(\alpha))n - o(n)$ lower bound on the number of comparisons needed to select the $\alpha n$-th element of a set of $n$ elements, where $H(\alpha) = \alpha \log \frac{1}{\alpha} + (1-\alpha) \log \frac{1}{1-\alpha}$ is the binary entropy function (all logarithms in this paper are taken to base 2). We have shown recently [DZ95] (using somewhat different methods from the ones used here) that the $\alpha n$-th element can be selected using at most $(1 + \alpha \log \frac{1}{\alpha} + O(\alpha \log\log \frac{1}{\alpha}))n$ comparisons. This, for small values of $\alpha$, is almost optimal. The bound of Bent and John gives in particular a $2n - o(n)$ lower bound on the number of comparisons needed to find the median.

Our work slightly narrows the gap between the best known lower and upper bounds on the comparison complexity of the median problem. Though our improvement is quite modest, many new ideas were required to obtain it. These new ideas shed some more light on the intricacy of the median finding problem.
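As an illustration of what an exact comparison count looks like, the bound of $\lceil 3n/2 \rceil - 2$ comparisons for finding both the maximum and the minimum can be met by the standard pairing technique. The following is a minimal sketch in Python and not part of the paper; the function name `min_and_max` and the explicit comparison counter are assumptions made for this example.

```python
def min_and_max(items):
    """Return (minimum, maximum, comparisons_used) for a non-empty sequence.

    Illustrative sketch: elements are processed in pairs, so each pair costs
    three comparisons (one inside the pair, one against the running minimum,
    one against the running maximum).
    """
    items = list(items)
    if not items:
        raise ValueError("need at least one element")
    n = len(items)
    comparisons = 0
    if n % 2 == 1:
        # Odd length: the first element starts as both minimum and maximum.
        lo = hi = items[0]
        start = 1
    else:
        # Even length: one comparison orders the first pair.
        comparisons += 1
        if items[0] < items[1]:
            lo, hi = items[0], items[1]
        else:
            lo, hi = items[1], items[0]
        start = 2
    for j in range(start, n, 2):
        a, b = items[j], items[j + 1]
        comparisons += 1
        small, large = (a, b) if a < b else (b, a)
        comparisons += 1
        if small < lo:
            lo = small
        comparisons += 1
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

For every input length $n \ge 1$ the counter returned by this sketch equals $\lceil 3n/2 \rceil - 2$, independently of the input order.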
Algorithms for selecting the $i$-th element for small values of $i$ were obtained by Hadian and Sobel [HS69], Hyafil [Hya76], Yap [Yap76], Ramanan and Hyafil [RH84], Aigner [Aig82] and Eusterbrock [Eus93].

All the results mentioned so far deal with the number of comparisons needed in the worst case. Floyd and Rivest [FR75] showed that the $i$-th element can be found using an expected number of $n + \min\{i, n-i\} + o(n)$ comparisons. Cunto and Munro [CM89] had shown that the bound of Floyd and Rivest is tight.

The central idea used by Schönhage et al. in their median algorithm is the idea of factories. Schönhage et al. use factories for the mass production of certain partial orders at a much reduced cost. To obtain our results we extend the notion of factories. We introduce green factories and perform an amortized analysis of their production costs. We obtain improved green factories using which we can improve the $3n$ result of Schönhage, Paterson and Pippenger.

The performance of a green factory is mainly characterized by two parameters $A_0$ and $A_1$ (the lower and upper element costs). Using a green factory with parameters $A_0$ and $A_1$ we obtain an algorithm for the selection of the $\alpha n$-th element using at most $(\alpha A_0 + (1-\alpha) A_1)n + o(n)$ comparisons. To select the median, we use a factory with $A_0, A_1 \approx 2.95$. Actually, there is a tradeoff between the lower and upper costs of a factory. For every $0 < \alpha \le 1/2$ we may choose a factory that minimizes $\alpha A_0 + (1-\alpha) A_1$. We can select the $n/4$-th element, for example, using at most $2.69n$ comparisons, by using a factory with $A_0 \approx 4$ and $A_1 \approx 2.25$. In this paper, we concentrate on factories for median selection.

It is easy to verify that the algorithm described here, as the median finding algorithms of both Blum et al. and Schönhage et al., can be implemented in linear time in the RAM model.

In the next section we describe in more detail the concept of factory production and introduce our notion of green factories. We also state the properties of the improved factories that we obtain. In Section 3 we explain the way in which green factories are used to obtain efficient selection algorithms. The selection algorithm we describe is a generalization of the median algorithm of Schönhage et al. [SPP76] and is similar to the selection algorithm we describe in [DZ95]. In the subsequent sections we try to demonstrate the main ideas used in the construction of our new green factories. Due to lack of space, many of the details are omitted.

2 Factory production

Denote by $S^m_k$ a partial order composed of a centre element, $m$ elements larger than the centre and $k$ elements smaller than the centre (see Fig. 1). An $S^m_k$ is sometimes referred to as a spider.

Figure 1: The partial order $S^m_k$.

Schönhage et al. [SPP76] show that producing $l$ disjoint copies of $S^k_k$ usually requires fewer comparisons than $l$ times the number of comparisons required to produce a single $S^k_k$. The best way, prior to this work, of producing a single $S^k_k$, for example, requires about $6k$ comparisons (find the median of $2k+1$ elements using the $3n + o(n)$ median algorithm). The cost per copy can be cut by almost a half if the $S^k_k$'s are mass produced using factories.

A factory for a partial order $P$ is a comparison algorithm with continual input and output streams. The input stream of a simple factory consists of single elements. When enough elements are fed into the factory, a new disjoint copy of $P$ is produced. A factory is characterized by the following quantities: the initial cost $I$, which is the number of comparisons needed to initialize the factory; the unit cost $U$, which is the number of comparisons needed to generate each copy of $P$; and finally the production residue $R$, which is the maximal number of elements that can remain in the factory when lack of inputs stops production. For every $l \ge 0$, the cost of generating $l$ disjoint copies of $P$ is at most $I + l \cdot U$.

Schönhage et al. [SPP76] construct factories with the following characteristics:

Theorem 2.1. There is a factory $F_k$ for $S^k_k$ with initial cost $I_k$, unit cost $U_k$ and production residue $R_k$ satisfying: $U_k \sim 3.5k$, $I_k = O(k^2)$, $R_k = O(k^2)$.

The notation $U_k \sim 3.5k$ here means that $U_k \le 3.5k + o(k)$.

Schönhage et al. also show that if there exist factories $F_k$, for $S^k_k$'s, satisfying $U_k \sim Ak$, for some $A > 0$, and $I_k, R_k = O(k^2)$, then the median of $n$ elements can be found using at most $An + o(n)$ comparisons. The above theorem immediately implies therefore the existence of a $3.5n + o(n)$ median algorithm.

The way factories are used by selection algorithms is described in the next section. For now we just mention that most $S^k_k$'s generated by a factory employed by a selection algorithm are eventually broken, with either their upper elements eliminated and their lower elements returned to the factory or vice versa. While constructing an $S^k_k$, the factory may have compared elements that turned out to be on the same side of the centre. If such elements are ever returned to the factory, the known relations among them may save the factory some of the comparisons it has to perform. To capture this, we extend the definition of factories and define green factories (factories that support the recycling of known relations). This extension is implicit in the work of Schönhage et al. [SPP76]. Making this notion explicit simplifies the analysis of our factories. The median algorithm of Schönhage et al. is in fact obtained by replacing the factory of Theorem 2.1 by a simple green factory.

A green factory for $S^k_k$'s is mainly characterized by the following two quantities: the lower element cost $u_0$ and the upper element cost $u_1$. Using these quantities, the amortized production costs of the factory can be calculated as follows: The amortized production cost of an $S^k_k$ whose upper elements are eventually returned (together) to the factory is $k \cdot u_0$. The amortized production cost of an $S^k_k$ whose lower elements are eventually returned (together) to the factory is $k \cdot u_1$. The amortized production cost of an $S^k_k$ such that none of its elements is returned to the factory is $k \cdot (u_0 + u_1)$.

Note that in this accounting scheme we attribute all the production cost to elements that are not returned to the factory. The initial cost $I$ and the production residue $R$ of a green factory are defined as before. A somewhat different definition of green factories was given by us in [DZ95]. The new definition uses amortized costs per element whereas our old definition used amortized costs per partial order. A green factory does not know in advance whether the lower or upper part of a generated $S^k_k$ will be recycled. This is set by an adversary.

Though not stated explicitly, the following result is implicit in [SPP76].

Theorem 2.2. There is a green factory $F_k$ for $S^k_k$ with lower and upper element costs $u_0, u_1 \approx 3$, initial cost $I_k = O(k^2)$ and production residue $R_k = O(k^2)$.

The notation $u_0, u_1 \approx 3$ here means that $u_0, u_1 \le 3 + o(1)$ where the $o(1)$ is with respect to $k$.

We shall see in the next section that a green factory for $S^k_k$ with lower and upper element costs $u_0$ and $u_1$ yields a $\frac{u_0+u_1}{2}(1+o(1))n$ median algorithm. To improve the algorithm of Schönhage et al. it is enough therefore to construct an $S^k_k$ factory with $(u_0+u_1)/2 < 3$. Unfortunately we are not able to construct such a factory. However, we are able to reduce the upper and lower element costs if we allow variation among the partial orders generated by the factory. Let $\mathcal{S}_k = \{ S^{k'}_{k''} : […] \}$. We construct improved green factories that generate partial orders that are members of $\mathcal{S}_k$. These factories can be easily incorporated into the selection algorithm described in the next section. To obtain our $2.95n$ median algorithm we use green factories with the following characteristics:

Theorem 2.3. There is a green factory for $\mathcal{S}_k$ with $u_0, u_1 \approx 2.942$.

The main ideas used to construct these factories are described in Section 5.

Figure 2: The ordered list of $S^k_k$'s (the parts marked "Eliminated" lie above the upper end and below the lower end of the list).
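The way the two element costs combine into a comparison bound can be checked numerically. The sketch below is illustrative only: it evaluates the leading coefficient $\alpha u_0 + (1-\alpha) u_1$ of the selection bound, ignoring the $o(n)$ term, and the function name `leading_coefficient` is an assumption of this example.

```python
def leading_coefficient(alpha, lower_cost, upper_cost):
    """Coefficient of n in the bound (alpha*u0 + (1 - alpha)*u1) n + o(n).

    alpha      -- relative rank of the element to select, 0 < alpha < 1
    lower_cost -- lower element cost u0 of the green factory
    upper_cost -- upper element cost u1 of the green factory
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * lower_cost + (1.0 - alpha) * upper_cost


if __name__ == "__main__":
    # n/4-th element with the costs quoted in the introduction.
    print(leading_coefficient(0.25, 4.0, 2.25))    # 2.6875, i.e. about 2.69
    # Median with the factory of Theorem 2.3.
    print(leading_coefficient(0.5, 2.942, 2.942))
```

With $\alpha = 1/4$ and costs $4$ and $2.25$ the coefficient is $0.25 \cdot 4 + 0.75 \cdot 2.25 = 2.6875$, which is the $2.69n$ figure quoted for the $n/4$-th element.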

3 Selection algorithms

In this section we describe our selection algorithm. This algorithm uses an $S^k_k$ factory. The complexity of the algorithm is completely determined by the characteristics of the factory used. This algorithm is a generalization of the median algorithm of Schönhage et al. and a variation of the selection algorithm we describe in [DZ95].

Theorem 3.1. Let $0 < \alpha \le 1/2$. Let $F_k$ be an $S^k_k$ factory with lower element cost $u_0 \approx A_0$, upper element cost $u_1 \approx A_1$, initial cost $I_k = O(k^2)$ and production residue $R_k = O(k^2)$. Then, the $\alpha n$-th smallest element, among $n$ elements, can be selected using at most $(\alpha A_0 + (1-\alpha) A_1)n + o(n)$ comparisons.

Proof. We refer to the $\alpha n$-th smallest element among the input elements as the percentile element. The algorithm uses the factory $F_k$ where $k = \lfloor n^{1/4} \rfloor$.

The input elements are fed into this factory, as singletons, and the production of $S^k_k$ partial orders commences. The centres of the generated $S^k_k$'s are inserted, using binary insertion, into an ordered list $L$, as shown in Fig. 2. When the list $L$ is long enough we either know, as we shall soon show, that the centre of the upper (i.e., last) $S^k_k$ in $L$ and the elements above it are too large to be the percentile element, or that the centre of the lower (i.e., first) $S^k_k$ and the elements below it are too small to be the percentile element. Elements too large or too small to be the percentile element are eliminated. The lower elements of the upper $S^k_k$ and the upper elements of the lower $S^k_k$ are returned to the factory for recycling.

Let $t$ be the current length of the list $L$ and let $r$ be the number of elements currently in the factory. The number of elements that have not yet been eliminated is therefore $N = (2k+1)t + r$. Let $i$ be the rank of the percentile element among the non-eliminated elements.
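The binary insertion of centres into the ordered list can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation; the function name `binary_insert` and the returned comparison count are assumptions of this example. Inserting into a list of length $m$ uses at most $\lceil \log (m+1) \rceil$ comparisons.

```python
def binary_insert(sorted_list, item):
    """Insert item into sorted_list (kept ascending); return comparisons used.

    Only comparisons between `item` and list elements are counted, which is
    the cost charged in the comparison model.
    """
    lo, hi = 0, len(sorted_list)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if item < sorted_list[mid]:
            hi = mid          # item belongs somewhere before position mid
        else:
            lo = mid + 1      # item belongs somewhere after position mid
    sorted_list.insert(lo, item)
    return comparisons
```

Since the list never holds more than $n$ centres, every insertion costs $O(\log n)$ comparisons, which is the per-insertion cost used when the total cost of the insertions is bounded in the analysis.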
Initially $N = n$ and $i = \lceil \alpha n \rceil$. The number of elements in the list known to be smaller or equal to the centre of the upper $S^k_k$ of the list is $N_0 = (k+1)t$. The number of elements known to be greater or equal to the centre of the lowest $S^k_k$ of the list is $N_1 = (k+1)t$. Note that $N_0 + N_1 = N + t - r$, as the centres of all the $S^k_k$'s in the list satisfy both these criteria, the $r$ elements that are currently in the factory satisfy neither, and all the other non-eliminated elements satisfy exactly one of these criteria.

The algorithm consists of the following interconnected processes:

(i) Whenever sufficiently many elements are supplied to the factory, a new partial order $S^k_k$ is produced and its centre is inserted into the list $L$ using binary insertion.

(ii) Whenever $N_0 > i$, the centre of the upper partial order $S^k_k$ in the list and the elements above it are eliminated, as they are too big to be the percentile element. The lower elements of $S^k_k$ are recycled.

(iii) Whenever $N_1 > N - i + 1$, the centre of the lowest partial order $S^k_k$ in the list and the elements below it are eliminated, as they are too small to be the percentile element. The upper elements of $S^k_k$ are recycled. The value of $i$ is updated accordingly, i.e., $i$ is decremented by the number of elements in the lower part of $S^k_k$ (including the centre).

If (ii) and (iii) are not applicable then $N_0 \le i$ and $N_1 \le N - i + 1$. Thus $N_0 + N_1 \le N + 1$ and $t \le r + 1$. If (i) is not applicable then by the factory definition we have $r \le R_k$. When none of (i), (ii) and (iii) can be applied we get that $t \le R_k + 1 = O(k^2)$. At this stage $N = O(k^3)$, which is $o(n)$, and the $i$-th element among the surviving elements is found using any linear selection algorithm.

We now analyze the comparison complexity of the algorithm. Whenever (ii) is performed, the upper partial order of the list is broken. Its centre and upper elements are eliminated and its lower elements are returned to the factory. The amortized production cost of the partial order is at most $u_1$ comparisons per each element above the centre. Whenever (iii) is performed, the lowest partial order of the list is broken. Its centre and lower elements are eliminated and its upper elements are returned to the factory. The amortized production cost of the partial order is at most $u_0$ comparisons per each element below the centre. The algorithm can eliminate at most $(1-\alpha)n$ elements larger than the percentile element and at most $\alpha n$ elements smaller than the percentile element. The total production cost of all partial orders that are eventually broken is therefore at most $(\alpha A_0 + (1-\alpha) A_1)n + o(n)$. At most $O(k^2)$ generated partial orders are not broken. Their total production cost is $O(k^3)$. The initial production cost is $O(k^2)$. The total number of comparisons performed by the factory is therefore $(\alpha A_0 + (1-\alpha) A_1)n + o(n)$.

Let $t'$ be the final length of the list $L$ (when none of (i), (ii) and (iii) is applicable). The total number of partial orders generated is at most $t' + n/k$, as at least $k$ elements are eliminated whenever a partial order is removed from $L$. The total cost of the binary insertions into the list is at most $O((t' + n/k) \log n) = O((n/k) \log n)$ which is $o(n)$.

The total number of comparisons performed by the algorithm is therefore at most $(\alpha A_0 + (1-\alpha) A_1)n + o(n)$, as required.

Using the factories of Theorem 2.3, we obtain our main result:

Theorem 3.2. Any element, among $n$ elements, can be selected using at most $2.942n + o(n)$ comparisons.

4 Basic principles of factory design

In this section we give some of the basic principles used to construct efficient factories. The section is divided into three subsections. In the first subsection we remind the reader what hyperpairs are and what their pruning cost is. In the second subsection we describe the notion of grafting. In the third subsection we sketch the construction of the factories of Schönhage et al. [SPP76]. These factories are described as an example for a simple factory design.

Before going into details, we describe a clever accounting principle introduced by Schönhage et al. to simplify the complexity analysis. The information we care to remember on the elements that pass through the factory can always be described using a Hasse diagram. Each comparison made by the algorithm adds an edge to the diagram and possibly deletes some edges. At some stages we may decide to 'forget' the result of some comparisons and the edges that correspond to them are removed from the diagram. Schönhage et al. noticed that instead of counting the number of comparisons made, we can count the number of edges cut! To this we should add the number of edges in the eliminated parts of the partial orders as well as the edges that remain in the factory when the production stops. The second number, in our factories, is at most a constant times the production residue of the factory and it can be attributed to the initial cost.

4.1 Hyperpairs

A factory usually starts the production of a partial order from $\mathcal{S}_k$ by producing a large partial order, a hyperpair, that contains a partial order from $\mathcal{S}_k$.
Figure 3: Some small $H_w$'s ($H_{01}$, $H_{0110}$ and $H_{011010}$).

Definition 4.1. A hyperpair $H_w$, where $w$ is a binary string, is a finite partial order with a distinguished element, the centre, defined recursively by (i) $H_\lambda$ is a single element ($\lambda$ here stands for the empty string). (ii) $H_{w1}$ is obtained from two disjoint $H_w$'s by comparing their centres and taking the higher as the new centre. $H_{w0}$ is obtained in the same way but taking the lower of the two centres as the new centre.

The Hasse diagrams of some small hyperpairs are shown in Fig. 3. Some basic properties of hyperpairs are given in the following Lemma.

Lemma 4.1. Let $c$ be the centre of a hyperpair $H_w$. Let $w_i$ be the prefix of $w$ of length $i$. Let $h_0$ be the number of 0's in $w$ and $h_1$ the number of 1's in $w$. Then: (i) The centre $c$ together with the elements greater than it form an $H_{0^{h_0}}$ with centre $c$. The elements greater than $c$ form a disjoint set of hyperpairs $H_\lambda, H_0, \ldots, H_{0^{h_0-1}}$. The centre $c$ together with the elements smaller than it form an $H_{1^{h_1}}$ with centre $c$. The elements smaller than $c$ form a disjoint set of hyperpairs $H_\lambda, H_1, \ldots, H_{1^{h_1-1}}$. (ii) The hyperpair $H_w$ can be parsed into its centre $c$ and a disjoint set $\{ H_{w_i} : 0 \le i < |w| \}$ of smaller hyperpairs. Moreover, the centre of $H_{w_i}$ is above $c$ if $w_{i+1}$ ends with 0, and below $c$ if $w_{i+1}$ ends with 1.

The Lemma can be easily proved by induction. Note, in particular, that if $h_0 \ge \log(k+1)$ and $h_1 \ge \log(k+1)$ then $H_w$ contains an $S^k_k$.

No edges are cut during the construction of hyperpairs. But, before outputting an $S^k_k$ contained in a hyperpair, all the edges connecting the elements of this $S^k_k$ with elements not contained in this $S^k_k$ have to be cut. This rather costly operation is referred to as pruning.

The downward pruning cost $PR_0(w)$ of a hyperpair $H_w$ with centre $c$ is the number of edges that connect elements of $H_w$ that are below the centre with the other elements of $H_w$ (excluding $c$). The upward pruning cost $PR_1(w)$ of a hyperpair $H_w$ is defined analogously.

Usually, especially if a grafting process is applied, we do not want to prune all the elements above or below the centre of a hyperpair $H_w$. It is then more convenient to consider the amortized per element pruning costs. Let $h_0$ and $h_1$ be the number of 0's and 1's in $w$ and let us define $pr_0(w) = […]$ and $pr_1(w) = […]$ to be the lower element pruning cost and the upper element pruning cost of $H_w$. It can be easily shown that the cost of pruning $m$ elements below $c$ is at most $pr_0(w) \cdot m + |w|$ and the cost of pruning $m$ elements above $c$ is at most $pr_1(w) \cdot m + |w|$. The $|w|$ terms are usually negligible. Note that $|w|$ is the number of edges connected to the centre of $H_w$. When an edge connected to $c$ is cut, a hyperpair $H_{w'}$ where $w'$ is a prefix of $w$ is obtained. This hyperpair can then be used in the construction of the next $H_w$. The following Lemma is easily proved.

Lemma 4.2. (i) $pr_0(\lambda) = 0$, $pr_1(\lambda) = 0$. (ii) $pr_0(0w) = pr_0(w) + 1$, $pr_0(1w) = \frac{1}{2} pr_0(w)$. (iii) $pr_1(0w) = \frac{1}{2} pr_1(w)$, $pr_1(1w) = pr_1(w) + 1$.

To produce partial orders from $\mathcal{S}_k$ for larger and larger values of $k$, we have to construct larger and larger hyperpairs. When we design a family $\{F_k\}_{k=1}^{\infty}$ of factories, we usually choose an infinite binary string $W$. In each member of this family we construct a hyperpair $H_w$ whose sequence $w$ is a long enough prefix of $W$. Let $w_i$ be the finite prefix of $W$ of length $i$. The lower and upper element pruning costs of an infinite sequence $W$ are defined as the limits $pr_0(W) = \lim_{i \to \infty} pr_0(w_i)$ and $pr_1(W) = \lim_{i \to \infty} pr_1(w_i)$. These limits do exist for the chosen infinite strings.

Schönhage et al. base their factories on the infinite string $W = 01(10)^\omega$ for which, as can be easily verified, $pr_0(W) = pr_1(W) = 1.5$. In our factories, we also need hyperpairs with cheaper lower element pruning cost and, alas, more expensive upper element pruning cost, or vice versa. The following Theorem presents a tradeoff between the upper and lower element pruning costs. Its proof is omitted due to lack of space.

Theorem 4.1. For any two numbers $a, b$ such that […] there exists a binary sequence $W \in \{01, 10\}^\omega$ for which $pr_0(W) = a$ and $pr_1(W) = b$.

We are already in position to describe a simple but complete factory. Select a string $W$. Construct a hyperpair $H_w$ that contains the partial order $S^k_k$, where $w$ is a long enough prefix of $W$. Prune $k$ elements above and $k$ elements below the centre of this $H_w$. These $2k+1$ elements form a copy of $S^k_k$. By Lemma 4.1(ii), the remaining elements of $H_w$ form a disjoint collection of partial orders of the form $H_{w'}$, where $w'$ is a prefix of $w$. These partial orders are used to construct a new copy of $H_w$ that will be used to construct the next $S^k_k$. Before we output an $S^k_k$, we cut the $2k$ edges it contains. When some part of an $S^k_k$ generated by the factory is recycled, the elements returned to the factory (as singletons) are used again for the construction of hyperpairs. It is easy to check that the lower and upper element costs of this simple factory are both $u_0, u_1 \approx pr_0(W) + pr_1(W) + 2$. For any $W \in \{01, 10\}^\omega$ we get that the lower and upper element costs are $u_0, u_1 \approx 5$.

4.2 Grafting

The costs of the simple factories described above can be significantly improved using grafting. We can cheaply find elements that are smaller than the centre, or elements that are larger than the centre (but not both usually). The process of finding such elements is called grafting. Pruning is then used to obtain elements on the opposite side.

We demonstrate this notion using a simple example, the grafting of singletons. Take an element $x$, not contained in the hyperpair, and compare it to the centre of the hyperpair. Continue in this way, comparing new elements to the centre, until either $k$ elements above the centre, or $k$ elements below the centre are found. Note that no edges are cut in this process. All the grafted elements are put in the output partial order. The pruning process is then used to complete the partial order into an $S^k_k$. Adding this process to our simple factory for $S^k_k$, the upper and lower element costs are reduced to: $u_0, u_1 \approx \max\{pr_0(W), pr_1(W)\} + 2$ (note that now we have to prune elements from at most one side). Thus $u_0, u_1 \approx 3.5$ if we take $W = 01(10)^\omega$ or $W = 10(01)^\omega$. This supplies a proof to Theorem 2.1. Note that at least one side of each generated $S^k_k$ is composed of singletons, and if this side is recycled, no comparisons can be reused.

4.3 The factories of Schönhage, Paterson and Pippenger

We now sketch the operation of the green factories obtained by Schönhage et al. [SPP76]. These factories improve upon the simple factories described above by grafting and recycling pairs. The factory starts by producing hyperpairs corresponding to prefixes of the string $W = 01(10)^\omega$ (the string $10(01)^\omega$ could be used instead). Let $w_r$ be the prefix of $W$ of length $r$. For brevity we let $H_r = H_{w_r}$. Some small $H_r$'s were shown in Fig. 3. By Lemma 4.1, an $H_{2r}$, where $r = \lceil \log(k+1) \rceil$, contains an $S^k_k$. After constructing an $H_{2r}$, the factory initiates the following pair grafting process: Let $x < y$ be a pair of elements and let $p_0$ and $p_1$ be two counters initially set to zero. Let $c$ denote the centre of the hyperpair. If $p_0 > p_1$ compare $y$ and $c$. If $y > c$ then compare also $x$ and $c$. If $c < x < y$ then increase $p_1$ by one. On the other hand, if $p_0 \le p_1$ then compare $x$ and $c$. If $x < c$ then compare also $y$ and $c$. Finally, if $x < y < c$ then increase $p_0$ by one. As in the simple factory described in the previous subsection, the grafting continues until $k$ elements are found above or below the centre and then the pruning process is used to complete the generation of an $S^k_k$.
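The recursive definition of hyperpairs and the pruning costs of Section 4.1 can be explored on small examples with a short simulation. The following Python sketch is illustrative and not taken from the paper. It assumes the convention that a 0 step keeps the lower of the two centres and a 1 step keeps the higher one, it fixes the outcome of every simulated comparison arbitrarily (the first sub-hyperpair always has the lower centre), and the names `build_hyperpair`, `reachable` and `pruning_cost` are hypothetical.

```python
def build_hyperpair(w):
    """Build H_w on 2**len(w) integer elements.

    Returns (centre, edges); each edge is a pair (smaller, larger) of the
    Hasse diagram. Assumed convention: '0' keeps the lower centre as the
    new centre, '1' keeps the higher centre.
    """
    next_id = 0

    def build(length):
        nonlocal next_id
        if length == 0:
            element = next_id
            next_id += 1
            return element, set()
        low_centre, low_edges = build(length - 1)
        high_centre, high_edges = build(length - 1)
        edges = low_edges | high_edges
        edges.add((low_centre, high_centre))   # simulated comparison outcome
        centre = low_centre if w[length - 1] == "0" else high_centre
        return centre, edges

    return build(len(w))


def reachable(centre, edges, upward):
    """Elements strictly above (upward=True) or strictly below the centre."""
    neighbours = {}
    for small, large in edges:
        src, dst = (small, large) if upward else (large, small)
        neighbours.setdefault(src, []).append(dst)
    found, frontier = set(), [centre]
    while frontier:
        for nxt in neighbours.get(frontier.pop(), []):
            if nxt not in found:
                found.add(nxt)
                frontier.append(nxt)
    return found


def pruning_cost(centre, edges, upward):
    """Edges joining the chosen side of the centre to the other elements,
    not counting edges that touch the centre itself."""
    part = reachable(centre, edges, upward)
    return sum(1 for a, b in edges
               if centre not in (a, b) and ((a in part) != (b in part)))
```

Under the stated convention, `build_hyperpair("0110")` produces 16 elements with three elements above and three below the centre, in agreement with the counts $2^{h_0} - 1$ and $2^{h_1} - 1$ that follow from Lemma 4.1(i).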

The elements above the centre of the generated $S^k_k$ form a collection of disjoint $H_{0^i}$'s and the elements below the centre form a collection of disjoint $H_{1^i}$'s. When the lower or upper part of an $S^k_k$ is returned to the factory some of the existing relations among the elements returned are utilized. The amortized analysis of the green factory encompasses a trade-off between the cost of generating an $S^k_k$ and the utility obtained from its lower or upper parts when these parts are recycled.

Although the $S^k_k$'s generated by the factory of Schönhage et al. may contain $H_{0^i}$'s and $H_{1^i}$'s, where $i > 1$, their factory is only capable of utilizing pairwise disjoint relations among the elements returned to it (as the grafting process uses pairs). If an $H_{0^i}$ or an $H_{1^i}$, with $i > 1$, is returned to the factory, it is immediately broken into $2^{i-1}$ $H_0$'s or $H_1$'s. Note that both $H_0$ and $H_1$ simply stand for a pair of elements. It can be checked, see [SPP76], that the upper and lower element costs of this factory are $u_0, u_1 \approx 3$. This is Schönhage et al.'s best result.

5 Advanced principles of factory design

In this section, we outline the principles used to construct our improved factories that yield the $2.95n$ median algorithm. The first of these principles was already mentioned.

Allowing variations in the produced partial orders. Our factories construct partial orders from $\mathcal{S}_k$. The exact proportion between the number of elements below and above the centre of a generated partial order is not fixed in advance.

Recycling larger relations. The factories of Schönhage et al. are only capable of recycling pairs (i.e., $H_0$'s and $H_1$'s). Our factories recycle larger constructs such as quartets ($H_{00}$'s and $H_{11}$'s), octets ($H_{000}$'s and $H_{111}$'s), 16-tuples ($H_{0000}$'s and $H_{1111}$'s) as well as pairs, singletons and other structures which are not hyperpairs. The non-hyperpair constructs are obtained by the more sophisticated grafting processes used.

Constructing hyper-products. As mentioned, our factories may receive partial orders that could not be used for the construction of hyperpairs. These partial orders are used instead for the construction of hyper-products. A hyper-product of $P$ and $H_w$, where $P$ is some partial order with a distinguished element which is again called centre, is a hyperpair $H_w$ in which each element is also the centre of a disjoint $P$. Hyperpairs are of course special cases of hyper-products as […] and […].
Page 7

SELECTINGTHEMEDIAN Grafting larger relations and mass-grafting. The factories of Sc h onhage et al. use simple pair grafting pro cess. use more complicated grafting pro cesses, ev en if only pairs are in olv ed. or eac h input construct w e ha e di eren t grafting pro cesses. Some of our grafting pro cesses use the tec hnique of mass pro duction. Using sub-factories. The factories of Sc h onhage et al. generate only single family of yp erpairs (corresp onding to = 01(10) ). Our factories generate sev eral t yp es of yp erpairs and yp er-pro ducts, as men tioned ab o e. The construction of

eac h one of these h yp er- pro ducts is carried out in a separate sub-pro duction unit that w e refer to as a sub-factory Di eren t sub- factories also di er in the `ra w-materials' that they can pro cess. Using credits in the amortized complexit y analysis. The last principle is an accoun ting principle. The di eren constructs recycled our factories are of di eren `qualit y'. Some of them can b e used ery ecien tly for the construction of partial or- ders from Others are not so appropriate for this pro cess and using them as ra materials for the construction of partial orders from results

in a m uc h higher pro duction cost. o equalize these costs, eac construct used our factories is as- signed a credit (or debit if negativ e). Unfortunately do not ha enough space in this extended abstract for a full description of our factories. In the next section, describ e a factory that can b e used to obtain 97 median algorithm. This is greatly simpli ed v ersion of our b est factory that yields the 2 95 median algorithm. actoriesformedianselection The construction of the factory satisfying the con- ditions of Theorem 2.3 is extremely in olv ed. o k eep this section relativ ely short, w e

describ e here a simpli- ed ersion of the factory This factory yields the follo wing result whic h is only sligh tly w eak er then Theorem 2.3: Theorem 6.1. Ther e is a gr en factory for with ; u 9677 As as the case with all the other factories considered, the unit cost of this factory is (1) and the initial cost and pro duction residues are ). The main di erences b et een and are the follo wing: utilizes only singletons, pairs and quartets for grafting. Therefore, do es not use credits or un balanced h yp erpairs. Moreo er, is able to recycle only a fraction of at most 638 of the elemen ts in

quartets. If the proportion of elements in quartets in the recycled side is larger than this fraction, then some of these quartets have to be broken into pairs.

This section is divided into four subsections. In the first subsection we give a preliminary description of the factory. In the second and the third subsections we describe the pair and quartet grafting processes. Finally, in the last subsection we give a full description of the factory.

6.1 Preliminaries

The factory recycles, and therefore receives as inputs, singletons, pairs (of two types) and quartets (01's and 10's). Singletons are immediately

joined into pairs. The factory employs four processes: hyperpair generation, pair grafting, quartet grafting, and pruning.

The factory employs two sub-factories that generate balanced hyperpairs. The first constructs hyperpairs according to the sequence 01(10). The second constructs hyperpairs according to the sequence 10(01). Input 01's are passed to the first sub-factory (as 01's can be used for the construction of hyperpairs that correspond to 01(10)), while input 10's are passed to the second sub-factory (as 10's can be used for the construction of hyperpairs that correspond to 10(01)

). Input pairs are spread between the two sub-factories according to demand.

We use the accounting scheme described in Section 4 to simplify the complexity analysis. Hence, the cost of an operation is the number of edges it cuts. When no ambiguity occurs, we let the upper cost (lower cost) of an operation be the cost of the operation when the upper part (or lower part) of its result is eliminated. Note that upper and lower costs are calculated for whole structures, whereas the upper and lower element costs are calculated per eliminated element.

The factory is not capable of

recycling elements in structures larger than quartets. A larger structure (one with i > 2) therefore has to be cut into a collection of disjoint 00's (11's). The price of this operation is 1 edge per element.

The factory requires some of the elements it receives to be organized in pairs (to be used for pair grafting). Therefore, some of the quartets that are to be recycled may have to be cut. The exact proportion of quartets that would have to be cut is not known in advance. Which partial order should be charged for the cutting of these edges? The one being recycled or the one being constructed? The answer is that the cost should be split between these two. The optimal charging scheme, in the case of this factory, turns out to be the following: when a partial order is recycled, make sure that at
most a fraction 0.638 of the recycled elements are organized in quartets. If more elements are organized in quartets, then some of the quartets are cut and this is charged to the partial order being recycled. If, during the construction of a partial order, more quartets have to be cut, the cost of these additional cuts is charged to the partial order being

constructed. In some cases the factory runs out of quartets. It then takes pairs and turns them into quartets. No cost is associated with this operation as no edges are cut.

The general approach taken by the factory is the following. If enough elements are supplied to the factory then, in at least one of the two sub-factories, a large enough hyperpair can be built. Additional relations arriving at the factory are then either used for grafting in the first sub-factory or used for the construction of a large enough hyperpair also in the second sub-factory. Whenever a large enough hyperpair is

formed, a quartet grafting process is applied on it, then a pair grafting process is applied on it. Each one of these grafting processes has a collection of possible outcomes. In some outcomes, elements with low upper element cost but high lower element cost are obtained. In other outcomes, elements with high upper element cost but low lower element cost are obtained. Some of these outcomes can be combined with some pruned elements into a tuple with low enough upper and lower element costs. We show that if there are no such outcomes (which can be combined with pruned elements) we

can always combine outcomes from the preceding cases so that tuples with low enough upper and lower element costs are obtained.

In general, the upper (or lower) element cost of each case is the sum of the upper (or lower) costs of the grafting processes, the pruning cost, the cost of cutting quartets into pairs (if necessary) and the cost of the elimination itself. The elimination cost of each element is always a single edge, as the output of the grafting processes is always a partial order which does not contain undirected cycles (undirected cycles, if obtained, are broken).

The last remark

regards our optimization scheme. At a certain point in the algorithm, we decide upon the optimal number of elements from each category that are to be added to the output partial order. The optimal number of elements from each category may be non-integral. The sum of the optimal numbers of each category is rounded to the nearest integer value (which will be the actual number of elements from this category in the output partial order). The factory maintains a counter for each category and makes sure that the number of grafted elements will not differ from this counter by more than

a constant value.

6.2 Grafting pairs

In this subsection we describe our pair grafting process, which is considerably more complicated than the pair grafting process used by Schönhage et al. [SPP76]. Our process uses a mass production scheme to construct a sequence of dominated hyperpairs. The process receives two parameters: a direction bit and the centre of the output partial order. These parameters are set by the factory when initiating this process. The grafting recursively builds hyperpairs which are dominated by the centre of the output partial order. A dominated hyperpair of a given direction

and level is a hyperpair with a centre such that each element of the hyperpair, except for its centre, is known to be larger (if the direction bit is 0) or smaller (if the direction bit is 1) than the centre c of the output partial order. The relation between the centre of the hyperpair and c is usually not determined.

The pair grafting process is composed of rounds. The i-th round receives two dominated hyperpairs of level i and attempts to construct a dominated hyperpair of level i + 1. At first, a hyperpair of level i + 1 is constructed by comparing the two centres. Assume, without loss of generality, that the first of them is the centre of the new hyperpair. Then, compare the second centre with c. The two possible outcomes, when the direction bit is 1, are: (1) the second centre is smaller than c, and the new hyperpair is a dominated hyperpair of level i + 1; (2) the second centre is larger than c, and the new hyperpair is not dominated by c (as the first centre is then also larger than c). Let (1') and (2') be the corresponding cases when the direction bit is 0, i.e., the second centre is larger than c and smaller than c, respectively. If (2) or (2') occurs, the process is stopped. For the purposes of this factory (and also for those of the full factory), we also stop the process when a dominated hyperpair 000 or 111 of level 3 is generated.

In the 0-th round, the two hyperpairs are just singletons. Their `centres' are compared and a pair is obtained. The pair grafting process receives its elements as pairs. It therefore starts with the second comparison of the 0-th round. The flow of

the pair grafting process, for direction bit 1, is shown in Fig. 4. The process has four possible outcomes when the direction bit is 1. The four possible outcomes when the direction bit is 0 are symmetric. It is therefore enough to consider the upper and lower costs of the former. Due to lack of space, we omit the detailed cost analysis. The costs incurred are summarized in Table 1. All these costs exclude the elimination cost, which is a single edge for each eliminated element.

6.3 Grafting quartets

A quartet 01 that contains the four elements u, v, w, z, where u < v and u < w < z, is grafted using the

following simple algorithm:

1. Compare w and c.
2. If w > c then remove the edge (u, w) and return the pair (u, v) to the input queue.
3. If w < c then compare c with each of v and z.

The five possible outcomes of this process are shown in Fig. 5. Note that the fourth partial order obtained is a special case of the third partial order obtained. It is therefore not necessary to consider the fourth outcome, and we are left with four cases. The grafting process employed for 10's is symmetric. The quartet grafting process continues until three quartets from the same category are obtained. Once more, due to lack of space, we omit the detailed cost analysis. The costs, for 3, are summarized in Table 2.

Figure 4: Flow of pair grafting when the direction bit is 1 (levels 1 to 3).

Table 1: Costs of pair grafting. (Columns: class; cost and number of elements when the upper part is eliminated; cost and number of elements when the lower part is eliminated.)

6.4 The factory algorithm

As mentioned before, the factory is composed of two sub-factories. The first uses the string 01(10) while the second one uses the string 10(01). We describe the operation of the first sub-factory (whose inputs are 01's,

pairs and singletons). The other sub-factory works in a symmetric way. The operation of the first sub-factory is composed of the following steps:

(1) Generate a large enough hyperpair corresponding to the sequence 01(10), and let c be its centre. The centre c will be the centre of the generated partial order.

(2) The following steps are applied until the required number of elements above c or below c are placed in the output partial order.

(2a) Apply the quartet grafting process until three tuples from one of the counted categories are available. Elements from one particular category are immediately placed in the output partial order and the grafting continues.

(2b) If the three available tuples come from certain of these categories, apply the pair grafting process with direction bit 0 until one of its stopping outcomes is obtained. Elements found in one particular category are immediately placed in the final partial order.

(2c) If the three available tuples come from the other category, apply the pair grafting process with direction bit 1 until one of its stopping outcomes is obtained. Elements found in one particular category are immediately placed in the final partial order.

(2d) The tuple obtained using the pair grafting, and elements from tuples obtained using quartet grafting, are placed in the output partial order. There are nine different cases here, each pairing a quartet grafting category (Q) with a pair grafting outcome (L or U). For each one of these cases we choose an optimal value.

(3) Finally, prune elements from the partial order in order to achieve an optimal size (this is required only if certain outcomes were encountered) and output it.

Figure 5: Possible outcomes of 01 grafting.

Table 2: Costs of 01 grafting. (Columns: class; cost and number of elements when the upper part is eliminated; cost and number of elements when the lower part is eliminated.)

The sub-factory maintains three counters, which are initially set to 0. Whenever a tuple of one of the corresponding categories is obtained in step (2a), the corresponding counter is incremented. When a certain part of such a tuple is `consumed' in step (2d), the corresponding counter is decremented by the appropriate, not necessarily integral, amount. The quartet grafting process activated in step (2a) is carried out until one of these counters
reaches a value of at least 3.

For some of the tuples obtained in step (2a), and some of those obtained in steps (2b) and (2c), an appropriate number of elements is to be pruned in step (3). Two counters maintain the number of elements that need to be pruned below and above the centre, respectively. Before outputting the partial order, the counted elements above the centre and below the centre

are pruned.

We depict the flavour of the cost analysis by considering one of the worst cases of the factory. In the following, we fix the fraction of recycled elements organized in quartets at 0.637985, which is the optimal value. For each obtained in (2c), we prune 1382 elements below. The pruning of the elements below cuts pr edges. This pruning, however, generates new elements in either singletons or pairs (because the pruning process separates singletons, or r/2 pairs, from the centre). These elements can be returned to the factory as pairs and since 2, at least one pair is returned to the factory for every pair that was utilized.

Hence, there is no need to break quartets into pairs. Recycling the upper part cuts one edge for each pair and recycling the lower part cuts (1/4) edges (because of the recycling restrictions). Thus, the lower cost is pr + 1, obtaining eliminated elements, and the upper cost is (pr) + 1/4), obtaining one eliminated element. Recall also that the elimination cost is a single edge per eliminated element. Hence, the upper and lower element costs are: 1 + (1 5 + 1/4) = 1 + + 1 96768. The cost analysis of all the other cases is omitted.

7 Concluding remarks

We have improved the results of

Schönhage et al. [SPP76] and Blum et al. [BFP+73] and obtained a better algorithm for the selection of the median. Although the improvement is quite modest, many new ideas were needed to obtain this improvement. The new ideas introduced may lead to further improvements. Our current constructions, however, are already quite involved and a considerable effort was devoted to their optimization. Obtaining further improvements is not likely to be an easy task. Further narrowing the gap between the known upper and lower bounds on the number of comparisons needed to select the

median remains a challenging open problem.

References

[Aig82] M. Aigner. Selecting the top three elements. Discrete Applied Mathematics, 4:247–267, 1982.
[BFP+73] M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, and R.E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7:448–461, 1973.
[BJ85] S.W. Bent and J.W. John. Finding the median requires 2n comparisons. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing, Providence, Rhode Island, pages 213–216, 1985.
[CM89] W. Cunto and J.I. Munro. Average case selection. Journal of the ACM, 36(2):270–279, 1989.
[DZ95] D. Dor and U. Zwick. Finding percentile elements. In Proceedings of the 3rd Israel Symposium on Theory and Computing Systems, 1995.
[Eus93] J. Eusterbrock. Errata to "Selecting the top three elements" by M. Aigner: A result of a computer-assisted proof search. Discrete Applied Mathematics, 41:131–137, 1993.
[FG78] F. Fussenegger and H.N. Gabow. A counting approach to lower bounds for selection problems. Journal of the ACM, 26(2):227–238, April 1978.
[FJ59] L.R. Ford and S.M. Johnson. A tournament problem. American Mathematical Monthly, 66:387–389, 1959.
[FR75] R.W. Floyd and R.L. Rivest. Expected time bounds for selection. Communications of the ACM, 18:165–173, 1975.
[HS69] A. Hadian and M. Sobel. Selecting the t-th largest using binary errorless comparisons. Colloquia Mathematica Societatis János Bolyai, 4:585–599, 1969.
[Hya76] L. Hyafil. Bounds for selection. SIAM Journal on Computing, 5:109–114, 1976.
[Joh88] J.W. John. A new lower bound for the set-partition problem. SIAM Journal on Computing, 17(4):640–647, August 1988.
[Kir81] D.G. Kirkpatrick. A unified lower bound for selection and set partitioning problems. Journal of the ACM, 28:150–165, 1981.
[Kis64] S.S. Kislitsyn. On the selection of the k-th element of an ordered set by pairwise comparisons. Sibirsk. Mat. Zh., 5:557–564, 1964.
[MP82] I. Munro and P.V. Poblete. A lower bound for determining the median. Technical Report Research Report CS-82-21, University of Waterloo, 1982.
[Poh72] I. Pohl. A sorting problem and its complexity. Communications of the ACM, 15:462–464, 1972.
[RH84] P.V. Ramanan and L. Hyafil. New algorithms for selection. Journal of Algorithms, 5:557–578, 1984.
[Sch32] J. Schreier. On tournament elimination systems. Mathesis Polska, 7:154–160, 1932. (in Polish).
[SPP76] A. Schönhage, M. Paterson, and N. Pippenger. Finding the median. Journal of Computer and System Sciences, 13:184–199, 1976.
[SY80] P. Stockmeyer and F.F. Yao. On the optimality of linear merge. SIAM Journal on Computing, 9:85–90, 1980.
[Yap76] C.K. Yap. New upper bounds for selection. Communications of the ACM, 19(9):501–508, September 1976.
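The rounding remark at the end of Section 6.1 (non-integral optimal numbers, a counter per category, and a bounded gap between the counter and the number of elements actually grafted) can be illustrated with a small sketch. The Python code below is only an illustration under stated assumptions, not the authors' implementation: the names `CategoryCounter`, `request`, `drift` and `simulate` are hypothetical, the amounts are assumed to be non-negative, and halves are assumed to round up. It keeps the running sum of the fractional amounts, rounds that sum to the nearest integer, and releases only the difference from what was already placed, so the gap never exceeds 1/2.

```python
from fractions import Fraction
import math


class CategoryCounter:
    """Hypothetical per-category counter: fractional target vs. integral placement."""

    def __init__(self):
        self.target = Fraction(0)  # running sum of the optimal (possibly non-integral) amounts
        self.placed = 0            # number of elements actually placed so far

    def request(self, optimal_amount):
        """Add a non-negative fractional amount; return how many whole elements to place now."""
        self.target += Fraction(optimal_amount)
        # Round the running sum to the nearest integer (assumption: halves round up).
        rounded = math.floor(self.target + Fraction(1, 2))
        grant = rounded - self.placed
        self.placed = rounded
        return grant

    def drift(self):
        """Absolute gap between the placed count and the fractional target."""
        return abs(self.placed - self.target)


def simulate(amounts):
    """Feed a sequence of fractional amounts to a fresh counter; return grants and the counter."""
    counter = CategoryCounter()
    grants = [counter.request(a) for a in amounts]
    return grants, counter
```

For instance, `simulate([Fraction(2, 5)] * 5)` releases 0, 1, 0, 1 and 0 elements in turn, for a total of 2, which equals the exact sum of the five requests.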