PARALLEL STOCHASTIC HILL-CLIMBING WITH SMALL TEAMS

Brian Gerkey and Sebastian Thrun
Artificial Intelligence Lab
Stanford University
Stanford, CA 94305, USA
gerkey@ai.stanford.edu, thrun@stanford.edu

Geoff Gordon
Center for Automated Learning and Discovery
Carnegie Mellon University
Pittsburgh, PA 15213, USA
ggordon@cs.cmu.edu

Abstract

We address the basic problem of coordinating the actions of multiple robots that are working toward a common goal. This kind of problem is NP-hard, because in order to coordinate a system of $n$ robots, it is in principle necessary to generate and evaluate a number of actions or plans that is exponential in $n$ (assuming $P \neq NP$). However, we suggest that many instances of coordination problems, despite the NP-hardness of the overall class of problems, do not in practice require exponential computation in order to arrive at good solutions. In such problems, it is not necessary to consider all possible actions of the $n$ robots; instead an algorithm may restrict its attention to interactions within small teams, and still produce high-quality solutions. We use this insight in the development of a novel coordination algorithm that we call parallel stochastic hill-climbing with small teams, or Parish. This algorithm is designed specifically for use in multi-robot systems: it can run off-line or on-line, is easily distributed across multiple machines, and is efficient with regard to communication. We state and analyze the Parish algorithm and present results from the implementation and application of the algorithm for a concrete problem: multi-robot pursuit-evasion. In this demanding domain, a team of robots must coordinate their actions so as to guarantee location of a skilled evader.

1. Introduction

Multi-robot systems have the potential to be far more useful than single robots: multiple robots may perform a given task more efficiently than a single robot, multiple robots may be more robust to failure than a single robot, and multiple robots may be able to achieve tasks that are impossible for a single robot. However, reaching that potential can be extremely difficult, especially in the case where multiple robots make task achievement possible, rather than simply better. The difficulty arises primarily from the combinatorial possibilities inherent in the problem of coordinating the actions of multiple robots, which is in general NP-hard (Garey and Johnson, 1979). Given a system of $n$ robots and a common goal, it may be necessary to generate and evaluate a number of actions or plans that is exponential in $n$ (assuming that $P \neq NP$).

One common way to attack such a problem is brute-force search in the joint state/action space. That is, treat the multi-robot system as one many-bodied robot and look through the exponentially many possibilities until the right answer is found. Though this approach will produce an optimal solution, it is only viable on simple problems, as the necessary computation quickly becomes intractable as the number of robots and/or the complexity of the problem grows. This fact contradicts the intuition that having more robots available should make a task easier, rather than harder, to solve. Additionally, this approach is undesirable for most robotic applications, because it requires a centralized planner/executive, which precludes local control decisions at the level of an individual robot.

Another, more popular, approach is to treat the multi-robot system as a collection of independent single robots and allow each one to make individual control decisions, irrespective of the other robots' actions. This approach scales very well, as it requires each robot to consider only its own possible actions, the number of which remains constant as the number of robots grows. Unfortunately, this technique will not necessarily produce a good solution. In fact, if the actions of the robots must be coordinated in order to achieve a task, then allowing them to simply make individual choices without considering or consulting each other is unlikely to lead to any solution at all.

We believe that between these two extremes lies fertile ground for the development of heuristic multi-robot coordination algorithms that produce good solutions yet scale well with the number of robots. In particular, we suggest that many multi-robot problems can be solved quickly and effectively by allowing the formation of and planning for small teams over short time horizons. That is, rather than considering the possible actions of all $n$ robots or of just 1 robot, consider groups of up to $t$ robots, where $1 \le t \le n$, but prefer smaller groups, because they are computationally cheaper to coordinate. In this paper we introduce an algorithm, parallel stochastic hill-climbing with small teams, or
Parish, which combines the idea of small teams with the use of heuristics and stochastic action selection. In addition to scaling well and tending to produce good solutions to coordination problems, Parish is easily distributable, and can be executed either on-line or off-line, both of which are desirable properties for multi-robot algorithms.

We have implemented Parish for the problem of multi-robot pursuit-evasion, in which a group of robots must work together to search a given environment so as to guarantee location of a skilled mobile evader. This is a difficult problem that clearly requires coordination among the robots (a single robot is only capable of clearing environments that are topologically equivalent to a single hallway). And, unlike more weakly interactive tasks, like foraging, pursuit-evasion occasionally requires very tight coordination between robots in order to make any progress at all. We provide results from tests in simulation of search strategies produced by Parish.

2. Background and related work

The first rigorous formulation of the pursuit-evasion problem is due to Parsons, who restricted his study to the case in which the environment is a discrete graph (Parsons, 1976). Nothing is known about the location or motion of the evader, who is assumed to be able to move arbitrarily fast through the graph. The evader can occupy any edge in the graph; to find the evader, a searcher must walk along the edge occupied by the evader and touch the evader. The entire graph is initially contaminated, which means that the evader could be anywhere. As the search progresses, an edge is cleared when it is no longer possible for the evader to occupy that edge. Should it later happen that the evader could have moved back to a previously cleared edge, that edge is said to be recontaminated. Using this terminology, the goal of the problem can be restated as follows: find a trajectory for each searcher such that an initially contaminated graph is cleared.

More recently, a visibility-based version of the pursuit-evasion problem was introduced (Suzuki and Yamashita, 1992), which changed the domain from discrete graphs to continuous polygonal free spaces. Complete algorithms have been described for searchers having either 1 or 2 flashlights (Lee et al., 2002), omnidirectional vision (Guibas et al., 1999), and limited field-of-view vision (Gerkey et al., 2004). Randomized pursuit algorithms have also been studied, in both discrete graphs (Adler et al., 2003) and polygonal free spaces (Isler et al., 2003).

3. Algorithm

The Parish algorithm coordinates a multi-robot system in a scalable manner by considering the possible actions of not only single robots, but also small
teams of robots. The general form of the algorithm can be summarized as follows:

Algorithm Parish: Parallel stochastic hill-climbing with small teams
Input: $n$ robots; multi-robot problem $M$; maximum team size $t \le n$; value heuristic $v(q)$; probability distribution $P(q)$
1. while $M$ not done
2.   do parallel for each robot $s$
3.        do for $l \leftarrow 1$ to $t$
4.             do $Q_l \leftarrow \{q : q$ is a feasible $l$-searcher plan involving $s\} \cup \{\emptyset\}$
5.                Sample $\hat{q}$ from $Q_l$ according to $P$
6.                if $\hat{q} \neq \emptyset$
7.                  then Execute $\hat{q}$
8.                       break

The value heuristic $v$ has two components: a benefit heuristic $b$ and a cost function $c$. The benefit heuristic $b(q)$ estimates the (possibly negative) marginal benefit (i.e., progress that would be made toward a solution) of a given plan $q$. In other words, $b$ estimates the optimal value function, which is unknown (computing the optimal value function is equivalent to solving the original NP-hard problem). If a plan $q$ involves any robots that are currently part of other teams that are engaged in other plans, then $b(q)$ includes an estimate of the (probably negative) benefit that will result from disbanding those teams and halting the execution of those other plans. The function $c$ calculates, in the same units as $b$, the cost of executing a given plan. This cost can be any salient aspect of the domain that is external to progress, such as distance moved. The value of a plan $q$ is then $v(q) = b(q) - c(q)$.

Because the heuristic $b$ is only an estimate of the true benefit of a given plan, we cannot always select the highest-valued plan. Such a strategy will, in all but the simplest problems, lead to local maxima of progress from which the system will not escape. Thus we employ a stochastic selection rule: rather than greedily selecting the apparently best plan, we sample a plan $\hat{q}$ from the set of available plans, according to a probability distribution $P$ that prefers higher-valued plans but sometimes selects an apparently worse plan. This technique is commonly used in optimization to escape from local extrema and in reinforcement learning to balance exploration against exploitation. So robots executing Parish are collectively hill-climbing according to local progress gradients, but stochastically make lateral or downward moves to help the system escape from local maxima.

The exact nature of the selection rule can be adjusted according to the accuracy of the benefit heuristic. If $b$ is known to be a very accurate estimate of the optimal value function, then the highest-valued plan should be selected with accordingly high probability, and vice versa if $b$ is known to be less accurate (of course, if $b$ is very inaccurate, then progress will be slow, and more effort should likely be put toward designing a better heuristic).
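
The per-robot decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Plan`, `value`, `epsilon_greedy`, `choose_plan`, and the `propose` callback are hypothetical. The null plan is assumed to have zero benefit and zero cost, and an epsilon-greedy rule stands in for the probability distribution, which the general algorithm leaves open. Parallel execution across robots and the bookkeeping for disbanding teams are omitted.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Plan:
    """A candidate plan q (hypothetical container, not from the paper)."""
    members: FrozenSet[str]  # robots required to execute the plan
    benefit: float           # b(q): estimated marginal progress, may be negative
    cost: float              # c(q): e.g. distance moved, same units as b(q)


# Assumption: the null plan makes no moves, so its benefit and cost are zero.
NULL_PLAN = Plan(members=frozenset(), benefit=0.0, cost=0.0)


def value(plan: Plan) -> float:
    """Value of a plan: benefit minus cost."""
    return plan.benefit - plan.cost


def epsilon_greedy(plans: List[Plan], epsilon: float, rng: random.Random) -> Plan:
    """Pick the highest-valued plan with probability 1 - epsilon,
    otherwise one of the remaining plans uniformly at random."""
    ranked = sorted(plans, key=value, reverse=True)
    if len(ranked) == 1 or rng.random() >= epsilon:
        return ranked[0]
    return rng.choice(ranked[1:])


def choose_plan(
    robot: str,
    max_team_size: int,
    propose: Callable[[str, int], Iterable[Plan]],
    epsilon: float,
    rng: random.Random,
) -> Plan:
    """Try team sizes 1..max_team_size and stop at the first non-null sample."""
    for size in range(1, max_team_size + 1):
        # Feasible plans of this team size, plus the null plan.
        candidates = list(propose(robot, size)) + [NULL_PLAN]
        chosen = epsilon_greedy(candidates, epsilon, rng)
        if chosen is not NULL_PLAN:
            return chosen  # a real plan was sampled; stop looking
    return NULL_PLAN


# Toy plan generator (hypothetical): moving alone costs more than it gains,
# while a two-robot plan makes net progress.
def propose(robot: str, size: int) -> List[Plan]:
    if size == 1:
        return [Plan(frozenset({robot}), benefit=0.0, cost=1.0)]
    return [Plan(frozenset({robot, "r2"}), benefit=2.0, cost=1.0)]


plan = choose_plan("r1", 2, propose, epsilon=0.0, rng=random.Random(0))
print(sorted(plan.members), value(plan))  # prints ['r1', 'r2'] 1.0
```

With `epsilon=0.0` the rule is purely greedy: the single-robot plan loses to the null plan, so the robot moves on to a two-robot plan. Raising `epsilon` makes the robot occasionally accept the apparently worse single-robot plan, which is the lateral or downward move the text describes.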

Since the robots make plans individually, the computation of the algorithm can easily be distributed across multiple machines, with communication required only to update the state of the problem and to form (or break up) teams. If a good model of the environment is available, then Parish can run off-line, with the robots interacting with this model to produce a plan for later execution. If no good model is available, or if the environment is dynamic, then Parish can run on-line, with the robots interacting with the environment directly. Also, robots will tend to select and execute single-robot plans, if good ones can be found, because they do not require breaking up other teams. Thus they will make individual progress as long as possible, until such time as team formation is more beneficial.

3.1 Economic interpretation

As is the case with many multi-agent search algorithms, there is an obvious economic interpretation of Parish. The multi-robot system can be seen as a synthetic economy, in which individual robots can buy the services of other robots. A robot receives (possibly negative) reward for making (possibly backward) progress toward the goal. Each robot then selfishly tries to earn as much reward as possible. The value, $v(q)$, that a robot attaches to a plan $q$ that it has formulated is the price that that robot will pay in order to form the team that will help in executing the plan (the robot may offer a price slightly less than $v(q)$, in order to retain some positive profit). A robot only joins a team when it is offered a sufficiently high price to take it away from its current team, if any. Stochastic plan selection then corresponds to a robot occasionally making a choice that does not maximize its reward, to account for the fact that, because of inaccuracies in prices (i.e., values), strict reward-maximization will not necessarily lead to a solution.

Although this economic interpretation relates our algorithm to previous work in economically-inspired multi-robot coordination approaches (e.g., Gerkey and Matarić, 2002; Dias and Stentz, 2003), we do not find it particularly helpful. Coordination algorithms such as Parish can be understood and clearly stated as instances of distributed search or optimization; economic interpretations can unnecessarily cloud the discussion by introducing misleading analogies between synthetic markets as used by robots and real markets as used by humans.

3.2 Application to multi-robot pursuit-evasion

We now make Parish concrete by explaining how we apply it to the problem of multi-robot pursuit-evasion and stating the resulting algorithm. In the multi-robot pursuit-evasion problem, a team of robots is required to search an environment (of which a map is provided) so as to guarantee location of
a skilled mobile evader. The only information available about the evader is its size and maximum speed; no model of its initial position or subsequent trajectory is given. For our purposes, a robot finds the evader if the evader is detected within the robot's sensor field. Our robots are each equipped with a scanning laser range-finder that provides a 180° field of view and a reliable detection range of approximately 8 meters (Figure 1).

Figure 1. The Botrics Obot mobile robot, equipped with a SICK scanning laser range-finder, which has a 180° sensor field.

Figure 2. An office-like environment, decomposed into convex regions (a) and then transformed into a discrete graph (b).

We first transform our problem to an instance of Parsons' discrete graph search problem (Parsons, 1976). This transformation involves decomposing the free space in the given map into finitely many regions such that a single robot can clear a region by standing anywhere on and perpendicular to the region border, while looking into the region. Furthermore, we want to guarantee
that a differential-drive robot with a 180° field of view can move from one border of a region to any other border of the same region and keep the destination border in view along the way. Necessary and sufficient conditions for the regions are that they each: (i) be convex, and (ii) have no dimension greater than the maximum sensor range (8 meters). For the work presented in this paper, the decomposition was performed manually, but the process could be automated according to visibility constraints (e.g., Guibas et al., 1999; Gerkey et al., 2004).

Given such a region decomposition, we construct an undirected graph $G$ where the vertices are the regions, and the edges are the borders where adjacent regions meet. An example decomposition and the resulting graph are shown in Figure 2. We can then apply Parish, stated below, to the graph and transform the resulting solution back to the robots'
control space, with each move in the graph becoming a move to a region border in the physical environment.

Preliminaries:
  Searcher positions and edge contamination states are stored as labels in the graph.
  The graph, the list of teams, and the list of plans are shared data structures: each searcher has an identical copy of each structure, and a mutual exclusion mechanism is used to ensure consistency when making changes.
  $s_i$ denotes searcher $i$.
  Given a list $L$, $L[i]$ denotes the $i$th element of $L$.
  A plan $q$ specifies a sequence of moves for one or more searchers. The null plan, denoted $\emptyset$, makes no moves.
  Given a plan $q$, $q$.members() returns the set of searchers required to execute $q$, and $q(G)$ denotes the application of plan $q$ to graph $G$ to produce the resulting graph.
  Given a team $T$ with $k$ members, to disband $T$ is to separate the members of $T$ into $k$ singleton teams, one individual per team.

Algorithm Parish for multi-robot pursuit-evasion
Input: Connected, undirected graph $G$; $n$ searchers placed in $G$ (if initial placement is not given, place them randomly); maximum team size $t$; value heuristic $v(G, q)$; probability distribution $P(q)$
 1. $\mathcal{T} \leftarrow [\,]$  ▷ List of teams
 2. $\mathcal{Q} \leftarrow [\,]$  ▷ List of plans
 3. for $i \leftarrow 1$ to $n$
 4.   do ▷ Start with singleton teams and no plans
 5.      $\mathcal{T}$.append($\{s_i\}$)
 6.      $\mathcal{Q}$.append($\emptyset$)
 7. while not done
 8.   do ▷ Each team decides what to do, in parallel
 9.      parallel for $i \leftarrow 1$ to len($\mathcal{T}$)
10.        do if $\mathcal{Q}[i] = \emptyset$
11.             then ▷ No plan, so this team has only one member; call it $s$
12.                  $s \leftarrow$ the single member of $\mathcal{T}[i]$
13.                  ▷ Consider teams of increasing size, up to $t$
14.                  for $l \leftarrow 1$ to $t$
15.                    do ▷ Make some $l$-searcher plans, but also consider the null plan
16.                       $Q_l \leftarrow \{q : q$ is a feasible $l$-searcher plan involving $s\} \cup \{\emptyset\}$
17.                       Sample $\hat{q}$ from $Q_l$ according to $P$
18.                       if $\hat{q} = \emptyset$
19.                         then ▷ We chose the null plan; keep looking
20.                              continue
21.                         else ▷ Assemble the team, maybe disbanding other teams
22.                              for $j \leftarrow 1$ to len($\mathcal{T}$), $j \neq i$
23.                                do for $r \in \hat{q}$.members()
24.                                     do if $r \in \mathcal{T}[j]$
25.                                          then Disband $\mathcal{T}[j]$
26.                              $\mathcal{T}[i] \leftarrow \hat{q}$.members()
27.                              ▷ Store the chosen plan and begin executing it
28.                              $\mathcal{Q}[i] \leftarrow \hat{q}$
29.                              $G \leftarrow$ first step of $\mathcal{Q}[i](G)$
30.                              ▷ We have a satisfactory plan; stop looking
31.                              break
32.             else ▷ We already have a plan, so keep executing it
33.                  $G \leftarrow$ next step of $\mathcal{Q}[i](G)$
34.                  if just executed last step of $\mathcal{Q}[i]$
35.                    then ▷ This team has finished its plan; disband it
36.                         Disband $\mathcal{T}[i]$

4. Results

We implemented Parish as stated in the previous section and tested it on several environments. The tests were carried out using Stage, a sensor-based multi-robot simulator; experience has shown that results in Stage can be reliably replicated with physical (indoor, planar) robots (Gerkey et al., 2003). Animations can be found at: http://ai.stanford.edu/~gerkey/research/pe/

The benefit heuristic $b$ is the (possibly negative) number of regions that would be cleared by executing a given plan. The cost function $c$ is proportional to distance traveled during a given plan, calculated as number of regions traversed. The maximum team size is $t = 2$, and the robots are restricted to making plans that move each team member once. Specifically, each robot $s$ only considers plans of the following form:

  (team size 1) Move to an adjacent region.
  (team size 2) Move another robot to cover the region currently covered by $s$, then move $s$ into an adjacent region.

The stochastic selection rule is $\epsilon$-greedy, in which the highest-valued plan is selected with probability $(1 - \epsilon)$, and otherwise one of the remaining options is chosen with uniform probability. For the results presented here,
we assume the environment is static, and so are free to run Parish off-line, then execute the resulting plan with the simulated robots.

Interestingly, adding just this limited and myopic coordination is sufficient to produce good solutions. For example, shown in Figure 3 are snapshots from a run with 2 robots in an office-like environment. As can be seen in that figure, the robots cooperate to clear the environment quite efficiently, without allowing recontamination. In fact, Parish reliably produces solutions for this and similar environments that are optimal in the total path length (we compute optimal solutions using brute-force A* search in the joint action/state space of all the robots).

Figure 3. (In color where available). Two robots searching an office-like environment. Black circles represent robots; blue areas are clear; red areas are in view; and purple areas are contaminated (i.e., the evader may be hiding there).

The effect of small-team coordination can be clearly seen in Figure 4, taken from a simulation run in which robots work together to clear one floor of
an office building, using a sensor-based map. In this sequence, a 2-robot plan calls for the robot initially at the lower right to move up and block the central open area so that another robot can move left and keep searching. Without such interactions, the robots are not capable of clearing this complex environment.

Figure 4. (In color where available). An example of small-team coordination taken from a test in which robots cleared a large building. As part of a 2-robot plan, the robot that is initially in the lower right corner moves up and left to block the central open area so that another robot can move left and keep searching.

Figure 5. Comparison of Parish and A* in planning pursuit strategies in various environments. Shown in (a) is the number of nodes expanded during the search, and in (b) is the length of the solution found (smaller is better in both cases). Results for Parish, which is stochastic, show the experimental mean and standard deviation, computed from 100 runs in each environment. In both panels the horizontal axis is labeled "Environment . Number of robots", with entries 3-triangle.3, T3.3, gates-simple.2, gates.3, and sal2.2.

5. Summary and future work

We introduced the Parish algorithm, which allows for scalable and efficient coordination in multi-robot systems. The insight of the algorithm is that the combination of small teams, simple heuristics, and stochastic action selection can be extremely effective in solving otherwise difficult multi-robot problems. Our algorithm is easily distributable and can run on-line or off-line, making it especially suitable for use in physical robot systems. We presented results from simulation that demonstrate the efficacy of Parish in coordinating robots engaged in a pursuit-evasion task.

Our current work on this algorithm follows three paths. First, we are moving to physical robots, where Parish will run
on-line, and fully distributed. Second, we are rigorously analyzing Parish and comparing it to competitor algorithms, such as non-cooperative greedy and centralized A*. It will be important to establish the average-case and worst-case performance of Parish, in terms of solution quality and computational requirements (i.e., amount of the search space that is actually explored), as compared to existing alternatives (Figure 5). Finally, we are applying Parish to other multi-robot coordination problems.

References

Adler, M., Räcke, H., Sivadasan, N., Sohler, C., and Vöcking, B. (2003). Randomized Pursuit-Evasion in Graphs. Combinatorics, Probability and Computing, 12(3):225-244.

Dias, M. B. and Stentz, A. (2003). TraderBots: A Market-Based Approach for Resource, Role, and Task Allocation in Multirobot Coordination. Technical Report CMU-RI-TR-03-19, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.

Gerkey, B. P. and Matarić, M. J. (2002). Sold!: Auction methods for multi-robot coordination. IEEE Transactions on Robotics and Automation, 18(5):758-768.

Gerkey, B. P., Thrun, S., and Gordon, G. (2004). Visibility-based pursuit-evasion with limited field of view. In Proc. of the Natl. Conf. on Artificial Intelligence (AAAI), pages 20-27, San Jose, California.

Gerkey, B. P., Vaughan, R. T., and Howard, A. (2003). The Player/Stage Project: Tools for Multi-Robot and Distributed Sensor Systems. In Proc. of the Intl. Conf. on Advanced Robotics (ICAR), pages 317-323, Coimbra, Portugal.

Guibas, L. J., Latombe, J.-C., LaValle, S. M., Lin, D., and Motwani, R. (1999). A Visibility-Based Pursuit-Evasion Problem. Intl. J. of Computational Geometry & Applications, 9(4 & 5):471-493.

Isler, V., Kannan, S., and Khanna, S. (2003). Locating and capturing an evader in a polygonal environment. Technical Report MS-CIS-03-33, Dept. of Computer Science, Univ. of Pennsylvania.

Lee, J.-H., Park, S.-M., and Chwa, K.-Y. (2002). Simple algorithms for searching a polygon with flashlights. Information Processing Letters, 81:265-270.

Parsons, T. D. (1976). Pursuit-evasion in a graph. In Alavi, Y. and Lick, D., editors, Theory and Applications of Graphs, Lecture Notes in Mathematics 642, pages 426-441. Springer-Verlag, Berlin.

Suzuki, I. and Yamashita, M. (1992). Searching for a mobile intruder in a polygonal region. SIAM J. on Computing, 21(5):863-888.