Running Map-Reduce Under Condor

Uploaded On 2024-03-15



Presentation Transcript

1. Running Map-Reduce Under Condor

2. Cast of thousands: Mihai Pop, Michael Schatz, Dan Sommer (University of Maryland Center for Computational Biology); Faisal Khan, Ken Hahn (UW); David Schwartz (LMCG)

3. In 2003… http://labs.google.com/papers/gfs.html http://labs.google.com/papers/mapreduce.html

4.

5.

6. Shortly thereafter…

7. Two main Hadoop parts

8. For more detail: CondorWeek 2009 talk by Dhruba Borthakur, http://www.cs.wisc.edu/condor/CondorWeek2009/condor_presentations/borthakur-hadoop_univ_research.ppt

9.

10. HDFS overview: Making a POSIX distributed file system go fast is easy…

11. HDFS overview: …if you get rid of the POSIX part. Remove: random access; support for small files; authentication; in-kernel support

12. HDFS overview: Add in: data replication (key for distributed systems); command-line utilities

13. HDFS Architecture

14. HDFS Condor integration: HDFS daemons run under master; management/control; added HAD support for namenode; added host-based security

15. Condor HDFS, part II: File transfer support: transfer_input_files = hdfs://…; spool in HDFS

16. Map Reduce

17. Shell hacker's map reduce: grep tag input | sort | uniq -c | grep
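The shell pipeline on this slide maps loosely onto the map, shuffle, and reduce phases. The Python sketch below is only an illustration of that analogy, not code from the talk: the function names are invented, and because the slide's final `grep` has no pattern, the closing filter (`min_count`) is a hypothetical stand-in.

```python
from itertools import groupby


def map_phase(lines, tag):
    """Like `grep tag input`: emit (line, 1) for every line containing tag."""
    for line in lines:
        if tag in line:
            yield line.rstrip("\n"), 1


def shuffle_phase(pairs):
    """Like `sort`: bring identical keys next to each other."""
    return sorted(pairs, key=lambda kv: kv[0])


def reduce_phase(sorted_pairs):
    """Like `uniq -c`: sum the counts for each run of identical keys."""
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run(lines, tag, min_count=1):
    """Chain the phases; the last filter stands in for the trailing grep."""
    counted = reduce_phase(shuffle_phase(map_phase(lines, tag)))
    # Assumption: keep only keys seen at least min_count times.
    return [(key, count) for key, count in counted if count >= min_count]


if __name__ == "__main__":
    sample = ["tag a", "other", "tag a", "tag b"]
    print(run(sample, "tag"))
```

In a real MapReduce job the map and reduce steps run on many machines and the framework performs the sort between them; here everything runs in one process to keep the correspondence visible.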

18. MapReduce lingo for the native Condor speaker: Task tracker → startd/starter; Job tracker → condor_schedd

19. Map Reduce under Condor: Zeroth law of software engineering: job tracker/task tracker must be managed! Otherwise very bad things happen.

20. Hadoop on Demand w/Condor

21. Map Reduce as overlay: Parallel Universe job; starts job tracker on rank 0; task trackers everywhere else. Open question: run more small jobs, or fewer bigger ones? One job tracker per user (i.e. per job).
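The rank-based layout described on this slide can be sketched as a small role selector. This is a minimal illustration under assumptions, not the wrapper used in the talk: the function names are invented, and the sketch assumes the node's rank is available in the `_CONDOR_PROCNO` environment variable that Condor sets for parallel universe jobs. It deliberately stops short of launching any Hadoop daemons.

```python
import os


def role_for_rank(rank):
    """Rank 0 hosts the job tracker; every other rank runs a task tracker."""
    return "jobtracker" if rank == 0 else "tasktracker"


def plan_overlay(nprocs):
    """Map each rank of an nprocs-node parallel job to its role."""
    return {rank: role_for_rank(rank) for rank in range(nprocs)}


def current_role(environ=os.environ):
    """Pick this node's role from its rank.

    Assumption: the rank is exposed as _CONDOR_PROCNO; defaults to rank 0
    when the variable is absent (e.g. when run outside Condor).
    """
    rank = int(environ.get("_CONDOR_PROCNO", "0"))
    return role_for_rank(rank)


if __name__ == "__main__":
    print(plan_overlay(4))
    print(current_role())
```

Because each parallel universe job gets its own rank 0, this layout gives one job tracker per job, which matches the "one job tracker per user" point on the slide.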

22. On to real science… David Schwartz, matchmaker; Mihai Pop

23. Contrail – MR genome assembly: http://sourceforge.net/apps/mediawiki/contrail-bio/index.php

24. Genome assembly

25. DNA: 3 billion base pairs. Sequencing machines only read small reads at a time.

26. Already done this?

27. High throughput sequencers

28. Contrail: Scalable Genome Assembly with MapReduce. Genome: African male NA18507 (Bentley et al., 2008). Input: 3.5B 36bp reads, 210bp insert (SRA000271). Preprocessor: Quality-Aware Error Correction.

Stage | N | Max | N50
Initial | >10B | 27 | 27
Compressed | >1 B | 303 bp | < 100 bp
Error Correction | 5.0 M | 14,007 | 650 bp
Resolve Repeats | 4.2 M | 20,594 | 923 bp
Cloud Surfing | In Progress
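The N50 figures on the Contrail slide are a standard assembly statistic: the contig length L such that contigs of length L or longer together cover at least half of the total assembled bases. A minimal sketch of that calculation follows; the function name and the example lengths are invented for illustration and are not data from the talk.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length at
    which the running total first reaches half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one contig length")


if __name__ == "__main__":
    # Invented example lengths, in base pairs.
    print(n50([2, 3, 4, 5, 6]))
```

A rising N50 from one stage to the next, as in the slide, indicates that more of the assembly sits in longer contigs.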

29. Running it under Condor: used CHTC B-240 cluster; ~100 machines; 8-way Nehalem CPU; 12 GB total; 1 disk partition dedicated to HDFS; HDFS running under condor master

30. Running it on Condor: used the MapReduce PU overlay; started with fruit flies… and it crashed. Zeroth law of software engineering; version mismatch; debugging…

31. Debugging: after a couple of debugging rounds, fruit fly sequenced!! On to humans!

32. Cardinality: How many slots per task tracker? Task tracker, like schedd, is multi-slot. One machine: 8 cores, 1 disk, 1 memory system. How many mappers per slot?
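The cardinality question comes down to simple arithmetic: concurrent mappers on a machine equal slots per task tracker times mappers per slot. The sketch below only enumerates the combinations that put exactly one mapper on each core; that target is an assumption made for illustration, not a recommendation from the talk, and the function names are invented.

```python
def concurrent_mappers(slots_per_tracker, mappers_per_slot):
    """Total mappers running at once on one machine."""
    return slots_per_tracker * mappers_per_slot


def one_mapper_per_core_options(cores):
    """List (slots, mappers_per_slot) pairs whose product equals cores."""
    return [(slots, cores // slots)
            for slots in range(1, cores + 1)
            if cores % slots == 0]


if __name__ == "__main__":
    # The slide's machine has 8 cores.
    for slots, mappers in one_mapper_per_core_options(8):
        print(slots, "slot(s) x", mappers, "mapper(s) =",
              concurrent_mappers(slots, mappers))
```

Every one of these combinations still shares the machine's single disk and memory system, which is why core count alone does not settle the question.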

33. More MR under Condor: more debugging, NPEs; updated MR again; some performance regressions; one power outage; 12 weeks later…

34. Success!

35.

36. Conclusions: Job trackers must be managed! Glide-in is more than Condor on batch. Hadoop – more than just MapReduce. HDFS – good partner for Condor. All this stuff is moving fast.