
Tachyon: - PowerPoint Presentation

min-jolicoeur
Uploaded On 2016-07-25




Presentation Transcript

Slide1

Tachyon: memory-speed data sharing
Haoyuan (HY) Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, Ion Stoica
UC Berkeley

Slide2

Memory trumps everything else
RAM throughput increasing exponentially
Disk throughput increasing slowly
Memory-locality key to interactive response
[Figure: chart with a time axis]

Slide3

Realized by many…
Frameworks already leverage memory
e.g. Spark, Shark, GraphX, …

Slide4

Example: Spark
Fast in-memory data processing within a job
  Keep only one in-memory copy inside the JVM
  Track lineage of operations used to derive data
  Upon failure, use lineage to re-compute data
[Figure: Lineage Tracking (operator graph: map, filter, map, join, reduce)]

Slide5
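The lineage idea from the Spark example on Slide4 (track the operations used to derive data, and re-run them after a failure) can be illustrated with a small sketch. This is not Spark's or Tachyon's implementation; the `Dataset` class and its `lose_cache` method are hypothetical names invented for this illustration.

```python
from typing import Callable, List, Optional


class Dataset:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source: Optional[List[int]] = None,
                 parent: Optional["Dataset"] = None,
                 op: Optional[Callable[[List[int]], List[int]]] = None):
        self.source = source        # raw input, only set on the root dataset
        self.parent = parent        # dataset this one was derived from
        self.op = op                # transformation applied to the parent's rows
        self.cache: Optional[List[int]] = None  # the single in-memory copy
        self.computations = 0       # how many times op actually ran

    def map(self, fn):
        return Dataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return Dataset(parent=self, op=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Recompute from lineage only when the in-memory copy is missing.
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.op(self.parent.collect())
                self.computations += 1
        return self.cache

    def lose_cache(self) -> None:
        """Simulate a failure that wipes the in-memory copy."""
        self.cache = None


numbers = Dataset(source=[1, 2, 3, 4, 5, 6])
result = numbers.map(lambda x: x * 10).filter(lambda x: x > 20).map(lambda x: x + 1)

before = result.collect()   # computed once and cached
result.lose_cache()         # the cached copy is lost
after = result.collect()    # rebuilt by replaying the recorded operation
```

No second copy of `result` is ever stored; after the simulated failure the data is rebuilt from the recorded operation and the parent dataset.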

Challenge 1
[Diagram: a Spark task and the Spark memory block manager (block 1, block 3) run as execution engine & storage engine in the same JVM process; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide6

Challenge 1
[Diagram: the Spark task crashes; the Spark memory block manager (block 1, block 3) lives in the same JVM process (execution engine & storage engine); HDFS / disk holds block 1, block 2, block 3, block 4]

Slide7

Challenge 1
JVM crash: lose all cache
[Diagram: the JVM process (execution engine & storage engine) has crashed; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide8

Challenge 2
JVM heap overhead: GC & duplicate memory per job
[Diagram: two Spark tasks, each with its own Spark memory block manager holding block 1 and block 3; execution engine & storage engine in the same JVM process (GC & duplication); HDFS / disk holds block 1, block 2, block 3, block 4]

Slide9

Challenge 3
Different jobs share data: slow writes to disk
[Diagram: two Spark tasks, each with its own Spark memory block manager holding block 1 and block 3; storage engine & execution engine in the same JVM process (slow writes); HDFS / disk holds block 1, block 2, block 3, block 4]

Slide10

Challenge 3
Different frameworks share data: slow writes to disk
[Diagram: a Spark task with the Spark memory block manager (block 1, block 3) next to Hadoop MR on YARN; storage engine & execution engine in the same JVM process (slow writes); HDFS / disk holds block 1, block 2, block 3, block 4]

Slide11

Tachyon
Reliable data sharing at memory-speed within and across cluster frameworks/jobs

Slide12

Challenge 1 revisited
[Diagram: a Spark task and the Spark memory block manager (block 1) in the same JVM process (execution engine & storage engine); Tachyon in-memory holds block 1, block 3, block 4; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide13

Challenge 1 revisited
[Diagram: the JVM process (execution engine & storage engine) with the Spark memory block manager (block 1) crashes; Tachyon in-memory holds block 1, block 3, block 4; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide14

Challenge 1 revisited
JVM crash: keep memory-cache
[Diagram: the JVM process (execution engine & storage engine) has crashed; Tachyon in-memory holds block 1, block 3, block 4; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide15
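The point of "Challenge 1 revisited" (Slide12 to Slide14) is that a cache kept outside the execution process outlives a crash of that process. The sketch below shows only that process-isolation property, under loud assumptions: a Python child process stands in for the JVM, and a temporary directory stands in for the external store, so it says nothing about memory speed or about how Tachyon itself is built. All names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Code run in a separate worker process (stand-in for an execution-engine JVM).
WORKER = r"""
import os, sys
store_dir = sys.argv[1]
heap_cache = {"block 1": b"cached bytes"}      # exists only inside this process
with open(os.path.join(store_dir, "block 1"), "wb") as f:
    f.write(heap_cache["block 1"])             # handed to a store outside the process
os._exit(1)                                    # abrupt exit: heap_cache is gone
"""


def run_worker_until_crash(store_dir: str) -> int:
    """Start the worker, let it crash, and return its exit code."""
    return subprocess.run([sys.executable, "-c", WORKER, store_dir]).returncode


def read_block(store_dir: str, name: str) -> bytes:
    """Read a block back from the external store."""
    with open(os.path.join(store_dir, name), "rb") as f:
        return f.read()


store_dir = tempfile.mkdtemp()   # hypothetical stand-in for the external store
exit_code = run_worker_until_crash(store_dir)
survivor = read_block(store_dir, "block 1")
```

The worker's own dictionary disappears with the process, while the block written to the external store can still be read afterwards.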

Challenge 2 revisited
Off-heap memory storage
No GC & one memory copy
[Diagram: two Spark tasks, one with Spark memory holding block 1 and one with Spark memory holding block 4; execution engine & storage engine in the same JVM process (no GC & duplication); Tachyon in-memory holds block 1, block 3, block 4; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide16

Challenge 3 revisited
Different frameworks share at memory-speed
[Diagram: a Spark task (Spark memory, block 1) next to Hadoop MR on YARN; execution engine & storage engine in the same JVM process (fast writes); Tachyon in-memory holds block 1, block 3, block 4; HDFS / disk holds block 1, block 2, block 3, block 4]

Slide17

Tachyon and Spark
Spark’s off-JVM-heap RDD store
  In-memory RDDs (serialized)
  Fault-tolerant cache
Enables
  avoiding GC overhead
  fine-grained executors
  fast RDD sharing

Slide18

Tachyon research vision
Vision
  Push lineage down to storage layer
  Use memory aggressively
Approach
  One in-memory copy
  Rely on recomputation for fault-tolerance

Slide19

Architecture

Slide20

Comparison with in-memory HDFS

Slide21

Further Improve Spark’s Performance
Grep

Slide22

Master Faster Recovery

Slide23

Open Source Status
New release
  V0.4.0 (Feb 2014)
  20 Developers (7 from Berkeley, 13 from outside)
  11 Companies
  Writes go synchronously to under filesystem (No lineage information in Developer Preview release)
  MapReduce and Spark can run without any code change (ser/de becomes the new bottleneck)

Slide24
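The Open Source Status slide (Slide23) notes that writes go synchronously to the under filesystem. A minimal sketch of that write-through pattern follows, assuming a local directory as the under filesystem; `WriteThroughStore` and its methods are hypothetical names and not Tachyon's API.

```python
import os
import tempfile


class WriteThroughStore:
    """Keeps blocks in memory and persists each one before put() returns."""

    def __init__(self, under_dir: str):
        self.under_dir = under_dir   # stand-in for the under filesystem
        self.memory = {}             # in-memory copies served to readers

    def put(self, name: str, data: bytes) -> None:
        path = os.path.join(self.under_dir, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # block until the bytes are persisted
        self.memory[name] = data

    def get(self, name: str) -> bytes:
        # Fall back to the under filesystem if the in-memory copy is missing.
        if name not in self.memory:
            with open(os.path.join(self.under_dir, name), "rb") as f:
                self.memory[name] = f.read()
        return self.memory[name]

    def drop_memory(self) -> None:
        """Simulate losing every in-memory copy."""
        self.memory.clear()


under_dir = tempfile.mkdtemp()
store = WriteThroughStore(under_dir)
store.put("block-1", b"payload")
store.drop_memory()
recovered = store.get("block-1")
```

Because every write reaches the under filesystem before returning, the data can be recovered without lineage information, at the cost of write latency bounded by the slower layer.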

Using HDFS vs Tachyon
Spark
  val file = sc.textFile("hdfs://ip:port/path")
Shark
  CREATE TABLE orders_cached AS SELECT * FROM orders;
Hadoop MapReduce
  hadoop jar examples.jar wordcount hdfs://localhost/input hdfs://localhost/output

Slide25

Using HDFS vs Tachyon
Spark
  val file = sc.textFile("tachyon://ip:port/path")
Shark
  CREATE TABLE orders_tachyon AS SELECT * FROM orders;
Hadoop MapReduce
  hadoop jar examples.jar wordcount tachyon://localhost/input tachyon://localhost/output

Slide26
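In the two "Using HDFS vs Tachyon" slides (Slide24 and Slide25), the Spark and Hadoop MapReduce examples differ only in the URI scheme. The helper below is a hypothetical illustration of that substitution; it assumes, as the slides' placeholders do, that the host part stays the same, which may not hold in a real deployment where the two services listen on different addresses.

```python
from urllib.parse import urlsplit, urlunsplit


def to_tachyon(uri: str) -> str:
    """Return the same location with the hdfs scheme replaced by tachyon."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    return urlunsplit(("tachyon", parts.netloc, parts.path,
                       parts.query, parts.fragment))


# The input and output arguments from the Hadoop MapReduce example.
hdfs_args = ["hdfs://localhost/input", "hdfs://localhost/output"]
tachyon_args = [to_tachyon(a) for a in hdfs_args]
```

The job itself is untouched; only the paths handed to it change, which matches the slide's claim that MapReduce and Spark can run without code changes.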

Thanks to Redhat!

Slide27

Future Research Focus
Integration with HDFS caching
Memory Fair Sharing
Random Access Abstraction
Mutable Data Support

Slide28

Acknowledgments
Calvin Jia, Nick Lanham, Grace Huang, Mark Hamstra, Bill Zhao, Rong Gu, Hobin Yoon, Vamsi Chitters, Joseph Jin-Chuan Tang, Xi Liu, Qifan Pu, Aslan Bekirov, Reynold Xin, Xiaomin Zhang, Achal Soni, Xiang Zhong, Dilip Joseph, Srinivas Parayya, Tim St. Clair, Shivaram Venkataraman, Andrew Ash

Slide29

Tachyon Summary
High-throughput, fault-tolerant in-memory storage
Interface compatible with HDFS
Further improve performance for Spark, Shark, and Hadoop
Growing community with 10+ organizations contributing

Slide30

Thanks!
More: https://github.com/amplab/tachyon