Twister2 for BDEC2 https://twister2.gitbook.io/twister2/

Uploaded On 2020-01-12

Presentation Transcript

Twister2 for BDEC2
https://twister2.gitbook.io/twister2/
Poznan, Poland
Geoffrey Fox, May 15, 2019
gcf@indiana.edu, http://www.dsc.soic.indiana.edu/, http://spidal.org/
5/10/2019 Digital Science Center

Twister2 Highlights I
- “Big Data Programming Environment” such as Hadoop, Spark, Flink, Storm, Heron, but uses HPC wherever appropriate and outperforms Apache systems, often by large factors
- Runs preferably under Kubernetes, Mesos, or Nomad, but Slurm is supported
- Highlight is high-performance dataflow supporting iteration, fine-grain, coarse-grain, dynamic, synchronized, asynchronous, batch, and streaming
- Three distinct communication environments
  - DFW: dataflow with distinct source and target tasks; operates at data level, not message level; data-level communications spill to disk as needed
  - BSP for parallel programming; MPI is the default. Inappropriate for dataflow
  - Storm API for streaming events with pub-sub such as Kafka
- Rich state model (API) for objects supporting in-place, distributed, cached, RDD (Spark) style persistence with TSets (see PCollections in Beam, DataSets in Flink, Streamlets in Storm, Heron)
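The DFW idea of distinct source and target tasks can be illustrated with a small single-process sketch. This is not the Twister2 API: `run_dataflow`, `source_outputs`, and the CRC32-based routing are hypothetical names and choices made only for illustration. Each source task emits (key, value) records, a keyed partition edge routes every record to one target task, and each target then works on whole groups of data rather than on individual messages.

```python
import zlib
from collections import defaultdict

def run_dataflow(source_outputs, num_targets, target_fn):
    """Route records from source tasks to target tasks by key, then apply target_fn.

    source_outputs: one list of (key, value) records per (simulated) source task.
    Returns one dict per target task mapping key -> target_fn(values for that key).
    """
    inboxes = [defaultdict(list) for _ in range(num_targets)]
    for records in source_outputs:
        for key, value in records:
            # Deterministic hash so a given key always lands on the same target.
            target = zlib.crc32(key.encode("utf-8")) % num_targets
            inboxes[target][key].append(value)
    # Each target sees the complete group of values for its keys (data level).
    return [{k: target_fn(vs) for k, vs in inbox.items()} for inbox in inboxes]

# Three source tasks feeding two target tasks that sum the values per key.
source_outputs = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5)],
]
results = run_dataflow(source_outputs, num_targets=2, target_fn=sum)

merged = {}
for per_target in results:
    merged.update(per_target)
print(merged)
```

In a BSP program the same tasks would act as both senders and receivers of explicit messages; here the sources never receive anything, which is the distinction the slide draws.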

Software model supported by Twister2
Continually interacting in the Intelligent Aether with

Twister2 Highlights II
- Can be a pure batch engine
  - Not built on top of a streaming engine
- Can be a pure streaming engine supporting Storm/Heron API
  - Not built on top of a batch engine
- Fault tolerance (June 2019) as in Spark or MPI today; dataflow nodes define natural synchronization points
- Many APIs: Data, Communication, Task
  - High level, hiding communication and decomposition (as in Spark), and low level (as in MPI)
- DFW supports MPI and MapReduce primitives: (All)Reduce, Broadcast, (All)Gather, Partition, Join with and without keys
- Component-based architecture: it is a toolkit
  - Defines the important layers of a distributed processing engine
  - Implements these layers cleanly, aiming at high-performance data analytics
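For readers unfamiliar with the primitives listed above, the following sketch shows what each one computes, using plain Python lists where position i stands for the value held by worker i. It is only a semantic illustration under that assumption: the function names are hypothetical, nothing is actually distributed, and none of this is Twister2 or MPI code.

```python
from functools import reduce as fold
from operator import add

def reduce_to_root(values, op):
    """Reduce: combine one value per worker; only the root receives the result."""
    return fold(op, values)

def allreduce(values, op):
    """AllReduce: like Reduce, but every worker ends up with the result."""
    total = fold(op, values)
    return [total for _ in values]

def broadcast(root_value, num_workers):
    """Broadcast: the root's value is copied to every worker."""
    return [root_value for _ in range(num_workers)]

def gather(values):
    """Gather: the root collects the list of all workers' values."""
    return list(values)

def allgather(values):
    """AllGather: every worker collects the list of all workers' values."""
    return [list(values) for _ in values]

def keyed_join(left, right):
    """Join with keys: inner join of two lists of (key, value) pairs."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]

worker_values = [1, 2, 3, 4]          # one value on each of four workers
root_sum = reduce_to_root(worker_values, add)
everywhere = allreduce(worker_values, add)
copies = broadcast("model-v1", 3)
collected = allgather(worker_values)
joined = keyed_join([("x", 1), ("y", 2)], [("x", "a"), ("x", "b"), ("z", "c")])
print(root_sum, everywhere, copies, joined)
```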

Parallel SVM using SGD: execution time for 320K data points with 2000 features and 500 iterations, on 16 nodes with varying parallelism
Times: Spark RDD > Twister2 TSet > Twister2 Task > MPI
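The benchmark code itself is not part of this transcript. As a rough idea of the computation being timed, here is a minimal single-process sketch of data-parallel SGD for a linear SVM, under several assumptions that are not from the slides: hinge loss with L2 regularization, no bias term, one sampled point per worker per iteration, and a tiny hand-made dataset. Workers are simulated by a list of partitions, and the averaging line stands in for the AllReduce that a real run would perform across nodes.

```python
import random

def local_subgradient(w, batch):
    """Sum of hinge-loss sub-gradients over one worker's mini-batch (no bias term)."""
    grad = [0.0] * len(w)
    for x, y in batch:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:                      # only margin violators contribute
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    return grad

def train_svm(partitions, dim, iterations=500, lr=0.1, lam=0.01, batch_size=1, seed=0):
    """Synchronous data-parallel SGD; each partition plays the role of one worker."""
    rngs = [random.Random(seed + i) for i in range(len(partitions))]
    w = [0.0] * dim
    for _ in range(iterations):
        # Each worker samples a mini-batch from its own partition.
        batches = [rng.sample(part, batch_size) for rng, part in zip(rngs, partitions)]
        local = [local_subgradient(w, batch) for batch in batches]
        n = sum(len(batch) for batch in batches)
        # Stand-in for AllReduce: sum the local gradients, average, add the L2 term.
        grad = [sum(g[j] for g in local) / n + lam * w[j] for j in range(dim)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Toy data: four simulated workers, each holding one positive and one negative point.
positives = [(2.0, 1.0), (1.0, 3.0), (3.0, 2.0), (2.5, 2.5)]
partitions = [[(p, 1), ((-p[0], -p[1]), -1)] for p in positives]

w = train_svm(partitions, dim=2)
correct = sum(predict(w, x) == y for part in partitions for x, y in part)
print(w, correct)
```

The per-iteration synchronization at the averaging step is where the systems compared on the slide differ most, since it is the only point at which workers exchange data.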

Twister2 Status
- 100,000 lines of new open source code: mainly Java but significant Python
- https://twister2.gitbook.io/twister2/tutorial
- Operational with documentation and examples
- End of June 2019: fault tolerance, Apache BEAM linkage, more applications
- Fall 2019: Python API, C++ implementation (why Python hard)
- Not scheduled: TensorFlow integration, SQL API, native MPI
- Two IU application foci are integration of Machine Learning with nano and bio modelling (MLforHPC) and streaming using Storm API