Presto Nipa Das, Ye Jee Kim, Murphy Potts, Sadaf Mirzai


Uploaded On 2019-06-29



Presentation Transcript

Slide1

Presto

Nipa Das, Ye Jee Kim, Murphy Potts, Sadaf Mirzai

Slide2

Agenda

What is Presto?

History of Presto

Architecture

Pluggable Backends

Applications & Business Opportunities

Pros

Cons

Citations

Slide3

What is Presto?

Open-source distributed query engine that uses Structured Query Language (SQL)

Created by Facebook

Runs queries against data sources ranging in size from gigabytes to petabytes

Built for fast, interactive analytics

Can combine data from multiple sources in a single query
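To make the last point concrete, here is a minimal sketch of what a query spanning two sources can look like. Presto addresses tables as catalog.schema.table, so one statement can join tables that live in different systems. The catalog, schema, table, and column names below (hive.web.orders, mysql.crm.users, user_id, country) are hypothetical and chosen only for illustration; the Python code just assembles the SQL text and does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical names: a warehouse table and a relational database table.
    orders = qualified("hive", "web", "orders")
    users = qualified("mysql", "crm", "users")
    return (
        "SELECT u.country, count(*) AS order_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {users} AS u ON o.user_id = u.id\n"
        "GROUP BY u.country"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

Which catalogs are actually available depends on how the connectors are configured on a given cluster.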

Slide4

Slide5

Slide6

Slide7

Facebook

Facebook’s warehouse data is stored in a few large Hadoop/HDFS-based clusters

Development started in fall 2012, when the warehouse data had grown to petabyte scale

Rolled out company-wide by spring 2013

Actively used by over a thousand employees

Netflix

25 PB warehouse

Uses AWS S3 for its data warehouse

Slide8

Airbnb

Airpal Launch

Optional access control for users

Ability to search and find tables

See metadata, partitions, schemas, and sample rows

Write queries in an easy-to-read editor

Submit queries through a web interface

Track query progress

Get the results back through the browser as a CSV

Create a new Hive table based on the results of a query

Save queries once written

Searchable history of all queries run within the tool

Slide9

Pros

Interactive queries

Optimized for latency

Handles joins between one large fact table and many smaller dimension tables

Create Jobs
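As an illustration of the fact/dimension join pattern listed above, the following sketch runs a small star-schema query. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; it is not Presto, and the table names (fact_sales, dim_store) and sample rows are made up for the example. The shape of the SQL is the point: a large fact table joined to a small dimension table and then aggregated.

```python
import sqlite3


def revenue_by_region() -> list:
    """Join a fact table to a dimension table and aggregate per region."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical star schema: one fact table, one small dimension table.
    conn.executescript(
        """
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY,
                                 store_id INTEGER, amount REAL);
        INSERT INTO dim_store VALUES (1, 'east'), (2, 'west');
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
        """
    )
    rows = conn.execute(
        """
        SELECT d.region, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_store AS d ON f.store_id = d.store_id
        GROUP BY d.region
        ORDER BY d.region
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, revenue in revenue_by_region():
        print(region, revenue)
```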

Slide10

Cons

Limited by available memory: all data being processed must be held in memory, or the query will fail

Lacks ability to write output data back to tables

If processing fails, entire query must be re-run
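The last point means recovery has to happen on the client side: there is no partial result to resume from, so the whole query is submitted again. A minimal sketch of that pattern follows, assuming a hypothetical run_query callable that raises RuntimeError when the query fails. The helper name run_with_rerun and the choice of exception type are illustrative and not part of any Presto client API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_rerun(run_query: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-submit the whole query from scratch until it succeeds or attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # Each attempt starts over; no intermediate state is reused.
            return run_query()
        except RuntimeError as err:  # assumed failure signal for this sketch
            last_error = err
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_query() -> str:
        # Hypothetical query that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("worker lost")
        return "rows"

    print(run_with_rerun(flaky_query), "after", attempts["count"], "attempts")
```

Because every retry repeats all of the work, this approach suits short interactive queries better than long-running jobs.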

Slide11

Slide12

Thank you!

Questions?

Slide13

Citations

https://medium.com/airbnb-engineering/airpal-a-web-based-query-execution-tool-for-data-analysis-33c43265ed1f

https://prestodb.io/

https://blog.treasuredata.com/blog/2015/03/20/presto-versus-hive/

https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920/

https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4

https://docs.treasuredata.com/articles/presto

https://gigaom.com/2015/03/05/airbnb-open-sources-sql-tool-built-on-facebooks-presto-database/