Slide1
Presto
Nipa Das, Ye Jee Kim, Murphy Potts, Sadaf Mirzai
Slide2
Agenda
What is Presto?
History of Presto
Architecture
Pluggable Backends
Applications & Business Opportunities
Pros
Cons
Citations
Slide3
What is Presto?
Open source distributed query engine that uses Structured Query Language (SQL)
Created by Facebook
Runs queries against data sources ranging in size from gigabytes to petabytes
Allows fast analytics
Can combine data from multiple sources
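To make "combine data from multiple sources" concrete, here is a minimal sketch of one SQL statement joining tables that live in two separate stores. It does not use Presto itself: two attached in-memory SQLite databases stand in for two Presto data sources, and the names `warehouse`, `appdb`, `page_views`, and `users` are hypothetical. In Presto the tables would typically be addressed as catalog.schema.table rather than SQLite's schema.table.

```python
import sqlite3

# Two attached in-memory databases act as stand-ins for two separate sources.
# The names below are illustrative assumptions, not part of the original deck.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS appdb")

conn.execute("CREATE TABLE warehouse.page_views (user_id INTEGER, views INTEGER)")
conn.execute("CREATE TABLE appdb.users (user_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO warehouse.page_views VALUES (?, ?)",
    [(1, 10), (2, 5), (1, 7)],
)
conn.executemany(
    "INSERT INTO appdb.users VALUES (?, ?)",
    [(1, "US"), (2, "KR")],
)

# A single query joins rows from both sources and aggregates the result.
rows = conn.execute(
    """
    SELECT u.country, SUM(v.views) AS total_views
    FROM warehouse.page_views AS v
    JOIN appdb.users AS u ON u.user_id = v.user_id
    GROUP BY u.country
    ORDER BY u.country
    """
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the query: the analyst writes one join, and the engine is responsible for reading from each source.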
Slide4
Slide5
Slide6
Slide7
Facebook
Facebook’s warehouse data is stored in a few large Hadoop/HDFS-based clusters
Development started in Fall 2012, when their warehouse data grew to petabyte size
Fully rolled out across the company by Spring 2013
Actively used by over a thousand employees
Netflix
25 PB warehouse
AWS S3 for data warehouse
Slide8
Airbnb
Airpal Launch
Optional access control for users
Ability to search and find tables
See metadata, partitions, schemas, and sample rows
Write queries in an easy-to-read editor
Submit queries through a web interface
Track query progress
Get the results back through the browser as a CSV
Create new Hive table based on the results of a query
Save queries once written
Searchable history of all queries run within the tool
Slide9
Pros
Interactive queries
Optimized for latency
Joins with a large Fact table and many smaller Dimension tables
Create Jobs
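The fact/dimension join pattern listed above can be sketched in a few lines. This is an illustrative toy, not Presto's implementation: it assumes the small dimension tables fit in memory as lookup tables while the large fact table is streamed once. All table contents and the function name `star_join_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one fact table and two small dimension tables.
sales = [  # fact rows: (product_id, store_id, amount)
    (1, 10, 5.0),
    (2, 10, 3.0),
    (1, 20, 4.0),
    (3, 20, 1.0),  # product 3 has no dimension row
]
products = {1: "book", 2: "pen"}      # dimension: product_id -> name
stores = {10: "Seoul", 20: "Austin"}  # dimension: store_id -> city


def star_join_totals(fact_rows, product_dim, store_dim):
    """Stream the fact table once, probing the in-memory dimension lookups."""
    totals = defaultdict(float)
    for product_id, store_id, amount in fact_rows:
        name = product_dim.get(product_id)
        city = store_dim.get(store_id)
        if name is None or city is None:
            continue  # inner-join semantics: drop unmatched fact rows
        totals[(city, name)] += amount
    return dict(totals)


totals = star_join_totals(sales, products, stores)
print(totals)
```

Because each fact row needs only constant-time lookups, latency is dominated by a single pass over the fact table, which is one reason this query shape suits an interactive engine.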
Slide10
Cons
Limit on the maximum amount of data per query: all data being processed must be held in memory, or the query will fail
Lacks ability to write output data back to tables
If processing fails, entire query must be re-run
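The memory and re-run limitations above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not Presto code: the memory budget is approximated by a cap on the number of groups held in a dictionary, and `MemoryLimitExceeded`, `count_by_key`, and `run_until_success` are hypothetical names.

```python
class MemoryLimitExceeded(Exception):
    """Raised when the in-memory state would exceed the configured budget."""


def count_by_key(rows, max_groups):
    """Count rows per key, holding every group in memory at once."""
    counts = {}
    for key in rows:
        if key not in counts and len(counts) >= max_groups:
            # No spilling to disk in this model: the whole query fails.
            raise MemoryLimitExceeded(f"more than {max_groups} groups")
        counts[key] = counts.get(key, 0) + 1
    return counts


def run_until_success(rows, limits):
    """Try each limit in turn; every attempt restarts from the first row."""
    attempts = 0
    for limit in limits:
        attempts += 1
        try:
            return count_by_key(rows, limit), attempts
        except MemoryLimitExceeded:
            continue  # partial counts are discarded, nothing is checkpointed
    raise MemoryLimitExceeded("all configured limits were too small")


events = ["a", "b", "a", "c", "b", "a"]
result, attempts = run_until_success(events, limits=[2, 3])
print(result, attempts)
```

In this run the first attempt fails when the third distinct key arrives, and the second attempt repeats all of the work from the beginning, which mirrors the cost of re-running an entire query after a failure.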
Slide11
Slide12
Thank you!
Questions?
Slide13
Citations
https://medium.com/airbnb-engineering/airpal-a-web-based-query-execution-tool-for-data-analysis-33c43265ed1f
https://prestodb.io/
https://blog.treasuredata.com/blog/2015/03/20/presto-versus-hive/
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920/
https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4
https://docs.treasuredata.com/articles/presto
https://gigaom.com/2015/03/05/airbnb-open-sources-sql-tool-built-on-facebooks-presto-database/