Taming Massive Distributed Datasets: Data Sampling Using Bi
Author : tawny-fly | Published Date : 2016-07-07
Yu Su, Gagan Agrawal, Jonathan Woodring, Kary Myers, Joanne Wendelberger, James Ahrens. The Ohio State University; Los Alamos National Laboratory. Motivation
Taming Massive Distributed Datasets: Data Sampling Using Bi: Transcript
Motivation: Science becomes increasingly data driven.