Slide1
Skills and Training for Big Data Projects
ESS Big Data Workshop 2016, 13 -14 October 2016, Ljubljana
Philippe NIEUWBOURG
philippe.nieuwbourg@decideo.com
@nieuwbourg
Slide2
What I will cover
- New opportunities coming from new data sources
- Skills framework for Big Data
- Why it's urgent to act and launch proofs of concept
Slide3
Wake-up call!
Wayne Smith, Chief Statistician of Canada, resigned on Sept. 16:
“Statistics Canada needed to be more agile because it was facing huge challenges in a world of big data including: demands for up-to-the-minute information that businesses and planners rely on, declining response rates on traditional surveys, and meeting the government’s need for statistics in new policy fields”
Slide4
A Big Data Revolution?
Slide5
A Big NEW Data Revolution?
Slide6
Volume – Velocity – Variety
- Volume is the least important
- Velocity is a challenge for people working on long-term trends
  - It requires the ability to connect to and capture data feeds coming from social networks, web applications, sensors…
- Variety is a challenge for everybody
  - We need to invent new ways to find insights in unstructured data
  - Today, people give much more valuable information on social media and cell phones, through status updates, videos and photos, than through questionnaires
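Velocity means keeping up with a continuous feed rather than a periodic survey. As a minimal sketch, assuming each event carries a timestamp in seconds and events arrive in time order, a sliding-window counter keeps only recent events and answers "how many events in the last minute?". The `SlidingWindowCounter` class is hypothetical and not part of any streaming tool named in this deck.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen during the last `window_seconds` of a data feed.

    Hypothetical helper for illustration: timestamps are floats in seconds and
    are assumed to arrive in non-decreasing order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps still inside the window, oldest first

    def add(self, timestamp: float) -> None:
        # Record the new event, then drop events that fell out of the window.
        self._events.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        # Number of events in (now - window_seconds, now], assuming `now` is
        # not earlier than the latest event added.
        self._evict(now)
        return len(self._events)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for t in [0, 10, 30, 65, 70]:
        counter.add(t)
    print(counter.count(now=70))  # events at 30, 65 and 70 remain -> 3
```

A deque is used because events only ever enter at one end and expire at the other, so each event is appended and removed exactly once.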
Slide7
New Data Sources
- From data sources YOU create and control
- To data sources collected by others
  - Data quality issues
  - Data feeds
  - Data to buy or trade
  - Data you don't control: format, sustainability
  - New data sources, and data sources that disappear
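When you don't control a source's format, it can change without notice. One hedged way to catch this is to check every incoming record against the schema you expect and flag missing, new or mistyped fields before they break downstream processing. The field names and the `check_schema` function below are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical expected schema for a third-party feed: field name -> type.
EXPECTED_SCHEMA: dict[str, type] = {"id": str, "timestamp": float, "value": float}


def check_schema(record: dict[str, Any], schema: dict[str, type]) -> dict[str, list[str]]:
    """Compare one incoming record with the expected schema.

    Returns the fields that are missing, unexpected (new), or of the wrong type,
    so that format changes in an external source are flagged, not silently ignored.
    """
    missing = sorted(name for name in schema if name not in record)
    unexpected = sorted(name for name in record if name not in schema)
    wrong_type = sorted(
        name
        for name, expected in schema.items()
        if name in record and not isinstance(record[name], expected)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


if __name__ == "__main__":
    # The provider renamed "value" to "amount" and now sends timestamps as strings.
    record = {"id": "a1", "timestamp": "2016-10-13T09:00:00", "amount": 4.2}
    print(check_schema(record, EXPECTED_SCHEMA))
    # -> {'missing': ['value'], 'unexpected': ['amount'], 'wrong_type': ['timestamp']}
```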
Slide8
Examples
- Data from mobile phones: geolocation, content served, usage…
- Data from smart cities: sensors, security cameras…
- Data from social networks: status updates, images, videos, links, graphs
- All data from private companies that could be used to understand behaviors:
  - Uber data could be used to understand people's mobility
  - Airbnb data could be used to understand how and where people travel
  - Pinterest is used to understand how people cook, and what kind of fashion and decoration they like…
Slide9
Skills framework
Pillar | Main skills
Data Management | Manipulate high-volume datasets; move from surveys to raw data; deal with dirty data; generate structured data from unstructured data; deal with uncertainty (new data sources, dead data sources, format evolution…)
Mathematics & Statistics | Evolution of statistical models; new analytic techniques based on large datasets; impact of big data on existing statistical and analytic techniques; find value in new types of datasets (e.g. social media) through network analysis; impact of « unlimited » resources on analytics processes; predictive analytics and machine learning
Communication | Grammar of graphics; data storytelling
Legal, Ethics, and Privacy | Refer to deliverable D.1.1.
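"Generate structured data from unstructured data" can start very simply. As a minimal sketch, assuming posts arrive as plain text, regular expressions can turn each post into a fixed set of fields (hashtags, mentions, links, word count) that ordinary statistical tools can then aggregate. The patterns are deliberate simplifications, not any platform's official syntax, and `extract_features` and `PostFeatures` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns (an assumption, not any platform's official grammar).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


@dataclass
class PostFeatures:
    """Structured fields extracted from one free-text social media post."""

    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)
    word_count: int = 0


def extract_features(text: str) -> PostFeatures:
    urls = URL.findall(text)
    # Remove URLs first so fragments such as "#anchor" inside a link are not
    # mistaken for hashtags.
    without_urls = URL.sub(" ", text)
    return PostFeatures(
        hashtags=[tag.lower() for tag in HASHTAG.findall(without_urls)],
        mentions=MENTION.findall(without_urls),
        urls=urls,
        word_count=len(without_urls.split()),
    )


if __name__ == "__main__":
    post = "Great #BigData talk by @nieuwbourg in Ljubljana https://example.org/slides"
    print(extract_features(post))
```

Each post becomes one row with the same columns, which is the shape survey-trained statistical workflows expect.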
Slide10
Business Skills
- Legal & Ethics
- Identify data sources
- Negotiate data sources
- Create value from data and analysis
- Sell data and analysis
Slide11
Data Management skills (collect & clean)
Collect or access existing data | Category of tools | Examples of requested tools
Collect data from social media and other sources | API programming; social media data resellers | Restlet, CNIP, Datasift, Topsy, Flume
Transform data | ETL | Talend, Informatica, IBM Datastage, Microsoft Integration Services, Ab Initio, Sqoop
Clean data | Data Quality Management | Trillium, DataMentors, Pixata, Ataccama, Zookeeper
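The data quality tools listed above automate rules like the ones below. As a minimal sketch, assuming raw records are dictionaries with hypothetical `id`, `country` and `value` fields, a cleaning pass typically drops duplicates, rejects unparseable numbers and normalizes codes. The `clean_records` function is illustrative and does not reproduce any listed product.

```python
import math


def clean_records(rows: list[dict]) -> list[dict]:
    """Minimal data-quality pass over raw records (hypothetical field names).

    - drops rows without an 'id', and rows whose 'id' was already kept
    - drops rows whose 'value' is missing, unparseable or not a finite number
    - trims whitespace and upper-cases the 'country' code
    """
    seen_ids = set()
    cleaned = []
    for row in rows:
        record_id = str(row.get("id", "")).strip()
        if not record_id or record_id in seen_ids:
            continue  # missing key or duplicate
        try:
            value = float(row.get("value"))
        except (TypeError, ValueError):
            continue  # missing (None) or unparseable value
        if not math.isfinite(value):
            continue  # rejects "nan" and "inf"
        seen_ids.add(record_id)
        cleaned.append(
            {
                "id": record_id,
                "country": str(row.get("country", "")).strip().upper(),
                "value": value,
            }
        )
    return cleaned


if __name__ == "__main__":
    raw = [
        {"id": "1", "country": " si ", "value": "3.5"},
        {"id": "1", "country": "SI", "value": "3.5"},  # duplicate id
        {"id": "2", "country": "fr", "value": "n/a"},  # unparseable value
        {"id": "3", "country": "de", "value": None},  # missing value
        {"id": " 4 ", "country": "Fr", "value": 7},
    ]
    print(clean_records(raw))
```

Duplicate detection happens only after a row passes validation, so an invalid first copy does not block a later valid one.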
Slide12
Data management skills (Store)
Store data | Category of tools | Examples of requested tools
Store structured data | RDBMS | Microsoft SQL Server, MySQL, PostgreSQL, HBase, Sybase IQ, Google BigQuery, Amazon Web Services
Store XML data | XML databases | VelocityDB, TaffyDB, XML:DB, eXist-db
Store unstructured data | NoSQL; HDFS | Cassandra, MongoDB; Cloudera, Hortonworks…
Store relations between objects | Graph databases | Neo4j, Tibco Graph Database
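Graph databases are built for queries that follow relations, such as "who is connected to my contacts?". The in-memory sketch below only illustrates the shape of such a query (degree and friends-of-friends over an undirected graph). It is not the API of Neo4j or any other listed product, and `RelationGraph` is a hypothetical name.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny in-memory, undirected relation store (illustration only)."""

    def __init__(self) -> None:
        self._neighbours = defaultdict(set)

    def relate(self, a: str, b: str) -> None:
        # Store the relation in both directions (undirected graph).
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def degree(self, node: str) -> int:
        # Number of direct relations; .get avoids creating empty entries.
        return len(self._neighbours.get(node, set()))

    def friends_of_friends(self, node: str) -> set[str]:
        # Nodes exactly two hops away: neighbours of neighbours, excluding
        # the node itself and its direct neighbours.
        direct = self._neighbours.get(node, set())
        two_hops = set()
        for friend in direct:
            two_hops |= self._neighbours[friend]
        return two_hops - direct - {node}


if __name__ == "__main__":
    g = RelationGraph()
    for a, b in [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
                 ("alice", "erin"), ("erin", "carol")]:
        g.relate(a, b)
    print(g.friends_of_friends("alice"), g.degree("carol"))
```

Degree is the simplest network-analysis measure. A dedicated graph database runs the same kind of traversal on graphs far too large to hold in a Python dictionary.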
Slide13
Analyze data
Analyze data | Category of tools | Examples of requested tools
Query data, manipulate datasets | New tools for NoSQL and Hadoop environments | MapReduce, Spark, Hive, Pig
Statistics | Statistical software | SAS, SPSS, Statistica, Excel, R
Develop your own analytical tools | Programming languages | Java, Python, Scala, Ruby, Julia
Machine learning | Software; programming languages | Dataiku, TIMi, Mahout; Python, R, Scala, Google TensorFlow
Data mining, text analytics, sentiment analysis | Data mining tools | RapidMiner, Weka, Angoss, Kxen (SAP), Tanagra
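MapReduce, the first tool listed for querying Hadoop data, splits a job into a map step, a shuffle that groups values by key, and a reduce step. The plain-Python word count below is only a single-machine sketch of that model, not Hadoop's API. On a cluster, the framework runs many mappers and reducers in parallel and performs the shuffle across machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values emitted for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas", "data data"]))
```

Because each mapper looks at one line and each reducer at one key, both steps can be distributed independently. That property is what lets the model scale to large datasets.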
Slide14
Communication skills
Visualize and share data | Category of tools | Examples of requested tools
Create charts and dashboards | Self-service business intelligence solutions | Qlik, Tableau, Tibco Spotfire, Yellowfin, Microsoft Power BI, Domo, Zoomdata
Program your own charts | Programming languages; graphics libraries | R, Python; Plot.ly
Create maps | Geospatial software | Galigeo, Esri, CartoDB
Tell a story about your data | Data storytelling tools | Tableau, Qlik, Yellowfin, Microsoft PowerPoint, Prezi
Slide15
NSOs in a competing world
Slide16
NSOs are not the only source of data
- What is the best source for understanding people's mobility in a country? Mobile telecommunication providers?
- What is the best source for understanding what people think about politics? Twitter, Facebook?
- What is the best source for measuring the happiness of a population? Instagram, Facebook?
- What is the best source for measuring what people spend on vacation? Anonymized data from banks?
Slide17
GAFA want to make money from data
- Google, Apple, Facebook, Amazon…
- Telecommunications companies, banks, web services…
- They spend money to collect user data
- They know the value of the data they collect better than you do
- They want to make money from that data
Slide18
Do you know Watson?
Slide19
Will you support or fight algorithmic regulation?
- Algorithmic regulation is a system of governance in which more exact data, collected from citizens via their smart devices and computers, is used to organize human life as a collective more efficiently.
- Big Data is the key to algorithmic regulation
- Who wants to replace you? Any Big Data platform that can make money from its data!