Slide1
Microsoft Unlocks Business Value with Machine Learning
Val Fontama, PhD, Principal Data Scientist
Derek Bevan, Principal Software Architect
Slide2
Data & Decision Sciences Group
Data Science at Microsoft
Summary
Building Data Science team
Agenda
Slide3
Covering the Analytics Spectrum
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: What should I do?
IT Professionals
Data Modeling, ETL, Data Warehousing, Data Marts and Cubes
Information Worker
Self-Service & Exploration with Power BI
Data Scientists
Advanced Analytics from Microsoft and 3rd parties
BI Enablement
Advanced Analytics
Enterprise Data Management
Slide4
Microsoft DDSG - Vision, Mission and Service Offerings
Strategic Analytics Consulting
Data Science Community
Big Data Analytics
Big Data Innovation
Predictive and Prescriptive
Causality Studies
Fraud Detection
System Dynamics
Forecasting
Optimization
Big Data Insights & Visualization
Social & Sentiment Analysis
Web Analytics
POC & Pilot Enablement
Solution Design
Architectural Design Consulting
Community Development
Data Science Training
Data Driven Org Strategy
MCS & EPG Partnership
Industry Showcase
Global Field
External Client
Consulting
Simulation Modeling Services
Mission | Provide advanced analytic expertise to influence strategy and help drive efficiency, grow revenue and improve customer satisfaction
Vision | Build a Culture of Data Driven Decision Making
Slide5
Industries: DDSG Data Scientist Experience
Telecommunications: Fixed Line & Mobile
Financial Services: Banking, Insurance, Real Estate
Health Care: Pharmaceuticals, Biotechnology
Industry/Utility: Aerospace, Utility, Manufacturing
Slide6
Advanced Analytics at Microsoft
Slide7
How to build a predictive model?
1. Define Business Problem
2. Prepare Data
3. Develop Model Through Iterations
4. Deploy Model
5. Monitor Model’s Performance
Business Insights
Slide8
DDSG Solving Real Problems: Sample Client Engagements
Industry Stats
Windows Telemetry
SEGMENTATION, CYCLE TIME REDUCTION
Build a utilization-based customer segmentation by analyzing the click stream from the Windows Telemetry panel
MS.COM - Targeting
TARGETING: SURFACE TABLET, WINDOWS PHONE 8
Target visitors who showed an interest in Surface, Windows Phone, or Xbox on the basis of their MS.com/MS Store behavior
CRM Online
CHURN PREDICTION, PROACTIVE SUBSCRIBER RETENTION
Build a predictive churn model for CRM Online customers to help with retention
ISRM - Security
SECURITY, INTRUSION DETECTION
Enhance ISRM security monitoring and incident response capabilities. Detect potential threats on the Microsoft corporate network.
OEM – Unlicensed Devices
ROI, INSIGHT, WINDOWS 8 DEVICES
Analyze ROI and develop actionable insight for marketing spend in OEM channels, including manufacturers, retailers and distributors
LCA – Cybercrime Unit
PIRACY DETECTION, REVENUE GROWTH OPPORTUNITY
Analyze current trends in piracy of MS products and build models to identify instances of pirated software
Slide9
VIDEO
Cybercrime: Piracy Detection
“There’s no one country, business or organization that can tackle cybercrime threats alone. That’s why we invest in bringing partners into our center – law enforcement agencies, partners and customers – to work alongside us.”
Brad Smith, Microsoft’s general counsel and executive vice president of Legal and Corporate Affairs
Slide10
Cybercrime: Piracy Prevention
Problem:
Cybercrime has cost governments, corporations and the public billions in recent years, but meeting the techniques and level of proof required to solve enterprise cybercrime problems has been extremely challenging in the past. In particular, lost revenue from software piracy impacts an enterprise’s bottom line
Findings:
Microsoft’s teams combined cyber forensics, big data analysis and machine learning techniques to identify diverse piracy mechanics, stop 3 massive operations in different geographies and recoup over $5M in revenue
Applied analytics led to stopping piracy at the source by halting a daily leak of license keys from a factory
As a result, several legal cases were recently brought to court
Methodology:
Technological advances and data science enabled the Microsoft Cybercrime Center, Legal and Corporate Affairs, and Microsoft IT’s Data & Decision Sciences teams to effectively stop unlicensed activity and piracy, backed by the US Computer Fraud and Abuse Act
Microsoft IT DDSG mined large volumes of license-related data; predictive models built by the data scientists were implemented to score millions of product keys, which LCA used successfully to identify fraudulent behavior
Slide11
Preventing Network Intrusion with Machine Learning
Problem:
Detect suspicious activity on the network servers early and eliminate the threat.
Methodology:
File system to store massive security data.
Fully automated workflow to drive end-to-end data receiving and transformation process.
Analysis and visualizations of Windows Events to identify pre-defined threat scenarios.
Move from descriptive analytics to a mature predictive archetype.
Slide12
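To make "pre-defined threat scenarios" on the network-intrusion slide concrete, here is a minimal sketch of one such rule. The scenario itself (many failed logons for one account in a short window), the record layout, and the threshold and window defaults are all assumptions for illustration; the slides do not list the scenarios that were actually used. Event ID 4625 is the Windows Security log entry for a failed logon.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FAILED_LOGON_EVENT_ID = 4625  # Windows Security log: "An account failed to log on"


def flag_repeated_failures(events, threshold=5, window=timedelta(minutes=10)):
    """Return accounts with at least `threshold` failed logons inside any `window`.

    `events` is an iterable of (timestamp, event_id, account) tuples. This record
    layout and the default threshold/window are hypothetical, not from the slides.
    """
    failures = defaultdict(list)
    for timestamp, event_id, account in events:
        if event_id == FAILED_LOGON_EVENT_ID:
            failures[account].append(timestamp)

    flagged = set()
    for account, times in failures.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Advance the left edge until the span fits inside `window`.
            while current - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(account)
                break
    return flagged


if __name__ == "__main__":
    base = datetime(2014, 1, 1, 9, 0)
    # Hypothetical sample: five failures for one account, one minute apart.
    sample = [(base + timedelta(minutes=i), 4625, "alice") for i in range(5)]
    print(flag_repeated_failures(sample))
```

A rule like this is still descriptive analytics; the slide's last bullet points toward replacing such fixed thresholds with predictive models.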
Churn Analysis
Problem:
A business line is experiencing 36% churn annually
Findings:
Under-utilization (low usage) is a key leading indicator
Each 1% reduction in churn results in ~$342K impact
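The churn findings can be turned into a quick back-of-the-envelope estimate. The sketch below uses the two figures from the slide (36% annual churn, ~$342K per 1% reduction) and assumes that "1%" means one percentage point and that the impact scales linearly; neither assumption is stated in the slides.

```python
# Figures from the slide; linear scaling per percentage point is an assumption.
ANNUAL_CHURN_PCT = 36.0          # current annual churn, in percent
IMPACT_PER_POINT_USD = 342_000   # approximate impact of each 1-point reduction


def churn_reduction_impact(target_churn_pct):
    """Estimated dollar impact of lowering annual churn to target_churn_pct."""
    if not 0.0 <= target_churn_pct <= ANNUAL_CHURN_PCT:
        raise ValueError("target must lie between 0 and the current churn rate")
    points_saved = ANNUAL_CHURN_PCT - target_churn_pct
    return points_saved * IMPACT_PER_POINT_USD


if __name__ == "__main__":
    for target in (35.0, 33.0, 30.0):
        print(f"churn {target:.0f}% -> ~${churn_reduction_impact(target):,.0f}")
```

Under these assumptions, bringing churn from 36% down to 30% would be worth roughly 6 x $342K.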
Methodology:
40% of data is missing or incomplete
Enumerated key leading indicators (drivers) of churn and scored every subscription with its probability of churn
Developed a Random Forest model with ~65% accuracy
Slide13
Customer Targeting With Machine Learning
Problem:
Leverage the history of a person’s behavior on Microsoft.com to identify their interests and predict future actions
Predict which customers are likely to buy Surface or Windows Phone
Methodology:
Big Data Platform – HDP for Windows/Azure HDInsight and Advanced Analytics support
Develop statistical models to determine the probability of users buying a Surface device
Slide14
Targeting Models Delivered
Windows Phone
Provided list of cookies that are more likely to land on a Windows Phone page
Monthly scoring for 3 months
Surface
Provided list of cookies that are more likely to buy a Surface
Monthly scoring since April 20, 2013
New Targeting Models Developed for Surface and Windows Phone
Slide15
Path analysis
Geography analysis by Microsoft’s PowerMap
Big Data Analysis
5 months of logs from Microsoft.com
Analysis conducted using Power BI, SQL Server, & Hadoop
Understand the Big Picture of your website’s logs
Text Mining on external and internal queries
Recognize your users quickly before their behavior changes
Big Data Clustering models for user segmentation
Big Data Predictive models for user behavior / targeting
Do this for any sub-site, campaign, user segment, etc.
Leverage big data platform for ongoing model refinement
Slide16
Behavior Analysis
Slide17
Text Analysis
Queries in Microsoft.com were logged during a specific time range. The engineering team was interested to know the popular “topics” from this collection of queries (documents).
A text miner tool pre-processed 3 million queries, and constructed 25 thematic topics formed by “key words”. The 5 most popular “topics” are listed below.
Internal (i.e. on direct Microsoft pages)
Category | Topic Id | Doc cutoff | Terms cutoff | Topic | Num of terms | Num of queries
Multiple | 5.0 | 5.032 | 0.397 | +window, +live, windowsmedia, xp, aspx | 26.0 | 21633.0
Multiple | 15.0 | 3.074 | 0.304 | xp, +window, sp3, xp service pack, +download | 44.0 | 18299.0
Multiple | 13.0 | 3.353 | 0.316 | +window, +vista, +installer, +mobile, +phone | 77.0 | 17771.0
Multiple | 2.0 | 5.804 | 0.432 | +medium, +player, +window, +download, +window | 19.0 | 16713.0
Multiple | 4.0 | 4.999 | 0.402 | +office, +microsoft office, microsoft, +mac, +download | 24.0 | 13088.0
External (i.e. referrals from Google, Yahoo, etc.)
Category | Topic Id | Doc cutoff | Terms cutoff | Topic | Num of terms | Num of queries
Multiple | 5.0 | 8.793 | 0.367 | +window, +phone, +bit, +theme, +install | 177.0 | 213487.0
Multiple | 9.0 | 8.133 | 0.343 | microsoft, +microsoft office, +microsoft word, +microsoft essential, +microsoft outlook | 140.0 | 144995.0
Multiple | 10.0 | 7.305 | 0.337 | +window, +phone, +installer, +vista, +server | 174.0 | 132050.0
Multiple | 25.0 | 3.152 | 0.228 | +error, +server, +file, +code, sharepoint | 545.0 | 104760.0
Multiple | 8.0 | 7.818 | 0.343 | +download, +free, +window, +explorer, microsoft | 128.0 | 85837.0
Slide18
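The pre-processing step behind the query topic tables can be illustrated with a very small term-counting sketch. This is not the text miner tool used in the presentation (the slides do not name it or its algorithm), and it does not build topics; it only shows the kind of tokenizing and per-query term counting that such a tool starts from. The stop-word list and sample queries are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list and sample queries, for illustration only.
STOP_WORDS = {"for", "the", "a", "to", "of", "and", "in", "how"}
SAMPLE_QUERIES = [
    "windows xp service pack 3 download",
    "download windows media player",
    "office for mac download",
    "windows phone installer",
]


def tokenize(query):
    """Lower-case a query, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(queries, n=5):
    """Return the n terms that appear in the most queries, with their counts."""
    counts = Counter()
    for query in queries:
        # sorted(set(...)) counts each term once per query, in a stable order.
        counts.update(sorted(set(tokenize(query))))
    return counts.most_common(n)


if __name__ == "__main__":
    for term, num_queries in top_terms(SAMPLE_QUERIES, n=3):
        print(term, num_queries)
```

The "Num of queries" column in the tables plays a similar role to these per-term query counts, but at the level of whole topics rather than single terms.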
Windows OS users
Internal queries
This chart shows groups of similar queries. There are 15 end nodes in total, showing 15 groups. Almost all of these groups are product related.
Text Analysis
Slide19
Results – Better Targeting increased Revenue
Better customer targeting
Targeting coverage improved by 5% due to predictive models and other measures!
Increased revenue from display Ads
Targeted Ads generated up to 19% of revenue
Revenue per 1000 impressions grew by over 8X
Revenue per click grew by 6X!
Slide20
Building a Data Science Team
Slide21
Data Science Team Composition
Team Experience:
Our Academic Backgrounds
Applied Mathematics
Computer Science
Econometrics
Statistics
Engineering
Our Professional Expertise
Financial Services
Telecommunications
Information Technology
Industrials/Manufacturing
Utilities
Healthcare
Marketing
Domain Experience:
Forecasting/Modeling
Demand Forecasting
Predictive Modeling
Demand-Driven Planning
Credit Modeling
Fraud Detection
Consumer Relations
Sentiment Analysis/Social Media
Inventory Optimization
Customer Acquisition/Segmentation
Membership Portfolio Optimization
Click stream Data Analysis
Data Science
Design of experiments
Predictive Maintenance
Machine Learning
Big Data Analytics/Innovation
…a key resource for delivering value to the enterprise and your business
Slide22
The Roles: Data Scientist & Customer
…key resources, engaged collaboration essential for delivering value to the enterprise
Data Scientist
Scientific Method
Domain Knowledge
Intellectual Curiosity & Critical Thinking
Visualization & Communication
Math & Statistics
Advanced Computing & Data Management
Business Problem
Insights for Decision Making
Ethical Considerations
Objectivity
Hypotheses
Validation
Transparency
Dialog With Business
Problem Description
Options Considered
Receptive to Conclusions
Customer, Partner, Stakeholder
Slide23
Best Practices
Data Science is a team sport
Hire complementary skills to build a rounded team!
We need a hybrid Data Science team structure for best results
Need a centralized team of Data Scientists to share and promote best practices
And Data Scientists in Line of Business groups for domain knowledge
Data Science team needs to be a peer of the BI team, not inside it
Analytics team should span descriptive, diagnostic, predictive and prescriptive analytics
BI only covers descriptive and diagnostic
Data Scientist in a BI team may be under-utilized
Slide24
Introduced Data & Decision Sciences Group
Data Science at Microsoft
Cybercrime and antipiracy
Network intrusion
Customer churn prediction
Customer targeting models
Building a Data Science team
Summary
Slide25
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
MSDN
Resources for Developers
http://microsoft.com/msdn
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
Slide26
Complete an evaluation and enter to win!
Slide27
Evaluate this session
Scan this QR code to evaluate this session.
Slide28
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Slide29
Slide30
Advanced Telemetry Analytics for Windows
Problem:
We needed a behavioral customer segmentation for Windows and Office
Very large volumes of telemetry data are collected – over 1.7 billion mouse clicks and 2.4 billion keystrokes
Findings:
Successfully developed 7 user behavioral segments
Prioritize investments around the activities people do most
Methodology:
How can we effectively mine and extract meaning from the data?
Used clustering techniques to segment data that included hardware, app usage, user data, and URLs visited
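The telemetry slide says only that "clustering techniques" were used, without naming an algorithm. As one possible illustration, the sketch below implements plain k-means in standard-library Python on made-up two-feature usage data. The choice of k-means, the feature names, and the sample points are all assumptions; the 7 segments reported on the slide would correspond to k=7 on the real telemetry features.

```python
import random

# Hypothetical per-user features: (clicks per minute, keystrokes per minute).
SAMPLE_POINTS = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # light-usage users
    (8.0, 9.0), (8.2, 8.8), (7.9, 9.1),   # heavy-usage users
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_point(members)
    return centroids, labels


if __name__ == "__main__":
    centroids, labels = k_means(SAMPLE_POINTS, k=2)
    print(labels)
```

In practice, features such as hardware type or URLs visited would need to be encoded numerically and scaled before a distance-based method like this is applied.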