Slide1
Unlocking Audio/Video Content with Speech Recognition
Behrooz Chitsaz
Director, IP Strategy
Microsoft Research
behroozc@microsoft.com
Frank Seide
Lead Researcher
Microsoft Research
fseide@microsoft.com
Kit Thambiratnam
Researcher
Microsoft Research
kit@microsoft.com
Slide2
Division established in 1991
900+ researchers in 2010
50+ areas of computing
Open research culture
Impact on most Microsoft products
Microsoft Research
Slide3
Multimedia Research
Speech Search
Video summarization
Semantic extraction
Face identification
Object recognition
Visual search
3D Modeling
Slide4
Speech Applications
Speech as interface
Speech as 1st-class content
Mobile access
Search
Automation
PC application
Web service
Text input
Dictation
Indexing
Search
Metadata extraction
Advertising
Transcription
Meeting notes
Closed caption
Voicemail
Translation
Translating phone
Slide5
meta-data
surrounding & anchor text, URL
top-N lists, collaborative filtering
editorial meta-data
file content itself
keyword search in audio track using speech recognition
Searching Media Today
Slide6
Demo
Slide7
Spectral Analysis
Matching (Decoding)
time alignment
most likely hypothesis
$W' = \arg\max_{w_1..w_N} \; p(o_1..o_T \mid w_1..w_N) \, P(w_1..w_N)$
Acoustic Models
$p(o_t..o_t \mid \text{phoneme})$
Dictionary
$P(\text{phonemes} \mid w)$
Grammar (Language Model)
$P(w_1..w_N)$
“Hello World”
$o_1..o_T$
$\widehat{(w_1..w_N)}$
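The decoding rule on this slide picks the word sequence that maximises the product of the acoustic score and the language-model score. The sketch below illustrates that argmax in the log domain over a handful of candidates. The candidate sequences and all scores are made up for illustration, and a real recognizer searches a very large hypothesis space rather than enumerating a short list.

```python
# Hypothetical log-scores for three candidate word sequences.
# "acoustic" stands for log p(o_1..o_T | w_1..w_N),
# "lm" stands for log P(w_1..w_N). The numbers are illustrative only.
candidates = {
    ("hello", "world"): {"acoustic": -120.0, "lm": -4.0},
    ("hello", "word"): {"acoustic": -119.5, "lm": -7.5},
    ("yellow", "world"): {"acoustic": -123.0, "lm": -6.0},
}


def decode(candidates):
    """Return the word sequence with the highest combined log-score.

    Adding log-scores is equivalent to multiplying the two probabilities
    in the decoding rule.
    """
    return max(
        candidates,
        key=lambda words: candidates[words]["acoustic"] + candidates[words]["lm"],
    )


best = decode(candidates)
print(" ".join(best))
```

Note that "hello word" has the best acoustic score in this toy data, yet the language model tips the decision toward "hello world".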
Speech recognition
Slide8
speech recognition in a nutshell
Acoustic Models
$p(o_t..o_t \mid \text{phoneme})$
Dictionary
$P(\text{phonemes} \mid w)$
Grammar (Language Model)
$P(w_1..w_N)$
Speech recordings + full manual transcripts
Speech recognition
Slide9
Acoustic Models
$p(o_t..o_t \mid \text{phoneme})$
Dictionary
$P(\text{phonemes} \mid w)$
Grammar (Language Model)
$P(w_1..w_N)$
...
microscope m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
…
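The dictionary excerpt lists one pronunciation per line, and a word may appear more than once when it has several pronunciations. A minimal Python sketch of loading such lines into a lookup table follows. It assumes that the part before each colon is the phoneme and that the suffix (`s`, `n`, `e`) is some kind of position tag; the slide does not define the suffix, so that reading is a guess. The function name `parse_dictionary` is hypothetical.

```python
from collections import defaultdict

# Entries copied from the slide excerpt.
DICT_TEXT = """\
microscope m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
"""


def parse_dictionary(text):
    """Map each word to a list of pronunciations (lists of phoneme symbols)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *units = line.split()
        # Assumption: "ay:n" means phoneme "ay" with tag "n"; keep the phoneme.
        phonemes = [unit.split(":")[0] for unit in units]
        lexicon[word].append(phonemes)
    return dict(lexicon)


lexicon = parse_dictionary(DICT_TEXT)
print(len(lexicon["microsoft"]))
print(lexicon["microscope"][0])
```

In this excerpt the two variants of "microsoft" differ only in the fifth phoneme (`ax` versus `ow`).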
Speech recognition
Slide10
Acoustic Models
$p(o_t..o_t \mid \text{phoneme})$
Dictionary
$P(\text{phonemes} \mid w)$
Grammar (Language Model)
$P(w_1..w_N)$
...
-0.8790 this is a
-2.3045 this is about
-3.1858 this is absolutely
-5.2820 this is accomplished
-1.9542 this is actually
...
-5.8492 is a barnyard
-5.1004 is a barometer
-4.2270 is a baseball
-5.4292 is a baseless
-4.4304 is a baseline
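Each language-model entry pairs a score with a three-word sequence. The sketch below loads the entries from the excerpt and picks the highest-scoring third word for a given two-word history. It assumes the scores are log-probabilities, where values closer to zero mean more likely; the slide does not state the base or the file format. The helper names are hypothetical, and only the entries shown on the slide are considered.

```python
# Entries copied from the slide excerpt: a score followed by three words.
LM_TEXT = """\
-0.8790 this is a
-2.3045 this is about
-3.1858 this is absolutely
-5.2820 this is accomplished
-1.9542 this is actually
-5.8492 is a barnyard
-5.1004 is a barometer
-4.2270 is a baseball
-5.4292 is a baseless
-4.4304 is a baseline
"""


def parse_lm(text):
    """Map each word tuple to its score."""
    table = {}
    for line in text.splitlines():
        score, *words = line.split()
        table[tuple(words)] = float(score)
    return table


def best_next_word(table, history):
    """Return the listed third word with the highest score after `history`."""
    options = {
        words[2]: score
        for words, score in table.items()
        if words[:2] == tuple(history)
    }
    return max(options, key=options.get)


lm = parse_lm(LM_TEXT)
print(best_next_word(lm, ["this", "is"]))
print(best_next_word(lm, ["is", "a"]))
```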
Speech recognition
Slide11
Challenges
Speaker accent
Background noise
Reverberation
Vocabulary
Language
Slide12
lattice-based indexing
“into this bank account”
Slide13
lattice-based indexing
“into this bank account”
expected benefits from indexing lattices:
alternative recognition candidates → recall++
confidence scores → precision++
(time information → user experience)
Slide14
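As a rough illustration of the lattice-based indexing idea for “into this bank account”, the sketch below indexes every arc of a small word lattice rather than only the single best transcript. The lattice arcs, times, posterior values, and the document id are invented for illustration; the slides do not describe the actual index layout.

```python
from collections import defaultdict

# Hypothetical lattice arcs: (word, start_sec, end_sec, posterior).
# Arcs that cover the same time span are alternative recognition candidates.
lattice = [
    ("into", 0.00, 0.30, 0.90), ("in", 0.00, 0.15, 0.10), ("two", 0.15, 0.30, 0.10),
    ("this", 0.30, 0.55, 0.95),
    ("bank", 0.55, 0.90, 0.60), ("bang", 0.55, 0.90, 0.40),
    ("account", 0.90, 1.40, 0.85), ("a", 0.90, 1.00, 0.15), ("count", 1.00, 1.40, 0.15),
]


def build_index(doc_id, arcs):
    """Inverted index: word -> list of (doc_id, start, end, posterior)."""
    index = defaultdict(list)
    for word, start, end, posterior in arcs:
        index[word].append((doc_id, start, end, posterior))
    return index


def search(index, word, min_confidence=0.0):
    """Return hits for `word`, most confident first, above a threshold."""
    hits = [hit for hit in index.get(word, []) if hit[3] >= min_confidence]
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


index = build_index("clip-1", lattice)
print(search(index, "bang"))                      # found only as an alternative
print(search(index, "bang", min_confidence=0.5))  # filtered out by confidence
```

Keeping the alternatives lets a query match words the top hypothesis missed (recall), the posterior threshold trims unlikely hits (precision), and the stored start time is what a player would seek to.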
Speech
Word statistics
Metadata
NP extraction
Web query builder
Recognizer
Bing Search
Docs
Queries
Docs
Base Dict
Base LM
Adapt Dictionary
Adapt Language Model
Adapted Dict
Adapted LM
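One piece of the adaptation flow in this diagram, extending the base dictionary with words found in related documents, can be sketched as follows. This is only a toy illustration under stated assumptions: the base vocabulary, the sample documents, the tokenizer, and the `min_count` threshold are all hypothetical, and a real system would also need pronunciations for the new words and updated language-model probabilities.

```python
import re
from collections import Counter

# Hypothetical base vocabulary.
base_dict = {"this", "is", "a", "talk", "about", "speech", "search"}

# Hypothetical text gathered from metadata and retrieved documents.
docs = [
    "This talk is about MAVIS speech search",
    "MAVIS indexing runs on Azure",
]


def oov_counts(docs, vocabulary):
    """Count words in `docs` that are missing from `vocabulary`."""
    counts = Counter()
    for doc in docs:
        for token in re.findall(r"[a-z']+", doc.lower()):
            if token not in vocabulary:
                counts[token] += 1
    return counts


def adapt_dictionary(vocabulary, counts, min_count=2):
    """Add out-of-vocabulary words seen at least `min_count` times."""
    return vocabulary | {word for word, n in counts.items() if n >= min_count}


counts = oov_counts(docs, base_dict)
adapted = adapt_dictionary(base_dict, counts)
print(sorted(adapted - base_dict))
```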
Vocabulary Adaptation
from NLC group
Slide15
Architectural decisions
High-quality speech recognition is compute-intensive
Use Azure for indexing
Media content could be anywhere
PowerShell tools to upload content
Customer should be able to own search experience
Easy integration with text search infrastructure
Integrate with SQL Server/SharePoint/FAST
Must support click to play
Silverlight supports accurate seeking
Slide16
Microsoft Azure
SQL Server(s)
1. Submit audio/video to index
2. Get back AIB
3. Import AIB in SQL
Web server(s)
Media server(s)
4. Search/Retrieve results
video RSS feed
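Steps 3 and 4 of this workflow, importing index data into SQL and then searching it, can be sketched with a small stand-in. The example uses Python's built-in `sqlite3` in place of SQL Server. The slides do not describe the AIB format, so the table layout, column names, file names, and rows below are all hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for the contents of an imported index:
# (media file, word, start time in seconds, confidence).
rows = [
    ("talk1.wmv", "speech", 12.4, 0.91),
    ("talk1.wmv", "recognition", 12.9, 0.88),
    ("talk2.wmv", "speech", 301.0, 0.47),
]

# Step 3 (stand-in): load the index rows into a SQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_hits (media TEXT, word TEXT, start_sec REAL, confidence REAL)"
)
conn.executemany("INSERT INTO word_hits VALUES (?, ?, ?, ?)", rows)


def search(conn, word):
    """Step 4 (stand-in): return hits for a word, most confident first."""
    cursor = conn.execute(
        "SELECT media, start_sec, confidence FROM word_hits "
        "WHERE word = ? ORDER BY confidence DESC",
        (word,),
    )
    return cursor.fetchall()


results = search(conn, "speech")
for media, start_sec, confidence in results:
    print(f"{media} @ {start_sec:.1f}s (confidence {confidence:.2f})")
```

Returning the start time with each hit is what would let a web page offer click to play at the matching position.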
Azure integration
Slide17
Cloud computing made simple
Windows Azure + PowerShell = Cloud computing at your fingertips
Demo: media content submission
Slide18
Microsoft Research
Tell us if you are interested: mmms@microsoft.com
Visit us:
http://research.microsoft.com/mavis
http://research.microsoft.com
http://twitter.com/MSFTResearch
http://www.facebook.com/microsoftresearch#
http://www.flickr.com/photos/msr_redmond/
Slide19
Thank you!
Questions?
Slide20
Slide21
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.