Slide1
Speech Processing
AEGIS RET All-Hands Meeting
University of Central Florida
July 20, 2012
Applications of Images and Signals in High Schools
Slide2
Contributors
Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu
Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu
Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org
Slide3
Speech Processing Project
Speech recognition requires speech to first be characterized by a set of “features”.
Features are used to determine what words are spoken.
Our project implements the feature extraction stage of a speech processing application.
Slide4
Timeline
1874: Alexander Graham Bell proves that the frequency harmonics of an electrical signal can be divided
1952: Bell Labs develops the first effective speech recognizer
1971-1976 (DARPA): speech should be understood, not just recognized
1980s: Call center and text-to-speech products become commercially available
1990s: PC processing power allows ordinary users to run speech recognition (SR) software
Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm
Slide5
Applications
Call center speech recognition
Speech-to-text applications (e.g., dictation software)
Hands-free user interface (e.g., OnStar, XBOX Kinect, Siri)
Science Fiction, 1968: Stanley Kubrick’s 2001: A Space Odyssey
  http://www.youtube.com/watch?v=6MMmYyIZlC4
Science Fact, 2011: Apple iPhone 4S Siri
  http://www.apple.com/iphone/features/siri.html
Medical Applications
  Parkinson’s Voice Initiative
  Detection of Sleep Disorders
Slide6
Difficulties
Continuous speech (word boundaries)
Noise
  Background
  Other speakers
Differences in speakers
  Dialects/accents
  Male/female
Slide7
Speech Recognition
Speech -> Front End: Pre-processing -> Features -> Back End: Recognition -> Recognized speech
Speech: large amount of data (e.g., 256 samples)
Features: reduced data size (e.g., 13 features)
Front End: reduces the amount of data passed to the back end while keeping enough to describe the signal accurately. Its output is a feature vector.
256 samples ------> 13 features
Back End: statistical models classify each feature vector as a particular sound in speech.
Slide8
Front-End Processing of Speech Recognizer
Pre-emphasis: High-pass filter that compensates for the high-frequency roll-off in human speech
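A minimal sketch of the pre-emphasis stage, written in Python rather than the project's MATLAB. It assumes the common first-order filter y[n] = x[n] - alpha * x[n-1] with alpha = 0.97; the slides do not state which filter or coefficient the project uses, and the function name is hypothetical.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value, not one from the slides.
    """
    if not samples:
        return []
    out = [samples[0]]  # n = 0 has no previous sample, so it passes through
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (0 Hz) signal is almost cancelled after the first sample,
# while a signal that flips sign every sample (highest frequency) is boosted.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing `flat` with `alternating` shows the high-pass behavior directly: slow changes shrink and fast changes grow.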
Slide9
Front-End Processing of Speech Recognizer
Pre-emphasis -> Window
Window: Separate the speech signal into frames, then apply a window to smooth the edges of each frame
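The framing and windowing step could look roughly like the Python sketch below. The 256-sample frame length matches the example size used earlier in the deck; the 128-sample hop and the Hamming window are assumptions, since the slides do not name the window or overlap the project uses.

```python
import math


def hamming(length):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(samples, frame_len=256, hop=128):
    """Split samples into overlapping frames, dropping an incomplete tail."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def window_frames(frames):
    """Multiply each frame element-wise by a Hamming window of the same length."""
    if not frames:
        return []
    win = hamming(len(frames[0]))
    return [[s * w for s, w in zip(frame, win)] for frame in frames]


signal = [1.0] * 1024  # stand-in for recorded speech
windowed = window_frames(frame_signal(signal))
```

With a constant input, each windowed frame is simply the window shape itself: small at both edges and close to 1 in the middle, which is the edge smoothing the slide describes.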
Slide10
Front-End Processing of Speech Recognizer
Pre-emphasis -> Window -> FFT
FFT: Transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
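To illustrate the time-to-frequency transform, here is a small recursive radix-2 FFT in Python. It is a teaching sketch rather than the project's code (MATLAB provides a built-in `fft`); the 8000 Hz sampling rate and the 1000 Hz test tone are assumed values chosen so the tone falls exactly on one frequency bin.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


fs = 8000      # sampling rate in Hz (assumed)
n = 256        # one frame of samples
tone_hz = 1000
frame = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n)]

# Keep the first half of the spectrum; for real input the rest mirrors it.
magnitudes = [abs(c) for c in fft(frame)[: n // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * fs / n  # convert the bin index back to a frequency
```

The time-domain frame is just a list of amplitudes, but the spectrum concentrates its energy in the single bin that corresponds to the tone's frequency.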
Slide11
Front-End Processing of Speech Recognizer
Pre-emphasis -> Window -> FFT -> Mel-Scale
Mel-Scale: Convert linear-scale frequency (Hz) to the perceptual mel scale, which is roughly linear at low frequencies and logarithmic above about 1 kHz
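The Hz-to-mel conversion can be sketched in Python as follows. The formula mel = 2595 * log10(1 + f / 700) is one widely used definition and is an assumption here; the slides do not say which variant the project implements, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """One common mel formula (assumed): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Points spaced evenly on the mel scale between 0 Hz and 4000 Hz...
top_mel = hz_to_mel(4000.0)
mel_points = [top_mel * i / 5 for i in range(6)]
# ...are spaced progressively farther apart when converted back to Hz.
hz_points = [mel_to_hz(m) for m in mel_points]
gaps = [b - a for a, b in zip(hz_points, hz_points[1:])]
```

The widening `gaps` mirror the way hearing resolves low frequencies more finely than high ones, which is why filter banks are usually spaced on the mel scale.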
Slide12
Front-End Processing of Speech Recognizer
Pre-emphasis -> Window -> FFT -> Mel-Scale -> log
log: Take the log of the magnitudes so that multiplied components of the signal become additive and can be separated
Slide13
Front-End Processing of Speech Recognizer
Pre-emphasis -> Window -> FFT -> Mel-Scale -> log -> IFFT
IFFT: Inverse FFT transforms the log spectrum to the cepstral domain; the result is the set of “features”
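The last two stages (log, then inverse transform) can be sketched in Python as a plain real cepstrum. This is a simplified illustration under stated assumptions: it skips the mel-scale stage, uses a slow direct DFT instead of an FFT, and keeps the first 13 coefficients to match the "13 features" example earlier in the deck. The test frame is synthetic, not speech.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform, or its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, num_coeffs=13, floor=1e-10):
    """Inverse DFT of the log magnitude spectrum, truncated to num_coeffs."""
    spectrum = dft(frame)
    # The floor guards against log(0) for bins with no energy.
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    cepstrum = dft(log_mag, inverse=True)
    return [c.real for c in cepstrum[:num_coeffs]]


# Synthetic damped oscillation standing in for one 64-sample speech frame.
frame = [0.9 ** t * math.cos(0.5 * t) for t in range(64)]
features = real_cepstrum(frame)
louder = real_cepstrum([2.0 * s for s in frame])
```

Doubling the input multiplies every spectral magnitude by 2, which the log turns into an added constant; after the inverse transform that constant shows up only in coefficient 0, so `louder` differs from `features` in its first entry alone.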
Slide14
Speech Analysis and Sound Effects (SASE) Project
Graphical User Interface (GUI)
Speech input
  Record and save audio
  Read sound file (*.wav, *.ulaw, *.au)
Graphs the entire audio signal
Processes a user-selected speech frame and displays the output of each processing stage
Displays spectrogram
Applies audio effects
Slide15
MATLAB Code
Graphical User Interface (GUI)
  GUIDE (GUI Development Environment)
  Callback functions
Front-end speech processing
  Modular functions for reusability
  Graphs display output for each stage
Sound Effects
  Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer
Slide16
Slide17
GUI Components
Slide18
GUI Components
Plotting Axes
Slide19
GUI Components
Plotting Axes
Buttons
Slide20
SASE Lab Demo
Record, play, save audio to file, open existing audio files
Select and process a speech frame; display graphs of the stages of front-end processing
Display spectrogram for the entire speech signal or a user-selectable 3-second sample
Play speech: all or the selected 3-second sample
Show how certain sounds (e.g., “a e i o u”) differ in the spectrogram and the features, so the audience understands what these graphs tell us about the sounds
Apply sound effects, show user-configurable parameters
Graph the spectrogram and speech processing stages for sound effects
Show echo effect in spectrogram
Use as teaching tool
Slide21
Slide22
Future Work on SASE Lab
Audio Effects
  Ex: Pitch removal
Noise Filtering
Slide23
Applications of Signal Processing in High Schools
Convey the relevance and importance of math to high school students
Bring knowledge of engineering, technological innovation, and academic research into high school classrooms
Opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing
Encourage students to pursue higher education and careers in STEM fields
Slide24
Unit Plan: Speech Processing
Collection of lesson plans that introduce high school students to the fundamentals of speech and sound processing
Connections to Pre-Calculus mathematics standards (NGSSS and Common Core)
  Mathematical Modeling
  Trigonometric Functions
  Complex Numbers in Rectangular and Polar Form
  Function Operations
  Logarithmic Functions
  Sequences and Series
  Matrices
Hands-on lessons involving MATLAB projects
Teacher notes
Slide25
Unit Introduction
Students research, explore, and discuss current applications of speech and audio processing
Slide26
Lesson 1: The Sound of a Sine Wave
Modeling sound as a sinusoidal function
Concepts covered:
  Continuous vs. discrete functions
  Frequency of a sine wave
  Composite signals
Connections to real-world applications:
  Synthesis of digital speech and music
Slide27
Lesson 1: The Sound of a Sine Wave
Student MATLAB Project
  Create discrete sine waves with given frequencies
  Create composite signal of the sine waves
  Plot graphs and play sounds of the sine waves
  Analyze the effect of frequency on the graphs and the sounds of the sine functions
Project Extensions
  Play songs using sine waves
  Synthesize vowel sounds with sine waves
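The Lesson 1 project is described as a MATLAB exercise; the Python sketch below shows the same ideas with the standard library only. The 8000 Hz sampling rate, the 440 Hz and 880 Hz tones, and all function names are assumptions for illustration and do not come from the lesson plan.

```python
import io
import math
import struct
import wave

FS = 8000  # samples per second (assumed)


def sine_wave(freq_hz, duration_s, fs=FS, amplitude=1.0):
    """Discrete sine wave: amplitude * sin(2*pi*f*n/fs) for n = 0, 1, 2, ..."""
    count = int(round(duration_s * fs))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(count)]


def mix(*signals):
    """Add equal-length signals sample by sample, rescaled to fit [-1, 1]."""
    total = [sum(vals) for vals in zip(*signals)]
    peak = max(abs(v) for v in total) or 1.0
    return [v / peak for v in total]


def to_wav_bytes(samples, fs=FS):
    """Encode samples in [-1, 1] as a mono 16-bit WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(fs)
        pcm = struct.pack("<%dh" % len(samples),
                          *(int(round(s * 32767)) for s in samples))
        wav.writeframes(pcm)
    return buf.getvalue()


a4 = sine_wave(440, 0.5)   # lower tone
a5 = sine_wave(880, 0.5)   # one octave higher: twice the frequency
composite = mix(a4, a5)
wav_data = to_wav_bytes(composite)
```

Writing `wav_data` to a file such as `composite.wav` (a hypothetical name) makes the signal playable in any audio player, so students can hear how doubling the frequency raises the pitch by an octave.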
Slide28
Lesson 2: Frequency Analysis
Use of the Fourier transform to take functions from the time domain to the frequency domain
Concepts covered:
  Modeling harmonic signals as a series of sinusoids
  Sine wave decomposition
  Fourier transform
  Euler’s formula
  Frequency spectrum
Connections to real-world applications:
  Speech processing and recognition
Slide29
Lesson 2: Frequency Analysis
Student MATLAB Project
  Create a composite signal with the sum of harmonic sine waves
  Plot graphs and play sounds of the sine waves
  Compute the FFT of the composite signal
  Plot and analyze the frequency spectrum
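As a rough Python counterpart to the Lesson 2 MATLAB project, the sketch below builds a composite of three harmonics and recovers their frequencies and amplitudes from the spectrum. It uses a direct DFT sum so that Euler's formula is visible in the code; the sampling rate, signal length, and harmonic amplitudes are assumed values chosen so each harmonic lands exactly on a frequency bin.

```python
import cmath
import math


def dft_bin(x, k):
    """One DFT coefficient. By Euler's formula, exp(-j*theta) equals
    cos(theta) - j*sin(theta), so this correlates x with a cosine and a sine."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))


def amplitude_spectrum(x, fs):
    """(frequency in Hz, sine amplitude) pairs for bins 1 .. N/2 - 1."""
    n = len(x)
    return [(k * fs / n, 2 * abs(dft_bin(x, k)) / n) for k in range(1, n // 2)]


FS = 1600  # sampling rate in Hz (assumed)
N = 160    # number of samples, giving 10 Hz between bins
harmonics = {100: 1.0, 200: 0.5, 300: 0.25}  # frequency in Hz -> amplitude

signal = [sum(a * math.sin(2 * math.pi * f * t / FS)
              for f, a in harmonics.items())
          for t in range(N)]

# Keep only the bins that carry noticeable energy.
peaks = [(f, round(a, 3)) for f, a in amplitude_spectrum(signal, FS) if a > 0.01]
```

The time-domain signal looks like one irregular wave, yet `peaks` lists exactly the three sinusoids that were added together, which is the decomposition idea the lesson is built around.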
Slide30
Lesson 3: Sound Effects
Concepts covered:
Connections to real-world applications:
  Digital music effects and speech sound effects
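Echo is one of the effects named for the SASE Lab. A single-echo version can be sketched in Python as y[n] = x[n] + gain * x[n - delay]; the delay and gain defaults and the function name below are assumptions, and the lab's MATLAB implementation may differ.

```python
def add_echo(samples, fs, delay_s=0.25, gain=0.5):
    """Mix a delayed, attenuated copy of the signal back into itself.

    The output is longer than the input by the delay so the echo tail
    is not cut off.
    """
    delay = int(round(delay_s * fs))
    out = list(samples) + [0.0] * delay
    for n, s in enumerate(samples):
        out[n + delay] += gain * s
    return out


# A single click followed by silence, at a toy rate of 4 samples per second.
click = [1.0, 0.0, 0.0, 0.0]
echoed = add_echo(click, fs=4, delay_s=0.5, gain=0.5)
```

In `echoed` the click reappears two samples later at half its height. Raising `delay_s` moves the copy later in time and raising `gain` makes it louder, which gives students two parameters to experiment with.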
Slide31
Lesson 3: Sound Effects
Student MATLAB Project
Slide32
Unit Conclusion
Student presentation and report or poster
  Summarize and reflect on lessons
  Ask research questions
  Develop new ideas for applications of speech processing
Slide33
References
Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010.
Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007.
Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm
Slide34
AEGIS Project
AEGIS website: http://research2.fit.edu/aegis-ret/
Lesson plans available for download ?????
Contacts:
  Becky Dowell, dowell.jeanie@brevardschools.org
  Dr. Veton Këpuska, vkepuska@fit.edu
  Jacob Zurasky, jzuraksy@my.fit.edu
Slide35
Thank you!
Questions?