
Speech Processing AEGIS RET All-Hands Meeting - PowerPoint Presentation

Uploaded On 2020-08-29


University of Central Florida July 20 2012 Applications of Images and Signals in High Schools Contributors Dr Veton Këpuska Faculty Mentor FIT vkepuskafitedu Jacob Zurasky ID: 810758



Presentation Transcript

Slide1

Speech Processing

AEGIS RET All-Hands Meeting

University of Central Florida

July 20, 2012

Applications of Images and Signals in High Schools

Slide2

Contributors

Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu

Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu

Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

Slide3

Speech Processing Project

Speech recognition requires speech to first be characterized by a set of “features”.

Features are used to determine what words are spoken.

Our project implements the feature extraction stage of a speech processing application.

Slide4

Timeline

1874: Alexander Graham Bell proves frequency harmonics from electrical signal can be divided

1952: Bell Labs develops first effective speech recognizer

1971-1976: DARPA: speech should be understood, not just recognized

1980s: Call center and text-to-speech products commercially available

1990s: PC processing power allows use of SR software by ordinary users

Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Slide5

Applications

Call center speech recognition

Speech-to-text applications (e.g., dictation software)

Hands-free user interface (e.g., OnStar, XBOX Kinect, Siri)

Science Fiction 1968: Stanley Kubrick’s 2001: A Space Odyssey, http://www.youtube.com/watch?v=6MMmYyIZlC4

Science Fact 2011: Apple iPhone 4S Siri, http://www.apple.com/iphone/features/siri.html

Medical Applications

Parkinson’s Voice Initiative

Detection of Sleep Disorders

Slide6

Difficulties

Continuous speech (word boundaries)

Noise: background, other speakers

Differences in speakers: dialects/accents, male/female

Slide7

Speech Recognition

Speech -> Front End: Pre-processing -> Features -> Back End: Recognition -> Recognized speech

Speech: large amount of data. Ex: 256 samples

Features: reduced data size. Ex: 13 features

Front End: reduce amount of data for back end, but keep enough data to accurately describe the signal. Output is feature vector.

256 samples ------> 13 features

Back End: statistical models used to classify feature vectors as a certain sound in speech

Slide8

Front-End Processing of Speech Recognizer

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech
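The slides do not show the filter itself. As a minimal sketch, a pre-emphasis stage is often written as the first-order difference y[n] = x[n] - alpha * x[n-1]. The coefficient alpha = 0.97 and the function name below are assumptions for illustration, not values taken from the project, and the sketch is in Python rather than the project's MATLAB.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha is an assumed, commonly used value; the slides do not give one.
    """
    if not samples:
        return []
    out = [samples[0]]  # first sample has no predecessor, pass it through
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (0 Hz) input is almost cancelled after the first sample,
# while a sample-to-sample alternating (high-frequency) input is boosted.
constant = pre_emphasis([1.0] * 5)
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0, 1.0])
```

Comparing `constant` with `alternating` shows the intended effect: low-frequency content is attenuated and high-frequency content is emphasized.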

Slide9

Front-End Processing of Speech Recognizer

Pre-emphasis -> Window

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech

Window: Separate speech signal into frames. Apply window to smooth edges of framed speech signal.
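The framing and windowing step could look like the Python sketch below. The frame length of 256 samples follows the example on the Speech Recognition slide; the 50% overlap (hop of 128) and the choice of a Hamming window are assumptions, since the slides name neither.

```python
import math


def frame_signal(samples, frame_len=256, hop=128):
    """Split a signal into overlapping frames (hop size is an assumption)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(n_points):
    """Hamming window: tapers toward 0.08 at both ends, near 1 in the middle."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by the window sample by sample."""
    return [x * w for x, w in zip(frame, window)]


signal = [1.0] * 1024            # stand-in for recorded speech
frames = frame_signal(signal)
windowed = apply_window(frames[0], hamming(256))
```

Tapering the frame edges reduces the artificial discontinuity created by cutting the signal, which would otherwise smear energy across the spectrum in the next stage.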

Slide10

Front-End Processing of Speech Recognizer

Pre-emphasis -> Window -> FFT

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech

Window: Separate speech signal into frames. Apply window to smooth edges of framed speech signal.

FFT: Transform signal from time domain to frequency domain. Human ear perceives sound based on frequency content.
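To illustrate the time-to-frequency step, the Python sketch below computes a direct discrete Fourier transform. An FFT is a faster algorithm that produces the same values. The sample rate, frame size, and test tone are made-up values chosen so the tone falls exactly on a frequency bin.

```python
import cmath
import math


def dft(frame):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(frame)
    spectrum = []
    for k in range(n_points):
        total = 0j
        for n, x in enumerate(frame):
            total += x * cmath.exp(-2j * math.pi * k * n / n_points)
        spectrum.append(total)
    return spectrum


sample_rate = 8000   # Hz (assumed for the example)
n_points = 64        # frame size, so each bin is 8000 / 64 = 125 Hz wide
tone_hz = 1000
frame = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
         for n in range(n_points)]

magnitudes = [abs(c) for c in dft(frame)]
half = magnitudes[: n_points // 2]   # bins above N/2 mirror the lower half
peak_bin = half.index(max(half))
peak_hz = peak_bin * sample_rate / n_points
```

Here the largest magnitude lands in bin 8, which corresponds to 1000 Hz, the frequency of the input tone.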

Slide11

Front-End Processing of Speech Recognizer

Pre-emphasis -> Window -> FFT -> Mel-Scale

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech

Window: Separate speech signal into frames. Apply window to smooth edges of framed speech signal.

FFT: Transform signal from time domain to frequency domain. Human ear perceives sound based on frequency content.

Mel-Scale: Convert linear scale frequency (Hz) to logarithmic scale (mel-scale)
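The slides do not give a conversion formula. One widely used form is mel = 2595 * log10(1 + f / 700), which is assumed in the Python sketch below; the project may have used a different variant.

```python
import math


def hz_to_mel(hz):
    """Convert Hz to mels using an assumed, commonly used formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 500 Hz step covers far more mels at low frequencies than at
# high ones, mirroring the ear's finer resolution for low pitches.
low_step = hz_to_mel(1000) - hz_to_mel(500)
high_step = hz_to_mel(4000) - hz_to_mel(3500)
```

With this formula, 1000 Hz maps to roughly 1000 mels, and `low_step` is about three times `high_step`.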

Slide12

Front-End Processing of Speech Recognizer

Pre-emphasis -> Window -> FFT -> Mel-Scale -> log

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech

Window: Separate speech signal into frames. Apply window to smooth edges of framed speech signal.

FFT: Transform signal from time domain to frequency domain. Human ear perceives sound based on frequency content.

Mel-Scale: Convert linear scale frequency (Hz) to logarithmic scale (mel-scale)

log: Take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
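The "multiplication becomes addition" remark can be checked numerically. The Python sketch below assumes one common reading: the observed spectrum is the product of two components, and the logarithm turns that product into a sum that later stages can pull apart. The names `excitation` and `vocal_tract` and their values are hypothetical and do not come from the slides.

```python
import math

# Hypothetical per-bin magnitudes of two components (illustrative values).
excitation = [4.0, 2.0, 8.0]
vocal_tract = [0.5, 3.0, 0.25]

# In the frequency domain the two components multiply bin by bin.
combined = [e * v for e, v in zip(excitation, vocal_tract)]

# After the log, the same combination is a sum: log(e * v) = log(e) + log(v).
log_combined = [math.log(c) for c in combined]
log_sum = [math.log(e) + math.log(v) for e, v in zip(excitation, vocal_tract)]
```

`log_combined` and `log_sum` agree to within floating-point rounding, which is the property the slide refers to.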

Slide13

Front-End Processing of Speech Recognizer

Pre-emphasis -> Window -> FFT -> Mel-Scale -> log -> IFFT

Pre-emphasis: High-pass filter to compensate for higher-frequency roll-off in human speech

Window: Separate speech signal into frames. Apply window to smooth edges of framed speech signal.

FFT: Transform signal from time domain to frequency domain. Human ear perceives sound based on frequency content.

Mel-Scale: Convert linear scale frequency (Hz) to logarithmic scale (mel-scale)

log: Take the log of the magnitudes (multiplication becomes addition) to allow separation of signals

IFFT: Inverse of FFT to transform to Cepstral Domain. The result is the set of “features”.
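The last three stages can be sketched end to end in Python as a plain real cepstrum: transform, take the log of the magnitudes, then inverse transform and keep the first 13 values. This is a simplified illustration, not the project's code: it skips the mel-scale stage, uses a direct DFT in place of an FFT, and the `floor` guard and test frame are assumptions. Many MFCC implementations use a discrete cosine transform at the final step instead.

```python
import cmath
import math


def dft(values, inverse=False):
    """Direct DFT, or inverse DFT (with 1/N scaling) when inverse=True."""
    n_points = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_points):
        total = 0j
        for n, x in enumerate(values):
            total += x * cmath.exp(sign * 2j * math.pi * k * n / n_points)
        out.append(total / n_points if inverse else total)
    return out


def cepstral_features(frame, num_features=13, floor=1e-10):
    """Real cepstrum of one frame, truncated to num_features values."""
    magnitudes = [abs(c) for c in dft(frame)]
    # floor avoids log(0) for empty bins (an assumed safeguard)
    log_mags = [math.log(max(m, floor)) for m in magnitudes]
    cepstrum = dft(log_mags, inverse=True)
    return [c.real for c in cepstrum[:num_features]]


# 256-sample test frame: a decaying sinusoid standing in for speech.
frame = [math.exp(-n / 64) * math.sin(2 * math.pi * 10 * n / 256)
         for n in range(256)]
features = cepstral_features(frame)
```

This reproduces the data reduction described earlier in the deck: a 256-sample frame goes in and a 13-element feature vector comes out.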

Slide14

Speech Analysis and Sound Effects (SASE) Project

Graphical User Interface (GUI)

Speech input

Record and save audio

Read sound file (*.wav, *.ulaw, *.au)

Graphs the entire audio signal

Process user-selected speech frame and display output for each stage of processing

Displays spectrogram

Apply audio effects

Slide15

MATLAB Code

Graphical User Interface (GUI)

GUIDE (GUI Development Environment)

Callback functions

Front-end speech processing

Modular functions for reusability

Graphs display output for each stage

Sound Effects

Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer
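As an illustration of the simplest effect in the list above, an echo can be modeled as the original signal plus one delayed, attenuated copy. The Python sketch below is an assumption about how such an effect might be written. It is not the SASE Lab MATLAB code, and the parameter names and defaults are made up.

```python
def add_echo(samples, sample_rate, delay_s=0.25, decay=0.5):
    """Mix in a single delayed copy: y[n] = x[n] + decay * x[n - delay]."""
    delay = int(delay_s * sample_rate)       # delay expressed in samples
    out = list(samples) + [0.0] * delay      # room for the echo tail
    for n, x in enumerate(samples):
        out[n + delay] += decay * x
    return out


# An impulse makes the effect easy to see: the original click at index 0
# is followed by a half-amplitude copy four samples later.
impulse = [1.0] + [0.0] * 7
echoed = add_echo(impulse, sample_rate=8, delay_s=0.5, decay=0.5)
```

The delay time and decay factor are the kind of user-configurable parameters the demo slide mentions.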

Slide16

Slide17

GUI Components

Slide18

GUI Components

Plotting Axes

Slide19

GUI Components

Plotting Axes

Buttons

Slide20

SASE Lab Demo

Record, play, save audio to file, open existing audio files

Select and process speech frame, display graphs of stages of front-end processing

Display spectrogram for entire speech signal or user-selectable 3-second sample

Play speech: all or selected 3-second sample

Show differences in certain sounds in the spectrogram and the features (e.g., “a e i o u”) so the audience understands how these graphs tell us about the sounds

Apply sound effects, show user-configurable parameters

Graph spectrogram and speech processing on sound effects

Show echo effect in spectrogram

Use as teaching tool

Slide21

Slide22

Future Work on SASE Lab

Audio Effects

Ex: Pitch removal

Noise Filtering

Slide23

Applications of Signal Processing in High Schools

Convey the relevance and importance of math to high school students

Bring knowledge of engineering, technological innovation, and academic research into high school classrooms

Opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing

Encourage students to pursue higher education and careers in STEM fields

Slide24

Unit Plan: Speech Processing

Collection of lesson plans to introduce high school students to fundamentals of speech and sound processing

Connections to Pre-Calculus mathematics standards (NGSSS and Common Core)

Mathematical Modeling

Trigonometric Functions

Complex Numbers in Rectangular and Polar Form

Function Operations

Logarithmic Functions

Sequences and Series

Matrices

Hands-on lessons involving MATLAB projects

Teacher notes

Slide25

Unit Introduction

Students research, explore, and discuss current applications of speech and audio processing

Slide26

Lesson 1: The Sound of a Sine Wave

Modeling sound as a sinusoidal function

Concepts covered:

Continuous vs. Discrete Functions

Frequency of Sine Wave

Composite signals

Connections to real-world applications:

Synthesis of digital speech and music

Slide27

Lesson 1: The Sound of a Sine Wave

Student MATLAB Project

Create discrete sine waves with given frequencies

Create composite signal of the sine waves

Plot graphs and play sounds of the sine waves

Analyze the effect of frequency on the graphs and the sounds of the sine functions

Project Extensions

Play songs using sine waves

Synthesize vowel sounds with sine waves
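The student project itself is in MATLAB and is not reproduced in the slides. As a rough Python sketch of the same steps, the code below samples two sine waves, averages them into a composite signal, and counts upward zero crossings as a simple way to compare frequencies. The sample rate, the 440 Hz and 880 Hz tones, and all function names are assumptions for illustration.

```python
import math


def sine_wave(freq_hz, duration_s, sample_rate=8000, amplitude=1.0):
    """Return samples of amplitude * sin(2*pi*f*t) taken at t = n / sample_rate."""
    n_samples = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def mix(*signals):
    """Average equal-length signals sample by sample into a composite."""
    return [sum(values) / len(signals) for values in zip(*signals)]


def upward_zero_crossings(samples):
    """Count negative-to-non-negative transitions, a rough proxy for cycles."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)


low = sine_wave(440, 0.5)    # half a second at 440 Hz
high = sine_wave(880, 0.5)   # same duration, double the frequency
composite = mix(low, high)
```

Doubling the frequency roughly doubles the crossing count over the same duration, which is the relationship students would hear as a higher pitch. To listen to the result, the samples could be scaled to 16-bit integers and written with Python's standard `wave` module.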

Slide28

Lesson 2: Frequency Analysis

Use of the Fourier transform to convert functions from the time domain to the frequency domain

Concepts covered:

Modeling harmonic signals as a series of sinusoids

Sine wave decomposition

Fourier Transform

Euler’s Formula

Frequency spectrum

Connections to real-world applications:

Speech processing and recognition

Slide29

Lesson 2: Frequency Analysis

Student MATLAB Project

Create a composite signal with the sum of harmonic sine waves

Plot graphs and play sounds of the sine waves

Compute the FFT of the composite signal

Plot and analyze the frequency spectrum
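A Python sketch of the same analysis is given below, under stated assumptions: a 100 Hz fundamental with two harmonics, amplitudes of 1, 0.5, and 0.25, and a sample rate and length chosen so each harmonic falls exactly on a 10 Hz bin. It uses a direct DFT, which gives the same values an FFT would. None of these numbers come from the lesson plan.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Single-sided amplitude spectrum, scaled so a sine of amplitude A reads A."""
    n_points = len(samples)
    mags = []
    for k in range(n_points // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n, x in enumerate(samples))
        mags.append(2 * abs(total) / n_points)
    return mags


sample_rate = 1600           # Hz (assumed)
n_points = 160               # gives 1600 / 160 = 10 Hz per bin
fundamental = 100            # Hz (assumed)
amplitudes = [1.0, 0.5, 0.25]   # fundamental, 2nd and 3rd harmonic

signal = [sum(a * math.sin(2 * math.pi * fundamental * (h + 1) * n / sample_rate)
              for h, a in enumerate(amplitudes))
          for n in range(n_points)]

mags = dft_magnitudes(signal)
bin_hz = sample_rate / n_points
peaks = [(k * bin_hz, round(m, 3)) for k, m in enumerate(mags) if m > 0.1]
```

The spectrum recovers the three ingredients of the composite: `peaks` holds entries at 100, 200, and 300 Hz with amplitudes 1.0, 0.5, and 0.25.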

Slide30

Lesson 3: Sound Effects

Concepts covered:

Connections to real-world applications:

Digital music effects and speech sound effects

Slide31

Lesson 3: Sound Effects

Student MATLAB Project

Slide32

Unit Conclusion

Student presentation and report or poster

Summarize and reflect on lessons

Ask research questions

Develop new ideas for applications of speech processing

Slide33

References

Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.

Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010.

Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007.

Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Slide34

AEGIS Project

AEGIS website: http://research2.fit.edu/aegis-ret/

Lesson plans available for download

Contacts:

Becky Dowell, dowell.jeanie@brevardschools.org

Dr. Veton Këpuska, vkepuska@fit.edu

Jacob Zurasky, jzuraksy@my.fit.edu

Slide35

Thank you!

Questions?