
Predicting Voice Elicited Emotions

Nishant Pandey

Synopsis

Problem statement and motivation

Previous work and background

System intuition and overview

Pre-processing of audio signals

Building feature space

Finding patterns in unlabelled data and labelling of samples

Regression Results

Deployed System

Market Research

Motivation

Automate the screening process in service-based industries

Hourly job workers make up two-thirds of the U.S. labour force (~50 million job seekers every year)

Problem Statement

Analyse voice and predict the listener emotions elicited by the paralinguistic elements of the voice.

Previous work

Current work focuses on predicting the emotions elicited by voice clips.

There are two sets of goals, which include recognizing:

the personality traits intrinsically possessed by the speaker, e.g. speaker trait and speaker state

the types of emotions carried within the speech clip, e.g. acoustic affect (cheerful, trustworthy, deceitful, etc.)

Background – Emotion Taxonomy

The framework articulated by “FEELTRACE” includes all the emotion responses we want to predict.

It describes emotions by finite, quantifiable dimensions.

Features – Paralinguistic Features of Voices

Concept: Amplitude
Definition: measurement of the variations over time of the acoustic signal
Data representation: quantified values of a sound wave’s oscillation

Concept: Energy
Definition: acoustic signal energy representation in decibels
Data representation: 20*log10(abs(FFT))

Concept: Formants
Definition: the resonance frequencies of the vocal tract
Data representation: maxima detected using linear prediction on audio windows with high tonal content

Concept: Perceived pitch
Definition: perceived fundamental frequency and harmonics
Data representation: formants

Concept: Fundamental frequency
Definition: the reciprocal of the time duration of one glottal cycle (a strict definition of “pitch”)
Data representation: first formant
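The two computed representations in the table can be sketched in a few lines. This is a minimal illustration assuming a mono clip loaded with librosa; the file name, frame length, and LPC order are illustrative assumptions, not values from the deck.

```python
# Sketch: energy in dB (20*log10(abs(FFT))) and formants via linear
# prediction, per the table above. Frame size and LPC order are assumptions.
import numpy as np
import librosa

def energy_db(frame):
    """Energy representation from the table: 20*log10(abs(FFT))."""
    spectrum = np.abs(np.fft.rfft(frame))
    return 20.0 * np.log10(spectrum + 1e-12)  # epsilon avoids log(0)

def formants_lpc(frame, sr, order=12):
    """Formants as maxima of the LPC spectral envelope: roots of the
    LPC polynomial mapped to frequencies in Hz."""
    a = librosa.lpc(frame.astype(float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # one root per conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))

y, sr = librosa.load("clip.wav", sr=None, mono=True)  # hypothetical clip
frame = y[:2048]
print(energy_db(frame)[:5], formants_lpc(frame, sr)[:3])
```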

System – Intuition

Spectrogram of two job applicants responding to “Greet me as if I am a customer”

System – Overview

System – Pre-Processing of Audio Signals

Pre-processing tasks involve:

Removing voice clips shorter than 2 seconds or containing noise

Converting the audio signal to data in the time and frequency domains:

Short-term Fast Fourier Transform per frame

Energy measures in the frequency domain per frame

Linear prediction coefficients in the frequency domain per frame

A sketch of these per-frame steps follows below.
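A minimal sketch of the pipeline just listed. The 2-second screen comes from the slide; the file name, frame/hop sizes, and LPC order are assumptions, and the noise screen is not shown.

```python
# Sketch of the pre-processing steps above. clip.wav, frame/hop sizes,
# and the LPC order are assumptions; the <2 s screen is from the slide.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=None, mono=True)
if len(y) / sr < 2.0:
    raise ValueError("clip shorter than 2 seconds: discarded")

# Short-term FFT per frame (magnitude spectrogram).
stft = np.abs(librosa.stft(y, n_fft=512, hop_length=256))

# Energy measure in the frequency domain, per frame, in dB.
energy_db = 20.0 * np.log10(stft + 1e-12)

# Linear prediction coefficients per frame.
frames = librosa.util.frame(y, frame_length=512, hop_length=256)
lpc = np.stack([librosa.lpc(frames[:, i].astype(float), order=12)
                for i in range(frames.shape[1])])
print(stft.shape, energy_db.shape, lpc.shape)
```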

System – Feature Space Construction

We experimented with feature construction based on the following dimensions and their combinations (a sketch follows the list):

Signal measurements such as energy and amplitude

Statistics such as min, max, mean, and standard deviation on signal measurements

Measurement window in the time domain: different time sizes and the entire time window

Measurement window in the frequency domain: all frequencies, optimal audible frequencies, and selected frequency ranges
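One way to realize the statistics-over-frequency-windows idea, as a hedged sketch: summary statistics of spectral energy per frequency band. The band edges and frame size are hypothetical choices, not the deck’s.

```python
# Sketch: min/max/mean/std of spectral energy over selected frequency
# bands. Band edges and nperseg are hypothetical.
import numpy as np
from scipy.signal import stft

def band_features(y, sr, bands=((0, 300), (300, 3400), (3400, 8000))):
    """Summary statistics of spectral energy, per frequency band."""
    freqs, _, Z = stft(y, fs=sr, nperseg=512)
    power = np.abs(Z)
    feats = []
    for lo, hi in bands:
        band = power[(freqs >= lo) & (freqs < hi), :]
        feats += [band.min(), band.max(), band.mean(), band.std()]
    return np.array(feats)

sr = 16000
y = np.random.default_rng(0).standard_normal(sr)  # placeholder signal
print(band_features(y, sr).shape)  # 3 bands x 4 statistics = 12 features
```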

System – Labels and the Right Set of Features?

Conventional approach: getting voice samples rated by experts

Unsupervised learning: analyse features and their effectiveness

Process:

Unsupervised learning is used to find patterns in unlabelled data.

Training data sets are then constructed based on the clustering results and manual labelling.

System – How do we get the labels? (contd.)

Parameters

Cost function:

Connectivity

Dunn index

Silhouette

Clustering results

Technique: hierarchical clustering

Number of clusters: 5

Manual validation of the clusters was also done. A clustering sketch follows below.
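A sketch of this labelling step: hierarchical clustering cut into 5 clusters and scored with one of the listed cost functions (silhouette). The feature matrix here is a random placeholder for the real feature space.

```python
# Hierarchical clustering into 5 clusters, scored by silhouette.
# X is a placeholder for the constructed feature space.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).standard_normal((200, 12))

Z = linkage(X, method="ward")                    # agglomerative hierarchy
labels = fcluster(Z, t=5, criterion="maxclust")  # cut into 5 clusters
print("silhouette:", round(silhouette_score(X, labels), 3))
```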

System – Visualization of clusters

System – Modelling

Supervised learning algorithms:

Logistic Regression

Support Vector Machine

Random Forest

Semi-supervised learning algorithm:

KODAMA

Output:

Binary outcome (positive or negative)

Numerical scores

A model-comparison sketch follows below.
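A hedged sketch comparing the three supervised learners; KODAMA (an R package) is omitted here. Features and labels are placeholders for the clustered, manually labelled data.

```python
# Cross-validated comparison of the listed supervised learners.
# X and y are placeholders for the labelled feature space.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
y = rng.integers(0, 2, 200)  # binary outcome: positive vs negative

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("svm", SVC()),
                    ("random forest", RandomForestClassifier(n_estimators=200))]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```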

Case Study – Modelling

Prediction: positive vs negative response

A positive response could be one or multiple perceptions of a “pleasant voice”, “makes me feel good”, “cares about me”, “makes me feel comfortable”, or “makes me feel engaged”.

System V1 uses SVM; V2 uses Random Forest.

Interview prompt: “Greet me as if I am a customer”

System – Prediction Results

Accuracy: 0.86

95% CI: (0.76, 0.92)

P-value [Acc > NIR]: 5.76e-07

Sensitivity: 0.81

Specificity: 0.88

Pos Pred Value: 0.81

Neg Pred Value: 0.88

These metrics all derive from a 2x2 confusion matrix; see the sketch below.
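The counts below are hypothetical, chosen only to show how each reported metric is computed from a confusion matrix; they do not reproduce the study’s data.

```python
# Metric definitions behind the reported numbers. Counts are hypothetical.
def binary_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "accuracy":    (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "pos pred":    tp / (tp + fp),  # positive predictive value
        "neg pred":    tn / (tn + fn),  # negative predictive value
    }

print(binary_metrics(tp=25, fp=6, fn=6, tn=44))
```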

System – Prediction Results (KODAMA)

KODAMA performs feature extraction from noisy and high-dimensional data.

The output of KODAMA includes a dissimilarity matrix from which we can perform clustering and classification, as illustrated below.
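KODAMA itself is an R package, so the sketch below only illustrates the downstream step named on the slide: clustering from a precomputed dissimilarity matrix. The matrix here is a random symmetric placeholder, not KODAMA output.

```python
# Clustering from a precomputed dissimilarity matrix (as produced by
# KODAMA). D is a placeholder; the `metric` parameter of
# AgglomerativeClustering requires scikit-learn >= 1.2.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
D = rng.random((50, 50))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)  # symmetric, zero diagonal

clust = AgglomerativeClustering(n_clusters=5, metric="precomputed",
                                linkage="average")
print(clust.fit_predict(D)[:10])
```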

Deployed System

Market Research

Demographics matter:

Young listeners (18-29 years old) and listeners earning less than $29,000/year have stricter criteria for what they perceive as engaging.

No correlation between the emotion elicited and age, ethnicity, or education level.

Bias towards female voices.

Thanks

Time and Frequency Domain

Time domain:
https://en.wikipedia.org/wiki/Time_domain#/media/File:Fourier_transform_time_and_frequency_domains_(small).gif

Frequency domain:
https://en.wikipedia.org/wiki/Frequency_domain#/media/File:Fourier_transform_time_and_frequency_domains_(small).gif

Learnings – Difference in Voice Characteristics

Results improve by 10% when a decision tree built on voice-characteristic features is layered on top of the Random Forest. A stacking sketch follows below.
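The slide does not spell out the layering, so this is one plausible reading, labelled as an assumption: a decision tree trained on voice-characteristic features plus the Random Forest’s predicted score. All data and feature names are placeholders.

```python
# One possible stacking of a decision tree over the Random Forest.
# The voice-characteristic features and all data are placeholders;
# in practice the tree should be fit on held-out RF predictions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_spectral = rng.standard_normal((200, 12))  # spectral feature space
X_voice = rng.standard_normal((200, 3))      # e.g. pitch, energy, rate (assumed)
y = rng.integers(0, 2, 200)

rf = RandomForestClassifier(n_estimators=200).fit(X_spectral, y)
rf_score = rf.predict_proba(X_spectral)[:, 1:]  # base-model score

stacked = np.hstack([X_voice, rf_score])        # layer inputs
tree = DecisionTreeClassifier(max_depth=3).fit(stacked, y)
print(round(tree.score(stacked, y), 3))
```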

Prediction Results – SVM vs Random Forest