TOWARDS PREDICTING PHYSIOLOGY FROM SPEECH DURING STRESSFUL CONVERSATIONS: HEART RATE AND RESPIRATORY SINUS ARRHYTHMIA


Uploaded On 2020-06-13





Presentation Transcript

Slide1

TOWARDS PREDICTING PHYSIOLOGY FROM SPEECH DURING STRESSFUL CONVERSATIONS: HEART RATE AND RESPIRATORY SINUS ARRHYTHMIA

Arindam Jati¹, Paula G. Williams², Brian Baucom², Panayiotis Georgiou¹

¹ University of Southern California, Department of Electrical Engineering, CA, USA
² The University of Utah, Department of Psychology, UT, USA

Slide2

Outline of the talk

Relationship of stress, physiology and speech
Prediction of physiology from speech
Datasets
Methodology
Experiments and results
Summary


Slide4

Relationship of stress, physiology and speech

Effects of mental stress on physiology:
Excessive stress can lead to physiological, psychological, and psychosomatic health conditions such as anxiety and depression
Significant changes in Heart Rate (HR) and HR Variability (HRV) due to mental stress; these signals can be used to measure stress levels (Taelman et al.)
Occurrence of hyperventilation or over-breathing due to stress
Respiratory Sinus Arrhythmia (RSA) as a technique to assess stress
RSA: the periodic alteration of heart rate in association with the phase of respiration (Grossman)

Joachim Taelman et al., “Influence of mental stress on heart rate and heart rate variability,” in 4th European Conference of the International Federation for Medical and Biological Engineering, Springer, 2009.
Paul Grossman, “Respiration, stress, and cardiovascular function,” Psychophysiology, vol. 20, no. 3, pp. 284–300, 1983.
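To make the HR and RSA definitions above concrete, here is a minimal Python sketch that computes mean HR from R-peak times and one common RSA estimate, the peak-to-valley method (longest RR interval during exhalation minus shortest RR interval during inhalation, averaged over breaths). The slides do not say how RSA was computed in this study, so this is an illustrative assumption, not the authors' method. The function names and the input format (R-peak times and breath-phase onsets, all in seconds) are hypothetical, and the RSA output is in milliseconds, so it is not on the same scale as the RSA values reported later.

```python
import numpy as np


def mean_heart_rate(r_peak_times_s):
    """Mean HR in beats per minute from R-peak times given in seconds."""
    rr = np.diff(np.asarray(r_peak_times_s, dtype=float))  # RR intervals (s)
    return float(np.mean(60.0 / rr))


def rsa_peak_to_valley(r_peak_times_s, inhale_onsets_s, exhale_onsets_s):
    """Peak-to-valley RSA in ms (one common method, assumed here).

    Breath i is taken to be inhalation [inhale_i, exhale_i) followed by
    exhalation [exhale_i, inhale_{i+1}); the last breath, which has no
    following inhale onset, is skipped.
    """
    t = np.asarray(r_peak_times_s, dtype=float)
    rr_ms = np.diff(t) * 1000.0
    rr_t = t[1:]  # time-stamp each RR interval by the beat that ends it
    diffs = []
    for inh, exh, nxt in zip(inhale_onsets_s, exhale_onsets_s, inhale_onsets_s[1:]):
        insp = rr_ms[(rr_t >= inh) & (rr_t < exh)]
        expi = rr_ms[(rr_t >= exh) & (rr_t < nxt)]
        if insp.size and expi.size:
            # HR speeds up on inhalation (short RR) and slows on exhalation.
            diffs.append(expi.max() - insp.min())
    return float(np.mean(diffs)) if diffs else float("nan")
```

For example, R-peaks 0.8 s apart during each inhalation and 1.0 s apart during each exhalation give an RSA estimate of about 200 ms per breath.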

Slide5

Relationship of stress, physiology and speech

Stress detection from physiology:
Use of Electrodermal Activity (EDA), HR and HRV to detect stress
Problem: intrusive, and in some cases invasive, methods are needed to acquire the physiological signals

Stress detection from speech:
A substantial body of work uses the SUSAS dataset (Zhou et al.)
The nonlinear Teager Energy Operator (TEO) feature appeared to be useful
Benefit: non-intrusive
Multimodal detection of stress (using both speech and galvanic skin response)

This work:
Predicts physiological signals indicative of stress directly from speech

Guojun Zhou, John H. L. Hansen, and James F. Kaiser, “Nonlinear feature based classification of speech under stress,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 201–216, 2001.
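The TEO feature mentioned above is built on Kaiser's discrete Teager Energy Operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below implements only this operator with NumPy; the feature extraction of Zhou et al. involves further processing that is not shown here, and the function name is hypothetical. For a pure tone A·cos(Ωn + φ) the output is the constant A²·sin²(Ω), which makes a convenient sanity check.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than x (no edge padding).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Sanity check on a pure tone: amplitude 2, digital frequency 0.3 rad/sample.
n = np.arange(100)
psi = teager_energy(2.0 * np.cos(0.3 * n + 0.1))
expected = 4.0 * np.sin(0.3) ** 2  # A^2 * sin^2(Omega), constant over n
```

Because the operator responds to both amplitude and frequency, it is sensitive to changes in vocal production that a plain energy measure would miss.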


Slide7

Prediction of physiology from speech

Effect of a psychological variable (mental stress) on two modalities:
Mental stress → physiology: well studied
Mental stress → speech: well studied
Physiology ↔ speech: not well explored

Goal:
Explore the relationship between physiology and speech (by studying correlations)
Predict physiology from speech during stressful conversations

Insights to learn from this study:
How a psychological variable (the cause of stress) can lead to both physiological and vocal activations, and how these two are related

The insights can help develop future applications, such as:
Multimodal stress detection systems
Higher-resolution quantitative metrics for the intensity of stress

Slide8

Prediction of physiology from speech

Some previous work on predicting physiology from speech:
HR from the pronunciation of vowels (Skopin et al.)
Schuller et al.: vowel pronunciation and reading a sentence aloud, with and without physical load
A recent study (Tsiartas et al.) classified the direction of HR change during conversation with an artificial dialog system
Their newer study (Smith et al.): regression analysis to predict HR from audio

D. Skopin and S. Baglikov, “Heartbeat feature extraction from vowel speech signal using 2D spectrum representation,” in Proc. of the 4th International Conference on Information Technology (ICIT), Amman, Jordan, 2009.
Bjorn Schuller, Felix Friedmann, and Florian Eyben, “Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013, pp. 7219–7223.
Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, and Adrian Willoughby, “Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
Jennifer Smith, Andreas Tsiartas, Elizabeth Shriberg, Andreas Kathol, Adrian Willoughby, and Massimiliano de Zambotti, “Analysis and prediction of heart rate using speech features from natural speech,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 989–993.

Slide9

What’s new?

Study: investigates the relationship between acoustics and physiology (HR and RSA)
Analyzes correlation and regression performance
Analyzes stressful conversations between humans in two separate datasets

Main novelties and importance of this work:
To our knowledge, the first attempt to explore the relationship between the two modalities for real conversations between humans
To our knowledge, the first study to predict RSA from speech
Two very distinct datasets; addresses issues related to the robustness of the acoustic features across different domains
Provides regression analysis on the actual value of the physiology; opens the possibility of building an audio-based, automated, real-time stress or physiology monitoring system


Slide11

Datasets

Stress Induction (SI) dataset vs. Couples’ Interaction (CI) dataset

Stressor (cause of stress):
  SI dataset: “Re-experienced”; young adults re-experienced their top stressors in an interview (Williams et al.)
  CI dataset: new stressors; married couples discuss two current, serious relationship problems, one chosen by each partner
Session duration:
  SI dataset: 8–12 minutes
  CI dataset: 10 minutes (per problem)
Physiology:
  SI dataset: mean HR and RSA of the participant over the entire session
  CI dataset: mean HR and RSA of both husband and wife over the entire session
Number of sessions: SI 54; CI 226
Number of males: SI 29
Number of females: SI 25
Total duration: SI 6.20 hours; CI 22.30 hours

† Number of unique males/females = 60

For both datasets: multiple baseline physiology measurements (resting HR and RSA) for every participant

Paula G. Williams et al., “The effects of poor sleep on cognitive, affective, and physiological responses to a laboratory stressor,” Annals of Behavioral Medicine, vol. 46, no. 1, pp. 40–51, 2013.



Slide13

Methodology

Processing pipeline for a couples’ interaction from the CI dataset:
speech → denoising → voice activity detection (VAD) → diarization → gender detection → female’s speech and male’s speech → session-level acoustic features for each partner, paired with that partner’s average physiology over the session

For the female:
Task 1: Analyze Pearson’s correlations between the individual acoustic features (i.e., elements of the feature vector) and the female’s average physiology.
Task 2: Predict the female’s average physiology from the acoustic feature vector using a nonlinear regression model.
Similarly, for the male, perform the same analysis between his session-level acoustic features and his average physiology.
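Task 1 can be sketched as follows, assuming the session-level features for one gender group are stacked into an (n_sessions × n_features) array and the matching average HR or RSA values into a vector. The helper name, the skipping of constant features, and the rule "keep the significant feature with the largest absolute r" are illustrative assumptions; SciPy's `pearsonr` supplies r and the two-sided p-value.

```python
import numpy as np
from scipy.stats import pearsonr


def best_correlated_feature(features, physiology, feature_names, alpha=0.05):
    """Return (name, r, p) for the feature with the largest |r| among those
    with p < alpha, or None if no feature is significant.

    features:   (n_sessions, n_features) session-level acoustic features
    physiology: (n_sessions,) average HR or RSA per session (one gender group)
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(physiology, dtype=float)
    best = None
    for j, name in enumerate(feature_names):
        if np.std(X[:, j]) == 0:
            continue  # Pearson's r is undefined for a constant feature
        r, p = pearsonr(X[:, j], y)
        if p < alpha and (best is None or abs(r) > abs(best[1])):
            best = (name, float(r), float(p))
    return best
```

The same function would be called once per gender group and per physiological variable, matching the gender-specific analysis described above.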

 

Slide14

Methodology (contd.)

Session-level acoustic features:
88-dimensional eGeMAPS features (Eyben et al.) computed over the whole session using the openSMILE toolkit
SI dataset: from the participant’s speech over the entire session
CI dataset: separately from the husband’s and the wife’s speech, analyzed separately
Some examples from the eGeMAPS feature set (statistical functionals of several of them are also included):
Frequency-related parameters: pitch, jitter, and formant frequencies
Energy/amplitude-related parameters: shimmer, loudness, and harmonics-to-noise ratio
Spectral balance parameters: alpha ratio, Hammarberg index, and harmonic differences
Temporal features: rate of loudness peaks, and mean length of voiced regions
Cepstral features: MFCC and spectral flux

Florian Eyben, Klaus R. Scherer, Bjorn W. Schuller, Johan Sundberg, Elisabeth Andre, Carlos Busso, Laurence Y. Devillers, Julien Epps, Petri Laukka, Shrikanth S. Narayanan, et al., “The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing,” IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190–202, 2016.
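Several of the best-correlated features reported in the results are statistical functionals (mean, coefficient of variation, percentiles, percentile range) computed over a frame-level descriptor such as loudness. The sketch below computes these functionals with NumPy on an already extracted per-frame contour; it does not call openSMILE, the dictionary keys are hypothetical names, and openSMILE's exact definitions (e.g., percentile interpolation) may differ.

```python
import numpy as np


def functionals(lld):
    """A few eGeMAPS-style statistical functionals over a frame-level
    low-level descriptor (e.g., a loudness contour, one value per frame).
    Assumes the descriptor has a nonzero mean (true for loudness)."""
    v = np.asarray(lld, dtype=float)
    mean = v.mean()
    p20, p50, p80 = np.percentile(v, [20, 50, 80])  # linear interpolation
    return {
        "mean": float(mean),
        "coef_of_variation": float(v.std() / mean),  # std normalized by mean
        "p20": float(p20),
        "p50": float(p50),
        "p20_to_p80_range": float(p80 - p20),
    }
```

Functionals like these summarize a whole session into fixed-length numbers, which is what allows each session to be represented by a single 88-dimensional vector regardless of its duration.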

Slide15

Methodology (contd.)

Gender-specific models: experimentally found to be more useful, as illustrated in the t-SNE plot
Regression model and parameters:
Estimating raw and normalized values of RSA and HR from acoustic features
Root Mean Squared Error (RMSE) loss
AdaBoost regressor with a decision tree regressor as the base estimator
5-fold stratified Cross Validation (CV), with no session overlap between train, test, and dev sets
3-fold CV on the train+dev set for model selection, based on minimum dev-set RMSE
The whole 5-fold CV process is repeated 5 times to get a better estimate of the test error

Figure: t-SNE plot of the acoustic features on both datasets
Clear clusters form for males and for females across both datasets, possibly because of fundamental differences between men and women in some acoustic features (for example, pitch)
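The evaluation protocol above (an outer 5-fold CV repeated 5 times, with an inner 3-fold CV choosing hyperparameters by minimum RMSE) can be sketched with scikit-learn's nested cross-validation tools. This is an illustration under stated assumptions, not the authors' code: the hyperparameter grid and tree depth are hypothetical, and plain `RepeatedKFold` is used because the slides do not say which variable the stratification was based on. With one row per session, no session can appear in both a training and a test fold. The function would be called once per gender group.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor


def repeated_cv_rmse(X, y, n_splits=5, n_repeats=5, seed=0):
    """Nested CV estimate of test RMSE for AdaBoost with tree base estimators.

    Returns (mean RMSE over all outer folds, array of per-fold RMSEs).
    """
    # Base estimator passed positionally: the keyword is `estimator` in
    # recent scikit-learn releases and `base_estimator` in older ones.
    model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3), random_state=seed)
    # Inner loop: 3-fold CV on each outer training split; the hypothetical
    # grid below is searched for the setting with the lowest RMSE.
    inner = GridSearchCV(
        model,
        param_grid={"n_estimators": [10, 25], "learning_rate": [0.5, 1.0]},
        scoring="neg_root_mean_squared_error",
        cv=3,
    )
    # Outer loop: K-fold CV repeated n_repeats times with different shuffles.
    outer = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    # scikit-learn scorers are "higher is better", so RMSE comes back negated.
    fold_rmse = -cross_val_score(
        inner, X, y, scoring="neg_root_mean_squared_error", cv=outer
    )
    return float(fold_rmse.mean()), fold_rmse
```

Nesting the search inside each outer fold keeps the test folds out of model selection, so the averaged RMSE is not optimistically biased by the hyperparameter choice.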


Slide17

Experiments and results

Analysis of the two datasets separately: Pearson’s correlations (all statistically significant, i.e., p < 0.05) between each physiological variable (HR or RSA) and its best-correlated feature

Male:
  SI dataset, RSA: −0.40 (mean falling slope of loudness)
  SI dataset, HR: −0.53 (mean bandwidth of 3rd formant)
  CI dataset, RSA: −0.40 (coefficient of variation of bandwidth of 1st formant)
  CI dataset, HR: 0.36 (coefficient of variation of bandwidth of 1st formant)
Female:
  SI dataset, RSA: −0.42 (20th percentile of loudness)
  SI dataset, HR: 0.55 (range of 20th to 80th percentile of loudness)
  CI dataset, RSA: 0.40 (voiced segments per second)
  CI dataset, HR: 0.35 (mean harmonics-to-noise ratio)

Observation: different features dominate across datasets for the same physiological variable in the same gender group
Possible reason: the difference in the tasks the speakers perform in the two datasets (re-experienced stressors in the SI dataset vs. new stressors in the CI dataset)
Example: different levels of emotional or vocal arousal in the two tasks

Slide18

Experiments and results (contd.)

Combining the two datasets: Pearson’s correlations (all statistically significant) between raw or normalized physiological variables (HR or RSA) and the best-correlated feature

Male:
  Raw RSA: 0.22 (mean bandwidth of 3rd formant)
  Raw HR: −0.24 (coefficient of variation of shimmer)
  Normalized RSA: 0.33 (coefficient of variation of bandwidth of 3rd formant)
  Normalized HR: 0.43 (mean alpha ratio)
Female:
  Raw RSA: −0.22 (mean length of voiced segments in seconds)
  Raw HR: −0.20 (mean of shimmer)
  Normalized RSA: 0.19 (standard deviation of falling slope of pitch)
  Normalized HR: 0.35 (50th percentile of loudness)

Observation:
Raw: the absolute correlation values drop by a large margin compared with those obtained separately on the two datasets
Possible reason: the physiological signals have different distributions in the two datasets because of the inherent difference between the tasks the participants perform
Normalized physiology (baseline physiology subtracted): the correlations increase in most cases, except for female RSA
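The normalization above subtracts each participant's resting baseline from the session-level value. The slides say every participant has multiple baseline measurements but not how they were combined, so the sketch below assumes their mean; the function name and input layout are hypothetical.

```python
import numpy as np


def normalize_physiology(sessions, baselines):
    """Subtract each participant's resting baseline from the session value.

    sessions:  list of (participant_id, session_value), e.g. mean HR in bpm
    baselines: dict participant_id -> list of resting measurements
    The baseline is assumed to be the mean of the resting measurements.
    """
    return np.array([value - np.mean(baselines[pid]) for pid, value in sessions])


normalized = normalize_physiology(
    [("p1", 80.0), ("p2", 65.0)],
    {"p1": [70.0, 72.0, 74.0], "p2": [66.0, 68.0]},
)
```

Here participant p1's session HR is 8 bpm above their resting level and p2's is 2 bpm below it. Expressing each value relative to the speaker's own baseline removes between-person offsets, which is one plausible reason the normalized targets correlate better across the pooled datasets.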

Slide19

Experiments and results (contd.)

Analysis of the two datasets separately (contd.): RMSE for regressing raw physiological variables (HR or RSA)

Gender   RSA (SI)   HR (SI)   RSA (CI)   HR (CI)
Male     1.81       12.61     1.14       8.99
Female   2.01       11.28     1.22       11.94

Notes:
The ranges of raw values (minimum, maximum) over both datasets are (2.04, 8.89) for RSA and (50.0, 112.08) for HR
Regression results on the SI dataset are somewhat worse than on the CI dataset in all cases except female HR
We suspect the reason is the far smaller number of training samples in the SI dataset
The observed correlations and RMSE values (for HR) align well with the previous study by Schuller et al., although the speaking tasks are very different

Bjorn Schuller, Felix Friedmann, and Florian Eyben, “Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013, pp. 7219–7223.
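One way to read these RMSE values is relative to the spread of the target. The sketch below divides each raw-value RMSE from the table by the observed range (maximum minus minimum) given in the notes. This range-normalized RMSE is an added illustration, not a metric used in the slides, and the names are hypothetical.

```python
# Raw-value RMSEs from the table above (separate analysis per dataset).
RMSE = {
    ("Male", "SI"): {"RSA": 1.81, "HR": 12.61},
    ("Male", "CI"): {"RSA": 1.14, "HR": 8.99},
    ("Female", "SI"): {"RSA": 2.01, "HR": 11.28},
    ("Female", "CI"): {"RSA": 1.22, "HR": 11.94},
}
# (minimum, maximum) of the raw values over both datasets, from the notes.
RANGES = {"RSA": (2.04, 8.89), "HR": (50.0, 112.08)}


def range_normalized_rmse(rmse, lo, hi):
    """RMSE expressed as a fraction of the observed target range (hi - lo)."""
    return rmse / (hi - lo)


if __name__ == "__main__":
    for (gender, dataset), errs in RMSE.items():
        row = ", ".join(
            f"{var} {range_normalized_rmse(err, *RANGES[var]):.1%}"
            for var, err in errs.items()
        )
        print(f"{gender:<6} {dataset}: {row}")
```

For example, the SI male RSA error of 1.81 is about 26% of the RSA range (1.81 / 6.85 ≈ 0.264), while the CI male HR error of 8.99 is about 14% of the HR range (8.99 / 62.08 ≈ 0.145).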


Slide21

Experiments and results (contd.)

Combining the two datasets: RMSE for regressing raw or normalized physiological variables (HR or RSA) on the combined dataset

Gender   Raw RSA   Raw HR   Normalized RSA   Normalized HR
Male     1.13      9.90     1.07             8.82
Female   1.42      11.82    0.89             7.99

RECAP: analysis of the two datasets separately (raw values)

Gender   RSA (SI)   HR (SI)   RSA (CI)   HR (CI)
Male     1.81       12.61     1.14       8.99
Female   2.01       11.28     1.22       11.94

Overall better performance for predicting raw values on the combined dataset (even though correlations degraded) than on the SI dataset alone
More data helps (CI + SI is larger than SI or CI alone)


Slide23

Summary

A study to find the relationship between acoustics and physiology during stressful conversations between humans
Key findings:
Gender-specific models are more useful than gender-independent models
Different acoustic features correlate most strongly with physiology on different datasets, possibly because of the difference in stress types: re-experienced vs. new stressors
Correlations degrade on the combined dataset, but per-speaker normalization of physiology helped
When regressing the physiological variables separately on the two datasets, we observed better overall performance for the dataset with more participants

Slide24

Conclusions

Significant correlations agree with the initial hypothesis that stress affects both modalities, speech and physiology
The regression results support the need for more data
Normalization and gender-specific models performed better, which points to likely gains from individualized (or clustered) models
Future plans:
Investigate better acoustic features
Use deep learning models to exploit temporal patterns in the speech signal, which could help predict physiological responses more accurately
The connection between physiology and acoustics can be studied without any human labeling

Slide25

THANK YOU

Signal processing for Communication Understanding and Behavior Analysis laboratory (scuba), University of Southern California (USC)
http://scuba.usc.edu/