Slide1
TOWARDS PREDICTING PHYSIOLOGY FROM SPEECH DURING STRESSFUL CONVERSATIONS: HEART RATE AND RESPIRATORY SINUS ARRHYTHMIA
Arindam Jati 1, Paula G. Williams 2, Brian Baucom 2, Panayiotis Georgiou 1
1 University of Southern California, Department of Electrical Engineering, CA, USA
2 The University of Utah, Department of Psychology, UT, USA
Slide2
Outline of the talk
Relationship of stress, physiology and speech
Prediction of physiology from speech
Datasets
Methodology
Experiments and results
Summary
Slide4
Relationship of stress, physiology and speech
Effects of mental stress on physiology:
  Excessive stress can lead to physiological, psychological, and psychosomatic health conditions such as anxiety and depression
  Significant change in Heart Rate (HR) and HR Variability (HRV) due to mental stress
    These signals can be used to measure stress levels (Taelman et al.)
  Occurrence of hyperventilation or over-breathing due to stress
    Respiratory Sinus Arrhythmia (RSA) as a technique to assess stress
    RSA: periodic alteration of heart rate in association with the phase of respiration (Grossman)
References:
  Joachim Taelman et al., “Influence of mental stress on heart rate and heart rate variability,” 4th European Conference of the International Federation for Medical and Biological Engineering. Springer, 2009.
  Paul Grossman, “Respiration, stress, and cardiovascular function,” Psychophysiology, vol. 20, no. 3, pp. 284–300, 1983.
Slide5
Relationship of stress, physiology and speech
Stress detection from physiology:
  Use of Electrodermal Activity (EDA), HR and HRV to detect stress
  Problem: intrusive and in some cases invasive methods to acquire the physiological signals
Stress detection from speech:
  A substantial body of work uses the SUSAS dataset (Zhou et al.)
  Nonlinear Teager Energy Operator (TEO) features were found to be useful
  Benefit: non-intrusive
Multimodal detection of stress (using both speech and galvanic skin response)
This work:
  Predicts physiological signals indicative of stress directly from speech
References:
  Guojun Zhou, John H. L. Hansen, and James F. Kaiser, “Nonlinear feature based classification of speech under stress,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 201–216, 2001.
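For readers unfamiliar with the Teager Energy Operator mentioned on this slide, its usual discrete-time form is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below implements only that operator. The function name and the toy sine input are illustrative assumptions, and this is not the full TEO-based feature extraction of Zhou et al.

```python
import math

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (the two edge samples are skipped).
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Toy input (assumption): a pure tone A*sin(w*n). For such a signal the
# operator is constant and equal to A^2 * sin(w)^2.
A, w = 2.0, 0.3
signal = [A * math.sin(w * n) for n in range(50)]
energy = teager_energy(signal)
```

Because the output depends on both amplitude and frequency, the operator reacts to changes in either, which is one reason it is used as a building block for stress-related speech features.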
Slide7
Prediction of physiology from speech
Effect of a psychological variable (mental stress) on two modalities:
[Diagram: Mental Stress affects Physiology (well studied) and Speech (well studied); the link between Physiology and Speech is not well explored]
Goal:
  Explore the relationship between physiology and speech (by studying correlations)
  Predict physiology from speech during stressful conversations
Insights to learn from this study:
  How a psychological variable (the cause of stress) can lead to both physiological and vocal activation, and how these two are related
  The insights can help develop future applications such as:
    Multimodal stress detection systems
    Higher resolution quantitative metrics for the intensity of stress
Slide8
Prediction of physiology from speech
Some previous work on predicting physiology from speech:
  HR from pronunciation of vowels (Skopin and Baglikov)
  Schuller et al.: vowel pronunciation and reading a sentence aloud, with and without physical load
  Recent study (Tsiartas et al.) classifying the direction of change in HR from conversation with an artificial dialog system
  Their newer study (Smith et al.): regression analysis to predict HR from audio
References:
  D. Skopin and S. Baglikov, “Heartbeat feature extraction from vowel speech signal using 2D spectrum representation,” in Proc. of the 4th International Conference on Information Technology (ICIT), Amman, Jordan, 2009.
  Bjorn Schuller, Felix Friedmann, and Florian Eyben, “Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7219–7223.
  Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, and Adrian Willoughby, “Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
  Jennifer Smith, Andreas Tsiartas, Elizabeth Shriberg, Andreas Kathol, Adrian Willoughby, and Massimiliano de Zambotti, “Analysis and prediction of heart rate using speech features from natural speech,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 989–993.
Slide9
What’s new?
Study:
  Investigates the relationship between acoustics and physiology (HR and RSA)
  Analyzes correlation and regression performance
  Analyzes stressful conversations between humans in two separate datasets
Main novelties and importance of this work:
  To our knowledge, the first attempt to explore the relationship between the two modalities for real conversations between humans
  To our knowledge, the first study to predict RSA from speech
  Two very distinct datasets; addresses issues related to robustness of the acoustic features across different domains
  Provides regression analysis on the actual values of the physiology, which opens the possibility of building an audio-based automated real-time stress or physiology monitoring system
Slide11
Datasets
Property | Stress Induction (SI) dataset | Couples’ Interaction (CI) dataset
Stressor (reasons of stress) | “Re-experienced”: young adults re-experienced their top stressors in an interview (Williams et al.) | New stressors: married couples discuss two current, serious relationship problems, one chosen by each partner
Session duration | 8-12 minutes | 10 minutes (per problem)
Physiology | Mean HR and RSA of the participant over the entire session | Mean HR and RSA of both husband and wife over the entire session
# Sessions | 54 | 226
# Males | 29 |
# Females | 25 |
Total duration | 6.20 hours | 22.30 hours
† Number of unique males/females = 60
For both datasets: multiple baseline physiology measurements (resting HR and RSA) for every participant
References:
  Paula G. Williams et al., “The effects of poor sleep on cognitive, affective, and physiological responses to a laboratory stressor,” Annals of Behavioral Medicine, vol. 46, no. 1, pp. 40–51, 2013.
Slide13
Methodology
[Pipeline diagram, illustrated on a couples’ interaction from the CI dataset: speech → Denoising → VAD → Diarization → Gender detection → female’s speech and male’s speech → session-level acoustic features for each speaker, paired with that speaker’s average physiology]
For the female speaker:
  Task 1: Analyze Pearson’s correlations between the individual acoustic features (i.e. the elements of her session-level feature vector) and her average physiology.
  Task 2: Predict her average physiology from her acoustic feature vector using a nonlinear regression model.
Similarly for the male speaker, do the same analysis between his acoustic feature vector and his average physiology.
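Task 1 can be sketched as follows: compute Pearson’s correlation between each acoustic feature and the session-level physiology, then pick the feature with the largest absolute correlation. This is a minimal illustration, not the authors’ code. The function names, feature names, and numbers are made up, and the significance test (p < 0.05) used in the talk is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def best_correlated_feature(features, physiology):
    """features: dict of feature name -> per-session values.
    physiology: per-session mean HR or RSA, in the same session order.
    Returns (name, r) for the feature with the largest |r|."""
    scores = {name: pearson_r(vals, physiology) for name, vals in features.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]

# Made-up numbers for illustration only (not data from the study).
features = {
    "loudness_mean": [0.2, 0.4, 0.6, 0.8, 1.0],
    "pitch_mean": [110.0, 95.0, 120.0, 100.0, 105.0],
}
mean_hr = [62.0, 66.0, 71.0, 74.0, 80.0]
name, r = best_correlated_feature(features, mean_hr)
```

Ranking by absolute value matters because, as the results slides show, the strongest correlations can be negative.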
Slide14
Methodology (contd.)
Session-level acoustic features:
  88-dimensional eGeMAPS features (Eyben et al.) over the whole session, extracted with the openSMILE toolkit
  SI dataset: from the participant’s speech over the entire session
  CI dataset: separately from husband and wife, with separate analysis
Some examples from the eGeMAPS feature set (statistical functionals of some of them are also included):
  frequency related parameters: pitch, jitter, and formant frequencies
  energy/amplitude related parameters: shimmer, loudness, and harmonics-to-noise ratio
  spectral balance parameters: alpha ratio, Hammarberg index, and harmonic differences
  temporal features: rate of loudness peaks, and mean length of voiced regions
  cepstral features: MFCC and spectral flux
References:
  Florian Eyben, Klaus R. Scherer, Bjorn W. Schuller, Johan Sundberg, Elisabeth Andre, Carlos Busso, Laurence Y. Devillers, Julien Epps, Petri Laukka, Shrikanth S. Narayanan, et al., “The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing,” IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190–202, 2016.
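The "statistical functionals" above turn a frame-level contour (for example loudness per frame) into a handful of session-level numbers such as the mean, coefficient of variation, and percentiles. The sketch below shows the idea in plain Python. It is an assumption-laden illustration, not openSMILE’s implementation: the function names are hypothetical, and openSMILE’s exact percentile and normalization conventions may differ.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def functionals(contour):
    """Summarize a frame-level contour into session-level statistics."""
    n = len(contour)
    mean = sum(contour) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in contour) / n)
    p20, p80 = percentile(contour, 20), percentile(contour, 80)
    return {
        "mean": mean,
        "coeff_of_variation": std / mean,  # standard deviation normalized by the mean
        "percentile20": p20,
        "percentile50": percentile(contour, 50),
        "percentile80": p80,
        "range_20_80": p80 - p20,
    }

# Hypothetical frame-level loudness values (not data from the study).
loudness = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
stats = functionals(loudness)
```

Several of the best-correlated features reported in the results (20th percentile of loudness, range of 20th to 80th percentile, coefficient of variation) are functionals of this kind.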
Slide15
Methodology (contd.)
Gender specific models:
  Experimentally found to be more useful
  Illustrated in the t-SNE plot
Regression model and parameters:
  Estimating raw and normalized values of RSA and HR from acoustic features
  Root Mean Squared Error (RMSE) loss
  AdaBoost regressor with a decision tree regressor as base estimator
  5-fold stratified Cross Validation (CV), with no session overlap between train, test and dev sets
  3-fold cross validation on the train+dev set for model selection based on minimum dev set RMSE
  The whole process of 5-fold CV repeated 5 times to get a better estimate of test error
Figure: t-SNE plot of the acoustic features on both datasets
  Clear clusters were formed within males and females from both datasets
  Possibly because of fundamental differences in some of the acoustic features (for example pitch) between men and women
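The evaluation protocol on this slide (5 outer folds for testing, 3 inner folds on train+dev for model selection, the whole procedure repeated 5 times) can be sketched as a split generator. This is a simplified illustration under stated assumptions: the helper names are hypothetical, each session is treated as one indivisible unit, stratification is omitted, and model fitting (the AdaBoost regressor) is left out.

```python
import random

def k_folds(items, k, rng):
    """Shuffle items and deal them into k disjoint folds."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(session_ids, outer_k=5, inner_k=3, repeats=5, seed=0):
    """Yield (repeat, test, inner_splits) where inner_splits is a list of
    (train, dev) pairs built only from the sessions outside the test fold."""
    rng = random.Random(seed)
    for rep in range(repeats):
        outer = k_folds(session_ids, outer_k, rng)
        for i, test in enumerate(outer):
            train_dev = [s for j, fold in enumerate(outer) if j != i for s in fold]
            inner = k_folds(train_dev, inner_k, rng)
            inner_splits = []
            for d, dev in enumerate(inner):
                train = [s for j, fold in enumerate(inner) if j != d for s in fold]
                inner_splits.append((train, dev))
            yield rep, test, inner_splits

# 54 hypothetical session IDs, matching the SI dataset's session count.
splits = list(nested_cv_splits(range(54)))
```

In a full pipeline, each candidate model would be fit on `train`, scored by RMSE on `dev`, and the selected configuration evaluated once on `test`; averaging test RMSE over all repeats gives the final estimate.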
Slide17
Experiments and results
Analysis of two datasets separately: Pearson’s correlations (all statistically significant, i.e. p < 0.05) between the physiological variables (HR or RSA) and the best correlated feature
Gender | SI dataset, RSA | SI dataset, HR | CI dataset, RSA | CI dataset, HR
Male | −0.40 (mean falling slope of loudness) | −0.53 (mean bandwidth of 3rd formant) | −0.40 (coefficient of variation of bandwidth of 1st formant) | 0.36 (coefficient of variation of bandwidth of 1st formant)
Female | −0.42 (20th percentile of loudness) | 0.55 (range of 20th to 80th percentile of loudness) | 0.40 (voiced segments per second) | 0.35 (mean harmonics to noise ratio)
Observation:
  Different features dominate across datasets for the same physiology in the same gender group
  Possible reason: difference in the tasks the speakers perform in the two datasets (re-experienced stressors in the SI dataset vs. new stressors in the CI dataset)
  Example: different levels of emotional or vocal arousal in the two tasks
Slide18
Experiments and results (contd.)
Combining two datasets: Pearson’s correlation (all statistically significant) between raw or normalized physiological variables (HR or RSA) and the best correlated feature
Gender | Raw, RSA | Raw, HR | Normalized, RSA | Normalized, HR
Male | 0.22 (mean bandwidth of 3rd formant) | −0.24 (coefficient of variation of shimmer) | 0.33 (coefficient of variation of bandwidth of 3rd formant) | 0.43 (mean alpha ratio)
Female | −0.22 (mean length of voiced segments in seconds) | −0.20 (mean of shimmer) | 0.19 (standard deviation of falling slope of pitch) | 0.35 (50th percentile of loudness)
Observation:
  Raw: the absolute correlation values drop by a large margin compared with the values obtained separately on the two datasets
    Possible reason: different distributions of the physiological signals in the two datasets, caused by the inherent difference between the tasks the speakers perform
  Normalized physiology (subtracting baseline physiology): boost in the correlations for most cases, except RSA for females
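The normalization referred to here subtracts each participant’s resting baseline from the session-level value. A minimal sketch follows. The talk states that multiple baselines exist per participant but not how they are combined, so averaging them is an assumption, as are the function name and the numbers.

```python
def normalize_physiology(session_value, baseline_values):
    """Subtract a participant's resting baseline from a session-level value.

    Assumption: multiple baseline recordings are averaged into one baseline.
    """
    baseline = sum(baseline_values) / len(baseline_values)
    return session_value - baseline

# Hypothetical numbers (not from the study): mean HR during the stressful
# session and two resting-baseline HR measurements for the same participant.
normalized_hr = normalize_physiology(82.0, [64.0, 66.0])
```

Working with the change from baseline removes between-person offsets in resting HR or RSA, which is consistent with the improved correlations reported for the combined dataset.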
Slide19
Experiments and results (contd.)
Analysis of two datasets separately (contd.): RMSE for regressing raw physiological variables (HR or RSA)
Gender | SI dataset, RSA | SI dataset, HR | CI dataset, RSA | CI dataset, HR
Male | 1.81 | 12.61 | 1.14 | 8.99
Female | 2.01 | 11.28 | 1.22 | 11.94
Notes:
  The ranges of raw values (minimum, maximum) for RSA and HR over both datasets are (2.04, 8.89) and (50.0, 112.08) respectively
  Regression results on the SI dataset are slightly worse than those on the CI dataset
    We suspect the reason is the far smaller number of training samples in the SI dataset
  The observed correlations and RMSE values (for HR) align well with the previous study by Schuller et al., although the speaking tasks are very different
References:
  Bjorn Schuller, Felix Friedmann, and Florian Eyben, “Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7219–7223.
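One way to read the RMSE values on this slide is to relate them to the reported (minimum, maximum) ranges of the raw variables. The sketch below divides each RMSE by the corresponding range. This ratio is an illustrative reading aid, not a metric reported in the talk, and the dictionary layout and function name are assumptions.

```python
# Ranges of raw values over both datasets, as reported in the notes above.
RANGES = {"RSA": (2.04, 8.89), "HR": (50.0, 112.08)}

# RMSE values from the table above: (gender, dataset) -> {variable: RMSE}.
RMSE = {
    ("Male", "SI"): {"RSA": 1.81, "HR": 12.61},
    ("Male", "CI"): {"RSA": 1.14, "HR": 8.99},
    ("Female", "SI"): {"RSA": 2.01, "HR": 11.28},
    ("Female", "CI"): {"RSA": 1.22, "HR": 11.94},
}

def relative_rmse(rmse, variable):
    """RMSE as a fraction of the observed (max - min) range of the variable."""
    low, high = RANGES[variable]
    return rmse / (high - low)

relative = {
    key: {var: round(relative_rmse(val, var), 3) for var, val in row.items()}
    for key, row in RMSE.items()
}
```

For example, the male CI result for HR (RMSE 8.99 over a range of 62.08) corresponds to roughly 14 to 15 percent of the observed range.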
Slide20
Experiments and results (contd.)
Analysis of two datasets separately: RMSE for regressing raw physiological variables
  More data helps (the CI dataset is larger than the SI dataset)
  The observed correlations and RMSE values (for HR) align well with the previous study by Schuller et al., although the speaking tasks are very different
Slide21
Experiments and results (contd.)
Combining two datasets: RMSE for regressing raw or normalized physiological variables (HR or RSA) in the combined dataset
Gender | Raw, RSA | Raw, HR | Normalized, RSA | Normalized, HR
Male | 1.13 | 9.90 | 1.07 | 8.82
Female | 1.42 | 11.82 | 0.89 | 7.99
RECAP: Analysis of two datasets separately
Gender | SI dataset, RSA | SI dataset, HR | CI dataset, RSA | CI dataset, HR
Male | 1.81 | 12.61 | 1.14 | 8.99
Female | 2.01 | 11.28 | 1.22 | 11.94
Observations:
  Overall better performance for predicting raw values (even though correlations degraded) than what we obtained on the SI dataset alone
  More data helps (CI + SI is larger than either SI or CI)
Slide23
Summary
A study of the relationship between acoustics and physiology during stressful conversations between humans
Key findings:
  Gender specific models are more useful than gender independent models
  Different acoustic features correlate highly with physiology on different datasets, possibly because of the difference in stress types: re-experienced vs. new stressors
  Correlations degrade on the combined dataset, but per-speaker normalization of physiology helped
  While regressing the physiological variables separately on the two datasets, we observed overall better performance for the dataset with more participants
Slide24
Conclusions
Conclusion:
  Significant correlations agree with the initial hypothesis that stress has an effect on both modalities, speech and physiology
  The regression results support the need for more data
  Normalization and gender specific models perform better, which points to likely gains with individualized (or clustered) models
Future plans:
  Investigate better acoustic features
  Deep learning models to exploit temporal patterns in the speech signal that could help predict physiological responses more accurately
  The connection between physiology and acoustics can be studied without any human labeling
Slide25
THANK YOU
Signal processing for Communication Understanding and Behavior Analysis laboratory (scuba),
University of Southern California (USC)
http://scuba.usc.edu/