application to the diagnosis of Alzheimers disease from EEG Justin Dauwels LIDS MIT LMI Harvard Medical School Amari Research Unit Brain Science Institute RIKEN June 9 2008 RIKEN Brain Science Institute ID: 801376
Download The PPT/PDF document "Machine learning techniques for quantify..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG
Justin DauwelsLIDS, MITLMI, Harvard Medical SchoolAmari Research Unit, Brain Science Institute, RIKENJune 9, 2008
Slide2RIKEN Brain Science Institute
RIKEN Wako Campus (near Tokyo) about 400 researchers and staff (20% foreign)
300 research fellows and visiting scientists
about 60 laboratories
research covers most aspects of brain science
Collaborators
François
Vialatte
*, Theo Weber
+, Shun-ichi Amari*, Andrzej Cichocki* (*RIKEN, +MIT) ProjectEarly diagnosis of Alzheimer’s disease based on EEGFinancial Support
Slide3Research Overview
EEG (RIKEN,
MIT, MGH)
diagnosis of Alzheimer’s
disease
detection/prediction of epileptic seizures
analysis of EEG evoked by visual/auditory stimuli
EEG during meditation
projects related to brain-computer interface (BMI) Calcium imaging (RIKEN, NAIST, MIT)effect of calcium on neural growthrole of calcium propagation in gliacells
and neurons Diffusion MRI (Brigham&Women’s
Hospital, Harvard Medical School, MIT) estimation and clustering of tracts (future project)
Machine learning & signal processing for applications in
NEUROSCIENCE
= development of
ALGORITHMS
to analyze
brain signals
subject
of this
talk
Slide4OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide5Alzheimer's disease
Outside glimpse: clinical perspective
Mild (early stage)
becomes less energetic or spontaneous
noticeable cognitive deficits
still independent (able to compensate)
Moderate (middle stage)
Mental abilities decline
personality changes
become dependent on caregivers
Severe (late stage)complete deterioration of the personality
loss of control over bodily functionstotal dependence on caregivers
Apathy
Memory
(forgetting
relatives)
Evolution of the disease (stages)
One disease,
many symptoms
Loss of
Self-control
Video sources: Alzheimer society
2 to 5 years before
mild cognitive impairment (often unnoticed)
6 to 25 % progress to Alzheimer's per year
memory, language, executive functions,
apraxia, apathy, agnosia, etc…
2% to 5% of people over 65 years old
up to 20% of people over 80 Jeong 2004 (Nature)
EEG data
Slide6A
lzheimer's diseaseInside glimpse: brain atrophy
Video source: P. Thompson, J.Neuroscience, 2003
Images: Jannis Productions.
(R. Fredenburg; S. Jannis)
amyloid plaques and
neurofibrillary tangles
Video source:
Alzheimer society
Slide7OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide8Alzheimer's disease
Inside glimpse: abnormal EEGAD vs. MCI (Hogan et al. 203; Jiang et al., 2005)AD vs. Control (Hermann, Demilrap, 2005,
Yagyu et al. 1997; Stam et al., 2002; Babiloni
et al. 2006)MCI
vs.
mildAD
(
Babiloni
et al., 2006).Decrease of synchronyBrain “slow-down”
slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz)
(
Babiloni
et al., 2004;
Besthorn et al., 1997; Jelic et al. 1996,
Jeong
2004;
Dierks
et al., 1993).
Images: www.cerebromente.org.br
EEG system: inexpensive, mobile, useful for screening
focus of this project
Slide9Spontaneous (scalp) EEG
Fourier power
f (Hz)
t (sec)
amplitude
Fourier
|X(f)|
2
EEG
x(t)
Time-frequency
|X(t,f)|
2
(wavelet transform)
Time-frequency patterns
(“bumps”)
Slide10Fourier transform
High frequency
Low frequency
Frequency
1
2
3
2
1
3
Slide11Windowed Fourier transform
*
=
Fourier basis functions
Window
function
windowed basis
functions
Windowed
Fourier
Transform
t
f
Slide12Spontaneous EEG
Fourier power
f (Hz)
t (sec)
amplitude
Fourier
|X(f)|
2
EEG
x(t)
Time-frequency
|X(t,f)|
2
(wavelet transform)
Time-frequency patterns
(“bumps”)
Slide13Signatures of local synchrony
f (Hz)
t (sec)
Time-frequency patterns
(“bumps”)
EEG stems from
thousands
of neurons
bump
if
neurons
are
phase-locked
= local synchrony
Slide14Alzheimer's disease
Inside glimpse: abnormal EEGAD vs. MCI (Hogan et al. 203; Jiang et al., 2005)AD vs. Control (Hermann, Demilrap, 2005,
Yagyu et al. 1997; Stam et al., 2002; Babiloni
et al. 2006)MCI
vs.
mildAD
(
Babiloni
et al., 2006).Decrease of synchronyBrain “slow-down”
slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz)
(
Babiloni
et al., 2004;
Besthorn et al., 1997; Jelic et al. 1996,
Jeong
2004;
Dierks
et al., 1993).
Images: www.cerebromente.org.br
EEG system: inexpensive, mobile, useful for screening
focus of this project
Slide15OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide16Comparing EEG signal rhythms ?
PROBLEM I:
Signals of 3
seconds sampled at 100 Hz (
300
samples
)
Time-frequency representation of one signal = about
25 000 coefficients
2 signals
Slide17Numerous
neighboring pixels
Comparing EEG signal rhythms ?(2)
One pixel
PROBLEM II:
Shifts in time-frequency!
Slide18Sparse representation: bump model
Assumptions:
time-frequency
map is suitable representation
oscillatory bursts
(“bumps”) convey key information
Bumps
Sparse representation
F. Vialatte et al. “
A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
Normalization:
10
4
- 10
5
coefficients
about 10
2
parameters
t (sec)
f(Hz)
f(Hz)
t (sec)
f(Hz)
t (sec)
Slide19Similarity of bump models...
How “similar” or
“synchronous
”
are two bump models
?
=
GLOBAL
synchrony
Reminder: bumps due to LOCAL synchrony= MULTI-SCALE approach
Slide20... by matching bumps
y
1
y
2
Some bumps
match
Offset
between matched bumps
SIMILAR
bump models if:
Many
matches
Strongly
overlapping
matches
Slide21... by matching bumps (2)
Bumps in one model, but NOT in other → fraction of “spurious” bumps ρ
spur
Bumps in
both
models, but with offset
→ Average time offset
δ
t
(delay) → Timing jitter with variance st → Average frequency offset δf
→ Frequency jitter with variance
sf
Synchrony:
only
st
and
ρ
spur
relevant
PROBLEM:
Given two bump models, compute
(
ρ
spur
,
δt,
st, δf
, sf )
Stochastic Event Synchrony (SES) =
(ρspur,
δ
t
,
s
t
,
δ
f
,
s
f
)
Slide22OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide23Average synchrony
3. SES for each pair of models
4. Average the SES parameters
Group electrodes in regions
Bump model for each region
Slide24Beyond pairwise interactions...
Pairwise similarity
Multi-variate similarity
Slide25...by clustering
y
1
y
2
y
3
y
4
y
5
y
1
y
2
y
3
y
4
y
5
Constraint: in each cluster
at most
one bump from each signal
Models similar if
few deletions/large clusters
little jitter
HARD combinatorial problem!
Slide26OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide27EEG Data
EEG data provided by Prof. T. Musha
EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched
control subjects (CTR) recorded while in rest with closed eyes
→
spontaneous EEG
All 22 MCI patients suffered from Alzheimer’s disease (AD) later on Electrodes located on 21 sites according to 10-20 international system Electrodes grouped into 5 zones (reduces number of pairs) 1 bump model per zone Used continuous “artifact-free” intervals of 20s
Band pass filtered between 4 and 30 Hz
Slide28Similarity measures
Correlation
and coherence
Granger causality (linear system):
DTF, ffDTF, dDTF, PDC, PC, ...
Phase Synchrony:
compare
instantaneous phases
(wavelet/Hilbert transform)
State space
based measures
sync likelihood, S-estimator, S-H-N-indices, ...
Information-theoretic
measures
KL divergence, Jensen-Shannon divergence, ...
No Phase Locking
Phase Locking
TIME
FREQUENCY
Slide29Slide30Sensitivity (average synchrony)
Granger
Info. Theor.
State Space
Phase
SES
Corr/Coh
Mann-Whitney test:
small p value suggests large difference in statistics of both groups
Significant differences for ffDTF and
ρ
!
Slide31Classification
Clear separation, but not yet useful as diagnostic toolAdditional indicators needed (fMRI, MEG, DTI, ...)
Can be used for screening population (inexpensive, simple, fast)
ffDTF
Slide32Strong (anti-) correlations „families“ of sync measures
Correlations
Slide33OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain
Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook
Slide34Ongoing work
Time-varying similarity parameters
st
low s
t
high s
t
high s
t
no stimulus
no stimulus
stimulus
low s
t
high s
t
high
s
t
Slide35Future workMatching event
patterns instead of single events
= allows us to extract
patterns in time-frequency map of EEG!
HYPOTHESIS:
Perhaps specific patterns occur in time-frequency EEG maps
of
AD patients
before onset of
epileptic seizuresREMARK:Such patterns are ignored by classical approaches: STATIONARITY/AVERAGING!
coupling between
frequency bands
t (sec)
f(Hz)
Slide36ConclusionsMeasure for similarity of
point processes („stochastic event synchrony“)Key idea: alignment of eventsSolved by statistical inference
Application: EEG synchrony of MCI patients
About 85%
correctly classified; perhaps useful for
screening
population
Ongoing/future work
: time-varying SES, extracting patterns of bumps
Slide37References + software
ReferencesQuantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and Application to Neural Signals, Neural Computation (under revision)A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision)
Measuring Neural Synchrony by Message Passing, NIPS 2007
Quantifying the Similarity of Multiple Multi-Dimensional Point Processes by Integer Programming with Application to Early Diagnosis of Alzheimer's Disease from EEG,
EMBC 2008 (submitted)
Software
MATLAB implementation
of the synchrony
measures
Slide38Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG
Justin DauwelsLIDS, MITLMI, Harvard Medical SchoolAmari Research Unit, Brain Science Institute, RIKENJune 9, 2008
Slide39Machine learning for neuroscience
Multi-scale in time and spaceData fusion: EEG, fMRI, spike data, bio-imaging, ...Large-scale inference
Visualization
Behavior ↔ Brain ↔ Brain Regions ↔ Neural Assemblies ↔ Single neurons ↔ Synapses ↔ Ion channels
Slide40Estimation
Deltas: average offset
Sigmas: var of offset
...where
Simple closed form expressions
artificial observations (conjugate prior)
Slide41Large-scale synchrony
Apparently, all brain regions affected...
Slide42Alzheimer's disease
Outside glimpse: the future (prevalence)USA (Hebert et al. 2003)
World (
Wimo
et al. 2003)
Million of sufferers
Million of sufferers
2% to 5% of people over 65 years old
Up to 20% of people over 80
Jeong 2004 (Nature)
Slide43Ongoing and future work
Applications alternative inference techniques (e.g., MCMC, linear programming) time dependent (Gaussian processes)
multivariate (T.Weber)
Fluctuations of EEG synchrony
Caused by auditory stimuli and music (T. Rutkowski)
Caused by visual stimuli (F. Vialatte)
Yoga professionals (F. Vialatte)
Professional shogi players (RIKEN & Fujitsu)
Brain-Computer Interfaces (T. Rutkowski)
Spike data from interacting monkeys (N. Fujii)
Calcium propagation in gliacells (N. Nakata)Neural growth (Y. Tsukada & Y. Sakumura)
...
Algorithms
Slide44Fitting bump models
Signal
Bump
Initialisation
After adaptation
Adaptation
gradient method
F. Vialatte et al. “
A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics
”, Neural Networks (2007).
Slide45Boxplots
SURPRISE!No increase in jitter, but significantly less matched activity!Physiological interpretation neural assemblies more localized
? harder to establish large-scale synchrony?
Slide46Similarity of bump models...
How “similar” or “
synchronous
”
are two bump models?
Slide47POINT ESTIMATION: θ(i+1)
= argmaxx log p(y, y’, c(i+1) ,θ ) Uniform prior p(
θ): δ
t, δ
f
= average offset,
s
t
, sf = variance of offset Conjugate prior p(θ): still closed-form expressionOther kind of prior p(θ): numerical optimization (gradient method)
Probabilistic inference
Slide48MATCHING: c(i+1)
= argmaxc log p(y, y’, c, θ(i) )ALGORITHMS
Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm)
Linear programming
relaxation: extreme points of LP polytope are
integral
Max-product
algorithm
gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]EQUIVALENT to (imperfect) bipartite max-weight matching problemc(i+1) = argmaxc log p(y, y’, c,
θ(i) ) = argmaxc
Σkk’ w
kk’(i)
ckk’
s.t. Σ
k’
c
kk’
≤ 1
and
Σk
ckk’ ≤ 1
and ckk’ 2
{0,1}
Probabilistic inference
not necessarily perfectfind heaviest set of disjoint
edges
Slide49p(y, y’, c, θ
) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ;
δt ,s
t,kk’) N(f k’
– f
k ;
δ
f ,
sf, kk’) β-2)ckk’Max-product algorithm
MATCHING: c
(i+1) = argmaxc log p(y, y’, c
, θ(i)
)
Generative model
Slide50Max-product algorithm
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i)
)
μ↑
μ↑
μ↓
μ↓
Conditioning on
θ
Max-product algorithm (2)
Iteratively compute messages
At convergence, compute
marginals
p(c
kk’
) =
μ↓
(c
kk’) μ↓(ckk’) μ↑(ckk’) Decisions
: c*kk’ = argmaxckk’ p(ckk’)
Slide52Algorithm
MATCHING → max-productESTIMATION → closed-formPROBLEM: Given two bump models, compute (
ρspur,
δt,
s
t
,
δ
f, sf )APPROACH: (c*,θ*) = argmaxc,θ
log p(y, y’, c, θ)
θ
SOLUTION:
Coordinate descent
c
(i+1)
= argmax
c
log p(y, y’,
c
, θ(i)
) θ(i+1)
= argmaxx log p(y, y’, c(i+1)
,θ )
Slide53Generative model
Generate bump model (
hidden
)
geometric prior for number n of bumps
p(n) = (1-
λ
S) (
λ
S)
-n
bumps are uniformly distributed in rectangle
amplitude, width (in t and f) all i.i.d
.
Generate two
“noisy” observations
offset between hidden and observed bump
= Gaussian random vector with
mean
( ±
δ
t
/2, ±
δ
f
/2) covariance diag(st
/2, sf /2) amplitude, width (in t and f) all i.i.d. “deletion” with probability pd
y
hidden
y
y
’
Easily
extendable
to more than 2 observations…
( -
δ
t
/2, -
δ
f
/2)
(
δ
t
/2,
δ
f
/2)
Slide54Generative model (2)
Binary variables ckk’ ckk’ = 1 if k and k’ are observations of same hidden bump, else
ckk’ = 0
(e.g., c
ii’
= 1 c
ij’
= 0
)
Constraints: bk = Σk’ ckk’ and bk’ = Σk ckk’
are binary (“matching constraints”)
Generative Model p(y, y’, yhidden ,
c, δt
, δ
f , st
,
s
f
) (
symmetric in y and y’)
Eliminate yhidden
→ offset is Gaussian RV with mean = ( δt
, δf ) and covariance
diag (st , sf) Probabilistic Inference:
(c*,θ*) = argmaxc,θ log p(y, y’, c
, θ)
y
y
’
( -
δ
t
/2, -
δ
f
/2)
(
δ
t
/2,
δ
f
/2)
i
i’
j’
p(y, y’,
c
,
θ
) =
∫
p(y, y’, y
hidden
,
c
,
θ
) d
y
hidden
θ
Slide55Bumps in one model, but NOT in other → fraction of “spurious” bumps ρspur
Bumps in
both models, but with offset → Average time offset
δ
t
(delay)
→ Timing jitter with variance
st → Average frequency offset δf → Frequency
jitter with variance sf
PROBLEM:
Given two bump models, compute (
ρ
spur
,
δ
t
,
s
t
,
δf,
sf )
APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
θ
Summary
Slide56Objective function
Logarithm of model: log p(y, y’, c, θ) = Σkk’ wkk’ ckk’ + log I(c
) + log pθ(θ) +
γ
w
kk
’
= -
(
1/st (t k’ – tk – δt)2 + 1/sf (f k’ – fk–
δf)2 ) - 2
log β
β = p
d (λ/V)1/2
Euclidean distance between bump centers
Large
w
kk
’
if : a) bumps are
close
b)
small pd c) few bumps per volume element
No need to specify pd , λ, and V, they only appear through β = knob to control # matches
y
y
’
( -
δ
t
/2, -
δ
f
/2)
(
δ
t
/2,
δ
f
/2)
i
i’
j’
Slide57Distance measures
wkk’ = 1/st,kk’ (t k’ – tk – δ
t)2 +
1/sf,kk’ (f
k’
– f
k
–
δf)2 + 2 log βst,kk’ = (Δtk + Δt’
k) st s
f,kk’ = (Δf
k + Δ
f’k)
sf
Scaling
Non-Euclidean
Slide58p(y, y’, c, θ
) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ;
δt ,s
t,kk’) N(f k’
– f
k ;
δ
f ,
sf, kk’) β-2)ckk’Generative model
Slide59Expect bumps to appear at about
same frequency, but delayed Frequency shift requires non-linear transformation, less likely than delayConjugate priors for st and sf (scaled inverse chi-squared):
Improper prior for
δt
and
δ
t
: p(δt) = 1 = p(δf)
Prior for parameters
Slide60CTR
MCIPreliminary results for multi-variate model
linear comb of p
c
Slide61Probabilistic inference
MATCHINGPOINT ESTIMATIONPROBLEM: Given two bump models, compute (ρ
spur,
δt,
s
t
,
δ
f, sf )APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’,
c, θ)
θ
SOLUTION:
Coordinate descent
c
(i+1)
= argmax
c
log p(y, y’,
c
, θ(i)
) θ(i+1) = argmax
x log p(y, y’, c(i+1)
,θ )
X
Y
Min
x
2
X
,
y
2
Y
d(
x
,
y
)
Slide62Generative model
Generate bump model (
hidden
)
geometric prior for number n of bumps
p(n) = (1-
λ
S) (
λ
S)-n bumps are uniformly distributed in rectangle
amplitude, width (in t and f) all i.i.d.
Generate M
“noisy” observations offset between hidden and observed bump
= Gaussian random vector with mean (
δt,m
/2,
δ
f,m
/2)
covariance
diag
(st,m/2,
sf,m /2)
amplitude, width (in t and f) all i.i.d.
“deletion” with probability pd (other prior pc0 for cluster size)
yhidden
y
1
y
2
y
3
y
4
y
5
Parameters:
θ
=
δ
t,m
,
δ
f,m ,
s
t,m
, s
f,m
,
p
c
p
c
(i) = p(cluster size = i |y) (i = 1,2,…,M)
(Hebb 1949, Fuster 1997)
Stimuli
Consolidation
Stimulus
Voice
Face
Voice
Role of local synchrony
Assembly activation
Hebbian consolidation
Assembly recall
Slide64Probabilistic inference
CLUSTERING (IP or MP)POINT ESTIMATIONPROBLEM: Given M bump models, compute
θ = δ
t,m , δ
f,m ,
s
t,m
, s
f,m
, pcAPPROACH: (c*,θ*) = argmaxc,θ
log p(y, y’, c, θ)
SOLUTION:
Coordinate descent
c(i+1) = argmax
c log p(y, y’, c,
θ
(i)
)
θ
(i+1) = argmax
x log p(y, y’, c(i+1)
,θ )
Integer program
Max-product algorithm (MP) on sparse graph Integer programming methods (e.g., LP relaxation)