/
Machine learning techniques for quantifying neural synchrony: Machine learning techniques for quantifying neural synchrony:

Machine learning techniques for quantifying neural synchrony: - PowerPoint Presentation

slygrat
slygrat . @slygrat
Follow
345 views
Uploaded On 2020-08-07

Machine learning techniques for quantifying neural synchrony: - PPT Presentation

application to the diagnosis of Alzheimers disease from EEG Justin Dauwels LIDS MIT LMI Harvard Medical School Amari Research Unit Brain Science Institute RIKEN June 9 2008 RIKEN Brain Science Institute ID: 801376

frequency eeg bump time eeg frequency time bump log bumps synchrony disease

Share:

Link:

Embed:

Download Presentation from below link

Download The PPT/PDF document "Machine learning techniques for quantify..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG

Justin DauwelsLIDS, MITLMI, Harvard Medical SchoolAmari Research Unit, Brain Science Institute, RIKENJune 9, 2008

Slide2

RIKEN Brain Science Institute

RIKEN Wako Campus (near Tokyo) about 400 researchers and staff (20% foreign)

300 research fellows and visiting scientists

about 60 laboratories

research covers most aspects of brain science

Collaborators

François

Vialatte

*, Theo Weber

+, Shun-ichi Amari*, Andrzej Cichocki* (*RIKEN, +MIT) ProjectEarly diagnosis of Alzheimer’s disease based on EEGFinancial Support

Slide3

Research Overview

EEG (RIKEN,

MIT, MGH)

diagnosis of Alzheimer’s

disease

detection/prediction of epileptic seizures

analysis of EEG evoked by visual/auditory stimuli

EEG during meditation

projects related to brain-computer interface (BMI) Calcium imaging (RIKEN, NAIST, MIT)effect of calcium on neural growthrole of calcium propagation in gliacells

and neurons Diffusion MRI (Brigham&Women’s

Hospital, Harvard Medical School, MIT) estimation and clustering of tracts (future project)

Machine learning & signal processing for applications in

NEUROSCIENCE

= development of

ALGORITHMS

to analyze

brain signals

subject

of this

talk

Slide4

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide5

Alzheimer's disease

Outside glimpse: clinical perspective

Mild (early stage)

becomes less energetic or spontaneous

noticeable cognitive deficits

still independent (able to compensate)

Moderate (middle stage)

Mental abilities decline

personality changes

become dependent on caregivers

Severe (late stage)complete deterioration of the personality

loss of control over bodily functionstotal dependence on caregivers

Apathy

Memory

(forgetting

relatives)

Evolution of the disease (stages)

One disease,

many symptoms

Loss of

Self-control

Video sources: Alzheimer society

2 to 5 years before

mild cognitive impairment (often unnoticed)

6 to 25 % progress to Alzheimer's per year

memory, language, executive functions,

apraxia, apathy, agnosia, etc…

2% to 5% of people over 65 years old

up to 20% of people over 80 Jeong 2004 (Nature)

EEG data

Slide6

A

lzheimer's diseaseInside glimpse: brain atrophy

Video source: P. Thompson, J.Neuroscience, 2003

Images: Jannis Productions.

(R. Fredenburg; S. Jannis)

amyloid plaques and

neurofibrillary tangles

Video source:

Alzheimer society

Slide7

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide8

Alzheimer's disease

Inside glimpse: abnormal EEGAD vs. MCI (Hogan et al. 203; Jiang et al., 2005)AD vs. Control (Hermann, Demilrap, 2005,

Yagyu et al. 1997; Stam et al., 2002; Babiloni

et al. 2006)MCI

vs.

mildAD

(

Babiloni

et al., 2006).Decrease of synchronyBrain “slow-down”

slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz)

(

Babiloni

et al., 2004;

Besthorn et al., 1997; Jelic et al. 1996,

Jeong

2004;

Dierks

et al., 1993).

Images: www.cerebromente.org.br

EEG system: inexpensive, mobile, useful for screening

focus of this project

Slide9

Spontaneous (scalp) EEG

Fourier power

f (Hz)

t (sec)

amplitude

Fourier

|X(f)|

2

EEG

x(t)

Time-frequency

|X(t,f)|

2

(wavelet transform)

Time-frequency patterns

(“bumps”)

Slide10

Fourier transform

High frequency

Low frequency

Frequency

1

2

3

2

1

3

Slide11

Windowed Fourier transform

*

=

Fourier basis functions

Window

function

windowed basis

functions

Windowed

Fourier

Transform

t

f

Slide12

Spontaneous EEG

Fourier power

f (Hz)

t (sec)

amplitude

Fourier

|X(f)|

2

EEG

x(t)

Time-frequency

|X(t,f)|

2

(wavelet transform)

Time-frequency patterns

(“bumps”)

Slide13

Signatures of local synchrony

f (Hz)

t (sec)

Time-frequency patterns

(“bumps”)

EEG stems from

thousands

of neurons

bump

if

neurons

are

phase-locked

= local synchrony

Slide14

Alzheimer's disease

Inside glimpse: abnormal EEGAD vs. MCI (Hogan et al. 203; Jiang et al., 2005)AD vs. Control (Hermann, Demilrap, 2005,

Yagyu et al. 1997; Stam et al., 2002; Babiloni

et al. 2006)MCI

vs.

mildAD

(

Babiloni

et al., 2006).Decrease of synchronyBrain “slow-down”

slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz)

(

Babiloni

et al., 2004;

Besthorn et al., 1997; Jelic et al. 1996,

Jeong

2004;

Dierks

et al., 1993).

Images: www.cerebromente.org.br

EEG system: inexpensive, mobile, useful for screening

focus of this project

Slide15

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide16

Comparing EEG signal rhythms ?

PROBLEM I:

Signals of 3

seconds sampled at 100 Hz (

 300

samples

)

Time-frequency representation of one signal = about

25 000 coefficients

2 signals

Slide17

Numerous

neighboring pixels

Comparing EEG signal rhythms ?(2)

One pixel

PROBLEM II:

Shifts in time-frequency!

Slide18

Sparse representation: bump model

Assumptions:

time-frequency

map is suitable representation

oscillatory bursts

(“bumps”) convey key information

Bumps

Sparse representation

F. Vialatte et al. “

A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).

Normalization:

10

4

- 10

5

coefficients

about 10

2

parameters

t (sec)

f(Hz)

f(Hz)

t (sec)

f(Hz)

t (sec)

Slide19

Similarity of bump models...

How “similar” or

“synchronous

are two bump models

?

=

GLOBAL

synchrony

Reminder: bumps due to LOCAL synchrony= MULTI-SCALE approach

Slide20

... by matching bumps

y

1

y

2

Some bumps

match

Offset

between matched bumps

SIMILAR

bump models if:

Many

matches

Strongly

overlapping

matches

Slide21

... by matching bumps (2)

Bumps in one model, but NOT in other → fraction of “spurious” bumps ρ

spur

Bumps in

both

models, but with offset

→ Average time offset

δ

t

(delay) → Timing jitter with variance st → Average frequency offset δf

→ Frequency jitter with variance

sf

Synchrony:

only

st

and

ρ

spur

relevant

PROBLEM:

Given two bump models, compute

(

ρ

spur

,

δt,

st, δf

, sf )

Stochastic Event Synchrony (SES) =

(ρspur,

δ

t

,

s

t

,

δ

f

,

s

f

)

Slide22

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide23

Average synchrony

3. SES for each pair of models

4. Average the SES parameters

Group electrodes in regions

Bump model for each region

Slide24

Beyond pairwise interactions...

Pairwise similarity

Multi-variate similarity

Slide25

...by clustering

y

1

y

2

y

3

y

4

y

5

y

1

y

2

y

3

y

4

y

5

Constraint: in each cluster

at most

one bump from each signal

Models similar if

few deletions/large clusters

little jitter

HARD combinatorial problem!

Slide26

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide27

EEG Data

EEG data provided by Prof. T. Musha

EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched

control subjects (CTR) recorded while in rest with closed eyes

spontaneous EEG

All 22 MCI patients suffered from Alzheimer’s disease (AD) later on Electrodes located on 21 sites according to 10-20 international system Electrodes grouped into 5 zones (reduces number of pairs) 1 bump model per zone Used continuous “artifact-free” intervals of 20s

Band pass filtered between 4 and 30 Hz

Slide28

Similarity measures

Correlation

and coherence

Granger causality (linear system):

DTF, ffDTF, dDTF, PDC, PC, ...

Phase Synchrony:

compare

instantaneous phases

(wavelet/Hilbert transform)

State space

based measures

sync likelihood, S-estimator, S-H-N-indices, ...

Information-theoretic

measures

KL divergence, Jensen-Shannon divergence, ...

No Phase Locking

Phase Locking

TIME

FREQUENCY

Slide29

Slide30

Sensitivity (average synchrony)

Granger

Info. Theor.

State Space

Phase

SES

Corr/Coh

Mann-Whitney test:

small p value suggests large difference in statistics of both groups

Significant differences for ffDTF and

ρ

!

Slide31

Classification

Clear separation, but not yet useful as diagnostic toolAdditional indicators needed (fMRI, MEG, DTI, ...)

Can be used for screening population (inexpensive, simple, fast)

ffDTF

Slide32

Strong (anti-) correlations „families“ of sync measures

Correlations

Slide33

OverviewAlzheimer’s Disease (AD)EEG of AD patients: decrease in synchronySynchrony measure in time-frequency domain

Pairs of EEG signalsCollections of EEG signalsNumerical ResultsOutlook

Slide34

Ongoing work

Time-varying similarity parameters

st

low s

t

high s

t

high s

t

no stimulus

no stimulus

stimulus

low s

t

high s

t

high

s

t

Slide35

Future workMatching event

patterns instead of single events

= allows us to extract

patterns in time-frequency map of EEG!

HYPOTHESIS:

Perhaps specific patterns occur in time-frequency EEG maps

of

AD patients

before onset of

epileptic seizuresREMARK:Such patterns are ignored by classical approaches: STATIONARITY/AVERAGING!

coupling between

frequency bands

t (sec)

f(Hz)

Slide36

ConclusionsMeasure for similarity of

point processes („stochastic event synchrony“)Key idea: alignment of eventsSolved by statistical inference

Application: EEG synchrony of MCI patients

About 85%

correctly classified; perhaps useful for

screening

population

Ongoing/future work

: time-varying SES, extracting patterns of bumps

Slide37

References + software

ReferencesQuantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and Application to Neural Signals, Neural Computation (under revision)A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision)

Measuring Neural Synchrony by Message Passing, NIPS 2007

Quantifying the Similarity of Multiple Multi-Dimensional Point Processes by Integer Programming with Application to Early Diagnosis of Alzheimer's Disease from EEG,

EMBC 2008 (submitted)

Software

MATLAB implementation

of the synchrony

measures

Slide38

Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG

Justin DauwelsLIDS, MITLMI, Harvard Medical SchoolAmari Research Unit, Brain Science Institute, RIKENJune 9, 2008

Slide39

Machine learning for neuroscience

Multi-scale in time and spaceData fusion: EEG, fMRI, spike data, bio-imaging, ...Large-scale inference

Visualization

Behavior ↔ Brain ↔ Brain Regions ↔ Neural Assemblies ↔ Single neurons ↔ Synapses ↔ Ion channels

Slide40

Estimation

Deltas: average offset

Sigmas: var of offset

...where

Simple closed form expressions

artificial observations (conjugate prior)

Slide41

Large-scale synchrony

Apparently, all brain regions affected...

Slide42

Alzheimer's disease

Outside glimpse: the future (prevalence)USA (Hebert et al. 2003)

World (

Wimo

et al. 2003)

Million of sufferers

Million of sufferers

2% to 5% of people over 65 years old

Up to 20% of people over 80

Jeong 2004 (Nature)

Slide43

Ongoing and future work

Applications alternative inference techniques (e.g., MCMC, linear programming) time dependent (Gaussian processes)

multivariate (T.Weber)

Fluctuations of EEG synchrony

Caused by auditory stimuli and music (T. Rutkowski)

Caused by visual stimuli (F. Vialatte)

Yoga professionals (F. Vialatte)

Professional shogi players (RIKEN & Fujitsu)

Brain-Computer Interfaces (T. Rutkowski)

Spike data from interacting monkeys (N. Fujii)

Calcium propagation in gliacells (N. Nakata)Neural growth (Y. Tsukada & Y. Sakumura)

...

Algorithms

Slide44

Fitting bump models

Signal

Bump

Initialisation

After adaptation

Adaptation

gradient method

F. Vialatte et al. “

A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics

”, Neural Networks (2007).

Slide45

Boxplots

SURPRISE!No increase in jitter, but significantly less matched activity!Physiological interpretation neural assemblies more localized

? harder to establish large-scale synchrony?

Slide46

Similarity of bump models...

How “similar” or “

synchronous

are two bump models?

Slide47

POINT ESTIMATION: θ(i+1)

= argmaxx log p(y, y’, c(i+1) ,θ ) Uniform prior p(

θ): δ

t, δ

f

= average offset,

s

t

, sf = variance of offset Conjugate prior p(θ): still closed-form expressionOther kind of prior p(θ): numerical optimization (gradient method)

Probabilistic inference

Slide48

MATCHING: c(i+1)

= argmaxc log p(y, y’, c, θ(i) )ALGORITHMS

Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm)

Linear programming

relaxation: extreme points of LP polytope are

integral

Max-product

algorithm

gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]EQUIVALENT to (imperfect) bipartite max-weight matching problemc(i+1) = argmaxc log p(y, y’, c,

θ(i) ) = argmaxc

Σkk’ w

kk’(i)

ckk’

s.t. Σ

k’

c

kk’

≤ 1

and

Σk

ckk’ ≤ 1

and ckk’ 2

{0,1}

Probabilistic inference

not necessarily perfectfind heaviest set of disjoint

edges

Slide49

p(y, y’, c, θ

) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ;

δt ,s

t,kk’) N(f k’

– f

k ;

δ

f ,

sf, kk’) β-2)ckk’Max-product algorithm

MATCHING: c

(i+1) = argmaxc log p(y, y’, c

, θ(i)

)

Generative model

Slide50

Max-product algorithm

MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i)

)

μ↑

μ↑

μ↓

μ↓

Conditioning on

θ

Slide51

Max-product algorithm (2)

Iteratively compute messages

At convergence, compute

marginals

p(c

kk’

) =

μ↓

(c

kk’) μ↓(ckk’) μ↑(ckk’) Decisions

: c*kk’ = argmaxckk’ p(ckk’)

Slide52

Algorithm

MATCHING → max-productESTIMATION → closed-formPROBLEM: Given two bump models, compute (

ρspur,

δt,

s

t

,

δ

f, sf )APPROACH: (c*,θ*) = argmaxc,θ

log p(y, y’, c, θ)

θ

SOLUTION:

Coordinate descent

c

(i+1)

= argmax

c

log p(y, y’,

c

, θ(i)

) θ(i+1)

= argmaxx log p(y, y’, c(i+1)

,θ )

Slide53

Generative model

Generate bump model (

hidden

)

geometric prior for number n of bumps

p(n) = (1-

λ

S) (

λ

S)

-n

bumps are uniformly distributed in rectangle

amplitude, width (in t and f) all i.i.d

.

Generate two

“noisy” observations

offset between hidden and observed bump

= Gaussian random vector with

mean

( ±

δ

t

/2, ±

δ

f

/2) covariance diag(st

/2, sf /2) amplitude, width (in t and f) all i.i.d. “deletion” with probability pd

y

hidden

y

y

Easily

extendable

to more than 2 observations…

( -

δ

t

/2, -

δ

f

/2)

(

δ

t

/2,

δ

f

/2)

Slide54

Generative model (2)

Binary variables ckk’ ckk’ = 1 if k and k’ are observations of same hidden bump, else

ckk’ = 0

(e.g., c

ii’

= 1 c

ij’

= 0

)

Constraints: bk = Σk’ ckk’ and bk’ = Σk ckk’

are binary (“matching constraints”)

Generative Model p(y, y’, yhidden ,

c, δt

, δ

f , st

,

s

f

) (

symmetric in y and y’)

Eliminate yhidden

→ offset is Gaussian RV with mean = ( δt

, δf ) and covariance

diag (st , sf) Probabilistic Inference:

(c*,θ*) = argmaxc,θ log p(y, y’, c

, θ)

y

y

( -

δ

t

/2, -

δ

f

/2)

(

δ

t

/2,

δ

f

/2)

i

i’

j’

p(y, y’,

c

,

θ

) =

p(y, y’, y

hidden

,

c

,

θ

) d

y

hidden

θ

Slide55

Bumps in one model, but NOT in other → fraction of “spurious” bumps ρspur

Bumps in

both models, but with offset → Average time offset

δ

t

(delay)

→ Timing jitter with variance

st → Average frequency offset δf → Frequency

jitter with variance sf

PROBLEM:

Given two bump models, compute (

ρ

spur

,

δ

t

,

s

t

,

δf,

sf )

APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)

θ

Summary

Slide56

Objective function

Logarithm of model: log p(y, y’, c, θ) = Σkk’ wkk’ ckk’ + log I(c

) + log pθ(θ) +

γ

w

kk

= -

(

1/st (t k’ – tk – δt)2 + 1/sf (f k’ – fk–

δf)2 ) - 2

log β

β = p

d (λ/V)1/2

Euclidean distance between bump centers

Large

w

kk

if : a) bumps are

close

b)

small pd c) few bumps per volume element

No need to specify pd , λ, and V, they only appear through β = knob to control # matches

y

y

( -

δ

t

/2, -

δ

f

/2)

(

δ

t

/2,

δ

f

/2)

i

i’

j’

Slide57

Distance measures

wkk’ = 1/st,kk’ (t k’ – tk – δ

t)2 +

1/sf,kk’ (f

k’

– f

k

δf)2 + 2 log βst,kk’ = (Δtk + Δt’

k) st s

f,kk’ = (Δf

k + Δ

f’k)

sf

Scaling

Non-Euclidean

Slide58

p(y, y’, c, θ

) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ;

δt ,s

t,kk’) N(f k’

– f

k ;

δ

f ,

sf, kk’) β-2)ckk’Generative model

Slide59

Expect bumps to appear at about

same frequency, but delayed Frequency shift requires non-linear transformation, less likely than delayConjugate priors for st and sf (scaled inverse chi-squared):

Improper prior for

δt

and

δ

t

: p(δt) = 1 = p(δf)

Prior for parameters

Slide60

CTR

MCIPreliminary results for multi-variate model

linear comb of p

c

Slide61

Probabilistic inference

MATCHINGPOINT ESTIMATIONPROBLEM: Given two bump models, compute (ρ

spur,

δt,

s

t

,

δ

f, sf )APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’,

c, θ)

θ

SOLUTION:

Coordinate descent

c

(i+1)

= argmax

c

log p(y, y’,

c

, θ(i)

) θ(i+1) = argmax

x log p(y, y’, c(i+1)

,θ )

X

Y

Min

x

2

X

,

y

2

Y

d(

x

,

y

)

Slide62

Generative model

Generate bump model (

hidden

)

geometric prior for number n of bumps

p(n) = (1-

λ

S) (

λ

S)-n bumps are uniformly distributed in rectangle

amplitude, width (in t and f) all i.i.d.

Generate M

“noisy” observations offset between hidden and observed bump

= Gaussian random vector with mean (

δt,m

/2,

δ

f,m

/2)

covariance

diag

(st,m/2,

sf,m /2)

amplitude, width (in t and f) all i.i.d.

“deletion” with probability pd (other prior pc0 for cluster size)

yhidden

y

1

y

2

y

3

y

4

y

5

Parameters:

θ

=

δ

t,m

,

δ

f,m ,

s

t,m

, s

f,m

,

p

c

p

c

(i) = p(cluster size = i |y) (i = 1,2,…,M)

Slide63

(Hebb 1949, Fuster 1997)

Stimuli

Consolidation

Stimulus

Voice

Face

Voice

Role of local synchrony

Assembly activation

Hebbian consolidation

Assembly recall

Slide64

Probabilistic inference

CLUSTERING (IP or MP)POINT ESTIMATIONPROBLEM: Given M bump models, compute

θ = δ

t,m , δ

f,m ,

s

t,m

, s

f,m

, pcAPPROACH: (c*,θ*) = argmaxc,θ

log p(y, y’, c, θ)

SOLUTION:

Coordinate descent

c(i+1) = argmax

c log p(y, y’, c,

θ

(i)

)

θ

(i+1) = argmax

x log p(y, y’, c(i+1)

,θ )

Integer program

Max-product algorithm (MP) on sparse graph Integer programming methods (e.g., LP relaxation)