User Manual of Mining Mouse Vocalizations
Prepared by Jesin Zakaria and Eamonn Keogh
CREATE SPECTROGRAM
Run the code createSpectro.m to create a spectrogram from a .wav file, idealize the spectrogram, and extract candidate syllables from the idealized spectrogram.
Try the following example. Set:
rec = '..\031611KOKO02MATED.wav'; % path and name of the .wav file
D = '...\031611KOKO02MATEDspectro\'; % location of the folder that will contain the syllables
Depending on the size of main memory and of the recording, set the range of the for loop.
In each iteration we create the spectrogram of two minutes of the recording; this value can be changed to create the spectrogram of a longer section of the recording.
RUNNING TIME: Since the running time is faster than real time, we did not include a running time analysis in our paper. For example:
It took on average (12.95 + 12.81 + 12.67)/3 = 12.81 seconds to create the spectrogram of a two-minute-long recording.
It took 85.7 seconds to extract connected components from the idealized spectrogram of a six-minute-long recording.
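CREATE SPECTROGRAM
Use the following code to create the idealized spectrogram.
rec = 'C:\Users\Jesin\Desktop\temp\031611KOKO02MATED.wav';
t1 = 124000*250; % first sample to read
t2 = 125000*250; % last sample to read
[Y, FS] = wavread(rec,[t1,t2]); % in MATLAB releases without wavread, audioread(rec,[t1,t2]) reads the same sample range
[y,F,T,P] = spectrogram(Y,512,256,512,FS,'yaxis'); % window 512, overlap 256, 512-point FFT
C = -10*log10(P); % power on a negative decibel scale
C(C<35) = 0; % discard values below 35
C(C>80) = 0; % discard values above 80
C(C~=0) = 1; % binarize what remains
imshow(~C); % display the idealized spectrogram
Figure 1: Idealized spectrogram of a laboratory mice recording (x-axis: time, 124 to 125 seconds; y-axis: frequency, 40 to 100 kHz).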
EXTRACT CANDIDATE SYLLABLES
In createSpectro.m we marked the part of the code that extracts candidate syllables.
Results of all filtering steps are included in extractcandidatesyllable.zip.
The folder …\031611KOKO02MATEDspectro contains all connected components with duration >10 and <300 and within the frequency range 30 to 110 kHz.
The folder …\031611KOKO02MATED contains all candidate syllables after filtering out some noise and excluding all but one of the syllables that appear at the same timestamp.
The folder …\sametime contains the syllables that were excluded for appearing at the same timestamp.
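The duration and frequency filter described for the connected components can be sketched as follows. This is only an illustration, not the code in createSpectro.m: the bounding-box layout, the variable names and all values are made up, and the unit of duration is an assumption because the manual does not state it.

```matlab
% Hypothetical bounding boxes of connected components, one per row:
% [start_time, low_freq_kHz, duration, bandwidth_kHz] (made-up values).
boxes = [ 100  45   5  20;    % too short (duration <= 10)
          200  50  60  30;    % satisfies every condition
          400  20  80  40;    % starts below 30 kHz
          700  60 350  25;    % too long (duration >= 300)
          900  90  40  30 ];  % extends above 110 kHz

duration = boxes(:, 3);
fLow     = boxes(:, 2);
fHigh    = boxes(:, 2) + boxes(:, 4);

% Thresholds taken from the text; the unit of duration is an assumption.
keep = duration > 10 & duration < 300 & fLow >= 30 & fHigh <= 110;
candidates = boxes(keep, :);
disp(candidates);
```

Keeping the test as one logical mask makes it easy to inspect which condition rejected a component before anything is written to a folder.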
CLASSIFY CANDIDATE SYLLABLES
Run the code classifySyllables.m
Requires:
labelGrndTruth.txt, which contains the labels of the ground truth
theta.txt, which contains thresholds for each class. The mean, sigma, mean+sigma and mean+2*sigma for each class of syllables in the ground truth are included in columns 1, 2, 4 and 5 of theta.txt
Normalized ground truth
Candidate syllable bitmaps
List of candidate syllables in sorted order
Result:
For our sample example,
‘dis031611KOKO02MATED.txt’ contains the distances of the candidate syllables to the ground truth
‘label 031611KOKO02MATED.txt’ contains the labels of all the candidate syllables
If you want to see the class distribution, uncomment the code for class distribution in classifySyllables.m
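One way a per-class threshold can be combined with distances to the ground truth is nearest-neighbor labeling with rejection. The sketch below assumes that rule; the manual does not spell out the rule used in classifySyllables.m, and the distances, labels and thresholds are made up.

```matlab
% Made-up distances from 3 candidates (rows) to 4 ground-truth syllables (columns).
dis = [0.20 0.90 0.80 0.70;
       0.90 0.60 0.30 0.80;
       0.90 0.95 0.85 0.90];
gtLabel = [1 1 2 2];   % hypothetical class of each ground-truth syllable
theta   = [0.5 0.4];   % hypothetical threshold for class 1 and class 2

[dMin, idx] = min(dis, [], 2);   % nearest ground-truth syllable per candidate
label = gtLabel(idx);
label = label(:);                % class of the nearest neighbor, as a column
thr = theta(label);
thr = thr(:);                    % threshold of that class, as a column

label(dMin > thr) = 0;           % assumption: 0 marks an unknown class
disp(label');
```

A candidate keeps the class of its nearest ground-truth syllable only when the distance is within that class's threshold; otherwise it is marked unknown.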
CLASSIFY CANDIDATE SYLLABLES
Normalization method
In our paper we said that all the candidate syllables and the ground truth are normalized before computing the GHT distance between them. For brevity, however, we did not include details about our normalization method, and we did not validate it there.
In the next slide we will present details about our normalization method.
CLASSIFY CANDIDATE SYLLABLES
Normalization method
Set: 16 syllables of class 1, 3, 4 and 11 (non-confusing classes)
Syllables that are not clustered correctly are marked with a red circle.
GHT is calculated without normalizing the syllables.
CLASSIFY CANDIDATE SYLLABLES
Normalization method
Set: 16 syllables of class 1, 3, 4 and 11 (non-confusing classes)
Some syllables are still not clustered correctly, as is evident from the following figure.
GHT is calculated after normalizing the syllables by dividing x and y by the larger dimension (row or column).
Same set of syllables after normalization
CLASSIFY CANDIDATE SYLLABLES
Normalization method (the one we used in our paper)
Set: 16 syllables of class 1, 3, 4 and 11 (non-confusing classes)
All the syllables except one (marked with an arrow) are clustered correctly, as is evident from the following figure.
GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively.
Same set of syllables after normalization
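The two normalizations compared in these slides can be sketched on a toy bitmap. The bitmap and variable names are made up, and reading "size of row" as the number of columns (the length of a row) is an assumption about the wording.

```matlab
% Hypothetical binary bitmap of a syllable (1 = syllable pixel), 4 rows x 8 columns.
B = zeros(4, 8);
B(2, 3:6) = 1;
B(3, 6:8) = 1;

[y, x] = find(B);          % row (y) and column (x) index of every syllable pixel
[nRow, nCol] = size(B);

% Normalization by the larger dimension (the variant that still misclusters).
m  = max(nRow, nCol);
xL = x / m;
yL = y / m;

% Normalization by the bitmap's own width and height (assumed reading of
% "dividing x and y by the size of row and column respectively").
xN = x / nCol;
yN = y / nRow;

fprintf('larger-dimension max: x = %.3f, y = %.3f\n', max(xL), max(yL));
fprintf('per-axis max:         x = %.3f, y = %.3f\n', max(xN), max(yN));
```

With per-axis scaling both coordinates use the full (0, 1] range whatever the aspect ratio of the bitmap, whereas scaling by the larger dimension leaves the shorter axis compressed.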
CLASSIFY CANDIDATE SYLLABLES
Normalization method (the one we used in our paper)
Set: 16 syllables of class 1 and 27 syllables of class 9 (confusing classes)
GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively.
Same set of syllables after normalization
EDITING GROUND TRUTH
[Plot: Classification Accuracy (y-axis, 0 to 1) against Adding more instances (x-axis, 0 to 700); curves: for edited ground truth, for all the labeled syllables]
Run accuracyGrndTrth.m to generate the plot.
It requires editMatrix.txt, dis692.txt and label692.txt.
DESCRIPTION OF THE FILES
In our paper we mentioned the 692 syllables annotated by the domain expert. Instead of using those 692 syllables as ground truth, we used a data editing technique that resulted in a set of 108 syllables, which we used as GROUNDTRUTH for our experiments.
1. editMatrix.txt contains the result of editing the 692 annotated syllables. Columns 2, 3, 4 and 5 represent the number of syllables added to the ground truth, the class label of the syllable, the total number of syllables classified using the edited ground truth, and the accuracy rate.
2. dis692.txt contains the GHT distances of the 692 annotated syllables.
3. label692.txt contains the class labels of the 692 syllables.
groundtruth.zip contains the set of 692 syllables and the set of 108 syllables that we mentioned in our paper.
MOTIF DISCOVERY
Run findMotif.m to find motifs from a vocalization.
[Figure panels: 944.7 – 945.2 sec; 194.8 – 195.2 sec]
Instruction:
In findMotif.m you need to change the locations of the folder that will contain the motifs, the .wav file, the list of syllables, and the labels of the syllables.
Also create folders, e.g. …/motif/6 and …/motif/7, before running the code. These folders will contain motifs of length 6, 7, etc.
motif.zip contains motifs from the attached .wav file.
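The motif folders can be created with a short loop before findMotif.m is run. This is a minimal sketch: motifRoot is a hypothetical location under the temporary directory and should be replaced with the folder configured in findMotif.m, and the range 6:7 simply mirrors the example lengths in the instruction.

```matlab
motifRoot = fullfile(tempdir, 'motif');   % hypothetical location; replace with your own
for len = 6:7                             % one folder per motif length
    d = fullfile(motifRoot, num2str(len));
    if ~exist(d, 'dir')
        mkdir(d);                         % also creates motifRoot if it is missing
    end
end
```

Checking with exist first avoids the warning that mkdir issues when a folder is already there.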
Clustering mice vocalizations
Run clusterMtf.m to cluster motifs from mice vocalizations.
The folder ‘dendo_mice’ contains all the required files used to generate the dendrograms of figure 12 and figure 13.
QUERY
Similarity search / Query by content
ddqd (‘q’ means unknown class)
Some additional results are attached here.
The 10 nearest neighbors (10 NN) from four vocalizations are presented.
QUERY
Similarity search / Query by content
qaiaiacia (‘q’ means unknown class)
Some additional results are attached here.
The 10 nearest neighbors (10 NN) from four vocalizations are presented.
Motif Significance
Run mtfSgnfnc.m to assess the significance of motifs based on their z-score.
The folder ‘../mtfSgnfcn’ contains all the required files used to generate the plot of figure 17.
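A z-score compares an observed motif count with the counts expected by chance. The sketch below assumes the chance counts come from shuffled label sequences; that choice and all the numbers are made up for illustration and may differ from what mtfSgnfnc.m does.

```matlab
% Made-up numbers for illustration only.
observed   = 12;                      % occurrences of a motif in the real vocalization
nullCounts = [3 5 4 6 2 5 4 3 5 3];   % occurrences in hypothetical shuffled sequences

% z-score: distance of the observed count from the null mean, in null standard deviations.
z = (observed - mean(nullCounts)) / std(nullCounts);
fprintf('z-score = %.2f\n', z);
```

A large positive z means the motif occurs far more often than the null sequences would suggest.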
Contrast sets
createContrastset.m is used to create the contrast sets.
contratset.m is used to extract the patterns in contrast sets from a vocalization.
The folder ‘../contrastSet’ contains some examples of the contrast sets that we mentioned in our paper. It also contains the files needed by createContrastset.m.
‘contrastset.txt’ contains the list of substrings sorted in descending order of their information gain.
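The information gain used for this ranking can be illustrated for a single substring. The sketch assumes a two-class setting in which each vocalization either contains the substring or does not; the data and variable names are made up, and createContrastset.m may compute the gain differently.

```matlab
% Made-up data: each entry is one vocalization from one of two classes.
present = logical([1 1 1 0 0 0 1 0]);   % does the substring occur in the vocalization?
classA  = logical([1 1 1 1 0 0 0 0]);   % true = class A, false = class B

% Entropy (in bits) of a probability vector, skipping zero entries.
H = @(p) -sum(p(p > 0) .* log2(p(p > 0)));
% Class entropy of the vocalizations selected by the logical mask m.
classEntropy = @(m) H([mean(classA(m)), mean(~classA(m))]);

everything = true(size(classA));
% Information gain: class entropy before the split minus the weighted
% entropy after splitting on presence of the substring.
% (Both groups are non-empty here; an empty group would need special handling.)
gain = classEntropy(everything) ...
     - mean(present)  * classEntropy(present) ...
     - mean(~present) * classEntropy(~present);
fprintf('information gain = %.4f bits\n', gain);
```

Computing this value for every candidate substring and sorting in descending order gives a list of the kind described for contrastset.txt.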
Questions or comments? Email jzaka001@cs.ucr.edu