Slide1
Writer identification through information retrieval
Ralph Niels, Franc Grootjen & Louis Vuurpijl
August 21st, 2008
ICFHR, Montreal
Slide2
A search engine for forensic experts
Slide3
Overview
Forensic writer identification
Prototypical shapes in handwriting
Information retrieval (IR)
  Traditional
  Writer identification using prototypes
Experiments
  Method
  Results
Conclusions & future work
Slide4
Forensic writer identification
Slide5
Forensic information retrieval
Web search: query of words to search in documents containing words
Forensic search: query of characters to search in documents containing characters
Previous work*: sub-character level, binary features
Based on characters: improves justification possibilities
* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recogn. Letters, 26(13):2080–2092, 2005.
Slide6
Forensic information retrieval
Dictionary of character shapes: prototypes
Experts use prototypes
Describe query & documents by prototype usage
[Figure labels: "instances of prototype", "Prototypes"]
Slide7
Character to prototype matcher
Find most similar prototype for each character
Example: W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)
Prototypes: a5 a9 a16 a52 (…)
Slide8
Prototypes
Averaged shapes of real handwritten characters
Dynamic Time Warping (DTW) distance to find most similar prototype
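The matching step can be sketched as follows. This is a minimal textbook DTW over lists of (x, y) pen coordinates, not necessarily the DTW variant used by the authors. The names `dtw_distance` and `nearest_prototype` and the toy prototype shapes are assumptions made for illustration only.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW between two point sequences, using Euclidean local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_prototype(char_points, prototypes):
    """Return the ID of the prototype with the smallest DTW distance."""
    return min(prototypes, key=lambda pid: dtw_distance(char_points, prototypes[pid]))

# Hypothetical toy prototypes; real prototypes are averaged handwritten shapes.
prototypes = {
    "a5": [(0, 0), (1, 1), (2, 0)],
    "a9": [(0, 0), (1, 0), (2, 0)],
}
sample = [(0, 0), (0.5, 0.1), (1, 0.1), (2, 0)]
print(nearest_prototype(sample, prototypes))
```

Applying `nearest_prototype` to every segmented character gives a sequence of prototype IDs of the kind shown on Slide7.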
R. Niels, L. Vuurpijl & L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 21, No. 1, pages 61-81, February 2007.
[Figure label: "Prototypes"]
Slide9
The IR model for writer identification
[Diagram]
Writer input -> Character to prototype matcher -> af(w) -> Indexing -> aw(w)
Query input -> Character to prototype matcher -> af(q)
Prototype list -> Character to prototype matcher
af(q), aw(w) -> Matching -> Ranked list, Justification
Slide10
Indexing: create weighted vectors
Vector of prototype usage for each writer: af(w)
Adjust weight of prototypes in that vector:
  Prototypes used by many writers: not distinctive -> lower weight
  wf(p) = number of writers using prototype p
Weighted vector of prototype use for each writer: aw(w)
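A sketch of the indexing step, under stated assumptions. The slides define wf(p) and mention an inverse weight iwf(p) but do not give its formula, so the idf-style weight log(N / wf(p)) used here is an assumption borrowed from standard information retrieval. The function name `index_writers` and the toy data are hypothetical.

```python
import math
from collections import Counter

def index_writers(writer_protos):
    """writer_protos maps writer -> list of prototype IDs (one per character).

    Returns (aw, iwf): the weighted vectors aw(w) and the per-prototype weights.
    """
    # af(w): how often each writer uses each prototype
    af = {w: Counter(protos) for w, protos in writer_protos.items()}
    n_writers = len(af)
    # wf(p): number of writers using prototype p
    wf = Counter()
    for counts in af.values():
        wf.update(counts.keys())
    # ASSUMPTION: idf-style inverse writer frequency; the talk gives no formula.
    iwf = {p: math.log(n_writers / wf[p]) for p in wf}
    # aw(w): prototype usage scaled by distinctiveness
    aw = {w: {p: c * iwf[p] for p, c in counts.items()} for w, counts in af.items()}
    return aw, iwf

# Hypothetical toy data
writers = {
    "w1": ["a9", "a9", "t1"],
    "w2": ["a5", "t1"],
    "w3": ["a9", "t1", "t1"],
}
aw, iwf = index_writers(writers)
print(aw["w1"])
```

With this particular weight, a prototype used by every writer (here `t1`) gets weight zero and so does not help to distinguish writers.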
Slide11
The IR model for writer identification
[Diagram as on Slide9]
af(q): prototype frequency in query
Slide12
The IR model for writer identification
[Diagram as on Slide9]
Slide13
Matching
Input
  ‘Database writers’: Indexed writer vectors aw(w)
  ‘Query writer’: Vector af(q)
Match:
  Calculate cosine of angle between af(q) and each aw(w)
Output
  Ranked list of writers (similarity to query)
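The matching step can be sketched with sparse vectors stored as dictionaries. The function names `cosine` and `rank_writers` and the toy vectors are assumptions for illustration; only the cosine measure itself comes from the slide.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dict: prototype -> value)."""
    dot = sum(val * v.get(p, 0.0) for p, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_writers(af_q, aw):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = [(w, cosine(af_q, vec)) for w, vec in aw.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical indexed writer vectors aw(w) and query vector af(q)
aw = {
    "w1": {"a9": 2.0, "t1": 1.0},
    "w2": {"a5": 3.0, "t1": 1.0},
}
af_q = {"a9": 4, "t1": 2}
ranking = rank_writers(af_q, aw)
print(ranking)
```

Because the cosine ignores vector length, a short query can be compared against a writer represented by many more characters.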
Slide14
The IR model for writer identification
[Diagram as on Slide9]
Slide15
Justification
Similarity value (cosine of angle)
Prototype contribution to retrieval result
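One plausible way to obtain a per-prototype contribution is to split the cosine into its summands, since the cosine is a sum of one term per shared prototype. The slides do not say how the contribution is actually computed, so this decomposition, the function name `prototype_contributions`, and the toy vectors are assumptions.

```python
import math

def prototype_contributions(af_q, aw_w):
    """Split the cosine similarity into one term per shared prototype.

    The returned values sum to the cosine of the angle between the vectors.
    """
    norm_q = math.sqrt(sum(x * x for x in af_q.values()))
    norm_w = math.sqrt(sum(x * x for x in aw_w.values()))
    if norm_q == 0 or norm_w == 0:
        return {}
    contrib = {
        p: af_q[p] * aw_w[p] / (norm_q * norm_w)
        for p in af_q
        if p in aw_w
    }
    # Largest contribution first, so the most telling prototypes lead the list
    return dict(sorted(contrib.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical query vector af(q) and one writer vector aw(w)
af_q = {"a9": 4, "t1": 2, "d16": 1}
aw_w = {"a9": 2.0, "t1": 1.0}
contrib = prototype_contributions(af_q, aw_w)
print(contrib)
```

A list like this lets an expert see which character shapes drove a given ranking position.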
Slide16
Justification
Forensic expert can further inspect justification
Slide17
Experiment
43 writers from plucoll database
  Online data
  Segmented into characters
How well does our technique perform given a certain amount of data (characters)?
  Number of characters in database (d)
  Number of characters in query (q)
Slide18
Experiment
Pick d random letters from each database writer
Pick q random other letters from one writer, and use those as query
Find most similar writer
  Prototypes
  iwf(p), aw(w)
  Matching
Vary d and q
Repeat 10 times for each writer
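The sampling protocol above can be sketched as follows. The identification method is passed in as a callable, so `identify` here is a hypothetical stand-in rather than the prototype-based pipeline from the talk. The names `sample_trial` and `accuracy`, the fixed seed, and the toy data are likewise assumptions.

```python
import random

def sample_trial(writer_chars, query_writer, d, q, rng):
    """Draw d characters per database writer, plus q other characters
    from query_writer to serve as the query."""
    database, query = {}, None
    for w, chars in writer_chars.items():
        if w == query_writer:
            # One draw of d + q items keeps database and query samples disjoint
            picked = rng.sample(chars, d + q)
            database[w], query = picked[:d], picked[d:]
        else:
            database[w] = rng.sample(chars, d)
    return database, query

def accuracy(writer_chars, d, q, identify, repeats=10, seed=0):
    """Percentage of trials in which identify() returns the query writer."""
    rng = random.Random(seed)
    hits = total = 0
    for query_writer in writer_chars:
        for _ in range(repeats):
            database, query = sample_trial(writer_chars, query_writer, d, q, rng)
            hits += identify(database, query) == query_writer
            total += 1
    return 100.0 * hits / total

# Hypothetical toy data: each character is already labelled with a prototype ID
writer_chars = {"w1": ["a9"] * 20, "w2": ["a5"] * 20}

def identify(database, query):
    """Stand-in identifier: the writer whose sample shares most labels with the query."""
    return max(database, key=lambda w: sum(c in database[w] for c in query))

result = accuracy(writer_chars, d=5, q=3, identify=identify)
print(result)
```

Calling `accuracy` for each combination of d and q would fill a grid of percentages, one cell per combination.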
Repeat 10 times for each combination of d and q
Slide19
Results
Values in %, by number of characters in database (d, columns) and number of characters in query (q, rows)

q \ d    100   300   500   1000
10        59    79    83     88
30        86    97    99    100
50        94    99   100    100
70        96   100   100    100
100       98   100   100    100
Slide20
Conclusions & future work
Needed for 100%: 70 chars (q), 300 chars (d)
  Average English sentence: 75-100 characters
No black box: results are justified
Online data: forensic practice?
  Extract semi-automatically with help of expert
  Use offline matching technique
Just 43 writers
  Bigger (n writers & n techniques) experiments planned
Promising results
Slide21
A search engine for forensic experts