Writer Identification and Verification


Presentation Transcript

Writer Identification and Verification for Online Handwriting. Sachin Gupta (sachin_g@students.iiit.ac.in). Advisor: Dr. Anoop M. Namboodiri.

Handwriting. A graphical representation of thoughts using predefined symbols; still used frequently (e.g., note taking). An acquired skill: years of habituation and practice. A complex generation process: a neuromuscular perceptual-motor task; the hand contains some 27 bones and 40 muscles.

Handwriting Identification. Handwritten documents have an associated identity. Handwriting identification is the study of the writership of documents, by comparison with reference handwritten documents.

Individuality (example) [figure; labels X, Y, X]

Recognition vs. Identification. Handwriting recognition: automatically understand the underlying text in the document; design automated systems for reading handwritten documents; suppress variation due to the writer or handwriting style. Handwriting identification: determine the writer of the document; enhance the variation due to different handwriting styles.

Problem Statement. Writer identification: identify the writer of a questioned document, given a pool of writers. Writer verification: verify whether a claimed identity is correct, given a database of writers. Forensic document analysis: verify whether two given documents were written by the same person.

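Identification (1:N matching) [figure]. The questioned document is compared against the reference database to answer "Who wrote this document?". The comparisons produce matching scores (35, 50, 65 in the example) and the result names a writer (Writer-3).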

Verification (1:1 matching) [figure]. Mayank claims: "I wrote this document!" The reference database holds Mayank, Sachin and Amit. A comparator checks the questioned document against the claimed writer: if distance < threshold the answer is Yes, otherwise No. The threshold is decided from the within-writer and between-writer distance distributions of the training documents.
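The two matching modes on the preceding slides can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the function names are hypothetical, a higher matching score is assumed to mean a better match, and the midpoint rule for the threshold is only one simple way to use the within-writer and between-writer distance distributions.

```python
from statistics import mean


def identify(scores):
    """1:N matching: return the writer whose matching score is highest.

    Assumption: a higher score means a better match.
    """
    return max(scores, key=scores.get)


def pick_threshold(within_distances, between_distances):
    """Illustrative rule: midpoint between the two mean training distances."""
    return (mean(within_distances) + mean(between_distances)) / 2


def verify(distance, threshold):
    """1:1 matching: accept the claimed identity if the distance is small enough."""
    return distance < threshold


scores = {"Writer-1": 35, "Writer-2": 50, "Writer-3": 65}
print(identify(scores))  # Writer-3

threshold = pick_threshold([1.0, 2.0], [5.0, 6.0])
print(verify(2.0, threshold), verify(4.0, threshold))  # True False
```

Identification needs one comparison per enrolled writer, while verification needs only the comparison against the claimed writer, which is why the slides label them 1:N and 1:1.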

Individuality Features. Sub-character and character level: shape and size; choice of allograph. Word level: connections and character spacing; aspect ratio. Line level: slant and slope; word spacing. Paragraph and page level: indentation and arrangement of text; uniformity of margins. [Figures: character-level and word-level individuality for writers W1 and W2.]

Line and Paragraph Level [figure: Writer-1 vs. Writer-2]. Slant and slope of lines; parallelism of lines; word spacing (number of words in a line); uniformity of margins; overall texture.

Challenges. High within-writer variation: handwriting depends on mood, and no two pieces of handwriting by any individual are the same. Low between-writer variation: handwriting must be readable, so the degree of variation is low.

Online vs. Offline. Offline: a matrix of integers; only shape and size information is available; temporal information about how a stroke was drawn is lost. Online: a sequence of x-y coordinates with pen-up and pen-down events; shape and size information is available, and so is the sequencing of points and strokes.
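The difference between the two representations can be made concrete with a small sketch. The `Stroke` type and `rasterize` function below are hypothetical names chosen for illustration; the point is only that converting online data to an integer matrix discards the order in which points and strokes were produced.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    # Ordered (x, y) samples recorded between one pen-down and the next pen-up.
    points: List[Tuple[float, float]]


def rasterize(strokes, width, height):
    """Offline view: mark every sampled point in a height x width integer matrix.

    The drawing order of points and strokes is not recoverable from the result.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke.points:
            col, row = int(round(x)), int(round(y))
            if 0 <= row < height and 0 <= col < width:
                image[row][col] = 1
    return image


forward = [Stroke([(0, 0), (1, 1)]), Stroke([(2, 0)])]
reverse = [Stroke([(2, 0)]), Stroke([(1, 1), (0, 0)])]
# Different online data, identical offline image.
print(forward != reverse, rasterize(forward, 3, 2) == rasterize(reverse, 3, 2))  # True True
```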

Data Collection and Annotation. A major hurdle: collection is a sequential process that needs devices for online handwriting; people are reluctant to write; standard databases are not available; online handwriting collection devices are not accurate. Automatic segmentation and annotation is a research problem in itself. Data collected: 600 pages from around 50 writers in various scripts.

State of the Art. Done by handwriting experts, mostly manually; state-of-the-art automated systems are not available. Experts use context-dependent information such as the origin, type and condition of the documents, which is difficult to model mathematically.

Theme. Identifying consistent features automatically, to discriminate between writers. Usability of discriminating features: preserve discrimination.

Major Contributions. Text-independent writer identification: designing a codebook of writers; automatically identifying and extracting discriminating features. Text-dependent writer verification: writer-specific text generation; robust to forgery. Forensic document examination: repudiation detection in handwritten documents.

Text-independent writer identification

Text-independent? The underlying text is not known and the data is not annotated. Given: a sequence of strokes and their x-y coordinate values. Challenges of the text-independent setting: extract consistent curves (features) from documents; compare similar features between two documents; design a codebook of individual writers.

Consistency [figure; labels X, Y]

Codebook of a writer [figure]. Six different clusters extracted from Devanagari script.
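A codebook of this kind can be sketched as a set of cluster centres over curve feature vectors. The slides elsewhere name K-means as the clustering method; everything else here is an assumption made for illustration: curves are taken to be already encoded as fixed-length numeric tuples, the initial centres are picked at evenly spaced positions in the input (a deterministic simplification), and the function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_codeword(vector, centers):
    """Index of the codebook entry closest to a curve's feature vector."""
    return min(range(len(centers)), key=lambda c: squared_distance(vector, centers[c]))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns k cluster centres (the writer's codebook)."""
    n = len(vectors)
    # Deterministic initialisation: evenly spaced input vectors (a simplification).
    centers = [tuple(vectors[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_codeword(v, centers)].append(v)
        # Recompute each centre as the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(column) / len(members) for column in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


curves = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans(curves, k=2)
print(nearest_codeword((9, 9), codebook))  # 1
```

With six clusters per writer, as on the slide, the call would be `kmeans(curves, k=6)` over that writer's extracted curves.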

Theoretical Background. Handwriting modeling studies: a stroke is the combination of different forces; handwriting curves become consistent due to habituation; the relative velocity points of strokes are constant for the same writer (empirical result). [Figures: a stroke from Devanagari script and the velocity profile of that stroke.]

Classifier: soft classification [figure]. NN 1, NN 2, NN 3, ..., NN n each classify writers, and their outputs form a combined result. Summarized framework: the questioned document is clustered into different clusters, followed by writer classification.

Results. Experimented with Roman, Hindi, Cyrillic, Arabic and Hebrew. Training data: approx. 300-400 curves for Roman; approx. 700-800 curves for the others. Test data: 100 curves for Roman; 200-300 curves for the others. Tables and graphs are on the following slides.

Varying the Number of Curves. Accuracy increases with the number of curves; >85% accuracy is reached with 200 curves (10-12 words). [Figure: accuracy with 12 words.]

Script vs. Accuracy. ~10 writers for all scripts. For most scripts, top-2 accuracy is nearly 100%; Chinese is the exception. [Figure: confusion between pairs of writers.]

Related Work. Line-level features: word spacing; lower and upper profile; fractal and wavelet features; loops and blobs. Paragraph-level features (image processing): grey-scale histogram; run-length coding; fractal image compression. Texture features: Gabor filter; wavelet; contourlet GGD; grey-scale covariance matrix. Online features: pen pressure, velocity, azimuth; velocity of barycenter. Codebook generation: using directional features. Our approach: codebook design using sub-character features; script-independent framework; online handwriting data; identification with a small amount of data; automatic identification of consistent and discriminating features.

Result Comparison. Schomaker et al. [28]: combination of directional, texture and image-processing features; identification accuracy of 87% with 900 writers; verification equal error rate of 3%-8%; test data size: 1 page of handwritten data. Our approach [5]: shape-based features; identification accuracy of ~85% with 15 writers; test data size: 12 words (1 line).

Analysis. Shape- and size-based primitives obtain reasonable accuracy with a simple algorithm. Chinese script: most of the strokes are straight line segments, so features based on inter-stroke relations can be used. To increase accuracy: a robust clustering and classification algorithm; fusion with high-level primitives such as line and paragraph features.

Text-dependent Writer Verification

Problem Statement. Text-independent systems need a large amount of data. A text-dependent framework gives higher accuracy with a small amount of data. Problems of text-dependent systems: forgery (the fixed text is known in advance); authentication text not known (usually random text is used).

Signature vs. Text-dependent. Signature and text-dependent handwriting: variations are unlimited, and a signature need not be readable; the writer consciously tries to write the same signature. Challenges: within-writer and between-writer variation have to be told apart; a discriminating distance measure has to be found.

System Specification. Empirical finding: the discriminating power of primitives (sub-characters, characters, words, etc.) varies between individuals. System specifications: writer-specific text, for higher accuracy with a limited amount of text; text varied across multiple authentications, for robustness to forgery.

Boosting? A classifier combination method: it combines weak classifiers to produce an accurate learning algorithm. A greedy algorithm: at each stage it selects a weak classifier based on the previously selected classifiers, and it maintains a distribution of weights over the training samples.

Framework. Verification as a 2-class problem: positive samples vs. negative samples. Given: a set of writers and primitives; a table of discriminating power. Randomness is included at each stage, proportional to the discriminating power of the classifier: the more discriminating a classifier, the more probable it is to be accepted.

Text Generation Process [figure: bag of primitives; list of writers W1-W6; "Select it or not?"; accuracy]. Randomness is included in the selection process. The selected threshold is biased towards accepting the writer, for lower false rejection rates. Fix the threshold and reject writers.

Effect of Boosting [figure for writer X1; axes: distance, probability; curves: within-writer distance and between-writer distance; shown against the number of boosting stages]

Dynamic Time Warping [figure: naïve alignment, re-sampled series, DTW alignment]. Time-series alignment with a dynamic programming approach; feature vectors of different lengths can be compared.
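The dynamic programming recurrence behind DTW is short enough to show in full. This is a generic textbook sketch over one-dimensional sequences with an absolute-difference local cost; the slides do not say which local cost or step pattern the author used, so both are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both, so sequences of different lengths align.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A repeated sample costs nothing, because one point may align with several.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

For strokes, each sequence element would be a point feature (for example a direction or a coordinate pair) and the local cost a distance between two such features.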

Stroke Comparison. Dynamic time warping: alignment of strokes is done using dynamic programming. Directional features: a stroke is represented by 12 bins of curvature directions, where the curvature angle is the difference between adjacent tangent directions. [Figure: example histogram over 0 to 360 degrees with bin counts 1 1 2 3 3 4 3 0 0 0 0 1.]
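A 12-bin curvature histogram of the kind described can be computed directly from the pen coordinates. The sketch below is an assumption-laden illustration: tangent directions are approximated by the direction of each segment between consecutive samples, angles are wrapped into the range 0 to 360 degrees, and bins are 30 degrees wide. The function name is hypothetical.

```python
import math


def curvature_histogram(points, bins=12):
    """Histogram of curvature angles along a stroke.

    The curvature angle at a sample is the difference between the directions
    of the segments entering and leaving it, wrapped into [0, 360).
    """
    tangents = [
        math.atan2(y2 - y1, x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    histogram = [0] * bins
    bin_width = 360.0 / bins
    for before, after in zip(tangents, tangents[1:]):
        angle = math.degrees(after - before) % 360.0
        histogram[int(angle // bin_width) % bins] += 1
    return histogram


# A straight run followed by a 45-degree left turn.
print(curvature_histogram([(0, 0), (1, 0), (2, 0), (3, 1)]))
# [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Two strokes can then be compared by any histogram distance, which avoids the point-by-point alignment that DTW performs.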

Results. Experimented with English script (20 writers) and Hindi script (10 writers). The DTW and directional feature extraction methods are used. Each user wrote about 10-12 words. 3-fold cross-validation is used.

Performance Measures. False acceptance rate: the percentage of forging users who are accepted; it should be low for forensic applications, where security is the major concern. False rejection rate: the percentage of genuine users who are rejected; it should be low for civilian applications, where usability is the major concern.
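Both rates follow directly from the distances produced by a verifier and the chosen threshold. The sketch below assumes the decision rule used earlier in the slides (accept when distance < threshold); the function name and the sample numbers are made up for illustration.

```python
def error_rates(genuine_distances, forgery_distances, threshold):
    """Return (false acceptance rate, false rejection rate) in percent.

    A claim is accepted when its distance is below the threshold.
    """
    false_accepts = sum(d < threshold for d in forgery_distances)
    false_rejects = sum(d >= threshold for d in genuine_distances)
    far = 100.0 * false_accepts / len(forgery_distances)
    frr = 100.0 * false_rejects / len(genuine_distances)
    return far, frr


genuine = [1.0, 2.0, 3.0, 6.0]   # distances for true claims
forgery = [4.0, 5.0, 7.0, 8.0]   # distances for false claims
print(error_rates(genuine, forgery, threshold=4.5))  # (25.0, 25.0)
print(error_rates(genuine, forgery, threshold=3.5))  # (0.0, 25.0)
```

Lowering the threshold trades false acceptances for false rejections, which is the trade-off between the forensic and civilian settings described above.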

False Accept Rate (directional features) [figure]

False Reject Rate (directional features) [figure]

False Accept Rate (DTW) [figure]

False Reject Rate (DTW) [figure]

Definition. Threshold-1: controls the range of variation within writers; decided based on positive samples. Threshold-2: the confidence required before rejecting other writers (negative samples); a lower threshold-2 means higher confidence.

Effect of thresholds (DTW, Hindi script) [2 figures]

No. of word comparisons (DTW, Hindi script) [figure]

Effect of thresholds (directional features, Hindi script) [2 figures]

Effect of thresholds (directional features, English script) [2 figures]

No. of word comparisons (directional features, Hindi script) [figure]

No. of word comparisons (directional features, English script) [figure]

Number of writers vs. accuracy (English script) [figure]

Number of writers vs. accuracy (Hindi script) [figure]

Analysis and Summary. Writer-specific text generation framework: automatic text generation; automatic threshold generation; text is varied; robust to forgery.

Related Work. Features, character level: GSC features; structural features; directional features. Features, word level: word model recognition; shape curvature; shape context; morphological features. Feature selection: static feature selection; PCA-based discriminating power. Our approach: writer-specific text generation; boosting-based framework; text variation; higher accuracy with a limited amount of data.

Comparison. Srihari et al. [17]: shape context, shape curvature, GSC features and WMR features; performance: 42%, 22%, 62% and 28% respectively (1000 writers); test data size: 10 words. Our approach: directional features; performance: 95% (20 writers); test data size: 5 words.

Repudiation Detection in Handwritten Documents

Traditional Writer Identification vs. QDE (questioned document examination). Assumption of natural handwriting. Biometrics terms: repudiation (negative biometrics); forgery (positive biometrics). Quantity and quality of data available. Cost factor involved. Used as expert witness in legal verdicts.

Repudiation. "The rejection or renunciation of a duty or obligation (as under a contract)" (Merriam-Webster's Dictionary of Law). Handwriting repudiation: a writer deliberately alters his or her natural handwriting to avoid detection and to deny involvement in the case.

Repudiation (1:1 matching, no database) [figure]. The questioned document and a reference document are passed to a comparator, which calculates the distance between them. Hypothesis testing then asks whether the distance is significant: written by the same writer, or by different writers?

Verify whether the given documents were written by the same person or by different persons, without assuming natural handwriting.

Hard problem? [figure: normal handwriting vs. repudiated handwriting]

Challenges. Within-writer variation becomes high, and between-writer variation becomes small by comparison. Learning cannot be done, as data is not available.

Ray of Hope. One cannot exclude from one's own writing those discriminating elements of which one is not aware. Maximum and minimum velocity points remain the same in spite of changes in absolute velocity. Words have significant overlap at the sub-character level.

Framework. Compute a statistically significant score between two documents. Utilize the online information that is available. Make no assumptions about the distribution of the data, since such assumptions may lead to erroneous conclusions.

Assumptions. The questioned and reference documents either have significant overlap or are the same at the word level. The reference document is collected in online mode.

System Framework: hypothesis testing; word segmentation; word comparison.

Hypothesis Testing. Used to calculate the significance of the distance between two distributions, following the Neyman-Pearson paradigm. H0 (null hypothesis): the documents were written by the same writer. H1 (alternative hypothesis): the documents were written by different writers. The two distributions to be compared are the intra-document word distances and the inter-document word distances; they are compared to find out whether they were generated from the same population.

Distribution Comparison. KL divergence test (makes assumptions about the nature of the distribution). Kolmogorov-Smirnov test (makes no such assumptions).
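The two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distributions of the two samples, which is why it needs no model of the underlying distribution. The sketch below computes only that statistic in plain Python; turning it into a p-value or an accept/reject decision is left out. A library routine such as SciPy's `scipy.stats.ks_2samp` could be used instead; whether the author did so is not stated in the slides.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    for value in sorted(set(a + b)):
        # Advance each pointer past all observations <= value.
        while i < len(a) and a[i] <= value:
            i += 1
        while j < len(b) and b[j] <= value:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


intra_document = [1, 2, 3, 4]   # made-up word distances within one document
inter_document = [3, 4, 5, 6]   # made-up word distances across two documents
print(ks_statistic(intra_document, inter_document))  # 0.5
print(ks_statistic(intra_document, intra_document))  # 0.0
```

In the repudiation setting, a large statistic between intra-document and inter-document word distances would count as evidence against H0 (same writer).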

Results. Data was collected from 23 different users in English: 3 pages of normal data and 3 pages of repudiated data from each user. Preprocessing: words are segmented using a semi-automatic toolkit for word segmentation.

Results [figure: intra-document distance vs. inter-document distance]

ROC Curve [figure]. Genuine rejection of 82% at genuine acceptance of 100%.

Analysis of Results. Semi-automatic system, used as an aid to the expert; the null hypothesis is never accepted without expert intervention. Scale used by forensic experts [figure; axis marked -1, 0, 1 between "similar" and "different"]: strong probability of identification; probable; indications; no conclusion; indications did not; probably did not; strong probability did not.

Conclusion and Future Work. A learning-based framework to learn similarity in spite of discrimination between documents. Can we tell whether a writer is trying to repudiate? A framework that can learn more features and give independent scores on each feature.

Conclusions. Proposed algorithms for automatic identification and extraction of discriminating features for online handwriting. Proposed a framework for writer-specific text generation and text variation for text-dependent systems. Introduced the problem of repudiation and proposed a hypothesis-testing-based framework for it.

Publications
Sachin Gupta and Anoop M. Namboodiri, "Repudiation Detection in Handwritten Documents", Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007.
Anoop M. Namboodiri and Sachin Gupta, "Text Independent Writer Identification from Online Handwriting", International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congress Atlantia, France.
Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification using Boosting", submitted to the International Conference on Frontiers in Handwriting Recognition (ICFHR'08), Montreal, Canada.
Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification", planned for IEEE Transactions on Information Forensics and Security, 2008.

Future Work. Fusion of online and offline features for higher accuracy. Can we automatically detect a person's intention to repudiate or forge, based on a single document? More robust algorithms for feature extraction, different from standard feature selection approaches.

Thanking You  gupta.sachin25@gmail.com

Proposed Framework [figure]. Online text document, then curve extraction, representation, characteristic curve extraction and writer identification. Critical points: minimum and maximum velocity points. Shape curve: the curve between any two consecutive minimum velocity points. [Figures: a stroke from Devanagari script with points numbered 1-8, and the velocity profile of that stroke.] Representation: incident angle [1], curvature [2-4], size [5-8]. Consistent primitives: repeating curves; extraction by unsupervised learning algorithms. Experimental setup: K-means; six different clusters extracted from Devanagari script. Notation: S_j is the j-th primitive, C_k is the k-th cluster, W_i is the i-th writer, and [symbol lost in extraction] is the discriminability of the k-th cluster for the i-th writer.
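The shape-curve definition above (the curve between two consecutive minimum velocity points) can be sketched as follows. The code assumes uniformly sampled pen positions, estimates speed from the distance between consecutive samples, and treats a speed minimum on segment i as a cut at point i; these simplifications and the function names are illustrative, not taken from the slides.

```python
import math


def speeds(points, dt=1.0):
    """Pen speed on each segment between consecutive samples."""
    return [
        math.hypot(x2 - x1, y2 - y1) / dt
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def local_minima(values):
    """Indices where a value is lower than its left neighbour and not above its right."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]


def shape_curves(points, dt=1.0):
    """Split a stroke into curves running between consecutive speed minima."""
    cuts = local_minima(speeds(points, dt))
    return [points[start:end + 1] for start, end in zip(cuts, cuts[1:])]


# Segment speeds are 3, 1, 3, 3, 1, 3, so the minima fall on segments 1 and 4.
stroke = [(0, 0), (3, 0), (4, 0), (7, 0), (10, 0), (11, 0), (14, 0)]
print(shape_curves(stroke))  # [[(3, 0), (4, 0), (7, 0), (10, 0)]]
```

Real pen data is noisy, so in practice the speed profile would usually be smoothed before the minima are located.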

Number of Writers vs. Accuracy [figure: accuracy against number of writers; results for Devanagari script]. Accuracy depends on the individuality of the specific writer.

Proposed Framework (example)

Framework (Authentication)

Writer-specific Text Generation. Given: a set of primitives with varying discriminating power for different pairs of writers. Aim: select the optimal set of weights for the primitives, to discriminate a specific writer from the others. Dynamic feature selection: static feature selection achieves only a single optimum.

Writer-specific Text Generation. Text variation requires features that are robust to forgery. Handwriting can have different optima: different combinations of handwriting can provide the desired results.

Boosting Algorithm. Given: a set of training samples (X) with underlying labels (Y), and a set of weak hypotheses (h). Initialize the weight distribution (D) over the training samples. Select the weak hypothesis h_j such that [equation lost in extraction], where m is the total number of training samples and t is the boosting stage.

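Boosting. Update weights: [equation lost in extraction], where [symbol lost in extraction] is the weight of the classifier. Final hypothesis: [equation lost in extraction], where t is the boosting stage and T is the total number of boosting stages.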
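The formulas on these two slides did not survive in the transcript, so the sketch below falls back on the standard discrete AdaBoost procedure, which matches the steps listed (weight distribution over samples, greedy selection of a weak hypothesis per stage, weight update, weighted final hypothesis). It is an assumption that the author's variant has this form; in particular, the randomised selection described elsewhere in the slides is not modelled. Labels are assumed to be +1 or -1 and weak hypotheses are plain functions.

```python
import math


def weighted_error(h, samples, labels, weights):
    """Total weight of the training samples that h misclassifies."""
    return sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)


def adaboost(samples, labels, weak_hypotheses, stages):
    """Standard discrete AdaBoost; returns a list of (alpha_t, h_t) pairs."""
    m = len(samples)
    weights = [1.0 / m] * m                      # D_1: uniform over the m samples
    ensemble = []
    for _ in range(stages):                      # t = 1 .. T
        best = min(weak_hypotheses,
                   key=lambda h: weighted_error(h, samples, labels, weights))
        err = weighted_error(best, samples, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * y * best(x))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Final hypothesis: sign of the alpha-weighted vote."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
stumps = [lambda x: 1 if x < 1.5 else -1,
          lambda x: 1 if x > 3.5 else -1,
          lambda x: 1]
model = adaboost(xs, ys, stumps, stages=3)
print([predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Each of the three stumps alone misclassifies two of the six samples; the weighted combination of all three classifies every sample correctly, which is the sense in which boosting turns weak classifiers into an accurate one.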

Discriminating Power of primitives

Text Generation Process [figures: rejected writer, distance, probability]

Table 1 (writers X1-X6)
Writer  Weights
X1      0.18  0.16  0.09
X2      0.17  0.03  0.01
X3      0.25  0.14  0.32
X4      0.10  0.22  0.34
X5      0.12  0.05  0.02
X6      0.18  0.40  0.22

Table 2 (writers X1, X3, X4, X6)
Writer  Weights
X1      0.35  0.35  0.45
X3      0.20  0.12  0.01
X4      0.30  0.40  0.52
X6      0.15  0.13  0.02

Table 3 (writers X1, X4)
Writer  Weights
X1      0.60  0.75  0.99
X4      0.40  0.25  0.01

Randomness is included at each stage: each classifier might be rejected, based on its discriminating power ("Select or not?"). The threshold is calculated per writer and is biased towards accepting the writer. Rejection at any stage also rejects the claim.

Why is repudiation a hard problem? [figure: normal handwriting (normal writer 1, normal writer 2) vs. repudiated handwriting (repudiated writer 1, repudiated writer 2); caption: "I am confused"]

Word Comparison: sub-character information; critical point matching; DTW matching.