Inventor Disambiguation Workshop


Added: 2016-11-15


Slide1

Slide2

Inventor Disambiguation Workshop

EVALUATION OUTCOMES

Slide3

7 Participant Teams

UMass Amherst IESL (USA)

Centre for European Economic Research (Mannheim, DE)

KU Leuven (Belgium)

Penn State University (USA)

InnovationPulse (USA)

Centre for Transformative Innovation (CTI) at Swinburne University of Technology (Australia)

Institute of Scientific and Technical Information of China (China)

Slide4

Timeline

June 2 – Training datasets posted online

July 15 – ‘Intent to Participate’ deadline

August 30 – Initial submission deadline

Output dataset + documentation + runtime

September 1–4 – Phase 1 evaluation

September 5 – Notification of progression to Phase 2

September 18 – Phase 2 submission

2 output datasets + final documentation + AWS runtime

September 18 – Phase 2 evaluation

September 21 – Judges identify successful team

Slide5

Training/test datasets

Labeled datasets: human-validated inventor clusters (inventors a, b, c… on patents d, e, f… are definitely the same people)

5 datasets generously provided by:

Pierre Azoulay: de-identified academic life sciences (Azoulay 2007 and 2012)

Ivan Png: LinkedIn inventors (Chunmian et al., 2015)

Erica Fuchs, Sam Ventura: de-identified optoelectronics patents (Akinsanmi et al., 2014)

Manuel Trajtenberg: Israeli inventors (Trajtenberg and Shiff, 2008)

Francesco Lissoni, University of Bordeaux: EPO benchmark datasets

* Documentation available at www.dev.patentsview.org/workshop

Slide6

Dataset   Patent-inventor records   Unique inventors   Reference
OE        98,762                    824                Akinsanmi et al., 2014
ALS       42,376                    4,801              Azoulay, 2007; Azoulay, 2012
IS        9,156                     3,845              Trajtenberg and Shiff, 2008
E&S       96,104                    14,293             Chunmian et al., 2015
EPO       1,922; 1,088              424; 312           Lissoni et al., 2010

Documentation available at http://www.dev.patentsview.org/workshop/data/README.pdf

Slide7

Phase 1 Evaluation

5 teams successfully submitted a full output dataset of inventor clusters

Output datasets were evaluated using 4 withheld labeled datasets:

Azoulay (full dataset, as it was de-identified in the training version)

Png (~20% of the data)

Trajtenberg (~20% of the data)

Azoulay – common names (a subset of the larger dataset containing only common names)

Metrics calculated:

Precision, lumping, recall, splitting, F1 score

Algorithm runtime was also reported, but because teams ran their algorithms in different environments, it is not a valid comparison metric

Judges reviewed the results and identified 3 teams to proceed to Phase 2 evaluation
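Metrics such as precision and recall for inventor clusters are usually computed over pairs of records. A minimal sketch, assuming a pairwise evaluation (two patent-inventor records form a true positive when both the labeled data and the submitted clusters assign them to the same inventor), shows how the pair counts can be derived from cluster labels without enumerating every pair. The function names `pairs` and `pairwise_counts` and the label-list input format are hypothetical, not the workshop's evaluation code.

```python
from collections import Counter


def pairs(n: int) -> int:
    """Number of unordered pairs among n records (n choose 2)."""
    return n * (n - 1) // 2


def pairwise_counts(true_labels, pred_labels):
    """Return (tp, fp, fn) pair counts for two clusterings of the same records.

    true_labels[i] and pred_labels[i] are the labeled and predicted
    inventor IDs for record i (hypothetical input format).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("both labelings must cover the same records")
    # Pairs placed together by both labelings: count within each (true, pred) cell.
    joint = Counter(zip(true_labels, pred_labels))
    tp = sum(pairs(n) for n in joint.values())
    # All pairs that the labeled data / the submission put in one cluster.
    true_pairs = sum(pairs(n) for n in Counter(true_labels).values())
    pred_pairs = sum(pairs(n) for n in Counter(pred_labels).values())
    fp = pred_pairs - tp  # lumped: same predicted inventor, different true inventors
    fn = true_pairs - tp  # split: same true inventor, different predicted inventors
    return tp, fp, fn


# Five records of two true inventors: inventor "a" is split across x and y,
# and cluster y lumps records of "a" and "b" together.
truth = ["a", "a", "a", "b", "b"]
pred = ["x", "x", "y", "y", "y"]
print(pairwise_counts(truth, pred))  # (2, 2, 2)
```

Counting through the contingency cells keeps the cost linear in the number of records, which matters for datasets with tens of thousands of records per test set.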

Slide8

Slide9

Team              Test data    Precision     Recall        F1            Average F1
CEER              als          0.999401347   0.891268353   0.942242627
CEER              ens          1             0.778727154   0.875600456
CEER              is           0.996245978   0.922224061   0.957806995
CEER              als_common   0.999989575   0.77409957    0.872663589   0.912078417
Innovation Pulse  als          0.991784517   0.655106334   0.789031427
Innovation Pulse  ens          0.997609642   0.658972609   0.793679189
Innovation Pulse  is           0.998310811   0.637434664   0.778064712
Innovation Pulse  als_common   0.99363578    0.638165016   0.777182601   0.784489482
ISTIC             als          0.996649248   0.954090692   0.974905728
ISTIC             ens          0.99947885    0.921752133   0.959043207
ISTIC             is           0.996926117   0.751365216   0.856900212
ISTIC             als_common   0.984901199   0.934397521   0.958984893   0.93745851
PSU               als          0.999297      0.642464098   0.782102155
PSU               ens          1             0.661001562   0.795907213
PSU               is           0.999578059   0.588035744   0.740466764
PSU               als_common   0.999986296   0.588888979   0.741255047   0.764932795
UMass             als          0.99888066    0.976346914   0.987485253
UMass             ens          1             0.966156229   0.982786835
UMass             is           0.998875335   0.955320205   0.976612392
UMass             als_common   0.996885177   0.963393671   0.979853322   0.98168445
Current PatentsView
Fleming/Li        als          0.999043089   0.885710148   0.938969182
Fleming/Li        ens          1             0.812357315   0.896464851
Fleming/Li        is           0.998781859   0.881929505   0.936725547
Fleming/Li        als_common   0.998039234   0.883168029   0.93709647    0.927314013
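Each F1 value in these results is the harmonic mean of precision and recall, and each team's Average F1 equals the unweighted mean of its four per-dataset F1 scores. The sketch below recomputes the UMass figures from the reported precision and recall; the helper name `f1` and the dictionary layout are illustrative choices.

```python
from statistics import mean


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# UMass precision and recall on the four withheld test sets (reported Phase 1 values).
umass = {
    "als": (0.99888066, 0.976346914),
    "ens": (1.0, 0.966156229),
    "is": (0.998875335, 0.955320205),
    "als_common": (0.996885177, 0.963393671),
}

per_dataset = {name: f1(p, r) for name, (p, r) in umass.items()}
average_f1 = mean(per_dataset.values())  # unweighted: each test set counts once
for name, score in per_dataset.items():
    print(f"{name:<12} {score:.9f}")
print(f"average      {average_f1:.9f}")
```

Because the average is unweighted, a small test set such as the Israeli inventors data carries as much weight as the larger ones.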

Slide10

Phase 2 Evaluation

3 teams progressed to Phase 2:

CEER, UMass Amherst, ISTIC/China

CEER withdrew

Set up identical AWS evaluation environments

2 training datasets provided

Random sample from all Phase 1 evaluation datasets

Subset of Phase 1 evaluation datasets with similar characteristics to full USPTO dataset

Inventors/patents, % missing assignees,

Algorithms trained and run on full USPTO dataset twice (once per training dataset)

Algorithm documentation provided (user manual and manuscript)

Output datasets evaluated with Phase 1 metrics

Judges’ final determination of successful algorithm based on:

F1 score

Run time

Usability (based on documentation)
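The common characteristics training set was chosen to resemble the full USPTO dataset on measures such as inventors/patents and % missing assignees. A minimal sketch of how those two summary statistics could be computed for a candidate subset, reading "inventors/patents" as the mean number of inventor records per distinct patent and assuming a hypothetical record layout of (patent_id, inventor_id, assignee) with `None` marking a missing assignee. The workshop's actual sampling procedure is not described in these slides.

```python
from typing import Iterable, NamedTuple, Optional


class Record(NamedTuple):
    """One patent-inventor record (hypothetical layout)."""
    patent_id: str
    inventor_id: str
    assignee: Optional[str]  # None when the assignee is missing


def characteristics(records: Iterable[Record]) -> dict:
    """Summary statistics for comparing a subset with the full dataset."""
    records = list(records)
    if not records:
        raise ValueError("no records")
    patents = {r.patent_id for r in records}
    missing = sum(r.assignee is None for r in records)
    return {
        # Mean number of inventor records per distinct patent (assumed reading).
        "inventors_per_patent": len(records) / len(patents),
        # Percentage of records with no assignee.
        "pct_missing_assignee": 100 * missing / len(records),
    }


sample = [
    Record("P1", "I1", "Acme"),
    Record("P1", "I2", "Acme"),
    Record("P2", "I1", None),
    Record("P3", "I3", "Beta"),
]
print(characteristics(sample))
# {'inventors_per_patent': 1.3333333333333333, 'pct_missing_assignee': 25.0}
```

Comparing these numbers for a candidate subset against the same numbers for the full dataset gives a simple check of how representative the subset is.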

Slide11

RM = results when trained on random mixture dataset; CC = results when trained on common characteristics dataset

Metric            Team A (RM)  Team A (CC)  Team B (RM)  Team B (CC)  Fleming/Li (no training)
Precision         0.999709     0.999719     0.998488     0.991932     0.999941418
Splitting         0.033936     0.033358     0.116845     0.103866     0.184882461
Recall            0.966064     0.966642     0.883155     0.896134     0.815117539
Lumping           0.000281     0.000271     0.001337     0.007289     4.78E-05
F score           0.982599     0.982903     0.937287     0.941603     0.898119352
True positives    384367       384597       351380       356544       324310
False negatives   13502        13272        46489        41325        73559
False positives   112          108          532          2900         19

Runtime:

Team A: 7 hours on c3.8xlarge AWS instance (CPU usage topped at 69%)

Team B: 7 hours on c3.8xlarge AWS instance (CPU usage topped at 11.85%)

Fleming/Li: N/A
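The rates in this table can be recovered from the three counts: precision = TP/(TP+FP), recall = TP/(TP+FN), splitting = FN/(TP+FN) = 1 - recall, and lumping = FP/(TP+FN). The reported values in every column agree with these formulas (TP+FN is 397,869 in each column). The sketch below recomputes Team A's random-mixture column and the Fleming/Li column; the function name `rates` is illustrative.

```python
def rates(tp: int, fn: int, fp: int) -> dict:
    """Evaluation rates from true-positive, false-negative and false-positive counts."""
    true_pairs = tp + fn  # everything the labeled data says belongs together
    precision = tp / (tp + fp)
    recall = tp / true_pairs
    return {
        "precision": precision,
        "recall": recall,
        "splitting": fn / true_pairs,  # equals 1 - recall
        "lumping": fp / true_pairs,    # false merges, scaled by the true-positive base
        "f_score": 2 * precision * recall / (precision + recall),
    }


team_a_random = rates(tp=384367, fn=13502, fp=112)
fleming_li = rates(tp=324310, fn=73559, fp=19)
for name, r in [("Team A (RM)", team_a_random), ("Fleming/Li", fleming_li)]:
    print(name, {k: round(v, 6) for k, v in r.items()})
```

Because lumping and splitting share the same denominator, they can be compared directly: Fleming/Li makes very few false merges but splits far more true pairs than Team A.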

Slide12

Team B: results when trained on random mixture dataset

Test data         Precision     Recall        F score
eval_als          0.998171124   0.949989322   0.973484408
eval_als_common   0.993201838   0.920831551   0.955648522
eval_ens          0.999463175   0.894823972   0.944253473
eval_is           0.995252994   0.763279828   0.863966284

Team B: results when trained on common characteristics dataset

Test data         Precision     Recall        F score
eval_als          0.996064686   0.961459702   0.978456322
eval_als_common   0.982322588   0.963280689   0.972708456
eval_ens          0.998730234   0.903074643   0.948496831
eval_is           0.995673975   0.837911633   0.910005841

Team A: results when trained on random mixture dataset

Test data         Precision     Recall        F score
eval_als          0.998816547   0.975242586   0.986888808
eval_als_common   0.996589319   0.959737881   0.977816513
eval_ens          1             0.968265624   0.983876985
eval_is           0.998879793   0.959126262   0.978599468

Team A: results when trained on common characteristics dataset

Test data         Precision     Recall        F score
eval_als          0.998895036   0.977163092   0.987909564
eval_als_common   0.997035976   0.966411918   0.981485124
eval_ens          1             0.967544691   0.983504665
eval_is           0.998879117   0.958547079   0.978297585

Slide13

Slide14

Slide15

Slide16

Slide17

