Uploaded On 2016-06-27



Presentation Transcript

Slide1

A Static Rank Framework for Lucene/Solr

Mike Schultz

mike.schultz@gmail.com

Slide2

Static Rank for Solr/Lucene

Dynamic Rank

Why Static Rank

Combining Scores

Static Rank Components

Slide3

Multiple Fields / Multiple Types

PubDate, IsNews, MediaType, TextBody

Continuous (Date, Int, Float, …)

Slide4

Multiple Fields / Multiple Types

PubDate, IsNews, MediaType, TextBody

Continuous (Date, Int, Float, …)

Boolean (True, False)

Slide5

Multiple Fields / Multiple Types

PubDate, IsNews, MediaType, TextBody

Continuous (Date, Int, Float, …)

Boolean (True, False)

Enum (Book, CD, DVD, Cassette)

Slide6

Multiple Fields / Multiple Types

PubDate, IsNews, MediaType, TextBody

Continuous (Date, Int, Float, …)

Boolean (True, False)

Enum (Book, CD, DVD, Cassette)

Text (Natural Language)

Slide7

Dynamic Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Dynamic Score]

Slide8

Dynamic Rank

Query Dependent = F(Q,D)

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Dynamic Score]

Slide9

Dynamic Rank

Query Dependent = F(Q,D)

Huge dynamic range (0.001-1502.3)

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Dynamic Score]

Slide10

Dynamic Rank

Query Dependent = F(Q,D)

Huge dynamic range (0.001-1502.3)

Not comparable across queries

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Dynamic Score]

Slide11

Dynamic Rank

Query Dependent = F(Q,D)

Huge dynamic range (0.001-1502.3)

Not comparable across queries

Not easily normalized

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Dynamic Score]

Slide12

Why Static Rank?

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Slide13

Why Static Rank?

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

All (dynamic) things equal, I want

Newer over older

Slide14

Why Static Rank?

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

All (dynamic) things equal, I want

Newer over older

CD over cassette

Slide15

Why Static Rank?

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

All (dynamic) things equal, I want

Newer over older

CD over cassette

Arbitrary feature A over arbitrary feature B

Slide16

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Query Independent = F(D)

i.e. static across queries

Slide17

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Query Independent = F(D)

i.e. static across queries

More easily bounded

Slide18

Combined Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; TF * IDF; Query; Static Rank System; Custom Query; Combined Score]

Slide19

Framework - Requirements

[Diagram labels: Custom Query; Combined Score]

Intuitive, hand-tunable, debuggable

Slide20

Framework - Requirements

[Diagram labels: Custom Query; Combined Score]

Intuitive, hand-tunable, debuggable

Query-time only, no re-indexing

Slide21

Framework - Requirements

[Diagram labels: Custom Query; Combined Score]

Intuitive, hand-tunable, debuggable

Query-time only, no re-indexing

Minimal parameters

Slide22

Framework - Requirements

[Diagram labels: Custom Query; Combined Score]

Intuitive, hand-tunable, debuggable

Query-time only, no re-indexing

Minimal parameters

Static Rank should boost / demote

But not too much!

Docs should stay in their own dynamic rank “neighborhood”.

Slide23

Combining Scores - Approaches

[Diagram labels: Custom Query; Combined Score]

Addition?

Dynamic(0.0001) + Static(0.3) = 0.3001

Dynamic(1542.1) + Static(0.3) = 1542.4

Difficult to get right across queries

Slide24

Combining Scores - Approaches

[Diagram labels: Custom Query; Combined Score]

Multiplication?

Dynamic(50.0) * Static(0.3) = 15.0

Dynamic(10.0) * Static(2.0) = 20.0

Could work, but awkward

Slide25
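The addition and multiplication examples from the two preceding slides can be replayed in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and are not part of the framework described in the deck. It shows how the same additive static score swamps a small dynamic score but barely moves a large one, and how an unbounded multiplier lets a weaker dynamic match overtake a stronger one, which is one possible reading of "awkward".

```python
def combine_add(dynamic: float, static: float) -> float:
    """Additive combination: the static term is an absolute offset."""
    return dynamic + static


def combine_multiply(dynamic: float, static: float) -> float:
    """Multiplicative combination: the static term is an unbounded factor."""
    return dynamic * static


def relative_change(combined: float, dynamic: float) -> float:
    """Fraction by which the combined score moved away from the dynamic score."""
    return (combined - dynamic) / dynamic


# Addition: the same static score of 0.3 dominates a tiny dynamic score
# and is negligible next to a large one.
low_shift = relative_change(combine_add(0.0001, 0.3), 0.0001)
high_shift = relative_change(combine_add(1542.1, 0.3), 1542.1)

# Multiplication: the weaker dynamic match (10.0) ends up above the stronger one (50.0).
strong_match = combine_multiply(50.0, 0.3)
weak_match = combine_multiply(10.0, 2.0)

print(f"additive shift on small score: {low_shift:.0f}x")
print(f"additive shift on large score: {high_shift:.6f}x")
print(f"multiplied scores: strong={strong_match:.1f}, weak={weak_match:.1f}")
```
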

Combining Scores - Approaches

[Diagram labels: Linear Query; Combined Score]

Bound StaticScore: -1.0 to 1.0

CScore = DScore * (100 + S% * SScore)

At most, staticRank will boost/demote dynamicScore by S%

CScore = 0.014 * (100 + 30 * 0.5)

CScore = 145.3 * (100 + 30 * -0.5)

Slide26
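The linear rule CScore = DScore * (100 + S% * SScore) is easy to check numerically. The sketch below is an assumption-laden illustration, not the deck's LinearQuery code (which the transcript does not preserve): the function name `combined_score` and the default S% of 30 are taken from the two worked examples only. Note that the formula as written scales every score by roughly 100; that constant factor does not affect ordering.

```python
def combined_score(dscore: float, sscore: float, s_percent: float = 30.0) -> float:
    """CScore = DScore * (100 + S% * SScore), with SScore bounded to [-1.0, 1.0]."""
    if not -1.0 <= sscore <= 1.0:
        raise ValueError("static score must lie in [-1.0, 1.0]")
    return dscore * (100.0 + s_percent * sscore)


# The two worked examples from the slide, with S% = 30.
boosted = combined_score(0.014, 0.5)    # 0.014 * 115
demoted = combined_score(145.3, -0.5)   # 145.3 * 85

# Relative to the neutral case (SScore = 0), the change never exceeds S%.
max_boost = combined_score(1.0, 1.0) / combined_score(1.0, 0.0)
max_demote = combined_score(1.0, -1.0) / combined_score(1.0, 0.0)

print(boosted, demoted, max_boost, max_demote)
```

Because the static score is bounded, the boost or demotion is a fixed fraction of the dynamic score regardless of its magnitude, which is what keeps documents in their dynamic rank "neighborhood".
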

LinearQuery

Slide27

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Slide28

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Extend solr.ValueSource/Parser

Slide29

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Extend solr.ValueSource/Parser

Uses field cache for inputs

Slide30

Static Rank

[Diagram labels: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Extend solr.ValueSource/Parser

Uses field cache for inputs

Extremely fast

Slide31

Static Rank

[Diagram labels: PubDate, IsNews, MediaType]

Slide32

Static Rank

[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource, output labeled "years ago"]

Slide33
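The "years ago" output attached to AgoValueSource suggests a conversion from a publication date to an age in years. The transcript does not show how this is computed, so the following is only a guess at the idea in Python; `years_ago` and `SECONDS_PER_YEAR` are hypothetical names, and the 365.25-day year is an assumption.

```python
from datetime import datetime, timezone

# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_ago(pub_date: datetime, now: datetime) -> float:
    """Age of a document in fractional years, measured at query time."""
    return (now - pub_date).total_seconds() / SECONDS_PER_YEAR


now = datetime(2016, 6, 27, tzinfo=timezone.utc)
age = years_ago(datetime(2015, 6, 27, tzinfo=timezone.utc), now)
print(f"{age:.3f} years ago")
```

Computing the age against the query-time clock rather than storing it in the index matches the deck's "query-time only, no re-indexing" requirement.
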

Static Rank

[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource; MuxValueSource with branches T and F and a constant 0; edges labeled "years ago"]

Slide34
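The MuxValueSource in the diagram appears to select between the AgoValueSource output and a constant 0 depending on IsNews. The configuration itself is not preserved in this transcript, so the sketch below is a loudly hedged guess: the names `mux` and `news_age` are hypothetical, and which branch (T or F) receives the constant cannot be recovered from the text.

```python
def mux(selector: bool, if_true: float, if_false: float) -> float:
    """Two-way multiplexer: pass through one of two inputs based on a boolean."""
    return if_true if selector else if_false


def news_age(is_news: bool, years_ago: float) -> float:
    # Assumption: age only matters for news documents; everything else
    # gets the neutral constant 0. The real branch assignment is unknown.
    return mux(is_news, years_ago, 0.0)


print(news_age(True, 4.2))   # news: the age passes through
print(news_age(False, 4.2))  # non-news: the age is ignored
```
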

MuxValueSource Config

Slide35

Static Rank

[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource; MuxValueSource with branches T and F and a constant 0; EnumValueSource; edges labeled "years ago"]

Slide36

EnumValueSource Config

Maps Fixed-Vocabulary to YEARS AGO

A hierarchy and 3 values: MIN, 0, MAX

All things equal (dynamically), DVD = +3.3 years

Slide37
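Mapping a fixed vocabulary to "years ago" amounts to a lookup table. In the sketch below only the DVD value (3.3) comes from the slide; the other entries, the names `MEDIA_TYPE_YEARS` and `enum_years`, and the neutral default of 0 for unlisted terms are all illustrative assumptions. The sign convention (whether +3.3 counts as newer or older) is not stated in the transcript and is left open here.

```python
# Only "DVD": 3.3 is taken from the slide; the rest are made-up placeholders.
MEDIA_TYPE_YEARS = {
    "DVD": 3.3,
    "CD": 1.0,        # hypothetical
    "Book": 0.0,      # hypothetical
    "Cassette": -5.0, # hypothetical
}


def enum_years(media_type: str) -> float:
    """Translate a vocabulary term into the common 'years' currency."""
    # Assumption: terms outside the vocabulary are treated as neutral (0).
    return MEDIA_TYPE_YEARS.get(media_type, 0.0)


print(enum_years("DVD"))
print(enum_years("Vinyl"))
```

Expressing every feature in years lets a plain sum combine it with a document's real age.
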

Static Rank

[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource; MuxValueSource with branches T and F and a constant 0; EnumValueSource; SumValueSource; edges labeled "years ago"; a "?" stage mapping the sum to the range -1 to 1]

Slide38

Mapping YearsAgo to -1.0 – 1.0

Step Function: if > 10 years-ago = -1, else = +1

1 parameter

Too abrupt

Slide39

Mapping YearsAgo to -1.0 – 1.0

Step Function: if > 10 years-ago = -1, else = +1

1 parameter

Too abrupt

Linear

No parameters (fixed)

Too gradual over 2000+ years

Slide40

Mapping YearsAgo to -1.0 – 1.0

Step Function: if > 10 years-ago = -1, else = +1

1 parameter

Too abrupt

Linear

No parameters (fixed)

Too gradual over 2000+ years

Sigmoid

2 parameters

Smooth over entire range

Easy to calculate

Slide41
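The three candidate mappings from years-ago to the range -1.0 to 1.0 can be compared side by side. The exact curves used in the framework are not given in the transcript, so this sketch makes several assumptions: the linear mapping spans a fixed 0 to 2000 years, and the sigmoid is a tanh-shaped curve with two hypothetical parameters, `slope` and `x0` (the years-ago value at which the score crosses zero, defaulting to the 1.5 shown on a later plot).

```python
import math


def step(years_ago: float, threshold: float = 10.0) -> float:
    """One parameter: +1 up to the threshold, -1 beyond it."""
    return -1.0 if years_ago > threshold else 1.0


def linear(years_ago: float) -> float:
    """No parameters: +1 at 0 years falling to -1 at a fixed 2000 years (assumed span)."""
    span = 2000.0
    clamped = min(max(years_ago, 0.0), span)
    return 1.0 - 2.0 * clamped / span


def sigmoid(years_ago: float, slope: float = 1.0, x0: float = 1.5) -> float:
    """Two parameters: smooth, bounded, and zero at x0 (tanh form is an assumption)."""
    return -math.tanh(slope * (years_ago - x0))


for years in (0.0, 1.5, 5.0, 11.0):
    print(years, step(years), round(linear(years), 4), round(sigmoid(years), 4))
```

With these assumptions the step function flips abruptly at 10 years, the linear mapping barely distinguishes a 1-year-old document from a 10-year-old one, and the sigmoid spreads most of its range over the years around `x0`.
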

Sigmoid

Slope

Slide42

Sigmoid

Slope

x-intercept (year)

Slide43

[Plot: y-axis from -1.0 to 1.0; x-axis Years-ago; x0 = 1.5 years ago]

Slide44

Static Rank

[Diagram labels: PubDate, IsNews, MediaType; AgoValueSource; MuxValueSource with branches T and F and a constant 0; EnumValueSource; SumValueSource; SigmoidValueSource with output range -1 to 1; edges labeled "years ago"]

Slide45

SigmoidValueSource Config

Slide46
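Taken together, the Ago, Mux, Enum, Sum, and Sigmoid stages form a small pipeline from document fields to a bounded static score, which then feeds CScore = DScore * (100 + S% * SScore). The configuration shown on the slides is not preserved in this transcript, so the sketch below is a reconstruction under stated assumptions: all function and constant names are hypothetical, only the DVD value of 3.3 comes from the deck, larger "years" totals are assumed to demote, non-news documents are assumed to contribute an age of 0, and the sigmoid is assumed to be tanh-shaped.

```python
import math

# Only "DVD": 3.3 comes from the deck; the other entry is a placeholder.
MEDIA_YEARS = {"DVD": 3.3, "Cassette": 8.0}


def static_score(years_ago: float, is_news: bool, media_type: str,
                 slope: float = 1.0, x0: float = 1.5) -> float:
    """Compose the stages: Mux -> Enum -> Sum -> Sigmoid, result in [-1, 1]."""
    age_term = years_ago if is_news else 0.0          # Mux (branch choice assumed)
    media_term = MEDIA_YEARS.get(media_type, 0.0)     # Enum (unknown terms neutral)
    total_years = age_term + media_term               # Sum in the "years" currency
    return -math.tanh(slope * (total_years - x0))     # Sigmoid (tanh form assumed)


def combined_score(dscore: float, sscore: float, s_percent: float = 30.0) -> float:
    """CScore = DScore * (100 + S% * SScore)."""
    return dscore * (100.0 + s_percent * sscore)


fresh = static_score(0.5, True, "Unknown")
stale = static_score(8.0, True, "Unknown")
print(combined_score(10.0, fresh), combined_score(10.0, stale))
```

Two documents with the same dynamic score of 10.0 end up ordered by freshness, while both combined scores stay within 30% of the neutral value.
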

Static Rank Config

Slide47

Conclusions

solr.ValueSource/Parser - fast and flexible

Slide48

Conclusions

solr.ValueSource/Parser - fast and flexible

CScore = DScore * (100 + S% * SScore)

-1.0 < SScore < 1.0

Slide49

Conclusions

solr.ValueSource/Parser - fast and flexible

CScore = DScore * (100 + S% * SScore)

-1.0 < SScore < 1.0

“Time” as a common currency for static features