A Static Rank Framework for Lucene/Solr
Mike Schultz
mike.schultz@gmail.com

Static Rank for Solr/Lucene
Dynamic Rank
Why Static Rank
Combining Scores
Static Rank Components

Multiple Fields / Multiple Types
PubDate: Continuous (Date, Int, Float, …)
IsNews: Boolean (True, False)
MediaType: Enum (Book, CD, DVD, Cassette)
TextBody: Text (Natural Language)

Dynamic Rank
Query Dependent = F(Q,D)
Huge dynamic range (0.001-1502.3)
Not comparable across queries
Not easily normalized
[Diagram: fields PubDate, IsNews, MediaType, TextBody; Query; TF * IDF; Dynamic Score]

Why Static Rank?
All (dynamic) things equal, I want:
  Newer over older
  CD over cassette
  Arbitrary feature A over arbitrary feature B
[Diagram: fields PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Static Rank
Query Independent = F(D), i.e. static across queries
More easily bounded
[Diagram: fields PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Combined Rank
[Diagram: fields PubDate, IsNews, MediaType, TextBody; Query; TF * IDF; Static Rank System; Custom Query; Combined Score]

Framework - Requirements
Intuitive, hand-tunable, debuggable
Query-time only, no re-indexing
Minimal parameters
Static Rank should boost / demote, but not too much!
Docs should stay in their own dynamic rank “neighborhood”.
[Diagram: Custom Query; Combined Score]

Combining Scores - Approaches
Addition?
Dynamic(0.0001) + Static(0.3) = 0.3001
Dynamic(1542.1) + Static(0.3) = 1542.4
Difficult to get right across queries
[Diagram: Custom Query; Combined Score]

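One way to see why addition is hard to tune across queries is to look at how much of the summed score comes from the static part. The sketch below reuses the two examples from the slide. The helper name `additive_share` is hypothetical and is not part of Lucene or Solr.

```python
def additive_share(dynamic: float, static: float) -> float:
    """Fraction of (dynamic + static) contributed by the static score."""
    return static / (dynamic + static)


# Same static score (0.3), very different dynamic scores, as on the slide.
low = additive_share(0.0001, 0.3)    # static term makes up almost the whole sum
high = additive_share(1542.1, 0.3)   # static term is a tiny fraction of the sum

print(f"low-scoring query:  {low:.4f}")
print(f"high-scoring query: {high:.6f}")
```

With a fixed static weight, the static score either swamps the dynamic score or has almost no effect, depending on the query.
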
Combining Scores - Approaches
Multiplication?
Dynamic(50.0) * Static(0.3) = 15.0
Dynamic(10.0) * Static(2.0) = 20.0
Could work, but awkward
[Diagram: Custom Query; Combined Score]

Combining Scores - Approaches
Bound StaticScore: -1.0 to 1.0
CScore = DScore * (100 + S% * SScore)
At most, staticRank will boost/demote dynamicScore by S%
CScore = 0.014 * (100 + 30 * 0.5)
CScore = 145.3 * (100 + 30 * -0.5)
[Diagram: Linear Query; Combined Score]

LinearQuery

Static Rank
Extend solr.ValueSource/Parser
Uses field cache for inputs
Extremely fast
[Diagram: fields PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]

Static Rank
[Diagram: inputs PubDate, IsNews, MediaType; AgoValueSource (output in years ago); MuxValueSource (labels 0, T, F; output in years ago)]

MuxValueSource Config

Static Rank
[Diagram: inputs PubDate, IsNews, MediaType; AgoValueSource (output in years ago); MuxValueSource (labels 0, T, F; output in years ago); EnumValueSource]

EnumValueSource Config
Maps Fixed-Vocabulary to YEARS AGO
A hierarchy and 3 values: MIN, 0, MAX
All things equal (dynamically), DVD = +3.3 years

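Mapping a fixed vocabulary to a number of years can be pictured as a lookup table. The Python sketch below is not the EnumValueSource configuration format, which is not reproduced here. Only the DVD value comes from the slide. The other values, the default of 0 for unknown terms, and the reading of a positive value as a boost are assumptions.

```python
# Years equivalents per media type. Only DVD = 3.3 is taken from the slide;
# every other number is a hypothetical placeholder.
MEDIA_TYPE_YEARS = {
    "DVD": 3.3,
    "CD": 2.0,         # hypothetical
    "Book": 0.0,       # hypothetical
    "Cassette": -5.0,  # hypothetical
}


def enum_years(value: str, default: float = 0.0) -> float:
    """Return the years equivalent for an enum value (assumed default: 0.0)."""
    return MEDIA_TYPE_YEARS.get(value, default)


print(enum_years("DVD"), enum_years("Vinyl"))
```

Expressing the enum in years puts it in the same unit as the date-based inputs, so the values can be added together.
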
Static Rank
[Diagram: inputs PubDate, IsNews, MediaType; AgoValueSource, MuxValueSource (labels 0, T, F), EnumValueSource and SumValueSource, each with output in years ago; a final stage marked "?" with output range -1 to 1]

Mapping YearsAgo to -1.0 – 1.0
Step Function: if > 10 years-ago = -1, else = +1
  1 parameter
  Too abrupt
Linear
  No parameters (fixed)
  Too gradual over 2000+ years
Sigmoid
  2 parameters
  Smooth over entire range
  Easy to calculate

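The first two options can be sketched in Python to show why they were rejected. The 10-year cutoff and the 2000-year range come from the slide. The function names and the linear endpoints (+1 at 0 years ago, -1 at 2000 years ago) are assumptions.

```python
def step_map(years_ago: float, cutoff: float = 10.0) -> float:
    """Step function from the slide: -1 if older than the cutoff, else +1."""
    return -1.0 if years_ago > cutoff else 1.0


def linear_map(years_ago: float, max_years: float = 2000.0) -> float:
    """Straight line from +1 at 0 years ago to -1 at max_years (assumed endpoints)."""
    clamped = min(max(years_ago, 0.0), max_years)
    return 1.0 - 2.0 * clamped / max_years


# Step: documents 9.9 and 10.1 years old land at opposite extremes.
print(step_map(9.9), step_map(10.1))

# Linear: documents 1 and 20 years old are barely distinguishable.
print(linear_map(1.0), linear_map(20.0))
```
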
Sigmoid
Parameters: slope and x-intercept (year)
[Plot: y-axis from -1.0 to 1.0; x-axis Years-ago; x0 = 1.5 years ago]

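A two-parameter sigmoid with these properties can be sketched in Python as follows. The slides specify only a slope and an x-intercept, so the use of tanh as the functional form and the slope value of 1.0 are assumptions. The x-intercept of 1.5 years ago is taken from the plot.

```python
import math


def sigmoid_map(years_ago: float, slope: float, x0: float) -> float:
    """Map years-ago into [-1, 1]: positive when newer than x0, negative when older.

    tanh is an assumed functional form; any sigmoid rescaled to this range
    with a slope and an x-intercept would fit the description.
    """
    return math.tanh(slope * (x0 - years_ago))


X0 = 1.5      # x-intercept from the plot
SLOPE = 1.0   # hypothetical value, not given in the slides

for age in (0.0, 1.5, 3.0, 50.0):
    print(age, round(sigmoid_map(age, SLOPE, X0), 3))
```

Documents newer than x0 get a positive static score, older ones a negative score, and the output stays bounded however old the document is.
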
Static Rank
[Diagram: inputs PubDate, IsNews, MediaType; AgoValueSource, MuxValueSource (labels 0, T, F), EnumValueSource and SumValueSource, each with output in years ago; SigmoidValueSource with output range -1 to 1]

SigmoidValueSource Config

Static Rank Config

Conclusions
solr.ValueSource/Parser - fast and flexible
CScore = DScore * (100 + S% * SScore)
-1.0 < SScore < 1.0
“Time” as a common currency for static features