Netflix and Beyond: Tuning Solr for Great Results

Uploaded On 2020-11-06

Presentation Transcript

Slide1

Netflix and Beyond

Tuning Solr for great results.

Walter Underwood

http://wunderwood.org/most_casual_observer/

Slide2

Typical Web Query Mix

informational
navigational (known-site)
transactional (known-item)

(Andrei Broder, AltaVista, 2002)

Slide3

“talking rat movie”

Slide4

Top Queries October 2006

finding neverland
bridget jones
closer
the incredibles
incredibles
ladder 49
fat albert
being julia
ray
national treasure
alfie
spanglish
star wars
meet the fockers
final cut
hotel rwanda
neverland
after the sunset
million dollar baby
hitch

Slide5

Netflix Queries

92% movie titles
5% genres and categories
3% people

Known-item queries make up 95% of Netflix traffic.

Slide6

Slide7

Problematic User Behavior

One or two words?
Partial words
Misspellings

Slide8

One or Two Words?

Slide9

Partial Words

People don’t like to make mistakes:
rat, rata, ratatapoc
koyaanisq

Phonetic encoding (soundex) assumes complete words
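
To illustrate why phonetic encoding struggles with partial words, here is a minimal sketch of classic American Soundex. It is a textbook version written for this example, not the encoder Netflix or Solr used; the function name `soundex` and the lookup table are assumptions made for illustration.

```python
# Minimal American Soundex, written for illustration only.
_CODES = {}
for _digit, _letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                         ("4", "l"), ("5", "mn"), ("6", "r")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code for a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is emitted again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


# A partial query and the full title get different codes,
# so an exact match on the code misses the title.
print(soundex("rat"), soundex("ratatouille"))
print(soundex("koyaanisq"), soundex("koyaanisqatsi"))
```

The code depends on consonants that appear late in the word, so a prefix typed so far rarely shares a code with the completed title.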

Slide10

Autocomplete Finishes Words

Load movie titles and popular people
10% improvement in search quality (MRR)
10X as much traffic as search queries
Dedicated Solr with RAMDirectory
Front-end HTTP cache, 1 hour lifetime, 80% hit rate
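
The idea of finishing a partial word from a preloaded list of titles and people can be sketched with an in-memory sorted list and binary search. This is a stand-in for the idea only, not the dedicated Solr setup the slide describes; the class name `PrefixCompleter` and the sample titles are hypothetical.

```python
import bisect


class PrefixCompleter:
    """In-memory prefix completion over a fixed list of names."""

    def __init__(self, names):
        # Sort by lowercased key so all names sharing a prefix are contiguous.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, limit=10):
        p = prefix.lower()
        # First key that is >= the prefix; matches, if any, start here.
        start = bisect.bisect_left(self._keys, p)
        out = []
        for key, name in self._entries[start:]:
            if not key.startswith(p) or len(out) == limit:
                break
            out.append(name)
        return out


completer = PrefixCompleter(["Ratatouille", "Rat Race", "Koyaanisqatsi", "Apocalypto"])
print(completer.complete("rata"))
print(completer.complete("koyaanisq"))
```

Because responses depend only on the prefix, they are easy to serve from an HTTP cache, which fits the hit rate quoted on the slide.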

Slide11

Some Misspellings

shakespear
the incredables
seven samarai
breakfast at tiffiney
blazing sadles
selen
scorupko
taeku
christopher walkin
return to lonsom dove
teh matrix
comdy tv
pirhana
dungens and dragons
pufi yami
al pachino
incredables
gundan seed mobile suit
chatterluy
white fany
to the rsecue
meet the faulkers
brigette joes diary
oh brother where are thou?
pirartes of the carr

Slide12

Switch from Phonetic to Fuzzy

Tested a dozen algorithms with users
250K queries per test cell
JaroWinkler slightly better than Levenshtein
JaroWinkler with 0.7 is very, very broad match
“koyaanisqatsi” matches “koy” (yuck!)
but “1048” matches “1408”
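
The two examples above can be checked with a small Jaro-Winkler implementation. This is a textbook sketch, not the library code used in the tests on the slide. The prefix scale of 0.1, the 4-character prefix cap, and reading 0.7 as the minimum similarity for a match are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed parameters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


THRESHOLD = 0.7  # assumed meaning of the "0.7" on the slide
for a, b in (("koyaanisqatsi", "koy"), ("1048", "1408")):
    score = jaro_winkler(a, b)
    print(a, b, round(score, 3), score >= THRESHOLD)
```

Both pairs clear the threshold: the short prefix benefits from the prefix boost, and the swapped digits count as a single transposition.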

Slide13

Problematic Corpus Behavior

Missing movies
  Ollie Hopnoodle’s Haven of Bliss
  CJ7

Hard-to-spell names
  Ratatouille
  Coraline
  Inglourious Basterds

Hard-to-remember names
  Click
  Apocalypto
  Seven Up Plus Seven

Slide14

Slide15

Metrics: MRR

Mean Reciprocal Rank
Weighted clickthrough, measured on site traffic
#1 is a full click
#2 is a half click
#3 is one third click
etc.
Daily, weekly, and seasonal variations
Overall customer satisfaction
Good for A/B tests, weak for finding bugs
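
The weighting above can be written as a short function. This is a sketch under one assumption not stated on the slide: a query with no click contributes zero. The function name and the input shape (one clicked rank, or `None`, per query) are hypothetical.

```python
from typing import Iterable, Optional


def mean_reciprocal_rank(clicked_ranks: Iterable[Optional[int]]) -> float:
    """Average of 1/rank over all queries; None means no click (assumed 0)."""
    total = 0.0
    count = 0
    for rank in clicked_ranks:
        count += 1
        if rank is not None:
            total += 1.0 / rank  # rank 1 is a full click, rank 2 a half click
    return total / count if count else 0.0


# Four queries: clicks at #1, #2, no click, #1.
print(mean_reciprocal_rank([1, 2, None, 1]))
```

For the four queries in the usage line, the sum is 1 + 0.5 + 0 + 1 = 2.5, which gives an MRR of 0.625.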

Slide16

Per-query Metrics

Useful for finding problems
MRR
Clickthrough percent
Most-clicked rank (#1 is good)
Percentage of clicks on most-clicked
  known-item queries are over 80%
  categories are under 50%
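
The four per-query metrics can be computed from a click log in one pass. This is a sketch only: the log format (pairs of query string and clicked rank, with `None` for no click), the function name, and the tie-break toward the better rank are assumptions, and the sample data is made up.

```python
from collections import Counter, defaultdict


def per_query_metrics(log):
    """Group (query, clicked_rank_or_None) pairs and summarize each query."""
    by_query = defaultdict(list)
    for query, rank in log:
        by_query[query].append(rank)

    report = {}
    for query, ranks in by_query.items():
        clicks = Counter(r for r in ranks if r is not None)
        n_clicks = sum(clicks.values())
        if clicks:
            # Most-clicked rank; ties go to the better (lower) rank.
            top_rank, top_count = min(clicks.items(),
                                      key=lambda kv: (-kv[1], kv[0]))
        else:
            top_rank, top_count = None, 0
        report[query] = {
            "mrr": sum(1.0 / r for r in ranks if r is not None) / len(ranks),
            "clickthrough_pct": 100.0 * n_clicks / len(ranks),
            "most_clicked_rank": top_rank,
            "pct_clicks_on_most_clicked":
                100.0 * top_count / n_clicks if n_clicks else 0.0,
        }
    return report


sample_log = [("ratatouille", 1), ("ratatouille", 1), ("ratatouille", 2),
              ("ratatouille", None), ("comedy", 3), ("comedy", 1),
              ("comedy", 5), ("comedy", None)]
for query, metrics in per_query_metrics(sample_log).items():
    print(query, metrics)
```

Sorting such a report by most-clicked rank or by the share of clicks on the most-clicked result is one way to surface individual queries that need attention.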

Slide17

Success Looks Like …

MRR consistently over 0.5
85% of clicks on #1

Slide18

Questions?