Search Stack Secrets
Ryan Gehring - Indiegogo
Practical Search for Rubyists
Elasticsearch / SOLR / alternatives roundup.
Essential plugins you need to install today.
Semi-SOA search design.
Schemaless is for amateurs!
Mappings = friend.
Problem solving with analyzers.
Avoiding the Tire DSL: query JSON ingredients.
Elasticsearch v SOLR v …
Horizontal scalability
GREAT API
Developer support (analyzers, etc.)
Downside: a slightly less great Ruby client.
Awesome Plugins
elasticsearch-head: a web front end for an ElasticSearch cluster.
http://mobz.github.com/elasticsearch-head
Elasticsearch JDBC river: https://github.com/jprante/elasticsearch-river-jdbc
ElasticSearch Paramedic: a simple yet sexy tool to monitor and inspect ElasticSearch clusters.
One solid, service-y, Rails 4-approved design:
A web form in the view supplies GET parameters and submits to a search controller.
The search controller permits only the proper, allowed parameters via strong parameters and instantiates a search object.
The search model translates the parameters into a query, either with Tire (the Ruby client) or as raw JSON.
The query is fired and the results are served!
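Here is a minimal sketch of the search-object step. It assumes a hypothetical `ProjectSearch` class, hypothetical `q`, `category` and `page` parameters, and plain Ruby hashes in place of Tire. It only builds the JSON request body and does not talk to a cluster.

```ruby
require 'json'

# Hypothetical search model for the form -> controller -> search object flow.
# In a Rails 4 controller the hash would come from strong parameters, e.g.
#   ProjectSearch.new(params.permit(:q, :category, :page).to_h)
# A plain Hash stands in here so the sketch runs without Rails.
class ProjectSearch
  PERMITTED = %w[q category page].freeze # hypothetical whitelist
  PER_PAGE = 20

  def initialize(params)
    # Drop anything not whitelisted, mirroring what strong parameters enforce.
    @params = params.transform_keys(&:to_s).slice(*PERMITTED)
  end

  # Translate the permitted parameters into an Elasticsearch request body.
  def to_query
    query = { query_string: { query: @params.fetch('q', '*') } }
    if @params['category']
      # Narrow the full-text query with a binary term filter (older "filtered" syntax).
      query = { filtered: { query: query, filter: { term: { category: @params['category'] } } } }
    end
    page = [@params.fetch('page', 1).to_i, 1].max
    { query: query, from: (page - 1) * PER_PAGE, size: PER_PAGE }
  end
end

search = ProjectSearch.new('q' => 'solar lamp', 'category' => 'design', 'admin' => 'true')
puts JSON.generate(search.to_query)
```

In this design the controller stays thin. Swapping Tire for raw JSON, or the reverse, only touches `to_query`.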
Mappings + Analyzers: Ingredients for Success!
Elasticsearch is schemaless by default, but you can optimize by providing a schema:
What fields to index.
How to analyze and tokenize fields.
Analyzers help a lot!
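To make "Mappings = friend" concrete, here is a sketch of an explicit mapping as a Ruby hash. It assumes Elasticsearch 1.x-era `string` fields with an `index` option and a hypothetical `project` type. The field names are illustrative, and the exact options differ across Elasticsearch versions.

```ruby
require 'json'

# Hypothetical index for "project" documents, in Elasticsearch 1.x-era mapping syntax
# (newer versions replace "string" + "index" with text/keyword types).
PROJECT_INDEX = {
  mappings: {
    project: {
      # Reject undeclared fields instead of letting dynamic mapping guess their types.
      dynamic: 'strict',
      properties: {
        title:          { type: 'string', analyzer: 'snowball' },  # full text, stemmed
        description:    { type: 'string', analyzer: 'snowball' },
        category:       { type: 'string', index: 'not_analyzed' }, # exact values for filters
        funds_raised:   { type: 'double' },
        launched_at:    { type: 'date' },
        internal_notes: { type: 'string', index: 'no' }            # kept in _source, not searchable
      }
    }
  }
}.freeze

# Request body for creating the index (index name left to the caller).
puts JSON.pretty_generate(PROJECT_INDEX)
```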
Problem solving with analyzers
My search isn’t robust to misspellings!
N-gram
Edge n-gram
My search isn’t robust to plurals / caps / whitespace / etc.
Snowball (standard + lowercase + some English-language stemming + stopwording)
I can only solve one of these at once!
Multi-field analysis.
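The analyzer fixes above can be illustrated in plain Ruby. The helpers below roughly mimic what n-gram and edge n-gram token filters emit, which shows why a misspelled query still overlaps the indexed word. The multi-field mapping after them is a sketch in Elasticsearch 1.x-era syntax. The analyzer and filter names (`trigram_analyzer`, `prefix_analyzer`, and so on) are hypothetical.

```ruby
require 'json'

# Character n-grams of each length in min..max, roughly what an nGram token filter emits.
def ngrams(word, min, max)
  (min..max).flat_map { |n| word.chars.each_cons(n).map(&:join) }
end

# Edge n-grams: prefixes only, handy for search-as-you-type.
def edge_ngrams(word, min, max)
  (min..[max, word.length].min).map { |n| word[0, n] }
end

# A misspelled query still shares most trigrams with the indexed word.
indexed = ngrams('campaign', 3, 3)
typo    = ngrams('campain', 3, 3)
puts "shared trigrams: #{(indexed & typo).size} of #{indexed.size}"
puts edge_ngrams('solar', 2, 10).inspect

# Multi-field analysis: one source field, several analyses (ES 1.x-style "fields").
INDEX_BODY = {
  settings: {
    analysis: {
      filter: {
        trigram_filter: { type: 'nGram', min_gram: 3, max_gram: 3 },
        prefix_filter:  { type: 'edgeNGram', min_gram: 2, max_gram: 10 }
      },
      analyzer: {
        trigram_analyzer: { type: 'custom', tokenizer: 'standard', filter: %w[lowercase trigram_filter] },
        prefix_analyzer:  { type: 'custom', tokenizer: 'standard', filter: %w[lowercase prefix_filter] }
      }
    }
  },
  mappings: {
    project: {
      properties: {
        title: {
          type: 'string', analyzer: 'snowball', # plurals, caps, stopwords
          fields: {
            ngram: { type: 'string', analyzer: 'trigram_analyzer' }, # misspellings
            edge:  { type: 'string', analyzer: 'prefix_analyzer' }   # partial words
          }
        }
      }
    }
  }
}.freeze

puts JSON.pretty_generate(INDEX_BODY)
```

Each subfield is queried by its own name (for example `title.ngram`), so one document field can answer several kinds of queries.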
Problem solving with boosts
Boosts are a concept from Lucene; they are multipliers on scores.
You can set the relative importance of matching fields, e.g. title -> 10 vs. free_text -> 1.
You can set the relative importance of matching on ANALYZED fields, e.g. ngram_title -> 6, snowball_title -> 10.
Bonus for fields with exact token matches.
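Here is a sketch of those boosts as a `multi_match` query body using the `field^boost` notation. It assumes `title`, `free_text`, `ngram_title` and `snowball_title` exist as separately analyzed fields in the index. The numbers are the slide's examples, not tuned values.

```ruby
require 'json'

# Boosts from the slide: field name => multiplier on that field's score contribution.
# The field names are assumed to exist in the index.
FIELD_BOOSTS = {
  'title' => 10, 'free_text' => 1, 'ngram_title' => 6, 'snowball_title' => 10
}.freeze

# Build a multi_match query using "field^boost" strings.
def boosted_query(text, boosts = FIELD_BOOSTS)
  { query: { multi_match: { query: text, fields: boosts.map { |f, b| "#{f}^#{b}" } } } }
end

puts JSON.generate(boosted_query('solar lamp'))
```

Boosting `snowball_title` above `ngram_title` is one way to read the "bonus for exact token matches": a whole-token hit outranks a partial-gram hit on the same title.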
Key queries in Elasticsearch
Filtered Query:
Apply binary filters to an arbitrary query; try it with the query_string query type for full-text, analyzed search queries + filters.
Custom Score Query:
Provide the exact equation for scoring; you can take mathematical transforms of variables using MVEL, or even Python with the right plugin.
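Both query types can be written as plain JSON ingredients. This sketch assumes pre-1.0 Elasticsearch syntax, where `filtered` and `custom_score` were later superseded by `bool` filters and `function_score`. It also assumes hypothetical fields `category`, `funds_raised` and `backers`, and an illustrative MVEL transform. Check which math functions your scripting setup actually exposes.

```ruby
require 'json'

# Filtered query: an analyzed query_string search narrowed by binary filters,
# which include or exclude documents without affecting their scores.
# Field names are hypothetical.
def filtered_query(text, category)
  {
    filtered: {
      query: { query_string: { query: text } },
      filter: {
        bool: {
          must: [
            { term: { category: category } },
            { range: { funds_raised: { gt: 0 } } }
          ]
        }
      }
    }
  }
end

# custom_score wraps any query and rescores it with a script.
# The MVEL expression is an assumed example; log() availability depends on the setup.
def custom_score_query(inner, weight)
  {
    custom_score: {
      query: inner,
      params: { w: weight },
      script: "_score * (1 + w * log(1 + doc['backers'].value))"
    }
  }
end

body = { query: custom_score_query(filtered_query('solar lamp', 'design'), 0.5) }
puts JSON.pretty_generate(body)
```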
Theoretical Section
Integrating models via custom scoring.
Learning models: a qualitative, quantitative process.
Data sources and paradigms.
Key metrics for search.
Monitoring statistical model performance.
Custom score queries are regression equations.
You can use supervised learning methods to train them over time, like Google does.
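Read as a regression, a custom score is score = b0 + b1*x1 + ... + bn*xn over per-document features. Here is a minimal sketch that renders fitted coefficients into a script string. It also evaluates the same equation in Ruby for sanity checks. The feature expressions and coefficients are hypothetical.

```ruby
# A custom score written as a linear model: score = b0 + sum_i b_i * x_i.
# Keys are script expressions for each feature (hypothetical field names).
class LinearScore
  def initialize(intercept, weights)
    @intercept = intercept
    @weights = weights # { "expression" => coefficient }
  end

  # Render the regression as a script string for a custom score query.
  def to_script
    terms = @weights.map { |expr, coef| "#{coef} * #{expr}" }
    ([@intercept.to_s] + terms).join(' + ')
  end

  # Evaluate the same equation in Ruby, given feature values keyed like @weights.
  def evaluate(features)
    @intercept + @weights.sum { |expr, coef| coef * features.fetch(expr) }
  end
end

model = LinearScore.new(0.2, '_score' => 1.5, "doc['log_backers'].value" => 0.8)
puts model.to_script
# → 0.2 + 1.5 * _score + 0.8 * doc['log_backers'].value
```

Keeping the coefficients as data means a retrained model ships as a new hash, not new code.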
Statistical learning & search.
Clickstream models
Logistic regression
Binary target: click / no click
Learn boosts, coefficients, etc.
Paired comparison models
Logistic regression
Binary target: A > B
Learn boosts, coefficients, etc.
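Both model types reduce to logistic regression on a binary target. Here is a minimal, dependency-free sketch using batch gradient descent. The features (`text_score`, `log_backers`) and all data points are made up for illustration. For paired comparisons, it regresses on the feature difference between the preferred result and the other one. Each pair is added in both orders so the intercept stays at zero. This differencing is a standard construction, not necessarily the exact setup used in the talk.

```ruby
# Logistic function.
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Batch gradient descent for unregularized logistic regression.
# rows: [[feature_vector, label], ...] with label 1 (click, or A preferred) or 0.
def train_logistic(rows, lr: 0.5, epochs: 2000)
  w = Array.new(rows.first[0].size, 0.0)
  b = 0.0
  n = rows.size.to_f
  epochs.times do
    grad_w = Array.new(w.size, 0.0)
    grad_b = 0.0
    rows.each do |x, y|
      err = sigmoid(w.zip(x).sum { |wi, xi| wi * xi } + b) - y
      x.each_with_index { |xi, i| grad_w[i] += err * xi }
      grad_b += err
    end
    w = w.each_with_index.map { |wi, i| wi - lr * grad_w[i] / n }
    b -= lr * grad_b / n
  end
  [w, b]
end

# Clickstream model: one row per impression, features [text_score, log_backers].
# Toy, made-up data for illustration only.
CLICKS = [
  [[2.0, 0.5], 1], [[1.5, 1.0], 1], [[1.8, 0.1], 1], [[0.9, 1.2], 1],
  [[0.3, 0.2], 0], [[0.5, 1.5], 0], [[0.2, 0.9], 0], [[1.0, 0.4], 0], [[1.6, 0.8], 0]
].freeze

# Paired comparison model: regress on feature differences (winner - loser),
# adding each pair in both orders so the intercept stays at zero.
def pair_rows(pairs)
  pairs.flat_map do |winner, loser|
    diff = winner.zip(loser).map { |a, c| a - c }
    [[diff, 1], [diff.map(&:-@), 0]]
  end
end

# Toy [winner, loser] pairs, including a couple of "upsets".
PAIRS = [
  [[2.0, 0.5], [0.3, 0.2]],
  [[1.5, 1.0], [1.0, 0.4]], [[1.0, 0.4], [1.5, 1.0]],
  [[0.9, 1.2], [0.5, 1.5]], [[0.5, 1.5], [0.9, 1.2]]
].freeze

click_w, click_b = train_logistic(CLICKS)
pair_w, _pair_b = train_logistic(pair_rows(PAIRS))
puts "clickstream weights: #{click_w.map { |v| v.round(3) }}, bias #{click_b.round(3)}"
puts "paired-comparison weights: #{pair_w.map { |v| v.round(3) }}"
```

The learned weights play the role of boosts and coefficients in the custom score.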
Search model training is a qualitative-first process.
Review search algorithms before you push them.
Have other people review search results before you push them.
Make your app robust to new search query models: abstract the regression into a query model.
Do side-by-side qualitative search QA.
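One way to keep the app robust to new query models is to give every model the same small interface. Here is a sketch that assumes a hypothetical `#to_query(text)` contract and made-up result titles. It includes a helper that lays two result lists out side by side for human review.

```ruby
# Hypothetical query models sharing one interface: #to_query(text) -> request body.
# Swapping regression models then means swapping objects, not rewriting the app.
class BaselineModel
  def to_query(text)
    { query: { query_string: { query: text } } }
  end
end

class RegressionModel
  def initialize(script)
    @script = script # e.g. a script rendered from fitted coefficients
  end

  def to_query(text)
    { query: { custom_score: { query: { query_string: { query: text } }, script: @script } } }
  end
end

# Format two result lists as aligned columns for side-by-side QA.
def side_by_side(left, right, width: 30)
  Array.new([left.size, right.size].max) do |i|
    format("%-#{width}s | %s", left[i].to_s, right[i].to_s)
  end
end

models = { 'baseline' => BaselineModel.new, 'candidate' => RegressionModel.new('_score * 2') }
models.each { |name, model| puts "#{name}: #{model.to_query('solar lamp')}" }
puts side_by_side(['Solar lamp kit', 'Lamp shades'], ['Solar lamp kit', 'Solar charger'])
```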
Search success metrics… any Googlers here?
Items consumed / session for browse pages.
1 - abandoned search % for search pages.
Conversion rate originating from search pages.
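The three metrics can be computed from session logs. This sketch assumes a hypothetical `Session` record with four fields: a page type, whether a search result was clicked, items viewed, and whether the session converted. Real definitions, such as what counts as "abandoned", will depend on your tracking.

```ruby
# Hypothetical per-session log record.
Session = Struct.new(:page_type, :clicked_result, :items_viewed, :converted)

def sessions_of(sessions, type)
  sessions.select { |s| s.page_type == type }
end

# Average items consumed per browse-page session.
def items_per_browse_session(sessions)
  browse = sessions_of(sessions, :browse)
  browse.empty? ? 0.0 : browse.sum(&:items_viewed).to_f / browse.size
end

# 1 - abandoned search %: share of search sessions where a result was clicked.
def search_success_rate(sessions)
  search = sessions_of(sessions, :search)
  return 0.0 if search.empty?
  abandoned = search.count { |s| !s.clicked_result }
  1.0 - abandoned.to_f / search.size
end

# Conversion rate for sessions that originated on a search page.
def search_conversion_rate(sessions)
  search = sessions_of(sessions, :search)
  search.empty? ? 0.0 : search.count(&:converted).to_f / search.size
end

log = [
  Session.new(:browse, false, 4, false), Session.new(:browse, true, 2, true),
  Session.new(:search, true, 3, true),   Session.new(:search, false, 0, false),
  Session.new(:search, true, 1, false),  Session.new(:search, true, 2, false)
]
puts items_per_browse_session(log), search_success_rate(log), search_conversion_rate(log)
```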
Search model learning
Explain output: the ultimate training data, in a nasty, semi-structured mess.
Built an AST parser for Lucene explain output so you can get clean rows of observations.
Every query’s intimate scoring details are logged into a DB as lines of training data.
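The talk's tool parsed Lucene's text explain output into an AST. This sketch swaps in a simpler route. It assumes explanations are requested as JSON, where Elasticsearch returns a nested tree of `value`, `description` and `details`, and flattens that tree into rows. The regex for field and term and the sample explanation are illustrative assumptions, not real engine output.

```ruby
require 'json'

# Pulls field and term out of Lucene descriptions like "weight(title:lamp in 12)".
WEIGHT_RE = /weight\(([^:\s]+):(\S+) in \d+\)/

# Depth-first walk of an explanation tree ({"value", "description", "details"}),
# emitting one flat row per node, ready to insert into a table.
def flatten_explanation(node, query_id:, doc_id:, depth: 0, path: '0', rows: [])
  match = WEIGHT_RE.match(node['description'].to_s)
  rows << {
    query_id: query_id, doc_id: doc_id, depth: depth, path: path,
    value: node['value'].to_f, description: node['description'],
    field: match && match[1], term: match && match[2]
  }
  (node['details'] || []).each_with_index do |child, i|
    flatten_explanation(child, query_id: query_id, doc_id: doc_id,
                               depth: depth + 1, path: "#{path}.#{i}", rows: rows)
  end
  rows
end

# Hand-written, simplified explanation for illustration only.
sample = JSON.parse(<<~EXPLAIN)
  {"value": 1.5, "description": "sum of:", "details": [
    {"value": 1.2, "description": "weight(title:lamp in 12) [PerFieldSimilarity], result of:"},
    {"value": 0.3, "description": "weight(free_text:lamp in 12) [PerFieldSimilarity], result of:"}
  ]}
EXPLAIN

flatten_explanation(sample, query_id: 'q-1', doc_id: 12).each { |row| puts row }
```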
Search model monitoring
You can calculate stability metrics for thousands of queries between two models and highlight the least stable queries.
You can monitor prediction accuracy on clickstream data for performance degradation.
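Both monitoring ideas fit in a few functions. This sketch assumes each model's results are stored as `{ query => [doc_id, ...] }`. Top-k Jaccard overlap serves as the stability metric, and mean log loss on clickstream data tracks prediction accuracy. The overlap measure, `k`, and the degradation margin are assumptions, not the talk's exact choices.

```ruby
# Jaccard overlap of two models' top-k results for one query (1.0 = identical sets).
def topk_jaccard(a, b, k: 10)
  sa = a.first(k)
  sb = b.first(k)
  union = (sa | sb).size
  union.zero? ? 1.0 : (sa & sb).size.to_f / union
end

# Queries ordered from least to most stable between model A and model B.
def least_stable(results_a, results_b, k: 10, limit: 5)
  results_a.map { |q, docs| [q, topk_jaccard(docs, results_b.fetch(q, []), k: k)] }
           .sort_by { |_q, overlap| overlap }
           .first(limit)
end

# Mean log loss of predicted click probabilities against observed clicks (1/0).
def log_loss(preds, clicks, eps: 1e-15)
  total = preds.zip(clicks).sum do |p, y|
    q = p.clamp(eps, 1 - eps)
    -(y * Math.log(q) + (1 - y) * Math.log(1 - q))
  end
  total / preds.size
end

# Flag degradation when recent log loss exceeds the baseline by a chosen margin.
def degraded?(recent_loss, baseline_loss, margin: 0.05)
  recent_loss > baseline_loss * (1 + margin)
end

model_a = { 'lamp' => %w[1 2 3], 'solar' => %w[4 5 6] }
model_b = { 'lamp' => %w[1 2 3], 'solar' => %w[4 7 8] }
p least_stable(model_a, model_b, k: 3)
puts log_loss([0.9, 0.2, 0.6], [1, 0, 1])
```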