
Search Stack Secrets - PowerPoint Presentation

Uploaded On 2016-06-09




Presentation Transcript

Slide1

Search Stack Secrets

Ryan Gehring - Indiegogo

Slide2

Practical Search for Rubyists

Elasticsearch / SOLR / alternatives roundup.

Essential plugins you need to install today.

Semi SOA search design.

Schemaless is for amateurs! Mappings = friend.

Problem solving with analyzers.

Avoiding Tire DSL - query JSON ingredients.

Slide3

Elasticsearch v SOLR v …

Horizontal scalability

GREAT API

Developer support (analyzers, etc.)

Downside: slightly less great Ruby client.

Slide4

Awesome Plugins

elasticsearch-head

A web front end for an ElasticSearch cluster.

http://mobz.github.com/elasticsearch-head

Elasticsearch JDBC river

https://github.com/jprante/elasticsearch-river-jdbc

ElasticSearch Paramedic

Paramedic is a simple yet sexy tool to monitor and inspect ElasticSearch clusters.

Slide5

One solid service-y and Rails 4-approved design

Web form in the view supplies GET parameters and submits to a search controller.

Search controller whitelists the permitted parameters via strong parameters and instantiates a search object.

Search model translates the parameters into a query, either using Tire (the Ruby client) or raw JSON.

Query is fired and results are served!

Slide6
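The controller-to-search-model flow in the "service-y" design can be sketched in plain Ruby. This is only an illustration, not the speaker's code: the `Search` class, the `PERMITTED_KEYS` whitelist (standing in for Rails strong parameters), and the parameter names `q` and `page` are all assumptions.

```ruby
require 'json'

# Hypothetical whitelist standing in for Rails strong parameters.
PERMITTED_KEYS = %w[q page].freeze

# Plain-Ruby search object: takes raw GET params, keeps only the permitted
# keys, and builds the JSON body that would be sent to Elasticsearch.
class Search
  PER_PAGE = 10

  attr_reader :params

  def initialize(raw_params)
    @params = raw_params.select { |key, _| PERMITTED_KEYS.include?(key) }
  end

  def to_query
    page = [@params.fetch('page', 1).to_i, 1].max
    {
      'query' => { 'query_string' => { 'query' => @params.fetch('q', '*:*') } },
      'from'  => (page - 1) * PER_PAGE,
      'size'  => PER_PAGE
    }
  end
end

# The unpermitted 'admin' key is dropped before the query is built.
search = Search.new('q' => 'solar lamp', 'page' => '2', 'admin' => 'true')
puts JSON.pretty_generate(search.to_query)
```

Keeping the query construction in one object means the controller never touches query JSON, which is the point of the semi-SOA split.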

Mappings + Analyzers: Ingredients for Success!

Elasticsearch is schemaless by default, but you can optimize by providing a schema.

What fields to index.

How to analyze + tokenize fields.

These analyzers help a lot!

Slide7
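As a rough illustration of the mappings point (which fields to index, and how to analyze them), an index definition might look like the sketch below. It assumes the mapping syntax of Elasticsearch releases from the era of this talk (`string` fields, `'index' => 'no'`, the `nGram` filter type), which later versions changed. The `campaign` type and all field and analyzer names are hypothetical.

```ruby
require 'json'

# Index settings + mapping as a Ruby hash, ready to serialize as the body
# of an index-creation request. Names are illustrative assumptions.
index_definition = {
  'settings' => {
    'analysis' => {
      'filter' => {
        'title_ngram_filter' => { 'type' => 'nGram', 'min_gram' => 2, 'max_gram' => 4 }
      },
      'analyzer' => {
        'title_ngram' => {
          'type'      => 'custom',
          'tokenizer' => 'standard',
          'filter'    => %w[lowercase title_ngram_filter]
        }
      }
    }
  },
  'mappings' => {
    'campaign' => {
      'properties' => {
        # Stemmed, stopworded text.
        'snowball_title' => { 'type' => 'string', 'analyzer' => 'snowball' },
        # N-gram tokens for misspelling tolerance.
        'ngram_title'    => { 'type' => 'string', 'analyzer' => 'title_ngram' },
        # Stored in the document but not searchable.
        'internal_notes' => { 'type' => 'string', 'index' => 'no' }
      }
    }
  }
}

puts JSON.pretty_generate(index_definition)
```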

Problem solving with analyzers

My search isn’t robust to misspellings!

N-gram

Edge n-gram

My search isn’t robust to plurals / caps / whitespace / etc.

Snowball (standard + lowercase + some English language stemming + stopwording)

I can only solve one of these at once!

Multi field analysis.

Slide8
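To see why n-gram analysis helps with misspellings, here is a small plain-Ruby sketch of n-gram and edge n-gram tokenization. It imitates the idea only and is not Elasticsearch's implementation; the helper names `ngrams`, `edge_ngrams`, and `overlap` are made up for this example.

```ruby
# Character n-grams of a fixed length n.
def ngrams(word, n)
  chars = word.downcase.chars
  return [] if chars.length < n

  (0..chars.length - n).map { |i| chars[i, n].join }
end

# Edge n-grams: prefixes of the word, from min to max characters long.
def edge_ngrams(word, min, max)
  w = word.downcase
  (min..[max, w.length].min).map { |len| w[0, len] }
end

# Share of the distinct n-grams of `a` that also occur in `b`.
def overlap(a, b, n = 3)
  grams_a = ngrams(a, n).uniq
  return 0.0 if grams_a.empty?

  (grams_a & ngrams(b, n)).length.to_f / grams_a.length
end

p ngrams('search', 3)          # ["sea", "ear", "arc", "rch"]
p edge_ngrams('search', 1, 4)  # ["s", "se", "sea", "sear"]
p overlap('elasticsearch', 'elasticsaerch')
```

A transposed pair of letters breaks an exact token match completely, but most trigrams still match, so an n-gram field can still score the document. Edge n-grams do the same for prefixes, which suits search-as-you-type.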

Problem solving with boosts

Boosts are a concept from Lucene; they are multipliers on scores.

You can set the relative importance of matching fields. Example: title -> 10 vs. free_text -> 1.

You can set the relative importance of matching on ANALYZED fields. Example: ngram_title -> 6, snowball_title -> 10.

Bonus for fields with exact token matches.

Slide9
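One way to express the boost examples is the `field^boost` notation in the `fields` list of a `query_string` query. The sketch below builds that query body as a Ruby hash; the helper name `boosted_query` and the search text are assumptions, while the field names and boost values come from the slide.

```ruby
require 'json'

# Field => boost, using the slide's example values.
FIELD_BOOSTS    = { 'title' => 10, 'free_text' => 1 }.freeze
ANALYZED_BOOSTS = { 'ngram_title' => 6, 'snowball_title' => 10 }.freeze

# Build a query_string query that searches every field with its boost.
def boosted_query(text, boosts)
  {
    'query' => {
      'query_string' => {
        'query'  => text,
        'fields' => boosts.map { |field, boost| "#{field}^#{boost}" }
      }
    }
  }
end

puts JSON.generate(boosted_query('solar lamp', FIELD_BOOSTS))
puts JSON.generate(boosted_query('solar lamp', ANALYZED_BOOSTS))
```

With the second set of boosts, a stemmed match on the title outranks a loose n-gram match, so exact-ish matches surface first while fuzzy ones still appear.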

Key queries in Elasticsearch

Filtered Query

Apply binary filters to an arbitrary query; try it with the query_string query type for full text, analyzed search queries + filters.

Custom Score Query

Provide the exact equation for scoring. You can take mathematical transforms of variables using MVEL, or even Python with the right plugin.

Slide10
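The two query types for filtering and custom scoring can be combined, as in the hedged sketch below. It uses the `filtered` and `custom_score` query bodies from the Elasticsearch releases of this talk's era; later versions replaced them with the `bool` filter clause and `function_score`. The field names `status` and `funds_raised` and the scoring formula are invented for illustration.

```ruby
require 'json'

# Full-text, analyzed part of the search.
text_query = { 'query_string' => { 'query' => 'solar lamp' } }

# Filtered query: a binary (yes/no) filter wrapped around the text query.
filtered = {
  'filtered' => {
    'query'  => text_query,
    'filter' => { 'term' => { 'status' => 'live' } }
  }
}

# Custom score query: the script is the scoring equation. Here the text
# relevance score is scaled by a log transform of a numeric field.
custom_score = {
  'custom_score' => {
    'query'  => filtered,
    'script' => "_score * log(1 + doc['funds_raised'].value)"
  }
}

puts JSON.pretty_generate('query' => custom_score)
```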

Theoretical Section

Integrating models via custom scoring.

Learning models: a qualitative, quantitative process.

Data sources and paradigms.

Key metrics for search.

Monitoring statistical model performance.

Slide11

Custom score queries are regression equations.

You can use supervised learning methods to train them over time, as Google does.

Slide12

Statistical learning & search.

Clickstream models

Logistic regression

Binary target: click / no click

Learn boosts, coefficients, etc.

Paired comparison models

Logistic regression

Binary target: A > B

Learn boosts, coefficients, etc.

Slide13
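To make the clickstream model concrete, here is a minimal logistic regression fitted by batch gradient descent in plain Ruby. It is a toy sketch, not the speaker's training pipeline: the observations are made up, and the two features (a title match score and a free-text match score per shown result) are assumptions.

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Fit logistic regression by batch gradient descent.
# rows: arrays of feature values; labels: 0 or 1. Returns [bias, weights].
def fit_logistic(rows, labels, steps: 2000, rate: 0.1)
  bias = 0.0
  weights = Array.new(rows.first.length, 0.0)
  steps.times do
    grad_b = 0.0
    grad_w = Array.new(weights.length, 0.0)
    rows.each_with_index do |x, i|
      z = bias + weights.zip(x).sum { |w, v| w * v }
      err = sigmoid(z) - labels[i]
      grad_b += err
      x.each_with_index { |v, j| grad_w[j] += err * v }
    end
    bias -= rate * grad_b / rows.length
    weights = weights.each_with_index.map { |w, j| w - rate * grad_w[j] / rows.length }
  end
  [bias, weights]
end

# Predicted click probability for one feature vector.
def predict(bias, weights, x)
  sigmoid(bias + weights.zip(x).sum { |w, v| w * v })
end

# Made-up observations: [title_score, free_text_score] per shown result,
# with 1 = clicked and 0 = not clicked.
rows   = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.1], [0.2, 0.9], [0.1, 0.3], [0.3, 0.6]]
clicks = [1, 1, 1, 0, 0, 0]

bias, weights = fit_logistic(rows, clicks)
puts "bias=#{bias.round(2)} weights=#{weights.map { |w| w.round(2) }}"
puts predict(bias, weights, [0.9, 0.5]).round(2)
```

The fitted weights play the role of learned boosts. For a paired comparison model the same fit applies if each row is the feature difference between results A and B and the label is 1 when A was preferred.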

Search model training is a qualitative-first process.

Review search algorithms before you push them.

Have other people review search results before you push them.

Make your app robust to new search query models – abstract the regression to a query model.

Do side-by-side qualitative search QA.

Slide14

Search success metrics… any Googlers here?

Items consumed / session for browse pages.

1 - abandoned search % for search pages.

Conversion rate originating from search page.

Slide15
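The three search success metrics are simple ratios once the events are logged. The sketch below uses a hypothetical log format and assumes that an "abandoned" search is one with no result click; the slides do not define it, and the data is invented.

```ruby
# Hypothetical log record for one search.
SearchEvent = Struct.new(:query, :clicked, :converted)

search_log = [
  SearchEvent.new('solar lamp', true,  true),
  SearchEvent.new('solr lamp',  false, false),
  SearchEvent.new('drone',      true,  false),
  SearchEvent.new('3d printer', true,  false)
]

# Items consumed in each browse session (made-up numbers).
browse_sessions = [3, 0, 5]

# 1 - abandoned search %: share of searches that led to a click.
def search_success_rate(log)
  abandoned = log.count { |event| !event.clicked }
  1.0 - abandoned.to_f / log.length
end

# Share of searches that ended in a conversion.
def conversion_rate(log)
  log.count(&:converted).to_f / log.length
end

# Average number of items consumed per browse session.
def items_per_session(sessions)
  sessions.sum.to_f / sessions.length
end

puts search_success_rate(search_log)  # 0.75
puts conversion_rate(search_log)      # 0.25
puts items_per_session(browse_sessions)
```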

Search model learning

Explain output: the ultimate training data, in a nasty, semi-structured mess.

Built an AST parser for Lucene explain output so you can get clean rows of observations.

Every query’s intimate scoring details are logged into a DB as lines of training data.

Slide16
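As a rough idea of how explain output becomes rows of observations, the sketch below flattens a nested explanation into one row per scoring component. It is not the speaker's AST parser. It assumes the JSON form of explain output, a tree of `value` / `description` / `details` nodes, and the sample tree is hand-written for illustration.

```ruby
# Hand-written sample in the nested value/description/details shape.
explanation = {
  'value' => 2.5, 'description' => 'sum of:',
  'details' => [
    { 'value' => 2.0, 'description' => 'weight(title:lamp in 0), product of:',
      'details' => [
        { 'value' => 10.0, 'description' => 'boost', 'details' => [] },
        { 'value' => 0.2,  'description' => 'fieldWeight in 0', 'details' => [] }
      ] },
    { 'value' => 0.5, 'description' => 'weight(free_text:lamp in 0)', 'details' => [] }
  ]
}

# Depth-first walk that emits one flat row per node in the tree.
def flatten_explain(node, depth = 0, rows = [])
  rows << { depth: depth, value: node['value'], description: node['description'] }
  (node['details'] || []).each { |child| flatten_explain(child, depth + 1, rows) }
  rows
end

rows = flatten_explain(explanation)
rows.each { |r| puts "#{'  ' * r[:depth]}#{r[:value]}  #{r[:description]}" }
```

Each row could then be written to a database table keyed by query and document ID, giving the regression one observation per scored result.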

Search model monitoring

You can calculate stability metrics for thousands of queries between two models and highlight the least stable queries.

You can monitor prediction accuracy on clickstream data for performance degradation.
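
A stability check between two models can be sketched as follows. The slides do not name a metric, so Jaccard overlap of the top results per query is used here as an assumption; the document IDs and queries are made up.

```ruby
# Jaccard overlap of two result lists (1.0 = same set of documents).
def jaccard(a, b)
  union = (a | b).length
  union.zero? ? 1.0 : (a & b).length.to_f / union
end

# Score every query by result overlap and return the least stable ones.
def least_stable(results_a, results_b, limit = 2)
  results_a.keys
           .map { |query| [query, jaccard(results_a[query], results_b.fetch(query, []))] }
           .sort_by { |_, score| score }
           .first(limit)
end

# Top-3 document IDs per query under the current and candidate models.
model_a = { 'lamp' => [1, 2, 3], 'drone' => [4, 5, 6], 'solar' => [7, 8, 9] }
model_b = { 'lamp' => [1, 2, 3], 'drone' => [4, 9, 8], 'solar' => [7, 8, 5] }

least_stable(model_a, model_b).each do |query, score|
  puts "#{query}: #{score}"
end
```

The queries at the top of this list are the ones worth reviewing side by side before a new model ships.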