Slide1 Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search
Saehoon Kim§, Yuxiong He*, Seung-won Hwang§, Sameh Elnikety*, Seungjin Choi§
Slide2 Web Search Engine
Requirement for queries: high quality + low latency
This talk focuses on how to achieve low latency without compromising quality.
Slide3 Low Latency for All Users
- Reduce tail latency (high-percentile response time)
- Reducing average latency is not sufficient
- Commercial search engines reduce 99th-percentile latency
[Figure; axis label: Latency]
Slide4 Reducing End-to-End Latency
[Diagram: an aggregator fans a query out to 40 Index Server Nodes (ISNs); one ISN is running a long(-running) query]
- Aggregator: the 99th-percentile response time < 120ms
- Each ISN: the 99.99th-percentile response time < 120ms
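A rough way to see why the per-ISN target is much stricter than the aggregator target: if the aggregator has to wait for every ISN, one slow ISN makes the whole query slow. The sketch below assumes (the slide does not say this) that ISN latencies are independent and that all 40 responses are needed; `prob_all_fast` is a hypothetical helper name.

```python
def prob_all_fast(per_node_prob: float, num_nodes: int = 40) -> float:
    """Probability that every node answers within the deadline,
    assuming independent nodes (an assumption, not a measured fact)."""
    return per_node_prob ** num_nodes

# Each ISN meets the 120 ms deadline for 99% vs. 99.99% of its queries.
p99_everywhere = prob_all_fast(0.99)
p9999_everywhere = prob_all_fast(0.9999)
print(f"{p99_everywhere:.3f}")    # 0.669
print(f"{p9999_everywhere:.3f}")  # 0.996
```

Under these assumptions, a 99th-percentile guarantee at each ISN would leave only about two thirds of queries within the deadline at the aggregator, while a 99.99th-percentile guarantee keeps more than 99% within it.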
Slide5 Reducing Tail Latency by Parallelization
Opportunity of parallelization:
- Available idle cores
- CPU-intensive workloads

Resource    Latency
Network     4.26 ms
Queueing    0.15 ms
I/O         4.70 ms
CPU         194.95 ms
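As a quick check on the "CPU-intensive" claim, the latency figures on this slide can be added up. This is only a sketch: it assumes the four rows are additive components of a single query's latency, which the slide does not state explicitly.

```python
# Per-resource latency from the slide's table, in milliseconds.
latency_ms = {"Network": 4.26, "Queueing": 0.15, "I/O": 4.70, "CPU": 194.95}

total_ms = sum(latency_ms.values())
cpu_share = latency_ms["CPU"] / total_ms
print(f"total = {total_ms:.2f} ms, CPU share = {cpu_share:.1%}")
# total = 204.06 ms, CPU share = 95.5%
```

If the rows are additive, CPU time accounts for roughly 95% of the total, which is why spreading the CPU work across idle cores is the natural lever.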
Slide6 Challenges of Exploiting Parallelism
- Parallelizing all queries: inefficient under medium to high load
- Parallelizing short queries: no speedup
- Parallelizing long queries: good speedup
Parallelize only long(-running) queries
Slide7 Prior Work - PREDictive Parallelization
- Predict the query execution time
- Parallelize the predicted long queries only
- Execute the predicted short queries sequentially
[Diagram: query “WSDM” -> feature extraction -> regression function (prediction model) -> Long / Short]
Predictive Parallelization: Taming Tail Latencies in Web Search, [M. Jeon, SIGIR’14]
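The PRED decision described on this slide can be sketched in a few lines. This is an illustration, not the authors' code: the 100 ms "long" threshold, the feature name `posting_list_len`, and the linear `toy_model` are all hypothetical stand-ins for the trained regression function; the 4-way degree is the one used later in the talk's simulation.

```python
from typing import Callable, Dict

Features = Dict[str, float]

def pred_degree(features: Features,
                predict_ms: Callable[[Features], float],
                long_threshold_ms: float = 100.0,   # assumed threshold
                parallel_degree: int = 4) -> int:
    """Return the number of threads for a query, PRED-style:
    predicted-long queries run in parallel, the rest sequentially."""
    predicted = predict_ms(features)
    return parallel_degree if predicted >= long_threshold_ms else 1

# Hypothetical stand-in for the trained regression function.
def toy_model(f: Features) -> float:
    return 0.002 * f["posting_list_len"]

print(pred_degree({"posting_list_len": 80_000}, toy_model))  # 4 (predicted ~160 ms)
print(pred_degree({"posting_list_len": 5_000}, toy_model))   # 1 (predicted ~10 ms)
```

Note that the decision is made once, before the query runs, from static features only.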
Slide8 Requirements
- 99th-percentile tail latency at aggregator <= 120ms
- Reduce 99.99th-percentile tail latency at each ISN to <= 120ms

              Recall                              Precision
Requirements  >= 98.9%                            Should be high
Reason        To optimize 99.99th tail latency    Fewer queries to be parallelized
PRED          98.9%                               1.1%

PRED cannot effectively reduce 99.99th tail latency
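For readers less familiar with the two metrics, the sketch below computes precision and recall for the "long query" class. The counts are hypothetical, chosen only so the result lands near the PRED figures quoted on this slide; they are not taken from the talk.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Precision and recall for the 'long query' class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts (not from the talk): 1,000 truly long queries,
# 989 of them caught, plus 88,920 short queries wrongly flagged as long.
precision, recall = precision_recall(true_pos=989, false_pos=88_920, false_neg=11)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# precision = 1.1%, recall = 98.9%
```

With precision this low, almost every parallelized query is actually short, so hitting the recall target comes at the cost of parallelizing nearly everything.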
Slide9 Contributions
Key contributions:
- Propose DDS (Delayed-Dynamic-Selective) prediction to achieve very high recall and good precision
- Use DDS prediction to effectively reduce extreme tail latency
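Slide10 Overview of DDS
[Diagram: DDS pipeline]
1. Delayed prediction: queries that finish in < 10ms are done; queries running > 10ms go on to prediction
2. Dynamic prediction: a predictor for execution time labels each remaining query Long or Short
3. Selective prediction: a predictor for confidence level flags queries whose prediction is not confident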
Slide11 Delayed Prediction
- Complete many short queries sequentially
- Collect dynamic features
Slide12 Dynamic Features
What are dynamic features?
- Features that can only be collected at runtime
- Two categories
  - NumEstMatchDocs: to estimate the total # matched docs
  - DynScores: to predict early termination
Slide13 Primary Factors for Execution Time
[Figure: inverted indexes for “WSDM” and “2015” over web documents Doc 1 ... Doc N, sorted by static score from highest to lowest; documents are processed in that order]
1. # total matched documents
Slide14 Primary Factors for Execution Time
[Figure: same inverted indexes for “WSDM” and “2015”; the lowest-scored documents are marked "Not evaluated"]
1. # total matched documents
2. Early termination
Slide15 Early Termination
[Figure: inverted index for “WSDM” over web documents Doc 1 ... Doc N, sorted by static score from highest to lowest; the first documents are processed, the rest are not evaluated]
Top-3 results table (Doc ID: dynamic score), shown in four successive states:
1. Doc 1: -4.11
2. Doc 3: -4.01, Doc 1: -4.11
3. Doc 3: -4.01, Doc 1: -4.11, Doc 5: -4.23
4. Doc 3: -4.01, Doc 8: -4.10, Doc 1: -4.11
If min. dynamic score > threshold, then stop.
To predict early termination, consider the dynamic score distribution.
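The stopping rule on this slide can be sketched with a small min-heap that holds the current top-k results. This is an illustrative reading of the slide, not the engine's actual algorithm: the threshold value of -4.12, the extra documents 9 and 12, and the requirement that the top-k list be full before stopping are assumptions made for the example.

```python
import heapq
from typing import Iterable, List, Tuple

def top_k_with_early_stop(docs: Iterable[Tuple[int, float]], k: int,
                          threshold: float) -> Tuple[List[Tuple[int, float]], int]:
    """Scan (doc_id, dynamic_score) pairs in static-score order and stop once
    the top-k list is full and its lowest score exceeds `threshold`."""
    heap: List[Tuple[float, int]] = []  # min-heap keyed by dynamic score
    evaluated = 0
    for doc_id, score in docs:
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))  # evict the current minimum
        if len(heap) == k and heap[0][0] > threshold:
            break  # remaining documents are not evaluated
    ranked = sorted(heap, reverse=True)  # best dynamic score first
    return [(doc_id, score) for score, doc_id in ranked], evaluated

# Docs 1, 3, 5, 8 use the slide's scores; docs 9 and 12 are made up.
matched = [(1, -4.11), (3, -4.01), (5, -4.23), (8, -4.10), (9, -4.50), (12, -4.60)]
top3, evaluated = top_k_with_early_stop(matched, k=3, threshold=-4.12)
print(top3)       # [(3, -4.01), (8, -4.1), (1, -4.11)]
print(evaluated)  # 4
```

After Doc 8 replaces Doc 5, the lowest score in the top 3 is -4.11, which exceeds the assumed threshold, so the scan stops and the last two documents are never scored.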
Slide16 Importance of Dynamic Features
Top-10 feature importance by boosted regression tree
- NumEstMatchDocs helps to predict # total matched docs
- DynScores helps to predict early termination
Slide17 Selective Prediction
- Find almost all long queries with good precision
- Identify the outliers (long queries predicted as short)
[Figure; axis label: Predicted execution time]
Slide18 Selective Prediction
[Figure: long queries and short queries plotted by predicted execution time and predicted error]
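One way to read the selective step is as a two-part test: a query is parallelized if it is predicted long, or if the predictor's own error estimate says the "short" label cannot be trusted. The sketch below assumes that reading; both threshold values and the function name are hypothetical.

```python
def should_parallelize(predicted_ms: float, predicted_error_ms: float,
                       long_threshold_ms: float = 100.0,    # assumed
                       error_threshold_ms: float = 30.0) -> bool:  # assumed
    """Parallelize when the query is predicted long, or when the predictor
    is not confident (large predicted error) in its prediction."""
    predicted_long = predicted_ms >= long_threshold_ms
    not_confident = predicted_error_ms >= error_threshold_ms
    return predicted_long or not_confident

print(should_parallelize(150.0, 5.0))   # True: predicted long
print(should_parallelize(20.0, 2.0))    # False: confidently short
print(should_parallelize(40.0, 60.0))   # True: predicted short, but not confident
```

The third case is the one that recovers the outliers: a long query mispredicted as short is still parallelized as long as its predicted error is large.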
Slide19 Overview of DDS
[Diagram: same DDS pipeline as Slide10: delayed prediction (queries < 10ms finish; queries > 10ms continue), dynamic prediction (predictor for execution time: Long / Short), selective prediction (predictor for confidence level: not confident)]
Slide20 Evaluations of Predictor Accuracy (1/3)
- Baseline (PRED)
  - Static features with no delayed prediction
  - IDF, static score (e.g. PageRank), etc.
- Proposed method (DDS)
  - Dynamic (+static) features with delayed and selective prediction
Slide21 Evaluations of Predictor Accuracy (2/3)
- 69,010 Bing queries at production workload
  - 14,565 queries >= 10ms
  - 635 queries >= 100ms
- Boosted regression tree with 10-fold cross validation
  - For PRED, we use 69,010 queries
  - For DDS, we use 14,565 queries
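The query counts on this slide give a feel for how much the 10 ms delay shrinks the prediction problem. The arithmetic below uses only those counts; treating ">= 100ms" as the definition of a long query is an assumption for the sake of the example.

```python
total_queries = 69_010
at_least_10ms = 14_565
at_least_100ms = 635   # treated as "long" here (assumption)

finished_in_delay = 1 - at_least_10ms / total_queries
long_share_before = at_least_100ms / total_queries   # what PRED has to find
long_share_after = at_least_100ms / at_least_10ms    # what DDS has to find
print(f"{finished_in_delay:.1%} {long_share_before:.2%} {long_share_after:.2%}")
# 78.9% 0.92% 4.36%
```

Roughly 79% of queries finish before any prediction is needed, and the long queries become about 4.7 times more frequent among the queries that remain, which makes them easier to pick out with good precision.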
Slide22 Evaluations of Predictor Accuracy (3/3)
957% improvement over PRED
Slide23 Evaluations of Predictor Accuracy (3/3)
957% improvement over PRED
[Chart label: Delayed]
Slide24 Evaluations of Predictor Accuracy (3/3)
957% improvement over PRED
[Chart labels: Delayed, Dynamic features, Selective features]
Slide25 Simulation Results on Tail Latency Reduction
- Baseline (PRED)
  - Predict query execution time before running it
  - Parallelize the long queries with 4-way parallelism
- Proposed method (DDS)
  - Run a query for 10ms sequentially
  - Parallelize the long or unpredictable queries with 4-way parallelism
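The DDS schedule above can be turned into a back-of-the-envelope latency model. This is a simplification and not the talk's simulator: it assumes ideal linear speedup from 4-way parallelism (real speedup would be lower) and ignores queueing and load; `dds_latency_ms` is a hypothetical name.

```python
def dds_latency_ms(sequential_ms: float, parallelize: bool,
                   delay_ms: float = 10.0, speedup: float = 4.0) -> float:
    """Estimated latency of one query under a DDS-style schedule.
    `sequential_ms` is the query's running time on a single thread."""
    if sequential_ms <= delay_ms:
        return sequential_ms             # finished during the delay window
    if not parallelize:
        return sequential_ms             # confidently predicted short
    remaining = sequential_ms - delay_ms
    return delay_ms + remaining / speedup  # ideal speedup (assumption)

print(dds_latency_ms(8.0, parallelize=True))     # 8.0
print(dds_latency_ms(200.0, parallelize=True))   # 57.5
print(dds_latency_ms(200.0, parallelize=False))  # 200.0
```

The last line is the case selective prediction is meant to prevent: a long query that is left sequential keeps its full 200 ms latency and lands in the tail.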
Slide26 ISN Response Time
Slide27 ISN Response Time
Slide28 ISN Response Time
70% throughput increase
Slide29 Aggregator Response Time
DDS can optimize 99th-percentile tail latency at aggregator under high QPS
Slide30 Conclusion
- Proposes a novel prediction framework
  - Delayed prediction / Dynamic features / Selective prediction
- Achieves high precision and recall compared to PRED
- Reduces 99th-percentile aggregator response time to <= 120ms under high load!