Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search

Uploaded On 2022-08-01

Saehoon Kim, Yuxiong He, Seungwon Hwang, Sameh Elnikety, Seungjin Choi. A web search engine must serve queries with both high quality and low latency. This talk focuses on how to achieve low latency without compromising quality.


Presentation Transcript

Slide1

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search

Saehoon Kim§, Yuxiong He*, Seung-won Hwang§, Sameh Elnikety*, Seungjin Choi§

Slide2

Web Search Engine

Queries

Requirement: High quality + Low latency

This talk focuses on how to achieve low latency without compromising the quality

Slide3

Low Latency for All Users

Reduce tail latency (high-percentile response time)

Reducing average latency is not sufficient

Commercial search engines reduce 99th-percentile latency

Slide4

Reducing End-to-End Latency

Long(-running) query

Aggregator

ISN

40 Index Server Nodes (ISNs)

The 99th-percentile response time < 120ms

The 99.99th-percentile response time < 120ms
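The two percentile targets on this slide are linked by the fan-out to 40 ISNs. The following is a minimal sketch of that arithmetic, assuming (this is not stated on the slide) that the aggregator waits for every ISN and that ISN response times are independent. The function name `prob_all_within` is illustrative.

```python
def prob_all_within(p_single: float, fanout: int) -> float:
    """Probability that all `fanout` independent nodes meet the deadline,
    given that each one meets it with probability `p_single`."""
    return p_single ** fanout


ISN_COUNT = 40  # number of ISNs given on the slide

# Each ISN meets 120 ms for 99.99% of queries.
p_isn_9999 = prob_all_within(0.9999, ISN_COUNT)
# Each ISN meets 120 ms for only 99% of queries.
p_isn_99 = prob_all_within(0.99, ISN_COUNT)

print(f"ISN at 99.99th percentile -> all 40 on time: {p_isn_9999:.4f}")
print(f"ISN at 99th percentile    -> all 40 on time: {p_isn_99:.4f}")
```

Under these assumptions, a 99.99th-percentile bound at each ISN leaves roughly 99.6% of queries on time at the aggregator, while a 99th-percentile bound at each ISN would leave only about 67%.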

Slide5

Reducing Tail Latency by Parallelization

Opportunity of Parallelization

Available idle cores

CPU-intensive workloads

Resource: Latency
Network: 4.26 ms
Queueing: 0.15 ms
I/O: 4.70 ms
CPU: 194.95 ms

Slide6

Challenges of Exploiting Parallelism

Parallelizing all queries: Inefficient under medium to high load

Parallelizing short queries: No speed up

Parallelizing long queries: Good speed up

Parallelize only long(-running) queries

Slide7

Prior Work - PREDictive Parallelization

Predict the query execution time

Parallelize the predicted long queries only

Execute the predicted short queries sequentially

[Figure: query “WSDM” → Feature Extraction → Regression function (Prediction model) → Long / Short]

Predictive Parallelization: Taming Tail Latencies in Web Search, [M. Jeon, SIGIR’14]
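The PRED policy described on this slide can be sketched as a single decision on the predicted execution time. This is an illustration rather than the authors' implementation: the 100 ms cutoff for "long" is an assumption, the 4-way degree is taken from the simulation setup later in the deck, and `choose_parallelism` is a hypothetical name.

```python
LONG_THRESHOLD_MS = 100.0  # assumed cutoff for a "long" query
PARALLEL_DEGREE = 4        # 4-way parallelism, as in the simulation slides


def choose_parallelism(predicted_ms: float,
                       long_threshold_ms: float = LONG_THRESHOLD_MS,
                       degree: int = PARALLEL_DEGREE) -> int:
    """Return the number of threads to use for a query, decided before
    the query runs and based only on its predicted execution time."""
    if predicted_ms >= long_threshold_ms:
        return degree  # predicted long: parallelize
    return 1           # predicted short: run sequentially


# Hypothetical predicted times in milliseconds for three queries.
for predicted in (12.0, 95.0, 250.0):
    print(predicted, "->", choose_parallelism(predicted), "thread(s)")
```

Because the decision is made before execution, it can only use features known up front, which is the limitation the next slides address.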

Slide8

Requirements

99th-percentile tail latency at aggregator <= 120ms

Reduce 99.99th-percentile tail latency at each ISN <= 120ms

Metric | Requirement | Reason | PRED
Recall | >= 98.9% | To optimize 99.99th tail latency | 98.9%
Precision | Should be high | Fewer queries to be parallelized | 1.1%

PRED cannot effectively reduce 99.99th tail latency
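To make the recall and precision figures concrete, the sketch below computes both metrics for the "long query" class and shows what 1.1% precision means in work done. The confusion-matrix counts are hypothetical, chosen only to reproduce the percentages quoted on the slide.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of queries predicted long that really are long."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of truly long queries that were predicted long."""
    return true_pos / (true_pos + false_neg)


def parallelized_per_true_long(precision_value: float) -> float:
    """Average number of queries parallelized per truly long query."""
    return 1.0 / precision_value


# Hypothetical counts that reproduce 98.9% recall and about 1.1% precision.
tp, fn, fp = 989, 11, 88_921
print(f"recall    = {recall(tp, fn):.3f}")
print(f"precision = {precision(tp, fp):.4f}")
print(f"queries parallelized per long query = "
      f"{parallelized_per_true_long(0.011):.1f}")
```

At 1.1% precision, about 91 queries are parallelized for every truly long one, which is close to parallelizing everything and explains why precision "should be high".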

Slide9

Contributions

Key Contributions:

Proposes DDS (Delayed-Dynamic-Selective) prediction to achieve very high recall and good precision

Uses DDS prediction to effectively reduce extreme tail latency

Slide10

Overview of DDS

Query

Delayed prediction: queries < 10ms are finished; queries > 10ms go on to prediction

Dynamic prediction: predictor for execution time (Long / Short)

Selective prediction: predictor for confidence level (Not confident)
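The three stages in this overview can be sketched as one decision function. This is a hedged illustration under stated assumptions, not the authors' code: the two predictors are passed in as plain callables, the 100 ms "long" cutoff and the 20 ms error cutoff are invented for the example, and the feature key `"t"` is a placeholder.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]
Predictor = Callable[[Features], float]

LONG_THRESHOLD_MS = 100.0    # assumed cutoff for a "long" query
MAX_TRUSTED_ERROR_MS = 20.0  # assumed cutoff for a "confident" prediction


def dds_decision(finished_within_delay: bool,
                 features: Features,
                 predict_time_ms: Predictor,
                 predict_error_ms: Predictor) -> str:
    """Decide what to do with a query after the initial sequential delay."""
    # Delayed: queries that complete during the delay need no prediction.
    if finished_within_delay:
        return "finished"
    # Dynamic: features collected at runtime feed the time predictor.
    if predict_time_ms(features) >= LONG_THRESHOLD_MS:
        return "parallelize"
    # Selective: a predicted-short query is trusted only if the second
    # predictor expects a small error; otherwise treat it as long.
    if predict_error_ms(features) > MAX_TRUSTED_ERROR_MS:
        return "parallelize"
    return "sequential"


# Stub predictors that simply read hypothetical values from the features.
time_model = lambda f: f["t"]
error_model = lambda f: f["err"]

print(dds_decision(True, {}, time_model, error_model))
print(dds_decision(False, {"t": 150.0, "err": 5.0}, time_model, error_model))
print(dds_decision(False, {"t": 30.0, "err": 5.0}, time_model, error_model))
print(dds_decision(False, {"t": 30.0, "err": 60.0}, time_model, error_model))
```

The last case is the one that distinguishes DDS from a plain predictor: a query predicted short is still parallelized when the prediction is not trusted.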

Slide11

Delayed Prediction

Complete many short queries sequentially

Collect dynamic features

Slide12

Dynamic Features

What are dynamic features?

Features that can only be collected at runtime

Two categories

NumEstMatchDocs: to estimate the total # matched docs

DynScores: to predict early termination

Slide13

Primary Factors for Execution Time

1. # total matched documents

[Figure: inverted indexes for “WSDM” and “2015” over web documents Doc 1 … Doc N, sorted by static score from highest to lowest, with processing proceeding in that order]

Slide14

Primary Factors for Execution Time

1. # total matched documents

2. Early termination

[Figure: inverted indexes for “WSDM” and “2015” over web documents Doc 1 … Doc N, sorted by static score from highest to lowest; processing stops partway and the remaining documents are marked “Not evaluated”]

Slide15

Early Termination

[Figure: inverted index for “WSDM” over web documents Doc 1 … Doc N, sorted by static score from highest to lowest; processing stops partway and the remaining documents are marked “Not evaluated”]

Top-3 Results

If min. dynamic score > threshold, then stop.

Top-3 results after each update (Doc ID: Dynamic Score):

Step 1: Doc 1: -4.11

Step 2: Doc 3: -4.01; Doc: -4.11

Step 3: Doc 3: -4.01; Doc: -4.11; Doc 5: -4.23

Step 4: Doc 3: -4.01; Doc: -4.10; Doc 1: -4.11

To predict early termination, consider a dynamic score distribution
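The early-termination behaviour on this slide can be sketched with a small top-k heap. This is a minimal sketch under assumptions: the stop rule is read as "stop once the lowest score in the top-k exceeds a threshold", the threshold value is invented, and `Doc X` stands in for the document whose ID is missing from the slide. Production engines typically use more elaborate stopping conditions.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_with_early_termination(
        docs: Iterable[Tuple[str, float]],
        k: int,
        threshold: float) -> Tuple[List[Tuple[str, float]], int]:
    """Scan (doc_id, dynamic_score) pairs in static-score order and stop
    once the lowest score in the current top-k exceeds `threshold`.
    Returns the top-k (best first) and the number of docs evaluated."""
    heap: List[Tuple[float, str]] = []  # min-heap keyed on dynamic score
    evaluated = 0
    for doc_id, score in docs:
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
        if len(heap) == k and heap[0][0] > threshold:
            break  # remaining documents are not evaluated
    ranked = sorted(heap, reverse=True)
    return [(doc_id, score) for score, doc_id in ranked], evaluated


# Scores from the slide; "Doc X" and "Doc 9" are hypothetical entries.
stream = [("Doc 1", -4.11), ("Doc 3", -4.01), ("Doc 5", -4.23),
          ("Doc X", -4.10), ("Doc 9", -3.90)]
results, evaluated = top_k_with_early_termination(stream, k=3, threshold=-4.12)
print(results)    # [('Doc 3', -4.01), ('Doc X', -4.1), ('Doc 1', -4.11)]
print(evaluated)  # 4
```

How soon the loop stops depends on how dynamic scores are distributed among the early documents, which is why features describing that distribution help predict execution time.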

Slide16

Importance of Dynamic Features

Top-10 feature importance by boosted regression tree

NumEstMatchDocs helps to predict # total matched docs

DynScores helps to predict early termination

Slide17

Selective Prediction

Find almost all long queries with good precision

Identify the outliers (long queries predicted as short)

[Figure axis: Predicted execution time]

Slide18

Selective Prediction

[Figure: long queries and short queries plotted by predicted execution time and predicted error]

Slide19

Overview of DDS

Query

Delayed prediction: queries < 10ms are finished; queries > 10ms go on to prediction

Dynamic prediction: predictor for execution time (Long / Short)

Selective prediction: predictor for confidence level (Not confident)

Slide20

Evaluations of Predictor Accuracy (1/3)

Baseline (PRED)

Static features with no delayed prediction

IDF, static score (e.g. PageRank), etc.

Proposed method (DDS)

Dynamic (+static) features with Delayed and Selective prediction

Slide21

Evaluations of Predictor Accuracy (2/3)

69,010 Bing queries at production workload

14,565 queries >= 10ms

635 queries >= 100ms

Boosted regression tree with 10-fold cross validation

For PRED, we use 69,010 queries

For DDS, we use 14,565 queries
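The query counts on this slide show how much work delayed prediction removes from the predictor. The short calculation below uses only the three counts given above; the variable names are illustrative.

```python
TOTAL_QUERIES = 69_010
AT_LEAST_10_MS = 14_565
AT_LEAST_100_MS = 635

# Queries that finish within the 10 ms delay never reach the predictor.
finished_before_prediction = 1 - AT_LEAST_10_MS / TOTAL_QUERIES
# Share of queries taking at least 100 ms, before and after the delay.
share_100ms_overall = AT_LEAST_100_MS / TOTAL_QUERIES
share_100ms_after_delay = AT_LEAST_100_MS / AT_LEAST_10_MS

print(f"finished within 10 ms:         {finished_before_prediction:.1%}")
print(f">= 100 ms among all queries:   {share_100ms_overall:.2%}")
print(f">= 100 ms among >= 10 ms ones: {share_100ms_after_delay:.2%}")
```

About 78.9% of queries finish during the delay, so the predictor sees only the remaining 21.1%, and the share of queries taking at least 100 ms rises from about 0.92% to about 4.36% of what it has to classify.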

Slide22

Evaluations of Predictor Accuracy (3/3)

957% Improvement over PRED

Slide23

Evaluations of Predictor Accuracy (3/3)

957% Improvement over PRED

Delayed

Slide24

Evaluations of Predictor Accuracy (3/3)

957% Improvement over PRED

Delayed

Dynamic features

Selective features

Slide25

Simulation Results on Tail Latency Reduction

Baseline (PRED)

Predict query execution time before running it

Parallelize the long queries with 4-way parallelism

Proposed method (DDS)

Run a query for 10ms sequentially

Parallelize the long or unpredictable queries with 4-way parallelism

Slide26

ISN Response Time

Slide27

ISN Response Time

Slide28

ISN Response Time

70% throughput increase

Slide29

Aggregator Response Time

DDS can optimize 99th-percentile tail latency at aggregator under high QPS

Slide30

Conclusion

Proposes a novel prediction framework

Delayed prediction / Dynamic features / Selective prediction

Achieves high precision and recall compared to PRED

Reduces 99th-percentile aggregator response time to <= 120ms under high load!

Slide31