
Slide1

Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach

Balázs Szörényi

Technion, Haifa, Israel / MTA-SZTE Research Group on Artificial Intelligence, Hungary

Róbert Busa-Fekete, Adil Paul, Eyke Hüllermeier

Department of Computer Science, University of Paderborn, Paderborn, Germany

Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS 2015)

Slide2

Problem: Rank elicitation from pairwise preferences

The set of items to be ranked

Given: a set of stochastic pairwise preferences

Goal: to infer a complete ranking of all items

[Slide figure: an example set of pairwise preferences and a resulting ranking of four items, placed 1 to 4]

Slide3

Moving to the online setting

The learner is allowed to sample the pairwise preferences actively

The set of items to be ranked

Each iteration: compare two items

Observe a stochastic pairwise preference

The goal is to learn the most probable ranking over all items

Referred to as the dueling bandits problem

 

Slide4

The online ranking problem

$p_{i,j}$ denotes the probability that item $i$ is preferred over item $j$

Without probabilistic assumptions on the $p_{i,j}$, the sample complexity grows quadratically in the number of items $M$

Different online ranking methods make different assumptions

 

Slide5

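Stochastic transitivity assumptions

If ... and ..., then ... [the if-then condition on the pairwise probabilities was not preserved in this transcript]

These assumptions allow us to devise algorithms with a lower sample complexity

They establish a connection to sorting algorithms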

 

Slide6

Connection with sorting algorithms

Naively apply a sorting algorithm as the sampling scheme

Since all the pairwise comparisons are stochastic, a random order will be produced

What can we say about the optimality of such an order?

[Slide figure: an example chain of items in the order produced by the sort]
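The naive scheme can be sketched in a few lines of Python. This is only an illustration: the function name `noisy_sort`, the `skills` list, and the preference model (item i beats item j with probability v_i / (v_i + v_j)) are assumptions made for the example and are not taken from the slide.

```python
import random
from functools import cmp_to_key


def noisy_sort(items, skills, rng):
    """Sort items (best first) with a comparator that answers at random.

    Assumed model: item i beats item j with probability
    skills[i] / (skills[i] + skills[j]).
    """
    def compare(i, j):
        p_ij = skills[i] / (skills[i] + skills[j])
        # A negative value tells the sort to place i ahead of j.
        return -1 if rng.random() < p_ij else 1

    return sorted(items, key=cmp_to_key(compare))


rng = random.Random(0)
# Repeated calls can return different orders because every comparison is random.
orders = [noisy_sort([0, 1, 2, 3], [4.0, 3.0, 2.0, 1.0], rng) for _ in range(5)]
```

Each call returns a permutation of the input, but which permutation is random, which is exactly why the optimality question on the slide arises.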

Slide7

Contributions

We combine the QuickSort algorithm and a stochastic preference model

This harmony was first presented in [Ailon-2008]

We exploit this harmony for online rank elicitation

We develop a budgeted version of QuickSort with complexity of [formula not preserved]

We devise PAC-style algorithms based on Budgeted QuickSort to:

Find a close-to-optimal item

Find a close-to-optimal ranking

 

Slide8

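Preliminaries

A ranking $r$ is a bijection on $[M]$, where $[M] = \{1, \dots, M\}$ is the set of items to be ranked

Also represented as a vector $r = (r_1, \dots, r_M)$, where $r_j = r(j)$ is the rank of the $j$th item

If $i$ is preferred over $j$ in $r$, i.e. $i \succ j$, then $r_i < r_j$

The set of rankings can be identified with the symmetric group $S_M$ of order $M$

The inverse $o = r^{-1}$ is defined by $o_{r(i)} = i$ for all $i \in [M]$

We denote by $L(r_j > r_i)$ the set of rankings for which $i$ is preferred over $j$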

 

Slide9

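Dueling Bandit Framework

Repeat:

Sample the pairwise preference between items $i$ and $j$

Observe a binary feedback: one value means $i$ was preferred, the other means $j$ was preferred

Update the estimate of the pairwise probability

Continue or terminate?

Prediction

Parameter: confidence $\delta$

The prediction is correct with probability at least $1 - \delta$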

 

Slide10

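Estimation of pairwise probability

Pair of items chosen in the $t$-th step

Set of steps in which the learner decides to compare items $i$ and $j$

Size of this set

The proportion of "wins" of $i$ against $j$ by time $t$, which is an estimate of the pairwise probability $p_{i,j}$

[The slide's formulas for these four quantities were not preserved in this transcript]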
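The win-proportion estimate can be sketched as a small bookkeeping class. The class and method names are hypothetical, and returning 0.5 for a pair that has never been compared is a convention chosen for this sketch, not something stated on the slide.

```python
from collections import defaultdict


class PairwiseEstimator:
    """Track duel outcomes and estimate P(i is preferred over j)."""

    def __init__(self):
        # wins[(i, j)] counts the duels in which i beat j.
        self.wins = defaultdict(int)

    def update(self, winner, loser):
        self.wins[(winner, loser)] += 1

    def count(self, i, j):
        """Number of steps in which the pair (i, j) was compared."""
        return self.wins[(i, j)] + self.wins[(j, i)]

    def estimate(self, i, j):
        """Proportion of wins of i against j (0.5 if never compared)."""
        n = self.count(i, j)
        return self.wins[(i, j)] / n if n else 0.5


est = PairwiseEstimator()
for winner, loser in [(0, 1), (0, 1), (1, 0), (0, 1)]:
    est.update(winner, loser)
# After four duels with three wins for item 0, est.estimate(0, 1) is 0.75.
```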

 

Slide11

The Plackett-Luce Model

Widely used probability distribution on rankings

Parameterized by a "skill" vector $v = (v_1, \dots, v_M)$ with positive entries

$v_i$ is the skill associated with item $i$

An item with a higher skill is more preferred

Slide12

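The Plackett-Luce Model

The probability of observing a particular ranking $r$ under the skill vector $v$ is

$$P(r \mid v) = \prod_{i=1}^{M} \frac{v_{o_i}}{v_{o_i} + v_{o_{i+1}} + \dots + v_{o_M}},$$

where $o = r^{-1}$, i.e. $o_i$ is the item placed at rank $i$

Mimics the successive construction of a ranking

Each time, one of the remaining items is chosen with probability proportional to its skill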
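The successive-construction view translates directly into code. The sketch below assumes integer skills (so that exact fractions can be used) and represents a ranking as an ordering with the best item first; the function names are illustrative only.

```python
import random
from fractions import Fraction


def pl_probability(ordering, skills):
    """Probability of an ordering (best item first) under Plackett-Luce."""
    prob = Fraction(1)
    remaining = sum(skills[i] for i in ordering)
    for item in ordering:
        # The next item is chosen proportionally to its skill among those left.
        prob *= Fraction(skills[item], remaining)
        remaining -= skills[item]
    return prob


def pl_sample(skills, rng):
    """Draw an ordering by repeatedly picking one of the remaining items."""
    remaining = list(range(len(skills)))
    ordering = []
    while remaining:
        weights = [skills[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ordering.append(pick)
        remaining.remove(pick)
    return ordering


skills = [3, 2, 1]
p = pl_probability((0, 1, 2), skills)   # 3/6 * 2/3 * 1/1 = 1/3
draw = pl_sample(skills, random.Random(0))
```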

Slide13

Properties of PL Model

The marginal probabilities are easy to calculate: $p_{i,j} = \frac{v_i}{v_i + v_j}$

Satisfies the stochastic transitivity

Most probable ranking: simply sort the items according to their skill parameters
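Two of these properties can be checked by brute force on a small example. The sketch assumes the usual closed form for the Plackett-Luce pairwise marginal, v_i / (v_i + v_j), and integer skills; all names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def pl_probability(ordering, skills):
    """Probability of an ordering (best item first) under Plackett-Luce."""
    prob = Fraction(1)
    remaining = sum(skills[i] for i in ordering)
    for item in ordering:
        prob *= Fraction(skills[item], remaining)
        remaining -= skills[item]
    return prob


def marginal(i, j, skills):
    """P(item i is ranked ahead of item j), summed over all orderings."""
    items = range(len(skills))
    return sum(pl_probability(o, skills)
               for o in permutations(items)
               if o.index(i) < o.index(j))


def most_probable_ordering(skills):
    """Exhaustive search for the ordering with the highest probability."""
    return max(permutations(range(len(skills))),
               key=lambda o: pl_probability(o, skills))


skills = [5, 3, 2, 1]
m01 = marginal(0, 1, skills)            # equals 5 / (5 + 3)
best = most_probable_ordering(skills)   # items sorted by decreasing skill
```

Enumeration is only feasible for a handful of items, which is why the closed form matters in practice.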

Slide14

Harmony of QuickSort and Plackett-Luce model

This harmony was investigated in [Ailon-2008]

The pairwise comparisons are drawn from the pairwise marginals of the Plackett-Luce model

The probability distribution of the ranking returned by QuickSort: [formula not preserved]

where the matrix $P$ contains the marginals of the Plackett-Luce model

$p_{i,j}$ is the pairwise marginal of items $i$ and $j$

 

Slide15

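Harmony of QuickSort and Plackett-Luce model

The probability distribution of the ranking returned by QuickSort: [formula not preserved]

It was shown that this distribution obeys the property of pairwise stability, i.e. it preserves the pairwise marginals of the Plackett-Luce model

Theorem 1 (Theorem 4.1 in [Ailon-2008]): Let $P$ be given by the pairwise marginals, i.e., $p_{i,j} = \frac{v_i}{v_i + v_j}$. Then the ranking returned by QuickSort places $i$ ahead of $j$ with probability $p_{i,j}$.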
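Pairwise stability can be verified exactly on a small instance. The sketch below assumes QuickSort picks its pivot uniformly at random and that each comparison is won by item x against the pivot with probability v_x / (v_x + v_pivot); it enumerates every pivot choice and every comparison outcome with exact fractions. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def quicksort_distribution(items, v):
    """Exact distribution over orderings (best first) returned by QuickSort."""
    items = tuple(items)
    if len(items) <= 1:
        return {items: Fraction(1)}
    dist = {}
    for pivot in items:                      # pivot chosen uniformly at random
        rest = [x for x in items if x != pivot]
        for sides in product((True, False), repeat=len(rest)):
            prob = Fraction(1, len(items))
            better, worse = [], []
            for x, beats in zip(rest, sides):
                p = Fraction(v[x], v[x] + v[pivot])   # P(x beats the pivot)
                prob *= p if beats else 1 - p
                (better if beats else worse).append(x)
            # Recurse on both sides and glue the orderings around the pivot.
            for left, pl in quicksort_distribution(better, v).items():
                for right, pr in quicksort_distribution(worse, v).items():
                    order = left + (pivot,) + right
                    dist[order] = dist.get(order, 0) + prob * pl * pr
    return dist


def prob_ahead(dist, i, j):
    """Probability that i precedes j in the returned ordering."""
    return sum(p for order, p in dist.items()
               if order.index(i) < order.index(j))


v = [5, 3, 2, 1]
dist = quicksort_distribution(range(4), v)
check = prob_ahead(dist, 0, 3)   # equals 5 / (5 + 1)
```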

 

Slide16

Budgeted QuickSort algorithm

Generate a ranking from QuickSort

Worst-case sample complexity: $O(M^2)$ comparisons

We introduce a budgeted version of the QuickSort algorithm

Terminates if the algorithm compares too many pairs

Upon termination, it may return a partial order

Still preserves the pairwise stability property

 

Slide17

Budgeted QuickSort algorithm

Terminate as soon as the number of pairwise comparisons exceeds the budget
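A minimal sketch of a budgeted QuickSort follows. It is a simplification under stated assumptions rather than the algorithm from the slides: the budget is checked before each partitioning step (a subset is left unsorted if partitioning it would exceed the remaining budget), the result is a list of buckets whose members are mutually incomparable, and `duel` is a hypothetical callback that returns True when its first argument wins.

```python
import random


def budgeted_quicksort(items, budget, duel, rng):
    """Return (buckets, remaining_budget, outcomes).

    buckets  : list of lists, best bucket first; items that share a
               bucket were left unordered because the budget ran out.
    outcomes : list of (winner, loser) pairs that were observed.
    """
    items = list(items)
    cost = len(items) - 1            # comparisons needed for one partition
    if len(items) <= 1 or cost > budget:
        return ([items] if items else []), budget, []
    pivot = rng.choice(items)
    better, worse, outcomes = [], [], []
    for x in items:
        if x == pivot:
            continue
        if duel(x, pivot):
            better.append(x)
            outcomes.append((x, pivot))
        else:
            worse.append(x)
            outcomes.append((pivot, x))
    budget -= cost
    left, budget, out_left = budgeted_quicksort(better, budget, duel, rng)
    right, budget, out_right = budgeted_quicksort(worse, budget, duel, rng)
    return left + [[pivot]] + right, budget, outcomes + out_left + out_right


# Deterministic comparator for illustration: the smaller number always wins.
smaller_wins = lambda a, b: a < b
full, left_over, seen = budgeted_quicksort(range(6), 100, smaller_wins,
                                           random.Random(0))
partial, _, few = budgeted_quicksort(range(6), 5, smaller_wins,
                                     random.Random(0))
```

With a large budget every bucket is a singleton and the result is a full ranking; with a budget of 5 for 6 items only the first partition is affordable, so the items on each side of the pivot stay in one bucket.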

 

Slide18

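Random tree of QuickSort algorithm

BQS$(A, \infty)$, i.e. BQS with an unlimited budget, recovers the original QuickSort algorithm

A run of BQS can be represented as a random tree

Such a tree determines a ranking

[Slide figure: a QuickSort recursion tree over the items 1 2 3 4 6 7 8 9, which splits into the subsets 1 2 3 4 and 7 8 9 and then into smaller subsets]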

Slide19

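Random tree of Budgeted QuickSort

For a given budget, denote the tree returned by BQS as [symbol not preserved]

Let [symbol not preserved] denote the set of all possible outcomes of this tree

[Slide figure: a recursion tree over the items 1 2 3 4 6 8 7 9 in which the subset 1 2 3 4 is split further while the subset 8 7 9 is left unsorted]

Suppose the budget is used up here

Items 8, 7, 9 are incomparable in the ranking

We just know they are [relation not preserved]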

 

Slide20

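Pairwise Stability of Budgeted QuickSort

BQS does not introduce any bias in the marginals

Let [symbol not preserved] denote the set of trees in which $i$ and $j$ are incomparable in the associated ranking

Proposition 2: For any budget $B$, any set $A$ of items and any indices $i, j \in A$, the partial order $r$ generated by BQS$(A, B)$ satisfies [formula not preserved]

i.e. whenever two items $i$ and $j$ are comparable by the partial ranking $r$ generated by BQS, $i$ is ranked ahead of $j$ with probability exactly $p_{i,j}$, the pairwise marginal of $i$ over $j$.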

 

Slide21

Proof sketch of Proposition 2

Proposition 2: For any budget $B$, any set $A$ of items and any indices $i, j \in A$, the partial order $r$ generated by BQS$(A, B)$ satisfies [formula not preserved]

Conditioned on the event that $i$ and $j$ are incomparable by $r$, the order $i \succ j$ would have been obtained with probability $p_{i,j}$ had the execution of BQS been continued

The result follows by combining this with Theorem 1

 

Slide22

First goal of the learner: PAC-item

The optimal item refers to the Condorcet winner

An item $i^*$ is a Condorcet winner if $p_{i^*,j} > 1/2$ for all $j \neq i^*$

It is difficult to determine an order between $i$ and $j$ when $p_{i,j} \approx 1/2$

Hence, we relax the goal to finding a PAC-item

An item $j$ is a PAC-item if it is beaten by the Condorcet winner with at most an $\epsilon$-margin: $p_{i^*,j} - 1/2 < \epsilon$
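When the skills are known, the set of PAC-items can be computed directly. The sketch assumes the Plackett-Luce marginal v_i / (v_i + v_j), measures the margin as that probability minus 1/2 with a strict inequality, and always counts the Condorcet winner itself as a PAC-item; these conventions and the function name are assumptions made for the example.

```python
def pac_items(skills, eps):
    """Items beaten by the best item with a margin below eps."""
    best = max(range(len(skills)), key=lambda i: skills[i])

    def p(i, j):
        # Assumed pairwise marginal of the Plackett-Luce model.
        return skills[i] / (skills[i] + skills[j])

    return [j for j in range(len(skills))
            if j == best or p(best, j) - 0.5 < eps]


# Item 1 loses to item 0 with margin 4/7 - 1/2 (about 0.07), item 2 with 0.3.
close_enough = pac_items([4.0, 3.0, 1.0], eps=0.1)
```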

 

Slide23

PLPAC Algorithm

Goal: finding the PAC-item

In each iteration:

Generate a partial ranking (line 6)

Translate the ranking into pairwise comparisons

Update the estimates of the marginals

Slide24

PLPAC Algorithm

Apply an elimination strategy:

Remove an item if it is significantly beaten by another item

Terminate when the PAC-item set has at least one item
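One way to make "significantly beaten" concrete is a confidence-interval test on the estimated win proportions. The sketch below uses a plain Hoeffding-style radius, which is an assumption for illustration; the exact confidence term used by PLPAC is not preserved in this transcript, and the function names are hypothetical.

```python
import math


def confidence_radius(n, delta):
    """Hoeffding-style radius after n duels (assumed form)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def surviving_items(wins, delta):
    """Keep the items that are not significantly beaten by any other item.

    wins[i][j] is the number of duels in which item i beat item j.
    """
    m = len(wins)
    survivors = []
    for i in range(m):
        beaten = False
        for j in range(m):
            n = wins[i][j] + wins[j][i]
            if i == j or n == 0:
                continue
            p_hat = wins[i][j] / n
            # Even the optimistic end of the interval is below 1/2.
            if p_hat + confidence_radius(n, delta) < 0.5:
                beaten = True
                break
        if not beaten:
            survivors.append(i)
    return survivors


wins = [[0, 90, 95],
        [10, 0, 50],
        [5, 50, 0]]
alive = surviving_items(wins, delta=0.05)
```

With 100 duels per pair the radius is about 0.14, so items 1 and 2 (win rates 0.10 and 0.05 against item 0) are removed and only item 0 survives.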

Slide25

Sample Complexity analysis of PLPAC

In each iteration, the partial ordering in line 6 defines a bucket order

Pairs are incomparable within a bucket

But pairs from different buckets are comparable

With budget [value not preserved], the bucket order has only two buckets

After the first partition, BQS will have used up all the budget

[Slide figure: an example bucket order over the items 1 2 3 4 6 7 8 9]

Slide26

Sample Complexity analysis of PLPAC

Observation: the optimal arm and an arbitrary arm fall into different buckets "often enough"

This allows us to upper-bound the number of pairwise comparisons

Theorem 3: Set [formula not preserved] for each index $i$. The total number of samples for the PLPAC algorithm is [formula not preserved].

The dependence on [symbol not preserved] is of order [formula not preserved]

 

Slide27

Second goal of the learner: AMPR

The most probable ranking, $r^*$

It is difficult to determine an order between $i$ and $j$ when $p_{i,j} \approx 1/2$

Hence, it is difficult to find the most probable ranking

Relax the goal to finding the Approximately Most Probable Ranking

 

Slide28

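Second goal of the learner: AMPR

Find a ranking $r$ that has the following property:

There is no pair of items $i, j$ such that $r^*_i < r^*_j$, $r_i > r_j$ and $p_{i,j} - 1/2 > \epsilon$

Ranking $r$ is allowed to differ from the most probable ranking $r^*$ only for those items whose pairwise probabilities are close to $1/2$

Any ranking $r$ satisfying this property is called an approximately most probable ranking (AMPR)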
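With known skills the AMPR property can be checked pair by pair. The sketch assumes the Plackett-Luce marginal v_i / (v_i + v_j), takes the most probable ranking to be the order of decreasing skill, and treats "close to 1/2" as a margin of at most eps; the function name and the best-first list representation are choices made for the example.

```python
def is_ampr(order, skills, eps):
    """Check whether an ordering (best item first) is an AMPR.

    A pair (i, j) violates the property if i has the larger skill, the
    ordering places j ahead of i, and the margin p_ij - 1/2 exceeds eps.
    """
    pos = {item: k for k, item in enumerate(order)}
    for i in order:
        for j in order:
            if skills[i] > skills[j] and pos[i] > pos[j]:
                margin = skills[i] / (skills[i] + skills[j]) - 0.5
                if margin > eps:
                    return False
    return True


skills = [4.0, 3.0, 1.0]
# Swapping items 0 and 1 is tolerated at eps = 0.1 (margin about 0.07) ...
swap_ok = is_ampr([1, 0, 2], skills, eps=0.1)
# ... but putting item 2 first is not (margin 0.3 against item 0).
bad = is_ampr([2, 0, 1], skills, eps=0.1)
```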

 

Slide29

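PLPAC-AMPR algorithm

The most probable ranking $r^*$ of a PL model is the ranking that sorts items in decreasing order of their skill values: $r^*_i < r^*_j$ iff $v_i > v_j$ for any $i \neq j$

Moreover, since $v_i > v_j$ implies $p_{i,j} > 1/2$, sorting the items by their Copeland score (the number of items that an item beats with probability above $1/2$) also yields a most probable ranking

The PLPAC-AMPR algorithm is based on estimating the Copeland scores of the items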
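The link between Copeland scores and the skill order can be illustrated on known marginals. The sketch assumes the Plackett-Luce marginal v_i / (v_i + v_j) and defines the Copeland score of an item as the number of other items it beats with probability above 1/2; function names are illustrative.

```python
def pl_marginals(skills):
    """Matrix of assumed Plackett-Luce pairwise marginals."""
    m = len(skills)
    return [[skills[i] / (skills[i] + skills[j]) for j in range(m)]
            for i in range(m)]


def copeland_scores(p):
    """Number of other items that each item beats with probability > 1/2."""
    m = len(p)
    return [sum(1 for k in range(m) if k != i and p[i][k] > 0.5)
            for i in range(m)]


def rank_by_copeland(p):
    """Items sorted by decreasing Copeland score (best first)."""
    scores = copeland_scores(p)
    return sorted(range(len(p)), key=lambda i: -scores[i])


skills = [2.0, 5.0, 1.0, 3.0]
p = pl_marginals(skills)
scores = copeland_scores(p)      # [1, 3, 0, 2]
by_score = rank_by_copeland(p)   # [1, 3, 0, 2], the decreasing-skill order
```

In the online setting the true marginals are unknown, so the scores would have to be bounded from estimates rather than computed exactly.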

 

Slide30

PLPAC-AMPR

Slide31

PLPAC-AMPR algorithm

In each iteration:

Generate rankings based on sorting

Update the pairwise probability estimates

Compute a lower and an upper bound for the Copeland score of each item $i$

The lower bound is the number of items that are significantly beaten by item $i$ based on the current estimates of the pairwise marginals

The upper bound is the lower bound plus the number of pairs involving item $i$ whose order cannot yet be decided

 

Slide32

PLPAC-AMPR algorithm

We don’t need to sort the whole item set

in each iterationBecause if

,

then we already know the order of item

and

Consider the interval graph

,

w

here

Denote the connected components by

In each iteration, call the Budged Quick sort with the connected components
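The connected components of an interval graph can be found with a single sweep over the intervals sorted by lower end. The sketch below assumes closed intervals (two intervals that share an endpoint count as overlapping) and uses hypothetical names; only components with at least two items would still need sorting.

```python
def interval_components(lower, upper):
    """Group items whose intervals [lower[i], upper[i]] are linked by overlaps."""
    order = sorted(range(len(lower)), key=lambda i: lower[i])
    components, current, reach = [], [], None
    for i in order:
        # A gap before this interval closes the current component.
        if current and lower[i] > reach:
            components.append(current)
            current = []
        current.append(i)
        reach = upper[i] if len(current) == 1 else max(reach, upper[i])
    if current:
        components.append(current)
    return components


lower = [0, 1, 4, 5, 9]
upper = [2, 3, 6, 5, 9]
groups = interval_components(lower, upper)   # [[0, 1], [2, 3], [4]]
to_sort = [g for g in groups if len(g) > 1]  # item 4 is already resolved
```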

 

Slide33

PLPAC-AMPR algorithm

The algorithm terminates once every pair of items $i$ and $j$ whose ordering has not been elicited yet has a pairwise probability close to $1/2$

 

Slide34

Sample Complexity Analysis of PLPAC-AMPR

Concentration property of the performance of QuickSort

No pair of items falls into the same bucket "too often" in the partial order returned by BQS

This allows us to upper-bound the number of pairwise comparisons with high probability

Theorem 4: Set [formula not preserved] for each $i$, where $v_{(i)}$ denotes the $i$-th largest skill parameter. The total number of samples for the PLPAC-AMPR algorithm is [formula not preserved].

The dependence on [symbol not preserved] is of order [formula not preserved]

 

Slide35

Experiments – The PAC-item Problem

Compare with other preference-based online algorithms applicable in our setting, including:

1. INTERLEAVED FILTER (IF) [Yue et al., 2012]

2. BEAT THE MEAN [Yue et al., 2011]

3. MALLOWSMPI [Busa-Fekete et al., 2014]

Setting the parameters of PL to [formula not preserved]

A parameter of this setting controls the complexity of the rank elicitation task: the larger its value, the more difficult it is to determine the order between two items $i$ and $j$

 

Slide36

Experiments – The PAC-item Problem

The sample complexity for $M \in \{5, 10, 15\}$, $\delta = 0.1$, $\epsilon = 0$. The results are averaged over 100 repetitions.

Slide37

Experiment – The AMPR Problem

The RankCentrality algorithm [Negahban et al., 2012] is taken as the baseline

Setting the parameters of PL to [formula not preserved]

Slide38

Experiment – The AMPR Problem

Sample complexity for $M \in \{5, 10, 15\}$, $\delta = 0.1$, $\epsilon = 0$. The results are averaged over 100 repetitions.

 

Slide39

Conclusion

In the setting of dueling bandits under the PL model assumption, we consider two tasks:

Find the approximate best arm

Find the approximate most probable ranking

We propose algorithms for both tasks based on a budgeted QuickSort algorithm, exploiting the pairwise stability of QuickSort

We also give a sample complexity bound for both algorithms: [formulas not preserved]