Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach
Balázs Szörényi, Technion, Haifa, Israel / MTA-SZTE Research Group on Artificial Intelligence, Hungary
Róbert Busa-Fekete, Adil Paul, Eyke Hüllermeier, Department of Computer Science, University of Paderborn, Paderborn, Germany
Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS 2015)
Problem: Rank Elicitation from pairwise preferences
The set of items to be ranked
Given: a set of stochastic pairwise preferences
Goal: to infer a complete ranking of all items
[Figure: example items, example pairwise preferences, and an example ranking of four items into positions 1 to 4]
Moving to online setting
The learner is allowed to sample the pairwise preferences actively
The set of items to be ranked
Each iteration: compare two items
Observe a stochastic pairwise preference
The goal is to learn the most probable ranking over all items
Referred to as the Dueling Bandit problem
The online ranking problem
$p_{i,j}$ denotes the probability that item $i$ is preferred over item $j$
Without probabilistic assumptions on $p_{i,j}$, the sample complexity grows quadratically in the number of items $M$, i.e. $O(M^2)$
Different online ranking methods make different assumptions
Stochastic transitivity assumptions
If $p_{i,j} \ge 1/2$ and $p_{j,k} \ge 1/2$, then $p_{i,k} \ge 1/2$
Allow us to devise algorithms with a lower sample complexity
Establish a connection to sorting algorithms
Connection with sorting algorithms
Naively apply a sorting algorithm as the sampling scheme
Since all the pairwise comparisons are stochastic, a random order will be produced
What can we say about the optimality of such an order?
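A minimal sketch of the naive scheme described above: QuickSort driven by a stochastic comparator, so that repeated runs return different orders. The function names and the example preference matrix `p` are illustrative assumptions, not taken from the slides.

```python
import random

def noisy_quicksort(items, prefers, rng):
    """QuickSort with a stochastic comparator.

    Returns the (distinct) items ordered from most to least preferred.
    Each pair is compared at most once per run.
    """
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    better, worse = [], []
    for x in items:
        if x == pivot:
            continue
        # One noisy duel between x and the pivot decides the side.
        (better if prefers(x, pivot, rng) else worse).append(x)
    return (noisy_quicksort(better, prefers, rng)
            + [pivot]
            + noisy_quicksort(worse, prefers, rng))

def make_comparator(p):
    """p[i][j] is the (assumed) probability that item i beats item j."""
    def prefers(i, j, rng):
        return rng.random() < p[i][j]
    return prefers

# Hypothetical preference matrix for three items.
p = [[0.5, 0.7, 0.9],
     [0.3, 0.5, 0.7],
     [0.1, 0.3, 0.5]]
rng = random.Random(0)
orders = [tuple(noisy_quicksort([0, 1, 2], make_comparator(p), rng))
          for _ in range(1000)]
```

Every run yields a valid permutation, but which one is returned depends on the random outcomes of the duels, which is exactly why the optimality of the output has to be analysed.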
Contributions
We combine the QuickSort algorithm and a stochastic preference model
This harmony was first presented in [Ailon-2008]
We exploit this harmony for online rank elicitation
We succeed in developing a budgeted version of QuickSort
We devise PAC-style algorithms based on Budgeted QuickSort to:
Find a close-to-optimal item
Find a close-to-optimal ranking
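Preliminaries
A ranking $\mathbf{r}$ is a bijection on $[M] = \{1, \dots, M\}$, where $[M]$ is the set of items to be ranked
Also represented as a vector $\mathbf{r} = (r_1, \dots, r_M)$, where $r_i = \mathbf{r}(i)$ is the rank of the $i$-th item
If item $i$ is preferred over item $j$ in $\mathbf{r}$, then $r_i < r_j$
The set of rankings can be identified with the symmetric group $\mathbb{S}_M$ of order $M$
The inverse $\mathbf{r}^{-1}$ is defined by $\mathbf{r}^{-1}(r_i) = i$ for all $i \in [M]$
We denote by $\mathcal{L}(r_i < r_j) = \{\mathbf{r} \in \mathbb{S}_M : r_i < r_j\}$ the set of rankings for which item $i$ is preferred over item $j$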
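Dueling Bandit Framework
Repeat:
Sample the pairwise preference between items $i$ and $j$
Observe a binary feedback: one value means item $i$ is preferred over item $j$, the other means item $j$ is preferred over item $i$
Update the estimate $\hat{p}_{i,j}$
Continue or terminate?
On termination: Prediction
Parameter: confidence $\delta$
The prediction is achieved with probability at least $1 - \delta$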
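Estimation of pairwise probability
Pair of items chosen in the $t$-th step: $(i_t, j_t)$
Set of steps up to time $t$ in which the learner decides to compare items $i$ and $j$: $I^t_{i,j}$
Size of this set: $n^t_{i,j} = |I^t_{i,j}|$
The proportion of "wins" of item $i$ against item $j$ by time $t$: $\hat{p}^t_{i,j} = w^t_{i,j} / n^t_{i,j}$, where $w^t_{i,j}$ is the number of steps in $I^t_{i,j}$ won by item $i$
which is an estimate of the pairwise probability $p_{i,j}$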
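The win-proportion estimate described above can be sketched as a small bookkeeping class. The class and method names are hypothetical, and returning 0.5 for a pair that has never been compared is an assumption made here for convenience, not something stated on the slide.

```python
from collections import defaultdict

class PairwiseEstimator:
    """Tracks duel outcomes and the empirical win proportion of i against j."""

    def __init__(self):
        # wins[(i, j)] = number of duels between i and j that i won
        self.wins = defaultdict(int)

    def update(self, i, j, i_won):
        """Record one duel between items i and j."""
        winner, loser = (i, j) if i_won else (j, i)
        self.wins[(winner, loser)] += 1

    def n(self, i, j):
        """Number of times the pair (i, j) has been compared so far."""
        return self.wins[(i, j)] + self.wins[(j, i)]

    def p_hat(self, i, j):
        """Proportion of wins of i against j; 0.5 if never compared (assumption)."""
        n = self.n(i, j)
        return self.wins[(i, j)] / n if n else 0.5

est = PairwiseEstimator()
for outcome in (True, True, False, True):
    est.update(0, 1, outcome)
```

After these four duels, item 0 has won three of them, so the estimate for the pair (0, 1) is 3/4 and the estimate for (1, 0) is 1/4.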
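The Plackett-Luce Model
Widely-used probability distribution on rankings
Parameterized by a "skill" vector $\mathbf{v} = (v_1, \dots, v_M)$ with positive entries
$v_i$ is the skill associated with item $i$
An item with a higher skill is more preferred
The Plackett-Luce Model
The probability of observing a particular ranking $\mathbf{r}$ is
$\mathbb{P}(\mathbf{r} \mid \mathbf{v}) = \prod_{i=1}^{M} \frac{v_{\mathbf{r}^{-1}(i)}}{v_{\mathbf{r}^{-1}(i)} + v_{\mathbf{r}^{-1}(i+1)} + \dots + v_{\mathbf{r}^{-1}(M)}}$
where $\mathbf{r}^{-1}(i)$ is the index of the item placed at rank $i$
Mimics the successive construction of a ranking
Each time choosing one of the remaining items with probability proportional to its skill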
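The successive-construction view of the Plackett-Luce model translates directly into a sampler, and the product form into a probability function. This is a minimal sketch; the function names and the example skill vector are illustrative assumptions.

```python
import random
from itertools import permutations

def sample_pl_ranking(v, rng):
    """Draw a ranking from a Plackett-Luce model with skill vector v.

    Returns item indices ordered from most to least preferred.
    """
    remaining = list(range(len(v)))
    order = []
    while remaining:
        # Pick one of the remaining items with probability proportional to skill.
        weights = [v[i] for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order

def pl_probability(order, v):
    """Probability of the given order (most preferred first) under skills v."""
    prob = 1.0
    rest = sum(v[i] for i in order)  # total skill of the not-yet-placed items
    for i in order:
        prob *= v[i] / rest
        rest -= v[i]
    return prob

v = [3.0, 2.0, 1.0]  # hypothetical skills
example = sample_pl_ranking(v, random.Random(0))
total = sum(pl_probability(o, v) for o in permutations(range(len(v))))
```

As a sanity check, the probabilities of all $M!$ orders sum to one, and the order (0, 1, 2) has probability $(3/6)(2/3)(1/1) = 1/3$ for these skills.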
Properties of PL Model
The marginal probabilities $p_{i,j} = \frac{v_i}{v_i + v_j}$ are easy to calculate
Satisfies stochastic transitivity
Most probable ranking: simply sort the items according to their skill parameters
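Both properties can be checked numerically on a small instance by enumerating all rankings. The sketch below assumes the standard closed form $v_i / (v_i + v_j)$ for the pairwise marginal of a Plackett-Luce model; the helper names and the skill vector are hypothetical.

```python
from itertools import permutations

def pl_probability(order, v):
    """Plackett-Luce probability of an order (most preferred item first)."""
    prob, rest = 1.0, sum(v[i] for i in order)
    for i in order:
        prob *= v[i] / rest
        rest -= v[i]
    return prob

def pairwise_marginal(i, j, v):
    """P(item i is ranked before item j), by brute-force enumeration."""
    return sum(pl_probability(o, v)
               for o in permutations(range(len(v)))
               if o.index(i) < o.index(j))

v = [4.0, 2.0, 1.0, 1.0]  # hypothetical skills
marginal_01 = pairwise_marginal(0, 1, v)
closed_form_01 = v[0] / (v[0] + v[1])
# The mode of the distribution, found by exhaustive search.
most_probable = max(permutations(range(len(v))),
                    key=lambda o: pl_probability(o, v))
```

The enumerated marginal agrees with the closed form up to rounding, and the most probable order starts with the two highest-skill items (items 2 and 3 are tied here, so either order of them is a mode).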
Harmony of QuickSort and Plackett-Luce model
This harmony was investigated in [Ailon-2008]
The pairwise comparisons are drawn from the pairwise marginals of the Plackett-Luce model
The probability distribution of the ranking returned by QuickSort: $\mathbb{P}(\mathbf{r} \mid \mathbf{P})$
where the matrix $\mathbf{P} = [p_{i,j}]$ contains the marginals of the Plackett-Luce model
$p_{i,j}$ is the pairwise marginal of items $i$ and $j$
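Harmony of QuickSort and Plackett-Luce model
The probability distribution of the ranking returned by QuickSort: $\mathbb{P}(\mathbf{r} \mid \mathbf{P})$
It was shown that $\mathbb{P}(\mathbf{r} \mid \mathbf{P})$ obeys the property of pairwise stability, i.e. it preserves the pairwise marginals of the Plackett-Luce model
Theorem 1 (Theorem 4.1 in [Ailon-2008]): Let $\mathbf{P}$ be given by the pairwise marginals, i.e., $p_{i,j} = \frac{v_i}{v_i + v_j}$. Then, $\mathbb{P}(\mathbf{r} \mid \mathbf{P}) = \mathbb{P}(\mathbf{r} \mid \mathbf{v})$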
Budgeted QuickSort algorithm
Generate a ranking with QuickSort
Worst-case sample complexity: $O(M^2)$
We introduce a budgeted version of the QuickSort algorithm
Terminates if the algorithm compares too many pairs
Upon termination, it may return a partial order
Still preserves the pairwise stability property
Budgeted QuickSort algorithm
Terminate as soon as the number of pairwise comparisons exceeds the budget
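A minimal sketch of a budgeted QuickSort in the spirit of the description above. It is not the authors' pseudocode: the stopping rule used here (stop for good as soon as the remaining budget cannot pay for partitioning the next block) and all names are assumptions. The result is an ordered list of buckets, where items sharing a bucket are incomparable.

```python
import random

def budgeted_quicksort(items, budget, prefers, rng):
    """Return (buckets, comparisons_used).

    buckets is an ordered list of lists, most preferred first; items that
    share a bucket were never separated and are therefore incomparable.
    """
    state = {"used": 0, "stopped": False}

    def sort(block):
        if not block:
            return []
        if len(block) == 1:
            return [list(block)]
        cost = len(block) - 1  # every non-pivot item duels the pivot once
        if state["stopped"] or budget - state["used"] < cost:
            state["stopped"] = True  # budget exhausted: stop partitioning
            return [list(block)]
        pivot = rng.choice(block)
        better, worse = [], []
        for x in block:
            if x != pivot:
                (better if prefers(x, pivot, rng) else worse).append(x)
        state["used"] += cost
        return sort(better) + [[pivot]] + sort(worse)

    buckets = sort(list(items))
    return buckets, state["used"]

# Deterministic comparator for illustration: a smaller index always wins.
always_smaller = lambda a, b, rng: a < b
full, used_full = budgeted_quicksort(range(6), float("inf"), always_smaller, random.Random(0))
partial, used_partial = budgeted_quicksort(range(6), 5, always_smaller, random.Random(0))
```

With an infinite budget the sketch behaves like plain QuickSort and returns singleton buckets; with a budget of 5 on six items only the first partition is affordable, so at most three buckets come back.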
Random tree of QuickSort algorithm
BQS($A$, $\infty$) recovers the original QuickSort algorithm
A run of BQS($A$, $B$) presented as a random tree
Such a tree determines a ranking
[Figure: QuickSort recursion tree over items 1 to 9, with the item set split around a pivot at each level]
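Random tree of Budgeted QuickSort
For a budget $B$, denote the tree returned by BQS($A$, $B$) as $T^B$
Let $\mathcal{T}^B$ denote the set of all possible outcomes of $T^B$
[Figure: truncated recursion tree; the budget is used up before the block containing items 8, 7, 9 is split]
Items 8, 7, 9 are incomparable in the ranking
We just know their position relative to the items outside their block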
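Pairwise Stability of Budgeted QuickSort
BQS does not introduce any bias in the marginals
Let $\mathcal{T}^B_{i,j}$ denote the set of trees in which items $i$ and $j$ are incomparable in the associated ranking
Proposition 2: For any $B > 0$, any set $A \subseteq [M]$ and any indices $i, j \in A$, the partial order $\mathbf{r}$ generated by BQS($A$, $B$), with tree $T^B$, satisfies
$\mathbb{P}(r_i < r_j \mid T^B \notin \mathcal{T}^B_{i,j}) = \frac{v_i}{v_i + v_j}$
i.e. whenever two items $i$ and $j$ are comparable by the partial ranking $\mathbf{r}$ generated by BQS, $r_i < r_j$ with probability exactly $\frac{v_i}{v_i + v_j}$.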
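Proof sketch of Proposition 2
Proposition 2: For any $B > 0$, any set $A \subseteq [M]$ and any indices $i, j \in A$, the partial order $\mathbf{r}$ generated by BQS($A$, $B$) ranks item $i$ before item $j$ with probability exactly $\frac{v_i}{v_i + v_j}$ whenever the two items are comparable
Conditioned on the event that items $i$ and $j$ are incomparable by $\mathbf{r}$, $r_i < r_j$ would have been obtained with probability $\frac{v_i}{v_i + v_j}$ in case the execution of BQS had been continued
The result follows by combining this with Theorem 1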
First Goal of learner: PAC-item
Optimal item referring to the Condorcet winner
An item $i^*$ is a Condorcet winner if $p_{i^*,j} > 1/2$ for all $j \ne i^*$
Difficult to determine an order between items $i$ and $j$ when $p_{i,j} \approx 1/2$
Hence, we relax the goal to finding a PAC-item
An item $j$ is a PAC-item if it is beaten by the Condorcet winner with at most an $\epsilon$-margin: $p_{i^*,j} - 1/2 < \epsilon$
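The PAC-item definition can be illustrated on a known Plackett-Luce model, where the Condorcet winner is the item with the largest skill and the pairwise probabilities have the standard closed form $v_i / (v_i + v_j)$. The function names and numbers below are hypothetical, and treating the Condorcet winner itself as a PAC-item is a convention assumed here.

```python
def pl_marginal(v, i, j):
    """Probability that item i beats item j under Plackett-Luce skills v."""
    return v[i] / (v[i] + v[j])

def pac_items(v, eps):
    """Items beaten by the Condorcet winner with margin less than eps.

    Assumes a unique largest skill, so that the Condorcet winner exists.
    """
    best = max(range(len(v)), key=lambda i: v[i])
    return [j for j in range(len(v))
            if j == best or pl_marginal(v, best, j) - 0.5 < eps]

v = [1.0, 0.9, 0.5]          # hypothetical skills; item 0 is the Condorcet winner
acceptable = pac_items(v, eps=0.05)
```

Here item 1 loses to item 0 with probability $1/1.9 \approx 0.526$, a margin of about 0.026, so it counts as a PAC-item for $\epsilon = 0.05$, while item 2 (margin about 0.167) does not.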
PLPAC Algorithm
Goal: finding a PAC-item
In each iteration:
Generate a partial ranking (line 6)
Translate the ranking into pairwise comparisons
Update the estimates of the marginals
PLPAC Algorithm
Apply an elimination strategy:
Remove an item if it is significantly beaten by another
Terminates when the PAC-item set has at least one item
Sample Complexity analysis of PLPAC
In each iteration, the partial ordering in line 6 defines a bucket order
Pairs are incomparable within a bucket
But pairs from different buckets are comparable
With the budget used, the bucket order has only two buckets
After the first partition of BQS, it will use up all the budget
[Figure: example bucket order after a single partition, with pivot 6 separating items 4, 1, 3, 2 from items 8, 7, 9]
Sample Complexity analysis of PLPAC
Observation: the optimal arm and an arbitrary arm fall into different buckets "often enough"
Allows us to upper-bound the number of pairwise comparisons
Theorem 3: Set … for each index …. The total number of samples for the PLPAC algorithm is ….
The dependence on $M$ is of order ….
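Second goal of learner: AMPR
The most probable ranking, $\mathbf{r}^* = \arg\max_{\mathbf{r} \in \mathbb{S}_M} \mathbb{P}(\mathbf{r} \mid \mathbf{v})$
Difficult to determine an order between items $i$ and $j$ when $p_{i,j} \approx 1/2$
Hence, difficult to find the most probable ranking
Relax the goal to finding the Approximately Most Probable Ranking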
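Second goal of learner: AMPR
Find a ranking $\mathbf{r}$ that has the following property:
No pair of items $i, j$ such that $r^*_i < r^*_j$, $r_i > r_j$ and $p_{i,j} - 1/2 > \epsilon$
Ranking $\mathbf{r}$ is allowed to differ from the most probable ranking $\mathbf{r}^*$ only for those items whose pairwise probabilities are close to $1/2$
Any ranking $\mathbf{r}$ satisfying this property is called an approximately most probable ranking (AMPR)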
PLPAC-AMPR algorithm
The most probable ranking $\mathbf{r}^*$ of a PL model is the ranking that sorts items in decreasing order of their skill values: $r^*_i < r^*_j$ iff $v_i > v_j$ for any $i \ne j$
Moreover, since $v_i > v_j$ implies $p_{i,j} > 1/2$, we can also sort the items based on the Copeland score $b_i = \#\{k \ne i : p_{i,k} > 1/2\}$, which yields a most probable ranking
The PLPAC-AMPR algorithm is based on estimating the Copeland scores of the items
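Sorting by Copeland score can be sketched as follows for a known preference matrix. The Copeland score of an item is taken here to be the number of other items it beats with probability above 1/2; the function names and the example skills are assumptions.

```python
def copeland_scores(p):
    """Copeland score of each item: how many other items it beats w.p. > 1/2."""
    m = len(p)
    return [sum(1 for k in range(m) if k != i and p[i][k] > 0.5)
            for i in range(m)]

def rank_by_copeland(p):
    """Item indices sorted by decreasing Copeland score."""
    scores = copeland_scores(p)
    return sorted(range(len(p)), key=lambda i: -scores[i])

# Pairwise marginals of a Plackett-Luce model with hypothetical skills.
v = [1.0, 3.0, 2.0]
p = [[v[i] / (v[i] + v[j]) for j in range(len(v))] for i in range(len(v))]
scores = copeland_scores(p)
order = rank_by_copeland(p)
```

For these skills the scores are 0, 2 and 1, so the Copeland order (1, 2, 0) coincides with sorting the items by decreasing skill.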
PLPAC-AMPR algorithm
In each iteration:
Generate rankings based on sorting
Update the pairwise probability estimates
Compute a lower and an upper bound, $\underline{b}_i$ and $\overline{b}_i$, for each Copeland score $b_i$
$\underline{b}_i$ is the number of items that are beaten significantly by item $i$ based on the current estimates of the pairwise marginals
$\overline{b}_i = \underline{b}_i + s_i$, where $s_i$ is the number of pairs involving item $i$ whose order cannot be decided yet
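PLPAC-AMPR algorithm
We don't need to sort the whole item set in each iteration
Because if the Copeland score intervals do not overlap, $[\underline{b}_i, \overline{b}_i] \cap [\underline{b}_j, \overline{b}_j] = \emptyset$, then we already know the order of items $i$ and $j$
Consider the interval graph $G = ([M], E)$, where $E = \{(i, j) : [\underline{b}_i, \overline{b}_i] \cap [\underline{b}_j, \overline{b}_j] \ne \emptyset\}$
Denote the connected components by $C_1, \dots, C_k$
In each iteration, call Budgeted QuickSort on the connected components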
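The connected components of an interval graph can be found with a single sweep over the intervals sorted by their lower end. The sketch below assumes closed intervals given as two parallel lists of lower and upper bounds; the function name and the example bounds are hypothetical.

```python
def interval_components(lower, upper):
    """Connected components of the interval graph on [lower[i], upper[i]].

    Two items are adjacent when their closed intervals intersect. Returns a
    list of components, each a list of item indices.
    """
    order = sorted(range(len(lower)), key=lambda i: lower[i])
    components = []
    reach = None  # largest upper bound seen in the current component
    for i in order:
        if components and lower[i] <= reach:
            # Interval i overlaps the union of the current component.
            components[-1].append(i)
            reach = max(reach, upper[i])
        else:
            components.append([i])
            reach = upper[i]
    return components

# Hypothetical score bounds for four items.
lower = [0, 1, 3, 5]
upper = [1, 2, 3, 6]
groups = interval_components(lower, upper)
```

Only components with more than one item still contain pairs with an undecided order, so only those would need to be passed on for further sorting.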
PLPAC-AMPR algorithm
The algorithm terminates if there is no pair of items $i$ and $j$ for which the ordering has not been elicited yet, i.e. whose Copeland score intervals still overlap, and their pairwise probabilities are close to $1/2$
Sample Complexity Analysis of PLPAC-AMPR
Concentration property of the performance of QuickSort
No pair of items falls into the same bucket "too often" in the partial order returned by BQS
Allows us to upper-bound the number of pairwise comparisons with high probability
Theorem 4: Set … for each …, where $v_{(i)}$ denotes the $i$-th largest skill parameter. The total number of samples for the PLPAC-AMPR algorithm is ….
The dependence on $M$ is of order ….
Experiments – The PAC-item Problem
Compare with other preference-based online algorithms applicable in our setting, including:
1. INTERLEAVED FILTER (IF) [Yue et al., 2012]
2. BEAT THE MEAN [Yue et al., 2011]
3. MALLOWSMPI [Busa-Fekete et al., 2014]
Setting the parameters of PL to … with …
… controls the complexity of the rank elicitation task
The larger the value of …, the more difficult it is to determine the order between two items … and …
Experiments – The PAC-item Problem
[Figure: the sample complexity for $M \in \{5, 10, 15\}$, $\delta = 0.1$, $\epsilon = 0$. The results are averaged over 100 repetitions.]
Experiment – The AMPR Problem
The RankCentrality algorithm [Negahban et al., 2012] is taken as the baseline
Setting the parameters of PL to … with …
Experiment – The AMPR Problem
[Figure: sample complexity for $M \in \{5, 10, 15\}$, $\delta = 0.1$, $\epsilon = 0$. The results are averaged over 100 repetitions.]
Conclusion
In the setting of dueling bandits under the PL model assumption
We consider two tasks:
Find the approximate best arm
Find the approximate most probable ranking
We propose algorithms for both tasks based on a budgeted QuickSort algorithm, exploiting the pairwise stability of the QuickSort algorithm
We also give a sample complexity bound for both algorithms