Haim Kaplan, Uri Zwick
Tel Aviv University
April 2016 (last updated: June 13, 2016)

Algorithms in Action
The Multiplicative Weights Update Method
Slide 2
"Using expert advice": a basic binary setting

On each of $T$ days:
$n$ "experts" give us their prediction (Up/Down).
We need to make a binary decision (Up/Down).
Based on their advice, we make a choice.
We then find out whether our choice is correct.
If our choice is wrong, we pay a penalty of 1.
If our choice is right, we do not pay anything.
Our goal, of course, is to pay as little as possible.
Slide 3
"Using expert advice": a basic binary setting

Days:          1  2  3  4  5   Cost
Expert 1:      U  U  U  U  D    2
Expert 2:      D  D  U  D  D    3
Expert 3:      U  D  D  U  D    4
Our decision:  U  D  U  U  D    3
Outcome:       U  U  U  D  U
Slide 4
"Using expert advice": a basic binary setting

Days:          1  2  3  4  5   Cost
Expert 1:      U  U  D  U  U    1
Expert 2:      D  D  D  D  D    3
Expert 3:      U  D  U  U  D    4
Our decision:  U  D  D  U  U    2
Outcome:       U  U  D  D  U
Slide 5
"Using expert advice": a basic binary setting

How well can we do?
We would like to do almost as well as the best expert, with hindsight.
If all "experts" are bad, we cannot do too well.
Slide66
The Weighted Majority algorithm[LittlestoneWarmuth (1994)]
Assign each expert a weight.The weight of the th expert at day is .
On day 1, all weights are 1: , .
At day , predict Up or Down according to the weighted majority of the experts.
Choose a parameter .
Predict up, if , – sets of experts predicting Up/Down at day .
Update the weights:
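The algorithm can be sketched in a few lines of Python (an illustrative sketch; the function name and data layout are my own, not from the slides):

```python
def weighted_majority(predictions, outcomes, eps=0.5):
    """predictions[t][i] is expert i's prediction ('U' or 'D') on day t;
    outcomes[t] is the true outcome on day t. Returns (mistakes, weights)."""
    n = len(predictions[0])
    w = [1.0] * n                  # on day 1, all weights are 1
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        up = sum(w[i] for i in range(n) if preds[i] == 'U')
        down = sum(w[i] for i in range(n) if preds[i] == 'D')
        guess = 'U' if up >= down else 'D'      # weighted majority vote
        if guess != outcome:
            mistakes += 1
        # penalize every expert that erred by a factor of (1 - eps)
        w = [wi * (1 - eps) if p != outcome else wi
             for wi, p in zip(w, preds)]
    return mistakes, w
```

Run on the first example table above with eps = 0.5, this makes the decisions U, D, U, U, D and 3 mistakes.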
Slide 7
The Weighted Majority algorithm [Littlestone, Warmuth (1994)]

$M^{(t)}$: the number of mistakes of WM up to time $t$.
$m_i^{(t)}$: the number of mistakes of expert $i$ up to time $t$.
Theorem: For every $i$,
  $M^{(t)} \le 2(1+\varepsilon)\, m_i^{(t)} + \frac{2\ln n}{\varepsilon}$.
In particular, the inequality holds for the best expert.
Thus, the cost of the Weighted Majority algorithm is only slightly larger than twice the cost of the best expert!
(We can do even better using a randomized algorithm.)
Slide 8
The Weighted Majority algorithm [Littlestone, Warmuth (1994)]

Theorem: For every $i$,
  $M^{(t)} \le 2(1+\varepsilon)\, m_i^{(t)} + \frac{2\ln n}{\varepsilon}$.
Proof: Let $\Phi^{(t)} = \sum_i w_i^{(t)}$. Clearly, $\Phi^{(1)} = n$.
If WM makes a mistake in day $t$, then at least half of the total weight is on experts that erred, so $\Phi^{(t+1)} \le \left(1 - \frac{\varepsilon}{2}\right)\Phi^{(t)}$.
Slide 9
The Weighted Majority algorithm [Littlestone, Warmuth (1994)]

Hence $\Phi^{(t+1)} \le n \left(1 - \frac{\varepsilon}{2}\right)^{M^{(t)}}$.
On the other hand, $w_i^{(t+1)} = (1-\varepsilon)^{m_i^{(t)}} \le \Phi^{(t+1)}$.
Taking logarithms:
  $m_i^{(t)} \ln(1-\varepsilon) \le \ln n + M^{(t)} \ln\left(1 - \frac{\varepsilon}{2}\right)$.

Slide 10
The Weighted Majority algorithm [Littlestone, Warmuth (1994)]

Using $-\ln(1-x) \le x + x^2$, for $0 \le x \le \frac{1}{2}$, and $\ln(1-x) \le -x$:
  $\frac{\varepsilon}{2}\, M^{(t)} \le \ln n + (\varepsilon + \varepsilon^2)\, m_i^{(t)}$,
so $M^{(t)} \le \frac{2\ln n}{\varepsilon} + 2(1+\varepsilon)\, m_i^{(t)}$.
Slide 11
"Using expert advice": a more general setting

On each of $T$ days:
(Each one of $n$ "experts" suggests a course of action.)
We choose a (probability) distribution over the experts.
The costs of choosing each expert are revealed. All costs are in $[-1, 1]$.
We pay the average cost according to the distribution chosen.
Our goal is to minimize our total cost.
Alternative interpretation: On each day a random expert is drawn according to the distribution chosen. We pay the expected cost.
Slide 12
"Using expert advice": a more general setting

[Example: a table of the costs of experts 1-3 over days 1-4, together with our cost; the numeric entries were lost in extraction.]

Slide 13
"Using expert advice": a more general setting

[The same example table; the numeric entries were lost in extraction.]
Slide 14
The Multiplicative Weights algorithm [Cesa-Bianchi, Mansour, Stoltz (2007)]

The weight of expert $i$ at day $t$ is $w_i^{(t)}$.
$w_i^{(1)} = 1$, for every $i$.
Choose a parameter $\varepsilon \le \frac{1}{2}$.
Let $c^{(t)} = (c_1^{(t)}, \ldots, c_n^{(t)})$ be the costs at day $t$.
At day $t$ use the distribution: $p_i^{(t)} = \frac{w_i^{(t)}}{\sum_j w_j^{(t)}}$.
Update the weights: $w_i^{(t+1)} = w_i^{(t)} \left(1 - \varepsilon\, c_i^{(t)}\right)$.
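The general cost-vector algorithm above, as a short Python sketch (names and data layout are illustrative, not from the slides):

```python
def multiplicative_weights(costs, eps=0.1):
    """costs[t][i] in [-1, 1] is the cost of expert i on day t.
    Returns (our total expected cost, the distributions used)."""
    n = len(costs[0])
    w = [1.0] * n                              # all weights start at 1
    total, dists = 0.0, []
    for c in costs:
        s = sum(w)
        p = [wi / s for wi in w]               # distribution ~ weights
        dists.append(p)
        total += sum(pi * ci for pi, ci in zip(p, c))  # expected cost today
        w = [wi * (1 - eps * ci) for wi, ci in zip(w, c)]  # update
    return total, dists
```

With two experts whose daily costs are always (0, 1), for example, almost all of the probability mass quickly shifts to the better expert, and the total cost stays close to that expert's cost.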
Slide 15
The Multiplicative Weights algorithm [Cesa-Bianchi, Mansour, Stoltz (2007)]

Theorem: Assume that $c_i^{(t)} \in [-1, 1]$ and that $\varepsilon \le \frac{1}{2}$.
Let $p^{(t)}$ be the distribution used by MW at day $t$. Then, for every $i$,
  $\sum_{t=1}^{T} p^{(t)} \cdot c^{(t)} \;\le\; \sum_{t=1}^{T} c_i^{(t)} + \varepsilon \sum_{t=1}^{T} \left|c_i^{(t)}\right| + \frac{\ln n}{\varepsilon}$.
Slide 16
The Multiplicative Weights algorithm [Cesa-Bianchi, Mansour, Stoltz (2007)]

Let $\Phi^{(t)} = \sum_i w_i^{(t)}$. Then
  $\Phi^{(t+1)} = \sum_i w_i^{(t)}\left(1 - \varepsilon c_i^{(t)}\right) = \Phi^{(t)}\left(1 - \varepsilon\, p^{(t)} \cdot c^{(t)}\right) \le \Phi^{(t)}\, e^{-\varepsilon\, p^{(t)} \cdot c^{(t)}}$,
so $\Phi^{(T+1)} \le n\, e^{-\varepsilon \sum_t p^{(t)} \cdot c^{(t)}}$, while $w_i^{(T+1)} = \prod_t \left(1 - \varepsilon c_i^{(t)}\right) \le \Phi^{(T+1)}$.
Using $(1-\varepsilon)^x \le 1 - \varepsilon x$, for $0 \le x \le 1$, and $(1+\varepsilon)^{-x} \le 1 - \varepsilon x$, for $-1 \le x \le 0$,
then taking logarithms and using $\ln\frac{1}{1-\varepsilon} \le \varepsilon + \varepsilon^2$ and $\ln(1+\varepsilon) \ge \varepsilon - \varepsilon^2$, for $0 \le \varepsilon \le \frac{1}{2}$, yields the theorem.
Slide 17
Applications of the Multiplicative Weights algorithm

Learning a linear classifier (the Winnow algorithm)
Boosting the performance of weak learners (cf. AdaBoost)
Approximately solving 0-sum 2-player games
Approximately solving packing Linear Programs
  Special case: multicommodity flow
Approximately solving Semidefinite Programs
  Special case: the SDP relaxation of MAX CUT
...
Approximately solving covering Linear Programs
Slide 18
Learning a Linear Classifier

[Figure: labeled points separated by a hyperplane; lost in extraction.]
Assume, w.l.o.g., that the separating hyperplane passes through the origin and that $\|a_j\|_\infty \le 1$, for every example $a_j$.
Slide 19
Learning a Linear Classifier

Let $a_1, \ldots, a_m \in [-1, 1]^n$ be the (sign-adjusted) examples.
Assume there exists $x^*$, $x^* \ge 0$, $\sum_i x_i^* = 1$, such that $a_j \cdot x^* \ge \delta$, for $j = 1, \ldots, m$.
Find $x$, $x \ge 0$, $\sum_i x_i = 1$, such that $a_j \cdot x \ge 0$, for $j = 1, \ldots, m$.
Let $\delta > 0$ be the margin.
Slide 20
Learning a Linear Classifier: The Winnow algorithm [Littlestone (1987)]

Experts correspond to coordinates (also known as features).
Run MW with $\varepsilon = \frac{\delta}{2}$.
In each iteration, if $x = p^{(t)}$ is a good classifier, stop.
Otherwise, let $j$ be such that $a_j \cdot p^{(t)} < 0$.
Let the reward vector be $r^{(t)} = a_j$.
Theorem: If there exists a classifier $x^*$ such that $a_j \cdot x^* \ge \delta$, for every $j$, then Winnow finds a classifier $x$ such that $a_j \cdot x \ge 0$, for every $j$, after at most $\frac{4 \ln n}{\delta^2}$ iterations.
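A minimal Python sketch of this loop (an illustration under the assumptions above; the reward-style update and stopping test are spelled out explicitly, and the names are my own):

```python
def winnow(examples, delta, max_iter=10000):
    """examples: vectors a_j with entries in [-1, 1], assumed to admit some
    x* >= 0 with sum(x*) = 1 and a_j . x* >= delta for every j.
    Returns x >= 0 with sum(x) = 1 and a_j . x >= 0 for all j (or None)."""
    n = len(examples[0])
    eps = delta / 2
    w = [1.0] * n
    for _ in range(max_iter):
        s = sum(w)
        x = [wi / s for wi in w]               # current candidate classifier
        bad = next((a for a in examples
                    if sum(ai * xi for ai, xi in zip(a, x)) < 0), None)
        if bad is None:
            return x                           # all constraints satisfied
        # boost the coordinates favored by the violated example
        w = [wi * (1 + eps * ai) for wi, ai in zip(w, bad)]
    return None
```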
Slide 21
Learning a Linear Classifier: The Winnow algorithm [Littlestone (1987)]

For every coordinate (expert) $i$ we have:
  $\sum_t p^{(t)} \cdot a_{j_t} \;\ge\; \sum_t (a_{j_t})_i - \varepsilon \sum_t \left|(a_{j_t})_i\right| - \frac{\ln n}{\varepsilon} \;\ge\; \sum_t (a_{j_t})_i - \varepsilon T - \frac{\ln n}{\varepsilon}$.
Thus, for every distribution $x$ we have:
  $\sum_t p^{(t)} \cdot a_{j_t} \;\ge\; \sum_t a_{j_t} \cdot x - \varepsilon T - \frac{\ln n}{\varepsilon}$.
We choose $x = x^*$.
Slide 22
The Winnow algorithm [Littlestone (1987)]

In each iteration, $a_{j_t} \cdot p^{(t)} < 0$, for some $j_t$ such that constraint $j_t$ is violated, while $a_{j_t} \cdot x^* \ge \delta$.
Hence
  $0 \;>\; \sum_t a_{j_t} \cdot p^{(t)} \;\ge\; \delta T - \varepsilon T - \frac{\ln n}{\varepsilon}$.
With $\varepsilon = \frac{\delta}{2}$, this gives $T < \frac{4 \ln n}{\delta^2}$.
Slide 23
0-sum 2-player matrix games

ROW chooses a row $i$.
COLUMN chooses a column $j$.
ROW pays COLUMN $A_{ij}$.
No player wants to go first...
Suppose the players play simultaneously.
Playing deterministically is like playing first.
Use randomized (mixed) strategies.
Slide 24
0-sum 2-player matrix games

Randomized (mixed) strategy for ROW: a distribution $p$ over the rows of $A$.
Randomized (mixed) strategy for COLUMN: a distribution $q$ over the columns of $A$.
If ROW uses $p$ and COLUMN uses $q$, the expected payoff is: $p^T A q = \sum_{i,j} p_i A_{ij} q_j$.
Slide 25
0-sum 2-player matrix games

Von Neumann's min-max theorem:
  $\min_{p} \max_{q} \; p^T A q \;=\; \max_{q} \min_{p} \; p^T A q$.
The two sides are the optima of a pair of linear programs; the theorem follows from LP duality.
Slide 26
0-sum 2-player matrix games

ROW chooses a row $i$.
COLUMN chooses a column $j$.
ROW pays COLUMN $A_{ij}$.
[Example: a small payoff matrix; the entries were lost in extraction.]
What is the value and what are the optimal strategies?
value = 1
Slide 27
Solving 0-sum games approximately

The value and the optimal strategies can be found by solving an LP.
This can be done in polynomial time, but relatively slowly.
In many situations a good approximation is sufficient.
W.l.o.g., assume that all entries of $A$ are in $[-1, 1]$.
Let $v$ be the value of $A$. Let $\delta > 0$ be the desired accuracy.
$p^*$ and $q^*$ are optimal strategies iff:
  $\max_q \,(p^*)^T A q \;\le\; v \;\le\; \min_p \, p^T A q^*$.
Slide 28
0-sum games using Multiplicative Updates [Freund, Schapire (1999)]

Experts correspond to the rows of $A$.
A distribution over the experts is a mixed strategy for ROW.
In iteration $t$, the MW algorithm produces a distribution $p^{(t)}$.
The cost vector $c^{(t)}$ is the column $A e_{j_t}$ of $A$ maximizing $(p^{(t)})^T A e_j$, i.e., COLUMN's best response to $p^{(t)}$.
Theorem: If MW is run with $\varepsilon = \Theta(\delta)$ for $T = O\!\left(\frac{\ln n}{\delta^2}\right)$ iterations, then the best strategy obtained is $\delta$-optimal for ROW. If $A$ has $m$ columns, each iteration takes $O(mn)$ time. An optimal strategy for COLUMN can also be found.
Note that $c_i^{(t)} = A_{i j_t} \in [-1, 1]$.
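A sketch of this scheme in Python (an illustration; the parameter choices and names are my own, not from the slides):

```python
def approx_zero_sum(A, T=2000, eps=0.1):
    """A[i][j] in [-1, 1]: ROW pays COLUMN A[i][j]. ROW runs MW over the
    rows; COLUMN best-responds each round. Returns (average row strategy,
    average column strategy, average payoff)."""
    n, m = len(A), len(A[0])
    w = [1.0] * n
    p_avg, q_avg, value = [0.0] * n, [0.0] * m, 0.0
    for _ in range(T):
        s = sum(w)
        p = [wi / s for wi in w]
        # COLUMN's best response: the column maximizing ROW's expected payment
        j = max(range(m), key=lambda jj: sum(p[i] * A[i][jj] for i in range(n)))
        value += sum(p[i] * A[i][j] for i in range(n)) / T
        for i in range(n):
            p_avg[i] += p[i] / T
        q_avg[j] += 1.0 / T
        # the cost of row i is A[i][j]; ROW's weight shifts to cheaper rows
        w = [w[i] * (1 - eps * A[i][j]) for i in range(n)]
    return p_avg, q_avg, value
```

On matching pennies (A = [[1, -1], [-1, 1]], value 0), the averaged strategies converge to the uniform distribution and the average payoff approaches 0.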
Slide 29
0-sum games using Multiplicative Updates [Freund, Schapire (1999)]

For any distribution $p$, and in particular $p = p^*$, we have
  $\frac{1}{T}\sum_{t} (p^{(t)})^T A e_{j_t} \;\le\; \frac{1}{T}\sum_{t} p^T A e_{j_t} + \varepsilon + \frac{\ln n}{\varepsilon T} \;\le\; \max_q \, p^T A q + \delta$.
For $p = p^*$ we get $\max_q (p^*)^T A q = v$, so the average cost is at most $v + \delta$.
Slide 30
0-sum games using Multiplicative Updates [Freund, Schapire (1999)]

For at least one $t$ we have:
  $(p^{(t)})^T A e_{j_t} \le v + \delta$.
Thus, if $t^*$ minimizes $(p^{(t)})^T A e_{j_t}$, then $p^{(t^*)}$ is $\delta$-optimal for ROW, since $e_{j_t}$ is COLUMN's best response to $p^{(t)}$.
($\bar{p} = \frac{1}{T}\sum_t p^{(t)}$ is also $\delta$-optimal for ROW.)
Slide 31
0-sum games using Multiplicative Updates [Freund, Schapire (1999)]

Let $\bar{q}$ be such that $\bar{q}_j = \frac{|\{t : j_t = j\}|}{T}$, the frequency with which COLUMN best-responded with column $j$.
For every $p$,
  $p^T A \bar{q} \;=\; \frac{1}{T}\sum_t p^T A e_{j_t} \;\ge\; \frac{1}{T}\sum_t (p^{(t)})^T A e_{j_t} - \delta \;\ge\; v - \delta$,
since each $(p^{(t)})^T A e_{j_t} = \max_q (p^{(t)})^T A q \ge v$.
Hence, $p^T A \bar{q} \ge v - \delta$, for every $p$, so $\bar{q}$ is $\delta$-optimal for COLUMN.
Slide 32
Rewards instead of costs

On day $t$ we get a reward vector $r^{(t)}$, instead of a cost vector $c^{(t)}$.
Maximize reward instead of minimizing cost.
Simply let $c^{(t)} = -r^{(t)}$.
Multiplicative weight update: $w_i^{(t+1)} = w_i^{(t)} \left(1 + \varepsilon\, r_i^{(t)}\right)$.
Theorem: Assume that $r_i^{(t)} \in [-1, 1]$ and that $\varepsilon \le \frac{1}{2}$. Let $p^{(t)}$ be the distribution used by MW at day $t$. Then, for every $i$,
  $\sum_{t=1}^{T} p^{(t)} \cdot r^{(t)} \;\ge\; \sum_{t=1}^{T} r_i^{(t)} - \varepsilon \sum_{t=1}^{T} \left|r_i^{(t)}\right| - \frac{\ln n}{\varepsilon}$.
Slide 33
Maximum Multicommodity Flow

[Figure: a flow network with numbered edge capacities and several source-sink pairs; details lost in extraction.]
Slide 34
Maximum Multicommodity Flow

$G = (V, E)$: a directed graph (the flow network).
$c : E \to \mathbb{R}_{\ge 0}$: a capacity function.
$(s_1, t_1), \ldots, (s_k, t_k)$: source-sink pairs.
Maximize the total flow, i.e., the flow sent from $s_1$ to $t_1$, plus the flow sent from $s_2$ to $t_2$, etc.
Different commodities can share the edges of the network.
The total flow on an edge should not exceed its capacity.
Exercise: Express the maximum multicommodity flow problem as a linear program of polynomial size. (Hint: For every edge introduce flow variables ...)
Slide 35
Maximum Multicommodity Flow

We use a different LP formulation of the problem, of possibly exponential size (!)
Let $P$ be the set of simple directed paths from $s_1$ to $t_1$, or from $s_2$ to $t_2$, etc.
For $p \in P$, let $f_p$ be a variable that expresses the flow, of the appropriate commodity, on $p$.
We want to maximize $\sum_{p \in P} f_p$,
subject to $\sum_{p \ni e} f_p \le c(e)$, for every $e \in E$,
and $f_p \ge 0$, for every $p \in P$.
Slide 36
Maximum Multicommodity flow [Garg, Könemann (2007)]

A polynomial time approximation algorithm.
Maintain a flow $f$ (which may violate the capacity constraints).
Maintain a weight function $w$ on the edges.
In each iteration:
Find a shortest path $p$ w.r.t. the edge lengths $w_e / c(e)$.
Route $c$ units of flow on $p$, where $c = \min_{e \in p} c(e)$.
Define $m_e = \frac{c}{c(e)}$, if $e \in p$, and $m_e = 0$, otherwise.
Use Multiplicative weight updates with $m_e$: $w_e \leftarrow w_e \left(1 + \varepsilon\, m_e\right)$.
Let $F_e$ be the total flow so far on $e$. Stop when $\frac{F_e}{c(e)} \ge \frac{\ln m}{\varepsilon^2}$ for some edge $e$.
Downscale the flow to satisfy all the capacity constraints.

Our presentation follows [Arora, Hazan, Kale (2012)].
Slide 37
Maximum Multicommodity flow [Garg, Könemann (2007)]

Let $f^*$ be the optimal flow and let $F^* = \sum_{p \in P} f^*_p$ be its value.
Slide 38
Maximum Multicommodity flow [Garg, Könemann (2007)]

Let $f^*$ be the optimal flow and let $F^* = \sum_{p \in P} f^*_p$ be its value.
Let $w$ be arbitrary edge weights.
Let $p$ be a shortest path w.r.t. the edge lengths $w_e / c(e)$.
Since $f^*_e \le c(e)$ for every $e$, and $f^*$ decomposes into paths of total value $F^*$,
  $F^* \sum_{e \in p} \frac{w_e}{c(e)} \;\le\; \sum_e \frac{w_e}{c(e)}\, f^*_e \;\le\; \sum_e w_e$.
39
Maximum Multicommodity flow[GargKönemman (2007)]
Maximum congestion
Upon termination:
Scale down the flow by
:
is an

approximate maximal flow!
Slide 40
Maximum Multicommodity flow [Garg, Könemann (2007)]

How many iterations are needed?
We stop the algorithm when the maximum congestion reaches $\frac{\ln m}{\varepsilon^2}$.
Each iteration adds 1 to the congestion of at least one edge (the bottleneck edge of the chosen path, for which $m_e = 1$).
Thus, the number of iterations is at most $\frac{m \ln m}{\varepsilon^2}$.
The total running time is the number of iterations times the cost of the shortest-path computations in each iteration.
[Fleischer (2000)] reduced the running time to $\tilde{O}\!\left(\frac{m^2}{\varepsilon^2}\right)$, independent of the number of commodities.
Slide 41
Positive Semidefinite Programming

max $C \bullet X$
s.t. $A_j \bullet X \le b_j$, for every $j$,
     $X \succeq 0$.
$A \bullet B = \sum_{i,j} A_{ij} B_{ij}$ (matrix inner product).
$X \succeq 0$: $X$ is (symmetric) positive semidefinite.
Can also be approximated using multiplicative updates.
Interesting application: an approximation algorithm for MAX CUT.
Slide 42
Bibliography

Sanjeev Arora, Elad Hazan, Satyen Kale,
"The Multiplicative Weights Update Method: A Meta-Algorithm and Applications",
Theory of Computing, Volume 8 (2012), pp. 121-164.
Slide 43
Bonus material
Not covered in class this term

"Careful. We don't want to learn from this."
(Calvin, in Bill Watterson's "Calvin and Hobbes")
Slide 44
Packing Linear Programs

Find a feasible $x \in P$ with $Ax \le b$, or show that none exists.
$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$; $P$ is a "simple" convex set.
Packing: $A \ge 0$, $b > 0$, and $x \ge 0$, for every $x \in P$.
By scaling, we sometimes assume that $b = \mathbf{1}$.
We are willing to settle for $x \in P$ such that $Ax \le (1+\delta)\, b$.
ORACLE: Given a distribution $p$ on the rows of $A$, return $x \in P$ such that $p^T A x \le p^T b$, or "no" if none exists.
If ORACLE returns "no" for some distribution $p$, then the problem is infeasible, since the averaged constraint $p^T A x \le p^T b$ is implied by $Ax \le b$.
Slide 45
Packing Linear Programs

$P$ is convex iff $x, y \in P$ implies $\lambda x + (1 - \lambda) y \in P$, for every $0 \le \lambda \le 1$.
"Simple" is used informally. The only requirement is that ORACLE can be efficiently implemented.
Example: a simplex, such as $P = \{\, x \ge 0 : \sum_i x_i = 1 \,\}$.
Find a feasible $x \in P$ with $Ax \le b$, or show that none exists.
Slide 46
Packing Linear Programs

ORACLE is $\rho$-bounded iff for every point $x$ returned and every $i$,
  $0 \le \frac{a_i \cdot x}{b_i} \le \rho$.
The lower bound is automatic, as $A \ge 0$ and $x \ge 0$.
$\rho$ is the width.
ORACLE: Given a distribution $p$ on the rows of $A$, return $x \in P$ such that $p^T A x \le p^T b$, or "no" if none exists.
Find a feasible $x \in P$ with $Ax \le b$, or show that none exists.
Slide 47
Packing LPs using multiplicative weights [Plotkin, Shmoys, Tardos (1995)]

Experts correspond to the linear constraints (rows of $A$).
A distribution $p$ over the experts corresponds to the averaged constraint $p^T A x \le p^T b$.
The costs at iteration $t$ are determined by the point $x^{(t)}$ returned by ORACLE:
  $c_i^{(t)} = \frac{1}{\rho}\left(1 - \frac{a_i \cdot x^{(t)}}{b_i}\right)$.
Note: Satisfied constraints are more costly, so their weight shrinks and attention shifts to violated constraints.
Use MW to produce the distributions $p^{(1)}, p^{(2)}, \ldots$
In iteration $t$, apply ORACLE to $p^{(t)}$ to obtain $x^{(t)}$ and the costs $c^{(t)}$.
If ORACLE returns "no" in any iteration, the problem is infeasible.
Run for $T = O\!\left(\frac{\rho \ln m}{\delta^2}\right)$ iterations and return $\bar{x} = \frac{1}{T}\sum_t x^{(t)}$.
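A feasibility sketch of this scheme in Python (an illustration; the exact cost normalization, dividing by rho so that all costs lie in [-1, 1], is an assumption of this sketch, as are the names):

```python
def packing_mw(A, b, oracle, rho, T, eps):
    """Packing LP feasibility: look for x in P with Ax <= (1 + delta) b.
    oracle(p) returns x in P with p.(Ax) <= p.b, or None if none exists.
    Cost convention: satisfied constraints are more costly."""
    m = len(A)
    w = [1.0] * m
    xs = []
    for _ in range(T):
        s = sum(w)
        p = [wi / s for wi in w]
        x = oracle(p)
        if x is None:
            return None                        # the problem is infeasible
        xs.append(x)
        # cost of constraint i: positive if satisfied, negative if violated
        cost = [(1 - sum(A[i][k] * x[k] for k in range(len(x))) / b[i]) / rho
                for i in range(m)]
        w = [wi * (1 - eps * ci) for wi, ci in zip(w, cost)]
    n = len(xs[0])
    return [sum(x[k] for x in xs) / T for k in range(n)]   # the average point
```

The returned point is the average of the ORACLE's answers, which lies in $P$ by convexity.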
Slide 48
Packing LPs using multiplicative weights [Plotkin, Shmoys, Tardos (1995)]

Theorem: For any $\delta > 0$, after $T = O\!\left(\frac{\rho \ln m}{\delta^2}\right)$ iterations of MW with a $\rho$-bounded ORACLE, the point $\bar{x} = \frac{1}{T}\sum_t x^{(t)}$ satisfies $\bar{x} \in P$ and $A\bar{x} \le (1+\delta)\, b$.
As $x^{(t)}$ is the ORACLE's response to $p^{(t)}$, we have:
  $(p^{(t)})^T A\, x^{(t)} \le (p^{(t)})^T b$, i.e., $p^{(t)} \cdot c^{(t)} \ge 0$.
For every constraint ("expert") $i$ we have the MW guarantee:
  $\sum_t p^{(t)} \cdot c^{(t)} \;\le\; \sum_t c_i^{(t)} + \varepsilon \sum_t \left|c_i^{(t)}\right| + \frac{\ln m}{\varepsilon}$.
Slide 49
Useful fact: [equation lost in extraction]
Slide 50
Packing LPs using multiplicative weights [Plotkin, Shmoys, Tardos (1995)]

Since ORACLE is $\rho$-bounded, all costs satisfy $c_i^{(t)} \in [-1, 1]$, so the MW theorem applies; combining the inequalities above and averaging over $t$ yields $\frac{a_i \cdot \bar{x}}{b_i} \le 1 + \delta$, for every $i$.
Slide 51
Maximum Multicommodity Flow

Using binary search, the problem can essentially be reduced to: is there a feasible multicommodity flow of value $F$?
This is now a packing problem:
  $\sum_{p \in P} f_p = F$; $\quad \sum_{p \ni e} f_p \le c(e)$, for every $e$; $\quad f_p \ge 0$, for every $p$.
ORACLE is given a weight $w_e$ for each edge $e$ and has to find a flow $f$ of value $F$, if there is one, such that
  $\sum_e w_e\, \frac{\sum_{p \ni e} f_p}{c(e)} \;\le\; \sum_e w_e$.
Note: The flow returned by ORACLE does not have to satisfy all the capacity constraints, only one weighted capacity constraint.
Slide 52
Maximum Multicommodity flow

ORACLE is given a weight $w_e$ for each edge and has to find a flow $f$ of value $F$, if there is one, such that
  $\sum_e w_e\, \frac{f_e}{c(e)} \;\le\; \sum_e w_e$.
Find a path $p$ that minimizes $\sum_{e \in p} \frac{w_e}{c(e)}$.
If $F \sum_{e \in p} \frac{w_e}{c(e)} \le \sum_e w_e$, send $F$ units of flow on $p$, i.e., $f_p = F$.
Otherwise, return "no".
ORACLE just needs to solve $k$ shortest paths problems.
Slide 53
Maximum Multicommodity flow

How good is the algorithm obtained using the packing framework?
The number of iterations is $O\!\left(\frac{\rho \ln m}{\delta^2}\right)$.
In each iteration, we solve $k$ shortest paths problems.
We also need to multiply by the cost of the binary search.
ORACLE is $\rho$-bounded iff for every point returned and every edge $e$, $\frac{f_e}{c(e)} \le \rho$.
In our case: $\rho = \max_e \frac{F}{c(e)}$, since ORACLE may route all $F$ units through a single edge.
The running time is therefore proportional to $\rho$, which may be huge.
The running time is not polynomial!