Slide1
Implementing the "Wisdom of the Crowd"
The Internet Economy
With (i) Ilan Kremer and Yishay Mansour, and (ii) Jacob Glazer and Ilan Kremer
Slide2
MOTIVATION
We study Internet (but not only) applications such as crowdfunding, Tripadvisor, Netflix, Waze, Amazon, OKCupid, and many more, that attempt to implement the wisdom of the crowd.
These sites (often called expert sites) collect information from customers while making recommendations to them.
To study these applications, we take a mechanism design approach to two classical economic problems:
- The multi-arm bandit problem (first paper). Related literature: "Optimal Design for Social Learning", Horner and Che.
- Information cascades (second paper). Related literature: "Optimal Voting Schemes with Costly Information Acquisition", Gershkov and Szentes.
Slide3
Model
Agents arrive sequentially:
- Each has a prior on the possible rewards from a set of actions/arms.
- Each makes one choice and gets a reward.
Only the planner observes (part of) the history. The planner is interested in maximizing social welfare and chooses what information to reveal.
Agents are strategic and know the planner's strategy.
Model I: The planner observes the whole history, choices and rewards. When IC constraints are ignored, this is the well-known Multi-Arm Bandit problem.
Model II: The planner observes only the choices made by agents but not their rewards. When the history is fully revealed, this is the model of an Information Cascade (with costly signals).
Slide4
Research Question
By controlling the revelation of information, can the planner induce exploration and prevent an early information cascade?
What is the optimal policy of the planner?
What is the expected loss compared to the first-best outcome?
Slide5
Motivation: Waze, social-media, user-based navigation
Real-time navigation recommendations based on user inputs (cellular and GPS).
Recommendation dilemma:
- Need to try alternate routes to estimate travel times.
- Works well only if it attracts a large number of users.
The site's manager is interested in maximizing social welfare.
Slide6
Motivation
Websites such as TripAdvisor.com and yelp.com (and many others) try to implement the 'wisdom of the crowds'.
They collect information from customers while making recommendations to them by providing a ranking.
How is the ranking done? How should it be done?
The site's manager is interested in maximizing social welfare.
This works well only if the site attracts a large number of users.
Slide7
Motivation
Crowdfunding websites (InvestingZone or CrowdCube), matching sites like OKCupid, and many others are all relevant examples.
In both cases the same conflict arises between the site and the agents.
Your Amazon.com: "We compare your activity on our site with that of other customers, and using this comparison, are able to recommend other items that may interest you. Your recommendations change regularly based on a number of factors, including ….., as well as changes in the interests of other customers like you."
Slide8
Motivation
In an interview with the NYT (Sep. 6, 2014), Mr. Rudder, CEO and cofounder of OkCupid, said:
"We told users something that wasn't true... People come to us because they want the website to work, and we want the website to work."
"Guess what, everybody," he added, "if you use the Internet, you're the subject of hundreds of experiments at any given time, on every site."
We are interested in how much "manipulation" (experimentation) can be exercised when agents are strategic.
Slide9
Multi-Arms Model (simplest possible example)
Two actions: a1 and a2.
N risk-neutral agents.
Each action has a fixed unknown reward, R1 and R2 (random variables).
There is a prior over the rewards, with E[R1] > E[R2] = μ2.
The planner observes choices and rewards, and provides agent n with a message mn: some information about the past.
Slide10
Example
Action 1 has prior Uniform on [-1, +2]; Action 2 has prior Uniform on [-2, +2].
No information: all agents prefer action 1, the a priori better action, so there is no exploration.
Full information: assume the first agent observes a value of zero or above. Then no other agent has an incentive to explore action 2.
[Figure: supports of Action 1 on [-1, +2] and Action 2 on [-2, +2], with 0 marked]
Can we do better?
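To make the full-information logic concrete, here is a minimal simulation sketch of this example. The function name and agent loop are ours; the only modeling content taken from the slide is that each arm's reward is drawn once, E[R2] = 0, and under full disclosure agent 2 tries arm 2 only when the observed r1 falls below it.

```python
import random

def simulate_full_info(n_agents=5, seed=0):
    """Simulate the two-action example under full information.

    Action 1's reward is drawn from U[-1, 2], action 2's from U[-2, 2].
    Every agent sees the whole history; an agent tries arm 2 only when
    that is optimal given what has been revealed so far.
    """
    rng = random.Random(seed)
    r1 = rng.uniform(-1.0, 2.0)   # realized (persistent) reward of action 1
    r2 = rng.uniform(-2.0, 2.0)   # realized reward of action 2
    choices = [1]                  # agent 1 takes the a priori better arm
    explored2 = False
    for _ in range(2, n_agents + 1):
        if not explored2:
            # E[R2] = 0, so the next agent tries arm 2 only when r1 < 0
            if r1 < 0.0:
                choices.append(2)
                explored2 = True
            else:
                choices.append(1)
        else:
            choices.append(2 if r2 > r1 else 1)   # both arms known: pick better
    return r1, choices
```

Whenever the first draw of r1 is nonnegative, the run shows the slide's point: no agent ever explores action 2.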
Slide11
Impossibility Example
Action 1 has prior Unif[3/4, 5/4]; Action 2 has prior Unif[-2, +2].
E[R2] = 0 < R1 with probability one.
Agent n knows that all prior agents preferred action 1; hence he too prefers action 1. The planner has no influence.
[Figure: support of Action 2 on [-2, +2]; Action 1's support lies entirely above μ2 = 0]
Required assumption: Pr[R1 < μ2] > 0.
Slide12
Basic properties of the optimal mechanism
A mechanism is a sequence of functions {Mt}tєN where Mt : Ht-1 → M.
It is sufficient to consider recommendation policies that are IC (Myerson 1986): {Пt}tєN where Пt : Ht-1 → {1, 2}.
Two natural IC constraints:
E[R2 − R1 | recommend(2)] ≥ 0
E[R1 − R2 | recommend(1)] ≥ 0
It is sufficient to consider only action 2: a mechanism that is IC for action 2 is automatically IC for action 1.
Slide13
The optimal policy is a partition policy
- Recommend action 1 to the first agent (the only IC recommendation).
- If both actions have been sampled, recommend the better one.
- A mapping from values of r1 to the agent that explores.
Conclusion: it suffices to consider partition policies.
[Figure: the r1 line partitioned into intervals, each labeled with the exploring agent (agents 3, 4, 5, ...) or "no exploration"]
Slide14
The optimal policy is a threshold policy
Agent 1: recommend action 1. The planner observes the reward r1.
Agent 2: explores for all values of r1 below E[R2], and for some values above it, up to a threshold.
Agent t > 2:
- If both actions have been sampled: recommend the better action.
- Otherwise: if r1 < θt then recommend action 2, otherwise action 1.
Intuition: there is an inherent tradeoff between the two potential reasons for being recommended action 2.
The IC constraints are tight.
[Figure: the r1 line partitioned into intervals explored by agents 2, 3, 4, 5, with no exploration for the highest values, and μ2 marked]
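The recommendation rule above can be sketched as a small function. This is an illustration of the structure only: the threshold values passed in are hypothetical placeholders, not the optimal θt derived in the talk.

```python
def recommend(t, r1, r2, thresholds):
    """Threshold-policy recommendation for agent t (a sketch).

    `thresholds` maps an agent index t to an illustrative theta_t.
    `r2` is None until some earlier agent has explored action 2.
    """
    if t == 1:
        return 1                        # the only IC recommendation
    if r2 is not None:                  # both actions already sampled
        return 2 if r2 > r1 else 1      # recommend the better action
    theta_t = thresholds.get(t, float("-inf"))
    return 2 if r1 < theta_t else 1     # explore action 2 below the threshold
```

For example, with a hypothetical threshold θ3 = 1.0, agent 3 is sent to explore action 2 exactly when the observed r1 is below 1.0.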
Slide15
OPTIMALITY
Recall the basic IC constraint.
What is NOT a threshold policy: a policy in which an exploitation term (> 0) and an exploration term (< 0) are assigned out of order along the r1 line.
Proper swap: exchange the interval b2 (around value B2, assigned to agent t1, e.g. agent 5) with the interval b1 (around value B1, assigned to agent t2, e.g. agent 10). Since B2 < B1, Pr[b2] > Pr[b1].
[Figure: the r1 line with intervals b2 and b1 and the agents t1 (5) and t2 (10) that explore them]
Slide16
Information Cascade Model
The planner observes only choices, not outcomes.
Agents: risk-neutral, arrive sequentially. The arrival order is known; agents do not observe the history.
Each agent is asked to choose an action and then gets a reward.
Before making a choice, an agent can obtain an informative signal at a cost c > 0.
Two actions, A and B. One action is "good" and yields a payoff of one, while the other is "bad" and yields a payoff of zero.
Slide17
There exists a planner who observes (only) the chosen actions (A or B) taken by all agents. For every t, the planner decides what message to send the agent.
The planner's objective is to maximize the discounted present value of all agents' payoffs.
Let pt : Ht-1 → [0, 1] denote the planner's posterior after t−1 observations.
Let μt : {M} → [0, 1] denote agent t's posterior.
Slide18
Information structure and the belief's law of motion
If A is the good action, the signal takes the value sa with probability 1.
If B is the good action, the signal takes the value sb with probability q, and sa with probability 1−q. Note that sb is fully revealing.
[Figure: the posterior Prob(A) on [0, 1], starting at p0; each sa signal moves it up, while a single sb signal moves it to 0]
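The law of motion above is a one-line Bayes update. A sketch, with the string labels "sa"/"sb" standing in for the slide's signals:

```python
def update_posterior(p, signal, q):
    """Bayesian update of Prob(A is the good action).

    Signals follow the slide's structure: if A is good the signal is
    always 'sa'; if B is good it is 'sb' with probability q (fully
    revealing) and 'sa' with probability 1 - q.
    """
    if signal == "sb":
        return 0.0                     # sb can occur only when B is good
    # signal == "sa": likelihood 1 under A, (1 - q) under B
    return p / (p + (1 - p) * (1 - q))
```

Starting from p0 = 0.5 with q = 0.5, one sa observation lifts the posterior to 0.5 / (0.5 + 0.25) = 2/3, and repeated sa observations drive it monotonically toward 1, matching the figure.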
Slide19
Preliminaries
Agent t's utility-maximizing decision is given by two thresholds 0 < μb < μa < 1 on his posterior: choose b below μb, obtain a signal (e) in between, and choose a above μa.
The planner's first-best decision is likewise given by thresholds pb < pa.
Commitment to full revelation → too little exploration.
Commitment to no revelation → too much exploration.
[Figure: the agent's thresholds μb, μa and the planner's thresholds pb, pa on the [0, 1] line of posteriors, each dividing it into regions b, e, a]
Slide20
Basic properties of the optimal mechanism
The optimal mechanism is:
(i) a recommendation mechanism, where Mt : Ht-1 → {a, b, e};
(ii) a public mechanism;
(iii) a mechanism with three phases.
Phase one: as long as there is no conflict between the planner and agent t (i.e., pt є [μb, μa]), full revelation is exercised.
[Figure: the posterior path over t = 1, 2, 3, ..., starting at p0 and climbing toward μa after successive sa signals, or jumping to 0 after an sb signal]
Slide21
Phase two: if the first agents obtained the signal sa, then for all subsequent t ≤ t*, mt = e and μt = μa. This is achieved by committing to recommend e even after the planner has learned that the good action is B.
Phase three: for all t ≥ t*, the planner recommends B if pt = 0, and otherwise A. Note that pt* is either zero or less than pa.
[Figure: the posterior path over t = 1, 2, 3, ..., t*, with μt held at μa through phase two]
Main idea of the proof: the second best is like the first best with an increasing cost, namely the extra cost of keeping μt = μa.
Thank You !
Slide25
Example
Assume R1 ~ U[-1, 5], R2 ~ U[-5, 5], and N large (so it is optimal to test both alternatives).
Full transparency: agent 2 chooses the second alternative only if r1 ≤ 0; otherwise all agents choose the first alternative.
The outcome is suboptimal for large N.
[Figure: supports of R1 on [-1, 5] and R2 on [-5, 5], with 0 marked]
Slide26
The planner recommends the 2nd alternative to agent 2 whenever R1 ≤ 1. This is IC because E[R1 | recommend(2)] = 0 = E[R2].
This outcome is more efficient than the one under full transparency, but we can do even better.
[Figure: R1's support [-1, 5] with the threshold at 1 and 0 marked]
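The IC claim is easy to verify: with R1 ~ U[-1, 5], conditioning on R1 ≤ 1 leaves a symmetric U[-1, 1] distribution with mean 0 = E[R2]. A minimal Monte Carlo check (function name and sample size are ours):

```python
import random

def conditional_mean_r1(threshold=1.0, n=200_000, seed=42):
    """Monte Carlo estimate of E[R1 | R1 <= threshold] for R1 ~ U[-1, 5].

    At threshold = 1, the conditional law is U[-1, 1], whose mean is 0,
    so agent 2 is exactly indifferent when told to take action 2.
    """
    rng = random.Random(seed)
    draws = [rng.uniform(-1.0, 5.0) for _ in range(n)]
    kept = [x for x in draws if x <= threshold]
    return sum(kept) / len(kept)
```

The estimate should sit near 0 up to sampling noise, confirming that the recommendation leaves agent 2 indifferent rather than strictly worse off.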
Slide27
The planner recommends the 2nd action to the third agent if one of two cases occurs:
- The second agent tested the 2nd action (R1 ≤ 1) and the planner learned that R2 > R1.
- 1 < R1 ≤ 1 + x, so the third agent is the first to test the 2nd action.
Agent n > 4 never explores, regardless of N. So at most 3 agents choose the wrong action.
[Figure: the R1 line from -1 to 5 with 0 = E[R2] marked and the intervals I2, I3 (up to 1 + x = 3.23), and I4]
Slide28
IC Analysis
Agent t1: unchanged. We added b2 and subtracted b1; a proper swap implies an equal effect.
Agents other than t1 and t2:
- Before t1 and after t2: unchanged.
- Between t1 and t2: they gain (Pr[b2] − Pr[b1]) · max{r1, r2}.
Hence IC holds.
Slide29
Multi-Arm Bandit
A simple, one-player decision model.
Multiple independent (costly) actions.
Uncertainty regarding the rewards.
Tradeoff between exploration and exploitation (Gittins index).
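For readers unfamiliar with the classical problem, here is the exploration/exploitation tradeoff in its simplest textbook form, an epsilon-greedy heuristic. This is a standard illustration only, not the Gittins-index solution and not the talk's mechanism (where each pull is made by a strategic agent rather than a single decision maker):

```python
import random

def epsilon_greedy(true_means, n_rounds=1000, eps=0.1, seed=0):
    """Epsilon-greedy play of a multi-arm bandit.

    With probability eps explore a uniformly random arm; otherwise
    exploit the arm with the best empirical mean so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    for _ in range(n_rounds):
        if 0 in counts or rng.random() < eps:
            arm = rng.randrange(k)                              # explore
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])  # exploit
        reward = true_means[arm] + rng.uniform(-0.4, 0.4)       # noisy reward
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

With two arms of means 0 and 1, the heuristic quickly concentrates its pulls on the better arm while still paying a small steady exploration cost, the same tension the planner must resolve through recommendations when pulls are made by strategic agents.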
Slide30
Reflecting on Reality
Report-card systems: health care, education, ...
Public disclosure of information: patients' health, students' scores, ...
Pro:
- Incentives to improve quality.
- Information to users.
Cons:
- Incentives to "game" the system and avoid problematic cases.
We suggest a different point of view.
Slide32
Recommendation Policy
For agent n, the policy gives a recommendation xn ϵ {a1, a2}.
The recommendation is IC if E[Rj − Ri | xn = aj] ≥ 0.
Note that IC implies: recommend action a1 to agent 1.
Claim: the optimal policy is a recommendation policy.
Proof (Myerson (1986)):
- Let M(j,n) be the set of messages that cause agent n to select action aj, and H(j,n) the corresponding histories.
- E[Rj − Ri | m] ≥ 0 for m ϵ M(j,n).
- Consider instead the recommendation aj after H(j,n): still IC, identical outcomes.
Slide33
Partition Policy
A partition policy is a recommendation policy such that:
- Agent 1: recommend action a1 and observe r1.
- If r1 is in In, n ≤ N: agent n is the one to explore a2. Any agent n' > n uses the better of the two actions, with payoff max{r1, r2}.
- If r1 is in IN+1: no agent explores a2.
- The subsets In are disjoint.
Claim: the optimal policy is a partition policy.
Recommending the better action when both are known optimizes the sum of payoffs and strengthens the IC constraints.
[Figure: intervals of r1 assigned to agent 2, agent 3, ...]
Slide34
Only the worse action (a2) is "important"
Lemma: any policy that is IC w.r.t. a2 is IC w.r.t. a1.
Proof: let Kn denote the set of histories that cause xn = a2.
- Since the policy is IC, E[R2 − R1 | h ϵ Kn] ≥ 0.
- Originally, E[R2 − R1] < 0.
- Therefore, since the unconditional mean is a convex combination of the two conditional means, E[R2 − R1 | not in Kn] < 0.
Slide35
Optimality → tight IC constraints
Lemma: if agent n+1 explores (Pr[In+1] > 0), then agent n has a tight IC constraint.
Proof: move exploration from agent n+1 to agent n on a small set r1 ϵ Ѵ ⊆ In+1.
- This improves the sum of payoffs: for r1 ϵ Ѵ, it replaces r1 + R2 by R2 + max{r1, r2}.
- It keeps the IC for agent n (since it was not tight) and for agent n+1 (it removes exploration).
[Figure: the r1 line with the intervals In and In+1, and the subinterval Ѵ inside In+1]
Slide36
Information Cascades
Bikhchandani, Hirshleifer, and Welch (1992); Banerjee (1992).
Agents ignore (or do not acquire) their own signals.
The same exercise is conducted, but now the planner observes only actions, and private signals are costly (Netflix).
Slide37
The Story of Coventry and Turing
In November 1940, Prime Minister Winston Churchill knew several days in advance that the Germans would attack Coventry but deliberately held back the information. His intelligence came from the scientists at Bletchley Park, who, in utmost secrecy, had cracked the Enigma code the Germans used for their military communications. Warning the city of Coventry and its residents of the imminent threat would have alerted the Germans to the fact that their codes had been cracked.
Churchill considered it worth the sacrifice of a whole city and its people to protect his back-door route into Berlin's secrets.
(The Imitation Game)
Slide38
How good is the optimal policy?
The expected loss due to IC is bounded (independent of N), by bounding the number of exploring agents.
Slide39
Proof
Consider the 'exploitation' term for agent n > 2. It is an increasing sequence, as for higher n the planner becomes better informed. Hence it is bounded from below by the 'exploitation' term of agent 3, which in turn is bounded below by α.
The sum of the 'exploration' terms is bounded by a constant (independent of N).
Extensions
Slide41
Introducing money transfers
Basically the same policy: the planner invests all the money in agent 2, getting more exploration as early as possible; otherwise, the same construction.
When money costs money: the planner will subsidize some exploration by agent 2; other agents are as before.
Slide42
Relaxing agents' knowledge
So far, agents knew their exact place in the arrival order.
Relaxation: agents are divided into blocks (early users, medium, late users).
Essentially the same property holds: in each block only the first agent explores.
Blocks can only increase social welfare; the bigger the blocks, the closer to first best.
Slide43
Optimal policy: performance
If action 1 is better: only one agent explores action 2.
If action 2 is better: only a finite number of agents explore action 1. This number is bounded, and the bound is independent of N.
Conclusion: the aggregate loss compared to first best is bounded.
Slide44
Now to some proofs…
Slide45
Basic IC constraints
For a recommendation policy with sets In, the IC constraint for agent n decomposes into a positive (exploitation) term and a negative (exploration) term.
[Figure: the r1 line with intervals In-1, In, In+1 around E[R2]]
Slide46
Threshold policy
A threshold policy is a partition policy such that In = (in-1, in], with I2 = (-∞, i2) and IN+1 = (iN, ∞).
Main characterization: the optimal policy is a threshold policy.
[Figure: the r1 line partitioned into intervals explored by agents 2, 3, 4, 5, and a "no exploration" region]
Slide48
Motivation: The New Internet Economy
Crowdfunding sites collect information from investors by monitoring their choices, and use this information in making recommendations to future investors.
Also websites such as Netflix, Amazon, OKCupid, Tripadvisor, and many others.
Regardless of what the planner/site observes, in both cases the same conflict arises between the site and the agents.