Presentation Transcript

Slide1

Basics of information theory and information complexity

June 1, 2013

Mark Braverman, Princeton University

a tutorial

Slide2

Part I: Information theory

Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.

[Diagram: Alice → communication channel → Bob]

Slide3

Quantifying “information”

Information is measured in bits. The basic notion is Shannon's entropy.

The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable. For a discrete variable $X$:

$H(X) = \sum_{x} \Pr[X=x] \cdot \log_2 \frac{1}{\Pr[X=x]}$
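As a quick sanity check of this definition, here is a minimal Python sketch (the helper name `entropy` and the toy distributions are ours, not part of the slides):

```python
import math

def entropy(dist):
    """Shannon entropy H(X) = sum_x Pr[X=x] * log2(1/Pr[X=x]), in bits."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# A fair coin carries 1 bit of uncertainty, a uniform trit log2(3) ~ 1.585 bits,
# and a constant carries none.
print(entropy({"H": 0.5, "T": 0.5}))        # 1.0
print(entropy({1: 1/3, 2: 1/3, 3: 1/3}))    # ~1.585
print(entropy({"c": 1.0}))                  # 0.0
```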

Slide4

Shannon's entropy

Important examples and properties:

If $X$ is a constant, then $H(X) = 0$.

If $X$ is uniform on a finite set $S$ of possible values, then $H(X) = \log_2 |S|$.

If $X$ is supported on at most $k$ values, then $H(X) \le \log_2 k$.

If $Y$ is a random variable determined by $X$, then $H(Y) \le H(X)$.

Slide5

Conditional entropy

For two (potentially correlated) variables $X, Y$, the conditional entropy of $Y$ given $X$ is the amount of uncertainty left in $Y$ given $X$:

$H(Y \mid X) := \mathbb{E}_{x \sim X}\, H(Y \mid X = x)$.

One can show: $H(XY) = H(X) + H(Y \mid X)$.

This important fact is known as the chain rule.

If $X \perp Y$, then $H(XY) = H(X) + H(Y \mid X) = H(X) + H(Y)$.

Slide6

Example

Where … . Then … ; … ; …

Slide7

Mutual information

[Figure: mutual information]

Slide8

Mutual information

The mutual information is defined as $I(X;Y) := H(Y) - H(Y \mid X) = H(X) - H(X \mid Y)$.

"By how much does knowing $X$ reduce the entropy of $Y$?"

Always non-negative: $I(X;Y) \ge 0$.

Conditional mutual information: $I(X;Y \mid Z) := H(Y \mid Z) - H(Y \mid XZ)$.

Chain rule for mutual information: $I(XY;Z) = I(X;Z) + I(Y;Z \mid X)$.

Simple intuitive interpretation.

8Slide9
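The following small Python sketch (the joint distribution is a toy example of ours, not from the slides) checks the chain rule $H(XY) = H(X) + H(Y \mid X)$ and computes $I(X;Y) = H(Y) - H(Y \mid X)$ numerically:

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# Toy joint distribution over (X, Y): X is a fair bit, Y = X with probability 3/4.
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}

def marginal(joint, axis):
    m = defaultdict(float)
    for (x, y), p in joint.items():
        m[(x, y)[axis]] += p
    return dict(m)

def cond_entropy(joint):                      # H(Y|X) = E_x H(Y | X=x)
    px_all = marginal(joint, 0)
    out = 0.0
    for x, px in px_all.items():
        cond = {y: joint.get((x, y), 0.0) / px for y in marginal(joint, 1)}
        out += px * H(cond)
    return out

HX, HY, HXY = H(marginal(joint, 0)), H(marginal(joint, 1)), H(joint)
HY_given_X = cond_entropy(joint)
print(HXY, HX + HY_given_X)        # chain rule: H(XY) = H(X) + H(Y|X)
print(HY - HY_given_X)             # mutual information I(X;Y) = H(Y) - H(Y|X)
```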

Slide9

Example – a biased coin

A coin with a $\pm\varepsilon$ Heads-or-Tails bias is tossed several times.

Let $B$ be the bias, and suppose that a priori both options are equally likely: $\Pr[B = \tfrac12 + \varepsilon] = \Pr[B = \tfrac12 - \varepsilon] = \tfrac12$.

How many tosses are needed to find $B$?

Let $C_1, C_2, \ldots$ be a sequence of tosses.

Start with $H(B) = 1$.

Slide10

What do we learn about $B$?

$I(C_1; B) = H(C_1) - H(C_1 \mid B) = 1 - H\!\left(\tfrac12 + \varepsilon\right) = \Theta(\varepsilon^2)$.

Similarly, $I(C_1 \ldots C_k; B) \le k \cdot O(\varepsilon^2)$.

To determine $B$ with constant accuracy, we need $I(C_1 \ldots C_k; B) = \Omega(1)$, i.e. $k = \Omega(1/\varepsilon^2)$ tosses.
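A quick numerical check of this calculation (the binary-entropy helper `H2` is ours): a single toss carries $\Theta(\varepsilon^2)$ bits of information about the bias.

```python
import math

def H2(p):                       # binary entropy, in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# One toss: I(C1; B) = H(C1) - H(C1 | B) = 1 - H2(1/2 + eps).
for eps in [0.1, 0.03, 0.01]:
    info = 1 - H2(0.5 + eps)
    print(eps, info, info / eps**2)   # info ~ Theta(eps^2); ratio approaches 2/ln 2 ~ 2.885
```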

 

Slide11

Kullback–Leibler (KL) divergence

A (non-symmetric) measure of distance between distributions on the same space. Plays a key role in information theory.

$D(P \| Q) := \sum_x P(x) \log_2 \frac{P(x)}{Q(x)} \ge 0$, with equality when $P = Q$.

Caution: $D(P \| Q) \ne D(Q \| P)$ in general!
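A minimal sketch of the definition (the distributions `p` and `q` are our toy examples), illustrating both non-negativity and the asymmetry warned about above:

```python
import math

def KL(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x)/q(x)); requires q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
print(KL(p, q), KL(q, p))   # both positive and different: D(p||q) != D(q||p)
print(KL(p, p))             # 0 when the distributions coincide
```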

 

Slide12

Properties of KL-divergence

Connection to mutual information: $I(X;Y) = \mathbb{E}_{y \sim Y}\, D\!\left(X|_{Y=y} \,\big\|\, X\right)$.

If $X \perp Y$, then $X|_{Y=y} = X$, and both sides are $0$.

Pinsker's inequality: $D(P \| Q) \ge \Omega\!\left(\|P - Q\|_1^2\right)$.

Tight! E.g. $D\!\left(B_{\frac12+\varepsilon} \,\big\|\, B_{\frac12}\right) = \Theta(\varepsilon^2)$.

Slide13

Back to the coin example

$I(B; C_1 \ldots C_k) \le k \cdot O(\varepsilon^2)$, so $k = \Omega(1/\varepsilon^2)$ tosses are needed.

"Follow the information learned from the coin tosses."

Can be done using combinatorics, but the information-theoretic language is more natural for expressing what's going on.

Slide14

Back to communication

The reason information theory is so important for communication is that information-theoretic quantities readily operationalize.

Can attach operational meaning to Shannon's entropy: $H(X) \approx$ "the cost of transmitting $X$".

Let $C(X)$ be the (expected) cost of transmitting a sample of $X$.

Slide15

$C(X) = H(X)$?

Not quite.

Let $X$ be a trit: uniform on $\{1, 2, 3\}$. Then $H(X) = \log_2 3 \approx 1.585$, while encoding $1 \to 0$, $2 \to 10$, $3 \to 11$ gives $C(X) = \frac{1+2+2}{3} = \frac{5}{3} \approx 1.67$.

It is always the case that $C(X) \ge H(X)$.

Slide16

But $C(X)$ and $H(X)$ are close

Huffman's coding: $C(X) \le H(X) + 1$.

This is a compression result: "an uninformative message turned into a short one".

Therefore: $H(X) \le C(X) \le H(X) + 1$.
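A small Huffman-coding sketch illustrating the trit example above (the function `huffman_lengths` is ours): it recovers $C(X) = 5/3$ and checks $H(X) \le C(X) \le H(X) + 1$.

```python
import heapq, math

def huffman_lengths(dist):
    """Codeword lengths of a Huffman code for dist = {symbol: probability}."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in dist}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, i, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                  # merged symbols gain one more bit
        heapq.heappush(heap, (p1 + p2, i, s1 + s2))
    return lengths

trit = {1: 1/3, 2: 1/3, 3: 1/3}
L = huffman_lengths(trit)                    # one symbol gets 1 bit, two get 2 bits
C = sum(trit[s] * L[s] for s in trit)        # expected cost = 5/3
H = sum(p * math.log2(1 / p) for p in trit.values())
print(H, C, H + 1)                           # H(X) <= C(X) <= H(X) + 1
```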

 

Slide17

Shannon's noiseless coding

The cost of communicating many copies of $X$ scales as $H(X)$.

Shannon's source coding theorem: Let $C(X^n)$ be the cost of transmitting $n$ independent copies of $X$. Then the amortized transmission cost is $\lim_{n \to \infty} \frac{C(X^n)}{n} = H(X)$.

This equation gives $H(X)$ operational meaning.

Slide18

[Diagram: transmitting $X_1, X_2, \ldots, X_n$ over the communication channel at $\approx H(X)$ bits per copy]

Slide19

$H$ is nicer than $C$

$H(X)$ is additive for independent variables: $H(X_1 X_2 \ldots X_n) = H(X_1) + \cdots + H(X_n)$.

Let $T_1, \ldots, T_n$ be independent trits: $H(T_1 \ldots T_n) = n \log_2 3$.

Works well with concepts such as channel capacity.

Slide20

"Proof" of Shannon's noiseless coding

$n \cdot H(X) = H(X^n) \le C(X^n) \le H(X^n) + 1 = n \cdot H(X) + 1$

(the equalities are additivity of entropy; the upper bound is compression (Huffman)).

Therefore $\frac{C(X^n)}{n} \to H(X)$.

Slide21

Operationalizing other quantities

Conditional entropy $H(X \mid Y)$ (cf. Slepian–Wolf theorem):

[Diagram: Bob already knows $Y$; transmitting Alice's $X$'s over the communication channel takes $\approx H(X \mid Y)$ bits per copy]

Slide22

Operationalizing other quantities

Mutual information $I(X;Y)$:

[Diagram: $\approx I(X;Y)$ bits per copy over the communication channel to sample correlated $Y$'s]

Slide23

Information theory and entropy

Allows us to formalize intuitive notions.

Operationalized in the context of one-way transmission and related problems.

Has nice properties (additivity, chain rule, …).

Next, we discuss extensions to more interesting communication scenarios.

Slide24

Communication complexity

Focus on the two-party randomized setting.

A & B implement a functionality $F(X,Y)$, e.g. $F(X,Y) = $ "$X=Y$?".

[Diagram: A holds $X$, B holds $Y$; shared randomness $R$; output $F(X,Y)$]

Slide25

Communication complexity

Goal: implement a functionality $F(X,Y)$. A protocol $\pi$ computing $F(X,Y)$:

[Diagram: A holds $X$, B holds $Y$, with shared randomness $R$; messages $m_1(X,R)$, $m_2(Y,m_1,R)$, $m_3(X,m_1,m_2,R)$, …; output $F(X,Y)$]

Communication cost = # of bits exchanged.
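To make the cost accounting concrete, here is a toy sketch of a two-message protocol in this framework (the function `eq_protocol` and its naive strategy are ours, purely illustrative): $m_1(X,R)$ is all of $X$, and $m_2(Y,m_1,R)$ is the answer bit.

```python
import random

def eq_protocol(x, y, n, seed=0):
    """Naive protocol for F(X,Y) = 'X = Y?': m1(X,R) is all of X (n bits),
    m2(Y,m1,R) is the 1-bit answer.  Communication cost = n + 1 bits."""
    R = random.Random(seed)          # shared randomness (unused in this naive protocol)
    m1 = x                           # Alice speaks: n bits
    m2 = int(m1 == y)                # Bob speaks: 1 bit, which is F(X,Y)
    cost = n + 1
    return m2, cost

answer, cost = eq_protocol("0110", "0110", n=4)
print(answer, cost)                  # 1 (equal), 5 bits exchanged
```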

Slide26

Communication complexity

Numerous applications and potential applications (some will be discussed later today).

Lower bounds are considerably more difficult to obtain than for transmission (though still much easier than in other models of computation!).

Slide27

Communication complexity

(Distributional) communication complexity with input distribution $\mu$ and error $\varepsilon$: $CC(F, \mu, \varepsilon)$, the cost of the cheapest protocol computing $F$ with error at most $\varepsilon$ w.r.t. $\mu$.

(Randomized/worst-case) communication complexity: $CC(F, \varepsilon)$, with error at most $\varepsilon$ on all inputs.

Yao's minimax: $CC(F, \varepsilon) = \max_\mu CC(F, \mu, \varepsilon)$.

Slide28

Examples

Equality: $EQ(X,Y) := 1_{X=Y}$.

With shared randomness, $CC(EQ, \varepsilon) = O(\log \tfrac{1}{\varepsilon})$.

Slide29

Equality

$F$ is "$X=Y$?".

$\mu$ is a distribution where w.p. $\frac12$, $X=Y$, and w.p. $\frac12$, $(X,Y)$ are random.

[Diagram: A holds $X$, B holds $Y$; A sends MD5(X) [128 bits]; B replies "X=Y?" [1 bit]]

Shows that $CC(EQ, \mu, \varepsilon) \le 128 + 1 = 129$.

Error?
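A sketch of the protocol pictured above, using Python's `hashlib.md5` for the 128-bit hash (the function name is ours): the only error is a false "equal" answer when $X \ne Y$ but the hashes collide.

```python
import hashlib

def eq_hash_protocol(x, y):
    """Alice sends MD5(X) (128 bits); Bob replies with the 1-bit verdict 'X = Y?'.
    Cost: 128 + 1 bits.  The only error is a false 'equal' when X != Y but
    MD5(X) = MD5(Y), i.e. a hash collision."""
    alice_msg = hashlib.md5(x.encode()).digest()                       # 128 bits from Alice
    bob_verdict = int(alice_msg == hashlib.md5(y.encode()).digest())   # 1 bit back
    return bob_verdict

print(eq_hash_protocol("cat", "cat"))   # 1
print(eq_hash_protocol("cat", "dog"))   # 0, unless the hashes collide
```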

Slide30

Examples

I…

In fact, using information complexity: …

Slide31

Information complexity

Information complexity :: communication complexity  as  Shannon's entropy :: transmission cost

Slide32

Information complexity

The smallest amount of information Alice and Bob need to exchange to solve $F$.

How is information measured?

Communication cost of a protocol? Number of bits exchanged.

Information cost of a protocol? Amount of information revealed.

Slide33

Basic definition 1: The information cost of a protocol

Prior distribution: $(X, Y) \sim \mu$.

[Diagram: A holds $X$, B holds $Y$; they run protocol $\pi$, producing the protocol transcript $\Pi$]

$IC_\mu(\pi) := I(\Pi; Y \mid X) + I(\Pi; X \mid Y)$

= what Alice learns about $Y$ + what Bob learns about $X$.

Slide34

Example

$F$ is "$X=Y$?".

$\mu$ is a distribution where w.p. $\frac12$, $X=Y$, and w.p. $\frac12$, $(X,Y)$ are random.

[Diagram: A sends MD5(X) [128 bits]; B replies "X=Y?" [1 bit]]

$IC_\mu(\pi) \approx 1 + 64.5 = 65.5$ bits

= what Alice learns about $Y$ + what Bob learns about $X$.

(Alice learns $\approx 1$ bit: whether $X=Y$. Bob learns $\approx 64.5$ bits about $X$: whether $X=Y$ ($\approx 1$ bit), plus the remaining $\approx 127$ hash bits in the w.p.-$\frac12$ case that $X \ne Y$: $1 + \frac12 \cdot 127 = 64.5$.)

Slide35

Prior $\mu$ matters a lot for information cost!

If $\mu$ is a singleton, $\mu = 1_{\{(x,y)\}}$, then $IC_\mu(\pi) = 0$ for every protocol $\pi$.

Slide36

Example

$F$ is "$X=Y$?".

$\mu$ is a distribution where $(X,Y)$ are just uniformly random.

[Diagram: A sends MD5(X) [128 bits]; B replies "X=Y?" [1 bit]]

$IC_\mu(\pi) \approx 0 + 128 = 128$ bits

= what Alice learns about $Y$ + what Bob learns about $X$.

Slide37

Basic definition 2: Information complexity

Communication complexity: $CC(F, \mu, \varepsilon) := \min_{\pi \text{ solves } (F,\mu,\varepsilon)} CC(\pi)$.

Analogously: $IC(F, \mu, \varepsilon) := \inf_{\pi \text{ solves } (F,\mu,\varepsilon)} IC_\mu(\pi)$.

(The infimum, rather than a minimum, is needed here!)

Slide38

Prior-free information complexity

Using minimax we can get rid of the prior.

For communication, we had: $CC(F, \varepsilon) = \max_\mu CC(F, \mu, \varepsilon)$.

For information: $IC(F, \varepsilon) := \max_\mu IC(F, \mu, \varepsilon)$.

Slide39

Ex: The information complexity of Equality

What is $IC(EQ, 0)$?

Consider the following protocol.

A holds $X \in \{0,1\}^n$; B holds $Y \in \{0,1\}^n$. Using public randomness, pick a non-singular matrix $A$ (over $GF(2)$) with rows $A_1, A_2, \ldots$ In round $i$, Alice sends $A_i \cdot X$ and Bob replies with $A_i \cdot Y$.

Continue for $n$ steps, or until a disagreement is discovered.
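A sketch of this protocol in Python. One simplification is ours: we use independent random vectors $A_i$ rather than the rows of a single non-singular matrix, so the "equal" verdict after $n$ agreeing rounds is wrong with probability $2^{-n}$ instead of never.

```python
import random

def eq_low_information(x, y, seed=0):
    """Rounds i = 1, 2, ...: pick a random vector A_i with public randomness;
    Alice sends A_i . X (mod 2), Bob replies with A_i . Y (mod 2).
    Stop at the first disagreement, or after n rounds.  Returns (verdict, rounds)."""
    n = len(x)
    rng = random.Random(seed)                          # shared randomness
    for i in range(1, n + 1):
        a = [rng.randint(0, 1) for _ in range(n)]      # A_i
        ax = sum(ai * xi for ai, xi in zip(a, x)) % 2  # Alice's bit: A_i . X
        ay = sum(ai * yi for ai, yi in zip(a, y)) % 2  # Bob's bit:   A_i . Y
        if ax != ay:
            return 0, i       # X != Y: each round catches a difference w.p. 1/2
    return 1, n               # all n checks agreed: declare X = Y

print(eq_low_information([0, 1, 1, 0], [0, 1, 1, 0]))  # (1, 4)
print(eq_low_information([0, 1, 1, 0], [0, 1, 0, 0]))  # (0, i) for a small i, w.h.p.
```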

Slide40

Analysis (sketch)

If $X \ne Y$, the protocol will terminate in $O(1)$ rounds on average, and thus reveal $O(1)$ information.

If $X = Y$… the players only learn the fact that $X = Y$ ($\le 1$ bit of information).

Thus the protocol has $O(1)$ information complexity for any prior $\mu$.

Slide41

Operationalizing IC: Information equals amortized communication

Recall [Shannon]: $\lim_{n\to\infty} \frac{C(X^n)}{n} = H(X)$.

Turns out: $\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} = IC(F, \mu, \varepsilon)$, for $\varepsilon > 0$. [Error $\varepsilon$ allowed on each copy.]

For $\varepsilon = 0$: $\lim_{n\to\infty} \frac{CC(F^n, \mu^n, 0)}{n} = \,?$

[What this limit equals is an interesting open problem.]

Slide42

Information = amortized communication

$\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} = IC(F, \mu, \varepsilon)$.

Two directions: "$\le$" and "$\ge$" (the analogues of compression (Huffman) and additivity of entropy in Shannon's proof).

Slide43

The "$\le$" direction

$\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} \le IC(F, \mu, \varepsilon)$.

Start with a protocol $\pi$ solving $F$, whose $IC_\mu(\pi)$ is close to $IC(F, \mu, \varepsilon)$.

Show how to compress many copies of $\pi$ into a protocol whose communication cost is close to its information cost.

More on compression later.

Slide44

The "$\ge$" direction

$\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} \ge IC(F, \mu, \varepsilon)$.

Use the fact that $CC(F^n, \mu^n, \varepsilon) \ge IC(F^n, \mu^n, \varepsilon)$.

Additivity of information complexity: $IC(F^n, \mu^n, \varepsilon) = n \cdot IC(F, \mu, \varepsilon)$.

Slide45

Proof: Additivity of information complexity

Let $T_1$ and $T_2$ be two two-party tasks. E.g. "Solve $F$ with error $\le \varepsilon$ w.r.t. $\mu$".

Then $IC(T_1 \times T_2) = IC(T_1) + IC(T_2)$.

"$\le$" is easy.

"$\ge$" is the interesting direction.

Slide46

 

Start from a protocol $\pi$ for $T_1 \times T_2$ with prior $\mu_1 \times \mu_2$, whose information cost is $I$.

Show how to construct two protocols: $\pi_1$ for $T_1$ with prior $\mu_1$, and $\pi_2$ for $T_2$ with prior $\mu_2$, with information costs $I_1$ and $I_2$, respectively, such that $I_1 + I_2 \le I$.

Slide47

 

$\pi_1$ (for $T_1$, on inputs $(X_1, Y_1)$): Publicly sample $X_2 \sim \mu_2$. Bob privately samples $Y_2 \sim \mu_2|_{X_2}$. Run $\pi$ on $((X_1, X_2), (Y_1, Y_2))$.

$\pi_2$ (for $T_2$, on inputs $(X_2, Y_2)$): Publicly sample $Y_1 \sim \mu_1$. Alice privately samples $X_1 \sim \mu_1|_{Y_1}$. Run $\pi$ on $((X_1, X_2), (Y_1, Y_2))$.

Slide48

Analysis – $\pi_1$

($\pi_1$: publicly sample $X_2$; Bob privately samples $Y_2$; run $\pi$.)

Alice learns about $Y_1$: $I(\Pi; Y_1 \mid X_1 X_2)$.

Bob learns about $X_1$: $I(\Pi; X_1 \mid Y_1 Y_2 X_2)$.

Slide49

Analysis – $\pi_2$

($\pi_2$: publicly sample $Y_1$; Alice privately samples $X_1$; run $\pi$.)

Alice learns about $Y_2$: $I(\Pi; Y_2 \mid X_1 X_2 Y_1)$.

Bob learns about $X_2$: $I(\Pi; X_2 \mid Y_1 Y_2)$.

Slide50

Adding $\pi_1$ and $\pi_2$

$I_1 + I_2 = I(\Pi; Y_1 \mid X_1 X_2) + I(\Pi; X_1 \mid Y_1 Y_2 X_2) + I(\Pi; Y_2 \mid X_1 X_2 Y_1) + I(\Pi; X_2 \mid Y_1 Y_2)$
$= I(\Pi; Y_1 Y_2 \mid X_1 X_2) + I(\Pi; X_1 X_2 \mid Y_1 Y_2) = I$ (by the chain rule).

Slide51

Summary

Information complexity is additive.

Operationalized via "Information = amortized communication": $\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} = IC(F, \mu, \varepsilon)$.

Seems to be the "right" analogue of entropy for interactive computation.

Slide52

Entropy vs. Information Complexity

                  Entropy                        IC
Additive?         Yes                            Yes
Operationalized   $\lim_n C(X^n)/n = H(X)$       $\lim_n CC(F^n,\mu^n,\varepsilon)/n = IC(F,\mu,\varepsilon)$
Compression?      Huffman: $C(X) \le H(X)+1$     ???!

Slide53

Can interactive communication be compressed?

Is it true that $CC(F, \mu, \varepsilon) = O(IC(F, \mu, \varepsilon))$?

Less ambitiously: …?

(Almost) equivalently: Given a protocol $\pi$ with $IC_\mu(\pi) = I$, can Alice and Bob simulate $\pi$ using $\approx I$ bits of communication?

Not known in general…

Slide54

Direct sum theorems

Let $F$ be any functionality. Let $C(F)$ be the cost of implementing $F$. Let $F^n$ be the functionality of implementing $n$ independent copies of $F$.

The direct sum problem: "Does $C(F^n) \approx n \cdot C(F)$?"

In most cases it is obvious that $C(F^n) \le n \cdot C(F)$.

Slide55

Direct sum – randomized communication complexity

Is it true that $CC(F^n, \varepsilon) = \Omega(n \cdot CC(F, \varepsilon))$?

Is it true that $CC(F^n, \mu^n, \varepsilon) = \Omega(n \cdot CC(F, \mu, \varepsilon))$?

Slide56

Direct product – randomized communication complexity

Direct sum: is $CC(F^n, \mu^n, \varepsilon) = \Omega(n \cdot CC(F, \mu, \varepsilon))$?

Direct product: does any protocol for $F^n$ with communication $\ll n \cdot CC(F, \mu, \varepsilon)$ succeed with probability at most $2^{-\Omega(n)}$?

Slide57

Direct sum for randomized CC and interactive compression

Direct sum: $CC(F^n, \mu^n, \varepsilon) \stackrel{?}{=} \Omega(n \cdot CC(F, \mu, \varepsilon))$?

In the limit: $IC(F, \mu, \varepsilon) = \lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} \stackrel{?}{=} \Omega(CC(F, \mu, \varepsilon))$?

Interactive compression: $CC(F, \mu, \varepsilon) \stackrel{?}{=} O(IC(F, \mu, \varepsilon))$?

Same question!

Slide58

The big picture

[Diagram connecting: additivity (= direct sum) for information · information = amortized communication · direct sum for communication? · interactive compression?]

Slide59

Current results for compression

A protocol $\pi$ that has $C$ bits of communication, conveys $I$ bits of information over prior $\mu$, and works in $r$ rounds can be simulated:

Using $\tilde O(\sqrt{C \cdot I})$ bits of communication.

Using $2^{O(I)}$ bits of communication.

Using roughly $I + O(\sqrt{r \cdot I})$ bits of communication (for $r$-round protocols).

If $\mu$ is a product distribution, then using $\tilde O(I)$ bits of communication.

Slide60

Their direct sum counterparts

For product distributions $\mu$: $CC(F^n, \mu^n, \varepsilon) = \tilde\Omega(n \cdot CC(F, \mu, \varepsilon))$.

When the number of rounds is bounded by $r$, a direct sum theorem holds.

Slide61

Direct product

The best one can hope for is a statement of the type: any protocol for $F^n$ that uses $\ll n \cdot CC(F, \mu, \varepsilon)$ communication succeeds with probability at most $2^{-\Omega(n)}$.

Can prove such a statement with $\sqrt{n}$ in place of $n$.

Slide62

Proof 2: Compressing a one-round protocol

Say Alice speaks: her message $M$ is distributed according to $M_X$ (given her input), while from Bob's point of view it is distributed according to $M_Y$.

Recall KL-divergence: the information the message conveys is $I(M; X \mid Y) = \mathbb{E}\, D(M_X \| M_Y)$.

Bottom line: Alice has $M_X$; Bob has $M_Y$;

Goal: sample from $M_X$ using $\approx D(M_X \| M_Y)$ communication.

Slide63

The dart board

Interpret the public randomness as a sequence of random points (darts) $(u_1, q_1), (u_2, q_2), \ldots$ in $U \times [0,1]$, where $U$ is the universe of all possible messages.

The first dart under the histogram of $M_X$ is distributed according to $M_X$.

[Figure: darts $(u_i, q_i)$ scattered over $U \times [0,1]$, with the histogram of a distribution over $U$ drawn on top]

Slide64

Proof Idea

Sample using $\approx D(M_X \| M_Y)$ communication with statistical error $\varepsilon$.

[Figure: the public randomness gives $\sim |U|$ dart samples $(u_i, q_i)$; Alice's histogram $M_X$ and Bob's histogram $M_Y$ are drawn over the same darts]

Slide65

Proof Idea

Sample using $\approx D(M_X \| M_Y)$ communication with statistical error $\varepsilon$.

[Figure: Alice's first dart under $M_X$ is $u_4$; she sends hashes $h_1(u_4), h_2(u_4)$; Bob's darts under $M_Y$ (e.g. $u_2$) are his candidates]

Slide66

Proof Idea

Sample using $\approx D(M_X \| M_Y)$ communication with statistical error $\varepsilon$.

[Figure: Bob doubles his histogram to $2 M_Y$; Alice keeps sending hashes $h_3(u_4), h_4(u_4), \ldots, h_{\log 1/\varepsilon}(u_4)$ until Bob isolates $u_4$ among his candidates]

Slide67

Analysis

If $\frac{M_X(u)}{M_Y(u)} \approx 2^k$, then the protocol will reach round $\approx k$ of doubling.

There will be $\approx 2^k$ candidates.

About $k + \log \frac{1}{\varepsilon}$ hashes to narrow down to one.

The contribution of $u$ to the cost: $\approx M_X(u) \cdot \left( \log \frac{M_X(u)}{M_Y(u)} + \log \frac{1}{\varepsilon} \right)$; summing over $u$ gives $\approx D(M_X \| M_Y) + O(\log \frac{1}{\varepsilon})$.

Done!
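Below is a self-contained (and deliberately unoptimized) Python sketch of this dart-board sampling; all names are ours. It spends extra hash bits up front (about the log of the number of darts) instead of the $\approx D(M_X\|M_Y) + O(\log \frac{1}{\varepsilon})$ of the careful argument, but it shows the key point: Bob, holding only $M_Y$ (here `q`), ends up with a sample distributed according to Alice's $M_X$ (here `p`).

```python
import hashlib, math, random

def hash_bits(obj, k):
    """First k bits of a hash of obj (plays the role of the shared random hashes h_i)."""
    digest = hashlib.sha256(repr(obj).encode()).hexdigest()
    return bin(int(digest, 16))[2:].zfill(256)[:k]

def dart_board_sample(p, q, eps=0.01, seed=7):
    """Alice holds p, Bob holds q (distributions on the same finite universe).
    Shared darts (u, height) are uniform on U x [0,1]; the first dart under the
    histogram of p is Alice's sample.  Alice sends hash bits of that dart, and Bob
    keeps doubling his histogram (min(1, 2^k * q)) until exactly one of his
    candidate darts matches every hash bit received so far."""
    rng = random.Random(seed)                              # public randomness
    U = list(p)
    n_darts = 4 * len(U) * max(1, int(math.ceil(math.log2(1 / eps))))
    darts = [(rng.choice(U), rng.random()) for _ in range(n_darts)]

    # Alice's dart: the first one under the histogram of p (exists w.h.p.).
    a_idx, a_u = next((i, u) for i, (u, h) in enumerate(darts) if h < p[u])

    sent = int(math.ceil(math.log2(n_darts) + math.log2(1 / eps)))  # initial hash bits
    k = 0
    while True:
        cands = [(i, u) for i, (u, h) in enumerate(darts)
                 if h < min(1.0, (2 ** k) * q[u])            # under Bob's doubled histogram
                 and hash_bits((i, u), sent) == hash_bits((a_idx, a_u), sent)]
        if len(cands) == 1:
            return cands[0][1], sent      # Bob outputs the matching dart's message
        k, sent = k + 1, sent + 2         # double the histogram, send 2 more hash bits

# Bob's output is distributed close to p, even though he only holds q; the
# number of hash bits grows with log(p(u)/q(u)) for the sampled u.
p = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}
q = {"a": 0.1, "b": 0.3, "c": 0.3, "d": 0.3}
runs = [dart_board_sample(p, q, seed=s) for s in range(300)]
print({u: round(sum(r[0] == u for r in runs) / len(runs), 2) for u in p})  # ~ p
print(sum(r[1] for r in runs) / len(runs))                                 # average bits sent
```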

 Slide68

External information cost

$IC^{ext}_\mu(\pi) := I(\Pi; XY)$

= what Charlie, an external observer, learns about $(X, Y)$.

[Diagram: A holds $X$, B holds $Y$; they run protocol $\pi$ with transcript $\Pi$; an observer C watches the transcript]

Slide69

Example

$F$ is "$X=Y$?".

$\mu$ is a distribution where w.p. $\frac12$, $X=Y$, and w.p. $\frac12$, $(X,Y)$ are random.

[Diagram: A sends MD5(X); B replies "X=Y?"]

$IC^{ext}_\mu(\pi) = I(\Pi; XY) \approx 128 + 1 = 129$ bits

= what Charlie learns about $(X, Y)$.

Slide70

External information cost

It is always the case that $IC^{ext}_\mu(\pi) \ge IC_\mu(\pi)$.

If $\mu$ is a product distribution, then $IC^{ext}_\mu(\pi) = IC_\mu(\pi)$.

Slide71

External information complexity

$IC^{ext}(F, \mu, \varepsilon) := \inf_{\pi \text{ solves } (F,\mu,\varepsilon)} IC^{ext}_\mu(\pi)$.

Can it be operationalized?

Slide72

Operational meaning of $IC^{ext}$?

Conjecture: Zero-error communication scales like external information:

$\lim_{n\to\infty} \frac{CC(F^n, \mu^n, 0)}{n} = IC^{ext}(F, \mu, 0)$.

Recall: $\lim_{n\to\infty} \frac{CC(F^n, \mu^n, \varepsilon)}{n} = IC(F, \mu, \varepsilon)$ for $\varepsilon > 0$.

Slide73

Example – transmission with a strong prior

$\mu$ is such that $X$ is a uniform bit, and $Y = X$ with a very high probability (say $1 - \varepsilon$).

$F$ is just the "transmit $X$" function (Bob should learn $X$).

Clearly, $\pi$ should just have Alice send $X$ to Bob.

$IC_\mu(\pi) = H(X \mid Y) = H(\varepsilon) \approx 0$, while $IC^{ext}_\mu(\pi) = H(X) = 1$.

Slide74

Example – transmission with a strong prior

Zero-error transmission of $n$ copies still costs $\approx n$ bits, matching $IC^{ext}$ rather than $IC_\mu$.

Other examples, e.g. the two-bit AND function, fit into this picture.

Slide75

Additional directions

Information complexity

Interactive coding

Information theory in TCS

Slide76

Interactive coding theory

So far we focused the discussion on noiseless coding. What if the channel has noise? [What kind of noise?]

In the non-interactive case, each channel has a capacity $C$.

Slide77

Channel capacity

The amortized number of channel uses needed to send $X$ over a noisy channel of capacity $C$ is $\frac{H(X)}{C}$.

Decouples the task from the channel!

Slide78

Example: Binary Symmetric Channel

Each bit gets independently flipped with probability $\varepsilon$.

One-way capacity: $C = 1 - H(\varepsilon)$.

[Diagram: BSC with inputs 0/1 and outputs 0/1, crossing over with probability $\varepsilon$]
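A two-line computation of this capacity formula (the function name is ours):

```python
import math

def bsc_capacity(eps):
    """One-way capacity 1 - H(eps) of the binary symmetric channel."""
    if eps in (0.0, 1.0):
        return 1.0
    H = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return 1 - H

for eps in [0.0, 0.01, 0.11, 0.5]:
    print(eps, bsc_capacity(eps))   # 1.0, ~0.919, ~0.5, 0.0
```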

 

 

 

 Slide79

Interactive channel capacity

Not clear one can decouple the channel from the task in such a clean way. Capacity is much harder to calculate/reason about.

Example: binary symmetric channel.

One-way capacity: $1 - H(\varepsilon)$.

Interactive (for simple pointer jumping, [Kol-Raz'13]): $1 - \Theta\!\left(\sqrt{H(\varepsilon)}\right)$.

Slide80

Information theory in communication complexity and beyond

A natural extension would be to multi-party communication complexity. Some success in the number-in-hand case.

What about the number-on-forehead? Explicit bounds for $\omega(\log n)$ players would imply explicit $ACC^0$ circuit lower bounds.

Slide81

Naïve multi-party information cost

$I(\Pi; X \mid YZ) + I(\Pi; Y \mid XZ) + I(\Pi; Z \mid XY)$

[Diagram (number-on-forehead): A sees $YZ$, B sees $XZ$, C sees $XY$]

Slide82

Naïve multi-party information cost

Doesn't seem to work.

Secure multi-party computation [Ben-Or, Goldwasser, Wigderson] means that anything can be computed at near-zero information cost.

Although, these constructions require the players to share private channels/randomness.

Slide83

Communication and beyond…

The rest of today:

Data structures; Streaming; Distributed computing; Privacy.

Exact communication complexity bounds.

Extended formulations lower bounds.

Parallel repetition?

…

Slide84

Thank You!

Slide85

Open problem: Computability of IC

Given the truth table of $F(X,Y)$, $\varepsilon$, and $\mu$, compute $IC(F, \mu, \varepsilon)$.

Via $IC(F, \mu, \varepsilon) = \lim_{r\to\infty} IC_r(F, \mu, \varepsilon)$ one can compute a sequence of upper bounds.

But the rate of convergence as a function of $r$ is unknown.

Slide86

Open problem: Computability of IC

Can compute the $r$-round information complexity of $F$: $IC_r(F, \mu, \varepsilon)$.

But the rate of convergence as a function of $r$ is unknown.

Conjecture: $IC_r(F, \mu, \varepsilon) - IC(F, \mu, \varepsilon) = O_{F,\mu,\varepsilon}\!\left(\frac{1}{r^2}\right)$.

This is the relationship for the two-bit AND.