Presentation Transcript


Margin Learning, Online Learning, and The Voted Perceptron

SPLODD ~= AE* – 3, 2011

* Autumnal Equinox

Review

Computer science is full of equivalences:
SQL ≈ relational algebra
YFCL (your favorite classifier learner) ≈ optimizing … on the training data

[Diagram: gcc –O4 foo.c vs. gcc foo.c]

Also full of relationships between sets:

Finding smallest error-free decision tree >> 3-SAT

DataLog >> relational algebra

CFL >> Det FSMs = RegEx

Review

Bayes nets: describe a (family of) joint distribution(s) between random variables.

They are an operational description (a program) for how data can be generated.

They are a declarative description (a definition) for the joint distribution, and from this we can derive algorithms for doing stuff other than generation.

There is a close connection between Naïve Bayes and loglinear models.
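To make that last connection concrete, here is a minimal sketch (an illustration, not from the slides) of why a multinomial Naïve Bayes classifier with add-one smoothing is a loglinear/linear classifier: its weights are log Pr(word | class) on word-count features, plus a bias of log Pr(class). The toy documents below are made up.

import math
from collections import Counter, defaultdict

# Toy training data: (list of words, class label) -- illustrative only.
docs = [
    ("buy cheap pills now".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("cheap pills cheap now".split(), "spam"),
    ("notes posted after the meeting".split(), "ham"),
]

classes = {y for _, y in docs}
vocab = {w for words, _ in docs for w in words}

# Multinomial NB with add-one smoothing.
class_count = Counter(y for _, y in docs)
word_count = defaultdict(Counter)            # word_count[y][w]
for words, y in docs:
    word_count[y].update(words)

def nb_as_loglinear(y):
    """NB parameters rewritten as loglinear weights: weight_y[w] = log Pr(w|y), bias = log Pr(y)."""
    total = sum(word_count[y].values()) + len(vocab)
    weights = {w: math.log((word_count[y][w] + 1) / total) for w in vocab}
    bias = math.log(class_count[y] / len(docs))
    return weights, bias

def score(words, y):
    """Linear score: bias + sum over words of count(w) * weight_y(w)."""
    weights, bias = nb_as_loglinear(y)
    counts = Counter(w for w in words if w in vocab)
    return bias + sum(weights[w] * c for w, c in counts.items())

# argmax_y of this linear score reproduces the NB decision rule, since
# log Pr(y | x) = log Pr(y) + sum_w count(w, x) * log Pr(w | y) + const(x).
print(max(classes, key=lambda y: score("cheap meeting pills".split(), y)))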

NB vs loglinear models

[Figure: diagram relating loglinear classifiers, NB classifiers, and multinomial(?) classifiers; labeled points include SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), NB-JL, NB-CL, NB-CL*]

NB vs loglinear models

[Figure: the same diagram, annotated with the Naïve Bayes graphical model (Y, Wj) and the label "Optimal if"; points include SymDir(100) and Max CL(y|x) + G(0,1.0)]

Similarly for sequences…

An HMM is a Bayes net. It implies a set of independence assumptions.

ML parameter setting and Viterbi are optimal if these hold.

A CRF is a Markov field. It implies a set of independence assumptions.

These, plus the goal of maximizing Pr(y|x), give us a learning algorithm.

You can construct features so that any HMM can be emulated by a CRF with those features.
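A sketch of that emulation claim, using an assumed toy B/I/O HMM (the probabilities below are made up): give the CRF one indicator feature per transition and per emission, set each weight to the corresponding log-probability, and the CRF's unnormalized score of any labeling equals the HMM's log joint, so both pick the same best labeling.

import math

# Assumed toy HMM parameters (B/I/O tagging); any proper HMM would do.
states = ["B", "I", "O"]
start  = {"B": 0.5, "I": 0.1, "O": 0.4}
trans  = {"B": {"B": 0.1, "I": 0.6, "O": 0.3},
          "I": {"B": 0.2, "I": 0.5, "O": 0.3},
          "O": {"B": 0.4, "I": 0.1, "O": 0.5}}
emit   = {"B": {"when": 0.2, "will": 0.1, "cohen": 0.6, "post": 0.1},
          "I": {"when": 0.1, "will": 0.2, "cohen": 0.2, "post": 0.5},
          "O": {"when": 0.4, "will": 0.3, "cohen": 0.1, "post": 0.2}}

def hmm_log_joint(x, y):
    """log Pr(x, y) under the HMM."""
    lp = math.log(start[y[0]]) + math.log(emit[y[0]][x[0]])
    for t in range(1, len(x)):
        lp += math.log(trans[y[t-1]][y[t]]) + math.log(emit[y[t]][x[t]])
    return lp

# CRF emulation: indicator features for (prev_tag, tag) and (tag, word),
# with weights equal to the HMM's log-probabilities.
w = {}
for s in states:
    w[("start", s)] = math.log(start[s])
    for s2 in states:
        w[(s, s2)] = math.log(trans[s][s2])
    for word, p in emit[s].items():
        w[(s, word)] = math.log(p)

def crf_score(x, y):
    """Unnormalized linear score sum_j w_j f_j(x, y) with the features above."""
    s = w[("start", y[0])] + w[(y[0], x[0])]
    for t in range(1, len(x)):
        s += w[(y[t-1], y[t])] + w[(y[t], x[t])]
    return s

x = ["when", "will", "cohen", "post"]
y = ["O", "O", "B", "O"]
assert abs(hmm_log_joint(x, y) - crf_score(x, y)) < 1e-9   # identical scores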

In sequence space…

[Figure: the analogous diagram for sequence models, relating CRF/loglinear models, HMMs, and multinomial(?) models; labeled points include SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), JL, CL, CL*]

Review: CRFs/Markov Random Fields

When will prof Cohen post the notes

Semantics of a Markov random field

[Chain of label variables Y1, Y2, Y3, Y4, Y5, Y6, Y7, one per word]

What's independent: Pr(Yi | other Y's) = Pr(Yi | Yi-1, Yi+1)

Probability distribution: Pr(Y1=y1, …, Y7=y7) = (1/Z) ∏i φi(yi, yi+1), a normalized product of potentials over adjacent pairs (the standard chain factorization).

Review: CRFs/Markov Random Fields

[Trellis: candidate tags B / I / O for each word of "When will prof Cohen post the notes …"]
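The trellis above lays out every candidate B/I/O tagging of the sentence; Viterbi decoding is just the best path through it. A minimal sketch, with made-up scores (in an HMM these would be log transition/emission probabilities; in a CRF, learned feature weights):

# Minimal Viterbi over a B/I/O trellis.
# score(prev_tag, tag, t) is assumed to combine the transition and
# per-position scores at position t; the one below is a toy stand-in.

TAGS = ["B", "I", "O"]

def viterbi(n_words, score):
    """Return the best tag sequence of length n_words under additive scores."""
    # best[t][tag] = (best score of any path ending in `tag` at position t, backpointer)
    best = [{tag: (score(None, tag, 0), None) for tag in TAGS}]
    for t in range(1, n_words):
        col = {}
        for tag in TAGS:
            prev, s = max(
                ((p, best[t-1][p][0] + score(p, tag, t)) for p in TAGS),
                key=lambda pair: pair[1])
            col[tag] = (s, prev)
        best.append(col)
    # Trace the best path back through the backpointers.
    tag = max(TAGS, key=lambda tg: best[-1][tg][0])
    path = [tag]
    for t in range(n_words - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))

sentence = "When will prof Cohen post the notes".split()

def toy_score(prev, tag, t):
    word = sentence[t].lower()
    s = 2.0 if (tag, word) in {("B", "prof"), ("I", "cohen")} else (0.5 if tag == "O" else 0.0)
    if prev == "B" and tag == "I":
        s += 0.5   # transition bonus: I is likely after B
    return s

print(viterbi(len(sentence), toy_score))   # -> ['O', 'O', 'B', 'I', 'O', 'O', 'O']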

Review: CRFs/Markov Random Fields

When will prof Cohen post the notes

[Graph over label variables Y1, Y2, Y3, Y4, Y5, Y6, Y7, one per word]

What's independent: Pr(Yi | other Y's) = Pr(Yi | neighbors of Yi)

Probability distribution: Pr(Y = y) = (1/Z) ∏c φc(yc), a normalized product of potential functions over the cliques of the graph; equivalently, in loglinear form, Pr(Y = y) ∝ exp(Σj λj fj(y)).

Pseudo-likelihood and dependency networks

Any Markov field defines a (family of) probability distributions D, but not a simple program for generation/sampling.

We can use MCMC in the general case.

If you have, for each node i, PD(Xi | Pai), that's a dependency net. Still no simple program for generation/sampling (but can use Gibbs).
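A minimal sketch of that Gibbs option, assuming we are handed a dependency net's local conditionals (the three binary-variable conditionals below are made-up placeholders, not learned from data): ordered Gibbs sampling resamples each Xi from its conditional in turn, and the resulting samples can be used to estimate marginals.

import random

# Assumed toy dependency net over three binary variables X0, X1, X2:
# each local conditional returns Pr(Xi = 1 | values of the other variables).
def p_x0(x): return 0.9 if x[1] == 1 else 0.2
def p_x1(x): return 0.7 if x[0] == 1 or x[2] == 1 else 0.3
def p_x2(x): return 0.8 if x[1] == 1 else 0.1

local_conditionals = [p_x0, p_x1, p_x2]

def gibbs(n_sweeps, seed=0):
    """Ordered Gibbs sampling: repeatedly resample each Xi from PD(Xi | the rest)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in local_conditionals]   # random start state
    samples = []
    for _ in range(n_sweeps):
        for i, cond in enumerate(local_conditionals):
            x[i] = 1 if rng.random() < cond(x) else 0
        samples.append(tuple(x))
    return samples

# Estimate Pr(X2 = 1) from the samples, discarding a burn-in of 100 sweeps.
samples = gibbs(5000)
print(sum(s[2] for s in samples[100:]) / len(samples[100:]))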

You can learn these from data using YFCL. Equivalently: learning this maximizes pseudo-likelihood, just as HMM learning maximizes (real) likelihood on a sequence.
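For reference, the standard definition being invoked here, written out (the notation is the usual one, not taken from the slides): pseudo-likelihood replaces the joint probability of an example with the product of its local conditionals.

% Pseudo-likelihood of parameters \theta on one example x = (x_1, \dots, x_n),
% versus the ordinary (joint) likelihood.
\mathrm{PL}(\theta) \;=\; \prod_{i=1}^{n} P_\theta\!\left(x_i \mid x_{-i}\right)
\qquad \text{vs.} \qquad
\mathrm{L}(\theta) \;=\; P_\theta(x_1, \dots, x_n)

Here x_{-i} denotes all variables other than x_i; in a dependency net the conditional reduces to P_\theta(x_i | Pa_i).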

A weirdness: every MRF has an equivalent dependency net, but not every dependency net (set of local conditionals) has an equivalent MRF.

And now for …