
Experiments

Synthetic data:

- Random linear scoring function with random constraints

Information extraction:

- Given a citation, extract author, booktitle, title, etc.
- Given ads text, extract features, size, neighborhood, etc.
- Constraints like: 'Title' tokens are likely to appear together in a single block; a paper should have at most one 'title' (one possible encoding is sketched below)
- Domain knowledge: the HMM transition matrix is diagonal-heavy, a generalization of submodular pairwise potentials
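As one concrete illustration, the "at most one 'title'" constraint can be written as a linear constraint over indicator variables (an assumed encoding for illustration, not necessarily the one used in the experiments):

$$\sum_{i=1}^{n} \mathbf{1}[y_i = \text{title}] \le 1$$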

[Figures: accuracy and training time (hours) on the information extraction tasks]

Multi-label Document Classification

- Experiments on Reuters data
- Documents with multiple labels: corn, crude, earn, grain, interest, ...
- Modeled as a PMN (pairwise Markov network) over a complete graph over the labels, with singleton and pairwise components

[Figures: F1 scores and training time (hours) on the Reuters data]

[Figure: avg. Hamming loss vs. no. of training examples, comparing the Local Learning (LL) baselines, Global Learning (GL), Decomposed Learning DecL-2 and DecL-3, and DecL-1 (a.k.a. Pseudomax)]

Structured Prediction

- Predict $y = \{y_1, y_2, \dots, y_n\} \in \mathcal{Y}$ given input $x$
- Features: $\phi(x, y)$; weight parameters: $w$
- Inference: $\operatorname{argmax}_{y \in \mathcal{Y}} f(x, y) = w \cdot \phi(x, y)$
- Learning: estimate $w$
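To make the inference step concrete, here is a minimal brute-force sketch of the argmax (names are illustrative; `phi` is any feature function returning a NumPy vector, and real systems replace the exhaustive loop with ILP, dynamic programming, or other structured inference):

```python
import itertools
import numpy as np

def brute_force_inference(x, w, phi, n, label_set=(0, 1)):
    """Exhaustive argmax_{y in Y} of f(x, y) = w . phi(x, y).

    Exponential in n -- shown only to make the inference problem
    concrete, not as a practical inference routine.
    """
    best_y, best_score = None, -np.inf
    for y in itertools.product(label_set, repeat=n):
        score = w.dot(phi(x, y))  # linear scoring function
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score
```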

Exactness for Special Cases

- Pairwise Markov Network over a graph with edges $E$
- Assume domain knowledge on $W^*$: we know that for a separating $w$, each pairwise potential $\phi_{i,k}(\cdot\,; w)$ is either
  - Submodular: $\phi_{i,k}(0,0) + \phi_{i,k}(1,1) > \phi_{i,k}(0,1) + \phi_{i,k}(1,0)$, or
  - Supermodular: $\phi_{i,k}(0,0) + \phi_{i,k}(1,1) < \phi_{i,k}(0,1) + \phi_{i,k}(1,0)$
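A small helper makes the two conditions above concrete (a sketch; `phi_ik` stands for an assumed 2x2 table of potential values):

```python
def classify_pairwise_potential(phi_ik):
    """Classify a 2x2 pairwise potential phi_ik[a][b] using the
    definitions above (potentials are scores being maximized)."""
    diag = phi_ik[0][0] + phi_ik[1][1]
    off = phi_ik[0][1] + phi_ik[1][0]
    if diag > off:
        return "submodular"
    if diag < off:
        return "supermodular"
    return "neither"
```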

Global Learning (GL)

- Structural SVMs learn by minimizing a regularized structured hinge loss: $\min_w \ \tfrac{\lambda}{2}\|w\|^2 + \sum_j \big( \max_{y \in \mathcal{Y}} [\, f(x_j, y; w) + \Delta(y_j, y) \,] - f(x_j, y_j; w) \big)$
- Global inference is required as an intermediate step
- Global inference is slow $\Rightarrow$ global learning is time consuming!
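A sketch of the per-example structured hinge loss, highlighting the loss-augmented global inference call that makes GL expensive (`argmax_over_Y` and the other names are illustrative placeholders; it is any routine returning the best-scoring $y$ over the full output space):

```python
def structured_hinge_loss(x_j, y_j, w, phi, delta, argmax_over_Y):
    """max(0, max_y [f(x_j,y;w) + Delta(y_j,y)] - f(x_j,y_j;w))."""
    score = lambda y: w.dot(phi(x_j, y))
    # Loss-augmented inference over the FULL space Y -- the GL bottleneck,
    # paid once per example, per training iteration.
    y_hat = argmax_over_Y(lambda y: score(y) + delta(y_j, y))
    return max(0.0, score(y_hat) + delta(y_j, y_hat) - score(y_j))
```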

Decomposed Learning (DecL)

- Reduce inference to a neighborhood $\text{nbr}(y_j)$ around $y_j$ (sketched below)
- Small neighborhoods $\Rightarrow$ efficient learning
- We show, theoretically and experimentally, that Decomposed Learning with small neighborhoods can be identical to Global Learning (GL)
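A minimal sketch of how such a neighborhood can be generated, assuming a decomposition is given as a list of variable-index sets (names are illustrative; duplicates across sets could be deduplicated):

```python
import itertools

def decl_neighborhood(y_gold, decomposition, label_set=(0, 1)):
    """Generate nbr(y_j): for each set s in the decomposition, vary the
    variables in s over all labelings while the remaining variables stay
    fixed at their gold values."""
    for s in decomposition:  # s is a tuple of variable indices
        for labels in itertools.product(label_set, repeat=len(s)):
            y = list(y_gold)
            for idx, lab in zip(s, labels):
                y[idx] = lab
            yield tuple(y)
```

The loss-augmented argmax in learning is then taken over this much smaller candidate set instead of the full $\mathcal{Y}$.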

Theoretical Results: Decompositions which yield Exactness

$W^* = \{\, w^* \mid f(x_j, y_j; w^*) \ge f(x_j, y; w^*) + \Delta(y_j, y), \ \forall y \in \mathcal{Y},\ \forall (x_j, y_j) \in \text{training data} \,\}$

$W_{\text{decl}} = \{\, w^* \mid f(x_j, y_j; w^*) \ge f(x_j, y; w^*) + \Delta(y_j, y), \ \forall y \in \text{nbr}(y_j),\ \forall (x_j, y_j) \in \text{training data} \,\}$

- Exactness: DecL is exact if it has the same set of separating weights as GL, i.e. $W_{\text{decl}} = W^*$
- Exactness with finite data is much more useful than asymptotic consistency
- Main Theorem: DecL is exact if $\forall w \in W^*$, $\exists \epsilon > 0$ such that $\forall w' \in B(w, \epsilon)$ and $\forall (x_j, y_j) \in D$: if $\exists y \in \mathcal{Y}$ with $f(x_j, y; w') + \Delta(y_j, y) > f(x_j, y_j; w')$, then $\exists y' \in \text{nbr}(y_j)$ with $f(x_j, y'; w') + \Delta(y_j, y') > f(x_j, y_j; w')$
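The two definitions differ only in the competitor set that the separation inequalities range over; a hedged brute-force sketch of that check (illustrative names):

```python
def separates(w, data, phi, delta, candidate_ys):
    """Check the large-margin separation inequalities defining W*/W_decl.

    candidate_ys(x_j, y_j) yields the competitor set: all of Y for W*,
    or only nbr(y_j) for W_decl. Exactness means both variants accept
    exactly the same weight vectors.
    """
    for x_j, y_j in data:
        f_gold = w.dot(phi(x_j, y_j))
        for y in candidate_ys(x_j, y_j):
            if f_gold < w.dot(phi(x_j, y)) + delta(y_j, y):
                return False  # margin violated by some competitor
    return True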

[Table: definition of the edge set $E_j \subseteq E$ by cases, according to whether the pairwise potential is submodular ($\text{sub}(\phi)$) or supermodular ($\text{sup}(\phi)$) and the gold labels (0/1)]

Theorem: the decomposition $S_{\text{pair}}$, consisting of the connected components of $E_j$, yields exactness.
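Assuming the edge set $E_j$ is given (per the table above), $S_{\text{pair}}$ is just its connected components; a minimal union-find sketch (illustrative, not the authors' code):

```python
def pair_decomposition(n, E_j):
    """Build S_pair: one decomposition set per connected component of the
    edge set E_j (given as (i, k) index pairs). Variables untouched by
    E_j end up as singleton sets."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, k in E_j:  # union the endpoints of every edge
        parent[find(i)] = find(k)

    comps = {}
    for v in range(n):
        comps.setdefault(find(v), []).append(v)
    return [tuple(c) for c in comps.values()]
```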

Linear scoring function structured via constraints on $\mathcal{Y}$

- For simple constraints, it is possible to show exactness for decompositions with set sizes independent of $n$
- Theorem: if $\mathcal{Y}$ is specified by $k$ OR constraints, then DecL-$(k+1)$ is exact
- As an example consequence, when $\mathcal{Y}$ is specified by $k$ horn clauses
  $y_{1,1} \wedge y_{1,2} \wedge \dots \wedge y_{1,r} \rightarrow y_{1,r+1}$, $\ y_{2,1} \wedge y_{2,2} \wedge \dots \wedge y_{2,r} \rightarrow y_{2,r+1}$, $\ \dots$, $\ y_{k,1} \wedge y_{k,2} \wedge \dots \wedge y_{k,r} \rightarrow y_{k,r+1}$,
  decompositions with set size $(k+1)$, i.e. independent of the number of variables in the constraints, $r$, yield exactness.

Baseline: Local Learning (LL)

- Approximations to GL which ignore certain structural interactions so that the remaining structure becomes easy to learn, e.g. ignoring global constraints or pairwise interactions in a Markov network
- Another baseline: LL+C, where we learn the pieces independently and apply full structural inference (e.g. with constraints, if available) at test time
- Fast, but oblivious to rich structural information!

For weights immediately outside $W^*$: global inseparability $\Rightarrow$ DecL inseparability.

Efficient Decomposed Learning for Structured Prediction

Rajhans Samdani and Dan Roth, University of Illinois at Urbana-Champaign

DecL: Learning via Decompositions

Learn by varying a subset of the output variables while fixing the remaining ones to their gold labels in $y_j$.

[Figure: example with six output variables $y_1, \dots, y_6$; different subsets are varied while the remaining variables stay fixed to their gold labels]

A decomposition is a collection of different (non-inclusive, possibly overlapping) sets of variables over which we perform the argmax:

$S_j = \{\, s_1, \dots, s_l \mid \forall i,\ s_i \subseteq \{1, \dots, n\};\ \forall i, k,\ s_i \not\subseteq s_k \,\}$

- Learning with decompositions in which all subsets of size $k$ are considered: DecL-$k$ (a minimal sketch follows below)
- In practice, decompositions based on domain knowledge group highly coupled variables together
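A minimal sketch of the DecL-$k$ decomposition, pairing with the neighborhood generator sketched earlier (illustrative names):

```python
import itertools

def decl_k_decomposition(n, k):
    """DecL-k: the decomposition containing every subset of {0,...,n-1}
    of size exactly k. No set contains another, as the definition of a
    decomposition requires."""
    return list(itertools.combinations(range(n), k))
```

For example, with $n = 6$ and $k = 2$ this yields the 15 variable pairs; feeding it to `decl_neighborhood` produces $\text{nbr}(y_j)$ for DecL-2.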

Supported by the Army Research Laboratory (ARL), Defense Advanced Research Projects Agency (DARPA), and the Office of Naval Research (ONR).
