What’s next for Parallel Distributed Processing?

Slide1

What’s next for Parallel Distributed Processing?

Mathematical Cognition and Other New Directions

Jay McClelland

Stanford University

Slide2

Core features of the PDP approach to representation and learning

The knowledge is in the connections

It’s intrinsically implicit

It is acquired by a blind automatic procedure, by performing gradient descent rather than by making explicit inferences or engaging in any kind of reasoning process

It can approximate systems of rules without ever having any

It captures the gradual nature of developmental change and emphasizes the importance of the gradual accumulation of small changes

[Figure: letters H I N T mapped to phonemes /h/ /i/ /n/ /t/]
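The idea that knowledge lives in the connections and builds up through many small gradient-descent steps can be illustrated with a minimal sketch. The single sigmoid unit, the logical-OR task, the learning rate, and the number of epochs below are all illustrative assumptions, not part of the talk. The unit ends up behaving in a rule-like way although no rule is stored anywhere, only weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(patterns, epochs=200, lr=0.5):
    """Train one sigmoid unit by online gradient descent on cross-entropy error.

    The values of epochs and lr are illustrative assumptions.
    Returns the final weights, the bias, and the summed error per epoch.
    """
    weights = [0.0, 0.0]
    bias = 0.0
    history = []
    for _ in range(epochs):
        total_error = 0.0
        for inputs, target in patterns:
            net = bias + sum(w * x for w, x in zip(weights, inputs))
            out = sigmoid(net)
            total_error += -(target * math.log(out)
                             + (1 - target) * math.log(1 - out))
            # For a sigmoid unit with cross-entropy error, the gradient step
            # reduces to (target - output) times the input.
            delta = target - out
            weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
            bias += lr * delta
        history.append(total_error)
    return weights, bias, history

# Hypothetical toy task: logical OR.
OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, history = train_unit(OR_PATTERNS)
print(history[0], history[-1])  # the error is much lower after training
```

Each individual update is tiny; the competence is the accumulated result of all of them.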

Slide3

Second and Third Waves of Neural Networks

Some classical applications of PDP models:

Reading and morphology

Sentence processing

Semantic cognition

Intuitive physics

Some recent breakthroughs in machine learning:

Object classification

Speech recognition

Language processing

Surpassing human performance in Atari games

Slide4

Sentiment Analysis (Socher et al, 2013)

Slide5

Slide6

What’s Next?

Slide7

My lab’s new direction:

Mathematical Cognition

Slide8

Why is Math so Hard to Learn?

Late grade-school-aged kids misunderstand equations

What goes in the blank: 7 + 3 + 4 = __ + 4

Many middle-school-aged kids misunderstand fractions

Is 19/20 closer to 1 or 21?

Most Stanford undergraduates don’t understand the rudiments of trigonometry

Which expression below has the same value as cos(-30°)?

sin(30°), -sin(30°), cos(30°), -cos(30°)
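For reference, the three example problems can be checked mechanically. The short sketch below only evaluates the expressions on the slide; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Equation: 7 + 3 + 4 = __ + 4. Both sides must have the same value.
blank = (7 + 3 + 4) - 4

# Fraction: is 19/20 closer to 1 or to 21?
x = Fraction(19, 20)
closer_to = min((1, 21), key=lambda candidate: abs(x - candidate))

# Trigonometry: which option has the same value as cos(-30 degrees)?
theta = math.radians(30)
options = {
    "sin(30°)": math.sin(theta),
    "-sin(30°)": -math.sin(theta),
    "cos(30°)": math.cos(theta),
    "-cos(30°)": -math.cos(theta),
}
matches = [name for name, value in options.items()
           if math.isclose(value, math.cos(-theta))]

print(blank, closer_to, matches)
```

The blank is 10, 19/20 is closer to 1, and cos(-30°) equals cos(30°) because reflecting a point on the unit circle across the horizontal axis leaves its horizontal coordinate unchanged.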

Slide9

Failure to attach the appropriate meaning to mathematical expressions

A fraction N/D represents a certain number N of pieces of a unit whole divided into D equal parts

An equation represents an equivalence relation between two quantities, one to the left and one to the right of the equals sign

The sine / cosine of an angle θ in degrees represents the projection of a point on the unit circle specified by θ onto the vertical / horizontal axis through the center of the circle, or equivalently, the coordinates of the point on the circle


Slide10

cos(70)

Slide11

cos(–70+0)

Slide12

sin(-θ) cos(-θ)

Reported Circle Use: “A Lot” “A Little” or “Not at all”

Slide13

Why are these things hard to learn?

Slide14

Learning Depends on the Prepared Mind

Algebra for eighth graders?

“though strong math students can benefit from taking algebra in eighth grade, it is "decidedly harmful" for weaker math students to be rushed into advanced math concepts”

Failure to appreciate what X/Y means

Setting up the appropriate encoding habits

Failure to rely on the unit circle?

Failure of a module for visuospatial cognition or failure to develop the habit of mapping numbers into a multi-faceted coordinate framework?

Slide15

Habits of Mind¹

Learning to encode expressions automatically, so that their meaning is readily apparent in the mind, requires gradual connection adjustments that occur incrementally over repeated opportunities to learn

This is no different in principle from learning to read words aloud

We quickly lose awareness that we are engaging in these processes. Once we understand well, meaning is a habit of mind that we cannot readily appreciate others do not have

¹ Margolis, H. (1987). Patterns, Thinking, and Cognition. University of Chicago Press.

Slide16

Case Study in Readiness: The Balance Scale

Slide17

Balance Scale Model

Training involved more cases in which the weight varied than cases in which the distance varied

Network’s task was to activate the unit corresponding to the side that should go down, or (if the sides are in balance) to set the activation of both output units to 0.5
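The training regime described here can be sketched as follows. The target encoding follows the slide (the unit for the side that goes down is activated, or both units are set to 0.5 when the scale balances). The range of 1 to 5 for weights and distances and the number of extra copies used to over-represent weight-varying cases are assumptions made for illustration, as are the function names.

```python
from itertools import product

def target(left_weight, left_dist, right_weight, right_dist):
    """Target activations (left unit, right unit) from the torque rule."""
    left = left_weight * left_dist
    right = right_weight * right_dist
    if left > right:
        return (1.0, 0.0)
    if right > left:
        return (0.0, 1.0)
    return (0.5, 0.5)  # balance: both output units at 0.5

def training_set(max_value=5, extra_copies=4):
    """Build a training set biased toward problems where only weight can differ.

    max_value and extra_copies are illustrative assumptions.
    Problems with equal distances on both sides are repeated extra_copies
    additional times, so weight varies more often than distance.
    """
    values = range(1, max_value + 1)
    cases = []
    for lw, ld, rw, rd in product(values, repeat=4):
        case = ((lw, ld, rw, rd), target(lw, ld, rw, rd))
        cases.append(case)
        if ld == rd:
            cases.extend([case] * extra_copies)
    return cases

cases = training_set()
weight_only = sum(1 for (lw, ld, rw, rd), _ in cases if lw != rw and ld == rd)
distance_only = sum(1 for (lw, ld, rw, rd), _ in cases if lw == rw and ld != rd)
print(len(cases), weight_only, distance_only)
```

With these assumed settings the set contains five times as many weight-only problems as distance-only problems, which is the kind of imbalance that lets a network attend to weight before distance.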

Slide18

Slide19

Slide20

Slide21

Siegler’s Readiness Experiment

Two groups of children

5 year old Rule 1 children

7-8 year old Rule 1 children

After pretest:

Children saw 15 conflict problems with feedback

1/3: the side with greater weight would go down

1/3: the side with greater distance would go down

1/3: the two sides balance

Most 7-8 year olds progressed to Rule 2

Most 5 year olds showed no change or reverted to guessing

Slide22

Rule 1 Start

Rule 1 End

Slide23

Slide24

Benefit From a Brief Lesson in the Unit Circle Depends on a Prepared Mind

Slide25

Applying these ideas to Mathematical Cognition

Application 1: Representation of approximate number

Stoianov & Zorzi, 2012; Zou & McClelland (poster)

Application 2: Learning to correctly solve equivalence problems

Mickey & McClelland (talk)

Application 3: Incremental improvements in strategies for adding small numbers

Hanson, McKenzie & McClelland (talk)

Application 4: Learning geometry

Me and anyone who is willing to help me!

Slide26

The Approximate Number Problem

Must we really imagine that a system for representing number approximately is innate, or can the problem be solved using a generic neural network?

Can we account for the developmental improvement in acuity of the approximate number system?

Can we understand why our representations of approximate number have the properties that they do?

Specifically, why does our sensitivity to approximate numbers approximately conform to Weber’s law?
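To make the Weber's law question concrete, here is a minimal sketch of one common psychophysical description of approximate number comparison. It assumes each numerosity n is represented as a Gaussian with mean n and standard deviation w·n (scalar variability). That model and the Weber fraction w = 0.25 are assumptions chosen for illustration; they are not claimed to be the model used in the work cited in this talk.

```python
import math

def p_correct(n1, n2, w=0.25):
    """Probability of correctly choosing the larger of two numerosities.

    Assumes Gaussian representations with mean n and sd w * n.
    The default Weber fraction w is an illustrative assumption.
    """
    diff = abs(n1 - n2)
    # The difference of two independent Gaussians has
    # sd = sqrt(sd1**2 + sd2**2) = w * sqrt(n1**2 + n2**2).
    sd = w * math.sqrt(n1 ** 2 + n2 ** 2)
    z = diff / sd
    # Standard normal cumulative distribution, written with erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Same ratio (1:2), different absolute sizes: equal accuracy.
print(p_correct(8, 16), p_correct(16, 32))

# Same difference (2), different ratios: accuracy drops as numbers grow.
print(p_correct(8, 10), p_correct(16, 18))
```

Under this description, accuracy depends only on the ratio of the two numbers, which is what conforming to Weber's law means here. A smaller w corresponds to sharper acuity, which is one way to express developmental improvement.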

Slide27

Why Neural Networks?

Deep (unsupervised) learning can create an invertible internal representation that is driven solely by the goal of capturing the content of its inputs

As Stoianov & Zorzi (2012) showed, this is sufficient to support human-level performance in numerosity judgment.

Using ‘stochastic gradient descent’ instead of batch learning allows us to explore both the initial state and progressive refinement of representations.

Zou & McClelland explore the developmental trajectory, also explored in subsequent work by Stoianov & Zorzi.

Slide28

Errors on Equivalence Problems

Children reliably give incorrect answers to problems of the form:

a = b + __

They tend to put the sum of a and b in the blank, rather than the correct answer, which is a – b.

When given such equations in a brief presentation, and asked to reproduce them, they tend to reproduce them as

a + b = __

Children’s experiences are biased in ways that are consistent with these errors.
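The contrast between the correct response and the typical error can be written out explicitly. This is only a sketch: the function names are hypothetical, and the "add all" function simply encodes the error pattern described on this slide.

```python
def correct_answer(a, b):
    """Value that makes a = b + __ true."""
    return a - b

def add_all_answer(a, b):
    """The reported error: treating the problem as a + b = __."""
    return a + b

def balances(a, b, blank):
    """Check whether both sides of a = b + blank have the same value."""
    return a == b + blank

a, b = 9, 4
print(correct_answer(a, b), balances(a, b, correct_answer(a, b)))
print(add_all_answer(a, b), balances(a, b, add_all_answer(a, b)))
```

For 9 = 4 + __, the correct answer is 5, while the add-all response gives 13, which does not balance the equation.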

Slide29

Why a Neural Network?

Gradually learns in a way that depends on statistics of training set

Exhibits ‘pattern completion’ biases that capture both math errors and problem reconstruction errors

Gradually learns its way out of its errors, capturing patterns in the data

Slide30

These two models are great, but…

When we solve mathematical problems, we often perform a sequence of operations.

These operations are not rigidly structured, so we need flexibility

And as we gain facility, we can (spontaneously) develop more efficient strategies

Slide31

Strategy change in simple addition

5 + 2 = 7

Children appear to gradually progress through a series of alternative ‘strategies’, with strategy choice being probabilistic and with the probabilities changing gradually over age

Children can be induced to change strategies if given problems that give a clear advantage to one strategy

Children’s strategies seem constrained to be consistent with the principles of addition, even though children can’t necessarily articulate such principles.

Slide32

Incremental, Hierarchical, Supervised Reinforcement Learning in a Neural Network

A strategy is a sequence of steps, and reward only comes at the end

Both the time a strategy takes and its outcome are automatically taken into account in reinforcement learning.

The use of a neural network as a function approximator supports generalization.

Re-use of number skills previously acquired leads to selection of task-appropriate rather than task-inappropriate strategies.

The ‘strategy’ as a whole emerges as an assemblage of strategy chunks, each associated with a component skill relevant to addition.

A key idea is that learning is curriculum based:

The culture and educational system provide early experiences in initial components that then provide the previously acquired skills.
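The point that reward arrives only at the end, and that time as well as outcome matters, can be sketched with a simple return calculation. This is not the reinforcement learning model itself, only the quantity such a learner would tend to maximize. The three counting strategies, their step counts, the reward, and the per-step cost are simplifying assumptions introduced for illustration.

```python
# Assumed step counts for three counting strategies on the problem a + b.
def count_all_steps(a, b):
    return a + b          # count out every item starting from one

def count_on_from_first_steps(a, b):
    return b              # start at a, count up b more

def count_on_from_larger_steps(a, b):
    return min(a, b)      # start at the larger addend, count up the smaller

STRATEGIES = {
    "count all": count_all_steps,
    "count on from first": count_on_from_first_steps,
    "count on from larger": count_on_from_larger_steps,
}

def episode_return(steps, correct=True, reward=1.0, step_cost=0.05):
    """Reward arrives only at the end; every step taken costs a little.

    The reward and step_cost values are illustrative assumptions.
    """
    return (reward if correct else 0.0) - step_cost * steps

def rank_strategies(a, b):
    """Return strategy names ordered from highest to lowest return."""
    returns = {name: episode_return(steps(a, b))
               for name, steps in STRATEGIES.items()}
    return sorted(returns, key=returns.get, reverse=True)

print(rank_strategies(2, 5))
```

For 2 + 5 all three strategies give the right answer, but counting on from the larger addend takes the fewest steps and therefore has the highest return. For 5 + 2 the two count-on strategies tie, so problems such as 2 + 5 are the ones that give a clear advantage to the more efficient strategy.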

Slide33

Intuitive Geometry Project: Motivations

Geometrical intuition as developing gradually with age, through a series of ‘levels’.

A year’s course in Geometry has no special impact on students’ ‘level’.

Lessons learned from presenting students with the Socratic Dialog uncovering the supposed prior understanding of how to create a square with twice the area of a given square.

In spite of profession of ‘understanding’ after walking through the dialog, those with many misconceptions can’t demonstrate the solution on a new square.

Geometry as grounded in Intuition but ultimately connected to proof

Carmenga, Transforming Geometric Proof with Reflections, Rotations and Translations

Henderson, Experiencing Geometry

Slide34

Example: ASA (Informal)

Given: ∠A≅∠A’, AC≅A’C’, ∠C≅∠C’

Prove: △ABC≅△A’B’C’

Idea: translate A to A’, rotate △ABC until AC coincides with A’C’, and reflect over A’C’ if necessary. Then the whole triangle coincides!

Slide35

Example: ASA (Rigorous)

Given: ∠A≅∠A’, AC≅A’C’, ∠C≅∠C’

Translate △ABC so that A coincides with A’.

Rotate △ABC so that ray AC coincides with ray A’C’. Since AC≅A’C’, C coincides with C’.

If B and B’ are on different sides of line AC, reflect △ABC over line AC.

Since ∠A≅∠A’ and AC and A’C’ coincide and are on the same side of the angle, ∠A coincides with ∠A’.

Since the angles coincide, the other rays AB and A’B’ coincide.

Similarly, since ∠C≅∠C’ and AC and A’C’ coincide, ∠C coincides with ∠C’ and the other rays CB and C’B’ coincide.

Since ray AB coincides with ray A’B’ and ray CB with ray C’B’ and two lines intersect in at most one point, B coincides with B’.

Since all sides and angles coincide, △ABC≅△A’B’C’.
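The translate, rotate, reflect construction can be tried numerically. The sketch below represents points as complex numbers and applies the three steps to a triangle and a rigidly moved copy of it. It is an illustration of the construction, not a proof, and the function names and the example coordinates are assumptions.

```python
import cmath

def superimpose(tri, tri_prime):
    """Move triangle (A, B, C) onto (A', B', C') by the three steps above.

    Points are complex numbers. Returns the moved vertices.
    """
    a, b, c = tri
    a2, b2, c2 = tri_prime

    # Step 1: translate so that A coincides with A'.
    shift = a2 - a
    a, b, c = a + shift, b + shift, c + shift

    # Step 2: rotate about A' so that ray AC coincides with ray A'C'.
    turn = cmath.phase(c2 - a2) - cmath.phase(c - a)
    rotation = cmath.exp(1j * turn)
    b = a2 + (b - a2) * rotation
    c = a2 + (c - a2) * rotation

    # Step 3: reflect over line A'C' if B and B' lie on different sides.
    def side(p):
        u, v = c2 - a2, p - a2
        return u.real * v.imag - u.imag * v.real  # sign of the cross product

    if side(b) * side(b2) < 0:
        u = (c2 - a2) / abs(c2 - a2)              # unit direction of A'C'
        b = a2 + u * u * (b - a2).conjugate()     # mirror image of B
    return a, b, c

def coincides(tri_1, tri_2, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(tri_1, tri_2))

# Hypothetical example: a triangle and a mirrored, rotated, shifted copy.
triangle = (0 + 0j, 1 + 2j, 3 + 0j)
copy = tuple((2 + 1j) + cmath.exp(0.7j) * z.conjugate() for z in triangle)
print(coincides(superimpose(triangle, copy), copy))
```

Because the copy is mirrored, the first two steps leave B on the wrong side of line A’C’, and the reflection in step 3 is what brings B onto B’.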

Slide36

How Can We Begin to Make Progress on this Ambitious Project?

Create a simulated agent that must carry out tasks in a virtual world

Similar to the DeepMind Atari project

Agent has a few actions it can perform

Change its point of view on its (2-D) world

Move, rotate and flip objects

Measure, copy, and construct objects to instruction using geometry tools

E.g., adjustable straightedge and compass

Demo some time during the conference!

Train the agent using incremental supervised learning

Initial tasks:

Find named objects, find objects that have the same shape as a given target

Translate, rotate, and flip objects to fit them through shaped holes

Learn to measure length and angle

Learn to impose alternative frames of reference to identify congruent shapes under rotations and flips

Slide37

Two Relevant Ideas

Use eye movements to bring objects to the center of gaze, where they can be recognized in canonical position.

Plaut, McClelland, & Seidenberg, NCPW 1

Mnih et al, 2014

Use transforming autoencoders to learn effects of transformations on the way objects look.

Hinton et al, 2011

Slide38

Later Stages

V. Carry out Euclidean constructions to instruction

- Based on given diagrams

- Purely from instruction

VI. Determine perimeter and area of polygons and circles in given units

VII. Establish correspondence between figures A and B

VIII. Solve complex geometry problems requiring several intermediate inferences and computations

Slide39

One Example Problem

Slide40

Challenges, Open Questions, and Broader Directions

Explicit cognition, metacognitive knowledge, and their relation to knowledge in connections

Abstract mathematics, proof, and justification

Are they, at least in part, extensions of concrete embodied reasoning?

A broader understanding of understanding in embodied terms

We don’t just map mathematical expressions onto learned conceptual structures; we do the same in general when we understand ideas expressed in language

These are the questions for the next ten years