Parsing APL for Static Analysis

Uploaded On 2015-12-09

Presentation Transcript


Parsing APL for Static Analysis

Speaker: Anders Schack-Nielsen, Ph.D.

Sept. 23rd, 2014

Outline

- Background and Motivation
- Variable Types
- Static Analysis Tool
- Parsing APL
- Kind Inference
- BNF Grammar

Background

APL codebase in SimCorp:
- 68,000 functions
- 1.7m lines of code
- 215 APL developers actively developing and maintaining this codebase
- Additional functions and developers covering utilities, etc.

Motivation – example

- Programmer A writes function foo.
  - A makes certain assumptions about the input arguments.
  - A documents his assumptions.
- Programmer K writes function bar and calls foo.
  - K has read the header of foo, so he knows what sort of arguments to supply. For good measure he also tests it.

∇ foo args
  ⍝: args should be ...
  mat1 mat2 strings←args
  ... (implicit assumptions)
∇

∇ bar
  ...
  foo mat1 mat2 strings
  ...
∇

Motivation – what can go wrong?

- Translating: A’s assumptions → documentation of foo → K’s understanding
  - A lot can be missed, misinterpreted, or left out.
  - Tests might not catch this.
- Maintenance:
  - Updates to foo
  - Updates to bar
  - When the assumptions change, the update must be synchronized across three places to stay correct.

Solution: Variable Types

- Formalize assumptions – make them checkable.
- Introduce variable types and static analysis.
  - Check the header specification.
  - Check foo against its header.
  - Check the call to foo from bar.

∇ foo args
  ⍝: args[1] : mat1 As vtINT[;]
  ⍝:     [2] : mat2 As vtINT[mat1:1;mat1:2]
  ⍝:     [3] : strings As vtCHAR[][]
  mat1 mat2 strings←args
  ...
∇
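The header annotations above turn A's assumptions into constraints that a machine can check. The slides describe a static checker. As a rough illustration of what the constraints mean, the Python sketch below checks concrete argument shapes instead. That makes it a dynamic analogue, not the tool's method. The encoding is hypothetical: `HeaderSpec`, `check_call`, and axis references written as `(variable, axis)` pairs. The nested `strings As vtCHAR[][]` argument is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of header lines such as "mat2 As vtINT[mat1:1;mat1:2]":
# each axis is either None (any length) or (other_var, axis), meaning "same length
# as axis `axis` (1-based, as written in the header) of `other_var`".
AxisSpec = Optional[Tuple[str, int]]
HeaderSpec = Dict[str, Tuple[str, Tuple[AxisSpec, ...]]]
ArgShapes = Dict[str, Tuple[str, Tuple[int, ...]]]   # name -> (element type, concrete shape)


def check_call(spec: HeaderSpec, args: ArgShapes) -> List[str]:
    """Return a list of violations of `spec` by the supplied argument types and shapes."""
    errors: List[str] = []
    for name, (base, axes) in spec.items():
        got_base, shape = args[name]
        if got_base != base:
            errors.append(f"{name}: expected {base}, got {got_base}")
        if len(shape) != len(axes):                     # a rank error
            errors.append(f"{name}: expected rank {len(axes)}, got {len(shape)}")
            continue
        for axis_no, (axis, length) in enumerate(zip(axes, shape), start=1):
            if axis is None:                            # unconstrained axis, e.g. the ";" in vtINT[;]
                continue
            other, other_axis = axis
            other_shape = args[other][1]
            if other_axis > len(other_shape):           # `other` has the wrong rank; reported there
                continue
            expected = other_shape[other_axis - 1]
            if length != expected:
                errors.append(f"{name}: axis {axis_no} has length {length}, "
                              f"but {other}:{other_axis} has length {expected}")
    return errors


# mat1 As vtINT[;] and mat2 As vtINT[mat1:1;mat1:2], taken from the header of foo
FOO_SPEC: HeaderSpec = {
    "mat1": ("vtINT", (None, None)),
    "mat2": ("vtINT", (("mat1", 1), ("mat1", 2))),
}
```

Suppose a call passes a 4×3 `mat2` with a 3×4 `mat1`. The check then returns two axis-mismatch messages. If `mat1` has the wrong rank, that is reported as a single rank error, and references to its missing axes are skipped rather than crashing.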

Static Analysis Tool

- The first type checker was introduced at SimCorp 10 years ago. It worked well, but had many flaws.
- Recently, the tool has been rewritten from scratch.
  - Many interesting challenges, e.g. parsing APL.
  - 8k lines of F#, including 500 lines of FsLex/FsYacc.
  - Understands the semantics of all APL symbols and control-flow constructs.
- The new type checker catches many things the old one did not, e.g. potentially all rank errors.

Real-life example

r←y textStringRemove x;h
⍝2: y As vtSTRING|vtSTRING[] : (string1)(string2).....
⍝3: x As vtSTRING|vtCHAR[;] : text vector or matrix
⍝4: r As vtSTRING|vtCHAR[;] : resulting text vector or m
...

...
dbsource←' ' textStringRemove dbsource
tokens←'(' textSplitAt ')' textStringRemove dbsource
...

vtSTRING is a shorthand for vtCHAR[].
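The Python sketch below makes the annotation syntax concrete. It expands the `vtSTRING` shorthand and splits `|` unions into alternatives. This is an assumed reading of the notation, not the tool's actual type grammar. Each `[...]` group is taken to add one nesting level, with rank equal to the number of `;` plus one. The names `TYPE_RE`, `parse_type` and `matches` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical reading of the annotation syntax: a base type followed by zero or more
# "[...]" groups; each group adds one nesting level whose rank is (number of ';') + 1.
SHORTHANDS = {"vtSTRING": "vtCHAR[]"}                  # per the slide: vtSTRING = vtCHAR[]
TYPE_RE = re.compile(r"(vt[A-Z]+)((?:\[[^\]]*\])*)")

TypeShape = Tuple[str, Tuple[int, ...]]                # (base type, rank of each nesting level)


def parse_alternative(alt: str) -> TypeShape:
    m = TYPE_RE.fullmatch(alt.strip())
    if m is None:
        raise ValueError(f"cannot parse type {alt!r}")
    base, brackets = m.groups()
    if base in SHORTHANDS:                             # e.g. vtSTRING[] -> vtCHAR[][]
        return parse_alternative(SHORTHANDS[base] + brackets)
    ranks = tuple(group.count(";") + 1 for group in re.findall(r"\[([^\]]*)\]", brackets))
    return base, ranks


def parse_type(annotation: str) -> List[TypeShape]:
    """Split a union such as 'vtSTRING|vtCHAR[;]' into its alternatives."""
    return [parse_alternative(alt) for alt in annotation.split("|")]


def matches(value: TypeShape, annotation: str) -> bool:
    """True if a value of the given (base, ranks) shape fits one of the alternatives."""
    return value in parse_type(annotation)
```

Under this reading, a character scalar such as `' '` has no bracket level at all. As a result, `matches(("vtCHAR", ()), "vtSTRING|vtSTRING[]")` is `False`.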

Parsing APL

APL is statically unparsable! However, it becomes parsable with only a few very minor restrictions.

In fact, we can make an LALR(1) parser: it is possible to define a completely disambiguated BNF grammar, which lets us generate the parser code with Yacc. That is, we can parse APL from left to right with only a single token of lookahead and no backtracking.

Parsing APL

x/¨y

If x is a function:
MonadicApply
  OperatorApply Each
    OperatorApply Reduce
      FunctionVariable(x)
  ArrayVariable(y)

If x is an array:
DyadicApply
  ArrayVariable(x)
  OperatorApply Each
    Replicate
  ArrayVariable(y)

Parsing APL

- Values come in 3 kinds: Arrays, Functions, and Operators.
  - Sequences of arrays form vectors.
  - Functions associate to the right.
  - Operators associate to the left.
- Parsing needs complete kind information.
- Solution: separate parsing into two steps with a kind inference algorithm sandwiched in between:
  1. Parse control flow and matching parentheses, effectively representing expressions as mere token trees.
  2. Do kind inference on the token trees.
  3. Parse the token trees as full-fledged expressions.
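Step 1 can be illustrated with a minimal Python sketch that turns a flat token list into a token tree by matching parentheses. It handles parentheses only, not control-flow structures. It also assumes the input is already tokenized. The name `token_tree` is hypothetical.

```python
from typing import List, Union

TokenTree = List[Union[str, "TokenTree"]]   # a nested list is a parenthesised group


def token_tree(tokens: List[str]) -> TokenTree:
    """Group a flat token list into a tree by matching parentheses (no kinds needed)."""
    stack: List[TokenTree] = [[]]           # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])                # open a new group
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unmatched ')'")
            group = stack.pop()
            stack[-1].append(group)         # the finished group becomes one item of its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("unmatched '('")
    return stack[0]
```

For example, the tokens of `(f/)¨y` become `[['f', '/'], '¨', 'y']`. No kind information is needed to build this tree, which is what allows the step to run before kind inference.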

Kind Inference

Kind inference naturally proceeds from left to right. Consider e.g. “x.y”, “x/y”, and “x[y]”. In each case, the role of what follows x depends on whether x is an array or a function: namespace access or inner product, replicate or reduce, indexing or axis specification.

Left-to-right, depth-first scan:
- Individual tokens can have their kind inferred from the kinds of the tokens to their left.
- Parenthesized expressions can have their compound kind inferred from the kinds of their subparts.
- Tag all tokens with their kind, and all left parentheses with the compound kind they enclose.
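The Python sketch below shows one way to implement the scan. The token rules cover only a few primitives, and names not found in the environment default to arrays. The compound-kind computation is passed in as a function, and the rewrite rules under "Inferring compound kinds" are one candidate for it. All names here are hypothetical, including `token_kind`, `tag` and `PRIMITIVE_KINDS`.

```python
from typing import Callable, Dict, List, Tuple, Union

TokenTree = List[Union[str, "TokenTree"]]   # a nested list is a parenthesised group

# Hypothetical kinds for a handful of primitives (A, F, M, D and "." as in the kind rules).
PRIMITIVE_KINDS: Dict[str, str] = {"+": "F", "-": "F", "⍴": "F", "¨": "M", "∘": "D", "⍤": "D"}


def token_kind(tok: str, left: str, env: Dict[str, str]) -> str:
    """Kind of one token, given the kind of whatever stands to its left ('' if nothing)."""
    if tok in ("/", "\\"):      # replicate/expand after an array, reduce/scan otherwise
        return "F" if left == "A" else "M"
    if tok == ".":              # namespace indexer after an array, inner/outer product otherwise
        return "." if left == "A" else "D"
    if tok in PRIMITIVE_KINDS:
        return PRIMITIVE_KINDS[tok]
    if tok in env:              # names: looked up in the environment of known globals
        return env[tok]
    return "A"                  # anything else is treated as a literal array


def tag(tree: TokenTree, env: Dict[str, str],
        group_kind: Callable[[List[str]], str]) -> Tuple[list, str]:
    """Left-to-right, depth-first scan. A token becomes (token, kind); a group becomes
    ('(', compound_kind, tagged_subtree). Returns the tagged tree and its compound kind."""
    tagged: list = []
    kinds: List[str] = []
    for item in tree:
        left = kinds[-1] if kinds else ""
        if isinstance(item, list):                   # recurse into the group first
            sub, kind = tag(item, env, group_kind)
            tagged.append(("(", kind, sub))          # the left parenthesis carries the compound kind
        else:
            kind = token_kind(item, left, env)
            tagged.append((item, kind))
        kinds.append(kind)                           # becomes the left context of the next item
    return tagged, group_kind(kinds)
```

Take `env = {"x": "A", "y": "A"}` as an example. The tokens of `x/¨y` are then tagged `A F M A`, with `/` read as replicate. If `x` is bound to a function instead, `/` is tagged `M` (reduce).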

Inferring compound kinds

Kind sequence rewrite algorithm:
- Uses an elaboration into 5 kinds: Array (A), Function (F), Namespace indexer (.), Monadic operator (M), and Dyadic operator (D).

Rules (K stands for any kind, Ks for the rest of the sequence):

K → K (done)
A A Ks → A Ks
A . Ks → Ks
A F Ks → A (done)
K D D Ks → A (done)   // outer product
F F Ks → A (done)
F A Ks → A (done)
[AF] M Ks → F Ks
K D A A Ks → K D A Ks
K D A . Ks → K D Ks
[AF] D [AF] Ks → F Ks

*Assumes a minor preprocessing step that wraps “A . F” with parentheses. Also slightly simplified, assuming no “A . D” or “A . M”.
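Below is a direct Python transcription of the rules. The slide does not say how the rules are applied, so two things are assumptions here. First, the rules are tried top to bottom against the front of the sequence. Second, a sequence that matches no rule is treated as an error. The name `compound_kind` is hypothetical.

```python
from typing import List, Optional, Sequence

ARRAY_OR_FUNC = {"A", "F"}


def compound_kind(kinds: Sequence[str]) -> str:
    """Rewrite the front of a kind sequence until a rule says 'done'.
    Kinds: 'A' array, 'F' function, '.' namespace indexer, 'M'/'D' monadic/dyadic operator."""
    ks: List[str] = list(kinds)

    def at(i: int) -> Optional[str]:            # kind at position i, or None past the end
        return ks[i] if i < len(ks) else None

    while True:
        if len(ks) == 1:                                            # K -> K (done)
            return ks[0]
        if at(0) == "A" and at(1) == "A":                           # A A Ks -> A Ks
            ks[0:2] = ["A"]
        elif at(0) == "A" and at(1) == ".":                         # A . Ks -> Ks
            del ks[0:2]
        elif at(0) == "A" and at(1) == "F":                         # A F Ks -> A (done)
            return "A"
        elif at(1) == "D" and at(2) == "D":                         # K D D Ks -> A (done), outer product
            return "A"
        elif at(0) == "F" and at(1) in ("F", "A"):                  # F F Ks, F A Ks -> A (done)
            return "A"
        elif at(0) in ARRAY_OR_FUNC and at(1) == "M":               # [AF] M Ks -> F Ks
            ks[0:2] = ["F"]
        elif at(1) == "D" and at(2) == "A" and at(3) == "A":        # K D A A Ks -> K D A Ks
            ks[2:4] = ["A"]
        elif at(1) == "D" and at(2) == "A" and at(3) == ".":        # K D A . Ks -> K D Ks
            del ks[2:4]
        elif at(0) in ARRAY_OR_FUNC and at(1) == "D" and at(2) in ARRAY_OR_FUNC:
            ks[0:3] = ["F"]                                         # [AF] D [AF] Ks -> F Ks
        else:
            raise ValueError("no rule applies to: " + " ".join(ks))
```

Take the `x/¨y` example with `x` a function. The sequence `F M M A` is rewritten to `F M A`, then to `F A`, and the result is `A`. The sequence `F M` (as in `(+/)`) yields `F`.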

BNF Grammar (sample excerpt)

Expr:
  | Vector Func Expr                          { DyadicApply(vector $1, $2, $3) }
  | FuncLeftmost Expr                         { MonadicApply($1, $2) }
  | Vector                                    { vector $1 }

Vector:
  | SimpleExprLeftmost                        { [$1] }
  | SimpleExprLeftmost SimpleVector           { $1 :: $2 }

SimpleExprLeftmost:
  | AtomicExpr                                { $1 }
  | Vector LBRACKET IdxList RBRACKET          { Index(vector $1, $3) }
  | NameSpaceExprLeftmost AtomicExpr          { NameSpace($1, $2) }

AtomicExpr:
  | LPAREN Expr RPAREN                        { $2 }
  | IDARRAY                                   { IdenArray($1) }
  | INT                                       { Value(parseInt($1)) }
  | FLOAT                                     { Value(Float(parseDouble($1))) }
  | STRING                                    { Value(parseStringValue($1)) }
  | APLVALUE                                  { Value(AplNil(parseNiladic($1))) }

Func:
  | Func MonadicOperator                      { MonadicOpApply($1, $2) }
  | Func DyadicOperatorFuncFunc SimpleFunc    { DyadicOpApply($2, FF($1, $3)) }
  | JOT DyadicOperatorFuncFunc SimpleFunc     { DyadicOpApply($2, FF(AplFunction(OuterProduct), $3)) }
  | SimpleFunc                                { $1 }
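The grammar above is fed to FsYacc. As a rough stand-in, the Python sketch below is a hand-written recursive-descent parser for a small subset of it. The subset covers the `Expr` alternatives, plus `Func` with monadic operators only. It works on tokens already tagged with kinds. The sketch shows how the `Expr` rule makes functions associate to the right and how the `Func` rule makes operators associate to the left. The tuple trees borrow node names from the `x/¨y` slide. Everything else is hypothetical, including `parse_expr`, `parse_func` and `PRIMITIVES`.

```python
from typing import List, Tuple

Tok = Tuple[str, str]   # (token text, kind) after kind inference: "A", "F" or "M"

# Hypothetical node names for a few primitives; the name depends on the token's kind.
PRIMITIVES = {("/", "F"): "Replicate", ("/", "M"): "Reduce", ("¨", "M"): "Each", ("-", "F"): "Minus"}


def leaf(tok: Tok):
    text, kind = tok
    if (text, kind) in PRIMITIVES:
        return PRIMITIVES[(text, kind)]
    if kind == "A":
        return ("Value", text) if text[0].isdigit() else ("ArrayVariable", text)
    if kind == "F":
        return ("FunctionVariable", text)
    raise SyntaxError(f"unexpected {kind} token {text!r}")


def parse_func(toks: List[Tok], i: int):
    """Func: SimpleFunc MonadicOperator*; operators associate to the left."""
    if i >= len(toks) or toks[i][1] != "F":
        raise SyntaxError("expected a function")
    func = leaf(toks[i])
    i += 1
    while i < len(toks) and toks[i][1] == "M":
        func = ("OperatorApply", leaf(toks[i]), func)    # wrap the function built so far
        i += 1
    return func, i


def parse_expr(toks: List[Tok]):
    """Expr: Vector Func Expr | Func Expr | Vector; functions associate to the right."""
    i = 0
    while i < len(toks) and toks[i][1] == "A":           # Vector: a strand of arrays
        i += 1
    vector = [leaf(t) for t in toks[:i]]
    left = None
    if vector:
        left = vector[0] if len(vector) == 1 else ("Vector", tuple(vector))
    if i == len(toks):
        if left is None:
            raise SyntaxError("empty expression")
        return left
    func, j = parse_func(toks, i)
    right = parse_expr(toks[j:])                          # everything to the right is the argument
    if left is None:
        return ("MonadicApply", func, right)
    return ("DyadicApply", left, func, right)
```

Suppose `x/¨y` is tagged as `A F M A`. The parser then gives the `DyadicApply` tree, with `Each` applied to `Replicate`. Tagged as `F M M A`, it gives the `MonadicApply` tree, with `Each` applied to `Reduce` of `x`.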

Restrictions – the fine print

What were the restrictions that make parsing possible?
- Defined operators need a static description of whether their operands are functions or arrays. This is not a problem in practice.
- We need an environment describing all global variables and functions. We need this anyway to typecheck function calls.
- (A minor quirk related to the :Until-:AndIf construction.)
