Parsing APL for Static Analysis
Speaker: Anders Schack-Nielsen, Ph.D.
Sept. 23rd, 2014
Outline
Background and Motivation
Variable Types
Static Analysis Tool
Parsing APL
Kind Inference
BNF Grammar
Background
APL codebase in SimCorp:
68,000 functions
1.7m lines of code
215 APL developers actively developing and maintaining this codebase
Additional functions and developers covering utilities, etc.
Motivation – example
Programmer A writes function foo.
A makes certain assumptions about the input arguments.
A documents his assumptions.
Programmer K writes function bar and calls foo.
K has read the header of foo, so he knows what sort of arguments to supply. For good measure, he also tests it.
∇ foo args
⍝: args should be ...
mat1 mat2 strings←args
... (implicit assumptions)
∇
∇ bar
...
foo mat1 mat2 strings
...
∇
Motivation – what can go wrong?
Translating: A’s assumptions → documentation of foo → K’s understanding
A lot can be missed, misinterpreted, or left out.
Tests might not catch this.
Maintenance:
Updates to foo
Updates to bar
When assumptions change, a synchronized update is required in three places (foo, its documentation, and bar) for everything to stay correct.
Solution: Variable Types
Formalize assumptions – make them checkable.
Introduce variable types and static analysis.
Check the header specification.
Check foo against its header.
Check the call to foo from bar.
∇ foo args
⍝: args[1] : mat1 As vtINT[;]
⍝:     [2] : mat2 As vtINT[mat1:1;mat1:2]
⍝:     [3] : strings As vtCHAR[][]
mat1 mat2 strings←args
...
∇
Static Analysis Tool
The first type checker was introduced in SimCorp 10 years ago. It worked well, but had many flaws. Recently, the tool has been rewritten from scratch.
Many interesting challenges, e.g. parsing APL.
8k lines of F#, including 500 lines of FsLex/FsYacc.
Understands the semantics of all APL symbols and control-flow constructs.
The new type checker catches many things the old one did not, e.g. potentially all rank errors.
Real life example
∇ r←y textStringRemove x;h
⍝2: y As vtSTRING|vtSTRING[] : (string1)(string2).....
⍝3: x As vtSTRING|vtCHAR[;] : text vector or matrix
⍝4: r As vtSTRING|vtCHAR[;] : resulting text vector or m
...
∇
...
dbsource←' ' textStringRemove dbsource
tokens←'(' textSplitAt ')' textStringRemove dbsource
...
vtSTRING is shorthand for vtCHAR[].
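A minimal sketch of how such declarations could be represented and compared, assuming a nested-tuple encoding of types that is not taken from the tool: each [...] group wraps one array level whose rank is the number of semicolons plus one, | separates alternatives, and vtSTRING is expanded to vtCHAR[] before parsing. Dimension annotations such as mat1:1 are ignored. One plausible reading of the example, offered only as an assumption: in APL a one-character literal such as ' ' or ')' is a scalar, so it matches neither alternative declared for y.

```python
import re

# Assumed encoding: a scalar type is its base name ("CHAR", "INT", ...);
# an array type is ("array", item_type, rank). Dimension annotations are ignored.

def parse_alternative(text):
    """Parse one alternative such as 'vtCHAR[;]' or 'vtSTRING[]'."""
    text = text.replace("vtSTRING", "vtCHAR[]")  # vtSTRING is shorthand for vtCHAR[]
    match = re.fullmatch(r"vt([A-Z]+)((?:\[[^\]]*\])*)", text)
    if match is None:
        raise ValueError(f"unrecognised type: {text!r}")
    ty = match.group(1)
    for dims in re.findall(r"\[([^\]]*)\]", match.group(2)):
        ty = ("array", ty, dims.count(";") + 1)  # [] is rank 1, [;] is rank 2, ...
    return ty


def parse_type(text):
    """'|' separates alternatives; the value may have any one of them."""
    return {parse_alternative(alt.strip()) for alt in text.split("|")}


def accepts(declared, actual):
    """True if the actual argument type is one of the declared alternatives."""
    return actual in parse_type(declared)


STRING = ("array", "CHAR", 1)
y_type = "vtSTRING|vtSTRING[]"
print(accepts(y_type, STRING))   # True: a character vector
print(accepts(y_type, "CHAR"))   # False: a scalar character such as ' '
```

Expanding shorthands before comparison keeps the checker's core small: it only ever compares base types and ranks.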
Parsing APL
APL is statically unparsable! Whether a name denotes an array, a function, or an operator changes how an expression must be parsed, and in general this is only known at run time. However, APL becomes parsable with only a few very minor restrictions.
In fact, we can make an LALR(1) parser: it is possible to define a completely disambiguated BNF grammar, allowing us to code-generate the parser using Yacc. That is, we can parse APL from left to right with only a single token of lookahead and no backtracking.
Parsing APL
x/¨y
Parse when x is a function:
MonadicApply
  OperatorApply Each
    OperatorApply Reduce
      FunctionVariable (x)
  ArrayVariable (y)
Parse when x is an array:
DyadicApply
  ArrayVariable (x)
  OperatorApply Each
    Replicate
  ArrayVariable (y)
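The point of the two trees is that the token sequence alone does not determine the parse. The Python sketch below builds either tree from the same four tokens, depending only on the kind of x. It handles just this shape of expression (a name, a run of / and ¨, and an array name); the function name and tuple encoding are hypothetical, with node names borrowed from the slide's trees.

```python
# Node names follow the slide's trees; the tuple encoding is hypothetical.
OPERATOR_NAMES = {"/": "Reduce", "¨": "Each"}


def parse_slash_each(tokens, kinds):
    """Parse e.g. ['x', '/', '¨', 'y'] where kinds maps each name to 'A' or 'F'."""
    first, *ops, last = tokens
    if kinds[last] != "A":
        raise ValueError("the rightmost name must be an array")
    right = ("ArrayVariable", last)
    if kinds[first] == "F":
        # x is a function: each following symbol is an operator applied to it
        func = ("FunctionVariable", first)
        for op in ops:
            func = ("OperatorApply", OPERATOR_NAMES[op], func)
        return ("MonadicApply", func, right)
    # x is an array: the first '/' is the Replicate function with x as left argument
    if not ops or ops[0] != "/":
        raise ValueError("expected '/' after an array")
    func = "Replicate"
    for op in ops[1:]:
        func = ("OperatorApply", OPERATOR_NAMES[op], func)
    return ("DyadicApply", ("ArrayVariable", first), func, right)


print(parse_slash_each(["x", "/", "¨", "y"], {"x": "F", "y": "A"}))
print(parse_slash_each(["x", "/", "¨", "y"], {"x": "A", "y": "A"}))
```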
Parsing APL
Values come in 3 kinds: Arrays, Functions, and Operators.
Sequences of Arrays form vectors.
Functions associate to the right.
Operators associate to the left.
Parsing needs complete kind information.
Solution: Separate parsing into two steps with a kind inference algorithm sandwiched in between:
1. Parse control flow and matching parentheses, effectively representing expressions as mere token trees.
2. Do kind inference on the token trees.
3. Parse the token trees as full-fledged expressions.
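The first step can be sketched in a few lines. The Python below is a minimal illustration, assuming the input is an already tokenized flat list; it matches only round parentheses and ignores control-flow structure and square brackets. The name token_tree is hypothetical.

```python
def token_tree(tokens):
    """Group a flat token list into nested lists by matching parentheses."""
    stack = [[]]                      # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested group
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unmatched ')'")
            group = stack.pop()       # close the group and attach it to its parent
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("unmatched '('")
    return stack[0]


print(token_tree(["(", "+", "/", ")", "¨", "y"]))  # [['+', '/'], '¨', 'y']
```

No kinds are needed at this stage: parentheses can be matched purely syntactically, which is why this pass can run before kind inference.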
Kind Inference
Kind inference naturally proceeds from left to right. Consider e.g. “x.y”, “x/y”, and “x[y]”: how the token after x is read depends on the kind of x (for instance, / is the Reduce operator after a function but the Replicate function after an array).
Left-to-right, depth-first scan:
Individual tokens can be inferred based on the kinds of the tokens to their left.
Parenthesized expressions can have their compound kind inferred based on the kinds of their subparts.
Tag all tokens with their kind and all left parentheses with the compound kind they enclose.
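A minimal sketch of the left-to-right, depth-first scan over token trees, written in Python for a deliberately tiny APL subset (names from an environment, digits, a few primitive functions, ¨, and the hybrid /; no dyadic operators, no ".", no brackets). The compound kind of a group is computed by a crude stand-in rule that is only valid for this subset, not by the slide's full algorithm. All names here are hypothetical.

```python
# Hypothetical token classes for a tiny APL subset: no dyadic operators, no '.', no '['.
PRIMITIVE_FUNCTIONS = set("+-×÷⌈⌊⍴,")
MONADIC_OPERATORS = {"¨"}


def token_kind(token, left_kind, env):
    """Kind of one token, given the kind of whatever is immediately to its left."""
    if token in env:                  # global names come from the environment
        return env[token]
    if token.isdigit():
        return "A"
    if token in PRIMITIVE_FUNCTIONS:
        return "F"
    if token in MONADIC_OPERATORS:
        return "M"
    if token == "/":
        # Reduce (an operator) after a function, including a derived one;
        # Replicate (a function) after an array
        if left_kind in ("F", "M"):
            return "M"
        if left_kind == "A":
            return "F"
        raise SyntaxError("'/' needs an operand to its left")
    raise NameError(f"unknown token {token!r}")


def group_kind(kinds):
    """Crude stand-in valid only for this subset: a well-formed group that
    ends in an array is an array, otherwise it is a function."""
    return "A" if kinds[-1] == "A" else "F"


def infer_kinds(tree, env):
    """Tokens become (token, kind) pairs; parenthesized groups become
    (tagged_group, compound_kind) pairs."""
    tagged = []
    left_kind = None
    for item in tree:
        if isinstance(item, list):
            if not item:
                raise SyntaxError("empty parentheses")
            inner = infer_kinds(item, env)    # depth-first: tag the group's contents
            kind = group_kind([k for _, k in inner])
            tagged.append((inner, kind))
        else:
            kind = token_kind(item, left_kind, env)
            tagged.append((item, kind))
        left_kind = kind
    return tagged


print(infer_kinds(["x", "/", "¨", "y"], {"x": "F", "y": "A"}))
```

With x bound to a function, / is tagged M (Reduce); with x bound to an array, the same scan tags it F (Replicate).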
Inferring compound kinds
Kind sequence rewrite algorithm:
Uses an elaboration into 5 kinds: Array (A), Function (F), Namespace indexer (.), Monadic operator (M), and Dyadic operator (D).
Rewrite rules (K is any kind, [AF] is A or F, Ks is the rest of the sequence):
K ⇒ K (done)
A A Ks ⇒ A Ks
A . Ks ⇒ Ks
A F Ks ⇒ A (done)
K D D Ks ⇒ A (done) // outer product
F F Ks ⇒ A (done)
F A Ks ⇒ A (done)
[AF] M Ks ⇒ F Ks
K D A A Ks ⇒ K D A Ks
K D A . Ks ⇒ K D Ks
[AF] D [AF] Ks ⇒ F Ks
*Assumes a minor preprocessing step that wraps “A . F” with parentheses. Also slightly simplified by assuming no “A . D” or “A . M”.
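The rule table translates almost directly into a loop. The Python sketch below is a transcription under two assumptions not stated on the slide: rules are tried in the order listed (this matters for K D A A Ks versus [AF] D [AF] Ks, since an array right operand must first absorb any adjacent arrays), and each rule rewrites the front of the sequence. The function name is hypothetical.

```python
def compound_kind(kinds):
    """Rewrite a kind sequence from the left until its compound kind is known.
    Kinds: 'A' array, 'F' function, '.' namespace indexer,
    'M' monadic operator, 'D' dyadic operator."""
    ks = list(kinds)
    while True:
        if not ks:
            raise ValueError("empty kind sequence")
        if len(ks) == 1:                                      # K => K (done)
            return ks[0]
        a, b = ks[0], ks[1]
        c = ks[2] if len(ks) > 2 else None
        d = ks[3] if len(ks) > 3 else None
        if a == "A" and b == "A":                             # A A Ks => A Ks
            ks = ["A"] + ks[2:]
        elif a == "A" and b == ".":                           # A . Ks => Ks
            ks = ks[2:]
        elif a == "A" and b == "F":                           # A F Ks => A (done)
            return "A"
        elif b == "D" and c == "D":                           # K D D Ks => A (done)
            return "A"
        elif a == "F" and b in ("F", "A"):                    # F F Ks, F A Ks => A (done)
            return "A"
        elif a in ("A", "F") and b == "M":                    # [AF] M Ks => F Ks
            ks = ["F"] + ks[2:]
        elif b == "D" and c == "A" and d == "A":              # K D A A Ks => K D A Ks
            ks = ks[:3] + ks[4:]
        elif b == "D" and c == "A" and d == ".":              # K D A . Ks => K D Ks
            ks = ks[:2] + ks[4:]
        elif a in ("A", "F") and b == "D" and c in ("A", "F"):  # [AF] D [AF] Ks => F Ks
            ks = ["F"] + ks[3:]
        else:
            raise ValueError(f"no rule applies to {''.join(ks)}")


print(compound_kind("FM"))     # F: a derived function such as +/
print(compound_kind("FMA"))    # A: +/ applied to an array
```

Every non-final rule shortens the sequence, so the loop always terminates.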
BNF Grammar (sample excerpt)
Expr:
  | Vector Func Expr { DyadicApply(vector $1, $2, $3) }
  | FuncLeftmost Expr { MonadicApply($1, $2) }
  | Vector { vector $1 }
Vector:
  | SimpleExprLeftmost { [$1] }
  | SimpleExprLeftmost SimpleVector { $1 :: $2 }
SimpleExprLeftmost:
  | AtomicExpr { $1 }
  | Vector LBRACKET IdxList RBRACKET { Index(vector $1, $3) }
  | NameSpaceExprLeftmost AtomicExpr { NameSpace($1, $2) }
AtomicExpr:
  | LPAREN Expr RPAREN { $2 }
  | IDARRAY { IdenArray($1) }
  | INT { Value(parseInt($1)) }
  | FLOAT { Value(Float(parseDouble($1))) }
  | STRING { Value(parseStringValue($1)) }
  | APLVALUE { Value(AplNil(parseNiladic($1))) }
Func:
  | Func MonadicOperator { MonadicOpApply($1, $2) }
  | Func DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF($1, $3)) }
  | JOT DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF(AplFunction(OuterProduct), $3)) }
  | SimpleFunc { $1 }
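To show how the Expr and Func rules encode right-associative functions and left-associative operators, here is a hand-written recursive-descent sketch in Python over (token, kind) pairs. This is a swapped-in technique: the parser described here is generated by FsYacc from an LALR(1) grammar, whereas this sketch is a predictive parser with one token of lookahead, replaces the left-recursive Func rule with a loop, and covers only arrays, functions, and monadic operators. Node names follow the grammar's actions; the tuple encoding and function names are hypothetical.

```python
def parse_expr(tokens, pos=0):
    """Expr: Vector Func Expr | FuncLeftmost Expr | Vector. Returns (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("expected an expression")
    if tokens[pos][1] == "F":                          # FuncLeftmost Expr
        func, pos = parse_func(tokens, pos)
        arg, pos = parse_expr(tokens, pos)             # right recursion: functions associate right
        return ("MonadicApply", func, arg), pos
    vec, pos = parse_vector(tokens, pos)
    if pos < len(tokens) and tokens[pos][1] == "F":    # Vector Func Expr
        func, pos = parse_func(tokens, pos)
        right, pos = parse_expr(tokens, pos)
        return ("DyadicApply", vec, func, right), pos
    return vec, pos                                    # Vector


def parse_vector(tokens, pos):
    """One or more adjacent arrays form a vector."""
    items = []
    while pos < len(tokens) and tokens[pos][1] == "A":
        items.append(tokens[pos][0])
        pos += 1
    if not items:
        raise SyntaxError(f"expected an array at token {pos}")
    return ("Vector", items), pos


def parse_func(tokens, pos):
    """A function followed by monadic operators; the loop folds operators to the left."""
    func = tokens[pos][0]
    pos += 1
    while pos < len(tokens) and tokens[pos][1] == "M":
        func = ("MonadicOpApply", func, tokens[pos][0])
        pos += 1
    return func, pos


def parse(tokens):
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return tree


print(parse([("2", "A"), ("×", "F"), ("3", "A"), ("+", "F"), ("4", "A")]))
```

For 2×3+4 the sketch nests the addition inside the multiplication's right argument, reflecting APL's right-to-left evaluation, while +/¨x yields ((+/)¨) applied to x.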
Restrictions – the fine print
What were those restrictions that allow parsing?
Defined operators need a static description of whether their operands are functions or arrays. This is not a problem in practice.
We need an environment describing all global variables and functions. We need this anyway to typecheck function calls.
(Minor quirk related to the :Until-:AndIf construction.)