Slide 1: The Evolution of AWK: Computational Thinking in Programming Language Design
Alfred V. Aho, aho@cs.columbia.edu
COMS W4995 Design - Bjarne Stroustrup
April 24, 2015
Slide 2: Computational Thinking
Computational thinking is a fundamental skill for everyone, not just for computer scientists. To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.
Jeannette M. Wing, Computational Thinking, CACM, vol. 49, no. 3, pp. 33-35, 2006
Slide 3: What is Computational Thinking?
The thought processes involved in formulating problems so their solutions can be represented as computation steps and algorithms.
Alfred V. Aho, Computation and Computational Thinking, The Computer Journal, vol. 55, no. 7, pp. 832-835, 2012
Slide 4: Software in Our World Today
How much software does the world use today?
  Guesstimate: over one trillion lines of source code
What is the sunk cost of the legacy base?
  $100 per line of finished, tested source code
How many bugs are there in the legacy base?
  10 to 10,000 defects per million lines of source code
A. V. Aho, Software and the Future of Programming Languages, Science, February 27, 2004, pp. 1131-1133
Slide 5: Programming Languages Today
Today there are thousands of programming languages.
The website http://www.99-bottles-of-beer.net has programs in over 1,500 different programming languages and variations to generate the lyrics to the song “99 Bottles of Beer.”
Slide 6: “99 Bottles of Beer”
99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.
98 bottles of beer on the wall, 98 bottles of beer.
Take one down and pass it around, 97 bottles of beer on the wall.
.
.
.
2 bottles of beer on the wall, 2 bottles of beer.
Take one down and pass it around, 1 bottle of beer on the wall.
1 bottle of beer on the wall, 1 bottle of beer.
Take one down and pass it around, no more bottles of beer on the wall.
No more bottles of beer on the wall, no more bottles of beer.
Go to the store and buy some more, 99 bottles of beer on the wall.
[Traditional]
Slide 7: “99 Bottles of Beer” in C++
#include <iostream>
using namespace std;

int main()
{
    int bottles = 99;
    while ( bottles > 0 )
    {
        cout << bottles << " bottle(s) of beer on the wall," << endl;
        cout << bottles << " bottle(s) of beer." << endl;
        cout << "Take one down, pass it around," << endl;
        cout << --bottles << " bottle(s) of beer on the wall." << endl;
    }
    return 0;
}
[Tim Robinson, http://www.99-bottles-of-beer.net/language-c++-109.html]
Slide 8: Stroustrup’s Version
#include <iostream>
using namespace std;

int main()
{
    for (int bottles = 99; bottles>0; --bottles)
        cout << bottles << " bottle(s) of beer on the wall,\n"
             << bottles << " bottle(s) of beer.\n"
             << "Take one down, pass it around,\n"
             << bottles-1 << " bottle(s) of beer on the wall.\n\n";
}
[Bjarne Stroustrup, personal communication, April 17, 2015]
Slide 9: “99 Bottles of Beer” in the Whitespace language
[Andrew Kemp, http://compsoc.dur.ac.uk/whitespace/]
Slide 10: Why Are There So Many Languages?
One language cannot serve all application areas well
  e.g., programming web pages (JavaScript)
  e.g., electronic design automation (VHDL)
  e.g., parser generation (YACC)
Programmers often have strongly held opinions about
  what makes a good language
  how programming should be done
There is no universally accepted metric for a good language!
Slide 11: Evolution of Programming Languages
1970        2015 (TIOBE Index, April 2015)   2015 (PYPL Index, April 2015)
Fortran     Java                             Java
Lisp        C                                PHP
Cobol       C++                              Python
Algol 60    Objective-C                      C#
APL         C#                               C++
Snobol 4    JavaScript                       C
Simula 67   PHP                              JavaScript
Basic       Python                           Objective-C
PL/1        Visual Basic                     MATLAB
Pascal      Visual Basic .NET                R
Slide 12: Evolutionary Forces on Languages
Increasing diversity of applications
Stress on increasing programmer productivity and shortening time to market
Need to improve software security, reliability and maintainability
Emphasis on mobility and distribution
Support for parallelism and concurrency
New mechanisms for modularity
Trend toward multi-paradigm programming
Slide 13: Models of Computation in Languages
Early programming languages usually had only one model of computation:
  Fortran (1957): Procedural
  Lisp (1958): Functional
  Simula (1967): Object oriented
  Prolog (1972): Logic
  SQL (1974): Relational algebra
Slide 14: Models of Computation in Languages
New programming languages are often designed around several models of computation
And legacy languages are incorporating additional models of computation to support multiple programming paradigms
Slide 15: Example: Elm
Elm is a functional programming language for declaratively creating web browser based graphical user interfaces.
It uses functional reactive programming and purely functional graphical layout to build user interfaces without any destructive updates.
Elm was designed in 2012 by Evan Czaplicki.
The key features in Elm are signals, immutability, static types, and interoperability with HTML, CSS, and JavaScript.
elm-lang.org
Slide 16: Example: Rust
Rust is a general-purpose, multi-paradigm, compiled programming language.
It is designed to be a safe, concurrent, practical language.
First pre-alpha release of the Rust compiler was in 2012.
It supports pure-functional, concurrent-actor, imperative-procedural, and object-oriented programming styles.
Rust was originally designed by Graydon Hoare and is supported by Mozilla Research.
It advertises itself as “a systems programming language that runs blazingly fast, prevents almost all crashes, and eliminates data races.”
www.rust-lang.org
Slide 17: Example: Swift
Swift is Apple’s new programming language for iOS and OS X whose code is designed to work with Objective-C.
It was designed with code safety and performance in mind.
Some of the features of Swift include
  named parameters
  inferred types
  modules
  automatic memory management
  closures with unified function pointers
  functional programming patterns like map and filter
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/
Slide 18: The Birth of AWK
AWK is a scripting language for routine data-processing tasks designed by Al Aho, Brian Kernighan, Peter Weinberger at Bell Labs around 1977
Each of the co-designers had slightly different motivations
  Aho wanted a generalized grep
  Kernighan wanted a programmable editor
  Weinberger wanted a database query tool
All co-designers wanted a simple, easy-to-use language
Slide 19: Kleene Regular Expressions
Regular Expression   Matches
c                    the character c itself except when it is (, ), or *
r1 | r2              r1 or r2
r1 r2                r1 followed by r2
r*                   zero or more instances of r
( r )                r
‘|’ has lowest precedence, then concatenation, then *
‘|’ and concatenation are left associative, * is right associative
For example, a | b*c = (a) | ((b*) c)
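The precedence rule can be checked mechanically. The sketch below is only an illustration: it assumes Python's re module, which applies the same precedence to these three operators, and the names IMPLICIT, EXPLICIT, REGROUPED, SAMPLES, and same_language are hypothetical.

```python
import re

# The slide's example, written without and with explicit parentheses.
IMPLICIT = re.compile(r"a|b*c")
EXPLICIT = re.compile(r"(a)|((b*)c)")
# A different grouping, to show that precedence changes the language.
REGROUPED = re.compile(r"(a|b)*c")

SAMPLES = ["", "a", "c", "bc", "bbc", "ac", "ab", "abc", "ba"]


def same_language(p, q, samples):
    """True when p and q accept exactly the same strings among samples."""
    return all(
        (p.fullmatch(s) is None) == (q.fullmatch(s) is None) for s in samples
    )
```

On these samples the implicit and explicit forms agree, while the regrouped pattern accepts "ac", which a | b*c rejects.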
Slide 20: Kleene Regular Expressions and Finite Automata are Equivalent
The set of strings denoted by any Kleene regular expression can be recognized by a deterministic finite automaton.
The set of strings recognized by any finite automaton can be denoted by a regular expression.
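As a small illustration of the first direction, here is a hand-built DFA for the expression a | b*c. This is a sketch under stated assumptions: the state numbering, the DELTA table, and the helper names are hypothetical and were written by hand, not produced by any standard construction. Python's re module serves only as a reference matcher.

```python
import re
from itertools import product

# Hypothetical hand-built DFA for a | b*c.
# States: 0 = start, 1 = read "a", 2 = inside b*, 3 = read the final "c".
DELTA = {
    (0, "a"): 1,
    (0, "b"): 2,
    (0, "c"): 3,
    (2, "b"): 2,
    (2, "c"): 3,
}
ACCEPTING = {1, 3}


def dfa_accepts(text: str) -> bool:
    """Run the DFA; a missing transition means the input is rejected."""
    state = 0
    for ch in text:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


def agrees_with_regex(max_len: int = 4) -> bool:
    """Compare the DFA with the regular expression on every string
    over {a, b, c} of length at most max_len."""
    pattern = re.compile(r"a|b*c")
    for length in range(max_len + 1):
        for chars in product("abc", repeat=length):
            text = "".join(chars)
            if dfa_accepts(text) != (pattern.fullmatch(text) is not None):
                return False
    return True
```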
Slide 21: Grep Regular Expressions
Regular Expression   Matches
c                    the character c itself except when c is . [ ] ^ $ * \
r1r2                 r1 followed by r2
r*                   zero or more instances of r
.                    any character
^                    beginning of line when ^ is first character in regexp
$                    end of line when $ is last character in regexp
[abc]                an a, b, or c
[a-z]                any lower-case letter
[^abc]               any character except an a, b, or c
[^0-9]               any character that is not a digit
\c                   c unless c is ( ) or a digit
\( r \)              tagged regular expression that matches r; the matched strings are available as \1, \2, etc.
Slide 22: Back Referencing in Grep Regular Expressions
Grep regular expressions can match non-regular languages:
  ^\([ab]*\)\1$ matches strings of the form xx where x is any string of a’s and b’s
Back referencing makes the string pattern matching problem NP-complete:
  Theorem: Let r be a grep regular expression with back referencing and s an input string. Determining whether s contains a substring matched by r is NP-complete.
  Proof. Reduction from vertex-cover. See A. V. Aho, “Algorithms for finding patterns in strings”, Handbook of Theoretical Computer Science, MIT Press 1990, pp. 255-300.
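The xx example can be tried directly. The sketch below assumes Python's re module, which writes groups as ( ) rather than grep's \( \) but supports the same \1 back reference; the names SQUARE and is_square are hypothetical.

```python
import re

# grep's ^\([ab]*\)\1$ in Python syntax: a group, then a repeat of
# whatever text that group matched.
SQUARE = re.compile(r"^([ab]*)\1$")


def is_square(s: str) -> bool:
    """True when s has the form xx for some string x of a's and b's."""
    return SQUARE.fullmatch(s) is not None
```

For instance, is_square("abab") holds with x = "ab", while is_square("aba") does not, since no string x gives xx = "aba".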
Slide 23: Egrep Regular Expressions
Started with grep regular expressions except for back referencing
Added ‘|’ for union as in Kleene regular expressions
Added parentheses for grouping as in Kleene regular expressions
Current egrep uses POSIX regular expressions.
Slide 24: Egrep Regular Expression Pattern-Matching Algorithm
Constructs the transitions for a deterministic finite automaton on demand from the regular expression using an LR(0)-like algorithm
Uses a fixed size cache to store the transitions of the DFA
Adds a transition in a given state on a given input character only when it is needed
When the cache becomes full, it flushes it and adds transitions to the empty cache as needed
Observed time complexity given a regular expression r and an input string s is O(|r| + |s|). It is an open question if this can be achieved in the worst case.
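The on-demand idea can be sketched in a few lines. This is not egrep's implementation: instead of the LR(0)-like construction from the regular expression, the sketch assumes a small hand-built NFA for (a|b)*abb and builds DFA states lazily as sets of NFA states. The names NFA_DELTA, LazyDFA, cache_limit, and flushes are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical hand-built NFA for (a|b)*abb, with no epsilon moves.
# NFA_DELTA[(state, char)] is the set of possible next states.
NFA_DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START = 0
NFA_ACCEPT = {3}


class LazyDFA:
    """DFA transitions are computed only when needed and kept in a
    fixed-size cache that is flushed when it becomes full."""

    def __init__(self, cache_limit: int = 4) -> None:
        self.cache_limit = cache_limit
        self.cache: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}
        self.flushes = 0  # how many times the cache was emptied

    def step(self, state: FrozenSet[int], ch: str) -> FrozenSet[int]:
        key = (state, ch)
        if key not in self.cache:
            if len(self.cache) >= self.cache_limit:
                self.cache.clear()  # cache full: flush and start again
                self.flushes += 1
            nxt: Set[int] = set()
            for q in state:
                nxt |= NFA_DELTA.get((q, ch), set())
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def matches(self, text: str) -> bool:
        """True when the whole of text is matched by (a|b)*abb."""
        state = frozenset({NFA_START})
        for ch in text:
            state = self.step(state, ch)
            if not state:
                return False  # dead state: no NFA state survives
        return bool(state & NFA_ACCEPT)
```

The full DFA for this expression has eight transitions, so a cache_limit of 4 forces at least one flush once enough different inputs have been scanned, without changing any match result.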
Slide 25: The Evolution of AWK
Prototypical use cases
  selection: “print all lines containing the word AWK in the first field”
    $1 ~ /AWK/
  transformation: “print the second and first field of every line”
    { print $2, $1 }
  report generation: “sum the values in the first field of every line and then print the sum and average”
    { sum += $1 }
    END { print "sum = " sum, "avg = " sum/NR }
Slide 26: Structure and Invocation of an AWK Program
An AWK program is a sequence of pattern-action statements
  pattern { action }
  pattern { action }
  . . .
Each pattern is a boolean combination of regular, numeric, and string expressions
An action is a C-like program
If there is no { action }, the default is to print the line
Invocation
  awk 'program' [file1 file2 . . . ]
  awk -f progfile [file1 file2 . . . ]
Slide 27: AWK’s Model of Computation: Pattern-Action Programming
for each file
  for each line of the current file
    for each pattern in the AWK program
      if the pattern matches the input line then
        execute the associated action
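The nested loop can be written out as a minimal sketch. It assumes patterns and actions are plain Python callables and that each "file" is already a list of lines without trailing newlines; the names Rule, run_rules, selected, and SELECT_AWK are hypothetical, and BEGIN/END rules are left out.

```python
from typing import Callable, Iterable, List, Tuple

Fields = List[str]  # fields[0] plays the role of $0, fields[1] of $1, ...
Rule = Tuple[Callable[[Fields], bool], Callable[[Fields], None]]


def run_rules(rules: List[Rule], files: Iterable[Iterable[str]]) -> None:
    """Apply every (pattern, action) rule to every line of every file."""
    for lines in files:                       # for each file
        for line in lines:                    # for each line of the file
            fields = [line] + line.split()    # split the line into fields
            for pattern, action in rules:     # for each pattern
                if pattern(fields):           # if the pattern matches
                    action(fields)            # execute the action


# Rough counterpart of the selection program  $1 ~ /AWK/  with the
# default action (print the line) replaced by appending to a list.
selected: List[str] = []
SELECT_AWK: Rule = (
    lambda f: len(f) > 1 and "AWK" in f[1],
    lambda f: selected.append(f[0]),
)
```

Calling run_rules([SELECT_AWK], files) on a list of line lists collects every line whose first field contains AWK, in input order.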
Slide 28: AWK in a Nutshell - I
Input is read automatically across multiple files
  lines are split into fields $1, $2, . . ., $NF
  whole line is $0
Variables are dynamic and can contain string or numeric values or both
  no declarations: types determined by context and use
  initialized to 0 and empty string
  built-in variables for frequently used values
Operators work on strings or numbers
  coerce type/value according to context
  what does $1 == $2 mean?
Associative arrays take arbitrary subscripts
Regular expressions as in egrep
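One way to read the question about $1 == $2 is the POSIX awk rule for input fields: when both fields look like numbers they are compared numerically, otherwise they are compared as strings. The sketch below approximates that rule; the regular expression for "looks like a number" and the names looks_numeric and awk_equal are assumptions made for illustration and may differ from any particular awk in edge cases.

```python
import re

# Approximation of "looks like a decimal number": optional sign, digits
# with an optional fraction, optional exponent, surrounding blanks allowed.
_NUMERIC = re.compile(r"^\s*[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?\s*$")


def looks_numeric(field: str) -> bool:
    return _NUMERIC.match(field) is not None


def awk_equal(a: str, b: str) -> bool:
    """Compare two input fields the way $1 == $2 is commonly described:
    numerically if both look numeric, as strings otherwise."""
    if looks_numeric(a) and looks_numeric(b):
        return float(a) == float(b)
    return a == b
```

Under this reading "1.0" and "1" are equal, as are "1e2" and "100", while "1.0" and "one" fall back to string comparison and differ.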
Slide 29: AWK in a Nutshell - II
Control-flow operators similar to C
  if-else, while, for, do
Built-in functions for arithmetic, string processing, regular expressions, editing text, . . .
Supports user-defined functions
printf for formatted output
getline for input from files or processes
Slide 30: Some Useful AWK “One-liners”
Print the total number of input lines
  END { print NR }
Print the last field of every line
  { print $NF }
Print each line preceded by its line number
  { print NR, $0 }
Print all non-empty lines
  NF > 0
What does this AWK program do?
  !x[$0]++
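For the last one-liner, a step-by-step rendering may help. The sketch assumes the usual AWK semantics: an unset array element reads as 0, ++ is a post-increment, and a pattern with no action prints the line. The function name first_occurrences is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def first_occurrences(lines: Iterable[str]) -> List[str]:
    """Mirror !x[$0]++ : keep a line only the first time it is seen."""
    seen: Dict[str, int] = defaultdict(int)  # x: unset entries read as 0
    out: List[str] = []
    for line in lines:
        is_new = not seen[line]   # !x[$0] tests the count before ++
        seen[line] += 1           # the post-increment
        if is_new:
            out.append(line)      # no action given, so the line is printed
    return out
```

In other words, the program removes duplicate lines while preserving the order of first appearance.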
Slide 31: AWK Summary
AWK is a scripting language designed for routine data-processing tasks on strings and numbers
E.g.: given a list of name-value pairs, print the total value associated with each name.
Input:
  alice 10
  eve 20
  bob 15
  alice 30
Program:
  { total[$1] += $2 }
  END { for (x in total) print x, total[x] }
Output:
  eve 20
  bob 15
  alice 40
An AWK program is a sequence of pattern-action statements
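For comparison, the same computation can be sketched with a dictionary standing in for the associative array. The sketch assumes every line holds a name followed by an integer value; the function name totals is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def totals(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the second field per first field, like total[$1] += $2."""
    total: Dict[str, int] = defaultdict(int)  # unset entries start at 0
    for line in lines:
        name, value = line.split()[:2]
        total[name] += int(value)   # assumes integer values
    return dict(total)              # the END rule would print these pairs
```

As in AWK, the order in which the names come out of the table is not part of the result; only the totals are.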
Slide 32: Comparison: Regular Expression Pattern Matching in Perl, Python, Ruby vs. AWK
Time to check whether a?^n a^n matches a^n
regular expression and text size n
Russ Cox, Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) [http://swtch.com/~rsc/regexp/regexp1.html, 2007]
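The automaton-based approach can be sketched for this one family of patterns. The code below tracks the set of live NFA states for a?^n a^n instead of backtracking, so the work per input character is bounded by a polynomial in n. It is specialized to this pattern family and is not a general matcher; the names pathological_pattern and nfa_match are hypothetical.

```python
import re
from typing import Set


def pathological_pattern(n: int) -> str:
    """The pattern a?^n a^n as a string, for use with a backtracking matcher."""
    return "a?" * n + "a" * n


def nfa_match(n: int, text: str) -> bool:
    """Match text against a?^n a^n by simulating its NFA.

    State s means that s of the 2n pattern elements have been passed.
    """
    def closure(states: Set[int]) -> Set[int]:
        # Each of the first n elements is an optional a? and may be skipped.
        out = set(states)
        for s in states:
            out.update(range(s + 1, n + 1))  # empty range when s >= n
        return out

    current = closure({0})
    for ch in text:
        if ch != "a":
            return False
        # Reading an "a" advances every state that still has an element left.
        current = closure({s + 1 for s in current if s < 2 * n})
    return 2 * n in current
```

For small n the result can be compared against re.fullmatch(pathological_pattern(n), text); for larger n the backtracking matcher slows down sharply on the input a^n while the state-set simulation does not.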
Slide 33: “99 Bottles of Beer” in AWK (bottled version)
BEGIN{
split( \
"no mo"\
"rexxN"\
"o mor"\
"exsxx"\
"Take "\
"one dow"\
"n and pas"\
"s it around"\
", xGo to the "\
"store and buy s"\
"ome more, x bot"\
"tlex of beerx o"\
"n the wall" , s,\
"x"); for( i=99 ;\
i>=0; i--){ s[0]=\
s[2] = i ; print \
s[2 + !(i) ] s[8]\
s[4+ !(i-1)] s[9]\
s[10]", " s[!(i)]\
s[8] s[4+ !(i-1)]\
s[9]".";i?s[0]--:\
s[0] = 99; print \
s[6+!i]s[!(s[0])]\
s[8] s[4 +!(i-2)]\
s[9]s[10] ".\n";}}
[Wilhem Weske, http://www.99-bottles-of-beer.net/language-awk-1910.html]