
The Evolution of AWK: Computational Thinking in Programming Language Design

Alfred V. Aho, aho@cs.columbia.edu. COMS W4995 Design - Bjarne Stroustrup. April 24, 2015.



Slide1

The Evolution of AWK: Computational Thinking in Programming Language Design

Alfred V. Aho

aho@cs.columbia.edu

COMS W4995 Design - Bjarne Stroustrup

April 24, 2015

Slide2

Computational Thinking

Computational thinking is a fundamental skill for everyone, not just for computer scientists. To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.

Jeannette M. Wing, “Computational Thinking,” CACM, vol. 49, no. 3, pp. 33-35, 2006

Slide3

What is Computational Thinking?

The thought processes involved in formulating problems so their solutions can be represented as computation steps and algorithms.

Alfred V. Aho, “Computation and Computational Thinking,” The Computer Journal, vol. 55, no. 7, pp. 832-835, 2012

Slide4

Software in Our World Today

How much software does the world use today?

Guesstimate: over one trillion lines of source code

What is the sunk cost of the legacy base?

$100 per line of finished, tested source code

How many bugs are there in the legacy base?

10 to 10,000 defects per million lines of source code

A. V. Aho, “Software and the Future of Programming Languages,” Science, February 27, 2004, pp. 1131-1133

Slide5

Programming Languages Today

Today there are thousands of programming languages.

The website http://www.99-bottles-of-beer.net has programs in over 1,500 different programming languages and variations to generate the lyrics to the song “99 Bottles of Beer.”

Slide6

“99 Bottles of Beer”

99 bottles of beer on the wall, 99 bottles of beer.

Take one down and pass it around, 98 bottles of beer on the wall.

98 bottles of beer on the wall, 98 bottles of beer.

Take one down and pass it around, 97 bottles of beer on the wall.

.

.

.

2 bottles of beer on the wall, 2 bottles of beer.

Take one down and pass it around, 1 bottle of beer on the wall.

1 bottle of beer on the wall, 1 bottle of beer.

Take one down and pass it around, no more bottles of beer on the wall.

No more bottles of beer on the wall, no more bottles of beer.

Go to the store and buy some more, 99 bottles of beer on the wall.

[Traditional]

Slide7

“99 Bottles of Beer” in C++

    #include <iostream>
    using namespace std;

    int main()
    {
        int bottles = 99;
        while ( bottles > 0 )
        {
            cout << bottles << " bottle(s) of beer on the wall," << endl;
            cout << bottles << " bottle(s) of beer." << endl;
            cout << "Take one down, pass it around," << endl;
            cout << --bottles << " bottle(s) of beer on the wall." << endl;
        }
        return 0;
    }

[Tim Robinson, http://www.99-bottles-of-beer.net/language-c++-109.html]

Slide8

Stroustrup’s Version

    #include <iostream>
    using namespace std;

    int main()
    {
        for (int bottles = 99; bottles > 0; --bottles)
            cout << bottles << " bottle(s) of beer on the wall,\n"
                 << bottles << " bottle(s) of beer.\n"
                 << "Take one down, pass it around,\n"
                 << bottles - 1 << " bottle(s) of beer on the wall.\n\n";
    }

[Bjarne Stroustrup, personal communication, April 17, 2015]

Slide9

“99 Bottles of Beer” in the Whitespace language

[Andrew Kemp, http://compsoc.dur.ac.uk/whitespace/]

Slide10

Why Are There So Many Languages?

One language cannot serve all application areas well

e.g., programming web pages (JavaScript)

e.g., electronic design automation (VHDL)

e.g., parser generation (YACC)

Programmers often have strongly held opinions about

what makes a good language

how programming should be done

There is no universally accepted metric for a good language!

Slide11

Evolution of Programming Languages

1970: Fortran, Lisp, Cobol, Algol 60, APL, Snobol 4, Simula 67, Basic, PL/1, Pascal

2015 (TIOBE Index, April 2015): Java, C, C++, Objective-C, C#, JavaScript, PHP, Python, Visual Basic, Visual Basic .NET

2015 (PYPL Index, April 2015): Java, PHP, Python, C#, C++, C, JavaScript, Objective-C, MATLAB, R

Slide12

Evolutionary Forces on Languages

Increasing diversity of applications

Stress on increasing programmer productivity and shortening time to market

Need to improve software security, reliability and maintainability

Emphasis on mobility and distribution

Support for parallelism and concurrency

New mechanisms for modularity

Trend toward multi-paradigm programming

Slide13

Models of Computation in Languages

Early programming languages usually had only one model of computation:

Fortran (1957): Procedural

Lisp (1958): Functional

Simula (1967): Object oriented

Prolog (1972): Logic

SQL (1974): Relational algebra

Slide14

Models of Computation in Languages

New programming languages are often designed around several models of computation

And legacy languages are incorporating additional models of computation to support multiple programming paradigms

Slide15

Example: Elm

Elm is a functional programming language for declaratively creating web browser based graphical user interfaces.

It uses functional reactive programming and purely functional graphical layout to build user interfaces without any destructive updates.

Elm was designed in 2012 by Evan Czaplicki.

The key features in Elm are signals, immutability, static types, and interoperability with HTML, CSS, and JavaScript.

elm-lang.org

Slide16

Example: Rust

Rust is a general-purpose, multi-paradigm, compiled programming language.

It is designed to be a safe, concurrent, practical language.

First pre-alpha release of the Rust compiler was in 2012.

It supports pure-functional, concurrent-actor, imperative-procedural, and object-oriented programming styles.

Rust was originally designed by Graydon Hoare and is supported by Mozilla Research.

It advertises itself as “a systems programming language that runs blazingly fast, prevents almost all crashes, and eliminates data races.”

www.rust-lang.org

Slide17

Example: Swift

Swift is Apple’s new programming language for iOS and OS X whose code is designed to work with Objective-C.

It was designed with code safety and performance in mind.

Some of the features of Swift include

    named parameters
    inferred types
    modules
    automatic memory management
    closures with unified function pointers
    functional programming patterns like map and filter

https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/

Slide18

The Birth of AWK

AWK is a scripting language for routine data-processing tasks designed by Al Aho, Brian Kernighan, and Peter Weinberger at Bell Labs around 1977

Each of the co-designers had slightly different motivations

    Aho wanted a generalized grep
    Kernighan wanted a programmable editor
    Weinberger wanted a database query tool

All co-designers wanted a simple, easy-to-use language

Slide19

Kleene Regular Expressions

    Regular Expression    Matches
    c                     the character c itself except when it is (, ), or *
    r1 | r2               r1 or r2
    r1 r2                 r1 followed by r2
    r*                    zero or more instances of r
    ( r )                 r

‘|’ has lowest precedence, then concatenation, then *

‘|’ and concatenation are left associative, * is right associative

For example, a | b*c = (a) | ((b*) c)
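The precedence rules can be checked mechanically. The following is a minimal sketch, assuming Python's `re` module as a stand-in matcher (its syntax includes the Kleene operators used here); the sample strings and the contrasting pattern `(a|b)*c` are illustrative choices, not part of the original slides.

```python
import re

# Two spellings of the same Kleene expression: implicit precedence
# versus fully parenthesized. They should accept exactly the same strings.
implicit = re.compile(r"a|b*c")
explicit = re.compile(r"(a)|((b*)c)")
# A different grouping, for contrast: here union is applied before the star.
regrouped = re.compile(r"(a|b)*c")

samples = ["a", "c", "bc", "bbbc", "ac", "abc", "", "b"]

for s in samples:
    # fullmatch asks whether the whole string belongs to the language
    assert bool(implicit.fullmatch(s)) == bool(explicit.fullmatch(s))

accepted = [s for s in samples if implicit.fullmatch(s)]
differs = [s for s in samples
           if bool(implicit.fullmatch(s)) != bool(regrouped.fullmatch(s))]
print(accepted)  # ['a', 'c', 'bc', 'bbbc']
print(differs)   # ['a', 'ac', 'abc']
```

The second list shows that changing the grouping changes the language, which is why a precedence convention is needed at all.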

Slide20

Kleene Regular Expressions and Finite Automata are Equivalent

The set of strings denoted by any Kleene regular expression can be recognized by a deterministic finite automaton.

The set of strings recognized by any finite automaton can be denoted by a regular expression.
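One direction of this equivalence can be illustrated on a small case. The sketch below hand-builds a deterministic finite automaton for the expression a | b*c and compares it exhaustively with Python's `re` matcher on short strings. The state numbering and the helper name `dfa_accepts` are made up for this example; it is a spot check on one expression, not a proof.

```python
import re
from itertools import product

# Hand-built DFA for the Kleene expression a | b*c (an illustrative choice).
# A missing entry means "no transition": the DFA rejects.
TRANSITIONS = {
    (0, "a"): 1,   # matched the alternative "a"
    (0, "b"): 2,   # reading the b* prefix
    (0, "c"): 3,   # b* matched zero b's, then c
    (2, "b"): 2,
    (2, "c"): 3,
}
ACCEPTING = {1, 3}

def dfa_accepts(s):
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

pattern = re.compile(r"a|b*c")

# Exhaustively compare the two recognizers on all strings up to length 5.
checked = 0
for n in range(6):
    for chars in product("abc", repeat=n):
        s = "".join(chars)
        assert dfa_accepts(s) == bool(pattern.fullmatch(s)), s
        checked += 1
print(checked)  # 364
```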

Slide21

Grep Regular Expressions

    Regular Expression    Matches
    c                     the character c itself except when c is . [ ] ^ $ * \
    r1r2                  r1 followed by r2
    r*                    zero or more instances of r
    .                     any character
    ^                     beginning of line when ^ is first character in regexp
    $                     end of line when $ is last character in regexp
    [abc]                 an a, b, or c
    [a-z]                 any lower-case letter
    [^abc]                any character except an a, b, or c
    [^0-9]                any character that is not a digit
    \c                    c unless c is ( ) or a digit
    \( r \)               tagged regular expression that matches r;
                          the matched strings are available as \1, \2, etc.

Slide22

Back Referencing in Grep Regular Expressions

Grep regular expressions can match non-regular languages:

^\([ab]*\)\1$ matches strings of the form xx where x is any string of a’s and b’s

Back referencing makes the string pattern matching problem NP-complete:

Theorem: Let r be a grep regular expression with back referencing and s an input string. Determining whether s contains a substring matched by r is NP-complete.

Proof. Reduction from vertex-cover. See A. V. Aho, “Algorithms for finding patterns in strings”, Handbook of Theoretical Computer Science, MIT Press 1990, pp. 255-300.
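The back-reference example can be tried directly. This is a small sketch using Python's `re` module, which writes groups as `( )` rather than grep's `\( \)` but supports the same `\1` back reference; the helper name `is_square` and the test strings are made up for illustration.

```python
import re

# Python spelling of the grep pattern ^\([ab]*\)\1$ :
# group 1 captures some string x of a's and b's, and \1 demands x again.
square = re.compile(r"^([ab]*)\1$")

def is_square(s):
    """True if s = xx for some string x of a's and b's."""
    return square.match(s) is not None

results = {s: is_square(s) for s in ["", "aa", "abab", "abba", "aba", "abaaba"]}
print(results)
```

Here "abab" and "abaaba" are accepted while "abba" and "aba" are rejected; the set of such strings xx is a standard example of a non-regular language.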

Slide23

Egrep Regular Expressions

Started with grep regular expressions except for back referencing

Added ‘|’ for union as in Kleene regular expressions

Added parentheses for grouping as in Kleene regular expressions

Current egrep uses POSIX regular expressions.

Slide24

Egrep Regular Expression Pattern-Matching Algorithm

Constructs the transitions for a deterministic finite automaton on demand from the regular expression using an LR(0)-like algorithm

Uses a fixed size cache to store the transitions of the DFA

Adds a transition in a given state on a given input character only when it is needed

When the cache becomes full, it flushes it and adds transitions to the empty cache as needed

Observed time complexity given a regular expression r and an input string s is O(|r| + |s|). It is an open question if this can be achieved in the worst case.
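The on-demand idea can be sketched in a few lines. The code below is not egrep's LR(0)-like construction: it swaps in the textbook subset construction over a hand-coded NFA for (a|b)*abb, and only illustrates the caching policy described above (build a transition when first needed, flush the whole cache when it is full). The class name `LazyDFA`, the NFA table, and the cache limits are all assumptions made for this example.

```python
# NFA for (a|b)*abb with no epsilon moves (hand-coded for illustration).
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_ACCEPT = 0, 3

class LazyDFA:
    """DFA whose states are sets of NFA states, built one transition at a time."""

    def __init__(self, cache_limit):
        self.cache_limit = cache_limit
        self.cache = {}      # (dfa_state, char) -> dfa_state
        self.computed = 0    # transitions constructed so far
        self.flushes = 0     # times the full cache was discarded

    def step(self, state, ch):
        key = (state, ch)
        nxt = self.cache.get(key)
        if nxt is None:
            # Transition not cached: construct it now from the NFA.
            nxt = frozenset(t for q in state for t in NFA.get((q, ch), ()))
            if len(self.cache) >= self.cache_limit:
                self.cache.clear()   # cache full: flush and refill as needed
                self.flushes += 1
            self.cache[key] = nxt
            self.computed += 1
        return nxt

    def matches(self, text):
        state = frozenset({NFA_START})
        for ch in text:
            state = self.step(state, ch)
        return NFA_ACCEPT in state

text = "ab" * 1000 + "abb"
big = LazyDFA(cache_limit=16)
small = LazyDFA(cache_limit=2)
print(big.matches(text), big.computed)        # True 4
print(small.matches(text), small.flushes > 0)  # True True
```

On this input only four of the eight possible DFA transitions are ever built, and the tiny cache still gives the right answer at the cost of rebuilding transitions after each flush.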

Slide25

The Evolution of AWK

Prototypical use cases

selection: “print all lines containing the word AWK in the first field”

    $1 ~ /AWK/

transformation: “print the second and first field of every line”

    { print $2, $1 }

report generation: “sum the values in the first field of every line and then print the sum and average”

    { sum += $1 }
    END { print "sum = " sum, "avg = " sum/NR }
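For readers who do not know AWK, the report-generation case can be spelled out step by step. This is a rough Python rendering, assuming whitespace-separated fields and made-up sample input; the variable `nr` stands in for AWK's built-in NR, and the loop AWK runs implicitly is written out by hand.

```python
# Sample input (hypothetical): the first field of each line is numeric.
lines = ["3 apples", "5 pears", "10 plums"]

total = 0.0
nr = 0                          # NR: number of records read so far
for line in lines:
    fields = line.split()       # AWK splits each line into $1, $2, ...
    total += float(fields[0])   # { sum += $1 }
    nr += 1

# END { print "sum = " sum, "avg = " sum/NR }
print("sum =", total, "avg =", total / nr)
```

The AWK version is two lines because reading input, splitting fields, and counting records are all done by the language itself.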

Slide26

Structure and Invocation of an AWK Program

An AWK program is a sequence of pattern-action statements

    pattern { action }
    pattern { action }
    . . .

Each pattern is a boolean combination of regular, numeric, and string expressions

An action is a C-like program

If there is no { action }, the default is to print the line

Invocation

    awk 'program' [file1 file2 . . . ]
    awk -f progfile [file1 file2 . . . ]

Slide27

AWK’s Model of Computation: Pattern-Action Programming

    for each file
        for each line of the current file
            for each pattern in the AWK program
                if the pattern matches the input line then
                    execute the associated action
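The nested loops of the pattern-action model translate almost word for word into a small driver. The following is a sketch only: `run_awk`, the representation of rules as pairs of Python callables, and the sample lines are all hypothetical, and BEGIN/END rules, field separators, and built-in variables are left out.

```python
def run_awk(rules, files):
    """rules: list of (pattern, action) pairs of callables taking (line, fields).
    files: list of files, each given as a list of lines."""
    for lines in files:                       # for each file
        for line in lines:                    # for each line of the current file
            fields = line.split()
            for pattern, action in rules:     # for each pattern in the program
                if pattern(line, fields):     # if the pattern matches the line
                    action(line, fields)      # execute the associated action

out = []
rules = [
    # $1 ~ /AWK/   with the default action (print the line)
    (lambda line, f: len(f) > 0 and "AWK" in f[0],
     lambda line, f: out.append(line)),
    # { print $2, $1 }   with the default pattern (always true);
    # assumes every line has at least two fields
    (lambda line, f: True,
     lambda line, f: out.append(f"{f[1]} {f[0]}")),
]
run_awk(rules, [["AWK one", "grep two"], ["sed three"]])
print(out)  # ['AWK one', 'one AWK', 'two grep', 'three sed']
```

Every rule is tried against every line, in program order, which is why the first input line produces two output lines here.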

Slide28

AWK in a Nutshell - I

Input is read automatically across multiple files

    lines are split into fields $1, $2, . . ., $NF
    whole line is $0

Variables are dynamic and can contain string or numeric values or both

    no declarations: types determined by context and use
    initialized to 0 and empty string
    built-in variables for frequently used values

Operators work on strings or numbers

    coerce type/value according to context
    what does $1 == $2 mean?

Associative arrays take arbitrary subscripts

Regular expressions as in egrep
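The question about $1 == $2 has a concrete answer in POSIX-style AWKs: input fields that look like numbers are compared numerically, and otherwise the comparison is done on strings. The sketch below imitates that rule. The helper names are hypothetical, and Python's `float()` is only a rough stand-in for AWK's own test of whether a field looks numeric (for instance, it accepts a few spellings AWK would not).

```python
def looks_numeric(s):
    """Rough stand-in for AWK's 'looks like a number' test on an input field."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def fields_equal(a, b):
    """Imitates $1 == $2 for two input fields."""
    if looks_numeric(a) and looks_numeric(b):
        return float(a) == float(b)   # both look numeric: compare as numbers
    return a == b                      # otherwise: compare as strings

print(fields_equal("10", "10.0"))    # True: numeric comparison
print(fields_equal("1e2", "100"))    # True: numeric comparison
print(fields_equal("abc", "abc"))    # True: string comparison
print(fields_equal("10", "10abc"))   # False: string comparison
```

The design trades predictability for convenience: the same operator gives the answer a user most likely wants for both numeric columns and text columns, with no declarations.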

Slide29

AWK in a Nutshell - II

Control-flow operators similar to C

    if-else, while, for, do

Built-in functions for arithmetic, string processing, regular expressions, editing text, . . .

Supports user-defined functions

printf for formatted output

getline for input from files or processes

Slide30

Some Useful AWK “One-liners”

Print the total number of input lines

    END { print NR }

Print the last field of every line

    { print $NF }

Print each line preceded by its line number

    { print NR, $0 }

Print all non-empty lines

    NF > 0

What does this AWK program do?

    !x[$0]++
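One way to work out the last one-liner is to unfold it. `x[$0]` is an associative-array entry keyed by the whole line, which starts at 0; `x[$0]++` yields the old count and then increments it; `!` makes the pattern true only when the old count was 0. So the program prints each distinct line the first time it appears. A Python rendering of that reading, with a hypothetical function name and sample input:

```python
from collections import defaultdict

def first_occurrences(lines):
    seen = defaultdict(int)    # plays the role of the AWK array x
    kept = []
    for line in lines:
        # !x[$0]++ : test the old count, then increment it
        if seen[line] == 0:
            kept.append(line)  # default action: print the line
        seen[line] += 1
    return kept

print(first_occurrences(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```

Unlike sorting and removing adjacent duplicates, this keeps the input order and needs only one pass.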

Slide31

AWK Summary

AWK is a scripting language designed for routine data-processing tasks on strings and numbers

E.g.: given a list of name-value pairs, print the total value associated with each name.

Input:

    alice 10
    eve 20
    bob 15
    alice 30

Program:

    { total[$1] += $2 }
    END { for (x in total) print x, total[x] }

Output:

    eve 20
    bob 15
    alice 40

An AWK program is a sequence of pattern-action statements
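For comparison, here is the same name-value computation written out in a general-purpose language. This is only a sketch in Python using the slide's sample data; it makes explicit the input loop, field splitting, and string-to-number conversion that AWK performs implicitly.

```python
lines = ["alice 10", "eve 20", "bob 15", "alice 30"]

total = {}                     # associative array, like AWK's total[]
for line in lines:
    name, value = line.split()
    # { total[$1] += $2 } : missing entries start at 0
    total[name] = total.get(name, 0) + int(value)

# END { for (x in total) print x, total[x] }
for name in total:
    print(name, total[name])
```

A Python dict iterates in insertion order, so this prints alice first; AWK leaves the order of `for (x in total)` unspecified, which is why the order of the output lines can differ.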

Slide32

Comparison: Regular Expression Pattern Matching in Perl, Python, Ruby vs. AWK

[Figure: time to check whether a?^n a^n matches a^n, plotted against regular expression and text size n]

Russ Cox, Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) [http://swtch.com/~rsc/regexp/regexp1.html, 2007]
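The test pattern in this comparison is n optional a's followed by n required a's, matched against a string of n a's. A matcher that tries one choice at a time and backtracks may explore exponentially many ways of skipping the optional a's, while an automaton-based matcher tracks all possibilities at once. The sketch below shows the second approach for this restricted pattern shape only; the token representation and the function name `nfa_match` are assumptions made for illustration, not code from any of the tools compared.

```python
def nfa_match(tokens, text):
    """tokens: list of (char, optional) pairs; must match the whole text.
    Tracks the set of all reachable pattern positions, so there is no backtracking."""
    def close(positions):
        # An optional token may be skipped: position i also reaches i + 1.
        result = set(positions)
        for i in positions:
            while i < len(tokens) and tokens[i][1]:
                i += 1
                result.add(i)
        return result

    current = close({0})
    for ch in text:
        # Advance every position whose token matches the input character.
        advanced = {i + 1 for i in current
                    if i < len(tokens) and tokens[i][0] == ch}
        current = close(advanced)
    return len(tokens) in current

n = 25
tokens = [("a", True)] * n + [("a", False)] * n   # a?^n a^n
print(nfa_match(tokens, "a" * n))        # True
print(nfa_match(tokens, "a" * (n - 1)))  # False
```

The set of positions never holds more than 2n + 1 entries, so the work per input character is bounded by a polynomial in n regardless of how many ways the optional a's could be skipped.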

Slide33

“99 Bottles of Beer” in AWK (bottled version)

    BEGIN{
    split( \
    "no mo"\
    "rexxN"\
    "o mor"\
    "exsxx"\
    "Take "\
    "one dow"\
    "n and pas"\
    "s it around"\
    ", xGo to the "\
    "store and buy s"\
    "ome more, x bot"\
    "tlex of beerx o"\
    "n the wall" , s,\
    "x"); for( i=99 ;\
    i>=0; i--){ s[0]=\
    s[2] = i ; print \
    s[2 + !(i) ] s[8]\
    s[4+ !(i-1)] s[9]\
    s[10]", " s[!(i)]\
    s[8] s[4+ !(i-1)]\
    s[9]".";i?s[0]--:\
    s[0] = 99; print \
    s[6+!i]s[!(s[0])]\
    s[8] s[4 +!(i-2)]\
    s[9]s[10] ".\n";}}

[Wilhem Weske, http://www.99-bottles-of-beer.net/language-awk-1910.html]