CS780 (Prasad) L3Lexing: Adapted from material by Prof. Alex Aiken and Prof. George Necula (UCB), Prof. Saman Amarasinghe (MIT)

Uploaded On 2015-08-25

Presentation Transcript

CS780 (Prasad) L3Lexing
Adapted from material by: Prof. Alex Aiken and Prof. George Necula (UCB), Prof. Saman Amarasinghe (MIT)

Outline

Lexical Analysis
• What do we want to do? Example:
    if (i == j)
        z = 0;
    else
        z = 1;
• The input is just a string of characters:
    \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Goal: Partition the input string into substrings (lexemes) to identify tokens.

Examples of Tokens
• Operators: ! ^
• Keywords: if while for int double
• Numeric literals

Token Type, Lexeme
• Token type: A syntactic category/grouping.
    - In English: noun, verb, adjective, …
    - In a programming language: identifier, integer, keyword, ‘;’, ‘[’, …
• Lexeme: Concrete manifestation of a token.
    - Ex: In a case-insensitive language, the lexemes associated with the keyword token if are: if, IF, iF, If
• Attribute: “Value of interest” about a token.
    - Numerical value of an integer token.
    - Name (string) associated with an identifier token.

Lexical Analyzer
• Designing a Lexical Analyzer
    1. Define a finite set of tokens.
    2. Describe which strings (lexemes) belong to each token.
• Implementing a Lexical Analyzer
    - Recognize tokens from the corresponding lexemes.
    - Return the value (attribute) and the type of the token, e.g. Num(…), ID(“X6035”).
    - Eliminate substrings that do not contribute to parsing.

Example: Language Design Decisions
• FORTRAN rule: Whitespace is insignificant.
    - VAR1 is the same as VA R1
    - Consider
        DO 5 I = 1,25
        DO 5 I = 1.25
    - The first is DO 5 I = 1 , 25 (the header of a DO loop).
    - The second is DO5I = 1.25 (an assignment to the variable DO5I).
• “Lookahead” may be required to decide where one token ends and the next token begins.
    - Even our simple examples, such as the earlier one, have lookahead issues.
• PL/I keywords are not reserved:
    IF ELSE THEN THEN = ELSE; ELSE ELSE = THEN
• Ada and VHDL require 2-lookahead because of the ’) problem.
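The partitioning goal and the lookahead issue can be made concrete with a small hand-written scanner. What follows is a minimal sketch in Python, not code from the lecture: the `Token` class, the `tokenize` function, and the token type names (`KEYWORD`, `ID`, `NUM`, `EQ`, `ASSIGN`) are all assumptions introduced for illustration. It splits the example input into lexemes, discards whitespace, and uses one character of lookahead to tell `=` from `==`.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # token type (hypothetical names, chosen for this sketch)
    lexeme: str  # the substring of the input that was matched


# Keyword list taken from the "Examples of Tokens" slide, plus "else".
KEYWORDS = {"if", "else", "while", "for", "int", "double"}


def tokenize(text: str) -> Iterator[Token]:
    """Partition text into tokens, skipping whitespace."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace does not contribute to parsing, so drop it.
            i += 1
        elif ch.isalpha():
            # Read the longest run of letters/digits before deciding
            # whether it is a keyword ("if") or an identifier ("i", "ifx").
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            word = text[start:i]
            yield Token("KEYWORD" if word in KEYWORDS else "ID", word)
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            yield Token("NUM", text[start:i])
        elif ch == "=":
            # One character of lookahead separates "=" from "==".
            if i + 1 < len(text) and text[i + 1] == "=":
                yield Token("EQ", "==")
                i += 2
            else:
                yield Token("ASSIGN", "=")
                i += 1
        elif ch in "();":
            # Single-character tokens are their own type.
            yield Token(ch, ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")


if __name__ == "__main__":
    source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
    for token in tokenize(source):
        print(token.kind, repr(token.lexeme))
```

Reading the longest possible lexeme before classifying it is one common design choice; it is what keeps `ifx` from being split into the keyword `if` followed by `x`.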
• In Ada, array reference syntax and function call syntax are similar.
    fn(1,2)
• In C++, these are different: an array reference vs. fn(1,2)

Definition: Formal Languages
• Alphabet $\Sigma$ = finite set of symbols
    $\Sigma = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$
• String $s$ = finite sequence of symbols from $\Sigma$
    $s = 6004$
• Empty string $\epsilon$ = special string of length zero
• Language $L$ = set of strings over an alphabet $\Sigma$
    $L = \{6001, 6002, 6003, 6004, 6035, 6891, \ldots\}$

Integer Power of a Language
    $X^0 = \{\epsilon\}$
    $X^{n+1} = X^n X$
• Kleene Star: $X^* = \bigcup_{i \ge 0} X^i$
• Kleene Plus: $X^+ = \bigcup_{i \ge 1} X^i$

Regular Expressions over $\Sigma$
• Basis: $\emptyset$, $\epsilon$, and $a$ (for each $a \in \Sigma$) are regular expressions over $\Sigma$.
• Inductive Step: Let $r$ and $s$ be regular expressions over $\Sigma$. Then so are: $(r \mid s)$, $(r\,s)$, and $(r^*)$.
• Closure: Nothing else is a regular expression, unless obtained using the above steps.

Syntax vs Semantics
• Regular expressions (syntax)
• Regular sets/languages (semantics)
    For example, the regular expression $0$ denotes the regular set $\{0\}$.
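The language operations above can be tried out on small finite languages. The sketch below is illustrative only and assumes the standard definitions $X^0 = \{\epsilon\}$ and $X^{n+1} = X^n X$; the function names `concat`, `power`, and `star_up_to` are hypothetical. Because the Kleene star of a non-empty language is usually infinite, `star_up_to` computes only the finite union of powers up to a chosen bound.

```python
from typing import Set

EPSILON = ""  # the empty string, i.e. the string of length zero


def concat(x: Set[str], y: Set[str]) -> Set[str]:
    """Concatenation of two languages: every u in x followed by every v in y."""
    return {u + v for u in x for v in y}


def power(x: Set[str], n: int) -> Set[str]:
    """Integer power: X^0 = {epsilon}, X^(n+1) = X^n X."""
    result = {EPSILON}
    for _ in range(n):
        result = concat(result, x)
    return result


def star_up_to(x: Set[str], max_power: int) -> Set[str]:
    """Finite approximation of the Kleene star: union of X^0 .. X^max_power."""
    result: Set[str] = set()
    for i in range(max_power + 1):
        result |= power(x, i)
    return result


if __name__ == "__main__":
    bits = {"0", "1"}
    print(sorted(power(bits, 2)))        # all strings of length 2 over {0, 1}
    print(sorted(star_up_to({"0"}, 3)))  # epsilon, 0, 00, 000
```

Dropping `power(x, 0)` from the union in `star_up_to` would give the corresponding approximation of the Kleene plus.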