Outline Lexical Analysis
CS780 (Prasad) L3Lexing
Adapted from material by: Prof. Alex Aiken and Prof. George Necula (UCB); Prof. Saman Amarasinghe (MIT)

Lexical Analysis
What do we want to do? Example:
    if (i == j)
        z = 0;
    else
        z = 1;
The input is just a string of characters:
    \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
The lexical analyzer must partition the input string into substrings (lexemes) to identify tokens.

Examples of Tokens
Operators: !, ^
Keywords: if, while, for, int, double
Numeric literals: 2

Token Type and Lexeme
Token type: a syntactic category or grouping.
    In English: noun, verb, adjective, ...
    In a programming language: identifier, integer, keyword, ;, [, ...
Lexeme: a manifestation of a token in the text.
    In a case-insensitive language, the lexemes associated with the token include if, IF, iF.
Attribute: value of interest about a token.
    Numerical value of an integer token.
    Name (string) associated with an identifier token.

Designing a Lexical Analyzer
1. Define a finite set of tokens.
2. Describe which strings belong to each token.

Implementing a Lexical Analyzer
Recognize tokens from the corresponding lexemes.
Return the value (attribute) and the type of the token, e.g. Num(6035), ID(X).
Eliminate input that does not contribute to parsing.

Example: Language Design Decisions
FORTRAN rule: whitespace is insignificant.
VAR1 is the same as VA R1.
Consider:
    DO 5 I = 1,25
    DO 5 I = 1.25
The first is DO 5 I = 1 , 25 (a loop header).
The second is DO5I = 1.25 (an assignment to the variable DO5I).

Lookahead may be required to decide where one token ends and the next token begins.
Even our simple examples have lookahead issues.
PL/I keywords are not reserved:
    IF ELSE THEN THEN = ELSE; ELSE ELSE = THEN
Ada and VHDL require 2-lookahead because of ) problem.
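As an illustration of the if/else example and the lookahead remark above, here is a minimal hand-written scanner sketch in Python. It is not the course's implementation: the function name `tokenize`, the token type names (`KEYWORD`, `ID`, `NUM`, `EQ`, `ASSIGN`), and the choice to use each punctuation character as its own token type are assumptions made for this example. It shows whitespace being discarded, and one character of lookahead being used to tell `=` from `==`.

```python
# Keyword list taken from the "Examples of Tokens" slide, plus "else"
# so that the if/else example scans; the set itself is illustrative.
KEYWORDS = {"if", "else", "while", "for", "int", "double"}


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs; whitespace is dropped."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace does not contribute to parsing, so skip it.
            i += 1
        elif ch.isalpha() or ch == "_":
            # Consume the longest run of letters, digits, and underscores,
            # then decide whether the lexeme is a keyword or an identifier.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            lexeme = text[start:i]
            kind = "KEYWORD" if lexeme in KEYWORDS else "ID"
            tokens.append((kind, lexeme))
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUM", text[start:i]))
        elif ch == "=":
            # One character of lookahead separates "==" from "=".
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append(("EQ", "=="))
                i += 2
            else:
                tokens.append(("ASSIGN", "="))
                i += 1
        elif ch in "();":
            # Assumption: each punctuation character is its own token type.
            tokens.append((ch, ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


if __name__ == "__main__":
    source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
    for token_type, lexeme in tokenize(source):
        print(token_type, repr(lexeme))
```

Because the identifier branch always consumes the longest possible run of characters, an input such as `iffy` is scanned as a single identifier and not as the keyword `if` followed by `fy`. The attribute of a `NUM` token could be obtained with `int(lexeme)`; the sketch keeps the raw lexeme to stay short.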
In Ada, array reference syntax and function call syntax are similar: fn(1,2).
In C++, these are different (... vs. fn(1,2)).

Definition: Formal Languages
Alphabet $\Sigma$ = finite set of symbols
    $\Sigma = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$
String $s$ = finite sequence of symbols from $\Sigma$
    $s = 6004$
Empty string $\varepsilon$ = special string of length zero
Language $L$ = set of strings over an alphabet
    $L = \{6001, 6002, 6003, 6004, 6035, 6891\}$

Integer Power of a Language
    $L^0 = \{\varepsilon\}$
    $L^n = L^{n-1} L$ for $n \ge 1$
Kleene Star
    $L^* = \bigcup_{i \ge 0} L^i$
Kleene Plus
    $L^+ = \bigcup_{i \ge 1} L^i$

Regular Expressions over $\Sigma$
Basis: $\emptyset$, $\varepsilon$, and $a$ (for each $a \in \Sigma$) are regular expressions over $\Sigma$.
Inductive Step: Let $r$ and $s$ be regular expressions over $\Sigma$. Then so are: $r \mid s$, $r \cdot s$, and $r^*$.
Closure: Nothing else is a regular expression, unless obtained using the above steps.

Syntax vs Semantics
Syntax: regular expressions
Semantics: regular sets/languages
Example: the regular expression $0$ denotes the regular set $\{0\}$.
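The integer power, Kleene star, and Kleene plus definitions above can be tried out on small finite languages. The sketch below represents a language as a Python set of strings, with the empty string `""` standing for the empty string of the slides. The function names (`concat`, `power`, `star_up_to`, `plus_up_to`) are hypothetical, and because the Kleene star of a non-empty language is generally infinite, the last two functions only compute the union of powers up to a chosen bound `k`, which is an approximation made for this example.

```python
def concat(A, B):
    """Concatenation of two languages: every a in A followed by every b in B."""
    return {a + b for a in A for b in B}


def power(L, n):
    """Integer power L^n, with L^0 = {""} and L^n = L^(n-1) L."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def star_up_to(L, k):
    """Union of L^0 through L^k: a finite slice of the Kleene star."""
    out = set()
    for i in range(k + 1):
        out |= power(L, i)
    return out


def plus_up_to(L, k):
    """Union of L^1 through L^k: a finite slice of the Kleene plus."""
    out = set()
    for i in range(1, k + 1):
        out |= power(L, i)
    return out


if __name__ == "__main__":
    bits = {"0", "1"}
    print(sorted(power(bits, 2)))
    print(sorted(star_up_to({"0"}, 3)))
    print(sorted(plus_up_to({"0"}, 3)))
```

The only difference between the two bounded unions is whether the zeroth power is included, so the empty string appears in the star slice for every language but in the plus slice only if the language itself contains it.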