Lexical and Syntactic Analysis
We look at two of the tasks involved in the compilation process:
break the source code into lexeme units (lexical analysis)
  the goal is to identify each lexeme and assign it to its proper token
parse the lexical components into their syntactic uses (syntactic analysis)
  the goal is to parse the lexemes into a parse tree
During both lexical and syntactic analysis, errors can be detected and reported
Continued
Lexical analysis takes source code that consists of reserved words, identifiers, punctuation, blank spaces, and comments
Identify each item by its lexical category
  e.g., the reserved word “for”, a semicolon, an identifier, a close ), etc.
How do we perform this operation?
  use a relatively simple state transition diagram to describe the various entities of interest
  implement a program based on recursive functions, one per lexical category, to search the next portion of the program to identify a component’s category
Example
Recognizing Names/Words/Numbers
Implementation

int lex( ) {
  getChar( );
  switch (charClass) {
    case LETTER:
      addChar( );
      getChar( );
      while (charClass == LETTER || charClass == DIGIT) {
        addChar( );
        getChar( );
      }
      return lookup(lexeme);
      break;
    case DIGIT:
      addChar( );
      getChar( );
      while (charClass == DIGIT) {
        addChar( );
        getChar( );
      }
      return INT_LIT;
      break;
  } /* End of switch */
} /* End of function lex */
Parsing
The process of generating a parse tree from the input, which
  identifies the grammatical category of each element of the input
  identifies if and where errors occur
Parsing is similar whether for a natural language or a programming language
  a good parser will continue parsing even after errors have been found
  this requires a recovery process
Parsing is based on the language’s grammar but must also include the use of attributes in the grammar (that is, an attribute grammar)
Forms of Parsers
Top-down (used in the LL parser algorithm)
  start with LHS rules, map to RHS rules until terminal symbols have been identified, match these against the input
Bottom-up (used in the LR parser algorithms)
  start with the RHS rules and the input, collapse terminals and non-terminals into non-terminals until you have reached the starting non-terminal
Parsing in general is an O(n³) problem, where n is the number of items in the input
  if we cannot determine a single token for each lexeme, the problem becomes O(2ⁿ)!
  by restricting our parser to work only on the grammar of the given language, we can reduce the complexity to O(n)
Top-Down Parsing
Uses an LL parser (left-to-right scan, leftmost derivation)
Generate a recursive-descent parser from a BNF grammar
  non-terminal grammatical categories are converted into functions
    e.g., <expr>, <if>, <factor>, <assign>, <id>
  each function, when called, obtains the next lexeme using a function called lex( ) and maps it to terminal symbols and/or calls further functions
Two restrictions on the grammar
  cannot have left recursion
    if a rule has recursive parts, those parts must not be the first items on the RHS of a rule; for instance <A> → <A>b cannot be allowed but <A> → b<A> can
  must pass the pairwise disjointness test (covered shortly)
Algorithms exist to alter a grammar so that it meets both restrictions
Recursive Descent Parser Example

Recall our example expression grammar from chapter 3:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )

void expr( ) {
  term( );
  while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
    lex( );
    term( );
  }
}

void term( ) {
  factor( );
  while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
    lex( );
    factor( );
  }
}

void factor( ) {
  if (nextToken == ID_CODE)
    lex( );
  else if (nextToken == LEFT_PAREN_CODE) {
    lex( );
    expr( );
    if (nextToken == RIGHT_PAREN_CODE)
      lex( );
    else error( );
  }
  else error( );
}
If Statement Example
void ifstmt( ) {
  if (nextToken != IF_CODE)
    error( );
  else {
    lex( );
    if (nextToken != LEFT_PAREN_CODE)
      error( );
    else {
      lex( );
      boolexpr( );
      if (nextToken != RIGHT_PAREN_CODE)
        error( );
      else {
        lex( );
        statement( );
        if (nextToken == ELSE_CODE) {
          lex( );
          statement( );
        }
      }
    }
  }
}

We expect an if statement to look like this: if (boolean expr) statement; optionally followed by: else statement; Otherwise, we report an error
Pairwise Disjointness

Consider a rule with multiple RHS parts, for instance
<A> → a<B> | a<C>
The LL parser must be able to select the correct part of the rule
to simplify the choice, the first symbol on each right-hand side must differ (that is, each RHS mapping must start uniquely to make the choice obvious)
This is pairwise disjointness
Here are some examples
A → aB | bAb | c – passes (pairwise disjoint)
A → aB | aAb – fails (not pairwise disjoint)
<var> → id | id[<expr>] – fails, but can be made pairwise disjoint as follows
<var> → id<next>
<next> → ε | [<expr>]
(ε means the empty string)
Bottom-Up Parsing
To avoid the restrictions on an LL parser, we might want to use an LR parser (left-to-right parsing, rightmost derivation in reverse)
Implemented using a pushdown automaton
  a stack added to the state diagrams seen earlier
Parser has two basic processes
  shift – move items from the input onto the stack
  reduce – take consecutive stack items and reduce them to a non-terminal
    for instance, if we have a rule <A> → a<B> and we have a and <B> on the stack, reduce them to <A>
the parser is easy to implement but we must first construct what is known as an LR parsing table
there are numerous algorithms to generate the parsing table
Parser Algorithm
Given input S0, a1, …, an, $
  S0 is the start state
  a1, …, an are the lexemes that make up the program
  $ is a special end-of-input symbol
If action[Sm, ai] = Shift S, then push ai, S onto the stack and change state to S
If action[Sm, ai] = Reduce R, then use rule R in the grammar and reduce the items on the stack appropriately, changing state to be the state GOTO[Sm-1, R]
If action[Sm, ai] = Accept, then the parse is complete with no errors
If action[Sm, ai] = Error (or the entry in the table is blank), then call the error-handling and recovery routine
The parsing table stores the values of action[x, y] and GOTO[x, y]
Example
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id

Parse of id+id*id$

Stack           Input       Action
0               id+id*id$   S5
0id5            +id*id$     R6 (GOTO[0,F])
0F3             +id*id$     R4 (GOTO[0,T])
0T2             +id*id$     R2 (GOTO[0,E])
0E1             +id*id$     S6
0E1+6           id*id$      S5
0E1+6id5        *id$        R6 (GOTO[6,F])
0E1+6F3         *id$        R4 (GOTO[6,T])
0E1+6T9         *id$        S7
0E1+6T9*7       id$         S5
0E1+6T9*7id5    $           R6 (GOTO[7,F])
0E1+6T9*7F10    $           R3 (GOTO[6,T])
0E1+6T9         $           R1 (GOTO[0,E])
0E1             $           ACCEPT