Lexical and Syntactic Analysis

sherrill-nordquist
Uploaded On 2018-11-03

Presentation Transcript

Slide1

Lexical and Syntactic Analysis

We look at two of the tasks involved in the compilation process
- break the source code into lexical units, called lexemes (lexical analysis)
  - the goal is to identify each lexeme and assign it to its proper token
- parse the lexical components into their syntactic uses (syntactic analysis)
  - the goal is to parse the lexemes into a parse tree
During both lexical and syntactic analysis, errors can be detected and reported

Slide2

Continued

Lexical analysis takes source code that consists of reserved words, identifiers, punctuation, blank spaces, comments
- identify each item for its lexical category
  - e.g., the reserved word "for", a semicolon, an identifier, a close ), etc.
How do we perform this operation?
- use a relatively simple state transition diagram to describe the various entities of interest
- implement a program based on recursive functions, one per lexical category, to search the next portion of the program to identify a component's category

Slide3

Example

Slide4

Recognizing Names/Words/Numbers

Slide5

Implementation

    int lex( ) {
        getChar( );
        switch (charClass) {
            /* names and reserved words: a letter followed by letters or digits */
            case LETTER:
                addChar( );
                getChar( );
                while (charClass == LETTER || charClass == DIGIT) {
                    addChar( );
                    getChar( );
                }
                return lookup(lexeme);
            /* integer literals: one or more digits */
            case DIGIT:
                addChar( );
                getChar( );
                while (charClass == DIGIT) {
                    addChar( );
                    getChar( );
                }
                return INT_LIT;
            /* cases for the other character classes are not shown on the slide */
        } /* End of switch */
    } /* End of function lex */

Slide6
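The lex( ) fragment on Slide5 depends on helpers (getChar, addChar, lookup) that the slide does not show. The following is a minimal self-contained sketch in C of how they might fit together. The token code values, the string-based input, the lexInit helper and the UNKNOWN/EOF_CODE tokens are all assumptions made for illustration. Unlike the slide, this sketch reads the lookahead character once in lexInit rather than at the top of lex, so that the character that ended the previous lexeme is not skipped.

```c
#include <ctype.h>
#include <string.h>

/* Character classes and token codes (values are arbitrary, chosen for this sketch) */
enum { LETTER, DIGIT, OTHER, END };
enum { INT_LIT = 10, IDENT = 11, FOR_CODE = 30, UNKNOWN = 98, EOF_CODE = 99 };

static const char *src;    /* remaining input (hypothetical string source) */
static int  nextChar;      /* one-character lookahead */
static int  charClass;     /* class of nextChar */
static char lexeme[100];   /* text of the current lexeme */
static int  lexLen;

/* Read the next character and classify it */
static void getChar(void) {
    nextChar = (unsigned char)*src;
    if (nextChar == '\0') { charClass = END; return; }
    src++;
    if (isalpha(nextChar))      charClass = LETTER;
    else if (isdigit(nextChar)) charClass = DIGIT;
    else                        charClass = OTHER;
}

/* Append the lookahead character to the lexeme, leaving room for '\0' */
static void addChar(void) {
    if (lexLen < (int)sizeof lexeme - 1) {
        lexeme[lexLen++] = (char)nextChar;
        lexeme[lexLen] = '\0';
    }
}

/* Separate reserved words from ordinary identifiers (only "for" is known here) */
static int lookup(const char *s) {
    return strcmp(s, "for") == 0 ? FOR_CODE : IDENT;
}

/* Set the input and prime the lookahead before the first call to lex */
void lexInit(const char *text) { src = text; getChar(); }

int lex(void) {
    lexLen = 0;
    lexeme[0] = '\0';
    while (charClass == OTHER && isspace(nextChar)) getChar();   /* skip blanks */
    switch (charClass) {
    case LETTER:
        addChar(); getChar();
        while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); }
        return lookup(lexeme);
    case DIGIT:
        addChar(); getChar();
        while (charClass == DIGIT) { addChar(); getChar(); }
        return INT_LIT;
    case END:
        return EOF_CODE;
    default:                 /* any other single character */
        addChar(); getChar();
        return UNKNOWN;
    }
}
```

A caller would invoke lexInit once with the source text and then call lex repeatedly until it returns EOF_CODE, reading the matched text from lexeme after each call.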

Parsing

The process of
- generating a parse tree from a set of input that identifies the grammatical categories of each element of the input
- identifying if and where errors occur
Parsing is similar whether for a natural language or a programming language
- a good parser will continue parsing even after errors have been found
  - this requires a recovery process
Parsing is based on the language's grammar but must also include the use of attributes in the grammar (that is, an attribute grammar)

Slide7

Forms of Parsers

Top-down (used in the LL parser algorithms)
- start with the LHS of the start rule, map each non-terminal to one of its RHSs until terminal symbols have been identified, and match these against the input
Bottom-up (used in the LR parser algorithms)
- start with the RHSs of rules and the input, collapse terminals and non-terminals into non-terminals until you have reached the starting non-terminal
Parsing with an unrestricted context-free grammar is an O(n^3) problem, where n is the number of items in the input
- if we cannot determine a single token for each lexeme, the problem becomes O(2^n)!
- by restricting our parser to work only on the grammar of the given language, we can reduce the complexity to O(n)

Slide8

Top-Down Parsing

Uses an LL parser (left-to-right scan of the input, using a leftmost derivation)
Generate a recursive-descent parser from a BNF grammar
- non-terminal grammatical categories are converted into functions
  - e.g., <expr>, <if>, <factor>, <assign>, <id>
- each function, when called, obtains the next lexeme using a function called lex( ) and maps it to terminal symbols and/or calls further functions
Two restrictions on the grammar
- cannot have left recursion
  - if a rule has recursive parts, those parts must not be the first items on the RHS of the rule; for instance, <A> → <A>b cannot be allowed but <A> → b<A> can
- must pass the pairwise disjointness test (covered shortly)
Algorithms exist to remove left recursion from a grammar, and left factoring can often (though not always) make a grammar pass the pairwise disjointness test

Slide9
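As an illustration of the left-recursion restriction, consider the hypothetical rule <A> → <A>b | c, which is not taken from the slides. A recursive-descent function for it would call itself before consuming any input and never terminate. The usual rewrite is <A> → c<A'> with <A'> → b<A'> | ε, which generates the same strings (a c followed by zero or more b's). The sketch below, with assumed function names and a plain string as input, shows a recognizer for the rewritten form.

```c
#include <stdbool.h>

/* Original (left-recursive):   <A>  -> <A> b | c
   Rewritten (right-recursive): <A>  -> c <A'>
                                <A'> -> b <A'> | (empty)  */

static const char *in;   /* hypothetical input cursor */

/* <A'> -> b <A'> | (empty) */
static bool parseATail(void) {
    if (*in == 'b') {
        in++;
        return parseATail();
    }
    return true;          /* the empty alternative always succeeds */
}

/* <A> -> c <A'> */
static bool parseA(void) {
    if (*in != 'c') return false;
    in++;
    return parseATail();
}

/* True when the whole string can be derived from <A> */
bool matchesA(const char *text) {
    in = text;
    return parseA() && *in == '\0';
}
```

Each call now consumes one character before recursing, so the recursion depth is bounded by the length of the input.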

Recursive Descent Parser Example

Recall our example expression grammar from chapter 3:

<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )

    /* <expr> → <term> {(+ | -) <term>} */
    void expr( ) {
        term( );
        while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
            lex( );
            term( );
        }
    }

    /* <term> → <factor> {(* | /) <factor>} */
    void term( ) {
        factor( );
        while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
            lex( );
            factor( );
        }
    }

    /* <factor> → id | ( <expr> ) */
    void factor( ) {
        if (nextToken == ID_CODE)
            lex( );
        else if (nextToken == LEFT_PAREN_CODE) {
            lex( );
            expr( );
            if (nextToken == RIGHT_PAREN_CODE)
                lex( );
            else error( );
        }
        else error( );
    }

Slide10
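The expr/term/factor functions on Slide9 assume a lexer that leaves the current token in nextToken and an error( ) routine, neither of which is shown. The sketch below wires them to a small stand-in lexer so the parser can be run as a whole. The stand-in lexer, the END_CODE and BAD_CODE tokens, the ok flag and the parseExpr entry point are all assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, MULT_CODE, DIV_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *p;      /* input cursor (hypothetical string source) */
static int  nextToken;     /* token most recently produced by lex */
static bool ok;            /* cleared by error() */

static void error(void) { ok = false; }

/* Stand-in lexer: an identifier is a letter followed by letters or digits */
static void lex(void) {
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p)) p++;
        nextToken = ID_CODE;
        return;
    }
    switch (*p++) {
    case '+': nextToken = PLUS_CODE;        break;
    case '-': nextToken = MINUS_CODE;       break;
    case '*': nextToken = MULT_CODE;        break;
    case '/': nextToken = DIV_CODE;         break;
    case '(': nextToken = LEFT_PAREN_CODE;  break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE;         break;
    }
}

static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();
    } else {
        error();
    }
}

/* <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
        lex();
        factor();
    }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* True when the whole string is a valid <expr> */
bool parseExpr(const char *text) {
    p = text;
    ok = true;
    lex();                 /* load the first token before parsing */
    expr();
    return ok && nextToken == END_CODE;
}
```

This version only records that an error occurred. A production parser would also report where the error occurred and attempt recovery, as described on the Parsing slide.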

If Statement Example

    void ifstmt( )
    {
        if (nextToken != IF_CODE)
            error( );
        else {
            lex( );
            if (nextToken != LEFT_PAREN_CODE)
                error( );
            else {
                lex( );    /* consume the ( before parsing the condition */
                boolexpr( );
                if (nextToken != RIGHT_PAREN_CODE)
                    error( );
                else {
                    lex( );    /* consume the ) before parsing the statement */
                    statement( );
                    if (nextToken == ELSE_CODE) {
                        lex( );
                        statement( );
                    }
                }
            }
        }
    }

We expect an if statement to look like this: if (boolean expr) statement; optionally followed by: else statement; Otherwise, we return an error

Slide11
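To see the if-statement routine run end to end, the sketch below drives it from a pre-lexed token array. The token stream representation, the single-token stand-ins for boolexpr and statement, and the parseIf entry point are assumptions for illustration only. The routine consumes each parenthesis with a call to lex before moving on, which the parser needs in order to advance past those tokens.

```c
#include <stdbool.h>

enum { IF_CODE, ELSE_CODE, LEFT_PAREN_CODE, RIGHT_PAREN_CODE,
       BOOL_CODE, STMT_CODE, END_CODE };

static const int *toks;    /* hypothetical token stream, terminated by END_CODE */
static int  nextToken;
static bool ok;

static void lex(void)   { nextToken = *toks; if (*toks != END_CODE) toks++; }
static void error(void) { ok = false; }

/* Stand-ins: a whole boolean expression or statement is one token here */
static void boolexpr(void)  { if (nextToken == BOOL_CODE) lex(); else error(); }
static void statement(void) { if (nextToken == STMT_CODE) lex(); else error(); }

static void ifstmt(void) {
    if (nextToken != IF_CODE) { error(); return; }
    lex();
    if (nextToken != LEFT_PAREN_CODE) { error(); return; }
    lex();                              /* consume ( */
    boolexpr();
    if (nextToken != RIGHT_PAREN_CODE) { error(); return; }
    lex();                              /* consume ) */
    statement();
    if (nextToken == ELSE_CODE) {       /* optional else part */
        lex();
        statement();
    }
}

/* True when the token array is exactly one well-formed if statement */
bool parseIf(const int *tokens) {
    toks = tokens;
    ok = true;
    lex();
    ifstmt();
    return ok && nextToken == END_CODE;
}
```

The early returns are only a flatter way of writing the nested if/else structure; the order of checks is the same.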

Pairwise Disjointness

Consider a rule with multiple RHS parts, for instance <A> → a<B> | a<C>
The LL parser must be able to select the correct part of the rule by looking only at the next token of input
- to simplify, the first terminal symbol that each right-hand side can begin with must differ (that is, each RHS mapping must start uniquely to make the choice obvious)
- this is pairwise disjointness
Here are some examples
- A → aB | bAb | c: passes (pairwise disjoint)
- A → aB | aAb: fails (not pairwise disjoint)
- <var> → id | id[<expr>]: fails but can be made pairwise disjoint as follows
    <var> → id<next>
    <next> → ε | [<expr>]    (ε means the empty string)

Slide12
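For rules in which every alternative begins with a terminal, the pairwise disjointness test reduces to checking that no two alternatives share the same first symbol. The sketch below implements only that simplified case. The function name and the representation of a rule as an array of first-symbol strings are assumptions. A full test would compute FIRST sets for alternatives that begin with non-terminals and would treat ε alternatives specially.

```c
#include <stdbool.h>
#include <string.h>

/* firsts[i] is the first symbol of the i-th alternative of one rule.
   Assumes each alternative starts with a terminal, so its FIRST set
   is just that one symbol. */
bool pairwiseDisjoint(const char *const firsts[], int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (strcmp(firsts[i], firsts[j]) == 0)
                return false;   /* two alternatives start the same way */
    return true;
}
```

For A → aB | bAb | c the first symbols are a, b and c, so the check passes. For <var> → id | id[<expr>] both alternatives start with id, so it fails.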

Bottom-Up Parsing

To avoid the restrictions on an LL parser, we might want to use an LR parser (left-to-right parsing, producing a rightmost derivation in reverse)
Implemented using a pushdown automaton
- a stack added to the state diagrams seen earlier
Parser has two basic processes
- shift: move items from the input onto the stack
- reduce: take consecutive items from the top of the stack and reduce them
  - for instance, if we have a rule <A> → a<B> and we have a and <B> on the top of the stack, reduce them to <A>
The parser is easy to implement but we must first construct what is known as an LR parsing table
- there are numerous algorithms to generate the parsing table

Slide13

Parser Algorithm

Given input S0, a1, …, an, $
- S0 is the start state
- a1, …, an are the lexemes that make up the program
- $ is a special end of input symbol
In the rules below, Sm is the state on top of the stack and ai is the next input symbol
- If action[Sm, ai] = Shift S, then push ai, S onto the stack and change state to S
- If action[Sm, ai] = Reduce R, then use rule R in the grammar: pop the symbols (and their states) that match the RHS of R, push the LHS non-terminal of R, and change state to GOTO[S, LHS of R], where S is the state exposed on top of the stack after popping
- If action[Sm, ai] = Accept, then the parse is complete with no errors
- If action[Sm, ai] = Error (or the entry in the table is blank), then call the error-handling and recovery routine
The parsing table stores the values of action[x, y] and GOTO[x, y]

Slide14

Example

Grammar:

1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id

Parse of id+id*id$

Stack            Input        Action
0                id+id*id$    S5
0id5             +id*id$      R6 (GOTO[0,F])
0F3              +id*id$      R4 (GOTO[0,T])
0T2              +id*id$      R2 (GOTO[0,E])
0E1              +id*id$      S6
0E1+6            id*id$       S5
0E1+6id5         *id$         R6 (GOTO[6,F])
0E1+6F3          *id$         R4 (GOTO[6,T])
0E1+6T9          *id$         S7
0E1+6T9*7        id$          S5
0E1+6T9*7id5     $            R6 (GOTO[7,F])
0E1+6T9*7F10     $            R3 (GOTO[6,T])
0E1+6T9          $            R1 (GOTO[0,E])
0E1              $            ACCEPT
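The trace above can be reproduced with a small table-driven driver. The transcript does not include the parsing table itself, so the action and GOTO entries below are an assumption: they are the SLR table commonly given in textbooks for this six-rule grammar, and they agree with every step of the trace. The integer encoding of table entries, the token names and the lrParse function are choices made for this sketch. Only states are kept on the stack, since the grammar symbols are implied by them.

```c
#include <stdbool.h>

enum { T_ID, T_PLUS, T_STAR, T_LP, T_RP, T_END, NTERM };   /* terminals */
enum { N_E, N_T, N_F };                                     /* non-terminals */

/* Encoding: 0 = error, +k = shift to state k, -k = reduce by rule k, ACC = accept */
#define ACC 99
#define MAX_STACK 256

static const int action[12][NTERM] = {
    /*          id   +   *   (   )    $  */
    /*  0 */ {  5,   0,  0,  4,  0,   0 },
    /*  1 */ {  0,   6,  0,  0,  0, ACC },
    /*  2 */ {  0,  -2,  7,  0, -2,  -2 },
    /*  3 */ {  0,  -4, -4,  0, -4,  -4 },
    /*  4 */ {  5,   0,  0,  4,  0,   0 },
    /*  5 */ {  0,  -6, -6,  0, -6,  -6 },
    /*  6 */ {  5,   0,  0,  4,  0,   0 },
    /*  7 */ {  5,   0,  0,  4,  0,   0 },
    /*  8 */ {  0,   6,  0,  0, 11,   0 },
    /*  9 */ {  0,  -1,  7,  0, -1,  -1 },
    /* 10 */ {  0,  -3, -3,  0, -3,  -3 },
    /* 11 */ {  0,  -5, -5,  0, -5,  -5 },
};

/* go_to[state][non-terminal]; 0 marks an unused entry */
static const int go_to[12][3] = {
    /*  0 */ { 1, 2,  3 },  /*  1 */ { 0, 0,  0 },  /*  2 */ { 0, 0, 0 },
    /*  3 */ { 0, 0,  0 },  /*  4 */ { 8, 2,  3 },  /*  5 */ { 0, 0, 0 },
    /*  6 */ { 0, 9,  3 },  /*  7 */ { 0, 0, 10 },  /*  8 */ { 0, 0, 0 },
    /*  9 */ { 0, 0,  0 },  /* 10 */ { 0, 0,  0 },  /* 11 */ { 0, 0, 0 },
};

/* Rule k (1-based): number of RHS symbols and LHS non-terminal */
static const int rhsLen[7] = { 0, 3, 1, 3, 1, 3, 1 };
static const int lhs[7]    = { 0, N_E, N_E, N_T, N_T, N_F, N_F };

/* Parse a T_END-terminated token array; returns true on Accept */
bool lrParse(const int *tokens) {
    int stack[MAX_STACK];
    int top = 0;
    stack[0] = 0;                          /* start state S0 */
    for (;;) {
        int act = action[stack[top]][*tokens];
        if (act == ACC) return true;
        if (act == 0)   return false;      /* blank table entry: syntax error */
        if (act > 0) {                     /* shift */
            if (top + 1 >= MAX_STACK) return false;
            stack[++top] = act;
            tokens++;
        } else {                           /* reduce by rule -act */
            int rule = -act;
            top -= rhsLen[rule];           /* pop one state per RHS symbol */
            int exposed = stack[top];      /* state uncovered by the pop */
            stack[++top] = go_to[exposed][lhs[rule]];
        }
    }
}
```

Passing the tokens for id+id*id followed by T_END walks through the same shifts and reductions as the trace and returns true.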