Slide1
What on Earth?
LEXEME             TOKEN      PATTERN
print              print      p,r,i,n,t
(                  leftpar    (
4                  number     4
*                  arith      *
5                  number     5
)                  rightpar   )
userAnswer         ID         Letter followed by letters and digits
“Game of Jones”    literal    Any string between “ and ”
Slide2
Translators
Slide3
Translators – Module Knowledge Areas
Types of translators and their use
Lexical analysis
Syntax analysis
Code generation and optimisation
Library routines
Slide4
Translators – Module Knowledge Areas
Lexical analysis
Describe what happens during lexical analysis
So, we need to know:
What is meant by lexical analysis
What the key language is
What part lexical analysis plays in the translation process
How lexical analysis works
How to identify the key aspects of lexical analysis
Slide5
Translators – Lexical Analysis
So far we have investigated the link between source code, assembly code and machine code
In reality there are many more steps involved in getting code to run
There are a number of compilation phases:
Parsing the source code (lexical analysis)
Syntax analysis
Type checking
Machine code generation
Code block sequencing
Register allocation
Optimisation
Linking of libraries
Slide6
Translators - Parsing
Consider this flow diagram of the translation process:
The source code is parsed ……
Parsing is analysis of the source code
Each line, e.g. print(4*5), is read
The compiler allocates a type to each element (tokenizes it), e.g. keyword/reserved word, variable, constant …..
Slide7
Translators - Parsing
In the example print(4*5), print and * are recognised (in its simplest form, print is known as a reserved word and * is known as the multiplier, an arithmetic token)
If the example were written Print 4*5, the lexer would not flag the mistakes in syntax, because that is not the job of lexical analysis
Because Print does not match the pattern for a keyword, the compiler will assume it is a variable and assign it the corresponding token (often ID)
4 and 5 will have the token for number (specifically integer) applied, and the * has an arithmetic token applied
In effect, a pair is created consisting of the token and the lexeme
White space (e.g. extra lines in source code and spaces between characters) and comments are stripped out, as these are unnecessary for translating the code into machine code
Slide8
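The tokenizing steps described for print(4*5) can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular compiler: the names tokenize, TOKEN_SPEC and KEYWORDS are hypothetical, the token names are borrowed from the table on the first slide, and comments are assumed to start with #.

```python
import re

# Assumption: the only reserved word in this toy language is "print".
KEYWORDS = {"print"}

# Hypothetical token patterns; "comment" and "space" are matched only so
# that they can be thrown away.
TOKEN_SPEC = [
    ("comment", r"#[^\n]*"),             # assumed comment syntax
    ("space", r"\s+"),                   # spaces, tabs, blank lines
    ("number", r"\d+"),                  # integers such as 4 and 5
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),     # letter followed by letters and digits
    ("arith", r"[*+\-/]"),
    ("leftpar", r"\("),
    ("rightpar", r"\)"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        token, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if token in ("comment", "space"):
            continue  # white space and comments are stripped out
        if token == "ID" and lexeme in KEYWORDS:
            token = lexeme  # an exact keyword match gets the keyword's own token
        pairs.append((token, lexeme))
    return pairs


print(tokenize("print(4*5)"))
print(tokenize("Print 4*5  # not a keyword, so Print becomes an ID"))
```

In this sketch the second call raises no error: Print is simply paired with the ID token, and the missing parentheses are left for syntax analysis to catch.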
Translators - Parsing
Look again at the parsing table:
LEXEME    TOKEN       PATTERN
print     print       p,r,i,n,t
(         leftpar     (
4         number      4
*         arith       *
5         number      5
)         rightpar    )
Each lexeme is a component of the source code
Each token specifies the type of data the lexeme is
The lexeme and token make a pair
When parsing the source code, each lexeme follows a pattern
E.g. print has the pattern p,r,i,n,t whilst the left parenthesis has only one component in its pattern
Slide9
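One way to read the lexeme, token and pattern table is as a lookup in which each token has a pattern and a lexeme receives the token whose pattern it matches in full. The sketch below assumes the PATTERN column can be encoded as regular expressions; TABLE and token_for are hypothetical names, and the number pattern is a generalisation of the single digits shown in the table.

```python
import re

# Each row is (token, pattern). The regular expressions are an assumed
# encoding of the PATTERN column, not something given in the slides.
TABLE = [
    ("print", r"print"),      # p,r,i,n,t in exactly that order
    ("leftpar", r"\("),       # a pattern with only one component
    ("number", r"[0-9]+"),    # assumption: any run of digits, not just 4 or 5
    ("arith", r"\*"),
    ("rightpar", r"\)"),
]


def token_for(lexeme):
    """Return the token of the first row whose pattern matches the whole lexeme."""
    for token, pattern in TABLE:
        if re.fullmatch(pattern, lexeme):
            return token
    return None  # no row in this small table matches


for lexeme in ["print", "(", "4", "*", "5", ")"]:
    print(lexeme, token_for(lexeme))
```

Because the match must cover the whole lexeme, a near miss such as prin or Print does not receive the print token in this sketch.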
Translators - Parsing
Where key/reserved words are concerned, the pattern for the lexeme-token pair must match exactly
If we add the line print(“The answer to 4 * 5 is”), the lexeme, token and pattern for the content in the quotes would be:
LEXEME                      TOKEN      PATTERN
“The answer to 4 * 5 is”    literal    Sequence of characters inside the quotes but not including the quotes
The translator knows that text surrounded by quotation marks has the token literal and that the quotation marks should be ignored
Once the source code has been analysed by the lexer it is ready for the next stage: syntax analysis
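The handling of quoted text can be illustrated with a short Python sketch. It assumes straight double quotes and no escape sequences inside the string; LITERAL and literal_pair are hypothetical names used only for this example.

```python
import re

# Assumption: a literal is any run of characters between two straight
# double quotes, with no escaped quotes inside it.
LITERAL = re.compile(r'"([^"]*)"')


def literal_pair(source):
    """Return ("literal", text inside the quotes), or None if there is no quoted text."""
    match = LITERAL.search(source)
    if match is None:
        return None
    # group(1) is the content between the quotes, so the quotes are dropped.
    return ("literal", match.group(1))


pair = literal_pair('print("The answer to 4 * 5 is")')
print(pair)
```

The 4, * and 5 inside the quotes are not given number or arith tokens here; they are simply characters of the one literal lexeme.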