Physical and Logical Structure
SNU IDB Lab.
XML Documents 1: Structure
Peeping into XML document
Physical view: Entity
Logical view: DTD
Peeping into XML document (1/5)
<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello, XML!!
<!--this is greeting-->
</GREETING>
Markup: the XML declaration, the GREETING tags, and the comment
Character data: Hello, XML!!
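To make the markup/character-data split concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (our choice of tool; the slides do not name a parser). The helper `character_data` is hypothetical. The parser consumes the declaration, the tags, and the comment, and `itertext()` hands back only the text.

```python
import xml.etree.ElementTree as ET

GREETING_DOC = """<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello, XML!!
<!--this is greeting-->
</GREETING>"""


def character_data(doc: str) -> str:
    """Return the character data of doc, with all markup removed."""
    root = ET.fromstring(doc)
    # itertext() yields text only; tags, the XML declaration and the
    # comment never appear in it.
    return "".join(root.itertext()).strip()


if __name__ == "__main__":
    print(character_data(GREETING_DOC))  # Hello, XML!!
```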
Peeping into XML document (2/5)
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [
<!ELEMENT DATE (#PCDATA)>
]>
<DATE>001224</DATE>
<!--This is date -->
XML document: date.xml
XML declaration: declares that this is an XML document. It begins with <? and ends with ?>.
DTD (Document Type Definition): defines the tags the user will use; here it defines the DATE tag.
Content: the DATE element and its data.
Comment: not part of the document's data; the parser does not treat it as content.
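The four callouts correspond to separate events that a parser reports. A sketch with Python's standard-library expat bindings (`xml.parsers.expat`, our choice of tool; `parse_events` is a hypothetical helper) shows the declaration, the DTD, the content, and the comment arriving one by one; what an application does with the comment event is up to it.

```python
import xml.parsers.expat

DATE_DOC = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [
<!ELEMENT DATE (#PCDATA)>
]>
<DATE>001224</DATE>
<!--This is date -->"""


def parse_events(doc: str) -> list:
    """Return (kind, ...) tuples for each part of the document, in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text as a single chunk

    def on_xml_decl(version, encoding, standalone):
        # standalone is 1 for "yes", 0 for "no", -1 if it is omitted
        events.append(("xml-decl", version, standalone))

    def on_doctype(name, system_id, public_id, has_internal_subset):
        events.append(("doctype", name))

    def on_element_decl(name, model):
        events.append(("element-decl", name))

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.CommentHandler = lambda data: events.append(("comment", data))
    parser.Parse(doc, True)
    return events


if __name__ == "__main__":
    for event in parse_events(DATE_DOC):
        print(event)
```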
Peeping into XML document (3/5)
Structure of an XML document
Physical structure: the document is made up of storage units called entities
Logical structure: the document is divided into named units and sub-units called elements
Peeping into XML document (4/5)
[Figure: logical structure (a document divided into units and sub-units, i.e. elements) vs. physical structure (entities, stored internally or separately)]
Peeping into XML document (5/5)
<person>
<name> kim </name>
<ID>771224</ID>
<office>301-453</office>
<phone>1830</phone>
<photo source="k.jpg"/>
</person>
[Figure: in the logical view, person is an element; in the physical view, the image file "k.jpg" is a separate entity]
Content of Physical Structure
Entity
Figures of Document Entity
Defining an entity
Grammar in Declaring Entity
Examples of Entity Declaration
URL format
Entity (1/3)
Entity: the unit in which any part of a document is physically separated and stored, i.e. a unit of information storage
[Figure: physical structure of the person document: the document text and the image "k.jpg" are separate entities, which may be internal or stored separately]
SNU OOPSLA Lab.
Entity (2/3)
Purpose of entities: to hold any kind of information (well-formed XML data, other text files, binary data, …)
[Figure: the person document is the document entity; "k.jpg" is an image entity]
Entity (3/3)
Internal entity: an entity defined completely within the document itself
External entity: an entity that gets its content from an external source identified by a URL
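Figures of Document Entity
[Figure: three document layouts: a document entity with no other entities; a document entity holding the main content; a document entity used as a framework file; the latter two refer to other entities labeled A, B, C, and D]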
Defining an entity
An entity must be declared before its first reference in the data stream
Entities are declared in the DTD (Document Type Definition)
<!DOCTYPE DOCUMENT [
<!ENTITY EMAIL "sjlee@oopsla.snu.ac.kr">
<!ENTITY TEXT "(#PCDATA)">
]>
Entity definitions in the DTD
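Once declared, an internal entity is replaced by its text wherever it is referenced. A minimal sketch using Python's standard-library ElementTree (our choice of parser; the helper names are hypothetical) expands `&EMAIL;` from the declaration above, and shows that referencing an entity nobody declared is a fatal parse error.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE DOCUMENT [
<!ENTITY EMAIL "sjlee@oopsla.snu.ac.kr">
]>
<DOCUMENT>Contact: &EMAIL;</DOCUMENT>"""


def expanded_text(doc: str) -> str:
    """Parse doc; the parser replaces each &EMAIL; with the declared text."""
    return ET.fromstring(doc).text


def has_undeclared_entity(doc: str) -> bool:
    """True if parsing fails, e.g. because an entity was never declared."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return True
    return False


if __name__ == "__main__":
    print(expanded_text(DOC))  # Contact: sjlee@oopsla.snu.ac.kr
    print(has_undeclared_entity("<DOCUMENT>&EMAIL;</DOCUMENT>"))  # True
```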
Example: Entity Declaration (1/3)
Internal text entities
<!ENTITY XML "eXtensible Markup Language">
<!ENTITY DemoEntity 'The rule is 6" long.'>
Built-in entities (predefined; no declaration needed)
<!ENTITY sample "Use &quot; and &apos; as delimiters.">
&lt; for <
&gt; for >
&amp; for &
&apos; for '
&quot; for "
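Because every XML parser knows the five built-in entities, they can be used without any DTD. A small sketch with Python's standard library (ElementTree to expand them, `xml.sax.saxutils.escape` to produce them; `expand` and `escape_content` are hypothetical helper names) checks each mapping in the list above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined entities and the characters they stand for.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def expand(text: str) -> str:
    """Let an XML parser expand entity references inside element content."""
    return ET.fromstring(f"<t>{text}</t>").text or ""


def escape_content(text: str) -> str:
    """Escape text for element content; escape() rewrites &, < and >."""
    return escape(text)


if __name__ == "__main__":
    print(expand("&lt;&gt;&amp;&apos;&quot;"))  # <>&'"
    print(escape_content("a < b & c"))  # a &lt; b &amp; c
```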
Example: Entity Declaration (2/3)
External text entities
<!ENTITY myent SYSTEM "/EMTS/MYENT.XML">
<!ENTITY myent PUBLIC "-//MyCorp//ENTITY Superscript Chars//EN" …>
Binary (unparsed) entities
<!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>
NDATA is followed by the name of a declared notation (here TIFF), not a quoted string.
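A parser can tell these kinds of entity apart from their declarations alone. The sketch below uses Python's `xml.parsers.expat` `EntityDeclHandler` (our choice of tool; `classify_entities` is a hypothetical helper). Two assumptions of ours: the `NOTATION TIFF` declaration and its system identifier `"image/tiff"` are added because an unparsed entity must name a declared notation, and the notation name is written unquoted after NDATA, as the syntax requires.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE DOC [
<!NOTATION TIFF SYSTEM "image/tiff">
<!ENTITY XML "eXtensible Markup Language">
<!ENTITY myent SYSTEM "/EMTS/MYENT.XML">
<!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>
]>
<DOC/>"""


def classify_entities(doc: str) -> dict:
    """Map each declared general entity to a short description of its kind."""
    kinds = {}

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:  # NDATA present: unparsed (binary) entity
            kinds[name] = f"unparsed ({notation}): {system_id}"
        elif value is not None:  # replacement text given inline
            kinds[name] = f"internal text: {value}"
        else:  # content lives in an external resource
            kinds[name] = f"external text: {system_id}"

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(doc, True)
    return kinds


if __name__ == "__main__":
    for name, kind in classify_entities(DOC).items():
        print(name, "->", kind)
```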
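Example: Entity Declaration (3/3)
URL format: a relative URL in a system identifier is resolved relative to the location of the file that contains the entity declaration.
<!ENTITY ent9 SYSTEM "entities/entity9.xml">
Declared in /xml/document.xml, it refers to /xml/entities/entity9.xml
<!ENTITY ent9 SYSTEM "../entities/entity9.xml">
Declared in /xml/docs/document.xml, it refers to /xml/entities/entity9.xml
[Figure: directory trees: xml/ holding document.xml and entities/entity9.xml; xml/ holding docs/document.xml and entities/entity9.xml]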
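The resolution rule is the ordinary relative-URL algorithm: drop the file name of the declaring document, then apply the relative path, with `..` stepping up one directory. A sketch with Python's `urllib.parse.urljoin`, which implements RFC 3986 resolution (an actual XML processor may use its own resolver; `resolve_system_id` is a hypothetical name), reproduces both examples.

```python
from urllib.parse import urljoin


def resolve_system_id(declaring_file: str, system_id: str) -> str:
    """Resolve a relative system identifier against the declaring file.

    The last path segment of declaring_file (the file name) is dropped,
    and ".." segments step up one directory, as in RFC 3986.
    """
    return urljoin(declaring_file, system_id)


if __name__ == "__main__":
    print(resolve_system_id("/xml/document.xml", "entities/entity9.xml"))
    print(resolve_system_id("/xml/docs/document.xml", "../entities/entity9.xml"))
```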
Content of Logical Structure
Concepts
DTD Structure
Element Declarations
Attribute Declarations
Parameter Entities
Conditional Sections
Notation Declarations
DTD Processing Issues
Concepts of DTD (1/3)
DTD (Document Type Definition)
An optional but powerful feature of XML
Comprises a set of declarations that define the structure tree of a document
A validating XML processor reads the DTD, checks whether the document is valid, and uses it to build the document model in memory
Describes the user's own tag set, since XML is a meta-markup language
Concepts of DTD (2/3)
A DTD describes:
Elements, attributes, notations, and the relationships between elements
It establishes formal rules for document structure
Concepts of DTD (3/3)
Declare vs. define
Declare: "This document is a concert poster"
Define: "A concert poster must have the following features"
A DTD defines element types, attributes, and entities
Valid vs. invalid
Valid: conforms to the DTD
Invalid: fails to conform to the DTD
[Figure: valid XML documents form a subset of well-formed XML documents]
Valid & Invalid Documents
Valid:
<GREETING>
various random text but no markup
</GREETING>
Invalid: anything else, including
<GREETING>
<sometag>various random text</sometag>
<someEmptyTag/>
</GREETING>
Example DTD:
<!DOCTYPE GREETING [ <!ELEMENT GREETING (#PCDATA)> ]>
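Both GREETING documents are well-formed, so a non-validating parser accepts both; telling them apart requires checking the content model. Python's standard-library ElementTree does not validate, so this sketch (our illustration; `matches_pcdata_model` is a hypothetical helper) hand-checks the one rule that `(#PCDATA)` imposes: no child elements.

```python
import xml.etree.ElementTree as ET

VALID = "<GREETING>various random text but no markup</GREETING>"
INVALID = ("<GREETING><sometag>various random text</sometag>"
           "<someEmptyTag/></GREETING>")


def matches_pcdata_model(elem: ET.Element) -> bool:
    """(#PCDATA): the element may hold text but no child elements."""
    return len(elem) == 0


if __name__ == "__main__":
    for doc in (VALID, INVALID):
        root = ET.fromstring(doc)  # both parse: each is well-formed
        print(matches_pcdata_model(root))  # True, then False
```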
DTD structure
A DTD is composed of a number of declarations:
ELEMENT (tag definition)
ATTLIST (attribute definitions)
ENTITY (entity definition)
NOTATION (data type notation definition)
A DTD can be stored in an external subset, an internal subset, or both
Internal and External Subset (1/3)
Internal subset
Form:
<!DOCTYPE … [ <!-- Internal Subset -->
… ]>
Pros: easy to write; no need to move between two files while editing
Cons: other documents can't reuse the DTD without copying the internal subset
Internal and External Subset (2/3)
External subset
An external DTD is usually the better choice, with benefits for document management, updating, and editing:
With an external DTD, you can use public DTDs
External DTDs provide for better document management
External DTDs make it easier to validate your documents
Internal and External Subset (3/3)
[Figure: the full parsing path through a document's internal subset and external subset]
Element Declarations
An element declaration introduces a new element type: it gives the element's name and its content model, i.e. what the element may contain
In a valid document, each element type must be declared in an <!ELEMENT> declaration
The content model uses a simple, regular-expression-like grammar to specify precisely what is and isn't allowed in an element
Element type declaration (S is required white space, S? optional white space):
'<!ELEMENT' S Name S contentspec S? '>'
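The grammar rule above translates almost term for term into a regular expression. This is a sketch under stated simplifications: `parse_element_decl` is a hypothetical helper, names are approximated with mostly ASCII characters, and the parenthesized content model is captured as a whole rather than parsed.

```python
import re

# '<!ELEMENT' S Name S contentspec S? '>'
# contentspec: EMPTY, ANY, or a parenthesized model with an optional ?, * or +
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+([A-Za-z_:][\w.:-]*)\s+(EMPTY|ANY|\(.*\)[?*+]?)\s*>",
    re.DOTALL,
)


def parse_element_decl(decl: str) -> tuple:
    """Return (name, contentspec) for one element type declaration."""
    m = ELEMENT_DECL.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not an element declaration: {decl!r}")
    return m.group(1), m.group(2)


if __name__ == "__main__":
    print(parse_element_decl("<!ELEMENT YEAR (#PCDATA)>"))
```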
Content Specifications
ANY
#PCDATA
Sequences
Choices
Mixed content
Modifiers
EMPTY
ANY
With ANY, a SEASON element can contain any declared child elements and/or raw text (parsed character data)
Rarely used in practice, because it places no constraints on structure
<!ELEMENT SEASON ANY>
#PCDATA
Parsed character data, i.e. raw text with no markup
The keyword is prefixed with a hash sign (#) so that, inside a content model, it cannot be confused with an element that happens to be named PCDATA (for example, (#PCDATA | PCDATA)*)
<!ELEMENT YEAR (#PCDATA)>
Use of #PCDATA in XML
Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR>
The year of our Lord one thousand, nine hundred, and ninety-nine
</YEAR>
Invalid:
<YEAR>
<MONTH>January</MONTH>
<MONTH>February</MONTH>
<MONTH>March</MONTH>
<MONTH>April</MONTH>
<MONTH>May</MONTH>
<MONTH>June</MONTH>
<MONTH>July</MONTH>
<MONTH>August</MONTH>
<MONTH>September</MONTH>
<MONTH>October</MONTH>
<MONTH>November</MONTH>
<MONTH>December</MONTH>
</YEAR>
Child Elements
To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>
Sequences (1/2)
Separate multiple required child elements with commas, e.g.:
<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION,
DIVISION, DIVISION)>
One or more children: +
<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
Sequences (2/2)
Zero or more children: *
<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>
Choices: separate alternatives with |; exactly one of them must appear
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
or, with a third option:
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
Grouping With Parentheses
Parentheses combine several elements into a single unit.
A parenthesized group can be nested inside other parentheses in place of a single element.
A parenthesized group can be suffixed with a plus sign (+), an asterisk (*), or a question mark (?).
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH
| SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
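Since content models are "regular-expression-like", an element-only model can be translated into an actual regex and matched against the list of child element names. In this sketch (hypothetical helpers `model_to_regex` and `matches`), `,` becomes concatenation, `|`, `?`, `*`, `+` keep their regex meaning, and each name is matched as "NAME " so that P cannot match PHOTO. Models containing #PCDATA, and XML's rule that one group may not mix `,` and `|`, are outside this sketch.

```python
import re

# One token: an element name, or one of the content-model operators.
TOKEN = re.compile(r"\s*([A-Za-z_:][\w.:-]*|[(),|?*+])")


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a regex over child names."""
    model = model.strip()
    parts, pos = [], 0
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"unsupported content model at {model[pos:]!r}")
        token, pos = m.group(1), m.end()
        if token == "(":
            parts.append("(?:")  # non-capturing group
        elif token == ",":
            continue  # sequence: concatenation needs no operator
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)  # same meaning as in a regex
        else:
            parts.append(f"(?:{re.escape(token)} )")  # one child element
    return re.compile("".join(parts))


def matches(model: str, children: list) -> bool:
    """True if the sequence of child element names satisfies the model."""
    text = "".join(f"{name} " for name in children)
    return model_to_regex(model).fullmatch(text) is not None


if __name__ == "__main__":
    print(matches("(TEAM_CITY, TEAM_NAME, PLAYER*)", ["TEAM_CITY", "TEAM_NAME"]))
```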
Mixed Content
Both #PCDATA and child elements appear in a single choice
#PCDATA must come first
When child elements are listed, the group must end with * (zero or more)
#PCDATA cannot be used in a sequence
<!ELEMENT TEAM (#PCDATA | TEAM_CITY
| TEAM_NAME | PLAYER)*>
Empty elements
<!ELEMENT BR EMPTY>
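Mixed content constrains only which child elements may appear, never their order or count, while EMPTY forbids any content at all. A sketch of both checks over ElementTree elements (hypothetical helpers; the Chicago/Cubs sample data is made up). One caveat of this approach: ElementTree drops comments, so a comment inside an EMPTY element would go unnoticed here.

```python
import xml.etree.ElementTree as ET


def valid_mixed(elem: ET.Element, allowed: set) -> bool:
    """(#PCDATA | A | B)*: text anywhere, plus any number of children
    named in allowed, in any order."""
    return all(child.tag in allowed for child in elem)


def valid_empty(elem: ET.Element) -> bool:
    """EMPTY: no child elements and no text, not even white space."""
    return len(elem) == 0 and not elem.text


if __name__ == "__main__":
    team = ET.fromstring(
        "<TEAM>The <TEAM_CITY>Chicago</TEAM_CITY> <TEAM_NAME>Cubs</TEAM_NAME></TEAM>"
    )
    print(valid_mixed(team, {"TEAM_CITY", "TEAM_NAME", "PLAYER"}))  # True
    print(valid_empty(ET.fromstring("<BR/>")))  # True
```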
Attribute Declarations
Consider this element:
<GREETING LANGUAGE="Spanish">
Hola!
</GREETING>
It is declared like this:
<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">
General form:
<!ATTLIST Element_name Attribute_name Type Default_value>
Multiple Attribute Declarations
Consider this element:
<RECTANGLE LENGTH="70px" WIDTH="85px"/>
With two attribute declarations:
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">
With one attribute declaration:
<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">
Indentation is a convention, not a requirement.
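The one-declaration and two-declaration forms describe exactly the same attributes. A regex-based sketch (hypothetical helper `parse_attlists`; NOTATION types, single-quoted defaults, and a `>` inside a default value are not handled) reduces both to the same table. It keeps the first definition when an attribute is defined twice, which is the rule XML 1.0 applies.

```python
import re

NAME = r"[A-Za-z_:][\w.:-]*"  # approximation of an XML name
ATTLIST = re.compile(rf"<!ATTLIST\s+({NAME})(.*?)>", re.DOTALL)
ATT_DEF = re.compile(
    rf"\s*({NAME})\s+"  # attribute name
    r"(CDATA|IDREFS|IDREF|ID|ENTITIES|ENTITY|NMTOKENS|NMTOKEN|\([^)]*\))\s+"  # type
    r'(#REQUIRED|#IMPLIED|(?:#FIXED\s+)?"[^"]*")'  # default declaration
)


def parse_attlists(dtd: str) -> dict:
    """Map (element, attribute) -> (type, default) for every ATTLIST in dtd."""
    decls = {}
    for m in ATTLIST.finditer(dtd):
        element, body = m.group(1), m.group(2).rstrip()
        pos = 0
        while pos < len(body):  # one ATTLIST may define several attributes
            d = ATT_DEF.match(body, pos)
            if d is None:
                raise ValueError(f"cannot parse: {body[pos:]!r}")
            # If an attribute is defined twice, the first definition is binding.
            decls.setdefault((element, d.group(1)), (d.group(2), d.group(3)))
            pos = d.end()
    return decls


if __name__ == "__main__":
    print(parse_attlists('<!ATTLIST RECTANGLE LENGTH CDATA "0px" WIDTH CDATA "0px">'))
```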
Attribute Types
CDATA
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN
NMTOKENS
Enumerated
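CDATA
The most general attribute type
The value can be any string of text; a literal less-than sign (<) or ampersand (&), or the quotation mark that delimits the value, must be written as an entity reference (&lt;, &amp;, &quot; or &apos;)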
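In practice the escaping rule is best left to a library. A sketch with Python's `xml.sax.saxutils.quoteattr` (our choice; `as_attribute` and `round_trip` are hypothetical wrappers): it escapes `&` and `<` (and `>`), then picks a delimiter that does not occur in the value, or escapes `"` as `&quot;` when both quote kinds appear. Parsing the result back returns the original string.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def as_attribute(value: str) -> str:
    """Return value as a quoted, escaped attribute literal."""
    return quoteattr(value)


def round_trip(value: str) -> str:
    """Put value into an attribute, parse it back, and return what the parser sees."""
    return ET.fromstring(f"<E A={as_attribute(value)}/>").get("A")


if __name__ == "__main__":
    print(as_attribute("a < b & c"))             # "a &lt; b &amp; c"
    print(as_attribute('The rule is 6" long.'))  # 'The rule is 6" long.'
```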
ID
The value must be an XML name
It may include letters, digits, underscores, hyphens, and periods, but must not begin with a digit, hyphen, or period
It may not include white space
It may contain colons only if they are used for namespaces
The value must be unique among all ID-type attributes in the document
The default must be #REQUIRED or #IMPLIED; #REQUIRED is typical
IDREF
The value matches the ID of an element in the same document
Used for links and the like
IDREFS
A list of ID values from the same document
Separated by white space
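A validating parser enforces both rules: IDs are unique, and every IDREF or IDREFS token names an existing ID. ElementTree does not read attribute types from a DTD, so in this sketch (hypothetical `check_ids`; the element and attribute names in the usage are made up) the caller states which attributes hold IDs and references. Two passes are used because a reference may point to an element later in the document.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr="id", idref_attrs=("ref",), idrefs_attrs=("refs",)):
    """Report duplicate IDs and IDREF/IDREFS values that match no ID."""
    problems, ids = [], set()
    for elem in root.iter():  # pass 1: collect IDs
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in ids:
            problems.append(f"duplicate ID {value!r}")
        ids.add(value)
    for elem in root.iter():  # pass 2: check every reference
        refs = [elem.get(a) for a in idref_attrs if elem.get(a) is not None]
        for a in idrefs_attrs:
            refs.extend((elem.get(a) or "").split())  # IDREFS: white-space separated
        problems.extend(f"no element has ID {r!r}" for r in refs if r not in ids)
    return problems


if __name__ == "__main__":
    doc = ET.fromstring(
        '<staff><person id="p1"/><person id="p2"/>'
        '<project lead="p9" members="p2 p3"/></staff>'
    )
    print(check_ids(doc, idref_attrs=("lead",), idrefs_attrs=("members",)))
```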
ENTITY
The value is the name of an unparsed general entity declared in the DTD
ENTITIES
The value is a list of unparsed general entities declared in the DTD
Separated by white space
NOTATION
The value is the name of a notation declared in the DTD
<!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
<!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
[Figure: steps 1 to 4 connecting the notation's helper application TEXVIEW.EXE with the entity file LOGO.TEX]
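NMTOKEN
The value is a name token: one or more name characters (letters, digits, underscores, hyphens, periods, colons); unlike an XML name, it may begin with a digit, hyphen, or period
NMTOKENS
The value is a list of name tokens
Separated by white space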
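The only difference between an XML name and a name token is the first character. This sketch encodes that difference with two regexes; it is an ASCII approximation we chose for brevity (the real XML 1.0 productions also admit many non-ASCII characters), and the function names are hypothetical. The sample values come from the person example earlier in the deck.

```python
import re

# ASCII approximation of the XML 1.0 Name and Nmtoken productions.
NAME_START = "[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9_:.\-]"
NAME = re.compile(f"{NAME_START}{NAME_CHAR}*")
NMTOKEN = re.compile(f"{NAME_CHAR}+")


def is_name(value: str) -> bool:
    """ID, IDREF and ENTITY values must be names."""
    return NAME.fullmatch(value) is not None


def is_nmtoken(value: str) -> bool:
    """NMTOKEN: name characters only, but any of them may come first."""
    return NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value: str) -> bool:
    """NMTOKENS: one or more name tokens separated by white space."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(t) for t in tokens)


if __name__ == "__main__":
    print(is_name("771224"), is_nmtoken("771224"))  # False True
```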
Enumerated
Not a keyword
Refers to a list of possible values from which one must be chosen
Default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
Attribute Default Values
A literal string value, or
One of these three keywords:
#REQUIRED
#IMPLIED
#FIXED (followed by a literal value)
#REQUIRED
No default value is provided in the DTD
Document authors must provide a value for this attribute on every element
<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>
#IMPLIED
No default value in the DTD
The author may (but does not have to) provide a value on each element
#FIXED
The value is the same for all elements
The default value must be provided in the DTD
Document authors may not change it: if the attribute appears in a document, its value must equal the default
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
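The four kinds of default translate into four small rules: report a missing #REQUIRED value, leave #IMPLIED alone, supply or enforce a #FIXED value, and supply a literal default when the attribute is absent. This sketch applies them to ElementTree elements; the dictionary form of the declarations and the name `apply_defaults` are our assumptions (a real validating parser reads them from the DTD), and the sample attribute values are made up.

```python
import xml.etree.ElementTree as ET

# Hand-written form of the AUTHOR declarations above:
# attribute name -> (keyword, default). A keyword of None means a plain
# literal default, such as "English" for GREETING's LANGUAGE attribute.
AUTHOR_ATTS = {
    "NAME": ("#REQUIRED", None),
    "EMAIL": ("#REQUIRED", None),
    "EXTENSION": ("#IMPLIED", None),
    "COMPANY": ("#FIXED", "TIC"),
}


def apply_defaults(elem: ET.Element, decls: dict) -> list:
    """Check elem's attributes against decls, fill in defaults, return problems."""
    problems = []
    for name, (keyword, default) in decls.items():
        given = elem.get(name)
        if keyword == "#REQUIRED":
            if given is None:
                problems.append(f"missing required attribute {name}")
        elif keyword == "#FIXED":
            if given is None:
                elem.set(name, default)  # the fixed value is supplied
            elif given != default:
                problems.append(f"{name} must be {default!r}, got {given!r}")
        elif keyword is None and given is None:
            elem.set(name, default)  # literal default
        # #IMPLIED: nothing to check and nothing to supply
    return problems


if __name__ == "__main__":
    author = ET.fromstring('<AUTHOR NAME="Lee" EMAIL="lee@example.com"/>')
    print(apply_defaults(author, AUTHOR_ATTS), author.get("COMPANY"))  # [] TIC
```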
Example of Internal DTDs
<?xml version="1.0"?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>
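Internal DTD Subsets
The internal subset is read before the external subset, so entity and attribute-list declarations in the internal subset override those in the external subset. An element type, however, must not be declared more than once, so this example is valid only if greeting.dtd does not also declare GREETING.
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>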
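The override works through XML's "first declaration is binding" rule for entities: whichever declaration the parser sees first wins, and the internal subset is seen first. To keep the sketch self-contained, both declarations sit in the internal subset and are parsed with Python's standard-library ElementTree (our choice of parser; the entity name `who` and `greeting_text` are hypothetical).

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE GREETING [
<!ENTITY who "the first declaration">
<!ENTITY who "a later declaration">
]>
<GREETING>Hello from &who;!</GREETING>"""


def greeting_text(doc: str) -> str:
    """Parse doc and return the GREETING text with entities expanded."""
    return ET.fromstring(doc).text


if __name__ == "__main__":
    print(greeting_text(DOC))
```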