
UFCEKG-20-2 Data, Schemas & Applications

Lecture 3

Data Representation, XML & RSS

Last week: introduction to the web

- URI schemes & encoding
- HTTP protocol
- media types
- request / response cycle
- GET, POST, PUT and DELETE
- introduction to mashups
- simple mashup example with forms

WWW: definition

"The World Wide Web (abbreviated as WWW or W3, commonly known as the Web) is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks." (Wikipedia: World Wide Web)

Concept originally proposed by Sir Tim Berners-Lee (1989), based on earlier hypertext systems.

Berners-Lee and Belgian computer scientist Robert Cailliau proposed in 1990 to use hypertext "to link and access information of various kinds as a web of nodes in which the user can browse at will", and they publicly introduced the project in December of the same year.

Problem: How to encode data for communication

(Image: Bank of America market data mirrors)

Competing constraints:
- Data must be serialised into a character stream
- Communicate the meaning of the data as well as the data
- Error-free
- Minimal size
- Handle multi-lingual text

Solutions

- Card-file based
- CSV
- XLS - Excel file format
- XML
- SQL export
- JSON - JavaScript Object Notation

(Image: The Medabar in Asmara, Eritrea - Google Map)
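
To make the list concrete, here is a minimal Python sketch that serialises one record into three of these formats (CSV, JSON and XML) using only the standard library. The record, its field names and the root element name `reading` are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# A hypothetical weather reading, used only to illustrate the formats.
record = {"station": "Alveston", "date": "2008-10-14", "temp_c": "12.5"}


def to_csv(rec: dict) -> str:
    """Header line plus one data line; the csv module handles any quoting."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_json(rec: dict) -> str:
    """Every value is labelled with its key."""
    return json.dumps(rec)


def to_xml(rec: dict) -> str:
    """One element per field, nested under a single root element."""
    root = ET.Element("reading")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


print(to_csv(record))
print(to_json(record))
print(to_xml(record))
```

This shows the trade-off between the competing constraints listed earlier: CSV names each field once, in an optional header line, while JSON and XML put a label next to every value, conveying more meaning at the cost of size.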

Card-based

Examples:
- ATCO-CIF for timetables
- IGES for Computer-Aided Design

Characteristics:
- Based on old 80-column punched cards
- Multiple record types
- Fixed field widths
- No formal language to define the format
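
The sketch below shows what "multiple record types" and "fixed field widths" mean for a program reading such a file. The two-letter record codes and column positions are invented for illustration and are not taken from the ATCO-CIF or IGES specifications. Because there is no formal language for the format, the layout has to be hard-coded in the reader, here as the hypothetical `LAYOUTS` table.

```python
# Hypothetical layout, NOT the real ATCO-CIF or IGES specification.
# Columns 0-1 hold the record type; each type has its own fixed-width fields,
# given as (field name, start column, end column) with Python slice semantics.
LAYOUTS = {
    "ST": [("stop_code", 2, 10), ("name", 10, 40)],
    "TM": [("stop_code", 2, 10), ("departs", 10, 14)],
}


def parse_card(line: str) -> dict:
    """Split one fixed-width record according to its record-type code."""
    kind = line[:2]
    fields = LAYOUTS.get(kind)
    if fields is None:
        raise ValueError(f"unknown record type {kind!r}")
    record = {"type": kind}
    for name, start, end in fields:
        # Values are space-padded to the field width; strip the padding.
        record[name] = line[start:end].strip()
    return record


cards = [
    "STBRS00001Alveston Church",
    "TMBRS000010745",
]
for card in cards:
    print(parse_card(card))
```

Real card images would be space-padded to 80 columns; slicing copes with shorter lines as well, since out-of-range slices simply return fewer characters.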

CSV

Examples:
- Alveston (Bristol) weather data
- World Health Organization (WHO)-generated estimates of TB mortality, prevalence, incidence (including incidence of HIV+TB) and case detection rate
- 1000 Songs - Google Spreadsheet

Characteristics:
- Data values separated by a common separator character: space, comma or tab
- Column position is significant
- Lines separated by newlines, whose coding depends on the OS: line feed (0x0A) on Unix, carriage return + line feed (0x0D 0x0A) on Windows, carriage return (0x0D) on old Macs
- The separator must not occur in data values, or some other convention is needed, e.g. quotes around the value or an escape character
- Column headings may be on the first line
- Only tables: all lines have the same structure
- All columns required - a problem for space-separated data
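
A minimal Python sketch of the separator and line-ending problems using the standard `csv` module. The module applies the quoting convention: a value containing a comma is wrapped in double quotes. The sample weather lines are invented; real Alveston data may use different columns.

```python
import csv
import io

# Hypothetical data: the comment contains the separator, so it must be quoted.
text = 'station,comment,rain_mm\r\nAlveston,"light rain, then sun",2.4\r\n'


def read_rows(data: str) -> list[dict]:
    """Parse CSV text whose first line holds the column headings."""
    # newline="" leaves line endings untouched so the csv module can
    # recognise \n, \r\n or \r itself.
    return list(csv.DictReader(io.StringIO(data, newline="")))


def write_rows(rows: list[dict]) -> str:
    """Serialise rows back to CSV; values containing commas get quoted."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = read_rows(text)
print(rows[0]["comment"])        # light rain, then sun
print(write_rows(rows) == text)  # True
```

Note that every value comes back as a string: CSV itself carries no type information, only column positions and an optional heading line.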

Tagged record structures

Data with optional and repeated fields needs more complex structures. Many have been developed for specific domains:
- MARC library catalogue records
- EDIFACT for commercial Electronic Data Interchange (EDI)
- EDIF (Electronic Design Interchange Format): LISP-based nested data
- EXIF data embedded in a JPEG image

XML

A generic data format based on tagged elements in a tree structure.

Developed from GML, via SGML. GML, a document markup language, was developed by Charles Goldfarb at IBM in 1969.

Examples:
- Alveston
- WDL config file
- UWE news RSS feed

(Image: tree with Buddhist prayer flags)

XML domain vocabularies

XML defines only the rules for a well-formed document. The allowable tags, their structure and order in a document, the range of allowable values and the meaning of those tags depend on the XML application, called a vocabulary. There are now hundreds of XML vocabularies designed for every sort of data:
- XHTML - the version of HTML which conforms to XML
- SVG - graphics
- TransXChange for timetables
- RSS and Atom for news

XML processing vocabularies

There are also vocabularies for languages for processing XML:
- XSLT - for transforming XML documents
- XSL-FO - for formatting documents, e.g. for rendering to PDF
- XML Schema - for defining XML vocabularies
- XProc - for defining XML pipelines

Problem: News dissemination

I want to disseminate news about my project/company, and allow interested people to read it, e.g. the university wants to spread the news about successful staff.

Solution 1: HTML page
Publish a page of news on the website in HTML.

Problems:
- how do visitors know when it has changed?
- news from different universities cannot be easily combined (why?)

Solution: email

Encourage interested users to subscribe to your company newsletter.

Problems:
- Subscription is a barrier
- Clutters up email boxes
- Can look like spam
- List management and emailing overhead

Solution: Create XML document for news

UWE makes up its own set of additional tags:

<newsItem date="2007-10-2">
  <newsTitle>UWE best in West</newsTitle>
  <newsBody>UWE wins tiddlywinks again</newsBody>
  <contact>press@uwe.ac.uk</contact>
</newsItem>

Problems:
- someone has to design this language
- it has to be translated to HTML to display
- the reader has to understand multiple new tags from different sources
- it needs to be distinguished from standard HTML
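
The second problem, translating to HTML for display, can be illustrated with a short Python sketch using the standard `xml.etree.ElementTree` and `html` modules. The HTML layout produced here is an arbitrary choice for the example; the point is that the mapping from `newsTitle`, `newsBody` and `contact` to HTML has to be hand-written by every consumer of a home-grown vocabulary.

```python
import html
import xml.etree.ElementTree as ET

NEWS_XML = """<newsItem date="2007-10-2">
  <newsTitle>UWE best in West</newsTitle>
  <newsBody>UWE wins tiddlywinks again</newsBody>
  <contact>press@uwe.ac.uk</contact>
</newsItem>"""


def news_item_to_html(xml_text: str) -> str:
    """Translate the ad-hoc newsItem vocabulary into an HTML fragment."""
    item = ET.fromstring(xml_text)
    # The consumer must already know these made-up tag names.
    title = html.escape(item.findtext("newsTitle", default=""))
    body = html.escape(item.findtext("newsBody", default=""))
    contact = html.escape(item.findtext("contact", default=""))
    date = html.escape(item.get("date", ""))
    return (
        f"<h2>{title}</h2>\n"
        f"<p>{body} <small>({date})</small></p>\n"
        f'<p>Contact: <a href="mailto:{contact}">{contact}</a></p>'
    )


print(news_item_to_html(NEWS_XML))
```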

Aside: Namespaces

Problem: How to distinguish XML tags from different vocabularies in one document?

Solution:
- define a (global) unique URI for the vocabulary
- use an arbitrary prefix (here news:) for all tags in the same vocabulary; the prefix need only be unique within a document
- link the prefix to the vocabulary URI in the document, with an xmlns:prefix attribute

<h1>UWE news</h1>
<p>
  <news:item xmlns:news="http://www.uwe.ac.uk/news" date="2007-10-2">
    <news:Title>UWE best in West</news:Title>
    <news:Body>UWE wins tiddlywinks again</news:Body>
    <news:Contact>press@uwe.ac.uk</news:Contact>
  </news:item>
</p>
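
A minimal Python sketch of how a namespace-aware parser sees the example above. The fragment on the slide has two top-level elements (`h1` and `p`), so it is wrapped in a `<div>` here to make it a well-formed document; that wrapper and the local prefix `n` are assumptions of the example.

```python
import xml.etree.ElementTree as ET

NEWS_NS = "http://www.uwe.ac.uk/news"

# Wrapped in a single <div> so the fragment has exactly one root element.
DOC = """<div>
  <h1>UWE news</h1>
  <p>
    <news:item xmlns:news="http://www.uwe.ac.uk/news" date="2007-10-2">
      <news:Title>UWE best in West</news:Title>
    </news:item>
  </p>
</div>"""

root = ET.fromstring(DOC)

# ElementTree replaces the prefix with the full URI in braces, so the
# prefix the author happened to choose does not matter to the program.
item = root.find(".//{http://www.uwe.ac.uk/news}item")
print(item.tag)  # {http://www.uwe.ac.uk/news}item

# Alternatively, pass our own prefix -> URI mapping to find().
title = root.find(".//n:item/n:Title", {"n": NEWS_NS})
print(title.text)  # UWE best in West
```

Because matching is done on the URI, a document that used `uwe:` instead of `news:` for the same URI would be found by exactly the same code.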

Solution: RSS

Standardize on one (or several!) sets of standard tags. The tags are machine-readable, so news items can be identified across a list of web sites.

RSS 2.0: Really Simple Syndication (also expanded as Rich Site Summary)

Atom - a more recent format
- Differences: dates (RFC 822 vs RFC 3339 timestamps), multi-lingual content

Characteristics:
- Structure: rss / channel / item tree
- Items in reverse chronological order
- Few mandatory tags
- Namespaces allow additional vocabularies to be added
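
A minimal Python sketch that builds the rss / channel / item tree with items in reverse chronological order. The items and example.org URLs are invented; `title`, `link` and `description` are the channel elements the RSS 2.0 specification requires. Dates are written in RFC 822 form with `email.utils.format_datetime`, and the last line shows the RFC 3339 form Atom would use instead.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical news items: (title, link, publication time in UTC).
ITEMS = [
    ("UWE best in West", "http://www.example.org/news/1",
     datetime(2008, 10, 1, 9, 0, tzinfo=timezone.utc)),
    ("New research on transport", "http://www.example.org/news/2",
     datetime(2008, 10, 13, 15, 15, tzinfo=timezone.utc)),
]


def build_rss(items) -> str:
    """Build an rss / channel / item tree with items newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three channel elements required by RSS 2.0.
    ET.SubElement(channel, "title").text = "Example news"
    ET.SubElement(channel, "link").text = "http://www.example.org/"
    ET.SubElement(channel, "description").text = "A hypothetical feed"
    for title, link, when in sorted(items, key=lambda it: it[2], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # RFC 822 date; usegmt=True writes the zone as "GMT".
        ET.SubElement(item, "pubDate").text = format_datetime(when, usegmt=True)
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(ITEMS)
print(feed)

# Atom would use an RFC 3339 timestamp for the same instant:
print(ITEMS[1][2].isoformat())  # 2008-10-13T15:15:00+00:00
```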

Example RSS - UWE news

<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <title>UWE News</title>
    <link>http://www.uwe.ac.uk</link>
    <description>Latest UWE press releases</description>
    <image>
      <url>http://info.uwe.ac.uk/common/assets/2004Design/logoNoBorder.gif</url>
      <title>University of the West of England</title>
      <link>http://www.uwe.ac.uk</link>
    </image>
    <pubDate>Fri, 13 Oct 2008 15:15:44 GMT</pubDate>
    <item>
      <title>New research looks to transport users for solutions</title>
      <link>http://info.uwe.ac.uk/news/uwenews/article.asp?item=1363</link>
      <description>'Ideas in Transit' is a new initiative which will look to transport users' experiences and creativity as a source of innovation to tackle the UK's transport problems ....</description>
    </item>
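
A short Python sketch of a client reading a feed like this one and listing its items. The slide shows only the start of the feed, so the copy below is abbreviated and closed with `</channel></rss>` to make it well-formed. It is parsed as bytes so that the parser honours the encoding named in the XML declaration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the UWE feed above, completed so it is well-formed.
FEED = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <title>UWE News</title>
    <link>http://www.uwe.ac.uk</link>
    <description>Latest UWE press releases</description>
    <item>
      <title>New research looks to transport users for solutions</title>
      <link>http://info.uwe.ac.uk/news/uwenews/article.asp?item=1363</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_bytes: bytes) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    rss = ET.fromstring(feed_bytes)
    channel = rss.find("channel")
    # findall("item") looks only at direct children of channel, so the
    # channel's own <title> and <link> are not mistaken for an item.
    return [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in channel.findall("item")
    ]


for title, link in list_items(FEED):
    print(title, "->", link)
```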

Example RSS - BBC Finance News

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/shared/bsp/xsl/rss/nolsol.xsl"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
  <channel>
    <title>BBC News | Business | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/default.stm</link>
    <description>Visit BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news.</description>
    <language>en-gb</language>
    <lastBuildDate>Mon, 13 Oct 2008 14:28:30 GMT</lastBuildDate>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <docs>http://www.bbc.co.uk/syndication/</docs>
    <ttl>15</ttl>
    <image>
      <title>BBC News</title>
      <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/default.stm</link>
    </image>
    <item>
      <title>UK banks receive &#163;37bn bail-out</title>
      <description>The UK government says it is to inject a total of up to &#163;37bn into Royal ...
    </item>
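
A short Python sketch showing two details of this feed: the parser turns the numeric character reference `&#163;` into the pound sign, and `ttl` gives the number of minutes a client may cache the channel before fetching it again (as defined in the RSS 2.0 specification). The feed below is a cut-down copy of the one above, completed so that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the BBC feed above, completed so it is well-formed.
FEED = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
  <channel>
    <title>BBC News | Business | UK Edition</title>
    <ttl>15</ttl>
    <item>
      <title>UK banks receive &#163;37bn bail-out</title>
    </item>
  </channel>
</rss>"""

rss = ET.fromstring(FEED)
channel = rss.find("channel")

# &#163; is replaced by the character U+00A3 during parsing.
title = channel.findtext("item/title")
print(title)

# Minutes the channel may be cached before the client polls again.
minutes = int(channel.findtext("ttl"))
print(minutes)  # 15
```

The `xmlns:media` declaration is harmless to a client that ignores it; it only matters if the feed actually uses `media:` elements.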

RSS aggregation

Problem: How to keep track of multiple feeds?

Solution: http://www.youtube.com/watch?v=0klgLsSxGsU&feature=player_embedded#t=0s

An application is needed which:
- is stateful: it remembers which items you have read
- integrates multiple feeds into one 'magazine'
- polls RSS providers on a regular basis

Feed integrators such as Bloglines and Google Reader reduce the load on the provider and provide some filtering. There is an RSS reader integrated into MyUWE.

(Image: RSS aggregation with Bloglines)
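
A minimal Python sketch of two of these requirements: merging several feeds into one reverse-chronological 'magazine' and remembering which items have been read. Polling over HTTP is left out, the `Aggregator` class and `feed` helper are hypothetical, the feeds and example.org links are invented, and using an item's link as its identity is a simplifying assumption (RSS 2.0 also offers a `guid` element for this).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


class Aggregator:
    """Merge several RSS 2.0 feeds and remember which items were read.

    Polling over HTTP is left out: feeds are passed in as downloaded text.
    """

    def __init__(self) -> None:
        # State kept between polls: links of items the user has already read.
        self.read: set[str] = set()

    def merge(self, feeds: list[str]) -> list[dict]:
        """Combine unread items from all feeds into one list, newest first."""
        items = []
        for text in feeds:
            channel = ET.fromstring(text).find("channel")
            source = channel.findtext("title", "")
            for item in channel.findall("item"):
                link = item.findtext("link", "")
                if link in self.read:
                    continue
                items.append({
                    "source": source,
                    "title": item.findtext("title", ""),
                    "link": link,
                    # pubDate is an RFC 822 date; parse it so items can be sorted.
                    "date": parsedate_to_datetime(item.findtext("pubDate")),
                })
        return sorted(items, key=lambda it: it["date"], reverse=True)

    def mark_read(self, link: str) -> None:
        self.read.add(link)


def feed(title: str, *items: tuple[str, str, str]) -> str:
    """Build a tiny hypothetical feed from (title, link, pubDate) triples."""
    body = "".join(
        f"<item><title>{t}</title><link>{link}</link><pubDate>{d}</pubDate></item>"
        for t, link, d in items
    )
    return f'<rss version="2.0"><channel><title>{title}</title>{body}</channel></rss>'


uwe = feed("UWE News",
           ("Transport research", "http://example.org/uwe/1", "Mon, 13 Oct 2008 15:15:44 GMT"))
bbc = feed("BBC Business",
           ("Banks bail-out", "http://example.org/bbc/1", "Mon, 13 Oct 2008 14:28:30 GMT"),
           ("Markets rally", "http://example.org/bbc/2", "Tue, 14 Oct 2008 09:00:00 GMT"))

reader = Aggregator()
for entry in reader.merge([uwe, bbc]):
    print(entry["date"], entry["source"], "-", entry["title"])
reader.mark_read("http://example.org/bbc/2")
print(len(reader.merge([uwe, bbc])))  # 2
```

In a real aggregator the merge step would run on a timer, respecting each channel's `ttl`, and the set of read links would be persisted between sessions.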

RSS as a tree structure

- UWE news
- BBC Finance news
- Earthquakes
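
One way to see the tree is to print an indented outline of a feed, one line per element. A minimal Python sketch, using a small invented feed modelled on the UWE example; the `outline` function is hypothetical.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0">
  <channel>
    <title>UWE News</title>
    <item><title>New research looks to transport users for solutions</title></item>
  </channel>
</rss>"""


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per element, walking the tree depth-first."""
    # Whitespace-only text (the indentation between tags) is ignored.
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(FEED))))
```

The output shows the rss / channel / item nesting directly, with `title` appearing at two different depths.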

XML Characteristics

- strings enclosed in tags which provide a human-readable name for the element (so-called self-describing)
- elements may be nested to create hierarchical data structures
- element tags may be repeated
- the meaning of an element name can depend on its parent (e.g. title inside channel vs. title inside item)
- element structure can be formally defined

Aside: Self-describing

Element names provide a clue about the meaning of the data, but not enough:
- names are ambiguous
- names may be misleading
- what units?
- what accuracy?
- what origin?

This leads to the need for metadata:
- who created it
- when
- what license to use
- why

XML terminology

XML documents are tree structures, with each node bounded by an opening and a closing tag.
- Element: the opening tag, attributes, the body of the element and the closing tag. Elements are not elemental!
- Tag name: the name in angle brackets; it must conform to the naming rules and may have a prefix
- Attribute: a name="value" pair attached to an element. Names follow the same rules as tag names.
- Parent: all elements except the root have one parent
- Child: an element nested in another (parent) element
- Root: every document has a single root element with no parent
- Mixed content: an element may contain a mixture of text and other elements
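
A minimal Python sketch mapping these terms onto `xml.etree.ElementTree`. The document reuses the newsItem example with a hypothetical `<em>` element added to produce mixed content. ElementTree elements do not store a reference to their parent, so the sketch builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

DOC = """<newsItem date="2007-10-2">
  <newsTitle>UWE best in West</newsTitle>
  <newsBody>UWE wins <em>tiddlywinks</em> again</newsBody>
</newsItem>"""

root = ET.fromstring(DOC)             # the single root element
print(root.tag, root.attrib)          # newsItem {'date': '2007-10-2'}
print([child.tag for child in root])  # ['newsTitle', 'newsBody']

# Build a child -> parent map; the root is the only element with no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}
body = root.find("newsBody")
print(parent_of[body].tag)            # newsItem

# Mixed content: text before the first child is .text, text after a child is its .tail.
em = body.find("em")
print(repr(body.text), repr(em.text), repr(em.tail))  # 'UWE wins ' 'tiddlywinks' ' again'
```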

Basic XML rules

- A single root element
- Tags must be properly nested
- An element must be closed:
  - Opening and closing tag: <p>...</p>
  - Empty element: <br/> or <hr size="3"/>

Other formatting rules:
- XML names are case-sensitive, contain no spaces, and use a restricted character set
- Attribute values must be single- or double-quoted
- Special characters can be coded as references, e.g. &#10; (a line feed) or &gt; (>)
- Some characters have special meaning: < is the start of a tag, and & is the first character of an entity reference. Within XML data these have to be encoded as &lt; and &amp;, or enclosed in <![CDATA[ ... ]]>
- Preferably use standard formats for representing values, e.g. 2008-10-14 for a date
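
A minimal Python sketch that checks each rule with the standard library: `is_well_formed` (a hypothetical helper) reports whether a string parses, and `xml.sax.saxutils.escape` encodes `&`, `<` and `>` for use as element content. The test strings are invented examples of each kind of mistake.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML; False on any well-formedness error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>ok</p>"))             # True
print(is_well_formed("<a>1</a><b>2</b>"))      # False: two root elements
print(is_well_formed("<p><b>x</p></b>"))       # False: improper nesting
print(is_well_formed("<contact>x</Contact>"))  # False: names are case-sensitive
print(is_well_formed("<hr size=3/>"))          # False: unquoted attribute value
print(is_well_formed("<p>Fish & chips</p>"))   # False: bare &

# escape() encodes &, < and > so the text can be used as element content.
print(escape("Fish & chips < £5"))             # Fish &amp; chips &lt; £5

# Inside a CDATA section the same characters need no escaping.
cdata = ET.fromstring("<p><![CDATA[Fish & chips < £5]]></p>")
print(cdata.text)                              # Fish & chips < £5
```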
