
Comparing Java XML parsers - PowerPoint Presentation

Uploaded On 2018-11-04



Presentation Transcript


Comparing Java XML parsers

PRESENTED BY

SASANKA SEKHAR BANERJEE

Comparing Java XML parsers

During this presentation, we will discuss the following:

Need for XML

Brief overview of XML

Different methods of parsing XML

DOM [Document Object Model]

SAX [Simple API for XML]

JAXP [Java API for XML Processing]

JAXB [Java Architecture for XML Binding]

StAX [Streaming API for XML]

XPath

Choose the right parser

Comparing Java XML parsers – Need for XML

Applications essentially consist of two parts: the functionality described by the code, and the data that is manipulated by the code.

The in-memory storage and management of data is a key part of any programming language and environment.

Within a single application, the programmer is free to decide how the data is stored and represented.

Problem: an application must exchange data with another application.

One option is an intermediary storage medium, such as a database. But what if the data must be exchanged directly between two applications, or the applications cannot access the same database? In that case, the data must be encoded in some particular format as it is produced. This has often led to application-specific data formats. These formats can be text-based, such as HTML, which encodes how the enclosed data should be displayed, or binary, such as the formats used for sending remote procedure calls.

Problem: in either case, the data representation tends to lack flexibility, which causes problems when versions change or when data must be exchanged between disparate applications, frequently from different vendors.

Comparing Java XML parsers – XML Usage

XML was developed to address these issues. XML is written in plain text, uses self-describing elements and provides a data encoding format that is:

Generic

Simple

Flexible

Extensible

Portable

XML offers a method of putting structured data in a text file. Structured data conforms to a particular format; examples are spreadsheets, address books, configuration parameters, and financial transactions.

Because the data is plain text, XML provides a software- and hardware-independent way of storing data, which makes it easier to create data that different applications can share.

Exchanging data as XML greatly reduces this complexity, since the data can be read by otherwise incompatible applications.

When upgrading to a new system, large volumes of data must be converted, and incompatible data is often lost. Because XML is stored in plain text format, it is easier to expand or upgrade to new systems without losing data.

With XML, data can be made available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.).

Comparing Java XML parsers – Overview of XML

An XML document consists of elements; each element has a start tag, content, and an end tag.

An XML document must have exactly one root element, i.e. one tag that encloses all the remaining tags.

XML is case-sensitive, and an XML document is required to be well-formed.

The following conditions must be satisfied for a document to be well-formed:

The document should start with a prolog containing the XML declaration (in XML 1.0 the declaration is recommended rather than strictly required).

Every start tag has a matching end tag (or the element uses an empty-element tag such as <telephone/>).

All elements are properly nested.

An XML document is valid if it is well-formed, references an XML schema (or DTD), and conforms to that schema. The following XML file is well-formed; because it does not reference a schema, it cannot be checked for validity:

<?xml version="1.0"?>
<!-- This is a comment -->
<address>
    <name>Lars</name>
    <street>Test</street>
    <telephone number="0123"/>
</address>
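To make the well-formedness rules concrete, here is a minimal sketch using the JDK's built-in DOM parser through JAXP. It assumes the XML is supplied as an inline string rather than a file, and the names WellFormedCheck and isWellFormed are hypothetical. Since no schema is attached, it checks well-formedness only, not validity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the text parses as well-formed XML. No schema is involved,
    // so this checks well-formedness only, not validity.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<?xml version=\"1.0\"?><!-- comment --><address><name>Lars</name>"
                + "<street>Test</street><telephone number=\"0123\"/></address>";
        System.out.println(isWellFormed(good));                                  // true
        System.out.println(isWellFormed("<address><name>Lars</address>"));        // false: <name> not closed
        System.out.println(isWellFormed("<address><name>Lars</address></name>")); // false: improper nesting
        System.out.println(isWellFormed("<address/><address/>"));                 // false: two root elements
        System.out.println(isWellFormed("<name>Lars</Name>"));                    // false: tags are case-sensitive
    }
}
```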

Comparing Java XML parsers – Parsing XML

Java contains several methods to access XML. The following is a short overview of the available methods.

Document Object Model or DOM

Defines a mechanism for accessing and manipulating well-formed XML.

Using the DOM API, the XML document is transformed into a tree structure in memory.

The application then navigates the tree to read the document's data.

Because the whole tree is held in memory, a large document can place a strain on system resources.
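A minimal sketch of the DOM approach, assuming the address document from the overview is passed in as a string: the parser builds the complete tree first, and only then does the application walk it. DomTreeWalk, parse and outline are illustrative names, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeWalk {

    // Parses the whole document into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walks the tree depth-first and records each element name, indented by depth.
    static List<String> outline(Document doc) {
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    private static void walk(Node node, int depth, List<String> out) {
        out.add("  ".repeat(depth) + node.getNodeName());
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text, comment and other non-element nodes.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk(child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<address><name>Lars</name><street>Test</street>"
                + "<telephone number=\"0123\"/></address>");
        outline(doc).forEach(System.out::println);
        // Because the whole tree is in memory, any node can be reached directly, in any order.
        Element phone = (Element) doc.getElementsByTagName("telephone").item(0);
        System.out.println(phone.getAttribute("number")); // 0123
    }
}
```

The trade-off named above is visible here: random access such as getElementsByTagName is easy, but only because every node was materialised first.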

Simple API For XML or SAX

Defines an event-based way of parsing XML: the SAX parser emits a series of events while it reads the document. These events are forwarded to event handlers, which also provide access to the document's data.

Memory consumption is extremely low, because the XML does not have to be loaded into memory all at once.

The application must implement handler methods for the incoming events it needs to process.

SAX offers no DOM-style access to elements, so the application must keep track of the parser's position in the document hierarchy itself.

The application logic gets harder as the document grows larger and more complex. Even when the entire document is not needed, a SAX parser normally still parses the whole document, just as DOM does.

It lacks built-in support for document navigation such as that provided by XPath.

In addition, because parsing happens in a single pass, there is no random access to the document.
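The position-tracking burden described above can be sketched as follows. SAX reports events without any tree position, so this hypothetical handler (SaxPathTracker) keeps its own stack of open element names in order to report each text value together with its path. It is an illustration under those assumptions, with the input given as an inline string, not a general-purpose utility.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPathTracker extends DefaultHandler {

    // SAX does not say where an event sits in the tree, so the handler keeps its own stack.
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<String> leafValues = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several calls, so accumulate it.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            leafValues.add("/" + String.join("/", path) + " = " + value);
        }
        text.setLength(0);
        path.removeLast();
    }

    static List<String> run(String xml) throws Exception {
        SaxPathTracker handler = new SaxPathTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.leafValues;
    }

    public static void main(String[] args) throws Exception {
        run("<address><name>Lars</name><street>Test</street><telephone number=\"0123\"/></address>")
                .forEach(System.out::println);
    }
}
```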

Comparing Java XML parsers – Parsing XML Java API for XML Processing or JAXP

It provides a common interface for creating and using SAX and DOM parsers in Java.

It does not implement a parser itself, but defines the behavior that a parser must (at least) support.

The actual parser implementation must extend these abstract classes and provide concrete classes.

It uses the Factory pattern to create a concrete parser class, whose methods are then called to parse the document.

DocumentBuilderFactory class is used for DOM Parsing and SAXParserFactory is used for SAX parsing.

Traversing the DOM using JAXP:

Instantiate a factory class.

Using the factory class, instantiate the provider class.

Using the provider class created in the previous step, perform the XML processing/parsing:

DocumentBuilderFactory factoryBuilder = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factoryBuilder.newDocumentBuilder();
// parse(String) expects a URI, so wrap a plain file name in a File
Document doc = builder.parse(new File(fileName));
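The three factory steps can be sketched end to end as follows. The extra calls setNamespaceAware and setIgnoringComments are standard DocumentBuilderFactory options, chosen here only to show that parser behaviour is configured on the factory before a builder is created. Printing builder.getClass() shows which concrete implementation JAXP located; that name varies between JDKs. The inline document stands in for fileName.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpFactoryDemo {

    static Document parse(String xml) throws Exception {
        // Step 1: instantiate the factory; JAXP looks up whichever implementation is installed.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Parser behaviour is configured on the factory before any builder is created.
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(true);
        // Step 2: the factory creates a concrete DocumentBuilder (the provider class).
        DocumentBuilder builder = factory.newDocumentBuilder();
        System.out.println("Concrete builder: " + builder.getClass().getName());
        // Step 3: the builder parses the input and returns a DOM Document.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version=\"1.0\"?><!-- This is a comment --><address/>");
        // With setIgnoringComments(true) the comment node is dropped from the tree.
        System.out.println(doc.getChildNodes().getLength());
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```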

Comparing Java XML parsers – Parsing XML SAX Parsing using JAXP

With the DOM parser, the actual parser did all the work: it parsed the XML document and returned the DOM Document object.

For SAX, the approach is the opposite. We call the parse method and pass a handler object; this handler receives notifications about the parsing progress, any errors encountered, and so on.

SAXParserFactory factorySAX = SAXParserFactory.newInstance();

SAXParser sax = factorySAX.newSAXParser();

DefaultHandler handler = new XMLParser(); // XMLParser: an application-defined DefaultHandler subclass
sax.parse(inputStream, handler);

The major difference is the parse method: first, it does not return a Document object; second, we must supply a class derived from DefaultHandler. The handler class can build up its own in-memory representation of the data, should it need to.
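The slides do not show the XMLParser handler itself. One possible version, assumed here purely for illustration, captures the text of the first customerProfileId element. It buffers characters() calls because a parser may deliver a single text node in several chunks, and it reads from an in-memory byte stream in place of the slide's inputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler standing in for the XMLParser class named on the slide.
public class XMLParser extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTarget;
    private String customerProfileId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (customerProfileId == null && "customerProfileId".equals(qName)) {
            inTarget = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTarget) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inTarget && "customerProfileId".equals(qName)) {
            customerProfileId = buffer.toString().trim();
            inTarget = false;
        }
    }

    public String getCustomerProfileId() {
        return customerProfileId;
    }

    // Mirrors the slide: factory, parser, handler, parse(inputStream, handler).
    static String extract(InputStream inputStream) throws Exception {
        SAXParserFactory factorySAX = SAXParserFactory.newInstance();
        SAXParser sax = factorySAX.newSAXParser();
        XMLParser handler = new XMLParser();
        sax.parse(inputStream, handler);
        return handler.getCustomerProfileId();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<createCustomerProfileResponse><messages><resultCode>Ok</resultCode></messages>"
                + "<customerProfileId>1103042</customerProfileId></createCustomerProfileResponse>";
        System.out.println(extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```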

Comparing Java XML parsers – Parsing XML

Java Architecture for XML Binding or JAXB

DOM is a useful API that builds and transforms XML documents in memory. Unfortunately, DOM is somewhat slow and resource-hungry. To address these problems, the Java Architecture for XML Binding (JAXB) was developed.

JAXB provides a mechanism that simplifies the creation and maintenance of XML-enabled Java applications. It does this by using an XML schema compiler (only DTDs and a subset of XML schemas and namespaces at the time of this writing) that translates XML DTDs into one or more Java classes, thereby removing the burden from the developer to write complex parsing code.

The generated classes handle all the details of XML parsing and formatting, including code to perform error and validity checking of incoming and outgoing XML documents, which ensures that only valid, error-free XML is accepted.

Because the code has been generated for a specific schema, the generated classes can be more efficient than a generic SAX or DOM parser. Most importantly, a JAXB parser often requires a much smaller memory footprint than a generic parser.

Classes created with JAXB do not include tree-manipulation capability, which is one factor that contributes to the small memory footprint of a JAXB object tree.

Comparing Java XML parsers – Parsing XML

JAXB consists of two main components:

The binding compiler, which binds a given XML schema to a set of generated Java classes

The binding runtime framework, which provides unmarshalling, marshalling, and validation functionalities.

Unmarshalling an XML document

Unmarshalling is the process of converting an XML document into a corresponding set of Java objects.

The first step is to create a JAXBContext object, which is the starting point for marshalling, unmarshalling, and validation operations:

JAXBContext jaxbContext = JAXBContext.newInstance("com.xmlparsers.jaxb.xsd.marketerprofile");

To unmarshal an XML document, create an Unmarshaller from the context:

Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

The unmarshaller returns the unmarshalled object:

CreateCustomerProfileResponse profileElement = (CreateCustomerProfileResponse)
    unmarshaller.unmarshal(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
String customerProfileId = profileElement.getCustomerProfileId();

Comparing Java XML parsers – Parsing XML

Marshalling to an XML document

Marshalling is the process of converting Java objects into XML.

MessageType msgType = new MessageType();
msgType.setCode("0");
msgType.setText("Successful");

MessagesType msgTypes = new MessagesType();
msgTypes.setResultCode("OK");
msgTypes.getMessageType().add(msgType);

CreateCustomerProfileResponse marketerProfile = new CreateCustomerProfileResponse();
marketerProfile.getMessagesType().add(msgTypes);
marketerProfile.setCustomerProfileId("21345678");

JAXBContext context = JAXBContext.newInstance(CreateCustomerProfileResponse.class);
Marshaller m = context.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
m.marshal(marketerProfile, System.out);

Comparing Java XML parsers – Parsing XML

Use JAXB when you want to:

Access data in memory, but do not need tree manipulation capabilities

Process only data that is valid

Convert data to different types

Generate classes based on a DTD or XML schema

Build object representations of XML data

Use JAXP when you want to:

Have flexibility with regard to the way you access the data, either serially with SAX or randomly in memory with DOM

Use the same processing code with documents based on different DTDs

Parse documents that are not necessarily valid

Apply XSLT transformations

Insert or remove components from an in-memory XML tree
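As a sketch of the last two JAXP use cases, the following parses the address document, removes one element, inserts another, and serializes the result with an identity Transformer (the same JAXP transform API that applies XSLT, used here without a stylesheet). The email element and its value are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomEditDemo {

    static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Remove the <street> element from the in-memory tree.
        Node street = doc.getElementsByTagName("street").item(0);
        root.removeChild(street);

        // Insert a new <email> element (a made-up field, for illustration only).
        Element email = doc.createElement("email");
        email.setTextContent("lars@example.com");
        root.appendChild(email);

        // Serialize the modified tree with an identity transform (no stylesheet).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit("<address><name>Lars</name><street>Test</street>"
                + "<telephone number=\"0123\"/></address>"));
    }
}
```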

Comparing Java XML parsers – Parsing XML

Streaming API for XML or StAX

Traditionally, XML APIs are either:

Tree-based: the entire document is read into memory as a tree structure for random access by the calling application.

Event-based: the application registers to receive events as entities are encountered within the source document.

Tree-based APIs are less efficient with respect to memory usage, especially for large documents.

In such situations, a streaming API is preferred: it uses much less memory because it does not have to hold the entire document in memory at once.

It processes the document in small pieces, which can also make it faster.

SAX is one such event-based streaming API; it 'pushes' data into the application. Push parsers feed the content of the document to the application as soon as they see it, whether the application is ready to receive that data or not.

StAX was designed as a middle ground between these two opposites. The programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' information from the parser as it needs it.

Comparing Java XML parsers – Parsing XML

Pull APIs have the following advantages:

Pull APIs are a more comfortable alternative for streaming processing of XML.

A pull API is based around the more familiar Iterator design pattern rather than the less well-known Observer design pattern. In a pull API, the client program asks the parser for the next piece of information, rather than the parser telling the client program when the next datum is available.

In a pull API the client program drives the parser, whereas in a push API the parser drives the client.

Why StAX? StAX shares with SAX the ability to read arbitrarily large documents. However, in StAX the application is in control rather than the parser: the application tells the parser when it wants to receive the next chunk of data, rather than the parser telling the application when the next chunk is available. StAX also goes beyond SAX by allowing programs both to read existing XML documents and to create new ones. Unlike SAX, StAX is a bidirectional API.
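To illustrate the application being in control, this sketch pulls events only until it reaches the element it wants and then stops requesting events. It reads from an inline copy of the customer-profile response (an assumption; the slides read from a file), and StaxEarlyStop and findFirst are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxEarlyStop {

    // Pulls events until the first element called elementName, returns its text,
    // and asks for no further events. Returns null if the element never appears.
    static String findFirst(String xml, String elementName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // getElementText() reads a text-only element and leaves the cursor on its end tag.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<createCustomerProfileResponse><messages><resultCode>Ok</resultCode>"
                + "<message><code>I00001</code><text>Successful.</text></message></messages>"
                + "<customerProfileId>1103042</customerProfileId></createCustomerProfileResponse>";
        System.out.println(findFirst(xml, "resultCode"));
    }
}
```

With SAX, the equivalent early exit would need the handler to abort the parse from inside a callback; here the loop simply stops calling next().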

Comparing Java XML parsers – Parsing XML

Reading XML with StAX:

XMLStreamReader is the key interface in StAX. This interface represents a cursor that is moved across an XML document from beginning to end. At any given time, this cursor points at one event: a text node, a start tag, a comment, and so on. The cursor always moves forward, never backward, and normally moves only one item at a time. Methods such as getName and getText can be invoked to retrieve information about the current event.

A typical StAX program begins by using the XMLInputFactory class to load an implementation-dependent instance of XMLStreamReader:

InputStream in = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader staxParser = factory.createXMLStreamReader(in);

Comparing Java XML parsers – Parsing XML

while (staxParser.hasNext()) {
    int event = staxParser.next();
    if (event == XMLStreamConstants.END_DOCUMENT) {
        staxParser.close();
        in.close(); // closing the reader does not close the underlying stream
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        // Print the name of each element as its start tag is reached
        System.out.println(staxParser.getLocalName());
    }
}

One advantage of StAX over SAX is that the application chooses which parse events to act on: an event it is not interested in is simply skipped by calling next() again. For example, if the parse event is of type START_ELEMENT, the developer can decide whether to obtain the event information or retrieve the next event:

if (event == XMLStreamConstants.START_ELEMENT) {
    System.out.println(staxParser.getLocalName());
}
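Skipping can go further than single events. A minimal sketch, assuming an inline copy of the response document: when the reader meets an element the application does not care about (here messages), it keeps calling next() until the matching end tag, so the whole subtree is passed over without any handler code. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxSkipSubtree {

    // Collects element names, but passes over everything inside elements named skipName.
    static List<String> namesOutside(String xml, String skipName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event != XMLStreamConstants.START_ELEMENT) {
                continue; // not interested: just move on to the next event
            }
            String name = reader.getLocalName();
            names.add(name);
            if (name.equals(skipName)) {
                skipSubtree(reader);
            }
        }
        reader.close();
        return names;
    }

    // Called with the cursor on a START_ELEMENT; returns with it on the matching END_ELEMENT.
    private static void skipSubtree(XMLStreamReader reader) throws Exception {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<createCustomerProfileResponse><messages><resultCode>Ok</resultCode>"
                + "<message><code>I00001</code><text>Successful.</text></message></messages>"
                + "<customerProfileId>1103042</customerProfileId></createCustomerProfileResponse>";
        System.out.println(namesOutside(xml, "messages"));
    }
}
```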

Comparing Java XML parsers – Parsing XML

Writing with StAX

// An XMLStreamWriter is obtained from an XMLOutputFactory
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);

// Start the document with the writeStartDocument() method
writer.writeStartDocument("UTF-8", "1.0");
writer.writeComment("Testing with StAX ");

// Output the start of the 'createCustomerProfileResponse' element using writeStartElement()
writer.writeStartElement("createCustomerProfileResponse");
writer.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
writer.writeStartElement("messages");
writer.writeStartElement("resultCode");
writer.writeCharacters("Ok");
writer.writeEndElement(); // </resultCode>

Comparing Java XML parsers – Parsing XML

Writing with StAX (continued)

writer.writeStartElement("message");
writer.writeStartElement("code");
writer.writeCharacters("I00001");
writer.writeEndElement(); // </code>
writer.writeStartElement("text");
writer.writeCharacters("Successful");
writer.writeEndElement(); // </text>
writer.writeEndElement(); // </message>
writer.writeEndElement(); // </messages>

writer.writeStartElement("customerProfileId");
writer.writeCharacters("1103042");
writer.writeEndElement(); // </customerProfileId>
writer.writeEndElement(); // </createCustomerProfileResponse>
writer.writeEndDocument();

writer.flush();
writer.close();
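The writer above uses only start tags, end tags and text. A sketch of two other XMLStreamWriter calls, writeEmptyElement and writeAttribute, reproducing part of the address document from the overview slide. It writes to a StringWriter instead of System.out so the result can be inspected, and relies on writeEndDocument to close the still-open address element; StaxWriteAddress is a hypothetical name.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriteAddress {

    static String writeAddress() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument("1.0");
        writer.writeStartElement("address");

        writer.writeStartElement("name");
        writer.writeCharacters("Lars");
        writer.writeEndElement(); // </name>

        // writeEmptyElement starts a tag with no content; an attribute written
        // immediately afterwards belongs to that empty element.
        writer.writeEmptyElement("telephone");
        writer.writeAttribute("number", "0123");

        // writeEndDocument closes any elements still open (here, <address>).
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAddress());
    }
}
```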

Comparing Java XML parsers – Parsing XML

XPATH

XPath (XML Path Language) is an expression language for addressing portions of an XML document, or navigating within an XML document. XPath is really helpful for parsing XML-based configuration or properties files.

XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like URLs and traditional file system paths. XPath also supports functions for string manipulation, comparison, and more. XML documents are treated as trees of nodes, and the root of the tree is called the document node or root node.

There are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and root (document) nodes.

Comparing Java XML parsers – Parsing XML

XPATH

Let us consider the following XML sample:

<createCustomerProfileResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <messages>
        <resultCode>Ok</resultCode>
        <message>
            <code>I00001</code>
            <text>Successful.</text>
        </message>
    </messages>
    <customerProfileId>1103042</customerProfileId>
</createCustomerProfileResponse>

The root element is <createCustomerProfileResponse>; it sits directly below the document (root) node. <messages> and <customerProfileId> are its two child elements. The <resultCode> element is a child of the <messages> element. The resultCode value 'Ok' is a text node.
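The tree structure just described can be checked with the JDK's XPath API. A minimal sketch, assuming the sample is embedded as a string without the indentation: name(/*) gives the root element below the document node, count(/*/*) counts its element children, and text() selects the text node inside resultCode. XPathTreeDemo and eval are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTreeDemo {

    static final String SAMPLE =
            "<createCustomerProfileResponse xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
            + "<messages><resultCode>Ok</resultCode>"
            + "<message><code>I00001</code><text>Successful.</text></message></messages>"
            + "<customerProfileId>1103042</customerProfileId>"
            + "</createCustomerProfileResponse>";

    // Parses SAMPLE and evaluates an XPath expression, returning the result as a string.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // "/" is the document (root) node; "/*" is the one element directly below it.
        System.out.println(eval("name(/*)"));                      // createCustomerProfileResponse
        // The root element has two element children: messages and customerProfileId.
        System.out.println(eval("count(/*/*)"));                   // 2
        // 'Ok' is the text node inside resultCode, which is inside messages.
        System.out.println(eval("/*/messages/resultCode/text()")); // Ok
    }
}
```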

Comparing Java XML parsers – Parsing XML

XPATH – Path expression syntax

Expression    Description
nodename      Selects all child elements of the current node that have this name
/             Selects from the root node
//            Selects matching nodes at any depth below the current node (below the root when the expression starts with //)
.             Selects the current node
..            Selects the parent of the current node
@             Selects attributes
*             Matches any element node
@*            Matches any attribute node
node()        Matches any node of any kind
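A few of the expressions from the table, applied to a compact copy of the address document from the overview (inlined as a string so that the example is self-contained). Note that * matches only the four elements, while node() also matches the two text nodes. XPathSyntaxDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathSyntaxDemo {

    static final String ADDRESS = "<address><name>Lars</name><street>Test</street>"
            + "<telephone number=\"0123\"/></address>";

    // Parses ADDRESS and evaluates one expression as a string.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ADDRESS)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("/address/name"));               // "/" starts at the root node: Lars
        System.out.println(eval("//telephone/@number"));          // "//" finds telephone at any depth, "@" its attribute: 0123
        System.out.println(eval("name(//street/..)"));            // ".." is the parent of street: address
        System.out.println(eval("count(/address/telephone/@*)")); // "@*" matches any attribute: 1
        System.out.println(eval("count(//*)"));                   // "*" matches element nodes only: 4
        System.out.println(eval("count(//node())"));              // "node()" also matches text nodes: 6
    }
}
```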

Comparing Java XML parsers – Parsing XML

XPATH – Reading XML

InputStream resultStream = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
java.io.BufferedReader aReader = new java.io.BufferedReader(new java.io.InputStreamReader(resultStream, "UTF8"));

// Read the file line by line into a single string (line breaks are dropped)
StringBuffer aResponse = new StringBuffer();
String aLine = aReader.readLine();
while (aLine != null) {
    aResponse.append(aLine);
    aLine = aReader.readLine();
}
resultStream.close();

// InputStreamReader does not strip a UTF-8 byte-order mark, and a leading
// U+FEFF character can make the parser reject the document, so remove it
if (aResponse.length() > 0 && aResponse.charAt(0) == 0xFEFF) {
    aResponse.deleteCharAt(0);
}

Comparing Java XML parsers – Parsing XML

XPATH – Reading XML

javax.xml.parsers.DocumentBuilder docBuilder =
    javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder();
java.io.StringReader stringReader = new java.io.StringReader(aResponse.toString());
org.w3c.dom.Document doc = docBuilder.parse(new org.xml.sax.InputSource(stringReader));

javax.xml.xpath.XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
// Select the text of the customerProfileId element directly below the root element
String customerProfileId = xpath.evaluate("/*/customerProfileId/text()", doc);
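When several nodes match, the result can be requested as a node-set instead of a string. A sketch, assuming a hypothetical response with two message elements (the second code, I00002, is invented for the example): the expression is compiled once with XPath.compile and evaluated with XPathConstants.NODESET. Parsing directly from bytes also lets the parser work out the document's encoding itself, instead of the file first being read through a Reader as above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathNodeSetDemo {

    static List<String> messageCodes(byte[] xmlBytes) throws Exception {
        // Parsing from bytes lets the parser detect the encoding from the document itself.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        // Compile once; the same XPathExpression can be evaluated against many documents.
        XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("/*/messages/message/code");
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            codes.add(nodes.item(i).getTextContent());
        }
        return codes;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<createCustomerProfileResponse><messages><resultCode>Ok</resultCode>"
                + "<message><code>I00001</code><text>Successful.</text></message>"
                + "<message><code>I00002</code><text>Hypothetical second message.</text></message>"
                + "</messages><customerProfileId>1103042</customerProfileId>"
                + "</createCustomerProfileResponse>";
        System.out.println(messageCodes(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```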