Saturday, February 5, 2011

XML, XSLT and XML Parser


(Amarnath, can you give some examples with a full description?)


-Shailendra kumar shail @AVACorp.biz
What is XML?
XML is a text-based markup language that is fast becoming the standard for data interchange on the Web. As with HTML, you identify data using tags (identifiers enclosed in angle brackets, like this: <...>). Collectively, the tags are known as "markup".
But unlike HTML, XML tags tell you what the data means, rather than how to display it. Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <topic>...</topic>).
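Here is a minimal sketch of the "field name" idea. It uses Python's standard-library ElementTree module; the choice of Python and the sample document are assumptions made for illustration and are not part of the original post. The program finds each piece of data by its tag name, not by its position or formatting:

```python
import xml.etree.ElementTree as ET

# A small, hypothetical document: each tag labels what its data means.
doc = """
<note>
  <topic>XML Parsers</topic>
  <published>2011-02-05</published>
</note>
"""

root = ET.fromstring(doc)

# Look the data up by tag name, much as a program reads a field by name.
topic = root.findtext("topic")
published = root.findtext("published")

print(topic)      # XML Parsers
print(published)  # 2011-02-05
```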

XML is an open, text-based markup language that provides structural and semantic information to data. This "data about data," or metadata, provides additional meaning and context to the application using the data and allows for a new level of management and manipulation of Web-based information. XML, a subset of the popular Standard Generalized Markup Language (SGML), has been optimized for the Web. This helps make XML a powerful, standards-based complement to HTML that could be as important to the future of information delivery on the Web as HTML was to its beginning. 

XML is intended to be used by content creators as well as by programmers. Since XML is text-based, it can be read and worked with easily in relatively nontechnical situations, but its ability to organize, describe, and structure data also makes it ideal for use in highly technical applications. XML thus provides common ground for creating structured data and making it available for manipulation and display. 

XML Schema
The W3C XML Schema Language (schemas for short, though it's hardly the only schema language) addresses several limitations of Document Type Definitions (DTDs). First, schemas are written in XML instance document syntax, using tags, elements, and attributes. Second, schemas are fully namespace-aware. Third, schemas can assign data types such as integer and date to elements. A document is therefore validated on the contents of its elements as well as on its element structure.
With XML schemas, you have more power to define what valid XML documents look like.

They have several advantages over DTDs:

XML schemas use XML syntax. In other words, an XML schema is an XML document. That means you can process a schema just like any other document. For example, you can write an XSLT style sheet that converts an XML schema into a Web form complete with automatically generated JavaScript code that validates the data as you enter it.
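The post's example uses XSLT. The sketch below is a smaller stand-in written in Python, not XSLT, and the schema and the type-to-input mapping are both hypothetical. It reads an XML schema with an ordinary XML parser and turns the element declarations into HTML form fields. It does not generate the validating JavaScript mentioned above:

```python
import xml.etree.ElementTree as ET

# A small, hypothetical schema. Note that it is itself an XML document.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:integer"/>
        <xs:element name="placed" type="xs:date"/>
        <xs:element name="note" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# ElementTree reports namespaced tags as {namespace-URI}local-name.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring(XSD)

# Collect every element declaration that carries a type attribute.
declared = {
    el.get("name"): el.get("type")
    for el in schema.iter(XS + "element")
    if el.get("type")
}

# Map schema types to HTML input types (a hypothetical, minimal mapping).
INPUT_TYPES = {"xs:integer": "number", "xs:date": "date"}
fields = [
    f'<input name="{name}" type="{INPUT_TYPES.get(xsd_type, "text")}">'
    for name, xsd_type in declared.items()
]
print("\n".join(fields))
```

The schema needed no special tooling. The same parser that reads any other XML document was enough.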

XML schemas support datatypes. DTDs do have a few datatypes of their own, mostly attribute types, but it's clear those were developed from a publishing perspective. XML schemas support all of the original datatypes from DTDs (things like IDs and ID references). They also support integers, floating-point numbers, dates, times, strings, URIs, and other datatypes useful for data processing and validation.
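The sketch below shows roughly what "validating the contents" means. Python's standard library has no XML Schema validator; third-party libraries such as lxml provide one. The two checks here are therefore simplified, hand-written imitations of xs:integer and xs:date, applied to a hypothetical document. They are not a schema processor:

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Simplified stand-ins for two built-in schema types. Real xs:integer and
# xs:date are defined more broadly (for example, xs:date allows a time zone).
def is_xs_integer(text):
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None

def is_xs_date(text):
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))  # rejects e.g. Feb 30
    except ValueError:
        return False
    return True

TYPE_CHECKS = {
    "xs:integer": is_xs_integer,
    "xs:date": is_xs_date,
    "xs:string": lambda text: True,
}

# Hypothetical declarations: element name -> type, as a schema might give them.
DECLARED = {"id": "xs:integer", "placed": "xs:date", "note": "xs:string"}

def content_errors(xml_text):
    """Check the text of each child of the root element against DECLARED."""
    errors = []
    for child in ET.fromstring(xml_text):
        expected = DECLARED.get(child.tag)
        text = child.text or ""
        if expected is None:
            errors.append(f"{child.tag}: not declared")
        elif not TYPE_CHECKS[expected](text):
            errors.append(f"{child.tag}: {text!r} is not a valid {expected}")
    return errors

good = "<order><id>42</id><placed>2011-02-05</placed><note>rush</note></order>"
bad = "<order><id>forty-two</id><placed>2011-02-30</placed></order>"
print(content_errors(good))  # []
print(content_errors(bad))   # two errors: one for id, one for placed
```

Both documents are well-formed XML. A DTD could only say that `id` holds character data. A schema can also say that the data must be an integer.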

XML schemas are extensible. In addition to the datatypes defined in the XML schema specification, you can also create your own, and you can derive new datatypes based on other datatypes.
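Derivation by restriction can be sketched in the same way. The hypothetical type `quantity` below is derived from xs:integer using the minInclusive and maxInclusive facets. The Python code only understands those two facets on an integer base. A real schema processor handles many more:

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment: "quantity" is derived from xs:integer
# by restriction, limiting it to the range 1..100.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="quantity">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

def load_integer_bounds(xsd_text, type_name):
    """Return the min/max facets of a simpleType that restricts xs:integer."""
    schema = ET.fromstring(xsd_text)
    for simple in schema.iter(XS + "simpleType"):
        if simple.get("name") != type_name:
            continue
        restriction = simple.find(XS + "restriction")
        # Simplification: compares the literal prefix "xs:" instead of
        # resolving it to the XML Schema namespace.
        if restriction is None or restriction.get("base") != "xs:integer":
            raise ValueError("only restrictions of xs:integer are handled")
        bounds = {}
        for facet in ("minInclusive", "maxInclusive"):
            node = restriction.find(XS + facet)
            if node is not None:
                bounds[facet] = int(node.get("value"))
        return bounds
    raise KeyError(type_name)

def is_valid(value, bounds):
    # The base type is checked first: the value must be an xs:integer...
    if re.fullmatch(r"[+-]?[0-9]+", value) is None:
        return False
    # ...then each facet narrows the set of allowed values.
    n = int(value)
    return bounds.get("minInclusive", n) <= n <= bounds.get("maxInclusive", n)

bounds = load_integer_bounds(XSD, "quantity")
print(bounds)  # {'minInclusive': 1, 'maxInclusive': 100}
print([is_valid(v, bounds) for v in ("5", "0", "250", "ten")])
# [True, False, False, False]
```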
XML Parsers
What is a parser? It's the piece of software that reads an XML document and makes its contents available to an application. Along the way it checks that the document is well-formed. If it is a validating parser, it also checks that the document is valid against its DTD or schema. Not very interesting to the average user, perhaps, but a vital part of the picture if validity and well-formedness are to have any real meaning in the XML world.
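Well-formedness checking is easy to try. Python's built-in parsers are based on expat, which is a non-validating parser. It reports well-formedness errors but does not check a document against its DTD. The minimal sketch below covers both cases; the function name and the sample documents are hypothetical:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if the text parses, otherwise (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

print(is_well_formed("<topic>XML</topic>"))    # (True, None)
print(is_well_formed("<topic>XML</subject>"))  # (False, 'mismatched tag: ...')

# Well-formed but not valid: the DTD says <topic> holds only text, yet it
# contains a <b> element. A non-validating parser accepts it anyway.
INVALID = '<!DOCTYPE topic [<!ELEMENT topic (#PCDATA)>]><topic><b>XML</b></topic>'
print(is_well_formed(INVALID))  # (True, None)
```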
Any software package that is XML-aware has a parser built into it in the form of an XML processor. At a minimum, the XML processor checks the XML documents you are about to work on, and checks them again when you have finished. Ideally, it is interactive: any errors you introduce are reported and can be sorted out while you work.
You can also get standalone XML parsers, which are important if you take the plain-text-editor route for your XML authoring. When you have finished working on an XML document in an uncontrolled environment (one that doesn't know about XML tagging conventions), you should always run an XML parser on it. The parser checks that the document is still well-formed and, if it has a DTD or schema, still valid.
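A standalone check of this kind could look like the sketch below: a small command-line script. The script name check_xml.py and its output format are assumptions. It checks well-formedness only, because Python's standard-library parser does not validate:

```python
import sys
import xml.etree.ElementTree as ET

def check_file(path):
    """Parse one file; return None if well-formed, else a 'path:line:col' report."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"{path}:{line}:{column}: {err}"
    except OSError as err:
        return f"{path}: cannot read ({err.strerror})"
    return None

def main(paths):
    """Check every file and return a shell-style exit status."""
    failures = 0
    for path in paths:
        problem = check_file(path)
        if problem:
            print(problem)
            failures += 1
        else:
            print(f"{path}: well-formed")
    return 1 if failures else 0

if __name__ == "__main__":
    # Usage (hypothetical script name): python check_xml.py a.xml b.xml ...
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

Reporting the line and column makes it quick to find the damage an untagged editor left behind.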
Kinds of parsers
There are several different ways to categorize parsers:
• Validating versus non-validating parsers
• Parsers that support the Document Object Model (DOM)
• Parsers that support the Simple API for XML (SAX)
• Parsers written in a particular language (Java, C++, Perl, etc.)
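The DOM/SAX distinction is easiest to see side by side. This sketch runs Python's standard library on a hypothetical document, using xml.dom.minidom for DOM and xml.sax for SAX. Both approaches extract the same titles. DOM builds the whole tree first, while SAX hands your code a stream of events. The last part touches on validating versus non-validating parsers: the default expat-based SAX reader refuses a request to turn validation on:

```python
import xml.sax
import xml.sax.handler
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<library><book>XML Basics</book><book>SAX and DOM</book></library>"

# DOM: the parser builds the whole tree in memory; the program then navigates it.
tree = minidom.parseString(DOC)
dom_titles = [book.firstChild.data for book in tree.getElementsByTagName("book")]

# SAX: the parser streams events to a handler and keeps no tree of its own.
class BookHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so collect it until the end tag.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False

handler = BookHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_titles)      # ['XML Basics', 'SAX and DOM']
print(handler.titles)  # ['XML Basics', 'SAX and DOM']

# Validating vs non-validating: the default expat-based SAX reader
# refuses a request to switch validation on.
reader = xml.sax.make_parser()
try:
    reader.setFeature(xml.sax.handler.feature_validation, True)
    validation_supported = True
except xml.sax.SAXNotSupportedException:
    validation_supported = False
print(validation_supported)  # False
```

DOM is convenient when you need to move around the document freely. SAX uses far less memory on large files, because nothing is kept unless your handler keeps it.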