Xml - Infoset


1 - About

The XML Infoset:

  • is a tree-based, hierarchical representation of an XML document.
  • is abstract data and metadata: abstract means that it is independent of any concrete representation and of the actual technical implementation.
  • represents the significant information of an XML document.

Having an infoset does not make an XML document valid. A document has an infoset as soon as it is well-formed and satisfies the XML namespace constraints; it does not have to conform to an XSD.

3 - Types of information items

An XML document's information set consists of a number of information items.

The information set for any well-formed XML document will contain at least a document information item and several others.

An information set can contain up to eleven different types of information items:

  • The Document Information Item (always present)
  • Element Information Items
  • Attribute Information Items
  • Processing Instruction Information Items
  • Unexpanded Entity Reference Information Items
  • Character Information Items
  • Comment Information Items
  • The Document Type Declaration Information Item
  • Unparsed Entity Information Items
  • Notation Information Items
  • Namespace Information Items
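
As an illustration of the list above, the following minimal sketch walks a DOM tree built by Python's standard `xml.dom.minidom` module and names the kind of information item each node loosely corresponds to. The `KIND` mapping, the `list_items` helper and the `SAMPLE` document are assumptions made for this example: the DOM is only one possible representation of an infoset, and its node types do not map one-to-one onto the eleven item types (a DOM text node, for instance, groups several character information items).

```python
from xml.dom import minidom, Node

# Hypothetical, loose mapping from DOM node types to Infoset item kinds.
KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
}


def list_items(node, found=None):
    """Return the item kinds met in document order, starting at node."""
    found = [] if found is None else found
    if node.nodeType == Node.TEXT_NODE:
        # The Infoset has one character information item per character.
        found.extend(["character"] * len(node.data))
    else:
        found.append(KIND.get(node.nodeType, "other"))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes are properties of the element, not children of it.
        found.extend(["attribute"] * node.attributes.length)
    for child in node.childNodes:
        list_items(child, found)
    return found


# Illustrative document; the XML declaration is not reported as an item here.
SAMPLE = '<?xml version="1.0"?><!-- note --><?target data?><doc id="1">hi<br/></doc>'
items = list_items(minidom.parseString(SAMPLE))
print(items)
```

The sample contains no document type declaration, notation, unparsed entity or unexpanded entity reference, so those kinds do not show up in the result.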


4 - Representation

Serialized XML text is just one way of representing that data.

The infoset may exist:

  • in memory as a DOM tree
  • as XML text encoded in UTF-8 or in UTF-16.

Purely syntactic details of the serialization are not part of the infoset. For example, the infoset does not distinguish between the two forms of an empty element.

The following are considered equivalent according to the XML Infoset.

<test></test>
<test/>
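
This equivalence can be checked with a short sketch using Python's standard `xml.etree.ElementTree` module. It assumes Python 3.8 or later for `ET.canonicalize`, which produces Canonical XML 2.0; canonicalization is used here only as a convenient way to compare two serializations, not as the definition of infoset equality. The second half parses the same illustrative document from UTF-8 and from UTF-16 bytes; the `greeting` element is made up for the example.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element.
long_form = "<test></test>"
short_form = "<test/>"

# Canonicalization erases the difference between the two spellings.
same_empty = ET.canonicalize(long_form) == ET.canonicalize(short_form)

# The same (illustrative) document serialized in two encodings.
text = '<greeting lang="fr">café</greeting>'
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # the codec prepends a byte order mark

from_utf8 = ET.fromstring(utf8_bytes)
from_utf16 = ET.fromstring(utf16_bytes)

# The byte sequences differ, but the parsed content is the same.
same_content = (from_utf8.tag, from_utf8.attrib, from_utf8.text) == (
    from_utf16.tag,
    from_utf16.attrib,
    from_utf16.text,
)

print(same_empty, same_content)
```

The two byte strings are different on disk, yet an application working at the infoset level sees the same element, attribute and characters.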

5 - Augmentation

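Infoset augmentation (or infoset modification) is the process of modifying the infoset during schema validation, for example by adding attributes that take their default value from the schema. With W3C XML Schema, the augmented result is called the post-schema-validation infoset (PSVI).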
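
The effect of adding a default attribute can be sketched with Python's standard library. Because the standard library has no XSD validator, this example swaps in a different mechanism: a default declared in a DTD internal subset, which the underlying expat parser applies while parsing. It is therefore an analogy for schema-driven augmentation, not an instance of it, and the `note` element and `lang` attribute are made up for the example.

```python
import xml.etree.ElementTree as ET

# The serialized document never writes lang="en" on <note>;
# the internal DTD subset only declares it as a default.
DOC = """<!DOCTYPE note [
  <!ATTLIST note lang CDATA "en">
]>
<note>hello</note>"""

note = ET.fromstring(DOC)

# The parsed element carries the defaulted attribute although it is
# absent from the start tag in the source text.
print(note.attrib)
```

The application sees an attribute that does not appear in the start tag, which is the kind of change augmentation makes to the infoset.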
