Java XML - DOM Jaxp

About

The DOM API of Java SE (i.e. JAXP), used to process an XML document in Java.

For other DOM implementations, see Java XML - DOM.

Package

org.w3c.dom: Defines the Document interface (a DOM) as well as interfaces for all the components of a DOM.
javax.xml.transform.dom: Implements DOM-specific transformation APIs.
Parser

Entry point

DocumentBuilderFactory creates a DocumentBuilder, which parses XML input into a DOM-compliant Document object.

See: https://docs.oracle.com/javase/tutorial/jaxp/intro/dom.html
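As a minimal sketch of that entry point, the class below parses an XML string. The class name `ParseExample`, the helper `parse` and the sample XML are hypothetical; only `DocumentBuilderFactory.newInstance()`, `newDocumentBuilder()` and `DocumentBuilder.parse(InputSource)` come from JAXP itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseExample {

    // Parses an XML string into a DOM Document using the JAXP factory/builder pair.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DocumentBuilder.parse also has overloads for File, InputStream and a URI string
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<catalog><book id=\"1\">DOM</book></catalog>");
        // getDocumentElement() returns the root element of the tree
        System.out.println(doc.getDocumentElement().getNodeName()); // catalog
    }
}
```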

How to

The process of navigating to a node involves processing sub-elements, ignoring the ones you are not interested in and inspecting the ones you are, until you find the node you are interested in.

Generally, the vast majority of nodes in a DOM tree will be Element and Text nodes.

Obtaining Node information

Information about a DOM node is obtained by calling the various methods of the org.w3c.dom.Node interface.

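import java.io.PrintStream;
import org.w3c.dom.Node;

// Prints the name, namespace and value information of a DOM node
static void printNodeInfo(Node n, PrintStream out) {
    out.print("nodeName=\"" + n.getNodeName() + "\"");
    String val = n.getNamespaceURI();   // null if no namespace or not parsed namespace-aware
    if (val != null) out.print(" uri=\"" + val + "\"");
    val = n.getPrefix();
    if (val != null) out.print(" pre=\"" + val + "\"");
    val = n.getLocalName();             // null for nodes created without namespace support
    if (val != null) out.print(" local=\"" + val + "\"");
    val = n.getNodeValue();             // null for Element and Document nodes
    if (val != null) {
        out.print(" nodeValue=");
        if (val.trim().isEmpty()) {
            // Whitespace-only content
            out.print("[WS]");
        } else {
            out.print("\"" + val + "\"");
        }
    }
    out.println();
}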

Every DOM node has at least a type, a name, and a value, which might or might not be empty.
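To make the type/name/value triple concrete, the following sketch prints it for three nodes of a tiny document. `NodeBasics`, `parse` and `describe` are illustrative names, not part of the API; the numeric types are the `Node` constants (`ELEMENT_NODE` = 1, `TEXT_NODE` = 3, `COMMENT_NODE` = 8).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeBasics {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Formats a node as "type|name|value"; a missing value prints as "null".
    static String describe(Node n) {
        return n.getNodeType() + "|" + n.getNodeName() + "|" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Element root = parse("<greeting>hi<!--note--></greeting>").getDocumentElement();
        System.out.println(describe(root));                 // 1|greeting|null
        System.out.println(describe(root.getFirstChild())); // 3|#text|hi
        System.out.println(describe(root.getLastChild()));  // 8|#comment|note
    }
}
```

Note that the element's own value is `null`: its text lives in a child Text node.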

Lexical Information Control

Lexical information is the information you need to reconstruct the original syntax of an XML document. Preserving lexical information is important in editing applications, where you want to save a document that is an accurate and complete reflection of the original.

The following lexical markup may or may not be included in the resulting DOM: CDATA sections, entity references, comments, and ignorable whitespace.

The following DocumentBuilderFactory methods give you control over these lexical nodes and over whitespace. The defaults preserve comments, CDATA sections and whitespace; entity references, however, are expanded by default, since setExpandEntityReferences defaults to true.

Method	Value to preserve lexical info	Value to focus on content	Effect when set to true
setCoalescing(boolean)	false	true	Converts CDATA nodes to Text nodes and appends them to an adjacent Text node (if any).
setExpandEntityReferences(boolean)	false	true	Expands entity reference nodes (the default is true).
setIgnoringComments(boolean)	false	true	Ignores comments.
setIgnoringElementContentWhitespace(boolean)	false	true	Ignores whitespace that is not a significant part of element content; this requires a validating parser.
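A sketch of the two configurations in the table, assuming a non-validating parser and a hypothetical sample document that holds a comment, a CDATA section and a text node. The class `LexicalOptions` and its helpers are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LexicalOptions {

    // false = keep lexical information, true = focus on content
    static DocumentBuilderFactory factory(boolean focusOnContent) {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setCoalescing(focusOnContent);
        // Set explicitly: unlike the other three, this one defaults to true
        f.setExpandEntityReferences(focusOnContent);
        f.setIgnoringComments(focusOnContent);
        // Has no effect here because the parser is not validating against a DTD
        f.setIgnoringElementContentWhitespace(focusOnContent);
        return f;
    }

    static Element parseRoot(String xml, boolean focusOnContent) {
        try {
            return factory(focusOnContent).newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<r><!--c--><![CDATA[x]]>y</r>";
        // Comment, CDATA section and Text node stay separate children
        Element lexical = parseRoot(xml, false);
        System.out.println(lexical.getChildNodes().getLength()); // 3
        // Comment dropped; CDATA converted to text and merged with "y"
        Element content = parseRoot(xml, true);
        System.out.println(content.getChildNodes().getLength() + " "
                + content.getFirstChild().getNodeValue()); // 1 xy
    }
}
```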

Reading XML Data into a DOM

Node attributes are not included as children in the DOM hierarchy. They are instead obtained via the Node interface's getAttributes method.
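The following sketch, built around the hypothetical class `AttributeListing`, shows that an element with two attributes has no children, while `getAttributes()` returns a `NamedNodeMap` holding both `Attr` nodes. The order of entries in that map is not something to rely on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeListing {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects "name=value" pairs from the node's attribute map.
    static List<String> attributes(Node node) {
        List<String> result = new ArrayList<>();
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs == null) return result;
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.add(attr.getNodeName() + "=" + attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element book = parse("<book id=\"42\" lang=\"en\"/>").getDocumentElement();
        // Attributes are not children of the element
        System.out.println(book.getChildNodes().getLength()); // 0
        System.out.println(attributes(book).size());          // 2
    }
}
```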

The DocumentType interface is an extension of org.w3c.dom.Node. It defines the getEntities method, which you use to obtain Entity nodes, the nodes that define entities. Like attribute (Attr) nodes, Entity nodes do not appear as children of DOM nodes.
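A minimal sketch of reading entity declarations, assuming a document with an internal DTD subset; `EntityListing` and `entityNames` are hypothetical names. `Document.getDoctype()` returns `null` when the document has no DOCTYPE, so the helper checks for that first.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EntityListing {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the names of the general entities declared in the DTD.
    static List<String> entityNames(Document doc) {
        List<String> names = new ArrayList<>();
        DocumentType doctype = doc.getDoctype();
        if (doctype == null) return names;
        NamedNodeMap entities = doctype.getEntities();
        for (int i = 0; i < entities.getLength(); i++) {
            names.add(entities.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE memo [<!ENTITY co \"Acme Corp\">]><memo>&co;</memo>";
        System.out.println(entityNames(parse(xml))); // [co]
    }
}
```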

Creating Nodes

You can create different types of nodes using the methods of the Document interface.

For example:

createElement,
createComment,
createCDATASection,
createTextNode, and so on.

The full list of methods for creating different nodes is provided in the API documentation for org.w3c.dom.Document.
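As a sketch (the class `NodeCreation` and the document content are hypothetical), these factory methods can build a small tree from an empty document. A created node is not part of the tree until it is attached, for example with `appendChild`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeCreation {

    // Builds <note>Hello<!--draft--><![CDATA[a < b]]></note> from scratch.
    static Document buildNote() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element note = doc.createElement("note");
            doc.appendChild(note);
            note.appendChild(doc.createTextNode("Hello"));
            note.appendChild(doc.createComment("draft"));
            note.appendChild(doc.createCDATASection("a < b"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList children = buildNote().getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Prints "#text: Hello", "#comment: draft", "#cdata-section: a < b"
            System.out.println(child.getNodeName() + ": " + child.getNodeValue());
        }
    }
}
```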

Traversing Nodes

The org.w3c.dom.Node interface defines a number of methods you can use to traverse nodes, including:

getFirstChild,
getLastChild,
getNextSibling,
getPreviousSibling,
and getParentNode.

Those operations are sufficient to get from anywhere in the tree to any other location in the tree.
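A sketch using only those operations: `collectElements` walks the tree depth-first with `getFirstChild`/`getNextSibling`, and `depth` climbs with `getParentNode` until it runs past the Document node. `TreeWalk` and both method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TreeWalk {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first, document-order walk that records element names.
    static void collectElements(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElements(child, names);
        }
    }

    // Number of ancestors, counting the Document node itself.
    static int depth(Node node) {
        int depth = 0;
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c><d/></c></a>");
        List<String> names = new ArrayList<>();
        collectElements(doc, names);
        System.out.println(names);                                          // [a, b, c, d]
        System.out.println(depth(doc.getElementsByTagName("d").item(0)));  // 3
    }
}
```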

Searching for Nodes

Although it is tempting to get the first child and inspect it to see whether it is the right one, the search must account for the fact that the first child in the sub-list could be a comment or a processing instruction. If the XML data has not been validated, it could even be a text node containing ignorable whitespace.

In essence, you need to look through the list of child nodes, ignoring the ones that are of no concern and examining the ones you care about. Here is an example of the kind of routine you need to write when searching for nodes in a DOM hierarchy.

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Find the named subnode in a node's sublist.
 * <ul>
 * <li>Ignores comments and processing instructions.
 * <li>Ignores TEXT nodes (likely to exist and contain
 *     ignorable whitespace, if not validating).
 * <li>Ignores CDATA nodes and EntityRef nodes.
 * <li>Examines element nodes to find one with
 *     the specified name.
 * </ul>
 * @param name  the tag name for the element to find
 * @param node  the element node to start searching from
 * @return the Node found, or null if there is no such child
 * @throws IllegalArgumentException if node is not an element
 */
public Node findSubNode(String name, Node node) {
    if (node.getNodeType() != Node.ELEMENT_NODE) {
        throw new IllegalArgumentException("Search node not of element type");
    }

    if (!node.hasChildNodes()) return null;

    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        // Only element children are candidates; everything else is skipped
        if (subnode.getNodeType() == Node.ELEMENT_NODE
                && subnode.getNodeName().equals(name)) {
            return subnode;
        }
    }
    return null;
}
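To see why the search has to skip nodes, the sketch below parses a hypothetical, non-validated document whose first child is whitespace. It reuses the idea of `findSubNode`, condensed to a `getFirstChild`/`getNextSibling` loop and without the element-type check; `FindSubNodeDemo` is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FindSubNodeDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the first element child with the given name, skipping all other node types.
    static Node findSubNode(String name, Node node) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                return child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element order = parse("<order>\n  <!-- rush -->\n  <item/>\n</order>").getDocumentElement();
        // Without validation, the first child is a whitespace-only Text node
        System.out.println(order.getFirstChild().getNodeType() == Node.TEXT_NODE); // true
        System.out.println(findSubNode("item", order).getNodeName());              // item
    }
}
```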

Obtaining Node Content

When you want to get the text that a node contains, you again need to look through the list of child nodes, ignoring entries that are of no concern and accumulating the text you find in:

TEXT nodes,
CDATA nodes,
and EntityRef nodes.

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
  * Return the text that a node contains. This routine:<ul>
  * <li>Ignores comments and processing instructions.
  * <li>Concatenates TEXT nodes, CDATA nodes, and the results of
  *     recursively processing EntityRef nodes.
  * <li>Ignores any element nodes in the sublist.
  *     (Other possible options are to recurse into element
  *      sublists or throw an exception.)
  * </ul>
  * @param    node  a DOM node
  * @return   a String representing its contents
  */
public String getText(Node node) {
    StringBuilder result = new StringBuilder();
    if (!node.hasChildNodes()) return "";

    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        short type = subnode.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            result.append(subnode.getNodeValue());
        } else if (type == Node.ENTITY_REFERENCE_NODE) {
            // Recurse into the subtree for text (and ignore comments).
            // EntityRef nodes only appear if setExpandEntityReferences(false) was used.
            result.append(getText(subnode));
        }
    }
    return result.toString();
}
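DOM Level 3 also offers `Node.getTextContent()`, which behaves differently: it recurses into child elements instead of ignoring them. The sketch below, with the hypothetical names `TextExtraction` and `directText` (a condensed copy of `getText`), compares the two on mixed content.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TextExtraction {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of direct Text/CDATA children only; entity references are followed.
    static String directText(Node node) {
        StringBuilder result = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    result.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    result.append(directText(child));
                    break;
                default:
                    // Elements, comments and processing instructions are skipped
                    break;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Element p = parse("<p>Hello <b>bold</b> world</p>").getDocumentElement();
        System.out.println("[" + directText(p) + "]");       // [Hello  world]
        System.out.println("[" + p.getTextContent() + "]");  // [Hello bold world]
    }
}
```

Which one to use depends on whether text inside nested elements should count as part of the node's content.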

Creating Attributes

The org.w3c.dom.Element interface, which extends Node, defines a setAttribute operation, which adds an attribute to that node (or replaces the value of an existing attribute with the same name). (A better name from the Java platform standpoint would have been addAttribute. The attribute is not a property of the class, and a new object is created.) You can also use the Document's createAttribute operation to create an instance of Attr and then use the setAttributeNode method to add it.
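A sketch of both approaches on a new, hypothetical `<book>` element; `AttributeCreation` is an illustrative class name. It also shows that calling `setAttribute` with an existing name replaces the value rather than adding a second attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AttributeCreation {

    static Element build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            doc.appendChild(book);
            // One-step form
            book.setAttribute("id", "42");
            // Two-step form: create an Attr node, then attach it to the element
            Attr lang = doc.createAttribute("lang");
            lang.setValue("en");
            book.setAttributeNode(lang);
            // Same name again: the existing value is replaced
            book.setAttribute("id", "43");
            return book;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element book = build();
        System.out.println(book.getAttribute("id"));            // 43
        System.out.println(book.getAttribute("lang"));          // en
        System.out.println(book.getAttributes().getLength());   // 2
    }
}
```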

Removing and Changing Nodes

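To remove a node, use its parent node's removeChild method. To change it, use either the parent node's replaceChild operation or the node's setNodeValue operation (the latter only affects nodes that carry a value, such as Text, CDATA, Comment and Attr nodes; on an Element it has no effect).

Inserting Nodes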

The important thing to remember when creating new nodes is that when you create an element node, the only data you specify is a name. In effect, that node gives you a hook to hang things on. You hang an item on the hook by adding to its list of child nodes. For example, you might add:

a text node,
a CDATA node,
or an attribute node (attributes are attached with setAttribute or setAttributeNode rather than appended as children).
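Putting the insert, remove and change operations together, the sketch below edits a hypothetical `<list>` document: it hangs a text node on a freshly created element, swaps that element in with `replaceChild`, detaches a node with `removeChild`, edits a Text node with `setNodeValue`, and appends a new element carrying a CDATA child and an attribute. `TreeEditing` and its helpers are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TreeEditing {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static Element edit() {
        Document doc = parse("<list><old/><keep>a</keep><drop/></list>");
        Element list = doc.getDocumentElement();
        Node old = list.getElementsByTagName("old").item(0);
        Node keep = list.getElementsByTagName("keep").item(0);
        Node drop = list.getElementsByTagName("drop").item(0);

        // A new element is only a named hook; content is hung on it as children
        Element fresh = doc.createElement("fresh");
        fresh.appendChild(doc.createTextNode("new"));
        list.replaceChild(fresh, old);           // change: swap <old/> for <fresh>
        list.removeChild(drop);                  // remove: detach <drop/> via its parent
        keep.getFirstChild().setNodeValue("b");  // change: edit the Text node inside <keep>

        Element tail = doc.createElement("tail");
        tail.appendChild(doc.createCDATASection("x<y"));
        tail.setAttribute("n", "1");             // attribute, not a child
        list.appendChild(tail);                  // insert at the end of the child list
        return list;
    }

    // Names of the direct children, in document order.
    static List<String> childNames(Node node) {
        List<String> names = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element list = edit();
        System.out.println(childNames(list));       // [fresh, keep, tail]
        System.out.println(list.getTextContent());  // newbx<y
    }
}
```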

Documentation / Reference

Java API for XML Processing (JAXP) Tutorial