Xml - Xpath


About

XPath is a pattern expression used to select a set of XML nodes.

You can select nodes:

  • by node name
  • by attribute
  • by text

In the XPath specification, the document model defines seven kinds of nodes. See XSLT/XPath - (Data|Document) Model

Pipeline language

XPath is a pipeline syntax that reads from left to right, with the steps separated by / or //.

For instance:

//h1[1]/following-sibling::*

should be read:

  • start from the root node (the expression begins with /)
  • then //h1[1]: among all descendant nodes (//), select every h1 element that is the first h1 child of its parent ([1])
  • then from each previously selected node (also known as the context node, here the h1 node):
    • select its following siblings (following-sibling)
    • no matter their tag name (*)

Example

Get all nodes by tag name

If the document is an XHTML document, we can get all anchor nodes with:

//a

where:

  • // means go down the whole tree (the whole document)
  • a is the tag name
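
As a minimal sketch of the //a selection, the snippet below uses Python's standard xml.etree.ElementTree module. This is an assumption of the example, not something the page prescribes: ElementTree implements only a subset of XPath and evaluates paths relative to an element, so //a is written .//a. The sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment, invented for this illustration.
doc = """<html><body>
  <p>See <a href="/one">one</a> and <a href="/two">two</a>.</p>
  <div><a href="/three">three</a></div>
</body></html>"""

root = ET.fromstring(doc)

# ElementTree paths are relative to an element, so //a becomes .//a
anchors = root.findall(".//a")

# Matches come back in document order, whatever their depth.
hrefs = [a.get("href") for a in anchors]
print(hrefs)
```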

The equivalent expression, which matches the tag name in any namespace, would be:

//*[local-name()='tagName']

where:

  • // means the whole tree (recursive)
  • * means any element
  • local-name()='tagName' means select the element whose local name (the tag name without its namespace) equals tagName
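
Python's xml.etree.ElementTree does not implement local-name(), but since Python 3.8 its own {*}name notation matches a tag name in any (or no) namespace, which is the nearest equivalent. The sketch below relies on that substitution; the document and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing a namespaced and a plain <item>.
doc = """<root xmlns:x="http://example.com/x">
  <item>plain</item>
  <x:item>namespaced</x:item>
  <other>ignored</other>
</root>"""

root = ET.fromstring(doc)

# {*}item is ElementTree's stand-in for //*[local-name()='item'] (Python 3.8+).
items = root.findall(".//{*}item")
texts = [e.text for e in items]
print(texts)
```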

Select by attribute

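The expression below selects:

  • within every descendant element of the root whose id attribute equals myid
  • each anchor that is the first a child of its parent

//*[@id="myid"]//a[1]

To get only the first matching anchor in document order, wrap the path in parentheses: (//*[@id="myid"]//a)[1]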
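
The following sketch runs the attribute selection with Python's xml.etree.ElementTree (an assumption of this example; it supports only a subset of XPath and needs a leading dot). It shows that a[1] is evaluated per parent, so more than one anchor can come back. The markup is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two parents inside #myid each hold anchors.
doc = """<html><body>
  <div id="myid">
    <p><a href="/a1">first in p</a><a href="/a2">second in p</a></p>
    <ul><li><a href="/a3">first in li</a></li></ul>
  </div>
  <a href="/outside">outside</a>
</body></html>"""

root = ET.fromstring(doc)

# Every a that is the first a child of its parent, below the element with id=myid.
hits = root.findall(".//*[@id='myid']//a[1]")
hrefs = [a.get("href") for a in hits]

# find() returns only the first match in document order.
first = root.find(".//*[@id='myid']//a")
print(hrefs, first.get("href"))
```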

Select by namespace

//*[namespace-uri()='http://mynamespaceUri']

where:

  • // means the whole tree (recursive)
  • * means any element
  • namespace-uri()='http://mynamespaceUri' means select the element whose namespace URI equals http://mynamespaceUri
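
Python's xml.etree.ElementTree has no namespace-uri() function either; its {uri}* notation (Python 3.8+) selects every element in a given namespace and is used here as a stand-in. The document is hypothetical; only the namespace URI is taken from the expression above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the m prefix is bound to the URI used above.
doc = """<root xmlns:m="http://mynamespaceUri">
  <m:title>in namespace</m:title>
  <note>no namespace</note>
  <m:body><m:line>nested</m:line></m:body>
</root>"""

root = ET.fromstring(doc)

# {uri}* stands in for //*[namespace-uri()='http://mynamespaceUri'].
in_ns = root.findall(".//{http://mynamespaceUri}*")

# ElementTree stores tags as {uri}local; keep only the local part for display.
names = [e.tag.split("}")[1] for e in in_ns]
print(names)
```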

More

https://www.w3.org/TR/1999/REC-xpath-19991116/#path-abbrev

Syntax

There are two syntaxes:

  • an abbreviated one
  • and a fully qualified (unabbreviated) one

Example: select all para elements in the tree

  • abbreviated syntax: //para
  • unabbreviated syntax: /descendant-or-self::node()/child::para

Basic XPath Addressing

Node Navigation and Content

An XML document is a tree-structured (hierarchical) collection of nodes. As with a hierarchical directory structure, it is useful to specify a path that points to a particular node in the hierarchy (hence the name of the specification: XPath).

In fact, much of the notation of directory paths is carried over intact:

Character   Designation         Meaning                          Tip
/           The forward slash   Path separator                   An absolute path from the root of the document starts with a /. A relative path from a given location starts with anything else.
..          A double period     The parent of the current node   When passed to a function, it stands for that node's content.
.           A single period     The current node                 When passed to a function, it stands for that node's content.

For example, in an Extensible HTML (XHTML) document, the path /h1/h2 would indicate an h2 element under an h1. (Recall that in XML, element names are case-sensitive, so this kind of specification works much better in XHTML than it would in plain HTML, because HTML is case-insensitive.)

A name specified in an XPath expression refers to an element. For example, h1 in /h1/h2 refers to an h1 element.

In a pattern-matching specification such as XPath, the specification /h1/h2 selects all h2 elements that lie under an h1 element.

Attribute

See XPATH - (Node) Attribute (@)

Basic XPath Expressions

The full range of XPath expressions takes advantage of the wild cards, operators, and functions that XPath defines.

Square-bracket

Indexing

The square-bracket notation ([]) is normally associated with indexing.

To select a specific h2 element, you use square brackets [] for indexing. The path /h1[4]/h2[5] would therefore select the fifth h2 element under the fourth h1 element.

The function position() gives you the element index, so /h1[4] is the same as /h1[position()=4].
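
A small sketch of positional predicates, using Python's xml.etree.ElementTree as an assumed engine. It accepts numeric predicates such as h1[2] but, as far as its documentation lists, not the position() function. The document is a shortened, hypothetical variant of the h1/h2 example, and the path is relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two h1 elements, each holding a few h2 children.
doc = """<doc>
  <h1><h2>a1</h2><h2>a2</h2></h1>
  <h1><h2>b1</h2><h2>b2</h2><h2>b3</h2></h1>
</doc>"""

root = ET.fromstring(doc)

# Third h2 under the second h1; XPath positions start at 1.
hits = root.findall("h1[2]/h2[3]")
texts = [e.text for e in hits]

# An index past the end simply selects nothing.
missing = root.findall("h1[1]/h2[3]")
print(texts, missing)
```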

Boolean

The expression @type="unordered" specifies an attribute named type whose value is unordered. An expression such as LIST/@type specifies the type attribute of a LIST element.

The expression LIST[@type="unordered"] selects all LIST elements whose type value is unordered.
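
The attribute predicate can be tried with Python's xml.etree.ElementTree (an assumption of this sketch). ElementTree returns elements only, so the LIST/@type step is approximated by reading the attribute with get(). The LIST document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three LIST elements.
doc = """<doc>
  <LIST type="unordered"><ITEM>a</ITEM></LIST>
  <LIST type="ordered"><ITEM>b</ITEM></LIST>
  <LIST type="unordered"><ITEM>c</ITEM></LIST>
</doc>"""

root = ET.fromstring(doc)

# LIST[@type="unordered"]: keep only the LIST elements with that attribute value.
unordered = root.findall("LIST[@type='unordered']")
items = [lst.findtext("ITEM") for lst in unordered]

# LIST/@type: ElementTree cannot return attribute nodes, so read them per element.
types = [lst.get("type") for lst in root.findall("LIST")]
print(items, types)
```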

Extended

Examples that use the extended square-bracket notation:

  • /PROJECT[.="MyProject"]: Selects a PROJECT whose string-value is MyProject.
  • /PROJECT[STATUS]: Selects all projects that have a STATUS child element.
  • /PROJECT[STATUS="Critical"]: Selects all projects that have a STATUS child element with the string-value Critical.
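
The three predicates above can be sketched with Python's xml.etree.ElementTree, assumed here as the engine (the [.='text'] form needs Python 3.7+). The PROJECTS wrapper and the NAME elements are hypothetical additions that make the sample well-formed and the results readable.

```python
import xml.etree.ElementTree as ET

# Hypothetical project list; PROJECTS and NAME are invented for the example.
doc = """<PROJECTS>
  <PROJECT>MyProject</PROJECT>
  <PROJECT><NAME>Alpha</NAME><STATUS>Critical</STATUS></PROJECT>
  <PROJECT><NAME>Beta</NAME><STATUS>Normal</STATUS></PROJECT>
  <PROJECT><NAME>Gamma</NAME></PROJECT>
</PROJECTS>"""

root = ET.fromstring(doc)

# PROJECT[.="MyProject"]: the whole text content must equal MyProject.
named = root.findall("PROJECT[.='MyProject']")

# PROJECT[STATUS]: a STATUS child merely has to exist.
with_status = [p.findtext("NAME") for p in root.findall("PROJECT[STATUS]")]

# PROJECT[STATUS="Critical"]: the STATUS child must have that text.
critical = [p.findtext("NAME") for p in root.findall("PROJECT[STATUS='Critical']")]
print(len(named), with_status, critical)
```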

Combining Index Addresses

The XPath specification defines quite a few addressing mechanisms, and they can be combined in many different ways in order to get interesting combinations:

  • LIST[@type="ordered"][3]: Selects all LIST elements whose type attribute has the value ordered, and returns the third of them.
  • LIST[3][@type="ordered"]: Selects the third LIST element, but only if it is of the type ordered.
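
The difference between the two predicate orders can be emulated with plain Python lists. This is only an illustration of the semantics, not an XPath engine, and the five type values are hypothetical.

```python
# Hypothetical type attributes of five sibling LIST elements, in document order.
types = ["ordered", "unordered", "ordered", "unordered", "ordered"]

# LIST[@type="ordered"][3]: filter first, then take the third survivor.
ordered_positions = [pos for pos, t in enumerate(types, start=1) if t == "ordered"]
filter_then_index = ordered_positions[2] if len(ordered_positions) >= 3 else None

# LIST[3][@type="ordered"]: take the third LIST first, then test its type.
index_then_filter = 3 if len(types) >= 3 and types[2] == "ordered" else None

# Sibling position of the LIST each expression would select.
print(filter_then_index)  # 5
print(index_then_filter)  # 3
```

With these values the first expression picks the fifth LIST and the second picks the third, so the order of the predicates changes the result.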

Many more combinations of address operators are listed in section 2.5 of the XPath specification. This is arguably the most useful section of the specification for defining an XSLT transform.

Wild Cards

By definition, an unqualified XPath expression selects the set of XML nodes that match the specified pattern.

For example, /HEAD matches all top-level HEAD entries, whereas /HEAD[1] matches only the first.

Wild card   Meaning
*           Matches any element node.
node()      Matches any node of any kind: element node, text node, attribute node, processing instruction node, namespace node, or comment node.
text()      Matches any text node.
@*          Matches any attribute node.

In the project database example, /*/PERSON[.="Fred"] selects every PERSON element with the string-value Fred under the top-level element, whatever that element is called (PROJECT or ACTIVITY, for instance).

Tree

First child

The first child of the main element is selected via indexing.

//main/*[1]

Sibling

Sibling selection:

  • the preceding siblings of the first h2 node
//h2[1]/preceding-sibling::*
  • the following siblings of the second h2 node
//h2[2]/following-sibling::*

Descendant

div1//para 

is short for

child::div1/descendant-or-self::node()/child::para 
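
A minimal sketch of div1//para with Python's xml.etree.ElementTree (assumed here; it covers only a subset of XPath). The path is evaluated relative to the root element of an invented document, and it returns para elements at any depth below div1 but not those outside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: para elements inside and outside div1.
doc = """<doc>
  <div1>
    <para>direct child</para>
    <section><para>nested deeper</para></section>
  </div1>
  <para>outside div1</para>
</doc>"""

root = ET.fromstring(doc)

# div1//para: every para descendant of the div1 children of the context node.
paras = root.findall("div1//para")
texts = [p.text for p in paras]
print(texts)
```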

Extended-Path Addressing

double forward slash (tree traversal)

So far, all the patterns you have seen have specified an exact number of levels in the hierarchy.

For example, /HEAD specifies any HEAD element at the first level in the hierarchy, whereas /*/* specifies any element at the second level in the hierarchy.

To specify an indeterminate level in the hierarchy, use a double forward slash (//).

For example, the XPath expression //PARA selects all paragraph elements in a document, wherever they may be found.

The // pattern can also be used within a path. So the expression

/HEAD/LIST//PARA

indicates all paragraph elements in a subtree that begins from /HEAD/LIST.

// is short for /descendant-or-self::node()/.

For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document.

Operator

XPath expressions yield either a set of nodes, a string, a Boolean (a true/false value), or a number.

Operator            Meaning
|                   Alternative (union). For example, PARA|LIST selects all PARA and LIST elements.
or, and             Returns the or/and of two Boolean values.
=, !=               Equal or not equal, for Booleans, strings, and numbers.
<, >, <=, >=        Less than, greater than, less than or equal to, greater than or equal to, for numbers.
+, -, *, div, mod   Add, subtract, multiply, floating-point divide, and modulus (remainder) operations (e.g., 6 mod 4 = 2).

Expressions can be grouped in parentheses, so you do not have to worry about operator precedence.

Note - Operator precedence is a term that answers the question, “If you specify a + b * c, does that mean (a+b) * c or a + (b*c)?” (The operator precedence is roughly the same as that shown in the table).

String-Value of an Element

The string-value of an element is the concatenation of all descendant text nodes, no matter how deep. Consider this mixed-content XML data:

<PARA>This paragraph contains a <b>bold</b> word</PARA>

The string-value of the <PARA> element is “This paragraph contains a bold word”. In particular, note that <b> is a child of <PARA> and that the text bold is a child of <b>.

The point is that all the text in all children of a node joins in the concatenation to form the string-value.
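
The concatenation can be reproduced with Python's xml.etree.ElementTree (an assumption of this sketch): joining itertext() yields the string-value, whereas the text attribute alone stops at the first child element.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<PARA>This paragraph contains a <b>bold</b> word</PARA>")

# itertext() walks every descendant text node in document order.
string_value = "".join(para.itertext())

# .text only holds the text before the first child element.
leading_text = para.text
print(repr(string_value), repr(leading_text))
```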

normalized

Also, it is worth understanding that the text in the abstract data model defined by XPath is fully normalized. So whether the XML structure contains the entity reference &lt; or < in a CDATA section, the element's string-value will contain the < character. Therefore, when generating HTML or XML with an XSLT stylesheet, you must convert occurrences of < to &lt; or enclose them in a CDATA section. Similarly, occurrences of & must be converted to &amp;.
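
A short sketch of this normalization with Python's xml.etree.ElementTree (assumed here for illustration): an entity reference and a CDATA section parse to the same text, and the serializer escapes < and & again on output. The v element is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Same content written with an entity reference and with a CDATA section.
from_entity = ET.fromstring("<v>a &lt; b</v>")
from_cdata = ET.fromstring("<v><![CDATA[a < b]]></v>")

# In the parsed model both hold the plain < character.
same_text = from_entity.text == from_cdata.text == "a < b"

# Serializing converts < back to &lt; and & to &amp;.
serialized = ET.tostring(from_entity, encoding="unicode")
amp = ET.Element("v")
amp.text = "R&D"
serialized_amp = ET.tostring(amp, encoding="unicode")
print(same_text, serialized, serialized_amp)
```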

functions


contains

In an XHTML document, to find all elements whose class attribute contains the string value:

//*[contains(@class,'value')]
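
Python's xml.etree.ElementTree does not implement contains(), so the sketch below emulates the expression with a substring test over every element. It is an approximation under that assumption, on an invented document. Note that contains() is a substring match, so a class such as value-box also matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment.
doc = """<html><body>
  <p class="note value-box">one</p>
  <p class="other">two</p>
  <span class="value">three</span>
  <p>four</p>
</body></html>"""

root = ET.fromstring(doc)

# Emulates //*[contains(@class,'value')]: a missing class counts as an empty string.
hits = [e for e in root.iter() if "value" in e.get("class", "")]
texts = [e.text for e in hits]
print(texts)
```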
Specification

The XPath specification is the foundation for a variety of specifications:

  • including XSLT. XPath is used to query nodes from the source document and apply styling templates to them to create a result document.
  • and linking/addressing specifications such as XPointer.