XML - Character


1 - About

This page is about characters in XML.

Characters are denoted using the notation of the Unicode Standard, that is, "U+" followed by the character's hexadecimal code point, using at least 4 digits, such as “U+1234” or “U+10FFFD”. In XML or HTML they could be expressed as the character references “&#x1234;” or “&#x10FFFD;”.
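
As an illustration, the two notations can be produced from a Python string. This is a minimal sketch, and `unicode_notation` and `hex_char_ref` are hypothetical helper names rather than part of any library:

```python
def unicode_notation(ch: str) -> str:
    """Return the Unicode Standard notation of a character, e.g. 'U+1234'."""
    # Uppercase hex, zero-padded to at least 4 digits
    return f"U+{ord(ch):04X}"


def hex_char_ref(ch: str) -> str:
    """Return the hexadecimal XML/HTML character reference, e.g. '&#x1234;'."""
    return f"&#x{ord(ch):X};"


for code_point in (0x1234, 0x10FFFD):
    ch = chr(code_point)
    print(unicode_notation(ch), hex_char_ref(ch))
# U+1234 &#x1234;
# U+10FFFD &#x10FFFD;
```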

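Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646, as defined by the Char production of XML 1.0:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

That is, any Unicode character except the C0 control characters other than tab, line feed and carriage return, the surrogate blocks, U+FFFE and U+FFFF.

XML processors MUST accept any character in the range specified for Char. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode.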
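
The Char production translates directly into a range check. The following is a minimal Python sketch of the XML 1.0 rule. The function names are hypothetical, and XML 1.1 defines a different range.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # starts at 0x20: other C0 controls are excluded
        or 0xE000 <= cp <= 0xFFFD      # surrogates (D800-DFFF) and FFFE/FFFF are excluded
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def first_illegal_index(text: str):
    """Return the index of the first character outside Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            return i
    return None


print(first_illegal_index("tab\tok"))   # None
print(first_illegal_index("bell\x07"))  # 4
```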


3 - Type of character

3.1 - Reference

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

3.1.1 - Syntax

'&#' [0-9]+ ';'

or

'&#x' [0-9a-fA-F]+ ';'
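
Both syntax forms can be matched with one regular expression. This is a minimal sketch, assuming plain character data as input: it ignores entity references such as &amp; and does not skip comments or CDATA sections. The names are hypothetical. It also enforces the rule that a referenced character must itself match Char.

```python
import re

# Decimal form '&#' [0-9]+ ';' or hexadecimal form '&#x' [0-9a-fA-F]+ ';'.
# Only a lowercase 'x' is allowed, so '&#X3C;' is left untouched.
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_char_code_point(cp: int) -> bool:
    # XML 1.0 Char production
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def resolve_char_refs(text: str) -> str:
    """Replace each character reference with the character it refers to."""
    def replace(m):
        hex_digits, dec_digits = m.group(1), m.group(2)
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        if not is_char_code_point(cp):
            raise ValueError(f"{m.group(0)} does not refer to a legal XML character")
        return chr(cp)
    return CHAR_REF.sub(replace, text)


print(resolve_char_refs("less-than (&#x3C;) or (&#60;)"))  # less-than (<) or (<)
```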

3.1.2 - Example

Example of a character reference:

Type <key>less-than</key> (&#x3C;) to save options.

where 3C is the hexadecimal code point of the LESS-THAN SIGN (U+003C, "<"), a Unicode math symbol.
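
The example can be checked with Python's standard xml.etree.ElementTree module. The parser replaces the reference with the character itself. When the tree is serialized again, ElementTree escapes that character as &lt; and does not restore &#x3C;.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<p>Type <key>less-than</key> (&#x3C;) to save options.</p>")
key = doc.find("key")

# The parser has already replaced &#x3C; with the character it refers to
print(repr(key.tail))  # ' (<) to save options.'

# On output, '<' is escaped as &lt; and the original reference is not kept
print(ET.tostring(doc, encoding="unicode"))
# <p>Type <key>less-than</key> (&lt;) to save options.</p>
```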

3.2 - Data

All text that is not markup constitutes the character data of the document. Comments and processing instructions are markup, so they are not part of the character data.
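
As a rough illustration, Python's standard ElementTree parser drops comments and processing instructions by default. Joining `itertext()` over the resulting tree therefore gives only the character data. The sample document is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The default parser discards the comment and the processing instruction,
# so itertext() yields only text content: element text and tails.
doc = ET.fromstring("<note><?app hint?>Hello, <b>XML</b><!-- not data --> world</note>")
print("".join(doc.itertext()))  # Hello, XML world
```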

3.3 - Special

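The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.

If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings “ &amp; ” and “ &lt; ” respectively.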

The right angle bracket (>) may be represented using the string “ &gt; ”, and MUST, for compatibility, be escaped using either “ &gt; ” or a character reference when it appears in the string “ ]]> ” in content, when that string is not marking the end of a CDATA section.
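
These rules fit in a small escaping function for element content. This is a minimal sketch with a hypothetical name. It covers element content only, because attribute values also need their quote character escaped. Python's standard `xml.sax.saxutils.escape` does a similar job but escapes every ">".

```python
import xml.etree.ElementTree as ET


def escape_char_data(text: str) -> str:
    """Escape text for use as element content (not attribute values)."""
    # Escape '&' first so the '&' added by later replacements is not escaped again
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    # A literal '>' is allowed in content, except in the sequence ']]>'
    return text.replace("]]>", "]]&gt;")


raw = "if a < b && c ]]> d"
escaped = escape_char_data(raw)
print(escaped)  # if a &lt; b &amp;&amp; c ]]&gt; d

# Round trip through a real parser to check the result is well-formed
assert ET.fromstring(f"<r>{escaped}</r>").text == raw
```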



markup/xml/character.txt · Last modified: 2017/09/13 16:12 by gerardnico