Text - Character

1 - About

A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. It is categorized as a primitive data type.

A character is the smallest component of written language that has semantic value; refers to the abstract meaning and/or shape …

Characters will not appear as intended unless you have an appropriate font, that is, one that contains the corresponding glyphs.

Characters are the basic unit of organization of encoded text.

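A character is usually represented as a Unicode code point. Code points range from U+0000 to U+10FFFF, so a 16-bit value from 0 to 65535 covers only the Basic Multilingual Plane (BMP). An int value can represent all Unicode code points, including supplementary code points.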
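The difference between a 16-bit value and a full code point can be sketched in Java, the language used later on this page. This is a minimal illustration, not a reference implementation; the class name `CodePointRange` and the helper `isSupplementary` are hypothetical names chosen for this example.

```java
class CodePointRange {
    // Hypothetical helper: true when the code point needs two 16-bit
    // char values (a surrogate pair), i.e. it lies outside the BMP.
    static boolean isSupplementary(int codePoint) {
        return Character.charCount(codePoint) == 2;
    }

    public static void main(String[] args) {
        // Largest value of a 16-bit char versus largest Unicode code point.
        System.out.println(Integer.toHexString(Character.MAX_VALUE));      // ffff
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff

        // A supplementary code point is one character but two char values.
        String s = new String(Character.toChars(0x2F81A));
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1
    }
}
```

Counting with `codePointCount` rather than `length` is the usual way to count characters when supplementary code points may be present.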

3 - Example

A character set can be simply a set of characters, such as:

  • letters,
  • numbers,
  • symbols (mathematical),
  • ideograms,
  • logograms (from non-phonetic writing systems such as kanji),
  • etc…

For example, the following character set appears in several code pages:

  • 26 non-accented letters A through Z (A, B, C, … X, Y, Z)
  • 26 non-accented letters a through z (a, b, c, … x, y, z)
  • digits 0 through 9
  • special characters: . , : ; ? ( ) ' " / - _ & + % * = < >
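As a rough check of the claim above, the following sketch encodes this set of characters with several charsets and verifies that the bytes are identical. It assumes that US-ASCII, ISO-8859-1 and UTF-8 stand in for "several code pages"; the class `SharedRepertoire` and its method are hypothetical names for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SharedRepertoire {
    // The letters, digits and special characters listed above.
    static final String REPERTOIRE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + "abcdefghijklmnopqrstuvwxyz"
        + "0123456789"
        + ".,:;?()'\"/-_&+%*=<>";

    // True when the text encodes to the same bytes in US-ASCII
    // and in every charset passed as argument.
    static boolean sameBytesEverywhere(String text, Charset... charsets) {
        byte[] reference = text.getBytes(StandardCharsets.US_ASCII);
        for (Charset cs : charsets) {
            if (!Arrays.equals(reference, text.getBytes(cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameBytesEverywhere(
            REPERTOIRE, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

A character outside this set, such as the en dash, encodes differently in UTF-8 than in the single-byte charsets, so the same check returns false for it.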

4 - Type/Category

5 - Management

5.1 - Encoding, File Storage

5.2 - Show

Problem: Which character is – ?

Steps:

  • The character set is UTF-8, so the hexadecimal dump shows the UTF-8 bytes.
echo $LANG
en_US.UTF-8

The UTF-8 hexadecimal representation of this character is e2 80 93. It corresponds to the Unicode character U+2013 - EN DASH. See Translation of a UTF-8 Multibyte sequence to Unicode - Example 2. The trailing 0a is the line feed that echo appends, not part of the character.

echo – | hexdump -C
00000000  e2 80 93 0a                                       |....|
00000004
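The same lookup can be sketched in Java: encode the character to UTF-8, print the bytes in hexadecimal, then decode the three-byte sequence by hand to recover the code point. This is only an illustration; the class `ShowCharacter` and its two methods are hypothetical names, and `decodeThreeBytes` handles only the three-byte form (1110xxxx 10xxxxxx 10xxxxxx) without validation.

```java
import java.nio.charset.StandardCharsets;

class ShowCharacter {
    // Space-separated hexadecimal dump of the UTF-8 bytes of the text.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Keep the 4 payload bits of the lead byte and the 6 payload bits
    // of each continuation byte, then concatenate them.
    static int decodeThreeBytes(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    public static void main(String[] args) {
        String dash = "\u2013";
        System.out.println(utf8Hex(dash)); // e2 80 93

        int cp = decodeThreeBytes(0xE2, 0x80, 0x93);
        // Prints: U+2013 EN DASH
        System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
    }
}
```

Unlike the echo pipeline, this dump contains no trailing 0a because no line feed is appended to the string.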

5.3 - Diff

Compare characters with a hex tool such as `hexdump` on Unix, which outputs their bytes as hexadecimal digits.

Problem:

  • Are these two characters the same?
–
-

Steps:

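  • The character set is UTF-8, so the hexadecimal dump shows the UTF-8 bytes.
echo $LANG
en_US.UTF-8
  • The UTF-8 hexadecimal representation of the first character is e2 80 93. This is the Unicode character U+2013 - EN DASH.
echo – | hexdump -C
00000000  e2 80 93 0a                                       |....|
00000004
  • The UTF-8 hexadecimal representation of the second character is 2d. This is the Unicode character U+002D - HYPHEN-MINUS, so the two characters are different.
echo - | hexdump -C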
00000000  2d 0a                                             |-.|
00000002
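The comparison can also be made on code points instead of bytes. The sketch below is a minimal example under the assumption that each input holds a single character; `CompareCharacters`, `describe` and `sameCharacter` are hypothetical names.

```java
class CompareCharacters {
    // Code point and Unicode name of the first character of the text.
    static String describe(String s) {
        int cp = s.codePointAt(0);
        return String.format("U+%04X %s", cp, Character.getName(cp));
    }

    // Two characters are the same only if their code points are equal.
    static boolean sameCharacter(String a, String b) {
        return a.codePointAt(0) == b.codePointAt(0);
    }

    public static void main(String[] args) {
        String first = "\u2013"; // the character that looks like a long dash
        String second = "-";     // the character typed with the minus key

        System.out.println(describe(first));  // U+2013 EN DASH
        System.out.println(describe(second)); // U+002D HYPHEN-MINUS
        System.out.println(sameCharacter(first, second)); // false
    }
}
```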

5.4 - Storage

Each character requires:

6 - Java

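Character.toChars(int)[0] returns the first char of a code point's UTF-16 representation. For a code point in the BMP this is the whole character; for a supplementary code point it is only the high surrogate of the pair.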

For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
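The two calls above can be tried together in a short program. This is a sketch for illustration only; the class name `JavaCharacterApi` and the helper `fromCodePoint` are hypothetical.

```java
class JavaCharacterApi {
    // Hypothetical helper: build a String from a single code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // A BMP code point fits in one char.
        char[] bmp = Character.toChars(0x2013);
        System.out.println(bmp.length); // 1

        // A supplementary code point needs a surrogate pair,
        // so index 0 holds only the high surrogate.
        char[] supplementary = Character.toChars(0x2F81A);
        System.out.println(supplementary.length);                      // 2
        System.out.println(Character.isHighSurrogate(supplementary[0])); // true

        // The int overload of isLetter accepts the full code point.
        System.out.println(Character.isLetter(0x2F81A)); // true
    }
}
```

Passing the code point as an int, as in `Character.isLetter(0x2F81A)`, avoids testing a lone surrogate, which is not a letter on its own.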

7 - Documentation / Reference

text/character.txt · Last modified: 2017/04/19 20:56 by gerardnico