Javascript - String


1 - About

This page is about the string (text) data type in JavaScript.

A JavaScript string is a sequence of 16-bit code units. This is often described as the UCS-2 16-bit character set, but the string functions treat the content as UTF-16. An element of a JavaScript string is therefore a 16-bit code unit.

String properties and methods (such as length, charAt, and charCodeAt) all work at the level of code units, not at the level of Unicode code points. Unicode code points 2^16 (0x10000) and above are represented by two code units, known as a surrogate pair.

Strings behave like sequences of UTF-16 code units, i.e. a variable-length encoding in which a code point takes one or two code units.


3 - Syntax (Declaration and Assignment)

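  • String literal: a single quote ' or a double quote " delimits a string value.
var myVariable = 'Nico';
// is equivalent to
var myVariable = "Nico";
  • String function: converts a value to a string.
var value = 35; // example value to convert
var myVariable = String(value);
  • Template literal: backticks ` delimit a string that may span several lines.
var myString = `hello!
 world!`;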
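
The declaration forms above can be compared with a minimal sketch. The variable names are illustrative and not part of the original page; new String is shown only to contrast it with String called as a plain function.

```javascript
// Both quote styles produce the same primitive string.
var singleQuoted = 'Nico';
var doubleQuoted = "Nico";
console.log(singleQuoted === doubleQuoted); // true

// String(value) called as a function returns a primitive string.
var converted = String(35);
console.log(typeof converted); // "string"

// new String(value) creates a wrapper object instead.
var wrapped = new String(35);
console.log(typeof wrapped); // "object"

// A template literal keeps the line break written inside the backticks.
var multiLine = `hello!
 world!`;
console.log(multiLine.includes("\n")); // true
```

In most code the primitive form is what is wanted; the wrapper object compares by reference rather than by value.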

4 - Function

4.1 - Concat

"my string" + "my string" // "my stringmy string"
"35" + "25" // "3525": the strings are concatenated, not added as numbers

4.2 - Replace

replace generally takes a regular expression as its first parameter. A plain string is also accepted, in which case only the first occurrence is replaced.

A regexp literal is not enclosed in quotes but in two slashes /.

// Not 
var regexp = '/pattern/mg'
// but
var regexp = /pattern/mg

Example: remove comment lines

text = "// My beautiful comment\r\nmy beautiful code\r\n";
text = text.replace(/^[/]{2}[^\r\n]+[\r\n]+/mg, "");
console.log(text);

where:

  • The regexp pattern can be read as:
    • / ... /: the pattern is enclosed in two slashes, not in quotes.
    • ^: the start of a line (because of the m modifier; without it, only the start of the string)
    • [/]{2}: two slashes
    • [^\r\n]: any character except the end-of-line characters \r and \n. The ^ character is a negation in a class.
    • [\r\n]: the end-of-line characters that terminate the comment line
    • + is a quantifier meaning one or more
    • mg are modifiers meaning respectively 'm': multiline matching, 'g': global match (find all matches rather than stopping after the first match)
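
The difference between a string pattern, a regexp literal and a quoted regexp can be sketched as follows. The sample value and variable names are made up for illustration.

```javascript
var sample = "a-b-c";

// A string pattern replaces only the first occurrence.
var firstOnly = sample.replace("-", "+");

// A regexp literal with the g modifier replaces every occurrence.
var everyMatch = sample.replace(/-/g, "+");

// A quoted "regexp" is plain text: it searches for the literal characters /-/g.
var quotedPattern = sample.replace("/-/g", "+");

console.log(firstOnly);     // "a+b-c"
console.log(everyMatch);    // "a+b+c"
console.log(quotedPattern); // "a-b-c" (nothing matched)
```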


4.3 - Length

s = "length";
console.log(s.length);

The length property (like most String functions, such as charAt and charCodeAt) returns the number of code units, not the number of characters (i.e. code points).

// Describes one code point: its value, its character and its length in code units.
function CharacterRepresentation(codePoint){
    this.codePointDecimal = codePoint;
    this.codePointHexadecimal = "0x"+codePoint.toString(16);
    this.character = String.fromCodePoint(codePoint);
    this.length = this.character.length;
}
var counterTotal = 0;
var charactersWithLengthOf2 = [];
// Scan the code points from 0 to 112063 (an arbitrary upper bound for this demo).
for (var i=0;i<112064;i++) {
      var s = String.fromCodePoint(i);
      if (s.length > 1) {
          counterTotal++;
          // Keep a small sample (66352 to 66362, i.e. 0x10330 to 0x1033a) for display.
          if (i >= 66352 && i <= 66362 ) {
              charactersWithLengthOf2.push(new CharacterRepresentation(i));
          }
      }
}
console.log("There are "+counterTotal+" code points encoded with more than 1 code unit (i.e. 2 code units).");
console.log("Example of surrogate pairs (code points at or above 2^16, encoded with 2 code units):");
console.table(charactersWithLengthOf2);
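
To count code points rather than code units, one option is to rely on the string iterator, which walks a string code point by code point. This is a minimal sketch; the variable names are illustrative and the sample character is built with String.fromCodePoint to avoid depending on the file encoding.

```javascript
// U+10330 lies above 2^16, so it needs a surrogate pair.
var gothicLetter = String.fromCodePoint(0x10330);
var mixedText = "a" + gothicLetter + "b";

// length counts code units.
var unitCount = mixedText.length;

// Array.from and the spread operator use the string iterator,
// which yields one element per code point.
var pointCount = Array.from(mixedText).length;
var pointCountSpread = [...mixedText].length;

console.log(gothicLetter.length); // 2
console.log(unitCount);           // 4
console.log(pointCount);          // 3
console.log(pointCountSpread);    // 3
```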

4.4 - charCodeAt

charCodeAt returns the code unit at the given index rather than the code point.

s = "Hello Nico";
console.log("The string ("+s+") is made of characters encoded with one code unit each and therefore has a length of "+s.length);
console.log("First code unit of "+s+" is "+s.charCodeAt(0));
 
s= "𐌰";
console.log("The character "+s+" is encoded with two code units and therefore has a length of "+s.length);
console.log("First code unit of "+s+" is "+s.charCodeAt(0));
console.log("Second code unit of "+s+" is "+s.charCodeAt(1));
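
When the code point itself is needed, codePointAt can be used instead of charCodeAt. The sketch below also recombines the two code units by hand with the standard UTF-16 formula; the variable names are illustrative.

```javascript
var ahsa = String.fromCodePoint(0x10330); // same character as in the example above

var highUnit = ahsa.charCodeAt(0);   // first half of the surrogate pair
var lowUnit = ahsa.charCodeAt(1);    // second half of the surrogate pair
var codePoint = ahsa.codePointAt(0); // the full code point

// Manual UTF-16 decoding of a surrogate pair.
var recombined = (highUnit - 0xD800) * 0x400 + (lowUnit - 0xDC00) + 0x10000;

console.log(highUnit.toString(16));   // "d800"
console.log(lowUnit.toString(16));    // "df30"
console.log(codePoint.toString(16));  // "10330"
console.log(recombined === codePoint); // true
```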

4.5 - charAt

var s = "length"
 
for (var i=0;i<s.length;++i) {
    console.log("Char at "+i+" is "+s.charAt(i));
}
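
charAt and bracket access return the same value for a valid index but differ when the index is out of range. A minimal sketch, with an illustrative variable name:

```javascript
var word = "length";

var inRangeCharAt = word.charAt(0);
var inRangeBracket = word[0];

// Out of range: charAt returns an empty string, bracket access returns undefined.
var outOfRangeCharAt = word.charAt(99);
var outOfRangeBracket = word[99];

console.log(inRangeCharAt);     // "l"
console.log(inRangeBracket);    // "l"
console.log(outOfRangeCharAt);  // ""
console.log(outOfRangeBracket); // undefined
```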

4.6 - Split

var s = "1,2,3"
 
console.log("The second element is "+s.split(',')[1]);
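
split returns an array of strings, so a few common follow-up steps can be sketched as below. The variable names are illustrative.

```javascript
var csv = "1,2,3";

// All elements, as strings.
var parts = csv.split(",");

// The optional second argument limits the number of elements returned.
var firstTwo = csv.split(",", 2);

// Elements are strings; convert them if numbers are needed.
var numbers = parts.map(Number);

// join is the inverse operation.
var rejoined = parts.join(",");

console.log(parts);    // [ "1", "2", "3" ]
console.log(firstTwo); // [ "1", "2" ]
console.log(numbers);  // [ 1, 2, 3 ]
console.log(rejoined); // "1,2,3"
```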


5 - Documentation / Reference