Oracle Database - Bytes or Characters for VARCHAR2 and CHAR

> Database > Oracle Database

1 - About

Historically database columns which hold alphanumeric data have been defined using the number of bytes they store. This approach was fine as the number of bytes equated to the number of characters when using single-byte character sets.

With the increasing use of multi-byte character sets to support globalized databases comes the problem of bytes no longer equating to characters.

The VARCHAR2 and CHAR types support two methods of specifying lengths:

  • In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character sets.
  • In characters: VARCHAR2(10 char). This will support to up 10 characters of data, which could be as much as 40 bytes of information.

3 - Advise

When using a multibyte character set such as UTF8, since your intention is likely to define a column that can in fact store characters of data, you may use :

  • the CHAR modifier in the VARCHAR2/CHAR definition. That is, use of VARCHAR2(80 CHAR), not VARCHAR2(80).
  • the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR.

It's not recommended to change this setting at the system level scope but rather an the SESSION scope.

ALTER system SET nls_length_semantics=CHAR scope=both;
ALTER system SET nls_length_semantics=byte scope=both;
 
ALTER session SET nls_length_semantics=CHAR;
ALTER session SET nls_length_semantics=byte;
Advertising

4 - Issue

Two issues, people encounter frequently, are that :

  • when using multibyte character sets, namely that a VARCHAR2(N) doesn't necessarily hold N characters, but rather N bytes.
  • the maximum length in bytes of a VARCHAR2 is 4,000, and in a CHAR it is 2,000:

4.1 - VARCHAR2(N) - byte of characters

The following small example demonstrates the differences between BYTE and CHAR and how the upper bounds come into play. We'll create a table with three columns, the first two of which will be 1 byte and one character, respectively, with the last column being 4,000 characters. Notice that we're performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable length fashion using from 1 to 4 bytes for each character:

ops$tkyte@O10GUTF> SELECT *
  2    FROM nls_database_parameters
  3   WHERE parameter = 'NLS_CHARACTERSET';
 
PARAMETER                      VALUE
------------------------------ --------------------
NLS_CHARACTERSET               AL32UTF8
 
ops$tkyte@O10GUTF> CREATE TABLE t
2  ( a VARCHAR2(1),
  3    b VARCHAR2(1 CHAR),
  4    c VARCHAR2(4000 CHAR)
  5  )
  6  /
TABLE created.

Now, if we try to insert into our table a single character that is 2 bytes long in UTF, we observe the following:

ops$tkyte@O10GUTF> INSERT INTO t (a) VALUES (UNISTR('\00d6'));
INSERT INTO t (a) VALUES (UNISTR('\00d6'))
                          *
ERROR AT line 1:
ORA-12899: VALUE too large FOR column "OPS$TKYTE"."T"."A" 
(actual: 2, maximum: 1)

This example demonstrates two things:

  • VARCHAR2(1) is in bytes, not characters. We have single Unicode character, but it won't fit into a single byte.
  • As you migrate an application from a single-byte fixed-width character set to a multibyte character set, you might find that the text that used to fit into your fields no longer does.

The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit in a VARCHAR2(20). However a 20-character field could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider modifying your DDL to be VARCHAR2(20 CHAR) or using the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running your DDL to create your tables.

If we insert that single character into a field set up to hold a single character, we will observe the following:

ops$tkyte@O10GUTF> INSERT INTO t (b) VALUES (UNISTR('\00d6'));
1 ROW created.
ops$tkyte@O10GUTF> SELECT LENGTH(b), lengthb(b), DUMP(b) DUMP FROM t;
LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
         1          2 Typ=1 Len=2: 195,150

That INSERT succeeded, and we can see that the LENGTH of the inserted data is one character. All of the character string functions work 'character-wise'. So the length of the field is one character, but the LENGTHB (length in bytes) function shows it takes 2 bytes of storage, and the DUMP function shows us exactly what those bytes are.

So, that example demonstrates one very common issue people encounter when using multibyte character sets, namely that a VARCHAR2(N) doesn't necessarily hold N characters, but rather N bytes.

Advertising

4.2 - maximum length

The next issue people confront frequently is that the maximum length in bytes of a VARCHAR2 is 4,000, and in a CHAR it is 2,000:

ops$tkyte@O10GUTF> DECLARE
  2          l_data VARCHAR2(4000 CHAR);
  3          l_ch   VARCHAR2(1 CHAR) := UNISTR( '\00d6' );
  4  BEGIN
  5          l_data := RPAD( l_ch, 4000, l_ch );
  6          INSERT INTO t ( c ) © VALUES ( l_data );
  7  END;
  8  /
DECLARE
*
ERROR AT line 1:
ORA-01461: can bind a LONG VALUE only FOR INSERT INTO a LONG column
ORA-06512: AT line 6

That shows that a 4,000-character string that is really 8,000 bytes long cannot be stored permanently in a VARCHAR2(4000 CHAR) field. It fits in the PL/SQL variable because in PL/SQL a VARCHAR2 is allowed to be up to 32KB in size. However, when it is stored in a table, the hard limit is 4,000 bytes. We can store 2,000 of these characters successfully:

ops$tkyte@O10GUTF> DECLARE
  2          l_data VARCHAR2(4000 CHAR);
  3          l_ch   VARCHAR2(1 CHAR) := UNISTR( '\00d6' );
  4  BEGIN
  5          l_data := RPAD( l_ch, 2000, l_ch );
  6          INSERT INTO t ( c ) VALUES ( l_data );
  7  END;
  8  /
 
PL/SQL PROCEDURE successfully completed.
ops$tkyte@O10GUTF> SELECT LENGTH( c ), lengthb( c )
  2    FROM t
  3   WHERE c IS NOT NULL;
 
LENGTH(C)  LENGTHB(C)
---------- ----------
2000       4000

And as you can see, they consume 4,000 bytes of storage.

5 - Functions Impacted

The INSTR, LENGTH and SUBSTR functions always deal with characters, regardless of column definitions and the character sets. For times when you specifically need to deal in bytes Oracle provides the INSTRB, LENGTHB and SUBSTRB functions.

Example The following examples use the LENGTH function using single- and multibyte database character set.

SELECT LENGTH('CANDIDE') "Length in characters"
   FROM DUAL;
 
LENGTH IN characters
--------------------
7

This example assumes a double-byte database character set.

SELECT LENGTHB ('CANDIDE') "Length in bytes"
   FROM DUAL;
 
LENGTH IN bytes
---------------
14

6 - Oracle Error

This kind of problem may lead to this errors :

  • ORA-06502: PL/SQL: numeric or value error: character string buffer too small
  • ORA-12899: value too large for column “OPS$TKYTE”.“T”.“A”
Advertising

7 - Reference