CppStd : lex

5.6 Tokens [lex.token]

1

There are five kinds of tokens: identifiers, keywords, literals,^{[cpp20 8]} operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “whitespace”), as described below, are ignored except as they serve to separate tokens.

Note: Some whitespace is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters.

5.7 Comments [lex.comment]

1

The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates immediately before the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only whitespace characters shall appear between it and the new-line that terminates the comment; no diagnostic is required.

Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.
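The comment rules above can be checked with a few declarations. This is a minimal sketch; the names `a`, `b`, and `c` are illustrative only, and some compilers may warn about the `/*` that appears inside a comment.

```cpp
constexpr int a = 1; /* the characters // are ordinary text inside a block comment */
constexpr int b = 2; // the characters /* and */ are ordinary text inside a line comment

// Block comments do not nest: the second /* on the next line is ignored,
// and the comment ends at the first */ that follows it.
constexpr int c = a /* outer /* inner */ + b;

static_assert(c == 3);
```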

5.8 Header names [lex.header]

header-name:

< h-char-sequence >

" q-char-sequence "

h-char-sequence:

h-char

h-char-sequence h-char

h-char:

any member of the source character set except new-line and >

q-char-sequence:

q-char

q-char-sequence q-char

q-char:

any member of the source character set except new-line and "

1	Note: Header name preprocessing tokens only appear within a `#include` preprocessing directive, a `__has_include` preprocessing expression, or after certain occurrences of an `import` token (see [lex.pptoken]).
	The sequences in both forms of header-names are mapped in an implementation-defined manner to headers or to external source file names as specified in [cpp.include].
2	The appearance of either of the characters `'` or `\` or of either of the character sequences `/*` or `//` in a q-char-sequence or an h-char-sequence is conditionally-supported with implementation-defined semantics, as is the appearance of the character `"` in an h-char-sequence.^{[cpp20 9]}

5.9 Preprocessing numbers [lex.ppnumber]

pp-number:

digit

. digit

pp-number digit

pp-number identifier-nondigit

pp-number ' digit

pp-number ' nondigit

pp-number e sign

pp-number E sign

pp-number p sign

pp-number P sign

pp-number .

1	Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]) and all floating-point-literal tokens ([lex.fcon]).
2	A preprocessing number does not have a type or a value; it acquires both after a successful conversion to an integer-literal token or a floating-point-literal token.
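The pp-number grammar is deliberately broader than the literal grammars, which a small recognizer makes visible. The following is a sketch only: `is_pp_number` and its helpers are hypothetical names, universal-character-names are not handled, and the letter tests assume an encoding in which `a`-`z` and `A`-`Z` are contiguous (as in ASCII).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers restricted to the basic source character set.
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Returns true if the whole of s matches the pp-number production.
constexpr bool is_pp_number(std::string_view s) {
    std::size_t i = 0;
    if (i < s.size() && s[i] == '.') ++i;               // ". digit" form
    if (i >= s.size() || !is_digit(s[i])) return false; // a digit is required
    ++i;
    while (i < s.size()) {
        const char c = s[i];
        const bool has_next = i + 1 < s.size();
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') && has_next &&
            (s[i + 1] == '+' || s[i + 1] == '-')) {
            i += 2;                                     // pp-number e sign, etc.
        } else if (c == '\'' && has_next &&
                   (is_digit(s[i + 1]) || is_nondigit(s[i + 1]))) {
            i += 2;                                     // pp-number ' digit or ' nondigit
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            ++i;                                        // digit, nondigit, or period
        } else {
            return false;
        }
    }
    return true;
}

static_assert(is_pp_number("1'000"));
static_assert(is_pp_number(".5e-3f"));
static_assert(is_pp_number("0x1e+2"));  // one pp-number, although not a valid literal
static_assert(is_pp_number("1.2.3"));   // likewise a pp-number but not a literal
static_assert(!is_pp_number("x1"));     // must begin with a digit or ". digit"
static_assert(!is_pp_number("1+2"));    // a sign is only absorbed after e, E, p, or P
```

The `0x1e+2` case shows why such text is lexed as a single preprocessing token instead of as `0x1e`, `+`, and `2`.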

5.10 Identifiers [lex.name]

identifier:

identifier-nondigit

identifier identifier-nondigit

identifier digit

identifier-nondigit:

nondigit

universal-character-name

nondigit: one of

a b c d e f g h i j k l m

n o p q r s t u v w x y z

A B C D E F G H I J K L M

N O P Q R S T U V W X Y Z _

digit: one of

0 1 2 3 4 5 6 7 8 9

1

An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in Table 2. The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in Table 3. Upper- and lower-case letters are different. All characters are significant.^{[cpp20 10]}

Table 2: Ranges of characters allowed [tab:lex.name.allowed]
`00A8`	`00AA`	`00AD`	`00AF`	`00B2-00B5`
`00B7-00BA`	`00BC-00BE`	`00C0-00D6`	`00D8-00F6`	`00F8-00FF`
`0100-167F`	`1681-180D`	`180F-1FFF`
`200B-200D`	`202A-202E`	`203F-2040`	`2054`	`2060-206F`
`2070-218F`	`2460-24FF`	`2776-2793`	`2C00-2DFF`	`2E80-2FFF`
`3004-3007`	`3021-302F`	`3031-D7FF`
`F900-FD3D`	`FD40-FDCF`	`FDF0-FE44`	`FE47-FFFD`
`10000-1FFFD`	`20000-2FFFD`	`30000-3FFFD`	`40000-4FFFD`	`50000-5FFFD`
`60000-6FFFD`	`70000-7FFFD`	`80000-8FFFD`	`90000-9FFFD`	`A0000-AFFFD`
`B0000-BFFFD`	`C0000-CFFFD`	`D0000-DFFFD`	`E0000-EFFFD`

Table 3: Ranges of characters disallowed initially (combining characters) [tab:lex.name.disallowed]
`0300-036F`	`1DC0-1DFF`	`20D0-20FF`	`FE20-FE2F`

2

The identifiers in Table 4 have a special meaning when appearing in a certain context. When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production. Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier.

Table 4: Identifiers with special meaning [tab:lex.name.special]
`final`	`import`	`module`	`override`

3 In addition, some identifiers are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.

(3.1)

Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.

(3.2)

Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
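The two reservation rules in (3.1) and (3.2) can be expressed as a small classifier. This is an illustrative sketch: the `Reservation` enumeration and the `classify` function are hypothetical names, and the uppercase test assumes that `A`-`Z` are contiguous in the character encoding.

```cpp
#include <string_view>

// Hypothetical classification of an identifier under (3.1) and (3.2).
enum class Reservation { none, global_namespace, any_use };

constexpr Reservation classify(std::string_view id) {
    // (3.1) contains a double underscore anywhere
    if (id.find("__") != std::string_view::npos) return Reservation::any_use;
    // (3.1) begins with an underscore followed by an uppercase letter
    if (id.size() >= 2 && id[0] == '_' && id[1] >= 'A' && id[1] <= 'Z')
        return Reservation::any_use;
    // (3.2) begins with an underscore
    if (!id.empty() && id[0] == '_') return Reservation::global_namespace;
    return Reservation::none;
}

static_assert(classify("a__b") == Reservation::any_use);
static_assert(classify("_Foo") == Reservation::any_use);
static_assert(classify("_foo") == Reservation::global_namespace);
static_assert(classify("foo_") == Reservation::none);
```

Because no diagnostic is required for a violation, a check of this kind is more at home in a linter than in the compiler.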

5.11 Keywords [lex.key]

keyword:

any identifier listed in Table 5

import-keyword

module-keyword

export-keyword

1

The identifiers shown in Table 5 are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token ([dcl.attr.grammar]).

Note: The register keyword is unused but is reserved for future use.

Table 5: Keywords [tab:lex.key]
`alignas`	`constinit`	`false`	`public`	`true`
`alignof`	`const_cast`	`float`	`register`	`try`
`asm`	`continue`	`for`	`reinterpret_cast`	`typedef`
`auto`	`co_await`	`friend`	`requires`	`typeid`
`bool`	`co_return`	`goto`	`return`	`typename`
`break`	`co_yield`	`if`	`short`	`union`
`case`	`decltype`	`inline`	`signed`	`unsigned`
`catch`	`default`	`int`	`sizeof`	`using`
`char`	`delete`	`long`	`static`	`virtual`
`char8_t`	`do`	`mutable`	`static_assert`	`void`
`char16_t`	`double`	`namespace`	`static_cast`	`volatile`
`char32_t`	`dynamic_cast`	`new`	`struct`	`wchar_t`
`class`	`else`	`noexcept`	`switch`	`while`
`concept`	`enum`	`nullptr`	`template`
`const`	`explicit`	`operator`	`this`
`consteval`	`export`	`private`	`thread_local`
`constexpr`	`extern`	`protected`	`throw`

2

Furthermore, the alternative representations shown in Table 6 for certain operators and punctuators ([lex.digraph]) are reserved and shall not be used otherwise.

Table 6: Alternative representations [tab:lex.key.digraph]
`and`	`and_eq`	`bitand`	`bitor`	`compl`	`not`
`not_eq`	`or`	`or_eq`	`xor`	`xor_eq`

5.12 Operators and punctuators [lex.operators]

1

The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:

preprocessing-op-or-punc:

preprocessing-operator

operator-or-punctuator

preprocessing-operator: one of

# ## %: %:%:

operator-or-punctuator: one of

{ } [ ] ( )

<: :> <% %> ; : ...

? :: . .* -> ->* ~

! + - * / % ^ & |

= += -= *= /= %= ^= &= |=

== != < > <= >= <=> && ||

<< >> <<= >>= ++ -- ,

and or xor not bitand bitor compl

and_eq or_eq xor_eq not_eq

Each operator-or-punctuator is converted to a single token in translation phase 7.
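The alternative spellings in the operator-or-punctuator list behave exactly like the primary tokens they stand for. The declarations below are a minimal sketch with illustrative names (`both`, `data`); some compilers accept the alternative tokens only in a conforming mode.

```cpp
// Alternative tokens and their primary spellings are interchangeable.
constexpr bool both = true and not false;   // true && !false
static_assert(both);
static_assert((6 bitand 3) == (6 & 3));
static_assert((6 bitor 3) == (6 | 3));
static_assert((6 xor 3) == (6 ^ 3));
static_assert((compl 0) == (~0));
static_assert(1 not_eq 2);

// The digraphs <: :> <% %> stand for [ ] { }.
constexpr int data<:3:> = <%1, 2, 3%>;      // int data[3] = {1, 2, 3};
static_assert(data<:1:> == 2);
```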

5.13.1 Kinds of literals [lex.literal.kinds]

literal:

integer-literal

character-literal

floating-point-literal

string-literal

boolean-literal

pointer-literal

user-defined-literal

1	There are several kinds of literals.^{[cpp20 11]}

5.13.2 Integer literals [lex.icon]

integer-literal:

binary-literal integer-suffix_opt

octal-literal integer-suffix_opt

decimal-literal integer-suffix_opt

hexadecimal-literal integer-suffix_opt

binary-literal:

0b binary-digit

0B binary-digit

binary-literal '_opt binary-digit

octal-literal:

0

octal-literal '_opt octal-digit

decimal-literal:

nonzero-digit

decimal-literal '_opt digit

hexadecimal-literal:

hexadecimal-prefix hexadecimal-digit-sequence

binary-digit: one of

0 1

octal-digit: one of

0 1 2 3 4 5 6 7

nonzero-digit: one of

1 2 3 4 5 6 7 8 9

hexadecimal-prefix: one of

0x 0X

hexadecimal-digit-sequence:

hexadecimal-digit

hexadecimal-digit-sequence '_opt hexadecimal-digit

hexadecimal-digit: one of

0 1 2 3 4 5 6 7 8 9

a b c d e f

A B C D E F

integer-suffix:

unsigned-suffix long-suffix_opt

unsigned-suffix long-long-suffix_opt

long-suffix unsigned-suffix_opt

long-long-suffix unsigned-suffix_opt

unsigned-suffix: one of

u U

long-suffix: one of

l L

long-long-suffix: one of

ll LL

1

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.

Note: The prefix and any optional separating single quotes are ignored when determining the value.

Table 7: Base of integer-literals [tab:lex.icon.base]
Kind of integer-literal	base N
binary-literal	2
octal-literal	8
decimal-literal	10
hexadecimal-literal	16

2

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

Example: The number twelve can be written 12, 014, 0XC, or 0b1100. The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
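The equalities stated in the example can be verified at compile time. A minimal sketch using only the literals quoted above:

```cpp
// The four spellings of twelve: decimal, octal, hexadecimal, binary.
static_assert(12 == 014);
static_assert(12 == 0XC);
static_assert(12 == 0b1100);

// Digit separators and prefixes do not affect the value.
static_assert(1048576 == 1'048'576);
static_assert(1048576 == 0X100000);
static_assert(1048576 == 0x10'0000);
static_assert(1048576 == 0'004'000'000);  // leading 0 makes this an octal-literal
```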

3

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented. An integer-literal is a prvalue.

Table 8: Types of integer-literals [tab:lex.icon.type]
integer-suffix	decimal-literal	integer-literal other than decimal-literal
none	`int`	`int`
	`long int`	`unsigned int`
	`long long int`	`long int`
		`unsigned long int`
		`long long int`
		`unsigned long long int`
`u` or `U`	`unsigned int`	`unsigned int`
	`unsigned long int`	`unsigned long int`
	`unsigned long long int`	`unsigned long long int`
`l` or `L`	`long int`	`long int`
	`long long int`	`unsigned long int`
		`long long int`
		`unsigned long long int`
Both `u` or `U`	`unsigned long int`	`unsigned long int`
and `l` or `L`	`unsigned long long int`	`unsigned long long int`
`ll` or `LL`	`long long int`	`long long int`
		`unsigned long long int`
Both `u` or `U`	`unsigned long long int`	`unsigned long long int`
and `ll` or `LL`

4 If an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type. If all of the types in the list for the integer-literal are signed, the extended integer type shall be signed. If all of the types in the list for the integer-literal are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned. A program is ill-formed if one of its translation units contains an integer-literal that cannot be represented by any of the allowed types.
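The type selection in Table 8 can be observed with `decltype`. This is a sketch, not an exhaustive check; the last two assertions are written so that they hold whatever the width of `int`, under the assumption that the implementation does not choose an extended integer type for these values.

```cpp
#include <limits>
#include <type_traits>

// The suffix selects the starting point in the list of candidate types.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1L), long int>);
static_assert(std::is_same_v<decltype(1UL), unsigned long int>);
static_assert(std::is_same_v<decltype(1LL), long long int>);
static_assert(std::is_same_v<decltype(1uLL), unsigned long long int>);

// A non-decimal literal without suffix may become unsigned: where int has
// 31 value bits, 0xFFFFFFFF does not fit in int but fits in unsigned int.
static_assert(std::numeric_limits<int>::digits != 31 ||
              std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);

// A decimal literal without suffix only ever takes a signed type from its list.
static_assert(std::is_signed_v<decltype(4294967295)>);
```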

5.13.3 Character literals [lex.ccon]

character-literal:

encoding-prefix_opt ' c-char-sequence '

encoding-prefix: one of

u8 u U L

c-char-sequence:

c-char

c-char-sequence c-char

c-char:

any member of the basic source character set except the single-quote ', backslash \, or new-line character

escape-sequence

universal-character-name

escape-sequence:

simple-escape-sequence

octal-escape-sequence

hexadecimal-escape-sequence

simple-escape-sequence: one of

\' \" \? \\

\a \b \f \n \r \t \v

octal-escape-sequence:

\ octal-digit

\ octal-digit octal-digit

\ octal-digit octal-digit octal-digit

hexadecimal-escape-sequence:

\x hexadecimal-digit

hexadecimal-escape-sequence hexadecimal-digit

1 A character-literal that does not begin with u8, u, U, or L is an ordinary character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.

2

A character-literal that begins with u8, such as u8'w', is a character-literal of type char8_t, known as a UTF-8 character literal. The value of a UTF-8 character literal is equal to its ISO/IEC 10646 code point value, provided that the code point value can be encoded as a single UTF-8 code unit.

Note: That is, provided the code point value is in the range [0,7F] (hexadecimal).

If the value is not representable with a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing multiple c-chars is ill-formed.

3

A character-literal that begins with the letter u, such as u'x', is a character-literal of type char16_t, known as a UTF-16 character literal. The value of a UTF-16 character literal is equal to its ISO/IEC 10646 code point value, provided that the code point value is representable with a single 16-bit code unit.

Note: That is, provided the code point value is in the range [0,FFFF] (hexadecimal).

If the value is not representable with a single 16-bit code unit, the program is ill-formed. A UTF-16 character literal containing multiple c-chars is ill-formed.

4 A character-literal that begins with the letter U, such as U'y', is a character-literal of type char32_t, known as a UTF-32 character literal. The value of a UTF-32 character literal containing a single c-char is equal to its ISO/IEC 10646 code point value. A UTF-32 character literal containing multiple c-chars is ill-formed.

5

A character-literal that begins with the letter L, such as L'z', is a wide-character literal. A wide-character literal has type wchar_t.^{[cpp20 12]} A wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined.

Note: The type wchar_t is able to represent all members of the execution wide-character set (see [basic.fundamental]).

The value of a wide-character literal containing multiple c-chars is implementation-defined.

6

Certain non-graphic characters, the single quote ', the double quote ", the question mark ?,^{[cpp20 13]} and the backslash \, can be represented according to Table 9. The double quote " and the question mark ? can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 9 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.

Table 9: Escape sequences [tab:lex.ccon.esc]
new-line	NL(LF)	`\n`
horizontal tab	HT	`\t`
vertical tab	VT	`\v`
backspace	BS	`\b`
carriage return	CR	`\r`
form feed	FF	`\f`
alert	BEL	`\a`
backslash	\	`\\`
question mark	?	`\?`
single quote	`'`	`\'`
double quote	`"`	`\"`
octal number	ooo	`\ooo`
hex number	hhh	`\xhhh`

7

The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character-literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character-literals with no prefix) or wchar_t (for character-literals prefixed by L).

Note: If the value of a character-literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed.
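The literal types and escape-sequence values described in this subclause can be checked at compile time. A minimal sketch; the `char8_t` assertion is guarded by the `__cpp_char8_t` feature-test macro because that type is only available from C++20.

```cpp
#include <type_traits>

// The encoding-prefix determines the type of the literal.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(u'a'), char16_t>);
static_assert(std::is_same_v<decltype(U'a'), char32_t>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);
#ifdef __cpp_char8_t
static_assert(std::is_same_v<decltype(u8'a'), char8_t>);
#endif

// Octal and hexadecimal escapes specify the value directly.
static_assert('\101' == 65);   // octal 101
static_assert('\x41' == 65);   // hexadecimal 41
static_assert('\0' == 0);

// UTF-16 and UTF-32 character literals have their code point as value.
static_assert(u'\u00E9' == 0x00E9);
static_assert(U'\U0001F600' == 0x1F600);

// The double quote and the question mark may be written either way.
static_assert('"' == '\"');
static_assert('?' == '\?');
```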

8

A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding.

Note: In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text. Therefore, all extended characters are described in terms of universal-character-names. However, the actual compiler implementation can use its own native character set, so long as the same results are obtained.

5.13.4 Floating-point literals [lex.fcon]

floating-point-literal:

decimal-floating-point-literal

hexadecimal-floating-point-literal

decimal-floating-point-literal:

fractional-constant exponent-part_opt floating-point-suffix_opt

digit-sequence exponent-part floating-point-suffix_opt

hexadecimal-floating-point-literal:

hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix_opt

hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix_opt

fractional-constant:

digit-sequence_opt . digit-sequence

digit-sequence .

hexadecimal-fractional-constant:

hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence

hexadecimal-digit-sequence .

exponent-part:

e sign_opt digit-sequence

E sign_opt digit-sequence

binary-exponent-part:

p sign_opt digit-sequence

P sign_opt digit-sequence

sign: one of

+ -

digit-sequence:

digit

digit-sequence '_opt digit

floating-point-suffix: one of

f l F L

1

The type of a floating-point-literal is determined by its floating-point-suffix as specified in Table 10.

Table 10: Types of floating-point-literals [tab:lex.fcon.type]
floating-point-suffix	type
none	`double`
`f` or `F`	`float`
`l` or `L`	`long double`

2

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal. In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.

Note: Any optional separating single quotes are ignored when determining the value.

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer. Otherwise, the exponent e is 0. The scaled value of the literal is s×10^e for a decimal-floating-point-literal and s×2^e for a hexadecimal-floating-point-literal.

Example: The floating-point-literals 49.625 and 0xC.68p+2 have the same value. The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
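The suffix rules and the scaled-value computation can be confirmed at compile time. As a worked step, 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and 12.40625 × 2^2 = 49.625. A minimal sketch:

```cpp
#include <type_traits>

// The floating-point-suffix selects the type.
static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// Decimal and hexadecimal spellings of the same scaled value.
static_assert(49.625 == 0xC.68p+2);
static_assert(0x1p-1 == 0.5);    // 1 x 2^-1
static_assert(1e2 == 100.0);     // 1 x 10^2

// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);
```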

3 If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

5.13.5 String literals [lex.string]

string-literal:

encoding-prefix_opt " s-char-sequence_opt "

encoding-prefix_opt R raw-string

s-char-sequence:

s-char

s-char-sequence s-char

s-char:

any member of the basic source character set except the double-quote ", backslash \, or new-line character

escape-sequence

universal-character-name

raw-string:

" d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

r-char-sequence:

r-char

r-char-sequence r-char

r-char:

any member of the source character set, except a right parenthesis ) followed by the initial d-char-sequence (which may be empty) followed by a double quote ".

d-char-sequence:

d-char

d-char-sequence d-char

d-char:

any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.

1

A string-literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.

2 Note: The characters '(' and ')' are permitted in a raw-string. Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

3

Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

4

Example: The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n". The raw string

R"(x = "\"y\"")"

is equivalent to "x = \"\\\"y\\\"\"".
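The raw-string equivalences given above can be verified with `std::string_view` comparisons at compile time. A sketch using illustrative variable names (`alternation`, `quoted`, `spliced`); the last literal assumes that the continuation line carries no leading whitespace.

```cpp
#include <string_view>

// Parentheses are permitted inside a raw-string when a delimiter is used.
constexpr std::string_view alternation = R"delimiter((a|b))delimiter";
static_assert(alternation == "(a|b)");

// Backslashes and double quotes are ordinary characters in a raw-string.
constexpr std::string_view quoted = R"(x = "\"y\"")";
static_assert(quoted == "x = \"\\\"y\\\"\"");

// A backslash followed by a source new-line is not spliced in a raw-string.
constexpr std::string_view spliced = R"(a\
b)";
static_assert(spliced == "a\\\nb");
```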

5 After translation phase 6, a string-literal that does not begin with an encoding-prefix is an ordinary string literal. An ordinary string literal has type “array of n const char” where n is the size of the string as defined below, has static storage duration ([basic.stc]), and is initialized with the given characters.

6 A string-literal that begins with u8, such as u8"asdf", is a UTF-8 string literal. A UTF-8 string literal has type “array of n const char8_t”, where n is the size of the string as defined below; each successive element of the object representation ([basic.types]) has the value of the corresponding code unit of the UTF-8 encoding of the string.

7 Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

8

A string-literal that begins with u, such as u"asdf", is a UTF-16 string literal. A UTF-16 string literal has type “array of n const char16_t”, where n is the size of the string as defined below; each successive element of the array has the value of the corresponding code unit of the UTF-16 encoding of the string.

Note: A single c-char may produce more than one char16_t character in the form of surrogate pairs. A surrogate pair is a representation for a single code point as a sequence of two 16-bit code units.

9 A string-literal that begins with U, such as U"asdf", is a UTF-32 string literal. A UTF-32 string literal has type “array of n const char32_t”, where n is the size of the string as defined below; each successive element of the array has the value of the corresponding code unit of the UTF-32 encoding of the string.

10

A string-literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it is initialized with the given characters.

11

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated. If both string-literals have the same encoding-prefix, the resulting concatenated string-literal has that encoding-prefix. If one string-literal has no encoding-prefix, it is treated as a string-literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior.

Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. Table 11 has some examples of valid concatenations.

Table 11: String literal concatenations [tab:lex.string.concat]
Source		Means	Source		Means	Source		Means
`u"a"`	`u"b"`	`u"ab"`	`U"a"`	`U"b"`	`U"ab"`	`L"a"`	`L"b"`	`L"ab"`
`u"a"`	`"b"`	`u"ab"`	`U"a"`	`"b"`	`U"ab"`	`L"a"`	`"b"`	`L"ab"`
`"a"`	`u"b"`	`u"ab"`	`"a"`	`U"b"`	`U"ab"`	`"a"`	`L"b"`	`L"ab"`

Characters in concatenated strings are kept distinct.

Example

"\xA" "B"

contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').

12 After any necessary concatenation, in translation phase 7 ([lex.phases]), '\0' is appended to every string-literal so that programs that scan a string can find its end.

13

Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character-literals ([lex.ccon]), except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \, and except that a universal-character-name in a UTF-16 string literal may yield a surrogate pair. In a narrow string literal, a universal-character-name may map to more than one char or char8_t element due to multibyte encoding. The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'. The size of a UTF-16 string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'.

Note: The size of a char16_t string literal is the number of code units, not the number of characters.

Note: Any universal-character-names are required to correspond to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal) ([lex.charset]).

The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
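The concatenation and size rules above can be observed through `sizeof`, which counts the array elements including the terminating null character. A minimal sketch; U+1F600 is chosen here only as an example of a code point outside the range representable by a single 16-bit code unit.

```cpp
// An ordinary string literal: three characters plus the terminating '\0'.
static_assert(sizeof("abc") == 4);

// Concatenated characters stay distinct: '\xA', 'B', '\0' (not '\xAB', '\0').
static_assert(sizeof("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == 0xA);
static_assert(("\xA" "B")[1] == 'B');

// An unprefixed literal takes on the encoding-prefix of its neighbour.
static_assert(sizeof(u"a" "b") == 3 * sizeof(char16_t));
static_assert(sizeof(L"a" "b") == 3 * sizeof(wchar_t));

// UTF-16: one code point needing a surrogate pair gives two code units, plus u'\0'.
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t));

// UTF-32: one code unit per code point, plus U'\0'.
static_assert(sizeof(U"\U0001F600") == 2 * sizeof(char32_t));

// UTF-8: U+00E9 is encoded as two code units, plus the terminating null.
static_assert(sizeof(u8"\u00E9") == 3);
```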

14

Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

Note: The effect of attempting to modify a string-literal is undefined.

5.13.6 Boolean literals [lex.bool]

	boolean-literal: `false` `true`
1	The Boolean literals are the keywords `false` and `true`. Such literals are prvalues and have type `bool`.

5.13.7 Pointer literals [lex.nullptr]

	pointer-literal: `nullptr`
1	The pointer literal is the keyword `nullptr`. It is a prvalue of type `std::nullptr_t`. Note: `std::nullptr_t` is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value. See [conv.ptr] and [conv.mem].

5.13.8 User-defined literals [lex.ext]


↑ Implementations behave as if these separate phases occur, although in practice different phases can be folded together.
↑ A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.
↑ An implementation need not convert all non-corresponding source characters to the same execution character.
↑ The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, and therefore implementations must document how the basic source characters are represented in source files.
↑ A sequence of characters resembling a universal-character-name in an r-char-sequence ([lex.string]) does not form a universal-character-name .
↑ These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as “digraphs”.
↑ Thus the “stringized” values ([cpp.stringize]) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.
↑ Literals include strings and character and numeric literals.
↑ Thus, a sequence of characters that resembles an escape sequence might result in an error, be interpreted as the character corresponding to the escape sequence, or have a completely different meaning, depending on the implementation.
↑ On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name can be used in forming valid external identifiers. For example, some otherwise unused character or sequence of characters can be used to encode the \u in a universal-character-name . Extended characters can produce a long external identifier, but C++ does not place a translation limit on significant characters for external identifiers. In C++, upper- and lower-case letters are considered different for all identifiers, including external identifiers.
↑ The term “literal” generally designates, in this document, those tokens that are called “constants” in ISO C.
↑ They are intended for character sets where a character does not fit into a single byte.
↑ Using an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.


Source: https://timsong-cpp.github.io/cppwp/n4950/lex


5.1 Separate translation [lex.separate]

1

The text of the program is kept in units called source files in this document. A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion preprocessing directives, is called a preprocessing translation unit .

Note: A C++ program need not all be translated at the same time.

2 Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate ([basic.link]) by (for example) calls to functions whose identifiers have external or module linkage, manipulation of objects whose identifiers have external or module linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program.

5.2 Phases of translation [lex.phases]

1

The precedence among the syntax rules of translation is specified by the following phases.^{[cpp23 1]}

1. An implementation shall support input files that are a sequence of UTF-8 code units (UTF-8 files). It may also support an implementation-defined set of other kinds of input files, and, if so, the kind of an input file is determined in an implementation-defined manner that includes a means of designating input files as UTF-8 files, independent of their content.
Note: In other words, recognizing the U+feff byte order mark is not sufficient.
If an input file is determined to be a UTF-8 file, then it shall be a well-formed UTF-8 code unit sequence and it is decoded to produce a sequence of Unicode scalar values. A sequence of translation character set elements is then formed by mapping each Unicode scalar value to the corresponding translation character set element. In the resulting sequence, each pair of characters in the input sequence consisting of U+000d carriage return followed by U+000a line feed, as well as each U+000d carriage return not immediately followed by a U+000a line feed, is replaced by a single new-line character. For any other kind of input file supported by the implementation, characters are mapped, in an implementation-defined manner, to a sequence of translation character set elements ([lex.charset]), representing end-of-line indicators as new-line characters.
2. If the first translation character is U+feff byte order mark, it is deleted. Each sequence of a backslash character (\) immediately followed by zero or more whitespace characters other than new-line followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a splice, shall be processed as if an additional new-line character were appended to the file.
3. The source file is decomposed into preprocessing tokens ([lex.pptoken]) and sequences of whitespace characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment.^{[cpp23 2]} Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of whitespace characters other than new-line is retained or replaced by one space character is unspecified. As characters from the source file are consumed to form the next preprocessing token (i.e., not being consumed as part of a comment or other forms of whitespace), except when matching a c-char-sequence, s-char-sequence, r-char-sequence, h-char-sequence, or q-char-sequence, universal-character-names are recognized and replaced by the designated element of the translation character set. The process of dividing a source file's characters into preprocessing tokens is context-dependent.
Example: See the handling of < within a #include preprocessing directive.
4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
5. For a sequence of two or more adjacent string-literal tokens, a common encoding-prefix is determined as specified in [lex.string]. Each such string-literal token is then considered to have that common encoding-prefix.
6. Adjacent string-literal tokens are concatenated ([lex.string]).
7. Whitespace characters separating tokens are no longer significant. Each preprocessing token is converted into a token ([lex.token]). The resulting tokens constitute a translation unit and are syntactically and semantically analyzed and translated.
Note: The process of analyzing and translating the tokens can occasionally result in one token being replaced by a sequence of other tokens ([temp.names]).
It is implementation-defined whether the sources for module units and header units on which the current translation unit has an interface dependency ([module.unit], [module.import]) are required to be available.
Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation.
8. Translated translation units and instantiation units are combined as follows:
Note: Some or all of these can be supplied from a library.
Each translated translation unit is examined to produce a list of required instantiations.
Note: This can include instantiations which have been explicitly requested ([temp.explicit]).
The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available.
Note: An implementation can choose to encode sufficient information into the translated translation unit so as to ensure the source is not required here.
All the required instantiations are performed to produce instantiation units.
Note: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions.
The program is ill-formed if any instantiation fails.
9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
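
As an informal illustration of the line splicing in phase 2 and the string concatenation in phase 6, the following sketch can be compiled as is. The names `same_text`, `GREETING`, and `greeting` are hypothetical helpers introduced only for this example; they are not part of the standard text.

```cpp
// Compares two null-terminated strings at compile time (helper for this sketch).
constexpr bool same_text(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// Phase 2: the backslash immediately followed by a new-line is deleted,
// so the macro definition below forms one logical source line.
#define GREETING "hello, " \
                 "world"

// Phase 6: the two adjacent string-literal tokens are concatenated.
constexpr const char* greeting = GREETING;

static_assert(same_text(greeting, "hello, world"),
              "splicing and concatenation yield a single string");
```

The check is done with `static_assert` so that the result is visible at translation time, which is where these phases take effect.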

5.3 Character sets [lex.charset]

1 The translation character set consists of the following elements:

(1.1)

each abstract character assigned a code point in the Unicode codespace as specified in the Unicode Standard, and

(1.2)

a distinct character for each Unicode scalar value not assigned to an abstract character.

Note: Unicode code points are integers in the range [0, 10FFFF] (hexadecimal). A surrogate code point is a value in the range [D800, DFFF] (hexadecimal). A Unicode scalar value is any code point that is not a surrogate code point.

2

The basic character set is a subset of the translation character set, consisting of 96 characters as specified in Table 1.

Note: Unicode short names are given only as a means to identifying the character; the numerical value has no other meaning in this context.

Table 1: Basic character set [tab:lex.charset.basic]
character		glyph
U+0009	character tabulation
U+000b	line tabulation
U+000c	form feed
U+0020	space
U+000a	line feed	new-line
U+0021	exclamation mark	`!`
U+0022	quotation mark	`"`
U+0023	number sign	`#`
U+0025	percent sign	`%`
U+0026	ampersand	`&`
U+0027	apostrophe	`'`
U+0028	left parenthesis	`(`
U+0029	right parenthesis	`)`
U+002a	asterisk	`*`
U+002b	plus sign	`+`
U+002c	comma	`,`
U+002d	hyphen-minus	`-`
U+002e	full stop	`.`
U+002f	solidus	`/`
U+0030 .. U+0039	digit zero .. nine	`0 1 2 3 4 5 6 7 8 9`
U+003a	colon	`:`
U+003b	semicolon	`;`
U+003c	less-than sign	`<`
U+003d	equals sign	`=`
U+003e	greater-than sign	`>`
U+003f	question mark	`?`
U+0041 .. U+005a	latin capital letter a .. z	`A B C D E F G H I J K L M`
		`N O P Q R S T U V W X Y Z`
U+005b	left square bracket	`[`
U+005c	reverse solidus	`\`
U+005d	right square bracket	`]`
U+005e	circumflex accent	`^`
U+005f	low line	`_`
U+0061 .. U+007a	latin small letter a .. z	`a b c d e f g h i j k l m`
		`n o p q r s t u v w x y z`
U+007b	left curly bracket	`{`
U+007c	vertical line	`|`
U+007d	right curly bracket	`}`
U+007e	tilde	`~`

3

The universal-character-name construct provides a way to name other characters.

Syntax (BNF)

n-char:

any member of the translation character set except the U+007d right curly bracket or new-line character

n-char-sequence:

n-char

n-char-sequence n-char

named-universal-character:

\N{ n-char-sequence }

hex-quad:

hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

simple-hexadecimal-digit-sequence:

hexadecimal-digit

simple-hexadecimal-digit-sequence hexadecimal-digit

universal-character-name:

\u hex-quad

\U hex-quad hex-quad

\u{ simple-hexadecimal-digit-sequence }

named-universal-character

4 A universal-character-name of the form \u hex-quad, \U hex-quad hex-quad, or \u{simple-hexadecimal-digit-sequence} designates the character in the translation character set whose Unicode scalar value is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name . The program is ill-formed if that number is not a Unicode scalar value.
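
The hex-quad forms can be checked with UTF-32 character literals, whose value is the Unicode scalar value itself. This is a small sketch; `e_acute` is a name chosen for the example, and the delimited and named forms are mentioned only in a comment because they require a C++23 compiler.

```cpp
// The lowercase form takes exactly four hexadecimal digits,
// the uppercase form exactly eight.
static_assert(U'\u00e9' == 0xE9, "U+00E9 named with one hex-quad");
static_assert(U'\U000000e9' == U'\u00e9', "both spellings designate the same character");
static_assert(U'\U0001F525' == 0x1F525, "scalar values above U+FFFF need eight digits");

// In C++23 the delimited form (braces around the digits) and the
// named form \N{LATIN SMALL LETTER E WITH ACUTE} designate the same character.
constexpr char32_t e_acute = U'\u00e9';
```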

5

A universal-character-name that is a named-universal-character designates the corresponding character in the Unicode Standard (chapter 4.8 Name) if the n-char-sequence is equal to its character name or to one of its character name aliases of type “control”, “correction”, or “alternate”; otherwise, the program is ill-formed.

Note: These aliases are listed in the Unicode Character Database's NameAliases.txt. None of these names or aliases have leading or trailing spaces.

6

If a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character-literal or string-literal (in either case, including within a user-defined-literal) corresponds to a control character or to a character in the basic character set, the program is ill-formed.

Note: A sequence of characters resembling a universal-character-name in an r-char-sequence ([lex.string]) does not form a universal-character-name.

7

The basic literal character set consists of all characters of the basic character set, plus the control characters specified in Table 2 .

Table 2: Additional control characters in the basic literal character set [tab:lex.charset.literal]
character
U+0000	null
U+0007	alert
U+0008	backspace
U+000d	carriage return

8 A code unit is an integer value of character type ([basic.fundamental]). Characters in a character-literal other than a multicharacter or non-encodable character literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix ([lex.ccon], [lex.string]); this is termed the respective literal encoding . The ordinary literal encoding is the encoding applied to an ordinary character or string literal. The wide literal encoding is the encoding applied to a wide character or string literal.

9

A literal encoding or a locale-specific encoding of one of the execution character sets ([character.seq]) encodes each element of the basic literal character set as a single code unit with non-negative value, distinct from the code unit for any other such element.

Note: A character not in the basic literal character set can be encoded with more than one code unit; the value of such a code unit can be the same as that of a code unit for an element of the basic literal character set.

The U+0000 null character is encoded as the value 0. No other element of the translation character set is encoded with a code unit of value 0. The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous. The ordinary and wide literal encodings are otherwise implementation-defined. For a UTF-8, UTF-16, or UTF-32 literal, the implementation shall encode the Unicode scalar value corresponding to each character of the translation character set as specified in the Unicode Standard for the respective Unicode encoding form.
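
A few of these guarantees can be observed directly. The sketch below relies only on the rules stated above (consecutive digit values, the null character encoded as 0, and the Unicode encoding forms for UTF-8, UTF-16, and UTF-32 literals); `digit_value` is a hypothetical helper.

```cpp
// Decimal digits have consecutive code unit values in every literal encoding.
static_assert('7' - '0' == 7, "ordinary literal encoding");
static_assert(L'9' - L'0' == 9, "wide literal encoding");

// The null character is encoded as the value 0.
static_assert('\0' == 0, "null character");

// sizeof counts code units, including the terminating null code unit.
static_assert(sizeof(u8"\u00e9") == 3, "U+00E9 takes two UTF-8 code units");
static_assert(sizeof(u"\U0001F525") == 3 * sizeof(char16_t),
              "U+1F525 takes two UTF-16 code units (a surrogate pair)");
static_assert(sizeof(U"\U0001F525") == 2 * sizeof(char32_t),
              "U+1F525 takes one UTF-32 code unit");

// Converts a decimal digit character to its numeric value.
constexpr int digit_value(char c) { return c - '0'; }
```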

5.4 Preprocessing tokens [lex.pptoken]

preprocessing-token:

header-name

import-keyword

module-keyword

export-keyword

identifier

pp-number

character-literal

user-defined-character-literal

string-literal

user-defined-string-literal

preprocessing-op-or-punc

each non-whitespace character that cannot be one of the above

1	Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.
2	A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. In this document, glyphs are used to identify elements of the basic character set ([lex.charset]). The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing `import` and `module` directives (import-keyword, module-keyword, and export-keyword), identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-whitespace characters that do not lexically match the other preprocessing token categories. If a U+0027 apostrophe or a U+0022 quotation mark character matches the last category, the behavior is undefined. If any character not in the basic character set matches the last category, the program is ill-formed. Preprocessing tokens can be separated by whitespace; this consists of comments ([lex.comment]), or whitespace characters (U+0020 space, U+0009 character tabulation, new-line, U+000b line tabulation, and U+000c form feed), or both. As described in [cpp], in certain circumstances during translation phase 4, whitespace (or the absence thereof) serves as more than preprocessing token separation. Whitespace can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.
3	If the input stream has been parsed into preprocessing tokens up to a given character:
(3.1)	If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as `R"`, the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phase 2 (line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern encoding-prefix_opt R raw-string
(3.2)	Otherwise, if the next three characters are `<::` and the subsequent character is neither `:` nor `>`, the `<` is treated as a preprocessing token by itself and not as the first character of the alternative token `<:`.
(3.3)	Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name ([lex.header]) is only formed
(3.3.1)	after the `include` or `import` preprocessing token in an `#include` ([cpp.include]) or `import` ([cpp.import]) directive, or
(3.3.2)	within a has-include-expression.
Example:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
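
The reversion of line splicing inside a raw string literal, described in (3.1), can be seen by comparing array sizes. This is a sketch with hypothetical names `spliced` and `raw`; the continuation lines must start in the first column for the sizes to match.

```cpp
// Ordinary string literal: phase 2 deletes the backslash and the new-line,
// so the array holds 'a', 'b', and the terminating null.
constexpr char spliced[] = "a\
b";

// Raw string literal: the splice is reverted, so the backslash and the
// new-line both remain part of the string.
constexpr char raw[] = R"(a\
b)";

static_assert(sizeof(spliced) == 3, "splice removed two characters");
static_assert(sizeof(raw) == 5, "raw string keeps backslash and new-line");
static_assert(raw[1] == '\\' && raw[2] == '\n', "characters preserved in order");
```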
4	The import-keyword is produced by processing an `import` directive ([cpp.import]), the module-keyword is produced by preprocessing a `module` directive ([cpp.module]), and the export-keyword is produced by preprocessing either of the previous two directives. Note: None has any observable spelling.
5	Example: The program fragment `0xe+foo` is parsed as a preprocessing number token (one that is not a valid integer-literal or floating-point-literal token), even though a parse as three preprocessing tokens `0xe`, `+`, and `foo` can produce a valid expression (for example, if `foo` is a macro defined as `1`). Similarly, the program fragment `1E1` is parsed as a preprocessing number (one that is a valid floating-point-literal token), whether or not `E` is a macro name.
6	Example: The program fragment `x+++++y` is parsed as `x ++ ++ + y`, which, if `x` and `y` have integral types, violates a constraint on increment operators, even though the parse `x ++ + ++ y` can yield a correct expression.
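
A well-formed relative of the `x+++++y` example shows the same longest-match rule at work. The function names below are hypothetical, and the sketch assumes a compiler in C++14 mode or later, where `constexpr` functions may modify local variables.

```cpp
// "a+++b" is tokenized as a ++ + b, so it means (a++) + b.
constexpr int munch_sum(int a, int b) {
    return a+++b;
}

// Returns the value of a after the same expression has been evaluated:
// the post-increment applied to a, not to b.
constexpr int left_after(int a, int b) {
    static_cast<void>(a+++b);
    return a;
}

// Whitespace changes the tokens: a + ++b increments b before adding.
constexpr int spaced_sum(int a, int b) {
    return a+ ++b;
}
```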

5.5 Alternative tokens [lex.digraph]

1 Alternative token representations are provided for some operators and punctuators.^{[cpp23 3]}

2

In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.^{[cpp23 4]} The set of alternative tokens is defined in Table 3 .

Table 3: Alternative tokens [tab:lex.digraph]
Alternative	Primary	Alternative	Primary	Alternative	Primary
`<%`	`{`	`and`	`&&`	`and_eq`	`&=`
`%>`	`}`	`bitor`	`|`	`or_eq`	`|=`
`<:`	`[`	`or`	`||`	`xor_eq`	`^=`
`:>`	`]`	`xor`	`^`	`not`	`!`
`%:`	`#`	`compl`	`~`	`not_eq`	`!=`
`%:%:`	`##`	`bitand`	`&`
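
Since an alternative token behaves exactly like its primary token, the two spellings can be mixed freely. The following sketch uses hypothetical names (`values`, `third`, `both`, `toggle_low_bit`, `differ`) and assumes a compiler that accepts the alternative tokens without extra flags.

```cpp
// <: :> stand for [ ], and <% %> stand for { }.
constexpr int values<:3:> = <%10, 20, 30%>;

// Same as: constexpr int third(const int (&a)[3]) { return a[2]; }
constexpr int third(const int (&a)<:3:>) <% return a<:2:>; %>

// and, xor, and not_eq stand for &&, ^, and !=.
constexpr bool both(bool p, bool q) { return p and q; }
constexpr unsigned toggle_low_bit(unsigned x) { return x xor 1u; }
constexpr bool differ(int x, int y) { return x not_eq y; }
```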

5.6 Tokens [lex.token]


1

There are five kinds of tokens: identifiers, keywords, literals,^{[cpp23 5]} operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “whitespace”), as described below, are ignored except as they serve to separate tokens.

Note: Some whitespace is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters.

5.7 Comments [lex.comment]

1

The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates immediately before the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only whitespace characters shall appear between it and the new-line that terminates the comment; no diagnostic is required.

Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.
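
The rules above can be exercised in a few lines. In this sketch, `after_comments` is a hypothetical function whose return value shows which statements survive once comments have been replaced by spaces.

```cpp
constexpr int after_comments() {
    int n = 0;
    /* A block comment ends at the first closing pair: */ n += 1;
    // Inside a line comment, /* and */ are ordinary characters.
    n += 2; /* Inside a block comment, // is ordinary too. */ n += 4;
    return n;
}

static_assert(after_comments() == 7, "all three statements are executed");
```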

5.8 Header names [lex.header]

header-name:

< h-char-sequence >

" q-char-sequence "

h-char-sequence:

h-char

h-char-sequence h-char

h-char:

any member of the translation character set except new-line and U+003e greater-than sign

q-char-sequence:

q-char

q-char-sequence q-char

q-char:

any member of the translation character set except new-line and U+0022 quotation mark

1	Note: Header name preprocessing tokens only appear within a `#include` preprocessing directive, a `__has_include` preprocessing expression, or after certain occurrences of an `import` token (see [lex.pptoken]). The sequences in both forms of header-names are mapped in an implementation-defined manner to headers or to external source file names as specified in [cpp.include].
2	The appearance of either of the characters `'` or `\` or of either of the character sequences `/*` or `//` in a q-char-sequence or an h-char-sequence is conditionally-supported with implementation-defined semantics, as is the appearance of the character `"` in an h-char-sequence .^{[cpp23 6]}
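
Both contexts named in the note, `#include` and `__has_include`, can be shown together. This is a sketch under the assumption that the implementation provides `<cstddef>`; the macro `HAVE_CSTDDEF` and the variable `have_cstddef` are hypothetical names.

```cpp
// <cstddef> is lexed as a header-name in both places below;
// elsewhere the same characters would be <, cstddef, and >.
#ifdef __has_include
#  if __has_include(<cstddef>)
#    include <cstddef>
#    define HAVE_CSTDDEF 1
#  endif
#endif

#ifndef HAVE_CSTDDEF
#  define HAVE_CSTDDEF 0
#endif

constexpr bool have_cstddef = HAVE_CSTDDEF != 0;
```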

5.9 Preprocessing numbers [lex.ppnumber]

pp-number:

digit

. digit

pp-number identifier-continue

pp-number ' digit

pp-number ' nondigit

pp-number e sign

pp-number E sign

pp-number p sign

pp-number P sign

pp-number .

1	Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]) and all floating-point-literal tokens ([lex.fcon]).
2	A preprocessing number does not have a type or a value; it acquires both after a successful conversion to an integer-literal token or a floating-point-literal token.
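
The effect of the pp-number grammar on macro expansion and on sign characters can be checked as follows. The macro `E` is defined here only to show that it is not expanded inside a preprocessing number; `ten` is a hypothetical name.

```cpp
#define E 100  // hypothetical macro, used only to show it is not expanded below

// 1E1 is a single pp-number, so E is never seen as a separate identifier.
static_assert(1E1 == 10.0, "1E1 is the floating-point-literal ten");

// Whitespace ends the pp-number 0xe before the plus sign: 14 + 1.
static_assert(0xe +1 == 15, "three tokens: 0xe, +, 1");

// Written without the space, the same characters would form one pp-number
// that is not a valid integer-literal or floating-point-literal.
constexpr double ten = 1E1;

#undef E
```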

5.10 Identifiers [lex.name]

identifier:

identifier-start

identifier identifier-continue

identifier-start:

nondigit

an element of the translation character set with the Unicode property XID_Start

identifier-continue:

digit

nondigit

an element of the translation character set with the Unicode property XID_Continue

nondigit: one of

a b c d e f g h i j k l m

n o p q r s t u v w x y z

A B C D E F G H I J K L M

N O P Q R S T U V W X Y Z _

digit: one of

0 1 2 3 4 5 6 7 8 9

1

Note: The character properties XID_Start and XID_Continue are Derived Core Properties as described by UAX #44 of the Unicode Standard.^{[cpp23 7]}

The program is ill-formed if an identifier does not conform to Normalization Form C as specified in the Unicode Standard. Note: Identifiers are case-sensitive.

Note: [uaxid] compares the requirements of UAX #31 of the Unicode Standard with the C++ rules for identifiers.

Note: In translation phase 4, identifier also includes those preprocessing-tokens ([lex.pptoken]) differentiated as keywords ([lex.key]) in the later translation phase 7 ([lex.token]).

2

The identifiers in Table 4 have a special meaning when appearing in a certain context. When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production. Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier .

Table 4: Identifiers with special meaning [tab:lex.name.special]
`final`	`import`	`module`	`override`
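
Because these identifiers are special only in certain contexts, they remain usable as ordinary names elsewhere. The sketch below uses hypothetical types `Shape` and `Triangle` and a hypothetical function `ordinary_names`.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

// Here final and override have their special meaning.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Here the same identifiers are interpreted as regular variable names.
constexpr int ordinary_names() {
    int final = 3;
    int override = 4;
    return final + override;
}
```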

3 In addition, some identifiers appearing as a token or preprocessing-token are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.

(3.1)

Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.

(3.2)

Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

5.11 Keywords [lex.key]

keyword:

any identifier listed in Table 5

import-keyword

module-keyword

export-keyword

1

The identifiers shown in Table 5 are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token ([dcl.attr.grammar]).

Note: The register keyword is unused but is reserved for future use.

Table 5: Keywords [tab:lex.key]
`alignas`	`constinit`	`false`	`public`	`true`
`alignof`	`const_cast`	`float`	`register`	`try`
`asm`	`continue`	`for`	`reinterpret_cast`	`typedef`
`auto`	`co_await`	`friend`	`requires`	`typeid`
`bool`	`co_return`	`goto`	`return`	`typename`
`break`	`co_yield`	`if`	`short`	`union`
`case`	`decltype`	`inline`	`signed`	`unsigned`
`catch`	`default`	`int`	`sizeof`	`using`
`char`	`delete`	`long`	`static`	`virtual`
`char8_t`	`do`	`mutable`	`static_assert`	`void`
`char16_t`	`double`	`namespace`	`static_cast`	`volatile`
`char32_t`	`dynamic_cast`	`new`	`struct`	`wchar_t`
`class`	`else`	`noexcept`	`switch`	`while`
`concept`	`enum`	`nullptr`	`template`
`const`	`explicit`	`operator`	`this`
`consteval`	`export`	`private`	`thread_local`
`constexpr`	`extern`	`protected`	`throw`

2

Furthermore, the alternative representations shown in Table 6 for certain operators and punctuators ([lex.digraph]) are reserved and shall not be used otherwise.

Table 6: Alternative representations [tab:lex.key.digraph]
`and`	`and_eq`	`bitand`	`bitor`	`compl`	`not`
`not_eq`	`or`	`or_eq`	`xor`	`xor_eq`

5.12 Operators and punctuators [lex.operators]

1

The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:

preprocessing-op-or-punc:

preprocessing-operator

operator-or-punctuator

preprocessing-operator: one of

# ## %: %:%:

operator-or-punctuator: one of

{ } [ ] ( )

<: :> <% %> ; : ...

? :: . .* -> ->* ~

! + - * / % ^ & |

= += -= *= /= %= ^= &= |=

== != < > <= >= <=> && ||

<< >> <<= >>= ++ -- ,

and or xor not bitand bitor compl

and_eq or_eq xor_eq not_eq

Each operator-or-punctuator is converted to a single token in translation phase 7.

5.13.1 Kinds of literals [lex.literal.kinds]

1	There are several kinds of literals.^{[cpp23 8]}

literal:

integer-literal

character-literal

floating-point-literal

string-literal

boolean-literal

pointer-literal

user-defined-literal

Note: When appearing as an expression, a literal has a type and a value category ([expr.prim.literal]).

5.13.2 Integer literals [lex.icon]

integer-literal:

binary-literal integer-suffix_opt

octal-literal integer-suffix_opt

decimal-literal integer-suffix_opt

hexadecimal-literal integer-suffix_opt

binary-literal:

0b binary-digit

0B binary-digit

binary-literal '_opt binary-digit

octal-literal:

0

octal-literal '_opt octal-digit

decimal-literal:

nonzero-digit

decimal-literal '_opt digit

hexadecimal-literal:

hexadecimal-prefix hexadecimal-digit-sequence

binary-digit: one of

0 1

octal-digit: one of

0 1 2 3 4 5 6 7

nonzero-digit: one of

1 2 3 4 5 6 7 8 9

hexadecimal-prefix: one of

0x 0X

hexadecimal-digit-sequence:

hexadecimal-digit

hexadecimal-digit-sequence '_opt hexadecimal-digit

hexadecimal-digit: one of

0 1 2 3 4 5 6 7 8 9

a b c d e f

A B C D E F

integer-suffix:

unsigned-suffix long-suffix_opt

unsigned-suffix long-long-suffix_opt

unsigned-suffix size-suffix_opt

long-suffix unsigned-suffix_opt

long-long-suffix unsigned-suffix_opt

size-suffix unsigned-suffix_opt

unsigned-suffix: one of

u U

long-suffix: one of

l L

long-long-suffix: one of

ll LL

size-suffix: one of

z Z

1

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.

Note: The prefix and any optional separating single quotes are ignored when determining the value.

Table 7: Base of integer-literals [tab:lex.icon.base]
Kind of integer-literal	base N
binary-literal	2
octal-literal	8
decimal-literal	10
hexadecimal-literal	16

2

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

Example: The number twelve can be written 12, 014, 0XC, or 0b1100. The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
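
The equalities in this example can be verified by the compiler. The sketch assumes C++14 or later for binary literals and digit separators; `twelve` and `two_to_the_20` are hypothetical names.

```cpp
// The same value written in decimal, octal, hexadecimal, and binary.
static_assert(12 == 014, "octal");
static_assert(12 == 0XC, "hexadecimal");
static_assert(12 == 0b1100, "binary");

// Digit separators are ignored when determining the value.
static_assert(1048576 == 1'048'576, "decimal with separators");
static_assert(1048576 == 0X100000, "hexadecimal");
static_assert(1048576 == 0x10'0000, "hexadecimal with separator");
static_assert(1048576 == 0'004'000'000, "octal with separators");

constexpr int twelve = 0b1100;
constexpr long two_to_the_20 = 0x10'0000;
```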

3

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.

Table 8: Types of integer-literals [tab:lex.icon.type]
integer-suffix	decimal-literal	integer-literal other than decimal-literal
none	`int`	`int`
	`long int`	`unsigned int`
	`long long int`	`long int`
		`unsigned long int`
		`long long int`
		`unsigned long long int`
`u` or `U`	`unsigned int`	`unsigned int`
	`unsigned long int`	`unsigned long int`
	`unsigned long long int`	`unsigned long long int`
`l` or `L`	`long int`	`long int`
	`long long int`	`unsigned long int`
		`long long int`
		`unsigned long long int`
Both `u` or `U`	`unsigned long int`	`unsigned long int`
and `l` or `L`	`unsigned long long int`	`unsigned long long int`
`ll` or `LL`	`long long int`	`long long int`
		`unsigned long long int`
Both `u` or `U`	`unsigned long long int`	`unsigned long long int`
and `ll` or `LL`
`z` or `Z`	the signed integer type corresponding	the signed integer type
	to `std::size_t` ([support.types.layout])	corresponding to `std::size_t`
		`std::size_t`
Both `u` or `U`	`std::size_t`	`std::size_t`
and `z` or `Z`

4 If an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type. If all of the types in the list for the integer-literal are signed, the extended integer type shall be signed. If all of the types in the list for the integer-literal are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned. A program is ill-formed if one of its translation units contains an integer-literal that cannot be represented by any of the allowed types.
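
The selection of a type from Table 8 can be inspected with `decltype`. This is a sketch: the last check assumes a platform where `int` occupies four 8-bit bytes and is skipped otherwise, and the `z` suffix is omitted because it requires a C++23 compiler. The alias `plain_t` is a hypothetical name.

```cpp
#include <type_traits>

// The suffix selects the row of the table; the value selects the first fitting type.
static_assert(std::is_same<decltype(1), int>::value, "no suffix");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u suffix");
static_assert(std::is_same<decltype(1L), long>::value, "L suffix");
static_assert(std::is_same<decltype(1uLL), unsigned long long>::value, "u and LL");

// An unsuffixed decimal-literal only ever gets a signed type.
static_assert(std::is_signed<decltype(4294967295)>::value,
              "decimal literals skip the unsigned types");

// An unsuffixed hexadecimal-literal may get unsigned int (assuming 32-bit int).
static_assert(sizeof(int) != 4 ||
              std::is_same<decltype(0xFFFFFFFF), unsigned int>::value,
              "hexadecimal literals can select unsigned int");

using plain_t = decltype(1);
```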

5.13.3 Character literals [lex.ccon]

character-literal:

encoding-prefix_opt ' c-char-sequence '

encoding-prefix: one of

u8 u U L

c-char-sequence:

c-char

c-char-sequence c-char

c-char:

basic-c-char

escape-sequence

universal-character-name

basic-c-char:

any member of the translation character set except the U+0027 apostrophe, U+005c reverse solidus, or new-line character

escape-sequence:

simple-escape-sequence

numeric-escape-sequence

conditional-escape-sequence

simple-escape-sequence:

\ simple-escape-sequence-char

simple-escape-sequence-char: one of

' " ? \ a b f n r t v

numeric-escape-sequence:

octal-escape-sequence

hexadecimal-escape-sequence

simple-octal-digit-sequence:

octal-digit

simple-octal-digit-sequence octal-digit

octal-escape-sequence:

\ octal-digit

\ octal-digit octal-digit

\ octal-digit octal-digit octal-digit

\o{ simple-octal-digit-sequence }

hexadecimal-escape-sequence:

\x simple-hexadecimal-digit-sequence

\x{ simple-hexadecimal-digit-sequence }

conditional-escape-sequence:

\ conditional-escape-sequence-char

conditional-escape-sequence-char:

any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x

1 A non-encodable character literal is a character-literal whose c-char-sequence consists of a single c-char that is not a numeric-escape-sequence and that specifies a character that either lacks representation in the literal's associated character encoding or that cannot be encoded as a single code unit. A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char . The encoding-prefix of a non-encodable character literal or a multicharacter literal shall be absent. Such character-literals are conditionally-supported.

2

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9. The special cases for non-encodable character literals and multicharacter literals take precedence over the base kind.

Note: The associated character encoding for ordinary character literals determines encodability, but does not determine the value of non-encodable ordinary character literals or ordinary multicharacter literals. The examples in Table 9 for non-encodable ordinary character literals assume that the specified character lacks representation in the ordinary literal encoding or that encoding the character would require more than one code unit.

Table 9: Character literals [tab:lex.ccon.literal]
Encoding prefix	Kind	Type	Associated character encoding	Example
none	ordinary character literal	`char`	ordinary literal encoding	`'v'`
	non-encodable ordinary character literal	`int`	ordinary literal encoding	`'\U0001F525'`
	ordinary multicharacter literal	`int`	ordinary literal encoding	`'abcd'`
`L`	wide character literal	`wchar_t`	wide literal encoding	`L'w'`
`u8`	UTF-8 character literal	`char8_t`	UTF-8	`u8'x'`
`u`	UTF-16 character literal	`char16_t`	UTF-16	`u'y'`
`U`	UTF-32 character literal	`char32_t`	UTF-32	`U'z'`
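
The types in Table 9 can be confirmed with `decltype`. This sketch leaves out the `u8` row, whose type is `char8_t` only from C++20 onward, and the conditionally-supported multicharacter and non-encodable rows, which many compilers diagnose with a warning.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('v'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide character literal");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16 character literal");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32 character literal");

// Unlike in C, an ordinary character literal has type char, not int.
static_assert(sizeof('v') == 1, "sizeof(char) is 1 by definition");
```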

3 In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7. A non-encodable character literal or a multicharacter literal has an implementation-defined value. The value of any other kind of character-literal is determined as follows:

(3.1)

A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. NoteIf the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the literal is a non-encodable character literal.

(3.2)

A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:

(3.2.1)

Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence .

(3.2.2)

If v does not exceed the range of representable values of the character-literal's type, then the value is v.

(3.2.3)

Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.

(3.2.4)

Otherwise, the character-literal is ill-formed.
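
The rules in (3.2.2) to (3.2.4) can be modelled as a small function. This is an illustrative sketch only, not how any compiler implements them: the name `escape_value` and its parameters are hypothetical, the literal's type is described by its width N (assumed to be between 1 and 32) and its signedness, and `may_wrap` stands for "the encoding-prefix is absent or L". An empty result models an ill-formed literal.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of (3.2.2)-(3.2.4) for a literal type T of the given width.
// v is the value denoted by the octal or hexadecimal digits.
std::optional<std::int64_t> escape_value(std::uint64_t v, int width,
                                         bool is_signed, bool may_wrap) {
    const std::uint64_t modulus = std::uint64_t{1} << width;   // 2^N, assuming width <= 32
    const std::uint64_t max_value = is_signed ? modulus / 2 - 1 : modulus - 1;
    if (v <= max_value)                                        // (3.2.2): v fits in T
        return static_cast<std::int64_t>(v);
    if (may_wrap && v < modulus)                               // (3.2.3): fits the unsigned type
        return static_cast<std::int64_t>(v) - static_cast<std::int64_t>(modulus);
    return std::nullopt;                                       // (3.2.4): ill-formed
}

// Values that fit in the literal's type are taken as written, whatever the encoding.
static_assert('\x41' == 65);
static_assert('\101' == 65);
static_assert(L'\x41' == 65);
```

Under this model, `'\xff'` with an 8-bit signed `char` yields -1, while the same digits with an 8-bit unsigned type yield 255.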

(3.3)

A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.

4

The character specified by a simple-escape-sequence is specified in Table 10 .

NoteUsing an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.

Table 10: Simple escape sequences [tab:lex.ccon.esc]
character		simple-escape-sequence
U+000a	line feed	`\n`
U+0009	character tabulation	`\t`
U+000b	line tabulation	`\v`
U+0008	backspace	`\b`
U+000d	carriage return	`\r`
U+000c	form feed	`\f`
U+0007	alert	`\a`
U+005c	reverse solidus	`\\`
U+003f	question mark	`\?`
U+0027	apostrophe	`\'`
U+0022	quotation mark	`\"`
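
The mapping in Table 10 can be verified with UTF-32 character literals, whose values are code points and therefore do not depend on the ordinary literal encoding. A minimal sketch, assuming C++17; the names `SimpleEscape`, `simple_escapes`, and `simple_escapes_match_table` are hypothetical.

```cpp
// Each entry pairs a UTF-32 literal written with a simple-escape-sequence
// with the code point listed for it in Table 10.
struct SimpleEscape {
    char32_t value;
    char32_t code_point;
};

constexpr SimpleEscape simple_escapes[] = {
    {U'\n', 0x000a}, {U'\t', 0x0009}, {U'\v', 0x000b}, {U'\b', 0x0008},
    {U'\r', 0x000d}, {U'\f', 0x000c}, {U'\a', 0x0007}, {U'\\', 0x005c},
    {U'\?', 0x003f}, {U'\'', 0x0027}, {U'\"', 0x0022},
};

// Returns true when every escape sequence denotes the listed code point.
constexpr bool simple_escapes_match_table() {
    for (const SimpleEscape& e : simple_escapes) {
        if (e.value != e.code_point) return false;
    }
    return true;
}

static_assert(simple_escapes_match_table());
```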

5.13.4 Floating-point literals [lex.fcon]

floating-point-literal:

decimal-floating-point-literal

hexadecimal-floating-point-literal

decimal-floating-point-literal:

fractional-constant exponent-part_opt floating-point-suffix_opt

digit-sequence exponent-part floating-point-suffix_opt

hexadecimal-floating-point-literal:

hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix_opt

hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix_opt

fractional-constant:

digit-sequence_opt . digit-sequence

digit-sequence .

hexadecimal-fractional-constant:

hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence

hexadecimal-digit-sequence .

exponent-part:

e sign_opt digit-sequence

E sign_opt digit-sequence

binary-exponent-part:

p sign_opt digit-sequence

P sign_opt digit-sequence

sign: one of

+ -

digit-sequence:

digit

digit-sequence '_opt digit

floating-point-suffix: one of

f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16

1

The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11 .

NoteThe floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported. See [basic.extended.fp].

Table 11: Types of floating-point-literals [tab:lex.fcon.type]
floating-point-suffix	type
none	`double`
`f` or `F`	`float`
`l` or `L`	`long double`
`f16` or `F16`	`std::float16_t`
`f32` or `F32`	`std::float32_t`
`f64` or `F64`	`std::float64_t`
`f128` or `F128`	`std::float128_t`
`bf16` or `BF16`	`std::bfloat16_t`
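
The first three rows of Table 11 can be checked directly with `decltype`. The sketch below assumes C++17 for the unconditional checks. The `f32` check is an assumption-laden extra: it is compiled only when the implementation defines `__STDCPP_FLOAT32_T__` and provides `<stdfloat>`, and `HAVE_STD_FLOAT32` is a hypothetical macro introduced here for illustration.

```cpp
#include <type_traits>

// Enable the extended check only where std::float32_t is expected to exist.
#if defined(__STDCPP_FLOAT32_T__) && defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#    define HAVE_STD_FLOAT32 1   // hypothetical macro, local to this example
#  endif
#endif

static_assert(std::is_same_v<decltype(1.0), double>);        // no suffix
static_assert(std::is_same_v<decltype(1.0f), float>);        // f
static_assert(std::is_same_v<decltype(1.0F), float>);        // F
static_assert(std::is_same_v<decltype(1.0l), long double>);  // l
static_assert(std::is_same_v<decltype(1.0L), long double>);  // L

#ifdef HAVE_STD_FLOAT32
static_assert(std::is_same_v<decltype(1.0f32), std::float32_t>);  // conditionally-supported
#endif
```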

2

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal . In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal .

NoteAny optional separating single quotes are ignored when determining the value.

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer. Otherwise, the exponent e is 0. The scaled value of the literal is s×10^e for a decimal-floating-point-literal and s×2^e for a hexadecimal-floating-point-literal .

Example The floating-point-literals `49.625` and `0xC.68p+2` have the same value. The floating-point-literals `1.602'176'565e-19` and `1.602176565e-19` have the same value.
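
The scaled-value computation can be sketched as follows, assuming C++17 (hexadecimal floating-point literals and digit separators). The function `scaled_value` is a hypothetical illustration that works on `double`, so it is exact only for small inputs like the ones shown. Its `base` parameter is the base of the exponent (10 or 2), not the base N of the significand. For `0xC.68p+2`, the significand is 12 + 6/16 + 8/256 = 12.40625 and the exponent is 2, giving 12.40625 × 2^2 = 49.625.

```cpp
// Hypothetical helper: computes s * base^e, where base is 10 for a
// decimal-floating-point-literal and 2 for a hexadecimal-floating-point-literal.
constexpr double scaled_value(double s, int e, int base) {
    double factor = 1.0;
    const int magnitude = e < 0 ? -e : e;
    for (int i = 0; i < magnitude; ++i) factor *= base;
    return e < 0 ? s / factor : s * factor;
}

// The two pairs from the example above.
static_assert(49.625 == 0xC.68p+2);
static_assert(1.602'176'565e-19 == 1.602176565e-19);

// The hexadecimal significand 0xC.68 is 12.40625, scaled by 2^2.
static_assert(scaled_value(12.40625, 2, 2) == 0xC.68p+2);
```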

3 If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

5.13.5 String literals [lex.string]

string-literal:

encoding-prefix_opt " s-char-sequence_opt "

encoding-prefix_opt R raw-string

s-char-sequence:

s-char

s-char-sequence s-char

s-char:

basic-s-char