Source: https://timsong-cpp.github.io/cppwp/n3337/lex
#include
\uXXXX
<
_Pragma
hex-quad:
universal-character-name:
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.[cpp11 5]
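As a minimal sketch of the mapping described above, the following compile-time checks use `char32_t` literals, whose values are the code points themselves. The helper name `code_point` is hypothetical and exists only for illustration.

```cpp
#include <cstdint>

// The 4-digit form names code point 0000NNNN; the 8-digit form names NNNNNNNN.
static_assert(U'\u00E9' == 0x00E9, "4-digit form designates U+00E9");
static_assert(U'\U0001F525' == 0x1F525, "8-digit form designates U+1F525");
static_assert(U'\u00E9' == U'\U000000E9', "both forms name the same character");

// Hypothetical helper: exposes the code point held by a char32_t value.
constexpr std::uint32_t code_point(char32_t c) {
    return static_cast<std::uint32_t>(c);
}
```

A universal-character-name in the surrogate range, such as one naming D800, would be rejected at compile time, so it is deliberately not shown.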
\UNNNNNNNN
NNNNNNNN
\uNNNN
0000NNNN
0
Trigraph  Replacement
??=       #
??(       [
??<       {
??/       \
??)       ]
??>       }
??'       ^
??!       |
??-       ~
??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
becomes
#define arraycheck(a,b) a[b] || b[a]
?
preprocessing-token:
'
"
If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
R"
encoding-prefixopt R raw-string
Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.
<::
:
>
<:
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.
#define R "x"
const char* s = R"y";           // ill-formed raw string, not "x" "y"
1Ex
1
Ex
+1
1E1
E
x+++++y
x ++ ++ + y
x
y
x ++ + ++ y
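The longest-match rule can be sketched with a shorter expression than x+++++y. The function names `munch_sum` and `spaced_sum` are hypothetical and exist only for illustration.

```cpp
// x+++y is tokenized as x ++ + y (longest token first), that is (x++) + y.
int munch_sum(int x, int y) {
    return x+++y;
}

// x+++++y would be tokenized as x ++ ++ + y, which is ill-formed.
// Explicit whitespace selects the tokenization x ++ + ++ y instead.
int spaced_sum(int x, int y) {
    return x++ + ++y;
}
```

With arguments 1 and 2, `munch_sum` yields 3 (the old value of x plus y) and `spaced_sum` yields 4 (the old value of x plus the incremented y).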
Alternative  Primary
<%           {
%>           }
<:           [
:>           ]
%:           #
%:%:         ##
and          &&
bitor        |
or           ||
xor          ^
compl        ~
bitand       &
and_eq       &=
or_eq        |=
xor_eq       ^=
not          !
not_eq       !=
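A short sketch of alternative tokens in use follows; each behaves exactly like its primary spelling. All function names are hypothetical.

```cpp
// Alternative tokens are interchangeable with their primary spellings.
bool both_set(bool a, bool b) { return a and b; }            // a && b
bool neither(bool a, bool b) { return not a and not b; }     // !a && !b
unsigned merge(unsigned a, unsigned b) { return a bitor b; } // a | b

// Digraphs: <% %> stand for { }, and <: :> stand for [ ].
int second_of_three() <%
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:1:>;
%>
```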
token:
/*
*/
//
header-name:
h-char-sequence:
h-char:
q-char-sequence:
q-char:
pp-number:
identifier:
identifier-nondigit:
nondigit: one of
digit: one of
override
final
export
alignas
continue
friend
register
true
alignof
decltype
goto
reinterpret_cast
try
asm
default
if
return
typedef
auto
delete
inline
short
typeid
bool
do
int
signed
typename
break
double
long
sizeof
union
case
dynamic_cast
mutable
static
unsigned
catch
else
namespace
static_assert
using
char
enum
new
static_cast
virtual
char16_t
explicit
noexcept
struct
void
char32_t
nullptr
switch
volatile
class
extern
operator
template
wchar_t
const
false
private
this
while
constexpr
float
protected
thread_local
const_cast
for
public
throw
preprocessing-op-or-punc: one of { } [ ] # ## ( ) <: :> <% %> %: %:%: ; : ... new delete ? :: . .* + - * / % ^ & | ~ ! = < > += -= *= /= %= ^= &= |= << >> >>= <<= == != <= >= && || ++ -- , ->* -> and and_eq bitand bitor compl not not_eq or or_eq xor xor_eq
Each preprocessing-op-or-punc is converted to a single token in translation phase 7 ([lex.phases]).
literal:
integer-literal:
decimal-literal:
octal-literal:
hexadecimal-literal:
nonzero-digit: one of
octal-digit: one of
hexadecimal-digit: one of
integer-suffix:
unsigned-suffix: one of
long-suffix: one of
long-long-suffix: one of
0x
0X
a
f
A
F
12
014
0XC
long int
unsigned int
long long int
unsigned long int
unsigned long long int
u
U
l
L
ll
LL
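The literals and suffixes listed above can be checked at compile time. This is a sketch only; the helper `twelve_octal` is a hypothetical name.

```cpp
#include <type_traits>

// The value twelve written in decimal, octal, and hexadecimal.
static_assert(12 == 014 && 014 == 0XC, "all three literals denote twelve");

// Suffixes select the type of the literal.
static_assert(std::is_same<decltype(12), int>::value, "no suffix, fits in int");
static_assert(std::is_same<decltype(12u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(12L), long int>::value, "L");
static_assert(std::is_same<decltype(12UL), unsigned long int>::value, "UL");
static_assert(std::is_same<decltype(12LL), long long int>::value, "LL");
static_assert(std::is_same<decltype(12ull), unsigned long long int>::value, "ull");

// Hypothetical helper: a leading 0 makes the literal octal.
constexpr int twelve_octal() { return 014; }
```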
character-literal:
c-char-sequence:
c-char:
escape-sequence:
simple-escape-sequence: one of
octal-escape-sequence:
hexadecimal-escape-sequence:
'x'
u'y'
U'z'
L'x'
and implementation-defined value.
\"
\?
\'
\\
\n
\t
\v
\b
\r
\f
\a
\ooo
\xhhh
'u'
'U'
'L'
floating-literal:
fractional-constant:
exponent-part:
sign: one of
digit-sequence:
floating-suffix: one of
e
unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double. If the scaled value is not in the range of representable values for its type, the program is ill-formed.
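A minimal sketch of how the suffix determines the type of a floating literal follows. The helper `scaled` is a hypothetical name.

```cpp
#include <type_traits>

// Without a suffix the literal has type double.
static_assert(std::is_same<decltype(1.5), double>::value, "no suffix");
// f or F selects float; l or L selects long double.
static_assert(std::is_same<decltype(1.5f), float>::value, "f suffix");
static_assert(std::is_same<decltype(1.5F), float>::value, "F suffix");
static_assert(std::is_same<decltype(1.5L), long double>::value, "L suffix");

// Hypothetical helper: the exponent scales the significand by a power of 10.
constexpr double scaled() { return 15e-1; }
```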
string-literal:
encoding-prefix: u8 u U L
s-char-sequence:
s-char:
raw-string:
r-char-sequence:
r-char:
d-char-sequence:
d-char:
R
u8
u8R
uR
UR
LR
"..."
R"(...)"
u8"..."
u8R"**(...)**"
u"..."
uR"*~(...)*~"
U"..."
UR"zzz(...)zzz"
L"..."
LR"(...)"
'('
')'
R"delimiter((a|b))delimiter"
"(a|b)"
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
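The following sketch compares raw string literals with their escaped equivalents. The function names are hypothetical.

```cpp
#include <cstring>

// The delimiter lets the content itself contain parentheses.
bool delimiter_example() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}

// Inside a raw string a backslash is an ordinary character.
bool backslash_example() {
    return std::strcmp(R"(a\nb)", "a\\nb") == 0;
}

// A source-file new-line inside a raw string becomes a new-line in the result.
bool newline_example() {
    const char* p = R"(a
b)";
    return std::strcmp(p, "a\nb") == 0;
}
```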
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n". The raw string
R"(??)"
is equivalent to "\?\?". The raw string
R"#(
)??="
)#"
is equivalent to "\n)\?\?=\"\n".
u8"asdf"
const char
u"asdf"
const char16_t
U"asdf"
const char32_t
L"asdf"
const wchar_t
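The element types listed above can be checked with `decltype`, since a string literal is an lvalue of array type. This sketch omits the u8 prefix because its element type differs between revisions of the standard (const char here, const char8_t in later drafts). The helper `asdf_length` is a hypothetical name.

```cpp
#include <cstddef>
#include <type_traits>

// Each literal is an array of five elements: four characters plus the terminator.
static_assert(std::is_same<decltype("asdf"), const char(&)[5]>::value, "ordinary");
static_assert(std::is_same<decltype(u"asdf"), const char16_t(&)[5]>::value, "u prefix");
static_assert(std::is_same<decltype(U"asdf"), const char32_t(&)[5]>::value, "U prefix");
static_assert(std::is_same<decltype(L"asdf"), const wchar_t(&)[5]>::value, "L prefix");

// Hypothetical helper: number of char16_t elements, including the terminator.
constexpr std::size_t asdf_length() {
    return sizeof(u"asdf") / sizeof(char16_t);
}
```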
u"a"
u"b"
u"ab"
U"a"
U"b"
U"ab"
L"a"
L"b"
L"ab"
"b"
"a"
Characters in concatenated strings are kept distinct.
"\xA" "B"
'\xA'
'B'
'\xAB'
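A small sketch of the rule that characters in concatenated strings are kept distinct: the hexadecimal escape in the first literal does not absorb the B from the second. The helper names are hypothetical.

```cpp
#include <cstddef>

// Two characters plus the terminating null, so the array has three elements.
constexpr std::size_t concat_size() { return sizeof("\xA" "B"); }

// The first element is the character with value 0xA, the second is B.
constexpr char first_char() { return ("\xA" "B")[0]; }
constexpr char second_char() { return ("\xA" "B")[1]; }
```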
'\0'
U'\0'
L'\0'
u'\0'
0x0
0x10FFFF
boolean-literal:
pointer-literal:
std::nullptr_t
\u
8
9
Source: https://timsong-cpp.github.io/cppwp/n4140/lex
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.[cpp14 5]
binary-literal:
binary-digit:
0b
0B
0b1100
1048576
1'048'576
0X100000
0x10'0000
0'004'000'000
1.602'176'565e-19
1.602176565e-19
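The literals above can be compared at compile time. This sketch assumes a compiler in C++14 mode or later, where binary literals and digit separators are available. The helper `mebi` is a hypothetical name.

```cpp
// A binary literal: 0b1100 is twelve.
static_assert(0b1100 == 12, "binary literal");

// Digit separators are ignored when determining the value.
static_assert(1'048'576 == 1048576, "decimal with separators");
static_assert(0X100000 == 1048576 && 0x10'0000 == 1048576, "hexadecimal");
static_assert(0'004'000'000 == 1048576, "octal with separators");
static_assert(1.602'176'565e-19 == 1.602176565e-19, "floating literal");

// Hypothetical helper returning the separated spelling.
constexpr long mebi() { return 1'048'576; }
```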
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
Source: https://timsong-cpp.github.io/cppwp/n4659/lex
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.[cpp17 5]
If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (universal-character-names and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.
<::
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name is only formed within a #include directive.
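The <:: special case described above can be sketched as follows. Without it, a template argument beginning with :: would lex as the digraph <: followed by a colon. The function name `count_items` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// The character after "<::" is 's', which is neither ':' nor '>',
// so '<' is a token by itself and "::" begins the qualified name.
std::size_t count_items() {
    std::vector<::std::size_t> v(3);
    return v.size();
}
```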
0xe+foo
0xe
+
foo
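The fragments above come from an example of the longest-match rule applied to preprocessing numbers: written without spaces, 0xe+foo forms a single preprocessing number that is not a valid literal. A sketch of the well-formed spelling follows; `add_hex` is a hypothetical name.

```cpp
// Whitespace separates the text into three tokens: 0xe, +, and foo.
// Written as a single run without spaces, the same characters would form
// one preprocessing number and the expression would be ill-formed.
int add_hex(int foo) {
    return 0xe + foo;
}
```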
00A8
00AA
00AD
00AF
00B2-00B5
00B7-00BA
00BC-00BE
00C0-00D6
00D8-00F6
00F8-00FF
0100-167F
1681-180D
180F-1FFF
200B-200D
202A-202E
203F-2040
2054
2060-206F
2070-218F
2460-24FF
2776-2793
2C00-2DFF
2E80-2FFF
3004-3007
3021-302F
3031-D7FF
F900-FD3D
FD40-FDCF
FDF0-FE44
FE47-FFFD
10000-1FFFD
20000-2FFFD
30000-3FFFD
40000-4FFFD
50000-5FFFD
60000-6FFFD
70000-7FFFD
80000-8FFFD
90000-9FFFD
A0000-AFFFD
B0000-BFFFD
C0000-CFFFD
D0000-DFFFD
E0000-EFFFD
0300-036F
1DC0-1DFF
20D0-20FF
FE20-FE2F
Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
__
Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Note: The export and register keywords are unused but are reserved for future use.
preprocessing-op-or-punc: one of { } [ ] # ## ( ) <: :> <% %> %: %:%: ; : ... new delete ? :: . .* + - * / % ^ & | ~ ! = < > += -= *= /= %= ^= &= |= << >> >>= <<= == != <= >= && || ++ -- , ->* -> and and_eq bitand bitor compl not not_eq or or_eq xor xor_eq
Each preprocessing-op-or-punc is converted to a single token in translation phase 7.
hexadecimal-prefix: one of
hexadecimal-digit-sequence:
encoding-prefix: one of
u8'w'
u'x'
U'y'
L'z'
decimal-floating-literal:
hexadecimal-floating-literal:
hexadecimal-fractional-constant:
binary-exponent-part:
p
P
49.625
0xC.68p+2
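The two literals above denote the same value: 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and the binary exponent p+2 scales it by 4. This sketch assumes a compiler in C++17 mode or later, where hexadecimal floating literals are available. The helper `hex_value` is a hypothetical name.

```cpp
// 12.40625 * 2^2 == 49.625, and both values are exactly representable.
static_assert(0xC.68p+2 == 49.625, "hexadecimal and decimal spellings agree");

// Hypothetical helper returning the hexadecimal spelling.
constexpr double hex_value() { return 0xC.68p+2; }
```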
in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.
std::nullptr_t
Source: https://timsong-cpp.github.io/cppwp/n4868/lex
Note: A C++ program need not all be translated at the same time.
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
A universal-character-name designates the character in ISO/IEC 10646 (if any) whose code point is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name. The program is ill-formed if that number is not a code point or if it is a surrogate code point. Noncharacter code points and reserved code points are considered to designate separate characters distinct from any ISO/IEC 10646 character. If a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character-literal or string-literal (in either case, including within a user-defined-literal) corresponds to a control character or to a character in the basic source character set, the program is ill-formed.[cpp20 5] Note: ISO/IEC 10646 code points are integers in the range [0,10FFFF] (hexadecimal). A surrogate code point is a value in the range [D800,DFFF] (hexadecimal). A control character is a character whose code point is in either of the ranges [0,1F] or [7F,9F] (hexadecimal).
import
module
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name ([lex.header]) is only formed
after the include or import preprocessing token in an #include ([cpp.include]) or import ([cpp.import]) directive, or
include
within a has-include-expression.
Note: None has any observable spelling.
Note: Some whitespace is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters.
Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.
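The comment rule in the last note can be sketched as follows; `comment_demo` is a hypothetical name.

```cpp
int comment_demo() {
    // The // inside the block comment below does not start a line comment,
    // so the trailing "+ 1" is still part of the expression.
    int a = 1 /* a // here is ordinary comment text */ + 1;
    int b = 2; // a /* here does not open a block comment
    return a + b;
}
```

Here a is 2 and b is 2, so the function returns 4.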
__has_include
The sequences in both forms of header-names are mapped in an implementation-defined manner to headers or to external source file names as specified in [cpp.include].
keyword:
Note: The register keyword is unused but is reserved for future use.
constinit
co_await
requires
co_return
co_yield
char8_t
concept
consteval
preprocessing-op-or-punc:
preprocessing-operator: one of
operator-or-punctuator: one of
Each operator-or-punctuator is converted to a single token in translation phase 7.
binary-digit: one of
Note: The prefix and any optional separating single quotes are ignored when determining the value.
a through f and A through F have decimal values ten through fifteen.
Note: That is, provided the code point value is in the range [0,7F] (hexadecimal). If the value is not representable with a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing multiple c-chars is ill-formed.
Note: That is, provided the code point value is in the range [0,FFFF] (hexadecimal). If the value is not representable with a single 16-bit code unit, the program is ill-formed. A UTF-16 character literal containing multiple c-chars is ill-formed.
Note: The type wchar_t is able to represent all members of the execution wide-character set (see [basic.fundamental]). The value of a wide-character literal containing multiple c-chars is implementation-defined.
Note: If the value of a character-literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed.
Note: In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text. Therefore, all extended characters are described in terms of universal-character-names. However, the actual compiler implementation can use its own native character set, so long as the same results are obtained.
floating-point-literal:
decimal-floating-point-literal:
hexadecimal-floating-point-literal:
floating-point-suffix: one of
Note: Any optional separating single quotes are ignored when determining the value. If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer. Otherwise, the exponent e is 0. The scaled value of the literal is s×10^e for a decimal-floating-point-literal and s×2^e for a hexadecimal-floating-point-literal.
The floating-point-literals 49.625 and 0xC.68p+2 have the same value.
R"(x = "\"y\"")"
"x = \"\\\"y\\\"\""
const char8_t
Note: A single c-char may produce more than one char16_t character in the form of surrogate pairs. A surrogate pair is a representation for a single code point as a sequence of two 16-bit code units.
wchar_t”, where n is the size of the string as defined below; it is initialized with the given characters.
Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. Table 11 has some examples of valid concatenations.
Note: The size of a char16_t string literal is the number of code units, not the number of characters.
Note: Any universal-character-names are required to correspond to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal) ([lex.charset]). The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
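The point that the size of a char16_t string literal counts code units rather than characters can be sketched with a character outside the Basic Multilingual Plane. The helper names are hypothetical.

```cpp
#include <cstddef>

// U+1F525 needs a surrogate pair in UTF-16: two code units plus the terminator.
constexpr std::size_t utf16_units() {
    return sizeof(u"\U0001F525") / sizeof(char16_t);
}

// The same character is a single code unit in UTF-32, plus the terminator.
constexpr std::size_t utf32_units() {
    return sizeof(U"\U0001F525") / sizeof(char32_t);
}

// The first UTF-16 code unit is a high surrogate.
static_assert(u"\U0001F525"[0] >= 0xD800 && u"\U0001F525"[0] <= 0xDBFF,
              "first code unit is a high surrogate");
```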
Note: The effect of attempting to modify a string-literal is undefined.
Note: std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value. See [conv.ptr] and [conv.mem].
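The properties stated in the note can be checked with standard type traits. The struct `S` and the function `converts_to_null` are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has type std::nullptr_t, which is neither a pointer type
// nor a pointer-to-member type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "type of nullptr");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value, "not a pointer-to-member type");

struct S { int m; };  // hypothetical class used for the member pointer

// nullptr converts to a null pointer value and to a null member pointer value.
bool converts_to_null() {
    int* p = nullptr;
    int S::* pm = nullptr;
    return p == nullptr && pm == nullptr;
}
```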
Source: https://timsong-cpp.github.io/cppwp/n4950/lex
each abstract character assigned a code point in the Unicode codespace as specified in the Unicode Standard, and
a distinct character for each Unicode scalar value not assigned to an abstract character.
Note: Unicode code points are integers in the range [0, 10FFFF] (hexadecimal). A surrogate code point is a value in the range [D800, DFFF] (hexadecimal). A Unicode scalar value is any code point that is not a surrogate code point.
Note: Unicode short names are given only as a means to identifying the character; the numerical value has no other meaning in this context.
%
(
)
*
,
-
.
/
0 1 2 3 4 5 6 7 8 9
;
=
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
_
a b c d e f g h i j k l m
n o p q r s t u v w x y z
n-char: one of
n-char-sequence:
named-universal-character:
simple-hexadecimal-digit-sequence:
\U
\u{simple-hexadecimal-digit-sequence}
Note: These aliases are listed in the Unicode Character Database's NameAliases.txt. None of these names or aliases have leading or trailing spaces.
NameAliases.txt
Note: A sequence of characters resembling a universal-character-name in an r-char-sequence ([lex.string]) does not form a universal-character-name.
Note: A character not in the basic literal character set can be encoded with more than one code unit; the value of such a code unit can be the same as that of a code unit for an element of the basic literal character set. The U+0000 null character is encoded as the value 0. No other element of the translation character set is encoded with a code unit of value 0. The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous. The ordinary and wide literal encodings are otherwise implementation-defined. For a UTF-8, UTF-16, or UTF-32 literal, the implementation shall encode the Unicode scalar value corresponding to each character of the translation character set as specified in the Unicode Standard for the respective Unicode encoding form.
identifier-start:
identifier-continue:
The program is ill-formed if an identifier does not conform to Normalization Form C as specified in the Unicode Standard. Note: Identifiers are case-sensitive.
Note: [uaxid] compares the requirements of UAX #31 of the Unicode Standard with the C++ rules for identifiers.
Note: In translation phase 4, identifier also includes those preprocessing-tokens ([lex.pptoken]) differentiated as keywords ([lex.key]) in the later translation phase 7 ([lex.token]).
Note: When appearing as an expression, a literal has a type and a value category ([expr.prim.literal]).
size-suffix: one of
z
Z
std::size_t
basic-c-char:
simple-escape-sequence:
simple-escape-sequence-char: one of
numeric-escape-sequence:
simple-octal-digit-sequence:
conditional-escape-sequence:
conditional-escape-sequence-char:
Note: The associated character encoding for ordinary character literals determines encodability, but does not determine the value of non-encodable ordinary character literals or ordinary multicharacter literals. The examples in Table 9 for non-encodable ordinary character literals assume that the specified character lacks representation in the ordinary literal encoding or that encoding the character would require more than one code unit.
'v'
'\U0001F525'
'abcd'
L'w'
u8'x'
A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. Note: If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the literal is a non-encodable character literal.
A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
If v does not exceed the range of representable values of the character-literal's type, then the value is v.
Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.
T
Otherwise, the character-literal is ill-formed.
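The rules above for a single numeric-escape-sequence can be sketched as follows. The last check assumes an 8-bit char, which is an assumption of this example rather than something the text guarantees. The helper `escape_value` is a hypothetical name.

```cpp
// When v fits in the literal's type, the value is v itself.
static_assert('\x41' == 65, "hexadecimal escape with v = 0x41");
static_assert('\101' == 65, "octal escape with v = 0101");

// Assuming an 8-bit char: 0xFF may exceed the range of a signed char, in which
// case the value is the one congruent to 0xFF modulo 2^8. Converting back to
// unsigned char recovers 0xFF whether char is signed or unsigned.
static_assert(static_cast<unsigned char>('\xff') == 0xFF, "congruent modulo 2^N");

// Hypothetical helper returning the value of a hexadecimal escape.
constexpr int escape_value() { return '\x41'; }
```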
A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
Note: Using an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.
Note: The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported. See [basic.extended.fp].
f16
f32
f64
f128
bf16
F16
F32
F64
F128
BF16
std::float16_t
std::float32_t
std::float64_t
std::float128_t
std::bfloat16_t
basic-s-char:
"ordinary string"
R"(ordinary raw string)"
L"wide string"
LR"w(wide raw string)w"
u8"UTF-8 string"
u8R"x(UTF-8 raw string)x"
u"UTF-16 string"
uR"y(UTF-16 raw string)y"
U"UTF-32 string"
UR"z(UTF-32 raw string)z"
Note: A string-literal's rawness has no effect on the determination of the common encoding-prefix.
R"(\u00)" "41"
'A'
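The fragments above concern concatenating a raw string with an ordinary one. A sketch of the effect follows: the characters of the raw string stay literal, so the concatenation holds six characters and not the single character 'A'. The helper names are hypothetical.

```cpp
#include <cstddef>

// Backslash, u, 0, 0, 4, 1, and the terminating null: seven elements.
constexpr std::size_t joined_size() { return sizeof(R"(\u00)" "41"); }

// The first character is the backslash taken literally from the raw string.
constexpr char joined_first() { return (R"(\u00)" "41")[0]; }
```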
Note: The effect of attempting to modify a string literal object is undefined.
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding. If a character lacks representation in the associated character encoding, then the string-literal is conditionally-supported and an implementation-defined code unit sequence is encoded. Note: No character lacks representation in any Unicode encoding form. When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence. Note: The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently.
Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
If v does not exceed the range of representable values of the string-literal's array element type, then the value is v.
Otherwise, if the string-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the string-literal's array element type, then the value is the unique value of the string-literal's array element type T that is congruent to v modulo 2^N, where N is the width of T.
Otherwise, the string-literal is ill-formed.
When encoding a stateful character encoding, these sequences should have no effect on encoding state.
Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence. When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.