DEV Community

Florian Pigorsch

Go: Identifiers vs. Unicode

A recent Reddit post about Unicode characters in Go identifiers prompted me to dive into the Go spec and look things up properly:

According to the spec, the syntax for valid identifiers is

identifier = letter { letter | unicode_digit } .

with

letter         = unicode_letter | "_" .
unicode_letter = /* a Unicode code point categorized as "Letter" */ .
unicode_digit  = /* a Unicode code point categorized as "Number, decimal digit" */ .

The "Letter" category consists of the Unicode general categories Lu (uppercase letters), Ll (lowercase letters), Lt (titlecase letters), Lm (modifier letters), and Lo (other letters), while "Number, decimal digit" refers to the Unicode general category Nd.
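To see which of these categories a given character falls into, the range tables exported by Go's standard `unicode` package can be queried directly. The following is a minimal sketch: the helper name `categoryOf` and the sample characters are my own choices for illustration, and So is included only for comparison with the symbol examples.

```go
package main

import (
	"fmt"
	"unicode"
)

// categories lists the general categories that matter for Go identifiers,
// plus So (Symbol, other) for comparison.
var categories = []struct {
	name  string
	table *unicode.RangeTable
}{
	{"Lu", unicode.Lu},
	{"Ll", unicode.Ll},
	{"Lt", unicode.Lt},
	{"Lm", unicode.Lm},
	{"Lo", unicode.Lo},
	{"Nd", unicode.Nd},
	{"So", unicode.So},
}

// categoryOf is a hypothetical helper: it returns the name of the first
// listed category that contains r, or "other" if none of them does.
func categoryOf(r rune) string {
	for _, c := range categories {
		if unicode.Is(c.table, r) {
			return c.name
		}
	}
	return "other"
}

func main() {
	// Sample characters chosen for illustration only.
	for _, r := range []rune{'A', 'a', '\u01C5', '\u02B0', '世', '7', '😀', '_'} {
		fmt.Printf("%q (U+%04X): %s\n", r, r, categoryOf(r))
	}
}
```

Note that the underscore is reported as "other" here: it is not a Unicode letter, and the grammar allows it only because `letter` lists it explicitly.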

So an identifier has to start with either a "letter" or an underscore ("_"), and may contain only "letters", "decimal digits", and underscores, where Unicode defines what counts as a letter or a digit.
The set of letters is not only the usual A-Z and a-z, but also includes letters from other scripts, such as Greek letters (e.g. Σ) or CJK characters. The same holds for digits: not only 0-9, but also digits from other scripts are allowed, e.g. ٣.
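The rule can be written down almost literally in Go. This is a minimal sketch under stated assumptions, not how the Go compiler implements it: `isIdentifier` is a hypothetical helper name, and it deliberately ignores that keywords such as `func` are reserved. It relies on `unicode.IsLetter` and `unicode.IsDigit` from the standard library, which test for the Letter categories and for Nd respectively.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifier reports whether s matches the grammar
//
//	identifier = letter { letter | unicode_digit } .
//
// It is a hypothetical helper for illustration and does not reject keywords.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		// A "letter" in the spec's sense: a Unicode letter or an underscore.
		isLetter := r == '_' || unicode.IsLetter(r)
		if i == 0 {
			// The first character must be a letter.
			if !isLetter {
				return false
			}
			continue
		}
		// Later characters may be letters or decimal digits (category Nd).
		if !isLetter && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	samples := []string{
		"x", "_tmp", "Σ",
		"x\u0663", // x followed by ARABIC-INDIC DIGIT THREE
		"42", "😀", "x🌞",
	}
	for _, s := range samples {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```

The first four samples are reported as `true`, the last three as `false`.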

Valid identifiers:

Invalid identifiers:

  • 42 (does not start with a letter)
  • 😀 (not a letter, but So / Symbol, other)
  • (not a letter, but So / Symbol, other)
  • x🌞 (starts with a letter, but contains non-letter/digit characters)
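Such candidates can also be checked programmatically. The standard library offers `token.IsIdentifier` in the `go/token` package (available since Go 1.13), which additionally rejects keywords. The sketch below is only an illustration: the wrapper `verdict` is a hypothetical helper, and the valid-looking names in the list are my own examples rather than ones taken from the spec.

```go
package main

import (
	"fmt"
	"go/token"
)

// verdict is a hypothetical wrapper that turns the result of
// token.IsIdentifier into a readable label.
func verdict(name string) string {
	if token.IsIdentifier(name) {
		return "valid"
	}
	return "invalid"
}

func main() {
	// "x", "_tmp" and "Σ" are illustrative names; "func" is a keyword
	// and therefore not an identifier.
	for _, name := range []string{"x", "_tmp", "Σ", "42", "😀", "x🌞", "func"} {
		fmt.Printf("%q: %s\n", name, verdict(name))
	}
}
```

Using the library function instead of a hand-written check keeps the result in line with the Go toolchain's own definition, including the keyword exclusion.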

Although Go accepts identifiers that contain characters other than A-Z, a-z, 0-9, and _, using them is generally not advisable, both for readability and accessibility and to avoid rendering issues.
