Use unicode characters in identifiers #13753

sxwangzhiwen · 2022-12-03T09:27:24Z

At present, function and variable names in Zig can only use ASCII characters. I hope that future versions will consider allowing Unicode characters in identifiers.

For reference, I wrote a small program that tests the Unicode derived properties of a character; see: https://github.com/sxwangzhiwen/unicodeid

The property data is generated from https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt

There are four functions in unicodeid.zig: isID_Start(), isID_Continue(), isXID_Start(), isXID_Continue()

Each function tests the corresponding derived property of a UTF-8 encoded character within a UTF-8 string.

The four functions can be used, as required, in the .Identifier and else branches of the next() functions in src/lib/std/zig/tokenizer.zig and src/lib/std/zig/c/tokenizer.zig, so that UTF-8 encoded Unicode characters are accepted in identifiers.
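As a rough illustration of the idea (not the actual code in unicodeid.zig or in the Zig tokenizer), an identifier check built on the XID properties could look like the sketch below. The range tables here are tiny hand-picked samples, and the names `Range`, `isXidStartSample`, `isXidContinueSample`, and `isUnicodeIdentifier` are hypothetical; a real table would be generated from DerivedCoreProperties.txt. A leading `_` is not XID_Start, so a tokenizer would presumably keep handling it separately.

```zig
const std = @import("std");

// Hypothetical inclusive code point range.
const Range = struct { lo: u21, hi: u21 };

// ASSUMPTION: heavily truncated sample of XID_Start. The real property
// covers several hundred ranges listed in DerivedCoreProperties.txt.
const xid_start_sample = [_]Range{
    .{ .lo = 'A', .hi = 'Z' },
    .{ .lo = 'a', .hi = 'z' },
    .{ .lo = 0x00C0, .hi = 0x00D6 }, // Latin-1 letters À..Ö
    .{ .lo = 0x03B1, .hi = 0x03C9 }, // Greek α..ω
    .{ .lo = 0x4E00, .hi = 0x9FA5 }, // subset of CJK Unified Ideographs
};

// ASSUMPTION: sample of code points that are XID_Continue but not XID_Start.
const xid_continue_extra_sample = [_]Range{
    .{ .lo = '0', .hi = '9' },
    .{ .lo = '_', .hi = '_' },
    .{ .lo = 0x0300, .hi = 0x036F }, // combining diacritical marks
};

// Linear scan; a generated full-size table would more likely use binary search.
fn inRanges(table: []const Range, cp: u21) bool {
    for (table) |r| {
        if (cp >= r.lo and cp <= r.hi) return true;
    }
    return false;
}

fn isXidStartSample(cp: u21) bool {
    return inRanges(&xid_start_sample, cp);
}

fn isXidContinueSample(cp: u21) bool {
    return isXidStartSample(cp) or inRanges(&xid_continue_extra_sample, cp);
}

/// Returns true if `bytes` is valid UTF-8 of the form XID_Start XID_Continue*.
fn isUnicodeIdentifier(bytes: []const u8) bool {
    const view = std.unicode.Utf8View.init(bytes) catch return false;
    var it = view.iterator();
    const first = it.nextCodepoint() orelse return false; // empty input
    if (!isXidStartSample(first)) return false;
    while (it.nextCodepoint()) |cp| {
        if (!isXidContinueSample(cp)) return false;
    }
    return true;
}
```

With these sample tables, `isUnicodeIdentifier("π")` and `isUnicodeIdentifier("变量1")` are accepted, while `"1abc"` and `"π²/6"` are rejected, so names like the latter would still need the @"..." syntax.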

IntegratedQuantum · 2022-12-03T12:10:09Z

It may be a bit clunky to use, but Zig currently supports arbitrary (Unicode) identifiers using the @"..." syntax.
For example:

const @"π" = 3.141592653589;
const @"π²/6" = @"π" * @"π" / 6;

sxwangzhiwen · 2022-12-03T12:45:09Z

Yes, that works, but it is still not as convenient as using Unicode characters directly.

Major programming languages such as Python, JavaScript, Java, and Rust now support the direct use of Unicode characters in identifiers.

Therefore, it is recommended that Zig also support it.

Vexu · 2022-12-03T12:50:46Z

Duplicate of #3947, #4151

loftafi · 2025-03-06T12:30:43Z

Zig would be the only "modern" language that doesn't allow non-English speakers to choose variable names in their own language, right?

Vexu closed this as completed Dec 3, 2022
