Serge « sans paille » Guelton
Compiler Engineer / Wood Craft Lover / Red Hat employee
Friday'Con — 25 March 2022
Read:
ដ | ἴ | ಣ |
---|---|---|
➫ | ✞ | ⚫ |
ば | 云 | Ж |
Hangul filler, U+3164: ㅤ
Alveolar Click (Unicode name: LATIN LETTER RETROFLEX CLICK), U+01C3: ǃ
Zero-width Space, U+200B:
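A quick way to see why these three characters are troublesome is to ask the Unicode database what they are. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` is made up for illustration. The first two come back as letters (category `Lo`) even though they look like a blank and a punctuation mark, and the third is an invisible format character (`Cf`).

```python
import unicodedata

# Escapes are used so the invisible characters stay visible in this listing.
SAMPLES = ["\u3164", "\u01c3", "\u200b"]

def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)

for ch in SAMPLES:
    print(*describe(ch))
```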
[code];[name];[gc];[cc];[bc];[decomposition];;;[nv];[bm];[alias];;;;
'bc' = bidi (bidirectional) category [L, R etc]
'bm' = bidi mirrored [N or Y]
Age 1.1 ASCII Yes * Bidi_Class Left_to_Right * Bidi_Mirrored No General_Category Letter
see also:
Age 1.1 ASCII Yes * Bidi_Class Other_Neutral * Bidi_Mirrored Yes * Bidi_Mirroring_Glyph ) * Bidi_Paired_Bracket ) Block Basic_Latin General_Category Open_Punctuation
Age 1.1 ASCII No * Bidi_Class Right_to_Left * Bidi_Mirrored No General_Category Letter
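The `bc` and `bm` fields can also be queried programmatically. A minimal sketch with Python's `unicodedata` module follows; the three sample characters (a Latin letter, an opening parenthesis, a Hebrew letter) are picked here for illustration and are assumed to match the three kinds of listing above. `bidi_info` is a hypothetical helper name.

```python
import unicodedata

def bidi_info(ch):
    """Return (bidi class, mirrored flag) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))

# 'a' is a strong left-to-right letter, '(' is a mirrored neutral,
# U+05D0 (HEBREW LETTER ALEF) is a strong right-to-left letter.
for ch in ("a", "(", "\u05d0"):
    bc, bm = bidi_info(ch)
    print(f"U+{ord(ch):04X} bc={bc} bm={'Y' if bm else 'N'}")
# U+0061 bc=L bm=N
# U+0028 bc=ON bm=Y
# U+05D0 bc=R bm=N
```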
⇒⇒⇒⇒⇒⇒⇒⇒⇒⇒↴ ↶←←←←←←←←←↵ ↳⇒⇒⇒⇒⇒⇒⇒⇒⇒↴ .←←←←←←←←←↵
Abbr. | Code Point | Description |
---|---|---|
RLO | U+202E | Force following characters to be treated as strong right-to-left characters. |
LRI | U+2066 | Treat the following text as isolated and left-to-right. |
PDI | U+2069 | End the scope of the last LRI, RLI, or FSI. |
/* <U+202E> } <U+2066> if (isAdmin) <U+2069> <U+2066> begin admins only */
/* begin admins only */ if (isAdmin) {
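Since the renderer hides these characters, the simplest defense is to look for them in the raw text. The sketch below is not how any particular compiler does it; it only lists the explicit directional formatting characters from UAX #9 found in a string. `find_bidi_controls` is a hypothetical helper, and the sample line rebuilds the comment above with escapes.

```python
# Explicit directional formatting characters defined by UAX #9.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def find_bidi_controls(text):
    """Return a list of (index, abbreviation) for each bidi control in text."""
    return [(i, BIDI_CONTROLS[c]) for i, c in enumerate(text) if c in BIDI_CONTROLS]

# The logical (stored) order of the comment, not what the editor displays.
line = "/* \u202e } \u2066 if (isAdmin) \u2069 \u2066 begin admins only */"
for index, name in find_bidi_controls(line):
    print(index, name)
```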
See http://www.unicode.org/reports/tr9
Important notes:
Warn about Bidi characters in
Invariant: before the closing */ we must be back to the initial state
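The invariant can be sketched as a small stack machine run over the body of a comment or string literal. This is a simplification of UAX #9 written for illustration, not the logic of any specific tool: it ignores paragraph separators (which also reset the state) and depth limits. `bidi_context_is_closed` is a hypothetical name.

```python
EMBEDDINGS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF, PDI = "\u202c", "\u2069"

def bidi_context_is_closed(text):
    """True if every embedding, override and isolate opened in text is closed."""
    stack = []
    for ch in text:
        if ch in EMBEDDINGS:
            stack.append("embedding")
        elif ch in ISOLATES:
            stack.append("isolate")
        elif ch == PDF:
            # PDF only closes an embedding/override that is on top of the stack.
            if stack and stack[-1] == "embedding":
                stack.pop()
        elif ch == PDI:
            # PDI closes the last isolate and anything opened inside it.
            if "isolate" in stack:
                while stack.pop() != "isolate":
                    pass
    return not stack

comment = " \u202e } \u2066 if (isAdmin) \u2069 \u2066 begin admins only "
print(bidi_context_is_closed(comment))  # False: RLO and the second LRI stay open
```

A checker built this way would flag the comment at its closing `*/`, while leaving a properly terminated right-to-left comment alone.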
Pros:
Cons:
In GCC: -Wbidi-chars, see https://godbolt.org/z/MM3na11rj
In clang: not supported
In clang-tidy: misc-misleading-bidirectional, see GitHub rendering
א = ג ;
What's assigned to what?
In clang-tidy: misc-misleading-identifier
һ = 1
h = 2
print(һ)
What's printed?
RTFM: http://www.unicode.org/reports/tr39/#def-skeleton
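The skeleton idea from UTS #39 can be sketched in a few lines: normalize to NFD, map each character to its prototype, normalize again, then compare. The mapping table below is a tiny hand-written subset assumed for illustration; a real implementation would load the full `confusables.txt` data file. `skeleton` and `confusable` are hypothetical helper names.

```python
import unicodedata

# Illustrative subset only: Cyrillic letters mapped to Latin look-alikes.
CONFUSABLES = {
    "\u04bb": "h",  # CYRILLIC SMALL LETTER SHHA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def skeleton(identifier):
    """Approximate UTS #39 skeleton: NFD, map to prototypes, NFD again."""
    nfd = unicodedata.normalize("NFD", identifier)
    mapped = "".join(CONFUSABLES.get(c, c) for c in nfd)
    return unicodedata.normalize("NFD", mapped)

def confusable(a, b):
    """True if two distinct identifiers share the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

print(confusable("\u04bb", "h"))  # True: different code points, same skeleton
print(confusable("h", "h"))       # False: identical identifiers
```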
In gcc12: -Whomoglyph
In clang-tidy: https://reviews.llvm.org/D112916
I am Not a Linguist, but...
code ≠ text
compiler ≠ renderer
But