If I got this correctly, you're implying that \d should only match ASCII digits, right? We should use Nd to match any Unicode digit, and not \d. The massive bug is to make \d == <:Nd>.
Yes, it is a massive bug. It causes a lot of programs to match a lot more than they expect, very likely including a lot of security validations. Everyone, including the people who wrote those docs, assumes \d matches ASCII digits only, and that is what you need for basically any parsing of either machine formats or human text.
It is exceedingly rare to want to match <:Nd> (I doubt anyone ever actually used that), and if you absolutely need to, well, you can say <:Nd>, or more likely some more specific range.
It won't even do for extracting numbers from natural language text, as the most common other numeral systems (Roman and Chinese numerals) don't match <:Nd>, since they reuse letters.
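To make the validation point concrete, here is a minimal sketch in Python rather than the regex flavour discussed here; Python's `re` module happens to have the same default for `str` patterns, where `\d` matches any Unicode decimal digit. The helper names `is_digits_unicode` and `is_digits_ascii` are made up for illustration.

```python
import re

# Arabic-Indic digits for "123" (U+0661 U+0662 U+0663), all in category Nd.
arabic_indic = "\u0661\u0662\u0663"


def is_digits_unicode(s: str) -> bool:
    """Hypothetical validator: \\d on a str pattern matches any Nd character."""
    return re.fullmatch(r"\d+", s) is not None


def is_digits_ascii(s: str) -> bool:
    """Hypothetical validator: an explicit range accepts only ASCII 0-9."""
    return re.fullmatch(r"[0-9]+", s) is not None


print(is_digits_unicode("123"), is_digits_ascii("123"))                # True True
print(is_digits_unicode(arabic_indic), is_digits_ascii(arabic_indic))  # True False
```

A check written with `\d` lets the second string through, even though whatever consumes the value downstream may only expect `0`-`9`.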
They don't really reuse letter codepoints; they use a different codepoint in Unicode. They match <:N> alright, and also <:Nl>:
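A minimal sketch of that check, in Python's `unicodedata` rather than the regex syntax above (my own illustration, not the original poster's snippet, and covering only the Roman numeral case):

```python
import unicodedata

roman_eight = "\u2167"  # dedicated Roman numeral character, not the letters V+I+I+I
latin_v = "V"           # the ordinary Latin letter

for ch in (roman_eight, latin_v, "8"):
    # Print code point, general category, and official character name.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# U+2167 Nl ROMAN NUMERAL EIGHT
# U+0056 Lu LATIN CAPITAL LETTER V
# U+0038 Nd DIGIT EIGHT
```

The dedicated character is in category Nl (letter number), so it falls under N but not under Nd, while the plain letter `V` is just Lu.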
Nice one, I didn't know they had separate characters for Roman numerals in Unicode. I don't think they're actually used much in the wild; still, nice.