This trick is very useful if you need to find all code points with a given name or property. Let's find all characters that have an ogonek (the tiny tail) from the previous post.
$ raku -e 'for 1..1_112_064 {
    next unless .uniname.contains( "WITH OGONEK" );
    say .chr, " ", .uniname;
}'
Ą LATIN CAPITAL LETTER A WITH OGONEK
ą LATIN SMALL LETTER A WITH OGONEK
Ę LATIN CAPITAL LETTER E WITH OGONEK
ę LATIN SMALL LETTER E WITH OGONEK
Į LATIN CAPITAL LETTER I WITH OGONEK
į LATIN SMALL LETTER I WITH OGONEK
Ų LATIN CAPITAL LETTER U WITH OGONEK
ų LATIN SMALL LETTER U WITH OGONEK
Ǫ LATIN CAPITAL LETTER O WITH OGONEK
ǫ LATIN SMALL LETTER O WITH OGONEK
Ǭ LATIN CAPITAL LETTER O WITH OGONEK AND MACRON
ǭ LATIN SMALL LETTER O WITH OGONEK AND MACRON
In Raku you can call the uniname or chr methods directly on an integer to get the name of a code point or the character at that code point, respectively. If you are not familiar with the .method syntax: it is just a lazy way to call a method inside a block on whatever value the iteration is currently at (the topic variable $_), without assigning it explicitly to a named variable. If you want, you can be more explicit about it, like: for 1..1_112_064 -> $codepoint { next unless $codepoint.uniname... }.
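As a rough sketch of that explicit style, the snippet below wraps the same uniname and chr calls in a small sub. The name named-like is a hypothetical helper, not something from Raku itself, and the range is narrowed to the Latin Extended-A and Latin Extended-B blocks only to keep the run fast (the full Unicode code space runs from 0 to 0x10FFFF).

```raku
# Hypothetical helper, not part of Raku: collect "char name" strings for
# every code point in @codepoints whose name contains $fragment.
sub named-like( Str $fragment, @codepoints ) {
    my @found;
    for @codepoints -> $codepoint {
        # Same method calls as in the one-liner, on a named variable.
        my $name = $codepoint.uniname;
        next unless $name.contains( $fragment );
        @found.push: $codepoint.chr ~ " " ~ $name;
    }
    return @found;
}

# Assumption: scanning only U+0100..U+024F instead of all code points.
.say for named-like( "WITH OGONEK", 0x0100..0x024F );
```

Returning a list instead of printing inside the loop makes it easy to count or filter the matches further.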
Did you know that:
- There are 899 digits defined in Unicode (counting every code point with DIGIT in its name)?
$ raku -e '( 1 .. 1_112_064 ).grep( *.uniname.contains( "DIGIT" ) ).elems.say;'
899
- There are 154 sentence terminals?
$ raku -e '( 1 .. 1_112_064 ).grep( *.uniprop( "Sentence_Terminal" ) ).elems.say;'
154
(Unicode properties will be explained in the next post.)
- There are 22 characters whose names end with QUESTION MARK? (This check goes by name only, so a match such as ¿ does not have to be a sentence terminal.)
$ raku -e 'for 1 .. 1_112_064 { next unless .uniname.ends-with( "QUESTION MARK" ); say .chr, " ", .uniname; }'
? QUESTION MARK
¿ INVERTED QUESTION MARK
; GREEK QUESTION MARK
՞ ARMENIAN QUESTION MARK
؟ ARABIC QUESTION MARK
፧ ETHIOPIC QUESTION MARK
᥅ LIMBU QUESTION MARK
...
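The question mark one-liner above filters by name only. If you want just the question marks that are also sentence terminals, one way is to combine both checks in a single grep. This is a minimal sketch: terminal-question-marks is a hypothetical helper name, and the scanned range is an arbitrary small slice picked to keep the run short.

```raku
# Hypothetical helper: keep code points that have the Sentence_Terminal
# property AND a name ending in "QUESTION MARK".
sub terminal-question-marks( @codepoints ) {
    @codepoints.grep: {
        .uniprop( "Sentence_Terminal" ) && .uniname.ends-with( "QUESTION MARK" )
    }
}

# Assumption: only U+0000..U+06FF is scanned here, not the whole code space.
for terminal-question-marks( 0x0000..0x06FF ) {
    say .chr, " ", .uniname;
}
```

With this filter the plain ? passes, while ¿ is dropped because it does not carry the Sentence_Terminal property.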
So, what was the funniest thing you found in Unicode?¿⸮