DEV Community

Tomasz Wegrzanowski

Open Source Adventures: Episode 75: Issues with Crystal Char type

While I was writing puzzle game solvers in Z3 Crystal, the issue I ran into more than any other was the existence of the Char type.

Most modern languages don't have a character type. "foo"[0] in most languages is "f", a String that just so happens to be one character long.

Having a separate Char hugely complicates APIs. I can see why it could be a useful thing for performance, but the complexity cost is real.

Why character type is problematic in general

The main reason is that in the Unicode world, a lot of operations you might intuitively think would work on characters actually don't. But they work on strings just fine.

Just one such operation out of many is upper-casing something. Here's Crystal:

puts "ß".upcase # outputs correctly uppercased SS
puts 'ß'.upcase # outputs lowercase ß

Ouch!

There are a lot of situations where uppercasing a length 1 string results in a string longer than 1.

So a language that has separate character types has a choice - either don't support any such operations on characters (which would be a huge pain), or implement them not quite correctly (like Crystal does).
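One way to sidestep this, sketched below, is to convert the Char to a String before case-mapping it. The helper name upcase_char is hypothetical (it is not part of Crystal), and the sketch relies on String#upcase treating ß the way the example above shows.

```crystal
# Hypothetical helper: uppercase a single Char without losing
# multi-character results, by going through String.
def upcase_char(c : Char) : String
  c.to_s.upcase
end

puts upcase_char('a')      # A
puts upcase_char('ß')      # SS, matching the String example above
puts upcase_char('ß').size # 2
```

The return type is String rather than Char, which is exactly the API complication a separate character type brings with it.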

Crystal specific issues

In Crystal I was writing a lot of code like c == "." or c =~ /[0-9]/.

The problem here is that they simply return false or nil, and do not complain about any type issues. So I have code that looks perfectly fine, that would run perfectly fine in Ruby and most other languages, and that the type checker isn't complaining about in any way, and yet it is statically wrong.
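For comparison, here is a minimal sketch of forms that do behave as intended when c is a Char. The sample string "a.7" and the variable names are just for illustration.

```crystal
c = "a.7"[1]   # String#[] with an Int32 index returns a Char
puts typeof(c) # Char

# Compare against a Char literal (single quotes), not a String.
puts c == '.' # true

d = "a.7"[2]
# Either use a Char predicate, or convert to String before matching.
puts d.ascii_number?          # true
puts d.to_s.matches?(/[0-9]/) # true
```

Char#ascii_number? only covers ASCII digits, which is what /[0-9]/ matches; for anything more complex than a character class, going through to_s keeps the regexp readable.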

So here are some questions:

Should Crystal have exposed Char in the first place? If I were designing a language, I wouldn't add such a type, or I'd just have an internal one not exposed in regular APIs, but obviously that would be a huge change, so I doubt this would even be a consideration at this point.

Should "a" == 'a'? Sure, they're different types, but 420 == 420.0 is true even though they're different types too, so it's not inherently impossible. I'm not sure what the implications would be here.

Should Char =~ Regexp match it as if it was a length one String? I'd say probably yes to this one, at least I'm not seeing a big downside, and it has very obvious meaning, and it's difficult to express it otherwise.

Should == or =~ with mismatching types pass type check? Obviously yes due to union types. If x is String | Nil, then x == nil must be valid code, which means "foo" == nil must be valid code too. And the same argument applies to =~.

Should == or =~ with types that cannot match produce a warning? Now here's an interesting question. If we statically know that a == b or a =~ b will be false/nil due to the types of a and b, the odds are good that it is a programmer error, not intended code. And it doesn't seem like a terribly complicated analysis to do. So should Crystal warn in such a case? Like with all warnings, that's mainly a question of false positive rate, as overly aggressive linters are a huge pain.
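The last two questions can be put side by side in a small sketch. The classify helper is hypothetical; it only exists to give x the union type String | Nil.

```crystal
# Hypothetical helper: x may be nil at runtime, so both comparisons
# below are meaningful and have to pass the type check.
def classify(x : String | Nil) : String
  if x == nil
    "missing"
  elsif x == "hello"
    "greeting"
  else
    "other"
  end
end

puts classify(nil)     # missing
puts classify("hello") # greeting
puts classify("foo")   # other

# Here the static types are Char and String, which never overlap,
# so this is the kind of comparison a warning could flag.
c = "foo"[0]
puts c == "f" # false
```

In the first case the declared type of x overlaps with both Nil and String, so nothing is suspicious. In the second case no value of c can ever equal a String, which is what a "statically impossible" warning would be looking for.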

Coming next

OK, that's enough Crystal for now. In the next episode we'll try another technology as promised.

Top comments (5)

Ary Borenszweig

Thank you! This is a very well put article and I agree with most of what you say. I opened a forum discussion about this to gather some ideas: forum.crystal-lang.org/t/fair-crit... . I don't think we can change things due to Crystal 1.0 backwards compatibility promise, but we can think about these things for 2.0, and maybe allow at least matching a Char against a regex.

Tomasz Wegrzanowski

I'd not count any of the "Better C++" languages as modern, even if they were released yesterday. "Better C++" languages all intentionally sacrifice productivity for other goals like performance (just how many string types does Rust have? It feels like at least 10).

My list would be more like (latest major version) Ruby, Python, JavaScript, Raku etc. I checked a bunch of what I considered modern languages, and Julia and Crystal seem to be the only ones with a separate Char type.

Anyway, what do you think a false positive rate would be if Crystal had a warning for statically type-impossible == or =~? I think it's relatively safe with a more traditional type system, but maybe what Crystal is doing makes this impossible.

Ary Borenszweig

Given that Crystal has union types it's essentially impossible to make == and =~ type safe. For example, say we want to restrict comparing numbers against numbers only, never strings. But now you have a variable of type Int32 | String. You want to check if that's equal to "hello". The compiler won't let you write that program because it will say "I can't compare Int32 with String". You'd have to write something like x.is_a?(String) && x == "hello". So it will lead to incredibly verbose code.

The same argument applies to letting any type only be comparable to some other type.

With =~ or === we could maybe make it more strict, not sure. But for example if you have a variable of type String | Nil and we'd like to disallow =~ for nil (it will never match) then you'd have to write !value.nil? && value =~ "...".

In the end, this was just a tradeoff between verboseness vs. how common it is to fall into this trap. If you know that String#[Int32] returns a Char, and that you can't compare Char with Regex, then you won't do it. So I think it's just a matter of how much exposure you had to the language before.

(every language has some of this, it's inevitable)

Tomasz Wegrzanowski

I didn't mean it as a type check, I meant it more as a linter warning if == is statically impossible. A few languages and linters have some sort of "Condition is always false" warning.

In Crystal's case it would be based on type overlap. So the idea is that ARGV[0] == nil would trigger this linter rule (String and Nil don't overlap, so it's always false), but ARGV[0]? == nil wouldn't (String|Nil and Nil overlap).

I'm not sure how practical that would be.

Ary Borenszweig

I'm almost sure that there will be a lot of false positives, based on how the compiler and language work (or put another way: the type system). Maybe there's a rule for this already in Ameba (a popular Crystal linter).