Tomasz Wegrzanowski

Posted on Dec 27, 2021

100 Languages Speedrun: Episode 37: OCaml

#ocaml

OCaml is a functional programming language with a very weird static type system (once you get past the basics), and what's possibly the ugliest syntax of any major programming language. Double semicolons are only the start of it.

Hello, World!

(* Hello, World! in OCaml *)
print_string "Hello, World!\n"

Multiple statements

So far it wasn't bad. Let's try to define a function and call it:

let ask_for_name () = (
  print_string "What's your name? ";
  read_line()
);;

print_string ("Hello, "^ask_for_name()^"!\n");;

$  ocaml name.ml
What's your name? Kitty
Hello, Kitty!

So that worked, but the syntax is quite awful: ; separates expressions within a sequence, and ;; separates top-level expressions.
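A minimal sketch of how the two separators divide the work, using a hypothetical counter made up just for this illustration. Inside one definition, e1; e2 runs e1 for its side effect and then yields the value of e2. At the top level of a source file, ;; is only needed before a bare expression, so starting each phrase with let () = avoids it.

```ocaml
(* Hypothetical counter used only to show sequencing. *)
let counter = ref 0

(* Inside one definition, ";" sequences expressions: each expression
   before a ";" should have type unit, and the whole sequence
   evaluates to the value of the last one. *)
let bump_twice () =
  incr counter;
  incr counter;
  !counter

(* Starting the phrase with "let () =" makes it a definition, so no
   ";;" is needed to separate it from the definitions above. *)
let () = Printf.printf "counter is now %d\n" (bump_twice ())
```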

Fibonacci

let rec fib n =
  if n <= 2
    then 1
    else fib (n - 1) + fib (n - 2)
;;

for i = 1 to 20 do
  print_string("fib(" ^ (string_of_int(i)) ^ ")=" ^ (string_of_int(fib(i))) ^ "\n")
done;;

There's no string interpolation, and we need explicit type conversions. In OCaml we don't even do the usual (for verbose statically typed languages) "convert to X" - we need specifically "convert to X from Y" so it's twice as verbose.

To define a recursive function we need to write let rec instead of the usual let, which is a stupid design decision a lot of functional languages make.

Oh and I'm typing a few parentheses more than idiomatic OCaml code would use, as I think that's going to be more readable for non-OCaml developers.
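For comparison, here's a sketch of the same loop using Printf.sprintf from the standard library. The %d conversions take ints directly, so the string_of_int calls and most of the ^ concatenations go away. The fib_line helper is a name introduced only for this example.

```ocaml
let rec fib n =
  if n <= 2 then 1 else fib (n - 1) + fib (n - 2)

(* Printf.sprintf builds a string from a format; each %d consumes
   an int argument, so no explicit string_of_int is needed. *)
let fib_line i = Printf.sprintf "fib(%d)=%d" i (fib i)

let () =
  for i = 1 to 20 do
    print_endline (fib_line i)
  done
```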

Unicode

Not only does OCaml have no string interpolation, it doesn't even have any equivalent of console.log. Well, why don't we create our own!

type printable =
    S of string
  | I of int
;;

let printable_to_string = function
    S s -> s
  | I i -> string_of_int i
;;

let rec string_join sep = function
    [] -> ""
  | [s] -> s
  | (s::ss) -> s ^ sep ^ (string_join sep ss)
;;

let console_log list =
  print_string ((string_join " " (List.map printable_to_string list)) ^ "\n")
;;

console_log [(S "Length of [1; 2; 3] is"); (I (List.length [1; 2; 3]))];;
console_log [(S "Length of \"Hello\" is"); (I (String.length "Hello"))];;
console_log [(S "Length of \"Żółw\" is"); (I (String.length "Żółw"))];;
console_log [(S "Length of \"💩\" is"); (I (String.length "💩"))];;

The result is completely wrong of course:

$ ocaml unicode.ml
Length of [1; 2; 3] is 3
Length of "Hello" is 5
Length of "Żółw" is 7
Length of "💩" is 4

But first, what the hell is even going on here!

OCaml doesn't have any "polymorphic" functions that would accept multiple unrelated types. So it's List.length for the length of a list, String.length for the length of a string, and so on. There are no ways around it.
There's no String.join (the standard library's version is String.concat), but that's not too hard to write on our own. Pattern matching is pretty decent.
We can define a custom data type that's either S string or I int. By the way these names generally need to be unique everywhere, so you can't really reuse it in some different interface which would take I int | F float.
Once we wrap everything in the right type wrapper, we can send it to our console_log.
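A shorter sketch of the same console_log, swapping the hand-written string_join for the standard library's String.concat (which takes the separator first) and using print_endline for the trailing newline. Everything else is unchanged from the version above.

```ocaml
type printable = S of string | I of int

let printable_to_string = function
  | S s -> s
  | I i -> string_of_int i

(* String.concat sep parts puts sep between consecutive parts,
   which is exactly what string_join did. *)
let console_log items =
  print_endline (String.concat " " (List.map printable_to_string items))

let () =
  console_log [S "Length of [1; 2; 3] is"; I (List.length [1; 2; 3])];
  console_log [S "Length of \"Hello\" is"; I (String.length "Hello")]
```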

With this much pain for such a simple thing, does it get easier for more complex code? It does not, it gets a lot worse.

OCaml has a few outs. It has some (atrocious) macro functionality, which allows Printf.printf (with static template only). And it has "polymorphic variants", which at least let you reuse those wrappers, so with a lot of extra explicit type declarations you can have one function take S | I and another take I | F. At least that's the idea, it runs into a lot of problems in practice.
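To illustrate the polymorphic variant idea, here's a minimal sketch with two hypothetical functions that share the backquoted tag `I. Neither needs a type declaration: the compiler infers [< `I of int | `S of string ] -> string for the first and [< `F of float | `I of int ] -> string for the second.

```ocaml
(* Backquoted tags are not owned by a single type declaration,
   so `I can appear in two different "interfaces". *)
let si_to_string = function
  | `S s -> s
  | `I i -> string_of_int i

let if_to_string = function
  | `I i -> string_of_int i
  | `F f -> string_of_float f

let () =
  print_endline (si_to_string (`S "hello"));
  print_endline (si_to_string (`I 42));
  print_endline (if_to_string (`I 42));
  print_endline (if_to_string (`F 1.5))
```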

Oh and you might have noticed that OCaml has no idea what Unicode even is. All the answers were wrong.
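String.length counts bytes, so 7 and 4 are the UTF-8 byte lengths of "Żółw" and "💩". Newer compilers do include a UTF-8 decoder in the standard library. The sketch below assumes OCaml 4.14 or later, where String.get_utf_8_uchar and Uchar.utf_decode_length exist, and counts Unicode code points (not user-perceived characters, which would need a grapheme-aware library). The utf8_length name is made up for this example.

```ocaml
(* Requires OCaml >= 4.14. Counts Unicode code points in a UTF-8
   string by decoding one sequence at a time. *)
let utf8_length s =
  let rec go i count =
    if i >= String.length s then count
    else
      let d = String.get_utf_8_uchar s i in
      (* utf_decode_length d is the number of bytes this decode consumed;
         it is always at least 1, so malformed input still advances. *)
      go (i + Uchar.utf_decode_length d) (count + 1)
  in
  go 0 0

let () =
  List.iter
    (fun s ->
      Printf.printf "%s: %d bytes, %d code points\n"
        s (String.length s) (utf8_length s))
    ["Hello"; "Żółw"; "💩"]
```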

FizzBuzz

No new issues here:

let fizzbuzz i =
  if i mod 15 = 0
    then "FizzBuzz"
    else if i mod 5 = 0
      then "Buzz"
      else if i mod 3 = 0
        then "Fizz"
        else string_of_int i
;;

for i = 1 to 100 do
  print_string (fizzbuzz(i) ^ "\n")
done;;
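An alternative sketch that replaces the if/else chain with pattern matching on the pair of remainders. It's the same logic, and the catch-all case at the end plays the role of the final else.

```ocaml
(* Match on (i mod 3, i mod 5) at once instead of nesting ifs. *)
let fizzbuzz i =
  match i mod 3, i mod 5 with
  | 0, 0 -> "FizzBuzz"
  | _, 0 -> "Buzz"
  | 0, _ -> "Fizz"
  | _ -> string_of_int i

let () =
  for i = 1 to 100 do
    print_endline (fizzbuzz i)
  done
```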

Pythagorean theorem

I keep saying that in OCaml every function must have unique input types, and there are no polymorphic functions at all. A small note on terminology, as "polymorphic" refers to two different things. It can mean functions like List.length, which work on a list of any type because they never look inside (I'd just call these "generic functions"; the formal term is "parametric polymorphism"). Or it can mean a function like length that works with multiple container types (called "ad-hoc polymorphism", and that's what most people mean by "polymorphic"). OCaml has generic functions, but no ad-hoc polymorphism, and it's extremely committed to that.
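To make the distinction concrete, here's a sketch of a hypothetical generic function. first_or has type 'a -> 'a list -> 'a, so one definition serves int lists and string lists alike. What OCaml won't let you write is a single length that picks List.length or String.length depending on its argument.

```ocaml
(* One definition, any element type: first_or never inspects the
   elements, it only returns one of them (or the default). *)
let first_or default = function
  | [] -> default
  | x :: _ -> x

let () =
  Printf.printf "%d\n" (first_or 0 [1; 2; 3]);
  Printf.printf "%s\n" (first_or "none" [])
```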

How committed? Well, you can't even use + on two floats.

let a = 3.0;;
let b = 4.0;;
let c = Float.sqrt(a *. a +. b *. b);;

Printf.printf "%f^2 + %f^2 = %f^2\n" a b c;;

Every type has its own +, and *, and so on.
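A small sketch of what that means in practice, using a hypothetical mean function: the sum uses the int ( + ), the division uses the float ( /. ), and crossing between the two needs an explicit float_of_int.

```ocaml
(* Integer sum with ( + ), then float division with ( /. ).
   An empty list would give nan, since 0. /. 0. is nan. *)
let mean xs =
  let sum = List.fold_left ( + ) 0 xs in
  float_of_int sum /. float_of_int (List.length xs)

let () = Printf.printf "%f\n" (mean [1; 2; 3; 4])
```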

The only tiny exception is that =, >, etc. are polymorphic. Of course, the types on both sides must be the same.
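A quick sketch of that exception: the structural comparison operators (and compare) have type 'a -> 'a -> bool (or 'a -> 'a -> int), so the same symbols work across ints, floats, strings, lists, and tuples, as long as both sides have the same type.

```ocaml
(* The same = and < work on many types; only the two sides
   have to agree with each other. *)
let () =
  assert (1 < 2);
  assert (1.5 < 2.5);
  assert ("apple" < "banana");
  assert ([1; 2; 3] = [1; 2; 3]);
  assert (compare (1, "b") (1, "a") > 0);
  print_endline "all comparisons hold"
```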

Printf.printf macro saves us from a lot of nasty code.

Custom operators

One consequence of needing so many different operator variants is that OCaml lets us (and pretty much forces us to) define our own operators. For example this defines +& as addition of two 2D points:

type point = {x: float; y: float};;

let (+&) a b = {x=a.x +. b.x; y = a.y +. b.y};;

let a = {x=1.0; y=2.0};;
let b = {x=2.0; y=5.0};;
let c = a +& b;;

Printf.printf "<%f,%f>\n" c.x c.y;;

Of course we cannot do anything polymorphic here. 2D points of ints, or 3D points of floats, or anything like that would all need their own symbols.

The first character of an operator determines its precedence, so a +& b *& c would be treated as a +& (b *& c), the same way a + b * c is.
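Here's a sketch of that rule, with a hypothetical ( *& ) defined as scaling a point by a float (the post doesn't define *&, so this meaning is an assumption). Because *& starts with *, it binds tighter than +&, so no parentheses are needed.

```ocaml
type point = { x : float; y : float }

let ( +& ) a b = { x = a.x +. b.x; y = a.y +. b.y }

(* Hypothetical scaling operator. Spaces matter here: written without
   them, the opening parenthesis and the star would start a comment. *)
let ( *& ) k p = { x = k *. p.x; y = k *. p.y }

let a = { x = 1.0; y = 2.0 }
let b = { x = 2.0; y = 5.0 }

(* Parsed as a +& (2.0 *& b) because *& starts with a star. *)
let c = a +& 2.0 *& b

let () = Printf.printf "<%f,%f>\n" c.x c.y
```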

It's better than having to do Point2D.add etc., but it's still miserable compared to just having + work on everything, like it works in most other languages.

Oh and OCaml does not love if you reuse field names between different types. So type point3 = {x: float; y: float; z: float};; isn't forbidden, but it causes issues and would require a lot of manual type annotations.
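A sketch of what those annotations look like, reusing the post's point type plus the point3 from the sentence above. Once point3 is defined, its x and y shadow the fields of point, so code that means point has to say so; the norm2 and norm3 helpers are made up for this example.

```ocaml
type point = { x : float; y : float }
type point3 = { x : float; y : float; z : float }

(* point3 is defined last, so x and y refer to its fields by default;
   the annotations tell the compiler which record type is meant. *)
let p : point = { x = 1.0; y = 2.0 }
let q : point3 = { x = 1.0; y = 2.0; z = 2.0 }

let norm2 (p : point) = p.x *. p.x +. p.y *. p.y
let norm3 (q : point3) = q.x *. q.x +. q.y *. q.y +. q.z *. q.z

let () = Printf.printf "%f %f\n" (norm2 p) (norm3 q)
```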

Should you use OCaml?

No. And I'm saying it as someone who's done a lot of OCaml back in the days.

OCaml offered a mix of features that was somewhat appealing a few decades ago - it's a functional garbage-collected language, statically compiled to speeds comparable to Java, with easy-to-understand eager semantics (no laziness and monads), and syntax which, while godawful, at least doesn't have millions of parentheses. All alternatives back then were either not really functional (C, Java), too parenthesized (Lisp), semantically too weird (Haskell), or too slow (Lisp, Ruby; generally Haskell too, unless you put a lot of effort into working around its laziness).

Nowadays most languages have sufficient functional features (even totally non-functional ones like Kotlin), there's a plethora of LLVM-based languages that are fast enough, so OCaml's niche disappeared - and it was a small niche to begin with.

OCaml is also in a weird situation where a lot of users simply don't use the standard "standard library" and instead replace it with their own, and there are multiple such efforts. So is the thing they're using even OCaml? Well, either way, it can't fix the core language issues.

OCaml has far too many quirks, such atrocious syntax, and lacks convenience functions provided by pretty much every language these days. There's no payoff for putting up with all that. Unless you want a job at Jane Street Capital I guess (who have their own replacement standard library too).

Fully committed "Functional Programming" didn't get far, but then neither did fully committed "Object Oriented Programming" (that's just Smalltalk and Ruby). But half-assed functional programming, just like half-assed OOP, is everywhere these days, and it's good enough. Either pick a language that does good-enough functional programming (which is most of them), or accept the sacrifices and go with something like Ruby, Haskell, Clojure, or Racket; they're all much less painful than OCaml.

Code

All code examples for the series will be in this repository.

Code for the OCaml episode is available here.

Top comments (5)

Yawar Amin • Dec 28 '21

Just fyi, a few points in this post which need to be corrected.

ugliest syntax of any major programming language. Double semicolons are only the start

Double semicolons are actually not needed or used in OCaml source code for a long time now. Your 'multiple statements' example would be written idiomatically like this:

let ask_for_name () =
  print_string "What's your name? ";
  read_line ()

let () = Printf.printf "Hello, %s!\n" (ask_for_name ())

There's no string interpolation

However, as you mention later in your post, and as shown above, there is printf-style formatting, so the Fibonacci example would look like:

let () = for i = 1 to 20 do
  Printf.printf "fib(%d) = %d\n%!" i (fib i)
done

It has some (atrocious) macro functionality, which allows Printf.printf (with static template only)

Printf-style formatting is not using macros, it is all defined in the standard library. And, using static templates is actually a security best practice (you may have heard of things like the log4j vulnerability that may happen if you allow format strings to be injected at runtime).

Oh and you might have noticed that OCaml has no idea what Unicode even is

Yes, Unicode is not handled in the language or in the standard library. The string data type does not know or care about the encoding of the string's contents. But there are good third-party libraries that handle it. Unicode is actually very tricky to handle properly and the small OCaml team made the (for them) correct decision to not try to do it themselves. See this thread for more details: reddit.com/r/programming/comments/...

By the way these names generally need to be unique everywhere, so you can't really reuse it in some different interface which would take I int | F float

The simple way to reuse the same constructor names is to put them in different modules. Constructors are unique when namespaced by their modules. So e.g. you can have:

module Num = struct
  type t = I of int | F of float
end

module Pronoun = struct
  type t = I | Me | You | They | Them
end

it runs into a lot of problems in practice.

I'd be interested to know what those problems are.

Printf.printf macro saves us from a lot of nasty code.

Exactly, well, except it's not a macro :-)

but it's still miserable compared to just having + work on everything

Note that you can redefine standard operators like ( + ) for new data types, and many modules do that e.g. github.com/ocaml/Zarith/blob/48524...

Oh and OCaml does not love if you reuse field names between different types. So type point3 = {x: float; y: float; z: float};; isn't forbidden, but it causes issues and would require a lot of manual type annotations.

It's actually not that bad if you follow the normal practice of putting the records in different modules, e.g.

module Point2 = struct
  type t = { x : float; y : float }
end

module Point3 = struct
  type t = { x : float; y : float; z : float }
end

Now it's easy to distinguish between them by just providing the module prefix for at least one field e.g. let p2 = { Point2.x = 1.; y = 1. } or let { Point3.x; _ } = p3.

Should you use OCaml?...No. And I'm saying it as someone who's done a lot of OCaml back in the days.

Interesting. I guess you must have used it a long time ago and in a fairly restricted way. A lot of what I mentioned above is standard practice now.

I will say I agree that just being able to polymorphically print out any data is definitely an OCaml pain point. But on the whole I can't agree with your conclusion :-)

Tomasz Wegrzanowski • Dec 28 '21

Nope, printf is done with special macro system OCaml has just for format strings, you can see the compiler hooks here: github.com/ocaml/ocaml/blob/trunk/...

You cannot actually write printf-like function yourself, or customize the ones ocaml has, without either modifying the compiler, or using an external macro system like camlp4 (or I guess ppx now, I haven't touched that in years).

As for putting names in modules, this doesn't actually solve this issue, as when you want to pass S some_str to a polymorphic function, you'd need to know exact type of polymorphism it supports, like foo (SIF.S s) vs bar (SI.S s).

This isn't required by any theoretical reason, Haskell deals with this perfectly fine, and even SML had some limited type classes IIRC.

I'm not sure there's one standard OCaml style like with Python. I double checked it with OCaml source code, and it's full of ;;s. And tbh I'm not sure starting a lot of subsequent lines with let () = as it also sometimes does is really an improvement. I'll leave it to the readers to decide which one is more readable (first one from some tests in OCaml source):

let () = show (not_greater_equal 1.0 2.0)
let () = show (not_greater_equal 1.0 1.0)
let () = show (not_greater_equal 2.0 1.0)
let () = show (not_greater_equal 1.0 nan)
let () = print_line ()

show (not_greater_equal 1.0 2.0);;
show (not_greater_equal 1.0 1.0);;
show (not_greater_equal 2.0 1.0);;
show (not_greater_equal 1.0 nan);;
print_line ();;

Yawar Amin • Dec 28 '21

Nope, printf is done with special macro system OCaml has just for format strings

Format strings are part of the equation, yes, but as the comment you pointed to notes, the printf functionality is defined in normal functions in the Printf/Scanf/Format modules. Format strings are just a syntax sugar; it's possible to desugar them to (something like) printf [Text "Hello, "; Format_s; Text "!\n"] name which would do the same job, just a little more verbosely.

This isn't required by any theoretical reason, Haskell deals with this perfectly fine, and even SML had some limited type classes IIRC.

Well, Haskell and OCaml explicitly are in very different design spaces here, and both have valid reasons for doing what they do. OCaml's more explicit style makes compilation faster and error messages simpler, for example. And SML doesn't have type classes, unless you're talking about the built-in non-extensible equality types.

I double checked it with OCaml source code, and it's full of ;;s

Yes, very old code still uses ;; but that has not been the convention for a long time now. For example, if you check the link you gave above, very little of that file is using ;;. And if you use ocamlformat, the standard code formatter, it will idiomatically remove ;;.

And tbh I'm not sure starting a lot of subsequent lines with let () = as it also sometimes does is really an improvement

Yes, we actually don't need to do that either. We can use ; which is the sequencing operator to do it pretty cleanly:

let () =
  show (not_greater_equal 1.0 2.0);
  show (not_greater_equal 1.0 1.0);
  show (not_greater_equal 2.0 1.0);
  show (not_greater_equal 1.0 nan);
  print_line ()

Lomig • Dec 28 '21

Saying that the double semicolon is needed is proof enough that you cannot possibly have written your fair share of OCaml - or at least not within the last fifteen years.

And it speaks for the bad faith of the rest. How sad.

E.R. Nurwijayadi • Jan 6 '22

I respect your writing.

I can code... a lot, and clean.
But explaining concepts clearly to other people is a whole new level.

How do you manage to do that?
