This is a guide to OCaml as a pragmatic, general-purpose language that can scale with your everyday programming needs.
Why OCaml? I think many people have attempted to answer this question:
- https://dev.realworldocaml.org/prologue.html#scrollNav-1
- https://www2.lib.uchicago.edu/keith/ocaml-class/why.html
- https://blog.janestreet.com/why-ocaml/
But I'll try to answer from the perspective of this guide. Often when people need some automation, they will reach for Python, or nowadays maybe Go (Golang). I hope to show that OCaml can cover these needs, because it has:
- Lightweight syntax, like Python
- An interpreted mode, like Python, so you can run quick scripts immediately
- Powerful interactive environment (REPL), like Python, so you can develop and test code in an exploratory style
- Powerful type inference, so you almost never need to add type annotations to code, like Python
- Fast compiler, like Go, so you can iterate quickly
- Native compilation, like Go, so you can deploy a single binary instead of needing to set up VMs
- Efficient compiled programs, like Go, so you can enjoy instant program startup and fast runtimes
Lastly,
- A strong static typechecker (stronger than Go), so you can catch a lot of code issues very quickly.
To balance all of these out a bit, let's ask: why not OCaml?
- Small, niche language and community. For certain projects that may be more critical to your company, nothing beats the ecosystem of e.g. Java, Ruby, or .NET.
- Unfamiliar syntax. This is really a sticking point for many people. I'll say a bit more about this later.
Setup
Follow the instructions here: https://ocaml.org/install
At this point you have opam, the OCaml package manager, installed. Now, refresh the package repository info and install some standard OCaml packages:
$ opam update
$ opam install lwt dune utop ocaml-lsp-server alcotest odoc angstrom piaf ppx_deriving_cmdliner ppx_deriving_yojson
Explanations of these packages:
- lwt: asynchronous promises library, works very similarly to Python or Node.js async
- dune: standard build tool
- utop: REPL (interactive environment)
- ocaml-lsp-server: editor support tool
- alcotest: unit testing library
- odoc: documentation generator
- angstrom: parser combinator library, very handy for quickly parsing arbitrary data
- piaf: lightweight web client and server
- ppx_deriving_cmdliner: PPX (roughly, a macro) to generate command-line parsing code from simple OCaml types
- ppx_deriving_yojson: PPX to generate JSON encoders/decoders from OCaml types
After installing the above packages, run opam list to check the full list of installed packages (many were installed as dependencies of others).
Small aside: opam works by solving the dependency versions of all packages it is asked to install, then finding a set of package versions that are all consistent with each other. So ideally, we want to give it all the packages we need in one shot. Of course, we can and will install more later, but this is just good to keep in mind.
Finally, you will need to install an editor plugin to get support features like syntax highlighting, type on hover, go to definition, etc. For VS Code the recommended plugin is OCaml Platform.
Basic checks
Let's first do some basic checks that the system is working. Open up the REPL:
$ utop
And enter this code at the prompt:
let fizzbuzz n = match n mod 3, n mod 5 with
  | 0, 0 -> "FizzBuzz"
  | 0, _ -> "Fizz"
  | _, 0 -> "Buzz"
  | _ -> string_of_int n
let () = for i = 1 to 20 do
  print_endline (fizzbuzz i)
done;;
Note, the ;; is only needed to tell the REPL when the block of code is fully entered. Without it we can freely move around with arrow keys and edit the code.
The REPL should show this output:
1
2
Fizz
4
Buzz
Fizz
...
This shows the fizzbuzz function running for inputs from 1 to 20.
The whirlwind tour
There is a concise tutorial at https://ocaml.org/releases/4.12/htmlman/coreexamples.html and it's a really good idea to go through it to get the basics. But a super concise crash course:
- Bind a variable to a value:
let x = 1
- Create a function:
let add_one x = x + 1
The x is the parameter and the expression on the right of the = sign is the function body. A good way to think about OCaml functions is the substitution model: functions are evaluated by substituting their parameters with the arguments that are actually given when the function is called. So e.g.
add_one x = x + 1
add_one 0 = 0 + 1
= 1
Technically, bindings have patterns on the left-hand side of the = sign. Patterns are any literal OCaml value or structure, or variables, or some combination of the two. E.g., the following are valid bindings:
let 1 = 1
let 2 = 1
Of course, we don't normally bind like this because it can (and usually will) throw an exception at runtime, when OCaml finds out that the values are not actually the same. But it's good to keep this piece of knowledge in the back of your mind for later.
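Where patterns in bindings do earn their keep is in taking structured values apart. Here is a small sketch of that; the names quotient, remainder, point, px and py are made up for this illustration. These patterns always match, so unlike let 2 = 1 they cannot fail at runtime:

```ocaml
(* Tuple pattern: binds two variables in one go. *)
let (quotient, remainder) = (17 / 5, 17 mod 5)

(* Record pattern: pulls the fields out of a record value.
   The type and field names are hypothetical, just for this example. *)
type point = { x : int; y : int }
let { x = px; y = py } = { x = 3; y = 4 }

let () = Printf.printf "%d %d %d %d\n" quotient remainder px py
(* prints: 3 2 3 4 *)
```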
- Define a variable in a limited scope:
let print_sum x y =
  let sum = x + y in
  Printf.printf "Sum: %d\n" sum
The let ... in ... syntactic form is composable, so we can nest it:
let print_sum x y =
  let sum = x + y in
  let sum_string = string_of_int sum in
  Printf.printf "Sum: %s\n" sum_string
It may be easier to understand the structure if we indent it like this:
let print_sum x y =
  let
    sum = x + y
  in
  let
    sum_string = string_of_int sum
  in
  Printf.printf "Sum: %s\n" sum_string
But unlike Python, OCaml syntax is not whitespace-sensitive, and is usually written in a compact style.
- Define a record type (like a Python namedtuple or a Go struct):
type file = { name : string; contents : string }
let my_file = { name = "README.md"; contents = "Hello!" }
Note: in OCaml, strings are just bytestrings. They aren't assumed to have any encoding e.g. UTF-8. If we need Unicode support we can use libraries like Camomile to get it. But often, we don't need it because we're just shuttling strings back and forth or handling them in a very limited way.
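To make the bytestring point concrete, here is a tiny sketch. The string e_acute is the letter é spelled out as its two UTF-8 bytes (written with escapes so the example does not depend on how the source file is encoded); String.length counts bytes, not characters:

```ocaml
(* "é" as its two UTF-8 bytes, 0xC3 0xA9. *)
let e_acute = "\xc3\xa9"

let () =
  (* One character on screen, but two bytes as far as OCaml is concerned. *)
  Printf.printf "%d\n" (String.length e_acute);
  (* Plain ASCII: one byte per character. *)
  Printf.printf "%d\n" (String.length "README.md")
(* prints 2, then 9 *)
```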
- Define a variant type (like an enum type but more powerful):
type payment_method =
| Cheque of string
| Credit_card of string * string * string
let to_string payment_method = match payment_method with
| Cheque number ->
Printf.sprintf "Cheque # %s" number
| Credit_card (name, number, expiry) ->
Printf.sprintf
"Credit card # %s, cardholder name %s, expiry %s"
number
name
expiry
Note: match ... with ... is a super important construct in OCaml. It is the main workhorse for logic algorithms and ensuring all cases are covered. Here's an example of what it can do: https://www.reddit.com/r/programming/comments/n2639k/ocaml_typechecker_catches_a_redundant_rule_in/
In this example Cheque and Credit_card are called 'cases' or 'constructors'. If you've ever done some high school algebra you'll have come across functions that are defined by case analysis, e.g.:
f(0) = 0
f(x) = 1/x
Variants and pattern matching are a way of bringing this notation into programming.
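As a rough sketch, the algebra-style definition of f above can be transcribed almost line for line. I'm assuming floats here so that 1/x is a meaningful division rather than integer division:

```ocaml
(* f(0) = 0, f(x) = 1/x, written as a pattern match on a float. *)
let f x = match x with
  | 0. -> 0.
  | x -> 1. /. x

let () = Printf.printf "%g %g\n" (f 0.) (f 4.)
(* prints: 0 0.25 *)
```

The first case that matches wins, so the 0. case has to come before the catch-all x case.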
- Define and use modules:
In OCaml, modules are the unit of organization of code. They also serve as the unit of compilation, and as namespaces (among other uses). Modules are very powerful and are one of the 'secret weapons' in OCaml.
In an OCaml project, each source file automatically becomes a module. E.g. if you have a file myprog.ml, the compiler derives a module Myprog from it (the first letter is capitalized). All modules in a project are automatically 'in scope', or visible. So we don't need to import anything. E.g. if you have a source file:
(* lib.ml *)
let x = 1
Then the module Lib is available to the rest of your program, and so is the value Lib.x. This is one of the things that makes programming in OCaml so productive: the compiler takes care of mundane details like importing values.
- Define nested modules:
(* lib.ml *)
module Print_endline = struct
let int = Printf.printf "%d\n"
let string = print_endline
end
let print_person name age =
Print_endline.string name;
Print_endline.int age
This is one of the killer features of modules that few other languages possess. You can tie together discrete units of code inside files and use them to neatly organize, document, and control visibility of the code.
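One way to control visibility with a nested module is to give it a signature that lists only what should be usable from outside. The Counter module below is a made-up example, not part of this guide's project:

```ocaml
(* Hypothetical module: only `next` is listed in the signature,
   so `count` cannot be touched from outside. *)
module Counter : sig
  val next : unit -> int
end = struct
  let count = ref 0
  let next () =
    incr count;
    !count
end

let () =
  let a = Counter.next () in
  let b = Counter.next () in
  Printf.printf "%d %d\n" a b
(* prints: 1 2 *)
```

Trying to refer to Counter.count outside the module is a compile-time error, because the signature does not mention it.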
Note: in OCaml, we often want to perform some operations (let's informally call them 'actions') in sequence. Actions are just expressions that return no meaningful value (in other words, ()), and are composed using the ; operator, so e.g. ACTION1; ACTION2; ...; ACTIONn. We typically say that actions are performed solely for their side effects e.g. printing something out.
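A minimal sketch of sequencing, using a Buffer instead of printing so the order of the side effects is easy to inspect afterwards. The names log and record_steps are invented for this example:

```ocaml
(* A buffer that collects text as a visible side effect. *)
let log = Buffer.create 16

(* Three actions, each returning (), run in order with `;`. *)
let record_steps () =
  Buffer.add_string log "first;";
  Buffer.add_string log "second;";
  Buffer.add_string log "third"

let () =
  record_steps ();
  print_endline (Buffer.contents log)
(* prints: first;second;third *)
```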
- Run the 'main' program:
let () = print_sum 1 2
Unlike Go or Python, there is no specific main entrypoint into a program in OCaml. Any code that is not a function definition is executed immediately when a module is loaded during program startup.
Typically, we arrange our programs so that there is one obvious entrypoint module, perhaps named main.ml, and bind the main code (actions) to a () pattern to indicate that it returns no meaningful result. This is a safe binding because the () value (pronounced 'unit') is the only value of its type, unit. So there's no possible way for it to fail to match or throw an exception at runtime.
Of course, if you are using it for scripting, then you would call the file whatever is appropriate and put the entrypoint in there.
Proof of concept
Let's make a small project in OCaml to prove out its practicality for general-purpose programming. We're going to implement a StatsD filter proxy like the one described by Alan Ning in Optimizing 700 CPUs Away With Rust. Be prepared to read and try to get your head around some OCaml code here (not a lot though)!
Note, this is not about competing with Rust–benchmarks are not my goal here–but about proving out OCaml's feasibility for all kinds of projects.
Short recap: StatsD is an application performance monitoring tool that runs as a daemon. Any application can send statistics to it. The statistics are simple lines that look like:
foo:1|c
That is a metric named foo, which is a counter (c), and here we are incrementing the counter by 1. (As an aside, I would have chosen the format foo:c=1, but that seems obvious to me as an OCaml programmer!)
The protocol is super simple–not even HTTP, just raw UDP datagrams.
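To get a feel for how simple these lines are, here is a sketch of pulling one apart with nothing but the standard library. parse_metric is a hypothetical helper, not something the proxy below needs (it only filters and forwards lines as-is), and it assumes the name:value|type shape shown above:

```ocaml
(* Hypothetical helper: split "name:value|type" into its three parts.
   Anything after the type (e.g. a sample rate) is ignored. *)
let parse_metric line =
  match String.split_on_char ':' line with
  | [ name; rest ] -> (
      match String.split_on_char '|' rest with
      | value :: kind :: _ -> Some (name, value, kind)
      | _ -> None)
  | _ -> None

let () =
  match parse_metric "foo:1|c" with
  | Some (name, value, kind) ->
    Printf.printf "name=%s value=%s type=%s\n" name value kind
  | None -> print_endline "not a metric line"
(* prints: name=foo value=1 type=c *)
```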
The filter proxy in this small project is a service that receives these metrics lines, filters them out by checking against a blocklist, and forwards any unfiltered metrics lines to the actual StatsD daemon.
First, create a directory ocaml_statsd_filter to house the project:
$ mkdir ocaml_statsd_filter
$ cd ocaml_statsd_filter
Now, create the project files:
- dune-project:
(lang dune 3.6)
- dune:
(executable
(name ocaml_statsd_filter)
(libraries str unix))
This file describes the main executable, named ocaml_statsd_filter, and what libraries it depends on. Libraries are basically groups of (one or more) modules that are distributed together. The two libraries used here are str for regular expressions and unix for Unix programming (communicating with sockets, managing files, etc.). These two happen to be distributed with OCaml but need to be explicitly listed as dependencies because they're not portable to all platforms.
To learn more about dune, visit its website: https://dune.build/ .
- cfg.ml:
let listen_port =
try int_of_string (Sys.getenv "listen_port") with Not_found -> 8125
let forward_host = try Sys.getenv "forward_host" with Not_found -> "127.0.0.1"
let forward_port =
try int_of_string (Sys.getenv "forward_port") with Not_found -> 8126
let blocklist =
try
"blocklist"
|> Sys.getenv
|> String.split_on_char ','
|> List.map Str.regexp_string
with
Not_found -> []
This helper module reads configuration values that are passed in as environment variables:
- listen_port: what port to listen on for metrics
- forward_host: what host to forward the metrics to
- forward_port: what port to forward the metrics to on forward_host
- blocklist: comma-separated list of words to block, e.g. foo,bar, split into a list of regular expressions that we can check against
If any of the above environment variables are not found, it uses OCaml's convenient exception handling to substitute defaults.
Getting configuration from environment variables is as simple as it gets! We could easily have done something more complex though, like reading a JSON file to get the configuration. In practice, environment variables are often more than good enough, and easy to feed into services using tools like systemd.
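For comparison, here is a sketch of the same idea using option-returning functions instead of exceptions. env_int is a hypothetical helper, and it behaves slightly differently from cfg.ml: a value that is set but not a number (say listen_port=abc) falls back to the default here, whereas int_of_string in cfg.ml would raise Failure at startup:

```ocaml
(* Hypothetical helper: read an integer setting from the environment,
   falling back to `default` when it is unset or not a valid integer. *)
let env_int name default =
  match Sys.getenv_opt name with
  | None -> default
  | Some raw -> (
      match int_of_string_opt raw with
      | Some n -> n
      | None -> default)

(* The variable name below is made up and assumed to be unset. *)
let () = Printf.printf "%d\n" (env_int "surely_unset_example_port" 8125)
```

Which behaviour is preferable is a judgment call: failing loudly on a malformed port is arguably the safer default for a real service.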
Note, the |> operator is called 'pipe-forward', and it lets you write a series of function applications in a left-to-right direction without parentheses, instead of right-to-left. This TC39 proposal to add it to JavaScript explains the reasoning fairly well. (It comes out of the box with OCaml.)
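A small side-by-side sketch of the two styles; nested and piped are names invented for this example and compute the same list:

```ocaml
(* Right-to-left: read from the innermost parentheses outwards. *)
let nested = List.map String.uppercase_ascii (String.split_on_char ',' "foo,bar")

(* Left-to-right: start with the data, then apply each step in order. *)
let piped =
  "foo,bar"
  |> String.split_on_char ','
  |> List.map String.uppercase_ascii

let () =
  assert (nested = piped);
  print_endline (String.concat " " piped)
(* prints: FOO BAR *)
```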
- ocaml_statsd_filter.ml:
Before we get to the code, just a note that the name of this file is significant: we need a file name that matches the name given in the name field in the dune file. In other words, we have (name ocaml_statsd_filter) in the dune file, so we need to have a module Ocaml_statsd_filter in the project.
open Unix
let bufsize = 8192
let buf = Bytes.create bufsize
let forward_addr = ADDR_INET (inet_addr_of_string Cfg.forward_host, Cfg.forward_port)
let forward_sock = socket PF_INET SOCK_DGRAM 0
let allow data = Cfg.blocklist
|> List.exists (fun regexp -> Str.string_match regexp data 0)
|> not
let process inc _ =
let in_descr = descr_of_in_channel inc in
let read_len, _ = recvfrom in_descr buf 0 bufsize [] in
let buf_str = Bytes.sub_string buf 0 read_len in
if allow buf_str then begin
ignore (send forward_sock buf 0 read_len []);
print_string ("Sent: " ^ buf_str)
end else
print_string ("Did not send: " ^ buf_str)
let () =
connect forward_sock forward_addr;
establish_server process (ADDR_INET (inet_addr_any, Cfg.listen_port))
This is the main executable. It works like this:
- For each incoming request, fork a new process
- In the process to handle the request, read up to 8192 bytes (i.e., 8 KB) of data from the request, and forward it to the forwarding address unless it starts with a blocklisted word (Str.string_match only matches at the given position, here 0, so this is a prefix check)
This is all from built-in functionality in the Unix module:
- Multi-process request handling server
- Using sockets to communicate
We are using the Unix functionality so heavily in this program that we just open the module at the top of the file, something that we rarely do in OCaml programs, because opening modules liberally pollutes the current scope with the contents of those modules and makes code harder to understand.
In this case it's worth it because it's a small module and almost all the functionality in it is from Unix. You can hover over the various functions in your editor to get the types and documentation about them. Full documentation is also available at https://ocaml.org/releases/4.12/api/Unix.html .
By the way, the name Unix is actually a slight misnomer; the functionality is mostly portable to Windows, and exceptions are noted in the documentation. The idea behind the module is to provide a common set of system programming functionality, very similar to C.
A few points to note about this server:
- It first sets up a socket pointing to the 'upstream' server (the one that it forwards the metrics to), since it will continuously send data to it
- It doesn't establish a connection to the 'downstream' clients; it just reads data from them in unconnected mode
- It reads only 8 KB of data by default; this should be enough for StatsD metrics lines based on the StatsD documentation. Also, that's what Alan Ning's Rust implementation does.
For more Unix socket programming in OCaml, check this chapter of the OCaml Unix book.
Test run
Now let's test out the PoC:
$ OCAMLRUNPARAM=b blocklist=foo,bar dune exec ./ocaml_statsd_filter.exe
The main command here is dune exec ./ocaml_statsd_filter.exe. The .exe extension tells dune to build and run a native executable. Building a bytecode version is also possible, but in this guide we will test the native version only.
The two environment variables passed to the process are:
- OCAMLRUNPARAM=b: this tells OCaml to print a full stack trace if there is an exception; normally it only prints the exception message
- blocklist=foo,bar: this is handled specifically by our Cfg module above, and blocks any metrics with the words foo or bar from being forwarded upstream
In a separate terminal window, try sending messages which should be blocked:
$ echo 'foo:1|c' | nc localhost 8125
$ echo 'bar:1|c' | nc localhost 8125
Then check the server–it should show the following messages:
Did not send: foo:1|c
Did not send: bar:1|c
Next, try sending a message which shouldn't be blocked:
$ echo 'baz:1|c' | nc localhost 8125
$ echo 'baz:1|c' | nc localhost 8125
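(We need to do this a couple of times to trigger the error we want to see on some systems, because of a peculiarity in how UDP connections work: the first send to a closed port appears to succeed, and the resulting error is only reported on a later send.)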
Then check the server:
Fatal error: exception Unix.Unix_error(Unix.ECONNREFUSED, "send",
"")
Raised by primitive operation at file "unix.ml", line 642,
characters 7-39
Called from file "ocaml_statsd_filter.ml", line 17, characters 11-48
Called from file "unix.ml", line 1195, characters 12-37
Called from file "ocaml_statsd_filter.ml", line 24, characters 2-71
This throws an exception because there is no server upstream listening to the port we're sending to right now. If you do happen to have a StatsD daemon running on the port, it should work!
Review
I hope this gives you an idea that OCaml is surprisingly capable out of the box. A few key points to note:
- There really are no type annotations needed anywhere. It looks just like a scripting language. People sometimes don't like this and want to annotate everything. My approach is to let my editor plugin and my compiler tell me what the types are and if I'm getting it wrong–it's easy to fix. And as you work in the codebase over time, you get used to not having any explicit types in the implementations, just like dynamic typing programmers do, but with the added benefit of static typing!
- The syntax is a bit weird though. There is a reason why each piece of the syntax is the way it is: it's a balance between accreting new language features over time and preserving a fast, unambiguous syntax for a fast parser. But for those who can't get used to it, ReasonML syntax is always a possibility. It's easy to install with opam (opam install reason), and dune supports it out of the box (just start writing .re files instead of .ml, and so on). It's a familiar syntax that's designed to look like JavaScript but compile to the same native executable that regular old OCaml syntax does.
- We didn't actually end up using any of the opam libraries we installed earlier. But, stay tuned!
Top comments (5)
I was having problems installing the packages. I was seeing the error
Thought that was weird because I have gcc in my Mac. Got the error below when checking gcc
To fix it I ran
xcode-select --install
Hope this helps anyone who is having the same problem.
Thanks Ali, great tip. Indeed, Apple Developer Tools setup is a prerequisite to installing opam on macOS.
Fun read @yawaramin ! I didn’t know StatsD worked like that. What else are you planning to write?
If Practical OCaml were ever a book, I’d imagine it to be a cookbook with many recipes like these. Working code examples that introduce features in small enough chunks to view. e.g., gobyexample, or maybe rust in action
Thank you Dennis. Yeah that style is my ideal too. I'm incubating some more posts about Lwt and other topics. I'll try to make a 'Practical OCaml' series here.
Now I understand why F# is basically microsofted ocaml )
Nice article