DEV Community

Yawar Amin


Practical OCaml

This is a guide to OCaml as a pragmatic, general-purpose language that can scale with your everyday programming needs.

Why OCaml? I think many people have attempted to answer this question:

But I'll try to answer from the perspective of this guide. Often when people need some automation, they will reach for Python, or nowadays maybe Go (Golang). I hope to show that OCaml can cover these needs, because it has:

  • Lightweight syntax, like Python
  • An interpreted mode, like Python, so you can run quick scripts immediately
  • Powerful interactive environment (REPL), like Python, so you can develop and test code in an exploratory style
  • Powerful type inference, so you almost never need to add type annotations to code, like Python
  • Fast compiler, like Go, so you can iterate quickly
  • Native compilation, like Go, so you can deploy a single binary instead of needing to set up VMs
  • Efficient compiled programs, like Go, so you can enjoy instant program startup and fast runtimes

Lastly,

  • A strong static typechecker (stronger than Go), so you can catch a lot of code issues very quickly.

To balance all of these out a bit, let's ask: why not OCaml?

  • Small, niche language and community. For certain projects that may be more critical to your company, nothing beats the ecosystem of e.g. Java, Ruby, or .NET.
  • Unfamiliar syntax. This is really a sticking point for many people. I'll say a bit more about this later.

Setup

Follow the instructions here: https://ocaml.org/install

At this point you have opam, the OCaml package manager, installed. Now, refresh the package repository info and install some standard OCaml packages:

$ opam update
$ opam install lwt dune utop ocaml-lsp-server alcotest odoc angstrom piaf ppx_deriving_cmdliner ppx_deriving_yojson

Explanations of these packages:

  • lwt: asynchronous promises library, works very similarly to Python or Node.js async
  • dune: standard build tool
  • utop: REPL (interactive environment)
  • ocaml-lsp-server: editor support tool
  • alcotest: unit testing library
  • odoc: documentation generator
  • angstrom: parser combinator library, very handy for quickly parsing arbitrary data
  • piaf: lightweight web client and server
  • ppx_deriving_cmdliner: PPX (roughly, a macro) to generate command-line parsing code from simple OCaml types
  • ppx_deriving_yojson: PPX to generate JSON encoders/decoders from OCaml types

After installing the above packages, run opam list to check the full list of installed packages (many were installed as dependencies of others).

Small aside: opam works by solving the dependency versions of all packages it is asked to install, then finding a set of package versions that are all consistent with each other. So ideally, we want to give it all the packages we need in one shot. Of course, we can and will install more later, but this is just good to keep in mind.

Finally, you will need to install an editor plugin to get support features like syntax highlighting, type on hover, go to definition, etc. For VS Code the recommended plugin is OCaml Platform.

Basic checks

Let's first do some basic checks that the system is working. Open up the REPL:

$ utop

And enter this code at the prompt:

let fizzbuzz n = match n mod 3, n mod 5 with
  | 0, 0 -> "FizzBuzz"
  | 0, _ -> "Fizz"
  | _, 0 -> "Buzz"
  | _ -> string_of_int n

let () = for i = 1 to 20 do
  print_endline (fizzbuzz i)
done;;

Note, the ;; is only needed to tell the REPL when the block of code is fully entered. Without it we can freely move around with arrow keys and edit the code.

The REPL should show this output:

1
2
Fizz
4
Buzz
Fizz
...

This shows the fizzbuzz function running for inputs from 1 to 20.

The whirlwind tour

There is a concise tutorial at https://ocaml.org/releases/4.12/htmlman/coreexamples.html and it's a really good idea to go through it to get the basics. But here is a super concise crash course:

  • Bind a variable to a value:
let x = 1
  • Create a function:
let add_one x = x + 1

The x is the parameter and the expression on the right of the = sign is the function body. A good way to think about OCaml functions is the substitution model: functions are evaluated by substituting their parameters with the arguments that are actually given when the function is called. So e.g.

add_one x = x + 1
add_one 0 = 0 + 1
          = 1

Technically, bindings have patterns on the left-hand side of the = sign. Patterns are any literal OCaml value or structure, or variables, or some combination of the two. E.g., the following are valid bindings:

let 1 = 1
let 2 = 1

Of course, we don't normally bind like this because it can (and usually will) throw an exception at runtime, when OCaml finds out that the values are not actually the same. But it's good to keep this piece of knowledge in the back of your mind for later.
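
A more typical use of patterns on the left-hand side is destructuring. Here is a minimal sketch; the names point, px, py, and safe_first are made up for illustration, and the warning attribute is only there to silence the compiler's partial-match warning for this demo:

```ocaml
(* Destructure a tuple: the pattern (px, py) binds two variables at once. *)
let point = (3, 4)
let (px, py) = point

(* A pattern that may not fit the value still compiles (normally with a
   partial-match warning), but raises Match_failure at runtime when the
   value has a different shape. Here the pattern expects exactly two
   elements and the list has three. *)
let safe_first =
  try
    let[@warning "-8"] [x; _] = [1; 2; 3] in
    Some x
  with Match_failure _ -> None
```

In everyday code we would use match ... with ... (shown further down) instead of catching Match_failure, so that every shape is handled explicitly.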

  • Define a variable in a limited scope:
let print_sum x y =
  let sum = x + y in
  Printf.printf "Sum: %d\n" sum

The let ... in ... syntactic form is composable, so we can nest them:

let print_sum x y =
  let sum = x + y in
  let sum_string = string_of_int sum in
  Printf.printf "Sum: %s\n" sum_string

It may be easier to understand the structure if we indent it like this:

let print_sum x y =
  let
    sum = x + y
  in
    let
      sum_string = string_of_int sum
    in
      Printf.printf "Sum: %s\n" sum_string

But unlike Python, OCaml syntax is not whitespace-sensitive, and is usually written in a compact style.

  • Define a record type (like a Python namedtuple or a Go struct):
type file = { name : string; contents : string }

let my_file = { name = "README.md"; contents = "Hello!" }

Note: in OCaml, strings are just bytestrings. They aren't assumed to have any encoding e.g. UTF-8. If we need Unicode support we can use libraries like Camomile to get it. But often, we don't need it because we're just shuttling strings back and forth or handling them in a very limited way.
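
To make the bytestring point concrete, here is a small sketch using only the standard library. The names e_acute, byte_count, and first_byte are illustrative; the string is written with byte escapes so that it does not depend on the encoding of the source file:

```ocaml
(* "é" encoded as UTF-8 is the two bytes 0xC3 0xA9. *)
let e_acute = "\xc3\xa9"

(* String.length counts bytes, not characters, so this is 2. *)
let byte_count = String.length e_acute

(* Indexing returns a single byte (as a char), not a Unicode character. *)
let first_byte = Char.code e_acute.[0]
```

This is why passing strings through untouched is safe, while operations like "take the first character" need a Unicode-aware library.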

  • Define a variant type (like an enum type but more powerful):
type payment_method =
| Cheque of string
| Credit_card of string * string * string

let to_string payment_method = match payment_method with
  | Cheque number ->
    Printf.sprintf "Cheque # %s" number
  | Credit_card (name, number, expiry) ->
    Printf.sprintf
      "Credit card # %s, cardholder name %s, expiry %s"
      number
      name
      expiry

Note: match ... with ... is a super important construct in OCaml. It is the main workhorse for branching logic, and the compiler uses it to check that all cases are covered. Here's an example of what it can do: https://www.reddit.com/r/programming/comments/n2639k/ocaml_typechecker_catches_a_redundant_rule_in/

In this example Cheque and Credit_card are called 'cases' or 'constructors'. If you've ever done some high school algebra you'll have come across functions that are defined by case analysis, e.g.:

f(0) = 0
f(x) = 1/x

Variants and pattern matching are a way of bringing this notation into programming.
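
As a rough sketch, the case analysis above could be written in OCaml like this (using floats, since 1/x is not very interesting on integers):

```ocaml
(* f(0) = 0, and f(x) = 1/x for every other x.
   Cases are tried from top to bottom, so the literal 0. is checked first
   and the variable pattern x catches everything else. *)
let f x =
  match x with
  | 0. -> 0.
  | x -> 1. /. x
```

Each line of the mathematical definition becomes one case of the match, in the same order.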

  • Define and use modules:

In OCaml, modules are the unit of organization of code. They also serve as the unit of compilation, and as namespaces (among other uses). Modules are very powerful and are one of the 'secret weapons' in OCaml.

In an OCaml project, each source file automatically becomes a module. E.g. if you have a file myprog.ml, the compiler derives a module Myprog from it (the first letter is capitalized). All modules in a project are automatically 'in scope', or visible. So we don't need to import anything. E.g. if you have a source file:

(* lib.ml *)
let x = 1

Then the module Lib is available to the rest of your program, and so is the value Lib.x. This is one of the things that makes programming in OCaml so productive–the compiler takes care of mundane details like importing values.

  • Define nested modules:
(* lib.ml *)

module Print_endline = struct
  let int = Printf.printf "%d\n"
  let string = print_endline
end

let print_person name age =
  Print_endline.string name;
  Print_endline.int age

This is one of the killer features of modules that few other languages possess. You can tie together discrete units of code inside files and use them to neatly organize, document, and control visibility of the code.
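
To illustrate the "control visibility" part, here is a minimal sketch of a nested module with a signature. The module Counter and the values a and b are hypothetical, not part of this guide's project:

```ocaml
(* The signature (sig ... end) lists what the outside world may use.
   Anything not listed, like count below, stays private to the module. *)
module Counter : sig
  val next : unit -> int
end = struct
  let count = ref 0

  let next () =
    incr count;
    !count
end

let a = Counter.next ()
let b = Counter.next ()
(* Counter.count would be a compile error here: it is hidden by the signature. *)
```

Here a is 1 and b is 2, and the only way to change the counter is through Counter.next.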

Note: in OCaml, we often want to perform some operations (let's informally call them 'actions') in sequence. Actions are just expressions that return no meaningful value (in other words, ()), and are composed using the ; operator, so e.g. ACTION1; ACTION2; ...; ACTIONn. We typically say that actions are performed solely for their side effects e.g. printing something out.
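
A small sketch of sequencing actions with ; follows. It appends to a Buffer rather than printing, purely so that the order of effects is easy to inspect; the names log and record_steps are illustrative:

```ocaml
let log = Buffer.create 16

(* Three actions in sequence. Each one returns (), and ; runs them in order. *)
let record_steps () =
  Buffer.add_string log "one;";
  Buffer.add_string log "two;";
  Buffer.add_string log "three"

let () = record_steps ()

(* The buffer now holds the effects in the order they were performed. *)
let result = Buffer.contents log
```

After this runs, result is "one;two;three".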

  • Run the 'main' program:
let () = print_sum 1 2

Unlike Go or Python, there is no specific main entrypoint into a program in OCaml. Any code that is not a function definition is executed immediately when a module is loaded during program startup.

Typically, we arrange our programs so that there is one obvious entrypoint module, perhaps named main.ml, and bind the main code (actions) to a () pattern to indicate that it returns no meaningful result. This is a safe binding because the () value (pronounced 'unit') is the only value of its type, unit. So there's no possible way for it to fail to match or throw an exception at runtime.

Of course, if you are using it for scripting, then you would call the file whatever is appropriate and put the entrypoint in there.

Proof of concept

Let's make a small project in OCaml to prove out its practicality for general-purpose programming. We're going to implement a StatsD filter proxy like the one described by Alan Ning in Optimizing 700 CPUs Away With Rust. Be prepared to read and try to get your head around some OCaml code here (not a lot though)!

Note, this is not about competing with Rust–benchmarks are not my goal here–but about proving out OCaml's feasibility for all kinds of projects.

Short recap: StatsD is an application performance monitoring tool that runs as a daemon. Any application can send statistics to it. The statistics are simple lines that look like:

foo:1|c

That is a metric named foo, which is a counter (c), and here we are incrementing the counter by 1. (As an aside, I would have chosen the format foo:c=1, but that seems obvious to me as an OCaml programmer!)
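
The proxy in this guide never needs to parse these lines, but as an illustration of the format, here is a simplified sketch of a parser. The metric record and parse_metric function are hypothetical, and the sketch only handles the basic name:value|type shape shown above, ignoring any further |-separated fields:

```ocaml
type metric = { name : string; value : string; kind : string }

(* Split "foo:1|c" into its name, value, and type parts.
   Returns None when the line does not have the expected shape. *)
let parse_metric line =
  match String.index_opt line ':' with
  | None -> None
  | Some colon ->
    let name = String.sub line 0 colon in
    let rest =
      String.sub line (colon + 1) (String.length line - colon - 1)
    in
    (match String.split_on_char '|' rest with
     | value :: kind :: _ -> Some { name; value; kind }
     | _ -> None)
```

For example, parse_metric "foo:1|c" gives Some { name = "foo"; value = "1"; kind = "c" }, while a line without a colon gives None.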

The protocol is super simple–not even HTTP, just raw UDP datagrams.

The filter proxy in this small project is a service that receives these metrics lines, filters them out by checking against a blocklist, and forwards any unfiltered metrics lines to the actual StatsD daemon.

First, create a directory ocaml_statsd_filter to house the project:

$ mkdir ocaml_statsd_filter
$ cd ocaml_statsd_filter

Now, create the project files:

  • dune-project:
(lang dune 3.6)
  • dune:
(executable
  (name ocaml_statsd_filter)
  (libraries str unix))

This file describes the main executable, named ocaml_statsd_filter, and what libraries it depends on. Libraries are basically groups of (one or more) modules that are distributed together. The two libraries used here are str for regular expressions and unix for Unix programming (communicating with sockets, managing files, etc.). These two happen to be distributed with OCaml but need to be explicitly listed as dependencies because they're not portable to all platforms.

To learn more about dune, visit its website: https://dune.build/ .

  • cfg.ml:
let listen_port =
  try int_of_string (Sys.getenv "listen_port") with Not_found -> 8125

let forward_host = try Sys.getenv "forward_host" with Not_found -> "127.0.0.1"

let forward_port =
  try int_of_string (Sys.getenv "forward_port") with Not_found -> 8126

let blocklist =
  try
    "blocklist"
    |> Sys.getenv
    |> String.split_on_char ',' 
    |> List.map Str.regexp_string 
  with
    Not_found -> []

This helper module reads configuration values that are passed in as environment variables:

  • listen_port: what port to listen on for metrics
  • forward_host: what host to forward the metrics to
  • forward_port: what port to forward the metrics to on forward_host
  • blocklist: comma-separated list of words to block, e.g. foo,bar, split into a list of regular expressions that we can check against

If any of the above environment variables are not found, it uses OCaml's convenient exception handling to substitute defaults.
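
One thing to keep in mind: the handlers above only catch Not_found, so a variable that is set but not a valid number (e.g. listen_port=abc) makes int_of_string raise Failure instead of falling back to the default. An alternative sketch using the option-returning functions from the standard library is below; getenv_default and getenv_int are hypothetical helper names, not part of the project:

```ocaml
(* Return the variable's value, or the default when it is not set. *)
let getenv_default name default =
  match Sys.getenv_opt name with
  | Some value -> value
  | None -> default

(* Return the variable's value as an int, or the default when it is
   not set or cannot be parsed as an integer. *)
let getenv_int name default =
  match Sys.getenv_opt name with
  | None -> default
  | Some value ->
    (match int_of_string_opt value with
     | Some n -> n
     | None -> default)
```

With these, the configuration could be written as, say, getenv_int "listen_port" 8125. Whether silently falling back on a malformed value is the right behaviour is a design choice; failing loudly at startup is also reasonable.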

Getting configuration from environment variables is as simple as it gets! We could easily have done something more complex though, like reading a JSON file to get the configuration. In practice, environment variables are often more than good enough, and easy to feed into services using tools like systemd.

Note, the |> operator is called 'pipe-forward', and it lets you write a series of function applications in a left-to-right direction without parentheses, instead of right-to-left. This TC39 proposal to add it to JavaScript explains the reasoning fairly well. (It comes out of the box with OCaml.)
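
For comparison, here is the same small computation written both ways, using only standard library functions (the names nested and piped are illustrative):

```ocaml
(* Nested calls read inside-out, right to left. *)
let nested =
  String.concat ","
    (List.map String.uppercase_ascii (String.split_on_char ',' "foo,bar"))

(* With |> the same steps read top to bottom, in the order they happen. *)
let piped =
  "foo,bar"
  |> String.split_on_char ','
  |> List.map String.uppercase_ascii
  |> String.concat ","
```

Both values are "FOO,BAR"; the pipeline just makes the order of the steps visible.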

  • ocaml_statsd_filter.ml:

Before we get to the code, just a note that the name of this file is significant–we need a file name that matches the name given in the name field in the dune file. In other words, we have (name ocaml_statsd_filter) in the dune file, so we need to have a module Ocaml_statsd_filter in the project.

open Unix

let bufsize = 8192
let buf = Bytes.create bufsize

let forward_addr = ADDR_INET (inet_addr_of_string Cfg.forward_host, Cfg.forward_port)
let forward_sock = socket PF_INET SOCK_DGRAM 0

let allow data = Cfg.blocklist
  |> List.exists (fun regexp -> Str.string_match regexp data 0)
  |> not

let process inc _ =
  let in_descr = descr_of_in_channel inc in
  let read_len, _ = recvfrom in_descr buf 0 bufsize [] in
  (* Convert only the bytes actually received; the rest of buf is uninitialized *)
  let buf_str = Bytes.sub_string buf 0 read_len in
  if allow buf_str then begin
    ignore (send forward_sock buf 0 read_len []);
    print_string ("Sent: " ^ buf_str)
  end else
    print_string ("Did not send: " ^ buf_str)

let () =
  connect forward_sock forward_addr;
  establish_server process (ADDR_INET (inet_addr_any, Cfg.listen_port))

This is the main executable. It works like this:

  • For each incoming request, fork a new process
  • In that process, read up to 8192 bytes (i.e., 8 KB) of data from the request, and forward it to the forwarding address unless it starts with a blocklisted word (Str.string_match only tests for a match at the given position, here 0)

This is all from built-in functionality in the Unix module:

  • Multi-process request handling server
  • Using sockets to communicate

We are using the Unix functionality so heavily in this program that we just open the module at the top of the file, something that we rarely do in OCaml programs, because opening modules liberally pollutes the current scope with the contents of those modules and makes code harder to understand.

In this case it's worth it because it's a small module and almost all the functionality in it is from Unix. You can hover over the various functions in your editor to get the types and documentation about them. Full documentation is also available at https://ocaml.org/releases/4.12/api/Unix.html .

By the way, the name Unix is actually a slight misnomer; the functionality is mostly portable to Windows–exceptions are noted in the documentation. The idea behind the module is to provide a common set of system programming functionality, very similar to C.

A few points to note about this server:

  • It first sets up a socket pointing to the 'upstream' server (the one that it forwards the metrics to), since it will continuously send data to it
  • It doesn't establish a connection to the 'downstream' clients; it just reads data from them in unconnected mode
  • It reads only 8 KB of data by default; this should be enough for StatsD metrics lines based on the StatsD documentation. Also, that's what Alan Ning's Rust implementation does.
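
One more point about the blocklist check: Str.string_match regexp data 0 tests for a match starting at position 0, so a blocklisted word is caught only at the beginning of a metrics line. The sketch below shows the difference between a prefix check and a "contains" check using only the standard library. All names (starts_with, contains, allow_by_prefix) are hypothetical, and it works on plain words rather than the compiled regular expressions used by the actual program:

```ocaml
(* True when word appears at the very start of data. *)
let starts_with word data =
  String.length data >= String.length word
  && String.sub data 0 (String.length word) = word

(* True when word appears anywhere in data. *)
let contains word data =
  let wlen = String.length word and dlen = String.length data in
  let rec go i =
    i + wlen <= dlen && (String.sub data i wlen = word || go (i + 1))
  in
  go 0

(* Same shape as the allow function in the program: a line is allowed
   when no blocklisted word matches at its start. *)
let allow_by_prefix blocklist data =
  not (List.exists (fun word -> starts_with word data) blocklist)
```

With a blocklist of foo and bar, allow_by_prefix rejects "foo:1|c" but lets "app.foo:1|c" through, even though contains "foo" "app.foo:1|c" is true. If blocking anywhere in the line is what you want with Str, Str.search_forward is the function to look at.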

For more Unix socket programming in OCaml, check this chapter of the OCaml Unix book.

Test run

Now let's test out the PoC:

$ OCAMLRUNPARAM=b blocklist=foo,bar dune exec ./ocaml_statsd_filter.exe

The main command here is dune exec ./ocaml_statsd_filter.exe. The .exe extension tells dune to build and run a native executable. Building a bytecode version is also possible, but in this guide we will test the native version only.

The two environment variables passed to the process are:

  • OCAMLRUNPARAM=b: this tells OCaml to print a full stack trace if there is an exception; normally it only prints the exception message
  • blocklist=foo,bar: this is handled specifically by our Cfg module above, and blocks any metrics with the words foo or bar from being forwarded upstream

In a separate terminal window, try sending messages which should be blocked:

$ echo 'foo:1|c' | nc localhost 8125
$ echo 'bar:1|c' | nc localhost 8125

Then check the server–it should show the following messages:

Did not send: foo:1|c
Did not send: bar:1|c

Next, try sending a message which shouldn't be blocked:

$ echo 'baz:1|c' | nc localhost 8125
$ echo 'baz:1|c' | nc localhost 8125

(We need to do this a couple of times to trigger the error we want to see on some systems, because of a peculiarity in how UDP connections work.)

Then check the server:

Fatal error: exception Unix.Unix_error(Unix.ECONNREFUSED, "send",
"")
Raised by primitive operation at file "unix.ml", line 642,
characters 7-39
Called from file "ocaml_statsd_filter.ml", line 17, characters 11-48
Called from file "unix.ml", line 1195, characters 12-37
Called from file "ocaml_statsd_filter.ml", line 24, characters 2-71

This throws an exception because there is no server upstream listening to the port we're sending to right now. If you do happen to have a StatsD daemon running on the port, it should work!

Review

I hope this gives you an idea that OCaml is surprisingly capable out of the box. A few key points to note:

  • There really are no type annotations needed anywhere. It looks just like a scripting language. People sometimes don't like this and want to annotate everything. My approach is to let my editor plugin and my compiler tell me what the types are and if I'm getting it wrong–it's easy to fix. And as you work in the codebase over time, you get used to not having any explicit types in the implementations, just like dynamic typing programmers do, but with the added benefit of static typing!
  • The syntax is a bit weird though. There is a reason why each piece of the syntax is the way it is–it's a balance between accreting new language features over time and preserving a fast, unambiguous syntax for a fast parser. But for those who can't get used to it, ReasonML syntax is always a possibility–it's easy to install with opam (opam install reason), and dune supports it out of the box (just start writing .re files instead of .ml, and so on). It's a familiar syntax that's designed to look like JavaScript but compile to the same native executable that regular old OCaml syntax does.
  • We didn't actually end up using any of the opam libraries we installed earlier. But, stay tuned!

Top comments (5)

ali

I was having problems installing the packages. I was seeing the error

checking whether the C compiler works... no

Thought that was weird because I have gcc on my Mac. Got the error below when checking gcc

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

To fix it I ran xcode-select --install

Hope this helps anyone who is having the same problem.

Yawar Amin

Thanks Ali, great tip. Indeed, Apple Developer Tools setup is a prerequisite to installing opam on macOS.

Dennis Dang

Fun read @yawaramin ! I didn’t know StatsD worked like that. What else are you planning to write?

If Practical OCaml were ever a book, I’d imagine it to be a cookbook with many recipes like these. Working code examples that introduce features in small enough chunks to view. e.g., gobyexample, or maybe rust in action

Yawar Amin

Thank you Dennis. Yeah that style is my ideal too. I'm incubating some more posts about Lwt and other topics. I'll try to make a 'Practical OCaml' series here.

yossarian

Now I understand why F# is basically microsofted ocaml )
Nice article