DEV Community

Yawar Amin

Format strings in OCaml

OCaml doesn't have string interpolation, but it does have C-style format strings that are type-safe. Here's an example:

let hello name = Printf.printf "Hello, %s!\n" name
(* Can be written as: let hello = Printf.printf "Hello, %s!\n" *)

This is type-safe in an almost magical way (example REPL session):

# hello 1;;
Error: This expression has type int but an expression was expected of type
         string

It can however be a little tricky to wrap your head around:

# let bob = "Bob";;
val bob : string = "Bob"

# Printf.printf bob;;
Error: This expression has type string but an expression was expected of type
         ('a, out_channel, unit) format =
           ('a, out_channel, unit, unit, unit, unit) format6

This error is saying that the printf function wants a 'format string', which is distinct from a regular string:

# let bob = format_of_string "bob";;
val bob : ('_weak1, '_weak2, '_weak3, '_weak4, '_weak4, '_weak1) format6 =
  CamlinternalFormatBasics.Format
   (CamlinternalFormatBasics.String_literal ("bob",
     CamlinternalFormatBasics.End_of_format),
   "bob")

# Printf.printf bob;;
bob- : unit = ()
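If the value really is an ordinary string that is only known at runtime, a common way around the error above is to leave it as a plain string and pass it as the argument of a "%s" format. Here is a minimal sketch; the names printed and tricky are only for illustration:

```ocaml
(* An ordinary string, not a format string. *)
let bob = "Bob"

(* "%s" is the format; bob is just the argument that fills it in. *)
let printed = Printf.sprintf "%s" bob

(* Because the argument is a plain string, a '%' inside it is not
   interpreted as a conversion specification. *)
let tricky = Printf.sprintf "%s" "100% done"

let () = Printf.printf "%s\n" printed
let () = Printf.printf "%s\n" tricky
```

This keeps the format itself a literal that the compiler can check, while the data stays a regular string.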

OCaml distinguishes between regular strings and format strings. The latter are complex structures which encode type information inside them. They are parsed and turned into these structures either when the compiler sees a string literal and 'realizes' that a format string is expected, or when you (the programmer) explicitly ask for the conversion. Another example:

# let fmt = "Hello, %s!\n" ^^ "";;
val fmt :
  (string -> '_weak5, '_weak6, '_weak7, '_weak8, '_weak8, '_weak5) format6 =
  CamlinternalFormatBasics.Format
   (CamlinternalFormatBasics.String_literal ("Hello, ",
     CamlinternalFormatBasics.String (CamlinternalFormatBasics.No_padding,
      CamlinternalFormatBasics.String_literal ("!\n",
       CamlinternalFormatBasics.End_of_format))),
   "Hello, %s!\n%,")

# Printf.printf fmt "Bob";;
Hello, Bob!
- : unit = ()
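Besides format_of_string and ^^, a type annotation is another way to tell the compiler that a literal should be read as a format string. The sketch below assumes you want a format for Printf.sprintf; the names age_fmt, plain and line are made up for this example:

```ocaml
(* The annotation makes the compiler parse the literal as a format.
   Read the type as: takes a string, then an int, and produces a string. *)
let age_fmt : (string -> int -> string, unit, string) format =
  "%s is %d years old"

(* The same characters with a string annotation stay an ordinary string. *)
let plain : string = "%s is %d years old"

(* The format drives the argument types that sprintf expects. *)
let line = Printf.sprintf age_fmt "Bob" 42

let () = print_endline line

(* string_of_format recovers the source text of a format string. *)
let () = print_endline (string_of_format age_fmt)
```

The two literals look identical in the source, but only age_fmt can be passed to a printf-style function.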

The ^^ operator is the format string concatenation operator. Think of it as a more powerful version of the string concatenation operator, ^. It can concatenate either format strings that have already been bound to a name, or string literals which it interprets as format strings:

# bob ^^ bob;;
- : (unit, out_channel, unit, unit, unit, unit) format6 =
CamlinternalFormatBasics.Format
 (CamlinternalFormatBasics.String_literal ("bob",
   CamlinternalFormatBasics.String_literal ("bob",
    CamlinternalFormatBasics.End_of_format)),
 "bob%,bob")

# bob ^^ "!";;
- : (unit, out_channel, unit, unit, unit, unit) format6 =
CamlinternalFormatBasics.Format
 (CamlinternalFormatBasics.String_literal ("bob",
   CamlinternalFormatBasics.Char_literal ('!',
    CamlinternalFormatBasics.End_of_format)),
 "bob%,!")
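The concatenation gets more interesting when both halves contain conversion specifications, because the combined format then expects the arguments of the first half followed by those of the second. A small sketch, with greeting, age_part, full and message as illustrative names:

```ocaml
(* Two separate format strings, each expecting one argument. *)
let greeting = format_of_string "Hello, %s! "
let age_part = format_of_string "You are %d years old."

(* The combined format expects a string, then an int. *)
let full = greeting ^^ age_part

let message = Printf.sprintf full "Bob" 42

let () = print_endline message
```

Swapping the two arguments, or leaving one out, would be rejected at compile time just like in the earlier examples.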

Custom formatting functions

The really amazing thing about format strings is that you can define your own functions which use them to output formatted text. For example:

# let shout fmt = Printf.ksprintf (fun s -> s ^ "!") fmt;;
val shout : ('a, unit, string, string) format4 -> 'a = <fun>

# shout "hello";;
- : string = "hello!"

# let jim = "Jim";;
val jim : string = "Jim"

# shout "Hello, %s" jim;;
- : string = "Hello, Jim!"

This is really just a simple example; you are not restricted to outputting only strings from ksprintf. You can output any data structure you like. Think of ksprintf as '(k)ontinuation-based sprintf'; in other words, it takes a format string (fmt) and any arguments needed by the format string (e.g. jim), builds the output string, then passes it to the continuation that you provide (fun s -> ...), in which you can build any value you want. This value will be the final output value of the function call.
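To make the 'any value you want' part concrete, here is a sketch of two continuations that return something other than a string. The helper names errorf and lengthf are hypothetical, not part of the standard library:

```ocaml
(* Build the message with the format string, then wrap it in Error.
   The continuation decides the final result type. *)
let errorf fmt = Printf.ksprintf (fun msg -> Error msg) fmt

(* Build the message, then return only its length. *)
let lengthf fmt = Printf.ksprintf String.length fmt

let r : (int, string) result = errorf "expected %d items, got %d" 3 5

let n = lengthf "%s-%d" "ab" 7

let () =
  match r with
  | Ok _ -> print_endline "ok"
  | Error msg -> print_endline msg

let () = Printf.printf "%d\n" n
```

In both cases the formatted string is only an intermediate value; what the caller gets back is whatever the continuation returns.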

Again, this is just as type-safe as the basic printf function:

# shout "Hello, jim" jim;;
Error: This expression has type
         ('a -> 'b, unit, string, string, string, 'a -> 'b)
         CamlinternalFormatBasics.fmt
       but an expression was expected of type
         ('a -> 'b, unit, string, string, string, string)
         CamlinternalFormatBasics.fmt
       Type 'a -> 'b is not compatible with type string

This error message looks a bit scary, but the real clue here is in the last line: because an extra argument (jim) was passed in, the result of formatting would have to be a function of type 'a -> 'b, but this format string has no conversion specifications, so the result is just a string. Unfortunately the type error here is not that great because of how powerful and general this function is. Because it could potentially accept any number of arguments depending on the format string, its type is expressed in a very general way. This is a drawback of format strings to watch out for. But once you are familiar with it, it's typically not a big problem. You just need to match up the conversion specifications like %s with the actual arguments passed in after the format string.

You might have noticed that the function is defined with let shout fmt = .... It doesn't look like it could accept 'any number of arguments'. The trick here is that in OCaml, every function accepts only a single argument and returns either a final non-function value, or a new function. In the case of functions which use format strings, which of the two comes back depends on the conversion specifications in the format string, so the formal definition shout fmt could potentially turn into a call like shout "%s bought %d apples today" bob num_apples. As a shortcut, you can think of the format string fmt as a variadic argument which can potentially turn into any number of arguments at the call site.
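The one-argument-at-a-time behaviour can be seen by applying shout step by step. This sketch repeats the shout definition from earlier so that it stands alone; report, bobs_report and the two result names are invented for the example:

```ocaml
let shout fmt = Printf.ksprintf (fun s -> s ^ "!") fmt

(* Applying shout to the format alone returns a function:
   report : string -> int -> string *)
let report = shout "%s bought %d apples today"

(* Supplying the first argument returns another function:
   bobs_report : int -> string *)
let bobs_report = report "Bob"

(* Only the last argument produces the final string. *)
let three = bobs_report 3
let five = bobs_report 5

let () = print_endline three
let () = print_endline five
```

Each conversion specification in the format adds one more arrow to the type, which is what makes the call look variadic.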

More reading

You can read more about OCaml's format strings functionality in the documentation for the Printf and Format modules. There is also a gentle guide to formatting text, something OCaml has fairly advanced support for because it turns out to be a pretty common requirement to print out the values of various things at runtime.

On that note, I have also written more about defining custom formatted printers for any value right here on dev.to. Enjoy 🐫
