Use Polymorphic Variant Type in Ocaml

I have been learning and using OCaml for a while. However, I’m still quite confused about the Polymorphic variant feature. Recently, while exploring the Yojson library, I came across this feature again. After some research, I couldn’t find any comprehensive articles about polymorphic variants (the official introduction is not easy to grasp this feature IMHO), only a few related answers12. So I decided to write an article, hoping it will be helpful to you :)

Info
I’m not an OCaml expert, which means that this article may not be as comprehensive as others.

Use bracket [] to enclose all constructors, and tag each constructor with a backtick ( ` ). For example,

ocaml

type either = [
  | `A of int
  | `B
]

When creating a value of such type, we also need to add the backtick

ocaml

utop # let v = `A 1;;
val v : [> `A of int/2 ] = `A 1
Tip
The meaning of int/2 can be found in this StackOverflow link

Previously we defined a value v, you may find that the type of v is not either but [> ...]. What does this mean? Based on the syntax of the polymorphic variant, we know that the [ ... ] refers to the polymorphic variant type, then what about the > symbol?

It has a similar meaning with the > in mathematics, that is, the polymorphic variant type can have more constructors. For example,

ocaml

[`A of int | `B]
[`A of int | `B | `C]
...

Thus, the polymorphic variant is also called open variant.


Similarly, there exists [< ...], which means it has these constructors at most. Take the following type as an example,

ocaml

[< `A of int | `B]

There can be the following three scenarios.

ocaml

[`A of int]
[`B]
[`A of int | `B]

We can try pattern matching on the polymorphic variant argument of a function. For example,

ocaml

utop # let foo = function
  | `A -> "A"
  | `B -> "B" ;;
  
val foo : [< `A | `B ] -> string/2 = <fun>

We can see that the OCaml infers the parameter type as [< ...], which is reasonable because the function cannot handle other constructors if we provide them. For example,

ocaml

utop # foo `C;;
Error: This expression has type [> `C ] but an expression was expected of type
         [< `A | `B ]
       The second variant type does not allow tag(s) `C

Now let’s replace the case B with case C in the foo function.

ocaml

utop # let bar = function
  | `A -> "A"
  | `C -> "C"
  
val bar : [< `A | `C ] -> string/2 = <fun>

Then we can try to put the foo and bar functions in the same list. What type should this list of functions have? Let’s try inferring the type ourselves. Given that:

  • The foo function can only handle case A or case B
  • The bar function can only handle case A or case C
  • Each item in the same list should have the same type
Warning
You may find that I omit the backtick symbol when writing inline code. This is because they would conflict with Markdown’s inline code syntax

We can conclude that the list should have the ([< A] -> string) list type, and the output from utop confirms this conclusion.

ocaml

utop # let funcs = [ foo; bar ];;
val funcs : ([< `A ] -> string/2) list/2 = [<fun>; <fun>]
Info
From the previous examples, we can see that OCaml can automatically perform type inference based on the use of [> ...] and [< ...], determining the matching type.

Additionally, when performing pattern matching, you can use the #some_variant syntax to represent the entire polymorphic variant. For example,

ocaml

type my_type =
  [ `A
  | `B
  ]

let process_a_b_c = function
  | #my_type -> "A or B"
  | `C -> "C"
Info
You can find more use cases in this amazing paper Programming with Polymorphic Variants

The most intuitive use case in my opinion is that we can use the polymorphic variant type without declaring it first.

The following example demonstrates this idea.

ocaml

type return_type =
  | A of int
  | B

val f : int -> return_val

By using the polymorphic variant, we can write the code like this.

ocaml

val f : int -> [ `A of int | `B ]

When using a library, you may need to use its defined data types. In this case, you can list this library as a dependency. For example,

ocaml

(* ./mycolor.ml *)
type rgb = Red | Green | Blue

(* ./main.ml *)
let process_color = function
  | Mycolor.Red -> "Red"    (* the Main module depends on Mycolor *)
  | Mycolor.Green -> "Green"
  | Mycolor.Blue -> "Blue"

However, if the data type is defined using a polymorphic variant, this dependency can be eliminated.

ocaml

(* ./mycolor.ml *)
type rgb = [ `Red | `Green | `Blue ]
  
(* ./main.ml *)
let process_color = function
  | `Red -> "Red"    (* the dependence is gone *)
  | `Green -> "Green"
  | `Blue -> "Blue"
Tip
Usually, you can find the definition of the polymorphic variant in the documentation, such as Yojson’s JSON

Polymorphic variant type is particularly suitable for cases where different variant types share common parts.

For example, let’s say the common type is the common part of foo type and bar type.

ocaml

type common = A | B

type foo =
  | Common of common (* it's kind of redundant *)
  | C
  | D

type bar =
  | Common of common
  | E
  | F

Now, suppose that there is a show function to print the above three types. In this case, you have to write three separate functions, as shown below.

ocaml

let show = function
  | A -> "A"
  | B -> "B"

let show_foo arg =
  match (arg : foo) with
  | Common v -> show v
  | C -> "C"
  | D -> "D"

let show_bar arg =
  match (arg : bar) with
  | Common v -> show v
  | E -> "E"
  | F -> "F"

In addition, we also need to add the :foo and :bar to distinguish different types. However, if we use a polymorphic variant, the above code can be simplified to

ocaml

type common = [ `A | `B ]

type foo = [ common | `C | `D ]

type bar = [ common | `E | `F ]

let show = function
  | `A -> "A"
  | `B -> "B"
  | `C -> "C"
  | `D -> "D"
  | `E -> "E"
  | `F -> "F"

The polymorphic variant is quite flexible but comes with a cost3

  1. Complexity: The typing rule is more complex compared to the nominal variant type. You will find that the error message is more dense.
  2. Error-finding: It may be harder to catch the bugs.
  3. Efficiency: The polymorphic variant is less efficient.