Use Polymorphic Variant Type in Ocaml

MartinLwx included in category Programming-Languages

2024-11-24 2024-11-24 1261 words 6 minutes

Contents

Intro

I have been learning and using OCaml for a while. However, I’m still quite confused about the Polymorphic variant feature. Recently, while exploring the Yojson library, I came across this feature again. After some research, I couldn’t find any comprehensive articles about polymorphic variants (the official introduction is not easy to grasp this feature IMHO), only a few related answers¹². So I decided to write an article, hoping it will be helpful to you :)

Info

I’m not an OCaml expert, which means that this article may not be as comprehensive as others.

Syntax

Use bracket [] to enclose all constructors, and tag each constructor with a backtick ( ` ). For example,

type either = [
  | `A of int
  | `B
]

When creating a value of such type, we also need to add the backtick

utop # let v = `A 1;;
val v : [> `A of int/2 ] = `A 1

Lower Bound and Upper Bound

Tip

The meaning of int/2 can be found in this StackOverflow link

Previously we defined a value v, you may find that the type of v is not either but [> ...]. What does this mean? Based on the syntax of the polymorphic variant, we know that the [ ... ] refers to the polymorphic variant type, then what about the > symbol?

It has a similar meaning with the > in mathematics, that is, the polymorphic variant type can have more constructors. For example,

[`A of int | `B]
[`A of int | `B | `C]
...

Thus, the polymorphic variant is also called open variant.

Similarly, there exists [< ...], which means it has these constructors at most. Take the following type as an example,

[< `A of int | `B]

There can be the following three scenarios.

[`A of int]
[`B]
[`A of int | `B]

From these examples, we can see that the polymorphic variant type is quite flexible. Its type definition may take these forms:

[ ... ] indicates that only these variants are allowed.
[> ... ] means the additional variants are supported.
[< ... ] means at most these variants are supported.

Pattern Matching

We can try pattern matching on the polymorphic variant argument of a function. For example,

utop # let foo = function
  | `A -> "A"
  | `B -> "B" ;;
  
val foo : [< `A | `B ] -> string/2 = <fun>

We can see that the OCaml infers the parameter type as [< ...], which is reasonable because the function cannot handle other constructors if we provide them. For example,

utop # foo `C;;
Error: This expression has type [> `C ] but an expression was expected of type
         [< `A | `B ]
       The second variant type does not allow tag(s) `C

Now let’s replace the case B with case C in the foo function.

utop # let bar = function
  | `A -> "A"
  | `C -> "C"
  
val bar : [< `A | `C ] -> string/2 = <fun>

Then we can try to put the foo and bar functions in the same list. What type should this list of functions have? Let’s try inferring the type ourselves. Given that:

The foo function can only handle case A or case B
The bar function can only handle case A or case C
Each item in the same list should have the same type

Warning

You may find that I omit the backtick symbol when writing inline code. This is because they would conflict with Markdown’s inline code syntax

We can conclude that the list should have the ([< A] -> string) list type, and the output from utop confirms this conclusion.

utop # let funcs = [ foo; bar ];;
val funcs : ([< `A ] -> string/2) list/2 = [<fun>; <fun>]

Info

From the previous examples, we can see that OCaml can automatically perform type inference based on the use of [> ...] and [< ...], determining the matching type.

Additionally, when performing pattern matching, you can use the #some_variant syntax to represent the entire polymorphic variant. For example,

type my_type =
  [ `A
  | `B
  ]

let process_a_b_c = function
  | #my_type -> "A or B"
  | `C -> "C"

Use cases

Info

You can find more use cases in this amazing paper Programming with Polymorphic Variants

Using without declaring it first

The most intuitive use case in my opinion is that we can use the polymorphic variant type without declaring it first.

The following example demonstrates this idea.

type animals =
  | Dog
  | Cat

let make_sound = function
  | Dog -> "It's a dog"
  | Cat -> "It's a cat"

Here, I defined a function called make_sound. By using polymorphic variant, it can be written like this:

let make_sound = function
  | `Dog -> "It's a dog"
  | `Cat -> "It's a cat"
(* val make_sound : [< `Cat | `Dog ] -> string = <fun> *)

When using a library, you may need to use its defined data types. In this case, you can list this library as a dependency. For example,

(* ./mycolor.ml *)
type rgb = Red | Green | Blue

(* ./main.ml *)
let process_color = function
  | Mycolor.Red -> "Red"    (* the Main module depends on Mycolor *)
  | Mycolor.Green -> "Green"
  | Mycolor.Blue -> "Blue"

However, if the data type is defined using a polymorphic variant, this dependency can be eliminated.

(* ./mycolor.ml *)
type rgb = [ `Red | `Green | `Blue ]
  
(* ./main.ml *)
let process_color = function
  | `Red -> "Red"    (* the dependence is gone *)
  | `Green -> "Green"
  | `Blue -> "Blue"

Tip

Usually, you can find the definition of the polymorphic variant in the documentation, such as Yojson’s JSON

Subtype

Polymorphic variant type is particularly suitable for cases where different variant types share common parts.

For example, let’s say the common type is the common part of foo type and bar type.

type common = A | B

type foo =
  | Common of common (* it's kind of redundant *)
  | C
  | D

type bar =
  | Common of common
  | E
  | F

Now, suppose that there is a show function to print the above three types. In this case, you have to write three separate functions, as shown below.

let show = function
  | A -> "A"
  | B -> "B"

let show_foo arg =
  match (arg : foo) with
  | Common v -> show v
  | C -> "C"
  | D -> "D"

let show_bar arg =
  match (arg : bar) with
  | Common v -> show v
  | E -> "E"
  | F -> "F"

In addition, we also need to add the :foo and :bar to distinguish different types. However, if we use a polymorphic variant, the above code can be simplified to

let show = function
  | `A -> "A"
  | `B -> "B"
  | `C -> "C"
  | `D -> "D"
  | `E -> "E"
  | `F -> "F"

Drawbacks

The polymorphic variant is quite flexible but comes with a cost³

Complexity: The typing rule is more complex compared to the nominal variant type. You will find that the error message is more dense.
Error-finding: It may be harder to catch the bugs.
Efficiency: The polymorphic variant is less efficient.

Contents

Use Polymorphic Variant Type in Ocaml

Intro

Syntax

Lower Bound and Upper Bound

Pattern Matching

Use cases

Using without declaring it first

Share datatype

Subtype

Drawbacks

Ref