Tail call and Tail-call Optimization (TCO)
Tail call & Tail-recursion
Assume that function A
calls function B
. We call function A
Caller and function B
Callee.
The tail call refers to when the Caller only needs to wait for the return value of the Callee, as everything else has already been completed1.
If this is a recursive function call(e.g. the Caller and Callee are the same function), then the function is said to be tail-recursive.
Tail-call Optimization (TCO)
According to the mechanism of the function call, a stack frame will be created to store some useful information such as the return address. What if the function call is a tail-call? Could you find any optimization chance?
By the definition of tail call, we know that the Caller has done everything and just needs to keep waiting. Most of the information of the Caller is no longer needed. Then we can reuse the stack frame of the Caller instead of creating a new one. That’s exactly what the tail call optimization (TCO) does2
The benefit of TCO is that the overhead of creating new stack frames is avoided. We can safely use tail-recursion without worrying about the famous stack overflow issue, as long as the programming language supports TCO, and it’s enabled.
Make recursion tail-recursion
To get the benefits of TCO, we need to learn how to rewrite any naive recursion function to a tail-recursion. One of the popular ways is the accumulator pattern.
Steps to follow3
- Create a helper function which usually has two parameters
- The data to be processed
- The accumulated value
- The base case of the helper function is: returning the accumulated value
- The base case of the original recursive function changes to calling the helper function
Let’s use a classic recursive function which calculates the factorial as an example. First, create a new project with dune
$ dune init project tco
Then we can easily write a naive recursive function in ./tco/bin/main.ml` as follows
let rec factorial n =
match n with
0 | 1 -> 1
| n -> n * factorial (n - 1)
It’s obvious that this function is not tail-recursive. The reason is that we still need to multiply the return value of factorial (n - 1)
by n
.
Now we can rewrite this function to a tail-recursive function using the accumulator pattern.
let factorial_tco n =
let rec helper cur acc =
match cur with
| 0 | 1 -> acc (* Return the accumulated value *)
| cur -> (helper [@tailcall]) (cur - 1) (acc * cur)
in
helper n 1 (* Call the helper function *)
[@tailcall]
annotation to ensure a function call is a tail call, otherwise, the compiler will throw a warning.To benchmark the two implementations, we can use the core_bench tool. Let’s first install the required packages
$ opam install core_bench
After the installation, we need to change the configuration file ./tco/bin/dune
(executable
(public_name tco)
(name main)
(libraries tco core core_bench core_unix.command_unix))
Now, we can add the performance testing code in ./tco/bin/main.ml
as follows
open Core
open Core_bench
let rec factorial n =
match n with
| 0 | 1 -> 1
| n -> n * factorial (n - 1)
let factorial_tco n =
let rec helper cur acc =
match cur with
| 0 | 1 -> acc
| cur -> (helper [@tailcall]) (cur - 1) (acc * cur)
in
helper n 1
(* Benchmark *)
let bench () =
Bench.make_command
[
Bench.Test.create ~name:"factorial" (fun () -> factorial 10000);
Bench.Test.create ~name:"factorial_tco" (fun () -> factorial_tco 10000);
]
|> Command_unix.run
let _ = bench ()
Build then run
$ dune build
$ dune exec tco
The output of performance testing in my MacBook M3 Air is:
Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
┌───────────────┬──────────┬────────────┐
│ Name │ Time/Run │ Percentage │
├───────────────┼──────────┼────────────┤
│ factorial │ 40.42us │ 100.00% │
│ factorial_tco │ 10.06us │ 24.90% │
└───────────────┴──────────┴────────────┘
The performance gain of TCO is quite impressive 🚀
10000
to other values to explore the differences.Wrap-up
In my opinion, the TCO stores the accumulated value in a specific parameter which avoids the overhead of creating new stack frames. We gain the performance boost but lose the beauty of recursion. I would suggest that only try to rewrite recursive functions to tail-recursive functions as needed.