Prefix Sum Array: the secret to fast range sum query and more

There is a type of problem where you are given an array $arr$ and $Q$ queries. Each query is represented as $query(l, r)$, which asks for the sum of the elements in the subarray $[l, r]$, i.e., $arr[l] + arr[l + 1] + … + arr[r]$.

If we handle each query using a brute-force approach, the time complexity will be $O(N)$. Thus, solving $Q$ queries would require $O(NQ)$. Is there a more efficient approach?

Of course! This brings us to today’s topic: the prefix sum array.

Let’s take a look at the prefix sum array.

$$ \texttt{prefix}[k]=\sum_{i=0}^{k-1}arr[i] $$

The definition is straightforward: $\texttt{prefix}[k]$ is the sum of the first $k$ numbers in $arr$. This is why it is called the prefix sum array.

The time complexity of computing $\texttt{prefix}$ is $O(N)$ with the following equation

$$ \texttt{prefix}[k+1] = \texttt{prefix}[k] + arr[k] $$

According to the definition of $\texttt{prefix}$, the range sum can be calculated by the following equation.

$$ query[l, r]=\texttt{prefix}[r+1]-\texttt{prefix}[l] $$

We can observe that the time complexity of processing a single query has now been reduced to $O(1)$, and the time complexity of handling queries is

$$ O(N + Q)\ll O(NQ) $$

Tip

The tip is: the length of prefix sum array = the length of original array + 1.

With the prefix sum array defined, the code is just 3 lines long.

python

prefix_sum = [0 for _ in range(len(arr) + 1)]
for i in range(len(arr)):
	prefix_sum[i + 1] = prefix_sum[i] + arr[i]

But I prefer to use the accumulate function :)

python

from itertools import accumulate
prefix_sum = list(accumulate(a, initial=0))

Previously, we assumed that the array $arr$ was static, meaning it won’t change after initialization. However, sometimes problems require implementing an update function $update(arr, l, r, x)$: adds or subtracts a value x to all the elements in range $[l, r]$. A classic example of this can be found in SPOJ.HAYBALE-Haybale stacking.

If we stick to the brute-force approach, handling an update operation $update(arr, a, b, x)$ would require $O(N)$. Solving $K$ update operations would then take $O(NK)$, which is impractical in competitive programming and will result in a TLE.

To reduce the overhead of the update operation, we shouldn’t update each element directly. Instead, we use a technique called lazy update. Let’s examine the following equation carefully.

$$ \texttt{prefix}[i+1] = \texttt{prefix}[i] + arr[i] $$

The prefix sum array accumulates the values from the original array. What happens if we execute $arr[l] + x$? All values after index $l$ in the prefix sum array would also increase by $x$. However, we only want to add $x$ within the range $[l, r]$. So what should we do? we execute $arr[r + 1] - x$ to cancle the side effect.

python

def update(arr, l, r, x):
    arr[l] += x
    arr[r + 1] -= x

Now the time complexity of the update function is $O(1)$, and handling $K$ update operations only needs $O(K)$. Note that we also need $O(N)$ to compute the prefix sum array.

$$ O(N + K)\ll O(NK) $$

Example

If you have difficulty understanding the update operation, you can try to infer this example here. Let’s say we want to add 1 to all the elements within the range [2, 4].

Index 0 1 2 3 4 5 6
arr 1 2 3 4 5 6
Calculate the Prefix Sum 0 1 3 6 10 15 21
Add 1 to [2, 4] with the Lazy Update trick 1 2 4 (+1) 4 5 5 (-1)
Calculate the Prefix Sum 0 1 3 7 11 16 21

From the output of the prefix sum array, prefix[3] ~ prefix[5] changes from 6, 10, 15 to 7, 11, 16 but keep prefix[0], prefix[1], prefix[2], prefix[6] the same.