Kosaraju's Algorithm Explained

MartinLwx included in category Algorithm

2025-04-26 2025-04-26 677 words 3 minutes

Contents

Intro

During my daily coding practice, I encountered an interesting problem - 1682. Flight Routes Check. Solving this problem requires finding all strongly connected components (SCCs) in a directed graph. After some research, I discovered Kosaraju’s algorithm, which solves this problem in linear time. That is, the time complexity is

$$ O(V+E) $$

Where $V$ refers to the nodes and $E$ refers to the edges in the graph.

By interesting, I mean that Kosaraju’s algorithm is easy to implement yet a bit tricky to understand fully. In my opinion, knowing why it works matters more than just memorizing how to code it. That’s why I’m sharing this short post - to break down the key insights.

Kosaraju’s Algorithm

Let me introduce some key notations:

$G$: The original graph directed graph.
$G’$: The transpose of $G$ - formed by reversing all edge directions.

Algorithm

Reverse all edge directions of $G$ to get $G’$.
DFS: Perform a depth-first search (DFS) traversal on $G$, pushing nodes onto a stack in order of their finishing time (post order).
DFS again: Process nodes from the stack (in decreasing order of finishing time) and performs DFS on the transpose graph $G’$. Each DFS tree discovered in this phase represents a SCC.

Tip

The key differences between the two DFS phases lie in which graph they operate on: the first DFS processes the original graph $G$, while the second operates on the transpose graph $G’$.

An intuition of Kosaraju’s Algorithm

flowchart
    subgraph G
        SCC1(SCC1) --> SCC2(SCC2) --> n(...) --> SCC3(SCC3 )
    end

flowchart
    subgraph G'
        CCS3(SCC3) --> m(...) --> CCS2(SCC2) --> CCS1(SCC1)
    end

We can condense the graph by compressing each SCC into a single node. This transformation converts the original directed graph into a DAG. I use Mermaid to draw two graphs to help you understand how the compression works.

Tip

In a DAG

A source is a node that has no incoming edges.
A sink is a node that has no outcoming edges.

During the first DFS phase, the node finishes last (i.e., has the highest finishing time) must be a source in the condensed DAG $G$ (the $SCC1$ of $G$ in the diagram).

During the second DFS phase, the DFS starts from the sink of the condensed DAG $G’$ (corresponding to the source of the condensed DAG $G$, that is, $SCC1$). One thing to notice is that the edge directions between SCC also got reversed in $G’$. As a result, the DFS will only visit nodes within a SCC. For example, a DFS that starts from $SCC1$ won’t escape from $SCC1$ in $G’$.

This key property ensures that each DFS traversal in the second DFS phase will exactly find one SCC.

This algorithm depends on a fundamental property of SCC: if you reverse each edge within a SCC, it’s still a SCC.

Example

I would just paste my solution to 1682. Flight Routes Check here.

#include <algorithm>
#include <iostream>
#include <vector>

using std::cin;
using std::cout;
using std::fill;
using std::vector;

void dfs(int cur, const vector<vector<int>> &graph, vector<bool> &visited,
         vector<int> &stack, bool recorded) {
  if (visited[cur]) {
    return;
  }
  visited[cur] = true;
  for (const auto &out : graph[cur]) {
    dfs(out, graph, visited, stack, recorded);
  }
  if (recorded) {
    // finishing time
    stack.push_back(cur);
  }
}
int main() {
  // IO
  int N{}, M{};
  cin >> N >> M;
  vector<vector<int>> G(N + 1);
  vector<vector<int>> GT(N + 1);
  for (auto i{0}; i < M; i++) {
    int x{}, y{};
    cin >> x >> y;
    G[x].push_back(y);
    GT[y].push_back(x);
  }

  // DFS on original graph (G)
  vector<bool> visited(N + 1, false);
  vector<int> stack;
  for (auto i{1}; i <= N; i++) {
    if (!visited[i]) {
      // Set recorded to true to put nodes to a stack by finishing time
      dfs(i, G, visited, stack, true);
    }
  }

  // Reset visited
  fill(visited.begin(), visited.end(), false);

  // DFS on transpose graph (GT)
  vector<int> ans;
  while (!stack.empty()) {
    const auto node{stack.back()};
    if (!visited[node]) {
      ans.push_back(node);
      dfs(node, GT, visited, stack, false);
    }
    stack.pop_back();
  }

  if (ans.size() > 1) {
    cout << "NO\n" << ans[1] << " " << ans[0] << "\n";
  } else {
    cout << "YES\n";
  }
}