Kosaraju's Algorithm Explained
Intro
During my daily coding practice, I encountered an interesting problem - 1682. Flight Routes Check. Solving this problem requires finding all strongly connected components (SCCs) in a directed graph. After some research, I discovered Kosaraju’s algorithm, which solves this problem in linear time. That is, the time complexity is
$$ O(V+E) $$
Where $V$ refers to the nodes and $E$ refers to the edges in the graph.
By interesting, I mean that Kosaraju’s algorithm is easy to implement yet a bit tricky to understand fully. In my opinion, knowing why it works matters more than just memorizing how to code it. That’s why I’m sharing this short post - to break down the key insights.
Kosaraju’s Algorithm
Let me introduce some key notations:
- $G$: The original graph directed graph.
- $G’$: The transpose of $G$ - formed by reversing all edge directions.
Algorithm
- Reverse all edge directions of $G$ to get $G’$.
- DFS: Perform a depth-first search (DFS) traversal on $G$, pushing nodes onto a stack in order of their finishing time (post order).
- DFS again: Process nodes from the stack (in decreasing order of finishing time) and performs DFS on the transpose graph $G’$. Each DFS tree discovered in this phase represents a SCC.
The key differences between the two DFS phases lie in which graph they operate on: the first DFS processes the original graph $G$, while the second operates on the transpose graph $G’$.
An intuition of Kosaraju’s Algorithm
flowchart subgraph G SCC1(SCC1) --> SCC2(SCC2) --> n(...) --> SCC3(SCC3 ) end
flowchart subgraph G' CCS3(SCC3) --> m(...) --> CCS2(SCC2) --> CCS1(SCC1) end
We can condense the graph by compressing each SCC into a single node. This transformation converts the original directed graph into a DAG. I use Mermaid to draw two graphs to help you understand how the compression works.
In a DAG
- A source is a node that has no incoming edges.
- A sink is a node that has no outcoming edges.
During the first DFS phase, the node finishes last (i.e., has the highest finishing time) must be a source in the condensed DAG $G$ (the $SCC1$ of $G$ in the diagram).
During the second DFS phase, the DFS starts from the sink of the condensed DAG $G’$ (corresponding to the source of the condensed DAG $G$, that is, $SCC1$). One thing to notice is that the edge directions between SCC also got reversed in $G’$. As a result, the DFS will only visit nodes within a SCC. For example, a DFS that starts from $SCC1$ won’t escape from $SCC1$ in $G’$.
This key property ensures that each DFS traversal in the second DFS phase will exactly find one SCC.
This algorithm depends on a fundamental property of SCC: if you reverse each edge within a SCC, it’s still a SCC.
Example
I would just paste my solution to 1682. Flight Routes Check here.
#include <algorithm>
#include <iostream>
#include <vector>
using std::cin;
using std::cout;
using std::fill;
using std::vector;
void dfs(int cur, const vector<vector<int>> &graph, vector<bool> &visited,
vector<int> &stack, bool recorded) {
if (visited[cur]) {
return;
}
visited[cur] = true;
for (const auto &out : graph[cur]) {
dfs(out, graph, visited, stack, recorded);
}
if (recorded) {
// finishing time
stack.push_back(cur);
}
}
int main() {
// IO
int N{}, M{};
cin >> N >> M;
vector<vector<int>> G(N + 1);
vector<vector<int>> GT(N + 1);
for (auto i{0}; i < M; i++) {
int x{}, y{};
cin >> x >> y;
G[x].push_back(y);
GT[y].push_back(x);
}
// DFS on original graph (G)
vector<bool> visited(N + 1, false);
vector<int> stack;
for (auto i{1}; i <= N; i++) {
if (!visited[i]) {
// Set recorded to true to put nodes to a stack by finishing time
dfs(i, G, visited, stack, true);
}
}
// Reset visited
fill(visited.begin(), visited.end(), false);
// DFS on transpose graph (GT)
vector<int> ans;
while (!stack.empty()) {
const auto node{stack.back()};
if (!visited[node]) {
ans.push_back(node);
dfs(node, GT, visited, stack, false);
}
stack.pop_back();
}
if (ans.size() > 1) {
cout << "NO\n" << ans[1] << " " << ans[0] << "\n";
} else {
cout << "YES\n";
}
}