Class Hierarchy Analysis: a quick way to generate call graph

MartinLwx included in category Program-Analysis

2025-03-19 2025-03-19 1208 words 6 minutes

Contents

The key to call graph construction

For an OOP programming language, the key challenge in call graph construction is handling the virtual call, as it may involve multiple target methods, as shown in the following table¹.

	Static Call	Special Call	Virtual Call
Instruction	`invokestatic`	`invokespecial`	`invokeinterface, invokevirtual`
Receiver Objects	❌	✅	✅
Target Methods	Static Method	Constructor, Private Instance Method, Superclass Instance Method	Other Instance Method
Count of Possible Target Methods	1	1	$\ge 1$ (polymorphism)
Determinancy	Compile-time	Compile-time	Run-time

The method call and method signature

Let’s take Java as an example; a method call may have this form.

A a = ...
a.foo()

The a.foo() here is a method call and the receiver of the method call is a which has the declaring type A.

The method signature is our main concern, as it serves as the unique identifier. It can be broken into multiple parts²

Class Type: the declaring type of receiver
Subsignature
- Method Name
- Return Type
- Parameter Types

Note that I define method name, return type, and parameter types as Subsignature and I will explain why later.

For simplicity, let’s introduce a notation for method call²

<Class Type>: <Return Type> <Method Name>(<Parameter Types>)

Then the method call a.foo() can be represented as A: foo().

The Motivations behind the Class Hierarchy Analysis

The core idea of the Class Hierarchy Analysis (CHA) algorithm is to deduce the possible target methods based on the declaring type of the receiver.

As its name suggests, CHA relies on the class hierarchy information and assumes that the receiver may point to its declaring type as well as all the subclasses.

Tip

Sometimes, the receiver’s declaring type is an interface, and CHA assumes that it may point to all classes that implement the interface.

CHA Algorithm

You only need to under two functions ($\texttt{Dispatch}$ and $\texttt{Resolve}$) in order to understand the CHA Algorithm

Dispatch function

Input:

Current class type (denoted as $c$)
The method signature we are considering (denoted as $m$)

Output: the target method being

Algorithm: Let’s consider the function call $\texttt{Dispatch}(c,m)$

If there exists a non-abstract method with a method signature equal to $m$ (excluding class type, i.e., the subsignature mentioned above), then we have found the target method.
Otherwise, call $\texttt{Dispatch}(c’,m)$, where $c’$ is the superclass of $c$

Wrap-up: Understanding $\texttt{Dispatch}$ is easy in my opinion, as it is the way we infer the target method manually.

Resolve function

Tip

You may simply regard the call site as an invoke statement (method call).

Input: call site

Output: all possible target methods for this call site

Algorithm

Initialize an empty set $T$.
Define the method signature of the call site as $m$.
There are different cases:
- If $m$ is a static call, return ${ m }$.
- If $m$ is a special call, retrieve its declaring type $c^m$ and return $T= {\texttt{Dispatch}(c^m, m)}$.
- If $m$ is a virtual call, retrieve its declaring type $c$. For $c$ and all its subclass (both direct and indirect) $c’$, call $\texttt{Dispatch}(c’, m)$ and add all results to $T$.

Wrap-up: From the algorithm, we see that the CHA algorithm enumerates all possible cases when resolving a virtual call.

Call Graph Construction with CHA

Input: the entry method, such as main in the Java.

Output: Call graph

Algorithm:

Initialization
- $\texttt{WorkList} = {Entry\ Method}$
- $\texttt{CG} = \emptyset$, which is a edge set.
- $\texttt{RM} = \emptyset$, that is reached methods. It also serves as a hash set to check whether a method has been visited.
While $\texttt{WorkList}$ is not empty:
- Get a method (denoted as $m$) from $\texttt{workList}$.
- If $m$ is not in $\texttt{RM}$:
  - Add $m$ to $\texttt{RM}$.
  - For each call site $cs$ in method $m$:
    - $T=\texttt{Resolve}(cs)$
    - For each method $m’$ in $T$
      - Add the call edge $cs\rightarrow m’$ to $\texttt{CG}$.
      - Add method $m’$ to $\texttt{WorkList}$ 里面
Return $\texttt{CG}$

CHA Example. ArkAnalyzer

Recently, I was reading the source code of ArkAnalyzer³, and its call graph construction is based on the CHA algorithm. The following code provides an example of what the implementations look like. The full code is hosted here.

Tip

Part of the source code is omitted for brevity.

The method signature

The method signature corresponds to the class MethodSignature

export class MethodSignature {
    private declaringClassSignature: ClassSignature;
    private methodSubSignature: MethodSubSignature;
    ...
}

It has two fields: ClassSignature and MethodSubSignature.

export class ClassSignature {
    private declaringFileSignature: FileSignature;
    private className: string;
    ...
}

export class MethodSubSignature {
    private methodName: string;
    private parameters: MethodParameter[];
    private returnType: Type;
    private staticFlag: boolean;
}

If you combine ClassSignature with MethodSubSignature then you would get the full method signature we mentioned above.

The entry method of Call Graph Construction

The entry method of call graph generation is makeCallGraphCHA, whose parameters are a method signature array.

public makeCallGraphCHA(entryPoints: MethodSignature[]): CallGraph {
    let callGraph = new CallGraph(this);
    let callGraphBuilder = new CallGraphBuilder(callGraph, this);
    callGraphBuilder.buildClassHierarchyCallGraph(entryPoints);
    return callGraph;
}

public buildClassHierarchyCallGraph(entries: Method[], displayGeneratedMethod: boolean = false): void {
    ...
    let classHierarchyAnalysis: ClassHierarchyAnalysis = 
        new ClassHierarchyAnalysis(this.scene, this.cg);
    classHierarchyAnalysis.start(displayGeneratedMethod);
}

CHA implementation

Let’s take a closer look at the start method.

protected init(): void {
    this.processedMethod = new Set();
    this.cg.getEntries().forEach((entryFunc) => {
        this.workList.push(entryFunc);
    })
}

public start(displayGeneratedMethod: boolean): void {
    this.init();
    while (this.workList.length !== 0) {
        const method = this.workList.shift() as FuncID;
        const cgNode = this.cg.getNode(method) as CallGraphNode;

        if (this.processedMethod.has(method) || cgNode.isSdkMethod()) {
            continue;
        }

        ...
        
        this.processMethod(method).forEach((cs: CallSite) => {
            let me = this.cg.getArkMethodByFuncID(cs.calleeFuncID);

            this.addCallGraphEdge(method, me, cs, displayGeneratedMethod);

            if (!this.processedMethod.has(cs.calleeFuncID)) {
                this.workList.push(cs.calleeFuncID);
                this.processedMethod.add(cs.callerFuncID);
            }
        })
    }
}

It can be seen from the code above that the code is indeed the CHA algorithm’s implementation.

Dispatch + Resolve

In the ArkAnalyzer, the logics of $\texttt{Dispatch}$ and $\texttt{Resolve}$ are put in the resolvecall method, which is called by the processMethod method.

protected processMethod(methodID: FuncID): CallSite[] {
    ...
    cfg.getStmts().forEach((stmt) => {
        if (stmt.containsInvokeExpr()) {
            this.resolveCall(cgNode.getID(), stmt).forEach(stmt => calleeMethods.push(stmt));
        }
    })

    return calleeMethods;
}

The source code of resolveCall is

public resolveCall(callerMethod: NodeID, invokeStmt: Stmt): CallSite[] {
    let invokeExpr = invokeStmt.getInvokeExpr();
    let resolveResult: CallSite[] = [];

    if (!invokeExpr) {
        return [];
    }

    // process anonymous method call
    this.getParamAnonymousMethod(invokeExpr).forEach(method => {
        resolveResult.push(
            new CallSite(
                invokeStmt,
                undefined,
                this.cg.getCallGraphNodeByMethod(method).getID(),
                callerMethod,
            )
        );
    });

    // 返回方法
    let calleeMethod = this.resolveInvokeExpr(invokeExpr);
    if (!calleeMethod) {
        return resolveResult;
    }

    if (invokeExpr instanceof ArkStaticInvokeExpr) {
        // NOTE: Case 1. Static Method Call
        resolveResult.push(
            new CallSite(
                invokeStmt,
                undefined,
                this.cg.getCallGraphNodeByMethod(calleeMethod!.getSignature()).getID(),
                callerMethod!
            )
        );
    } else {
        let declareClass = calleeMethod.getDeclaringArkClass();
        // NOTE: Case 2. Virtual Call
        this.getClassHierarchy(declareClass).forEach((arkClass: ArkClass) => {
            if (arkClass.isAbstract()) {
                return;
            }

            let possibleCalleeMethod = arkClass.getMethodWithName(calleeMethod!.getName());

            if (possibleCalleeMethod && possibleCalleeMethod.isGenerated() &&
                arkClass.getSignature().toString() !== declareClass.getSignature().toString()) {
                // remove the generated method in extended classes
                return;
            }

            if (possibleCalleeMethod && !possibleCalleeMethod.isAbstract()) {
                resolveResult.push(
                    new CallSite(
                        invokeStmt,
                        undefined,
                        this.cg.getCallGraphNodeByMethod(possibleCalleeMethod.getSignature()).getID(),
                        callerMethod
                    )
                );
            }
        });
    }

    return resolveResult;
}

Compared to the standard CHA algorithm, it seems that ArkAnalyzer’s implementation takes all non-static method calls as virtual calls.

Wrap-up

That’s all for the CHA algorithm. It’s simple and fast to compute. However, it is not very precise and usually has false positives. For example,

A a = new B();
a.foo();

The CHA algorithm would report both A and B are possible target methods for foo, as it doesn’t take new B() into account. This drawback can be remedied by pointer analysis, which I will discuss in the future :0

Contents

Class Hierarchy Analysis: a quick way to generate call graph

The key to call graph construction

The method call and method signature

The Motivations behind the Class Hierarchy Analysis

CHA Algorithm

Dispatch function

Resolve function

Call Graph Construction with CHA

CHA Example. ArkAnalyzer

The method signature

The entry method of Call Graph Construction

CHA implementation

Dispatch + Resolve

Wrap-up

Ref