Binary Ninja Shenanigans: Control Flow Unflattening

2023-10-29 2023-10-29 1726 words 9 minutes

Contents

Introduction To Control Flow Flattening

Control Flow Flattening is a code obfuscation technique used to make code harder to reverse engineer. It involves transforming the control flow of a program to make it more difficult to understand and analyse.

CFF removes structured code flow and is very effective at hindering static analysis efforts, while maintaining the original behaviour of a protected binary.

CFF typically has the following characteristics:

Has >= 1 state variable(s)
Control flow decisions are made based on the state variable(s)
State variable(s) are set to a new value, and the code flow is then “rerouted” to the intended block

For the scope of this article, we shall be looking at a trivial example of a CFF-obfuscated calculator binary.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <stdio.h>

int main() {

  char op;
  double first, second;
  printf("Enter an operator (+, -, *, /): ");
  scanf("%c", &op);
  printf("Enter two operands: ");
  scanf("%lf %lf", &first, &second);

  switch (op) {
    case '+':
      printf("%.1lf + %.1lf = %.1lf", first, second, first + second);
      break;
    case '-':
      printf("%.1lf - %.1lf = %.1lf", first, second, first - second);
      break;
    case '*':
      printf("%.1lf * %.1lf = %.1lf", first, second, first * second);
      break;
    case '/':
      printf("%.1lf / %.1lf = %.1lf", first, second, first / second);
      break;
    // operator doesn't match any case constant
    default:
      printf("Error! operator is not correct");
  }
}

Determining The State Variable

We can determine the state variable used for a CFF-obfuscated function by looking at the most commonly written to variable of the function.

1
2
3
4
def get_most_assigned_var(hlil_func: HighLevelILFunction):
    defines = [hlil_func.get_var_definitions(var) for var in hlil_func.vars]
    most_used = max(defines, key=len)
    return most_used[0].vars_written[0]

The function takes in a Binary Ninja HighLevelILFunction and returns the stack variable that is most frequently written to.

Note: this article would be contrained to only 1 state variable used, although the code can be tweaked slightly to allow for multiple state vars.

Getting The Dependencies

In order to recover the original control flow, one must be able to determine what the next state is for every Original Basic Block (OBB). OBBs are code blocks that were in the original program, and not added by the obfuscator. Blocks added by the obfuscator shall be named Control Flattened (CF) blocks for this post.
By the end of any OBBs, the state variable has to be set to a new value, to allow for the program to continue its intended execution. Within the basic blocks, the state variable can be set indirectly via arthimetic with another stack variable as such:

We would now need a way to extract all stack variable dependencies of the state variable, to perform our analysis.

1
2
3
4
5
6
7
8
9
def get_var_dependencies(hlil_func: HighLevelILFunction, var: Variable):
    defines = hlil_func.get_var_definitions(var)
    assigns = list(filter(lambda x:len(x.vars_read) != 0, defines))
    deps = []
    for assign in assigns:
        for v in assign.vars_read:
            if v not in deps:
                deps.append(v)
    return deps

The function iterate through all writes to the specified variable in the given HighLevelILFunction.
All stack variables involved in any of the write operations are treated as a dependency.

Are They Evaluatable?

We now have a way to determine all dependencies of the state variable in question. Another question remains:
Are we now able to determine the next state value at the end of the OBBs?

To answer this question, we now need to be able to determine that each of these dependencies is derivable.

1
2
3
4
5
6
def can_be_derived(hlil_func: HighLevelILFunction, var: Variable):
    deps = get_var_dependencies(hlil_func, var)
    if len(deps) == 0:
        return is_const(h_func,var)
    results = [can_be_derived(h_func,d) for d in deps]
    return all(results)

Eventually, we also need a way to determine is a stack variable is a const.

1
2
3
def is_const(hlil_func: HighLevelILFunction, var: Variable):
    results = hlil_func.get_var_definitions(var)
    return len(results) == 1 and isinstance(results[0], HighLevelILConst)

We can determine if the stack variable is const by ensuring that the variable is only being written to once, with a static value.

Basic Block <-> State Mapping

The next crucial step for deobfuscation would be to build a mapping of states to basic blocks.
The mapping would allow us to be recover the original control flow by stitching the original basic blocks together.

Looking at HLIL, a pattern of state comparing if statements can be seen. It typically follows the format of if (state_var == <SOME VALUE>).

The state comparing statements have been labelled in red for ease of viewing.

By following the True path, we are now sure that the basic block the path leads to is associated with the value in the If comparison.

With this logic in mind, we can iterate over all basic blocks in the function and extract all If statements:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
COLOR_OBB = HighlightStandardColor.GreenHighlightColor
COLOR_CF = HighlightStandardColor.RedHighlightColor
COLOR_UNLABELLED = HighlightStandardColor.NoHighlightColor

for bb in h_func.basic_blocks:
    if len(bb) != 1:
        continue
    insn = bb.get_disassembly_text()[0].il_instruction
    # check if state var is involved in change of control flow
    if isinstance(insn, HighLevelILIf) and insn.vars_read[0] == state_var:
        bb.highlight = COLOR_CF
        # build OBB mapping
        insn.condition.visit(get_cmp_eq_value)

Once we find a If statement, we check that the comparison source is the state variable as found previously.
If the state var is the source of the comparison, a callback is performed to extract the state value, adding it to a dictionary mapping basic blocks addresses to their respective state values:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def get_cmp_eq_value(operand_name, operand, operand_type, parent):
    if isinstance(operand, HighLevelILCmpE):
        global obbs, bb_mappings
        obbs[operand.instr.true.address] = operand.operands[1].value.value # associate basic block address with state value
        operand.function.source_function.set_comment_at(operand.instr.true.address, hex(obbs[operand.instr.true.address]))
        basic_block = operand.il_basic_block.outgoing_edges[0].target if operand.il_basic_block.outgoing_edges[0].type == BranchType.TrueBranch else operand.il_basic_block.outgoing_edges[1].target
        basic_block.highlight = COLOR_OBB
        bb_mappings[operand.instr.true.address] = basic_block # done to get easier access to a HLIL basic block by an address
        return False # stop traversal
    return True

Every basic block that is not labelled as OBBs are then classified as CF blocks:

1
2
3
4
5
6
7
8
for bb in h_func.basic_blocks:
    if bb.highlight.color == COLOR_UNLABELLED:
        if len(bb.incoming_edges) == 0 or len(bb.outgoing_edges) == 0: # add root basic blocks and tail basic blocks as OBBs
            bb.highlight = COLOR_OBB
            obbs[bb.disassembly_text[0].address] = 0
            bb_mappings[bb.disassembly_text[0].address] = bb
        else:
            bb.highlight = COLOR_CF

We now have a mapping of OBBs and their states:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
>>> obbs
{
    0x40142b: 0x8eba195c,
    0x401542: 0xaaecfd35,
    0x401709: 0xb29e82c8,
    0x4014fd: 0xbe5f66b4,
    0x401494: 0xdf33d858,
    0x4014da: 0xe6416a97,
    0x40162e: 0xfbdea990,
    0x401320: 0xdafe8ea,
    0x40172e: 0x21fe6ab3,
    0x4015b8: 0xfc5cb732,
    0x40144e: 0x3116c698,
    0x401471: 0x4013a81a,
    0x4016aa: 0x643e9500,
    0x4014b7: 0x6595e7e5,
    0x40115b: 0x0,
    0x401760: 0x0
}

Repairing The Control Flow

Determining What’s Next

A typical pattern we can spot for trivial CFF obfuscation would be that at the end of each OBB, the state variable would be set to a new value. The new value serves as an anchor for the program to determine where the execution of the program should continue.

What if some sort of control flow is retained after the CFF obfuscation?

The basic block shown at the top shows that depending on rcx, there can be 2 possible states

As such, we now need a way to determine which control flow to keep within the function, and which to discard. A simple algorithm to do this would be to check if the control flow depends on the state variable!

1
2
3
4
5
6
7
8
9
if len(bb.outgoing_edges) > 1:
    # likely an if block
    possible_state_set_insn: HighLevelILInstruction  = bb.disassembly_text[-2].il_instruction
    # process true
    true_branch = bb.outgoing_edges[0].target
    insn: HighLevelILInstruction = instructions[true_branch.start]
    if len([var for var in insn.vars_written if var in state_var_deps]) == 0: # check if control flow decision depends on state var
        raise RuntimeError("target var is not related to state var")
    target = get_idx_from_state(get_state_value(insn))

Actually Fixing It (Finally…)

In order to repair the control flow of the program, we can utilise a fairly new feature of Binary Ninja called Workflows!

Binary Ninja Workflows is an analysis orchestration framework which simplifies the definition and execution of a computational binary analysis pipeline.
The extensible pipeline accelerates program analysis and reverse engineering of binary blobs at various levels of abstraction.
Workflows supports hybridized execution models, where the ordering of activities in the pipeline can be well-known and procedural, or dynamic and reactive.

tl;dr: The IL can be edited at any point in the analysis. This includes adding and removing IL instructions to conform to what we want to achieve.

NOTE: As of the point of writing, Workflows is still considered an experimental feature, and has no guarantees of API stability and future compatibility.

This approach has a few benefits over editing the assembly directly to repair the control flow:

Execution of program is guaranteed to be the same
No space constraints
- difference in length of opcodes for certain jumps

Recall that we now have the target state at the end of each basic block, which we can use to perform a lookup for the basic block mapped to that state. We can then edit the IL using Workflows to jump there directly instead of going back to the dispatcher.

1
2
3
4
5
6
7
def get_idx_from_state(state):
    for k,v in obbs.items(): # loop over all OBBs to find target state
        if v == state:
            if bb_mappings[k].il_function[bb_mappings[k].start].medium_level_il is None:
                # take block after
                return bb_mappings[k].il_function[bb_mappings[k].post_dominators[0].start].medium_level_il.non_ssa_form.instr_index
            return bb_mappings[k].il_function[bb_mappings[k].start].medium_level_il.non_ssa_form.instr_index

For the CFG repair, MLIL would be the target IL level to work with.
Given that the analysis was performed on HLIL, the corresponding MLIL Basic Block has to be resolved in order to find the instruction index of the start of the basic block.

With the instruction index at hand, IL instructions can be crafted:

1
2
3
target_insn = function.mlil[int(patch_idx)]
new_mlil_insn = function.mlil.expr(MediumLevelILOperation.MLIL_GOTO, target_idx) # craft MLIL_GOTO instruction
function.mlil.replace_expr(target_insn, new_mlil_insn) # replace jump to dispatcher with jump to target OBB

Lastly, all other instructions that does not belong to any OBB is removed to prevent any interference when further analysis is performed by Binary Ninja. This is done by replacing the instructions with a NOP instruction.

1
2
3
4
5
6
7
for block in h_func.basic_blocks:
    if block.highlight.color == COLOR_CF:
        if block.il_function[block.start].medium_level_il is None:
            continue
        mlil_bb_start = block.il_function[block.start].medium_level_il.non_ssa_form.instr_index
        mlil_bb = h_func.medium_level_il.get_basic_block_at(mlil_bb_start)
        add_junk(fn_start, mlil_bb.start, mlil_bb.length)

The Results

Before:

After:

The CFF analysis script can be found here.
The Workflow patching script can be found here.