Binary Ninja Shenanigans: Control Flow Unflattening
Introduction To Control Flow Flattening
Control Flow Flattening is a code obfuscation technique used to make code harder to reverse engineer. It involves transforming the control flow of a program to make it more difficult to understand and analyse.
CFF removes structured code flow and is very effective at hindering static analysis efforts, while maintaining the original behaviour of a protected binary.
CFF typically has the following characteristics:
- Has one or more state variables
- Control flow decisions are made based on the state variable(s)
- State variable(s) are set to a new value, and the code flow is then “rerouted” to the intended block
For the scope of this article, we shall be looking at a trivial example of a CFF-obfuscated calculator binary.
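To make these characteristics concrete, here is a minimal Python sketch of what flattening does to a small function. It is not the calculator binary analysed in this article; the function names and state values are invented purely for illustration.

```python
def clamp_original(x):
    # Structured control flow: the branch is visible at a glance.
    if x < 0:
        return 0
    return x


def clamp_flattened(x):
    # Same behaviour, but every block is reached through a dispatcher
    # that compares a single state variable against known values.
    state = 0x10                      # arbitrary initial state (assumption)
    result = None
    while True:
        if state == 0x10:             # entry block
            state = 0x20 if x < 0 else 0x30
        elif state == 0x20:           # original "then" block
            result = 0
            state = 0x40
        elif state == 0x30:           # original fall-through block
            result = x
            state = 0x40
        elif state == 0x40:           # exit block
            return result
```

Each original block ends by writing the next state, and the loop reroutes execution based on it, which matches the three characteristics listed above.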
Determining The State Variable
We can determine the state variable used by a CFF-obfuscated function by looking for the variable that is written to most frequently in that function.
The function takes in a Binary Ninja HighLevelILFunction and returns the stack variable that is most frequently written to.
Note: this article is constrained to a single state variable, although the code can be tweaked slightly to allow for multiple state vars.
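The counting heuristic itself is simple. As a minimal sketch that does not touch the Binary Ninja API, assume the writes in a function have already been collected as (destination, value) pairs; that representation and the name most_written_variable are assumptions made for this example.

```python
from collections import Counter


def most_written_variable(assignments):
    """Return the destination name written most often, or None if empty.

    `assignments` is a hypothetical list of (destination, value) pairs
    standing in for the assignments found in a function.
    """
    counts = Counter(dest for dest, _ in assignments)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


writes = [
    ("state", 0x10),
    ("result", 0),
    ("state", 0x20),
    ("state", 0x30),
    ("result", 1),
]
candidate = most_written_variable(writes)
```

In this toy input the state variable is written three times and result twice, so state is picked as the candidate.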
Getting The Dependencies
In order to recover the original control flow, one must be able to determine what the next state is for every Original Basic Block (OBB). OBBs are code blocks that were in the original program, and not added by the obfuscator. Blocks added by the obfuscator shall be named Control Flattened (CF) blocks for this post.
By the end of any OBB, the state variable has to be set to a new value so that the program can continue its intended execution.
Within the basic blocks, the state variable can also be set indirectly, via arithmetic with another stack variable.
We now need a way to extract all stack variable dependencies of the state variable in order to perform our analysis.
The function iterates through all writes to the specified variable in the given HighLevelILFunction.
All stack variables involved in any of the write operations are treated as a dependency.
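As a rough, library-free illustration of this dependency collection, the sketch below models expressions as nested tuples such as ("add", ("var", "var_1c"), ("const", 5)). The tuple encoding, the variable names, and both helper functions are assumptions for this example and are not Binary Ninja types.

```python
def variables_in(expr):
    """Yield every variable name referenced by a nested-tuple expression."""
    kind = expr[0]
    if kind == "var":
        yield expr[1]
    elif kind == "const":
        return
    else:
        # Operator node, e.g. ("add", lhs, rhs): walk every operand.
        for operand in expr[1:]:
            yield from variables_in(operand)


def dependencies_of(target, assignments):
    """Collect all variables appearing in any write to `target`."""
    deps = set()
    for dest, expr in assignments:
        if dest == target:
            deps.update(variables_in(expr))
    return deps


assignments = [
    ("state", ("const", 0x10)),
    ("var_1c", ("const", 0x20)),
    ("state", ("add", ("var", "var_1c"), ("const", 5))),
    ("result", ("var", "arg1")),
]
deps = dependencies_of("state", assignments)
```

Only var_1c is reported here, because arg1 appears in a write to result rather than in a write to the state variable.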
Are They Evaluatable?
We now have a way to determine all dependencies of the state variable in question.
Another question remains:
Are we now able to determine the next state value at the end of the OBBs?
To answer this question, we need to be able to determine whether each of these dependencies is derivable.
Eventually, we also need a way to determine if a stack variable is a const.
We can determine if the stack variable is const by ensuring that the variable is only being written to once, with a static value.
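The const rule and the "is it evaluatable?" question can be sketched together. The example below reuses a hypothetical nested-tuple expression encoding (not a Binary Ninja type), supports only addition, and ignores integer width and wrap-around, so treat it as an illustration of the rule rather than a complete evaluator.

```python
def writes_to(name, assignments):
    """Return every expression assigned to `name`."""
    return [expr for dest, expr in assignments if dest == name]


def is_const(name, assignments):
    """A variable is const if it is written exactly once, with a constant."""
    writes = writes_to(name, assignments)
    return len(writes) == 1 and writes[0][0] == "const"


def evaluate(expr, assignments):
    """Return the integer value of `expr`, or None if it is not derivable."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        if is_const(expr[1], assignments):
            return writes_to(expr[1], assignments)[0][1]
        return None
    if kind == "add":
        lhs = evaluate(expr[1], assignments)
        rhs = evaluate(expr[2], assignments)
        return None if lhs is None or rhs is None else lhs + rhs
    return None  # unsupported operator in this sketch


assignments = [
    ("var_1c", ("const", 0x20)),
    ("counter", ("const", 0)),
    ("counter", ("add", ("var", "counter"), ("const", 1))),
]
next_state = evaluate(("add", ("var", "var_1c"), ("const", 5)), assignments)
```

Here var_1c is written once with a constant, so the expression resolves to 0x25, while anything involving counter is reported as not derivable because counter is written twice.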
Basic Block <-> State Mapping
The next crucial step for deobfuscation would be to build a mapping of states to basic blocks.
The mapping allows us to recover the original control flow by stitching the original basic blocks together.
Looking at HLIL, a pattern of state-comparing if statements can be seen.
It typically follows the format of if (state_var == <SOME VALUE>).
The state comparing statements have been labelled in red for ease of viewing.
By following the True path, we can be sure that the basic block the path leads to is associated with the value in the If comparison.
With this logic in mind, we can iterate over all basic blocks in the function and extract all If statements:
Once we find an If statement, we check that the comparison source is the state variable found previously.
If the state var is the source of the comparison, a callback is performed to extract the state value, adding it to a dictionary mapping basic block addresses to their respective state values:
Every basic block that is not labelled as an OBB is then classified as a CF block:
We now have a mapping of OBBs and their states:
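As a library-free illustration of the whole mapping step, the sketch below models each basic block as a dictionary with a start address and an optional ("variable", value, true_target) tuple describing its If comparison. The block layout, addresses, and the function name map_states_to_blocks are all assumptions made for this example.

```python
def map_states_to_blocks(blocks, state_var):
    """Return {block_address: state_value} for blocks guarded by a state check."""
    state_of = {}
    for block in blocks:
        cond = block.get("if")
        if cond is None:
            continue
        var, value, true_target = cond
        # Only comparisons against the state variable identify an OBB.
        if var == state_var:
            state_of[true_target] = value
    return state_of


blocks = [
    {"start": 0x1000, "if": ("state", 0x10, 0x2000)},  # dispatcher check
    {"start": 0x1010, "if": ("state", 0x20, 0x2100)},  # dispatcher check
    {"start": 0x2000},                                  # reached when state == 0x10
    {"start": 0x2100},                                  # reached when state == 0x20
]

obb_states = map_states_to_blocks(blocks, "state")
# Everything that was not reached via a True path is treated as a CF block.
cf_blocks = [b["start"] for b in blocks if b["start"] not in obb_states]
```

With this input, the blocks at 0x2000 and 0x2100 are labelled as OBBs for states 0x10 and 0x20, and the two dispatcher blocks end up in the CF list.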
Repairing The Control Flow
Determining What’s Next
A typical pattern we can spot for trivial CFF obfuscation would be that at the end of each OBB, the state variable would be set to a new value. The new value serves as an anchor for the program to determine where the execution of the program should continue.
What if some sort of control flow is retained after the CFF obfuscation?
The basic block shown at the top shows that, depending on rcx, there can be 2 possible states.
As such, we now need a way to determine which control flow to keep within the function, and which to discard. A simple algorithm to do this would be to check if the control flow depends on the state variable!
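A minimal sketch of that check, assuming each conditional branch has already been reduced to a hypothetical (address, set of variables used in its condition) pair; neither the pair format nor the name classify_branches comes from Binary Ninja.

```python
def classify_branches(branches, state_var):
    """Split branches into those to keep and those added by the flattening."""
    keep, discard = [], []
    for address, condition_vars in branches:
        if state_var in condition_vars:
            discard.append(address)   # dispatcher logic: depends on the state
        else:
            keep.append(address)      # original program logic
    return keep, discard


branches = [
    (0x1000, {"state"}),   # if (state == 0x10)
    (0x2000, {"rcx"}),     # original branch on rcx
    (0x1010, {"state"}),   # if (state == 0x20)
]
keep, discard = classify_branches(branches, "state")
```

The branch on rcx survives because its condition never mentions the state variable, while both state comparisons are marked for removal.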
Actually Fixing It (Finally…)
In order to repair the control flow of the program, we can utilise a fairly new feature of Binary Ninja called Workflows!
Binary Ninja Workflows is an analysis orchestration framework which simplifies the definition and execution of a computational binary analysis pipeline.
The extensible pipeline accelerates program analysis and reverse engineering of binary blobs at various levels of abstraction.
Workflows supports hybridized execution models, where the ordering of activities in the pipeline can be well-known and procedural, or dynamic and reactive.
tl;dr: The IL can be edited at any point in the analysis. This includes adding and removing IL instructions to conform to what we want to achieve.
NOTE: At the time of writing, Workflows is still considered an experimental feature, and has no guarantees of API stability and future compatibility.
This approach has a few benefits over editing the assembly directly to repair the control flow:
- Execution of the program is guaranteed to be the same
- No space constraints
  - e.g. the difference in length of opcodes for certain jumps
Recall that we now have the target state at the end of each basic block, which we can use to perform a lookup for the basic block mapped to that state. We can then edit the IL using Workflows to jump there directly instead of going back to the dispatcher.
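The lookup itself boils down to inverting the block-to-state mapping. The sketch below shows only that bookkeeping, with made-up addresses and state values; it does not perform any IL editing, and resolve_successors is a hypothetical helper name.

```python
def resolve_successors(obb_states, next_states):
    """Map each OBB address to the addresses of the blocks it should jump to.

    obb_states:  {block_address: state_value_that_selects_the_block}
    next_states: {block_address: [state values written at the end of the block]}
    """
    block_for_state = {state: addr for addr, state in obb_states.items()}
    successors = {}
    for addr, states in next_states.items():
        # Unknown states are skipped rather than guessed.
        successors[addr] = [block_for_state[s] for s in states if s in block_for_state]
    return successors


obb_states = {0x2000: 0x10, 0x2100: 0x20, 0x2200: 0x30}
next_states = {
    0x2000: [0x20, 0x30],   # retained branch: two possible next states
    0x2100: [0x30],
    0x2200: [],             # exit block: no next state
}
successors = resolve_successors(obb_states, next_states)
```

A block with two possible next states, such as the one at 0x2000, ends up with two direct successors, which is what lets a retained conditional branch survive the repair.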
For the CFG repair, MLIL would be the target IL level to work with.
Given that the analysis was performed on HLIL, the corresponding MLIL Basic Block has to be resolved in order to find the instruction index of the start of the basic block.
With the instruction index at hand, IL instructions can be crafted:
Lastly, all other instructions that do not belong to any OBB are removed to prevent any interference when further analysis is performed by Binary Ninja.
This is done by replacing the instructions with a NOP instruction.
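To illustrate the idea on plain data, the sketch below treats a function as a list of instruction strings and replaces everything outside a set of kept indices with a nop placeholder. The instruction strings, the index set, and the name nop_out are invented for this example and do not reflect how Binary Ninja represents IL.

```python
def nop_out(instructions, keep):
    """Replace every instruction whose index is not in `keep` with "nop"."""
    return [ins if i in keep else "nop" for i, ins in enumerate(instructions)]


instructions = [
    "if (state == 0x10)",   # dispatcher comparison
    "x = 1",                # original code
    "state = 0x20",         # state update, no longer needed
    "goto dispatcher",      # jump back to the dispatcher
    "x = x + 2",            # original code
]
cleaned = nop_out(instructions, keep={1, 4})
```

Replacing rather than deleting keeps every remaining instruction at its original index in this model, so any indices computed earlier still point at the same instructions.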
The Results
Before:
After:
The CFF analysis script can be found here.
The Workflow patching script can be found here.