Impeding Malware Analysis Using Conditional Code Obfuscation, Hacking and IT E-Book Dump Release, pobierz pdf

[ Pobierz całość w formacie PDF ]
Impeding Malware Analysis Using Conditional Code Obfuscation
Monirul Sharif
1
Andrea Lanzi
2
Jonathon Gifﬁn
1
Wenke Lee
1
1
School of Computer Science, College of Computing, Georgia Institute of Technology, USA
{
msharif, giffin, wenke
}
@cc.gatech.edu
2
Dipartimento di Informatica e Comunicazione, Universita degli Studi di Milano, Italy
andrew@security.dico.unimi.it
Abstract
static analysis obfuscations, but they only observe a single
execution path. Malware can exploit this limitation by em-
ploying trigger-based behaviors such as time-bombs, logic-
bombs, bot-command inputs, and testing the presence of an-
alyzers, to hide its intended behavior.
Recent analyzers provide a powerful way to discover
trigger based malicious behavior in arbitrary malicious pro-
grams. Moser et al. proposed a scheme [28] that explores
multiple paths during execution of a malware. After ex-
ploring a branch, their technique resumes execution from
a previously saved state and takes the alternate branch by
inverting the condition and solving constraints to modify
related memory variables in a consistent manner. Other re-
cently proposed approaches can make informed path selec-
tion [4], discover inputs that take a speciﬁc path [5, 7, 13]
or force execution along different paths [42]. We call all
such approaches
input-oblivious analyzers
because they do
not utilize any source of information about inputs other than
the program being analyzed.
Our goal is to anticipate attacks against the state-of-the-
art malware analyzers in order to develop more effective
analysis techniques. We present a simple, automated and
transparent obfuscation against powerful input oblivious an-
alyzers. We show that
it is possible to automatically conceal
trigger-based malicious behavior of existing malware from
any static or dynamic input-oblivious analyzer by an au-
tomatically applicable obfuscation scheme based on static
analysis.
Our scheme, which we call
conditional code obfusca-
tion
, relies on the principles of secure triggers [18]. First,
we identify and transform speciﬁc branch conditions that
rely on inputs by incorporating one-way hash functions in
such a way that it is hard to identify the values of variables
for which the conditions are satisﬁed. Second, the
condi-
tional code
, which is the code executed when these con-
ditions are satisﬁed, is identiﬁed and encrypted with a key
that is derived from the value that satisﬁes the condition.
As a result, input oblivious analyzers can no longer feasi-
Malware programs that incorporate trigger-based be-
havior initiate malicious activities based on conditions sat-
isﬁed only by speciﬁc inputs. State-of-the-art malware an-
alyzers discover code guarded by triggers via multiple path
exploration, symbolic execution, or forced conditional exe-
cution, all without knowing the trigger inputs. We present
a malware obfuscation technique that automatically con-
ceals speciﬁc trigger-based behavior from these malware
analyzers. Our technique automatically transforms a pro-
gram by encrypting code that is conditionally dependent on
an input value with a key derived from the input and then
removing the key from the program. We have implemented
a compiler-level tool that takes a malware source program
and automatically generates an obfuscated binary. Exper-
iments on various existing malware samples show that our
tool can hide a signiﬁcant portion of trigger based code. We
provide insight into the strengths, weaknesses, and possible
ways to strengthen current analysis approaches in order to
defeat this malware obfuscation technique.
1
Introduction
With hundreds of new malware samples appearing ev-
ery day [3], malware analysis, which attempts to under-
stand and extract the capabilities or behavior from malware
samples, is becoming increasingly important. As malware
analysis techniques evolve, malware writers continually
employ sophisticated anti-reverse engineering techniques in
an effort to defeat and evade the state-of-the-art analyzers.
Historically, encryption [44], polymorphism [31, 39], and
other obfuscation schemes [11,12,25,41,43] have been pri-
marily employed to thwart anti-virus tools and static analy-
sis [9,10,22,23] based approaches. Dynamic analysis based
approaches [2, 6, 13, 29, 40] inherently overcome all anti-
bly determine the values that satisfy the condition and con-
sequently the key to unveil the conditional code. Our ap-
proach utilizes several static analysis techniques, including
control dependence analysis, and incorporates both source-
code and binary analysis to automate the entire process of
transforming malware source programs to their obfuscated
binary forms.
In order to show that conditional code obfuscation is
a realistic threat, we have developed a compiler-level tool
that applies the obfuscation to malware programs written
in C/C++. Our prototype implementation generates obfus-
cated compiled ELF binaries for Linux. Since the mal-
ware authors will be the ones applying this technique, the
assumption of having the source code available is realis-
tic. We have tested our system by applying it on several
real malware programs and then evaluated its effectiveness
in concealing trigger based malicious code. In our experi-
ments on 7 different malware programs containing 92 ma-
licious triggers, our tool successfully obfuscated and con-
cealed the entire code that implemented 87 of them.
We analyze the strengths and weaknesses of our obfus-
cation. Although the keys are effectively removed from the
program, the encryption is still susceptible to brute force
and dictionary attacks. We provide a method to measure the
strength of particular applications of the obfuscation against
such attacks. To understand the possible threats our pro-
posed obfuscation scheme poses, we discuss how malware
authors may manually modify their code in different ways
to take advantage of the obfuscation technique. Finally, we
provide insight into possible ways of defeating the obfus-
cation scheme, including more informed key search attacks
and the incorporation of input-domain information in exist-
ing analyzers.
We summarize our contributions below:
Unlike polymorphic code, our approach does not store
encryption keys inside the program. However, this does not
limit the usage of polymorphism on our obfuscated binaries.
It can be added as a separate layer on top of our obfuscation.
The goal of our obfuscation is to hide malicious behavior
from malware analyzers that extract behavior. For these an-
alyzers, the usual assumption is that the program being ana-
lyzed is already suspicious. Nevertheless, malware authors
wish to develop code that is not easily detected. Naıve usage
of this obfuscation may actually improve malware detection
because of the particular way in which hash functions and
decryption routines are used. However, since attackers can
add existing polymorphic or metamorphic obfuscation tech-
niques on top of our technique, a detector should be able to
detect such malware at best with the same efﬁcacy as poly-
morphic malware detection.
2 Conditional Code Obfuscation
Malware programs routinely employ various kinds of
trigger based events. The most common examples are
bots [14], which wait for commands from the botmaster
via a command and control mechanism. Some keylog-
gers [37] log keys from application windows containing
certain keywords. Timebombs [27] are malicious code ex-
ecuted at a speciﬁc time. Various anti-debugging [8] or
anti-analysis tricks detect side-effects in the executing envi-
ronment caused by analyzers and divert program execution
when present. The problem for the malware writer is that
the checks inside the program that are performed on the in-
puts give away information about what values are expected.
For example, the commands a bot supports are usually con-
tained inside the program as strings. More generally, for any
trigger based behavior, the conditions recognizing the trig-
ger reveal information about the inputs required to activate
the behavior.
The basis of our obfuscation scheme is intuitive. By re-
placing input-checking conditions with equivalent ones that
recognize the inputs without revealing information about
them, the inputs can become secrets that the input-oblivious
analyzer can no longer discover. Such secrets can then be
used as keys to encrypt code. Since the modiﬁed conditions
are satisﬁed only when the inputs are sent to the program,
the code blocks that are conditionally executed can be en-
crypted. In other words, our scheme encrypts conditional
code with a key that is removed from the program, but is
evident when the modiﬁed condition is satisﬁed. Automati-
cally carrying out this transformation as a general obfusca-
tion scheme involves several subtle challenges.
We provide a high-level overview of our obfuscation
with program examples in Section 2.1. The general mecha-
nism is deﬁned in Section 2.2. The program analysis algo-
rithms and transformations required are described in Sec-
•
We present the principles of an automated obfusca-
tion scheme that can conceal condition-dependent ma-
licious behavior from existing and future input oblivi-
ous malware analyzers.
•
We have developed a prototype compiler-level obfus-
cator for Linux and performed experimental evalua-
tion on several existing real-world malware programs,
showing that a large fraction of trigger based malicious
behavior can be successfully hidden.
•
We provide insight into the strengths and weaknesses
of our obfuscation. We discuss how an attacker
equipped with this knowledge can modify programs
to increase the strength of the scheme. We also dis-
cuss the possibility of brute-force attacks as a weak-
ness of our obfuscation and provide insight into how
to develop more capable malware analyzers that incor-
porate input domain knowledge.
Figure 1. Two conditional code snippets.
tion 2.3. Section 2.4 describes the consequences of our
scheme on existing malware analysis approaches. Sec-
tion 2.5 discusses possible brute-force attacks on our ob-
fuscation technique.
ﬁed. Third, the condition must contain an operand that has
a statically determinable constant value.
Given these requirements above, operators that check
equality of two data values are suitable candidates. Hence,
conditions having ‘==’,
strcmp
,
strncmp
,
memcmp
, and
similar operators can be obfuscated with our mechanism.
2.1 Overview
2.2 General Mechanism
Figure 1 shows snippets of two programs that have con-
ditionally executed code. The ﬁrst program snippet calls
a function that starts logging keystrokes after receiving the
command “startkeylogger”. The second example starts an
attack only if the day of month is between the 11th and the
19th. In both the programs, the expected input can be easily
discovered by analyzing the code.
We use cryptographic hash functions to hide informa-
tion. For the ﬁrst example, we can modify the condition
to compare the computed the hard-coded hash of the string
in
cmd
with the hash value of the string “startkeylogger”
(Figure 2). The command string “startkeylogger” becomes
a secret that an input oblivious analyzer cannot know. This
secret can be used as the key to encrypt the conditional code
block and the entire function
log keys()
. Notice that
when the expected command is received, the execution en-
ters the
if
block and the encrypted block is correctly de-
crypted and executed.
We now formally deﬁne the general method of our condi-
tional code obfuscation scheme. Without loss of generality,
we assume that any candidate condition is equivalent to the
simple condition “
X == c
” where the operand
c
has a stati-
cally determinable constant value and
X
is a variable. Also,
suppose that a code block
B
is executed when this condition
is satisﬁed. Figure 3 shows the program transformation re-
quired for the obfuscation. The cryptographic hash function
is denoted by
Hash
and the symmetric encryption and de-
cryption routines are
Encr
and
Decr
, respectively.
Figure 3. General obfuscation mechanism.
The obfuscated condition is “
Hash(X) == H
c
” where
H
c
= Hash(c)
. The
pre-image resistance
property of the
function
Hash
implies that it is infeasible to ﬁnd
c
given
H
c
. This ensures that it is hard to reverse the hash function.
In addition, because of the
second pre-image resistance
property, it is hard to ﬁnd another
c
for which
Hash(c
) =
H
c
. Although this property does not strengthen the obfusca-
tion, it is required to make the obfuscated condition seman-
tically equivalent to the original, ensuring the correctness of
the program.
The block
B
is encrypted with
c
as the key. Let
B
E
be the encrypted block where
B
E
= Encr(B, c)
. Code
is inserted immediately before
B
E
to decrypt it with the
key contained in variable
X
. Since
Hash(X) = H
c
im-
plies
X = c
, when the obfuscated condition is satisﬁed, the
Figure 2. Obfuscated example snippet.
In the second example of Figure 1, cryptographic hashes
of the operands do not provide a condition equivalent to the
original. Moreover, since several values of the variable
n
satisfy the condition, it is problematic to use them as keys
for encryption and decryption.
We deﬁne
candidate
conditions as those suitable for our
obfuscation. A candidate needs three properties. First, the
ordering relation between the pre-images of the hash func-
tion must be maintained in the images. Second, there should
be a unique key derived from the condition when it is satis-
original code block is found, i.e.
B = Decr(B
E
, c)
and
the program execution is equivalent to the original. How-
ever, a malware analyzer can recover the conditional code
B
only by watching for the attacker to trigger this behavior,
by guessing the correct input, or by cracking the crypto-
graphic operations.
Figure 4. Duplicating conditional code.
2.3 Automation using Static Analysis
In order to apply the general mechanism presented in the
previous section automatically on a program, we utilized
several known algorithms in static analysis. In this section,
we describe how these program analysis techniques were
used.
control dependent on it. For each candidate condition, we
ﬁnd the set of blocks that are control dependent on its true
outcome. Therefore, if a true outcome of a candidate con-
dition
C C
i
of function
F
i
has an edge to a block
B V
i
,
then
B CCode(C)
.
Conditional code blocks may call other functions. To
take them into account, we determine reachability in the
inter-procedural CFG. If there exists a call to a function
F
from a conditional code block
B CCode(C)
of some
candidate condition
C
, we consider
F CCode(C)
which
means that we consider all the code in the function
F
as
conditional code of
C
as well. Now, for every block in
F CCode(C)
, we ﬁnd calls to other functions and in-
clude them in
CCode(C)
. This step is performed repeat-
edly until all reachable functions are included. We used this
approach instead of inter-procedural control-dependence
analysis [35] because it allows us to obfuscate functions that
are not control-dependent on a candidate condition but can
be reached only from that condition.
A candidate condition may be contained in a block that
is conditional code of another candidate condition. The next
step is to eliminate these cases by making a function or
block conditional code of only the closest candidate con-
dition that can reach it. For any block
B CCode(C)
,
if
B CCode(C
)
where
C = C
and
C CCode(C
)
then we remove
B
from
CCode(C
)
. We perform the same
operation for functions.
Blocks and functions can be obfuscated when they are
conditional code of candidate conditions only. If they are
reachable by non-candidate conditions, then we cannot ob-
fuscate them. When obfuscations are to be applied, they are
applied in an iterative manner, starting with the candidate
condition that have no other candidate conditions depend-
ing on it. The basic blocks and functions that are condi-
tional code of these conditions are obfuscated ﬁrst. In the
next iteration, candidate conditions with no unobfuscated
candidate conditions depending on it are obfuscated. The
iterative process continues until all candidate conditions are
obfuscated.
We use a conservative approach to identify conditional
code when the CFG is incomplete. If code pointers are used
whose targets are not statically resolvable, we do not en-
crypt code blocks that are potential targets and any other
code blocks that are reachable from them. Otherwise, the
2.3.1 Finding Conditional Code
In order to identify conditional code suitable for obfusca-
tion, we ﬁrst identify candidate conditions in a program.
Let
F
be the set of all functions or procedures and
B
be
the set of all basic blocks in the program that we analyze.
For each function
F
i
F
in the program, we construct
a
control ﬂow graph
(CFG)
G
i
= (V
i
, E
i
)
in the program
where
V
i
B
is the set of basic blocks in
F
i
and
E
i
is the
set of edges in the CFG representing control-ﬂow between
basic blocks. We then identify basic blocks having condi-
tional branches, which have two outgoing edges. Since we
are not interested in conditions used in loops, we employ
loop analysis to identify such conditions and discard them.
From the remaining conditional branches, we select candi-
date conditions as the ones containing equality operators as
described in Section 2.1. Let
C
i
V
i
be the set of blocks
containing candidate conditions for each function
F
i
.
After candidate conditions are identiﬁed in a program,
the next step is to ﬁnd corresponding conditional code
blocks. As described earlier, conditional code is the code
that gets executed when a condition is satisﬁed. It may in-
clude some basic blocks from the same function the condi-
tion resides in and some other functions. Since a basic block
can contain at most one conditional branch instruction, by
a condition, we refer to the basic block that contains it. We
use the mapping
CCode : B (B F)
to represent con-
ditional code for any condition.
In order to determine conditional code, we ﬁrst use
control dependence analysis
[1, 16] at the intra-procedural
level. A block
Q
is control dependent on another block
P
if the outcome of
P
determines the reachability of
Q
.
More precisely, if one outcome of
P
always executes
Q
,
but the other outcome may not necessarily reach
Q
, then
Q
is control dependent on
P
. Using the standard algorithm
for identifying control dependence, we build a
control de-
pendence graph
(CDG) for each function, where each con-
dition block and their outcome has edges to blocks that are
Figure 5. Compound condition simpliﬁcation.
encryption can crash the program. Fortunately, type infor-
mation frequently allows us to limit the set of functions or
blocks that are probable targets of a code pointer.
vealed one, the revealed code gives away the behavior that
was intended to be hidden from an analyzer.
To introduce more candidate conditions in C/C++ pro-
grams, we convert
switch...case
constructs into sev-
eral
if
blocks, each containing a condition using an equal-
ity operator. Every case except the
default
becomes a can-
didate for obfuscation. Complications arise when a code
block under a switch case falls through to another. In such
cases, code of the block in which control-ﬂow falls through
can be duplicated and contained in the earlier switch case
code block, and then the standard approach that we de-
scribed can be applied.
2.3.2 Handling Common Conditional Code
A single block or a function may be conditional code of
more than one candidate condition that are not conditional
code of each other. For example, a bot program may con-
tain a common function that is called after receiving multi-
ple different commands. If a block
B
is conditional code of
two candidate conditions
P
and
Q
, where
P / CCode(Q)
and
Q / CCode(P )
then
B
can be reached via two differ-
ent candidate conditions and cannot be encrypted with the
same key. As shown in Figure 4, we solve this problem by
duplicating the code and encrypting it separately for each
candidate condition.
2.4 Consequences to Existing Analyzers
Our obfuscation can thwart different classes of tech-
niques used by existing malware analyzers. The analysis
refers to the notations presented in Section 2.2.
Path exploration and input discovery:
Various analy-
sis techniques have been proposed that can explore paths in
a program to identify trigger based behavior. Moser et al.’s
dynamic analysis based approach [28] explores multiple
paths during execution by repeatedly restoring earlier saved
program states and solving constructed path constraints in
order to ﬁnd a consistent set of values of in-memory vari-
ables that satisfy conditions leading to different paths. They
use dynamic taint analysis on inputs from system calls
to construct linear constraints representing dependencies
among memory variables. After our obfuscation is applied,
the constraint added to the system is “
Hash(X) == H
c
”,
which is a non-linear function. Therefore, our obfusca-
tion makes it hard for such a multi-path exploring approach
to feasibly ﬁnd value assignments to variables in the pro-
grams’s memory to proceed towards the obfuscated path.
A similar effect can be seen for approaches that discover
inputs from a program that executes it along a speciﬁc path.
EXE [7] uses mixed symbolic and concrete execution to
create constraints that relate inputs to variables in memory.
It uses its own efﬁcient constraint solver called STP, which
supports all arithmetic, logical, bitwise, and relational op-
erators found in C (including non-linear operations). Cryp-
tographic hash functions are designed to be computation-
ally infeasible to reverse. Even with a powerful solver like
2.3.3 Simplifying Compound Constructs
Logical operators such as
&&
or
||
combine more than one
simple condition. To apply our obfuscation, compound con-
ditions must be ﬁrst broken into semantically equivalent but
simpliﬁed conditions. However, parts of a compound condi-
tion may or may not be candidates for obfuscation, making
the compound condition unsuitable for obfuscation.
Logical
and
operators (
&&
) can be written as nested
if
statements containing the operand conditions and the con-
ditional block in the innermost block (Figure 5(a)). Since
both the simple conditions must be satisﬁed to execute con-
ditional code, the code can be obfuscated if at least one of
them is a candidate condition.
Logical
or
operators (
||
) can be obfuscated in two
ways. Since either of the conditions may execute condi-
tional code, the conditional code may be encrypted with a
single key and placed in two blocks that are individually
obfuscated with the simple conditions. Another simple way
is to duplicate the conditional code and use
if...else
if
constructs (Figure 5(b)). Note that if either one of the
two conditions is not a candidate for obfuscation, then the
conditional code will remain revealed. Concealing the other
copy does not gain protection. Although it is not possible
to determine that the concealed code is equivalent to the re-
[ Pobierz całość w formacie PDF ]

Wątki