Secure Software Engineering
Topic #20: Symbolic Execution
Static analysis
Can analyze all possible runs of a program
An explosion of interesting ideas and tools
Commercial companies sell, use static analysis
Great potential to improve software quality
But: Can it find deep, difficult bugs?
Yes, but not often
Commercial viability implies you must deal with developer
confusion, false positives, error management,..
This means that companies specifically aim to keep the
false positive rate down
- They often do this by purposely missing bugs, to keep
the analysis simpler
One issue: Abstraction
Abstraction lets us model all possible runs
But abstraction introduces conservatism
*-sensitivities add precision, to deal with this
* = flow-, context-, path-, etc.
But more precise abstractions are more expensive
- Challenges scalability
- Still have false alarms or missed bugs
Static analysis abstraction ≠ developer abstraction
Because the developer didn’t have them in mind
Symbolic Execution
Testing works: reported bugs are real bugs
But, each test only explores one possible execution
- assert(f(3) == 5)
We hope test cases generalize, but no guarantees
Symbolic execution generalizes testing
“Allows unknown symbolic variables α in evaluation
- y = α; assert(f(y) == 2*y-1);
If execution path depends on unknown, conceptually
fork symbolic executor
- int f(int x) { if (x > 0) then return 2*x - 1; else return 10; }
Symbolic Execution Example
1. int a = α, b = β, c = γ;
2. // symbolic
3. int x = 0, y = 0, z = 0;
4. if (a) {
5. x = -2;
6. }
7. if (b < 5) {
8. if (!a && c) { y = 1; }
9. z = 2;
10. }
11. assert(x+y+z != 3)
x=0, y=0, z=0
α
x=-2
z=2
?
β<5 ?α∧γ
y=1
β<5
z=2
z=2
t f
t f t f
t f
α∧(β<5)
path condition
α∧(β≥5) ?α∧(β≥5)
α∧(β<5)∧?γ
α∧(β<5)∧γ
Insight
Each symbolic execution path stands for many
actual program runs
In fact, exactly the set of runs whose concrete values
satisfy the path condition
Thus, we can cover a lot more of the program’s
execution space than testing
Viewed as a static analysis, symbolic execution
is
Complete, but not sound (usually doesn’t terminate)
Path, flow, and context sensitive
Symbolic Analysis
Static analysis considers all paths are feasible
Dynamic analysis considers one path or a number
of paths
Symbolic analysis reasons about path feasibility
Much more precise
Scalability is an issue
A lot of security applications
Exploit generation
Vulnerability detection
Expose hidden behavior
Verification
Equivalence checking for analyzing obfuscated code
Basic Idea
Explore individual paths in the program; models the
conditions and the symbolic values along a path to a
symbolic constraint; a path is feasible if the
corresponding constraint is satisfiable (SAT)
Similar to the per-path static analysis, a worklist is used
to maintain the paths being explored
Upon a function invocation, the current worklist is pushed
to a stack and a new worklist is initialized for path
exploration within the callee
Upon a return, the symbolic value of the return variable is
passed back to the caller
The idea is an old one
Robert S. Boyer, Bernard Elspas, and Karl N. Levitt.
SELECT–a formal system for testing and debugging
programs by symbolic execution. In ICRS, pages 234–
245, 1975.
James C. King. Symbolic execution and program testing.
CACM, 19(7):385–394, 1976. (most cited)
Leon J. Osterweil and Lloyd D. Fosdick. Program testing
techniques using simulated execution. In ANSS, pages
171–177, 1976.
William E. Howden. Symbolic testing and the DISSECT
symbolic evaluation system. IEEE Transactions on
Software Engineering, 3(4):266–278, 1977.
Why didn’t it take off?
Symbolic execution can be compute-intensive
Lots of possible program paths
Need to query solver a lot to decide which paths are
feasible, which assertions could be false
Program state has many bits
Computers were slow (not much processing
power) and small (not much memory)
Recent Apple iPads are as fast as Cray-2’s from the
80’s
Today
Computers are much faster, bigger
Better algorithms too: powerful SMT/SAT solvers
SMT = Satisfiability Modulo Theories = SAT++
Can solve very large instances, very quickly
Lets us check assertions, prune infeasible paths
2005-2006 reinterest in symbolic execution
Area of success: (security) bug finding
Heuristic search in space of possible executions
Find interesting bugs
Basic symbolic execution
12
Symbolic variables
Extend the language’s support for expressions e to
include symbolic variables, representing unknowns
Symbolic variables are introduced when reading input
Using mmap, read, write, fgets, etc.
So if a bug is found, we can recover an input that
reproduces the bug when the program is run normally
e ::= n | X | e0 + e1 | e0 ≤ e1 | e0 && e1 | …
n ∈ N = integers, X ∈ Var = variables, α ∈ SymVarα |
Symbolic Expressions
We make (or modify) a language interpreter to be
able to compute symbolically
Normally, a program’s variables contain values
Now they can also contain symbolic expressions
- Which are expressions containing symbolic variables
Example normal values:
5, “hello”
Example symbolic expressions: