Fixing C

C/C++ tools for code safety #

Dynamic analysis #

Valgrind #

On-the fly instrumentation of binaries
Works with all binaries compiled by all compilers, even without source code/debug symbols
Drawback: Not much information in binaries; no information about stack allocation (so no detection of stack buffer overflows)

LLVM sanitizers #

Instrumentation of source code
Provided by LLVM compiler suite (i.e., Clang)
More information because source code is instrumented rather than binaries

Examples:

AddressSanitizer - finds improper memory addresses
LeakSanitizer - finds memory leaks
MemorySanitizer - finds uses of uninitialized memory
UndefinedBehaviorSanitizer - finds usage of null pointers, integer/float overflow, etc
ThreadSanitizer - finds improper uses of threads

Weaknesses of dynamic analysis #

Dynamic analysis can only report bad behavior that actually happened
Requires inputs which would cause bad behavior to be input to catch such behavior

Fuzzing #

Blind fuzzing: throw many random inputs at a program
Coverage-guided fuzzing:

Take normal input, run it through the program, observe control flow
Semi-randomly mutate the normal input
Run program again and observe control flow
Keep mutated inputs that changed control flow
Return to step 2 with these kept inputs, infinitely

Common fuzzers: AFL and libfuzzer

Still cannot guarantee that a program is bug-free (if the fuzzer didn’t find anything in some amount of time, maybe we didn’t run it long enough)

Static analysis #

Linting #

Basic static analysis: simple techniques to find obvious mistakes
Person running linter can configure rules to enforce
ex. clang-tidy - can auto-fix some issues!

Dataflow analysis #

Walks through every branch of the abstract syntax tree looking for issues

Limitations:

False positives: dataflow analysis follows every branch, even if it’s impossible for some condition to be true in real life
- If there are a lot of false positives (low signal-to-noise ratio), it’s difficult to actually figure out which issues pointed out are real problems
Many static analyzers only analyze a single file at a time, so some bugs won’t be found if they are split across files

Limitations of static analysis #

Hard to tell whether code is safe without broader context if we can only look at a few lines of code
Impossible to generally get broader context due to halting problem

How to verify small snippets of code in isolation without broader context?

This can be done adding a little bit of information to the code
This is what Rust does!