Skip to content

Final Report

a-alexus edited this page Aug 11, 2020 · 22 revisions

6447 Report

How your fuzzer works

This fuzzer was written in golang.

Overview

  1. The input.txt is used to figure out what type of input the binary is expecting. Based on this a permutator for that specific data format is started.
  2. A permutator uses the original input to create new inputs that conform to the document format the original input follows. A permutator generates test cases. Test cases are given to the harness to run as well as to mutators for further modification.
  3. Mutators take input from the permutator and change it before passing it to the harness. If the mutator does not receive any data from the permutator, it creates a new test case from the original input and performs changes on that.
  4. The harness receives test cases from permutators and mutators. It runs the binary for each test case received, using the test case as the input.
  5. The harness identifies interesting test cases that are passed back to a mutator so the mutator can perform further changes.

Permutation

CSV Permutation strategies

  • plainCSV: No permutations are performed. The original input is passed to the harness and mutator.
  • blankCSV: a CSV containing the original number of rows and columns but no data. The CSV only contains commas.
  • spamCols: Copies the last column multiple times and adds these columns to the CSV
  • spamRows: Copies the last row multiple times and adds these rows to the CSV

XML Permutation strategies

  • plainXML: No permutations are performed. The original input is passed to the harness and permutator.
  • spamElementBreadthWise: Randomly selects two xml elements. One element is treated as the parent, the other is treated as the child. Multiple clones of the child are added to the parent. These clones do not contain any of the child's children.
  • spamElementDepthWise: Randomly selects two xml elements. One element is treated as the parent, the other is treated as the child. This child is recurrsively added to itself creating a tree. This tree is then added to the parent.

JSON Permutation stragegies

  • Randomly select elements in the parent, and adds mutated copies
  • Added copies are mutated depending on their type:
    • Strings: Repeat the string
    • Numbers: Add a large value to the number
    • Arrays: Randomly select and duplicate a value in the array
    • Objects: Randomly select and duplicate a key-value pair in the object
  • There is a chance for a deeper mutation, where nested objects and arrays are added to the parent instead of just the selected duplicate.

The fuzzer does not have a permutator for data in plaintext format. The reasoning behind this was that plaintext does not require the data format to follow a specification like XML, CSV or JSON.

Mutation

A mutator randomly selects which mutation functions to perform and the number of mutation functions to perform.

Mutation functions

  • mutateInts: Identifies integers within a test case and replaces them with 'interesting values'. This includes very large integers, zero and very small negative integers.
  • mutateHex: Identifies hex strings within a test case and replaces them with 'interesting values'. This includes values like 0x0 and 0xFFFFFFFF.
  • mutateFloats: Identifies floats within a test case and replaces them with 'interesting values'. This includes floats like Inf, Nan, -1, 0.
  • insertString: Randomly selects a place to insert an interesting string within a test case. An interesting string includes format strings and long strings containing a large number of characters.
  • mutateShuffle: Splits a test case by white space and shuffles the order of the resulting strings.
  • mutateReverse: Splits a test case by white space and reverses the order of the resulting strings.
  • flipBits: Randomly selects a number of N bits to flip. Randomly selects which N bits to flip and flips them.
  • flipBytes: Randomly selects a number of N bytes to flip. Randomly selects which N bytes to flip and flips them.

Harness + Something Awesome

The fuzzer harness is responsible for starting up the target binary, providing it input, and observing if any crashes occur. The harness also implements some of our 'Something Awesome' feature attempts through the memory snapshot/restore and code coverage features.

The harness starts up the binary with a fork+exec system calls and provides the binary with one end of a pipe to use as it's standard input. This creates an entry at /proc/pid/fd/0 during the process startup, which is written to in order to provide the binary with input.

The target binary is started with ptrace enabled, allowing us to take control early in execution. This is used to run the process until a read system call is performed on stdin, which is probably the point that user input is expected. This is achieved with the ptrace syscall tracing functionality, which allows continuing execution until the next syscall entry/exit. The syscall number and argument registers are architecture dependent, and currently x86 32/64 bit target binaries are supported.

After stopping at the first read from stdin, the state of the target process is saved: This includes saving all the writable memory regions in the program (as determined by parsing /proc/pid/maps, and reading data from /proc/pid/mem), since these are the only memory segments that can be modified during execution. Additionally, the process's current register set is saved (ptrace has functionality to do this). These are saved into a 'snapshot' of the process state.

Afterwards, the program is allowed to execute and is given input via /proc/pid/fd/0. If a crash is observed, this is reported back to a channel in the main thread, leading to the creation of the 'bad.txt' file and fuzzer termination. Otherwise, the process state is restored from the snapshot that was generated, and allowed to run again with a new input. In this way, the overhead of starting up a new process is avoided, and improves harness performance. This memory snapshot/restore feature constitutes part of our 'Something Awesome'.

Another part of the 'Something Awesome' is the use of ptrace to obtain rudimentary code coverage during execution. Ptrace is able to stop a tracee each time it enters/exits a syscall, which we track after providing the target binary with input. The syscall number of every syscall made during an execution run is recorded, and forms a 'trace' identifying that execution run. It is then compared with a unique list of similar traces, and if it hasn't been seen before it is appended to the list, and the input that caused that trace is considered 'interesting', and is given into an interesting input channel for further mutation. In this way, coverage data is used by the fuzzer in order to try hit even more code paths. The pros and cons of this approach to code coverage include:

  • Tracing syscalls is much faster than the other ptrace functionality of tracing every instruction
  • The coverage data is not as useful as tracing every instruction
  • There is still some additional overhead when tracing, however some coverage data is much more valuable than none

Our strategy was chosen because it seemed to provide a good middle ground between speed and useful coverage information.

Harness Improvements

The current way of restoring process snapshots is by writing to /proc/pid/mem. There may be potential for a speed up if we used the process_vm_writev syscall, which allows the transfer of data between address spaces. The golang syscall package doesn't have a direct wrapper for this yet, but it does have support for making arbitrary syscalls.

Additionally, restoring whole segments of memory may be inefficient considering that only parts of those segments may have been modified during execution. Restoring just individual dirty pages may lead to a further speed improvement, which could be achieved by interaction with /proc/pid/clear_refs and /proc/pid/pagemap to figure out which pages were modified between two execution points.

What kinds of bugs your fuzzer can find

  • Overflows
  • Format strings
  • recursive parsing
  • Out of memory bugs
  • fatal arithmetic errors such as divide by zero

What improvements can be made to your fuzzer

  • More intelligent ways of understanding the provided input (e.g. recognising an option and enumerating other options/combinations)
  • Alternative loop detection
  • More coverage analysis
  • adding generators to generate input from scratch
  • incorporating the above into the feedback mechanism
  • better concurrency---production/consumption management

If you attempts any bonus marks - How your fuzzer achieves these bonus marks.

The memory snapshot/restore feature in the harness avoids the overhead of starting new processes, resulting in a performance boost.

The harness also implements a code coverage/feedback mechanism, where the list of syscalls made during an execution run is compared against a unique list of runs, and is used to determine if new code paths are hit, providing this feedback back into a mutator.

For more details, see the Harness section.