Skip to content

Commit

Permalink
Merge pull request #14 from FuzzingLabs/dev/antonin
Browse files Browse the repository at this point in the history
Improve decompiler output & Add LLM documentation
  • Loading branch information
Rog3rSm1th authored Jun 5, 2024
2 parents 0f5d153 + 0d097d2 commit cbcd7d1
Show file tree
Hide file tree
Showing 8 changed files with 310 additions and 32 deletions.
15 changes: 14 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,15 @@
### Sierra Analyzer

Sierra analyzer is a security toolkit for analyzing Sierra files.
Sierra-Analyzer is a security toolkit for analyzing Sierra files.

1) [Project structure](#project-structure)
2) [Decompile a Sierra file](#decompile-a-sierra-file)
3) [Analyze a remote contract](#analyze-a-remote-contract)
4) [Print the contract's Control-Flow Graph](#print-the-contracts-control-flow-graph)
5) [Print the contract's Callgraph](#print-the-contracts-callgraph)
6) [Run the detectors](#print-the-contracts-callgraph)
7) [Use it as a library](#print-the-contracts-callgraph)
8) [Improve the decompiler output using LLMs](#print-the-contracts-callgraph)

#### Project structure

Expand Down Expand Up @@ -93,6 +102,10 @@ It is also possible to use the `sierra-analyzer-lib` library to decompile serial

Examples can be found [here](/lib/examples/).

#### Improve the decompiler output using LLMs

[Here](/doc/llm-decompilation.md) is a tutorial on how to improve the decompiler output using LLMs.

#### Features

- [x] Decompiler
Expand Down
Binary file modified doc/images/decompiler-output.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
193 changes: 193 additions & 0 deletions doc/llm-decompilation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
## Improve the Sierra decompiler output using LLM

Sierra decompiler output can be complex to read and can be enhanced using LLMs. Here is a prompt tested with the [Codestral-22B](https://mistral.ai/news/codestral/) model which improves decompiler output and convert it to Cairo code.

### The prompt :

```
I want to improve the code output by my decompiler, which decompiles an intermediate representation into Cairo code. Cairo is a programming language that uses a syntax similar to Rust.
Cairo’s native data type is a field element (felt). Specifically, the felt252 data type is used to represent a 252-bit integer.
Here is an example of a Cairo Program :
// Calculates fib...
fn fib(a: felt252, b: felt252, n: felt252) -> (felt252, felt252) {
match n {
0 => (a, 0),
_ => {
let (v, count) = fib(b, a + b, n - 1);
(v, count + 1)
},
}
}
And here the corresponding decompiled code :
// Function 1
func examples::fib_counter::fib (v0: felt252, v1: felt252, v2: felt252) -> (Tuple<felt252, felt252>) {
v3 = v2
if (v3 == 0) {
v7 = v1
v8 = v0 + v7
v9 = 1
v10 = v2 - v9
v11 = user@examples::fib_counter::fib(v1, v8, v10)
v12, v13 = struct_deconstruct<Tuple<felt252, felt252>>(v11)
v14 = 1
v15 = v13 + v14
v16 = struct_construct<Tuple<felt252, felt252>>(v12, v15)
return (v16)
} else {
v5 = 0
v6 = struct_construct<Tuple<felt252, felt252>>(v0, v5)
return (v6)
}
}
Here is another example of Cairo code :
#[derive(Drop)]
struct A {
val: u256,
arr: Array<u256>,
}
fn complex_input(
felt_input: felt252,
mut felt_arr_input: Array<felt252>,
mut a_input: A,
mut a_arr_input: Array<A>,
) -> u256 {
let mut r: u256 = felt_input.into();
while let Option::Some(x) = felt_arr_input.pop_front() {
r += x.into();
};
r += a_input.val;
while let Option::Some(x) = a_input.arr.pop_front() {
r += x;
};
while let Option::Some(mut a) = a_arr_input.pop_front() {
r += a.val;
while let Option::Some(x) = a.arr.pop_front() {
r += x;
};
};
r
}
Simplify the following code by removing redundant variables and unnecessary code. No explanation. Help me add short and concise comments for the code snippet in the following code where it is needed. No explanation. Rename the variables in the following code with names corresponding to what they do. No explanation. Rename and simplify the functions in the following code with names corresponding to what they do. No explanation. Optimize the following code to improve readability and simplicity. No explanation.
Convert constants with a corresponding string comments such as : 7891998437966260601762371672023996916393715052535837300 // "Returned data too short"
into their Cairo equivalent : String::from("Returned data too short");.
or :
7733229381460288120802334208475838166080759535023995805565484692595 // "Input too long for arguments" to String::from("Input too long for arguments");
Answer with the improved code. Don't explain anything. Don't create new functions.
Here is the decompiled code to improve :
<Decompiled code to improve>
```

### Results :

For example when we decompile `minimal_contract__minimal_contract.sierra` using `sierra-analyzer`

```
cargo run -- -f examples/sierra/minimal_contract__minimal_contract.sierra
```

We get this result which can be a little hard to understand and analyze :

```rs
// Function 1
func cairo_level_tests::contracts::minimal_contract::minimal_contract::__wrapper__empty (v0: RangeCheck, v1: GasBuiltin, v2: System, v3: core::array::Span::<core::felt252>) -> (RangeCheck, GasBuiltin, System, core::panics::PanicResult::<(core::array::Span::<core::felt252>)>) {
if (withdraw_gas(v0, v1) == 0) {
v35 = Array<felt252>::new()
v36 = 375233589013918064796019 // "Out of gas"
v37 = v35.append(v36)
v38 = struct_construct<core::panics::Panic>()
v39 = struct_construct<Tuple<core::panics::Panic, Array<felt252>>>(v38, v37)
v40 = enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 1>(v39)
return (v6, v7, v2, v40)
} else {
v8 = struct_deconstruct<core::array::Span::<core::felt252>>(v3)
if (array_snapshot_pop_front<felt252>(v8) == 0) {
if (withdraw_gas_all(v4, v5, v18) == 0) {
v29 = Array<felt252>::new()
v30 = 375233589013918064796019 // "Out of gas"
v31 = v29.append(v30)
v32 = struct_construct<core::panics::Panic>()
v33 = struct_construct<Tuple<core::panics::Panic, Array<felt252>>>(v32, v31)
v34 = enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 1>(v33)
return (v21, v22, v2, v34)
} else {
v23 = Array<felt252>::new()
v24, v25 = snapshot_take<Array<felt252>>(v23)
v26 = struct_construct<core::array::Span::<core::felt252>>(v25)
v27 = struct_construct<Tuple<core::array::Span::<core::felt252>>>(v26)
v28 = enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 0>(v27)
return (v19, v20, v2, v28)
}
} else {
v12 = Array<felt252>::new()
v13 = 7733229381460288120802334208475838166080759535023995805565484692595 // "Input too long for arguments"
v14 = v12.append(v13)
v15 = struct_construct<core::panics::Panic>()
v16 = struct_construct<Tuple<core::panics::Panic, Array<felt252>>>(v15, v14)
v17 = enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 1>(v16)
return (v4, v5, v2, v17)
}
}
}
```

Using the prompt with Codestral-22B we obtain :

```rs
// Function: Minimal Contract Wrapper
func cairo_level_tests::contracts::minimal_contract::minimal_contract::__wrapper__empty (gas_check: RangeCheck, gas_builtin: GasBuiltin, sys: System, input_span: core::array::Span::<core::felt252>) -> (RangeCheck, GasBuiltin, System, core::panics::PanicResult::<(core::array::Span::<core::felt252>)>) {
// Check if there is enough gas
if withdraw_gas(gas_check, gas_builtin) == 0 {
// If not, return an out of gas error
return handle_error(gas_check, gas_builtin, sys, String::from("Out of gas"));
} else {
// Deconstruct the input span
let (_, input_array) = struct_deconstruct<core::array::Span::<core::felt252>>(input_span);
// Check if the input is too long
if array_snapshot_pop_front<felt252>(input_array) == 0 {
// If it is, return an input too long error
return handle_error(gas_check, gas_builtin, sys, String::from("Input too long for arguments"));
} else {
// If there is enough gas and the input is not too long, proceed
if withdraw_gas_all(gas_check, gas_builtin, sys) == 0 {
// If there is not enough gas, return an out of gas error
return handle_error(gas_check, gas_builtin, sys, String::from("Out of gas"));
} else {
// If there is enough gas, return the input span
return return_success(gas_check, gas_builtin, sys, input_span);
}
}
}
}

// Function: Handle Error
func handle_error(gas_check: RangeCheck, gas_builtin: GasBuiltin, sys: System, error_message: String) -> (RangeCheck, GasBuiltin, System, core::panics::PanicResult::<(core::array::Span::<core::felt252>)>) {
let error_array = Array<felt252>::new().append(error_message);
let panic_struct = struct_construct<core::panics::Panic>();
let panic_result = struct_construct<Tuple<core::panics::Panic, Array<felt252>>>(panic_struct, error_array);
return (gas_check, gas_builtin, sys, enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 1>(panic_result));
}

// Function: Return Success
func return_success(gas_check: RangeCheck, gas_builtin: GasBuiltin, sys: System, input_span: core::array::Span::<core::felt252>) -> (RangeCheck, GasBuiltin, System, core::panics::PanicResult::<(core::array::Span::<core::felt252>)>) {
let empty_array = Array<felt252>::new();
let (_, snapshot) = snapshot_take<Array<felt252>>(empty_array);
let span_struct = struct_construct<core::array::Span::<core::felt252>>(snapshot);
let success_result = struct_construct<Tuple<core::array::Span::<core::felt252>>>(span_struct);
return (gas_check, gas_builtin, sys, enum_init<core::panics::PanicResult::<(core::array::Span::<core::felt252>)>, 0>(success_result));
}
```
46 changes: 34 additions & 12 deletions lib/src/decompiler/decompiler.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,10 @@ use crate::decompiler::cfg::BasicBlock;
use crate::decompiler::cfg::EdgeType;
use crate::decompiler::function::Function;
use crate::decompiler::function::SierraStatement;
use crate::decompiler::libfuncs_patterns::{IS_ZERO_REGEX, USER_DEFINED_FUNCTION_REGEX};
use crate::decompiler::libfuncs_patterns::{
IRRELEVANT_CALLGRAPH_FUNCTIONS_REGEXES, IS_ZERO_REGEX, USER_DEFINED_FUNCTION_REGEX,
};
use crate::decompiler::utils::decrypt_user_defined_type_id;
use crate::decompiler::utils::replace_types_id;
use crate::parse_element_name;
use crate::parse_element_name_with_fallback;
Expand Down Expand Up @@ -116,9 +119,11 @@ impl<'a> Decompiler<'a> {
if let Some(name) = &t.debug_name {
format!("ut@{}", name)
}
// use ID
// use id
// Convert it to string if possible
// User defined typed ids are the 250 first bits of the id name Keccak hash
else {
format!("ut@[{}]", t.id)
decrypt_user_defined_type_id(t.id.clone())
}
}
// Builtin type
Expand Down Expand Up @@ -411,8 +416,8 @@ impl<'a> Decompiler<'a> {
};

// Define bold braces for function body enclosure
let bold_brace_open = "{".blue().bold();
let bold_brace_close = "}".blue().bold();
let bold_brace_open = "{".bold();
let bold_brace_close = "}".bold();

// Combine prototype and body into a formatted string
let purple_comment = format!("// Function {}", index + 1).purple();
Expand All @@ -432,8 +437,8 @@ impl<'a> Decompiler<'a> {
let mut basic_blocks_str = String::new();

// Define bold braces once for use in formatting
let bold_brace_open = "{".blue().bold();
let bold_brace_close = "}".blue().bold();
let bold_brace_open = "{".bold();
let bold_brace_close = "}".bold();

// Add the root basic block
basic_blocks_str += &self.basic_block_to_string(block);
Expand Down Expand Up @@ -476,10 +481,12 @@ impl<'a> Decompiler<'a> {
// End of if block
self.indentation -= 1;

let magenta_else = "else".magenta();
basic_blocks_str += &format!(
"{}{} else {}{}\n",
"{}{} {} {}{}\n",
"\t".repeat(self.indentation as usize),
bold_brace_close,
magenta_else,
bold_brace_open,
"\t".repeat(self.indentation as usize)
);
Expand Down Expand Up @@ -575,23 +582,26 @@ impl<'a> Decompiler<'a> {
function_arguments: String,
indentation: usize,
) -> String {
let bold_brace_open = "{".blue().bold();
let magenta_if = "if".magenta();
let bold_brace_open = "{".bold();
let indentation_str = "\t".repeat(indentation);

// Check if the function name matches the IS_ZERO_REGEX
if IS_ZERO_REGEX.is_match(function_name) && !self.verbose {
let argument = function_arguments.trim();
return format!(
"{}if ({argument} == 0) {}{}\n",
"{}{} ({argument} == 0) {}{}\n",
indentation_str,
magenta_if,
bold_brace_open,
"\t".repeat(indentation + 1)
);
}

format!(
"{}if ({}({}) == 0) {}{}\n",
"{}{} ({}({}) == 0) {}{}\n",
indentation_str,
magenta_if,
// Recover the type from type_id if it's a remote contract
replace_types_id(&self.declared_types_names, function_name).blue(),
function_arguments,
Expand Down Expand Up @@ -661,6 +671,7 @@ impl<'a> Decompiler<'a> {
GenStatement::Invocation(statement) => {
let called_function = parse_element_name!(&statement.libfunc_id);

// Add user-defined function to the callgraph
// Check if the called function matches the user-defined function regex
if let Some(captures) =
USER_DEFINED_FUNCTION_REGEX.captures(&called_function)
Expand All @@ -681,8 +692,19 @@ impl<'a> Decompiler<'a> {
function_name, called_function_name
));
}
} else {
}
// Add libfuncs to the callgraph
else {
let called_function_name = format!("{}\t\t", called_function.as_str());

// Skip irrelevant functions
if IRRELEVANT_CALLGRAPH_FUNCTIONS_REGEXES
.iter()
.any(|regex| regex.is_match(&called_function_name))
{
continue;
}

// Create the node in the DOT format and append it to the dot string
dot.push_str(&format!(
" \"{}\" [shape=\"rectangle\", fillcolor=\"{}\", style=\"filled\"];\n",
Expand Down
Loading

0 comments on commit cbcd7d1

Please sign in to comment.