Commit 547b964

WARP format 1.0
- Update to flatbuffers `25.2.10`
- Add fuzzing targets for type and function `from_bytes`
- Update examples
- Simplify type spec
- Make constraints generic and remove specialized constraint lists
- Space optimizations for type and functions specs
- More tests with greater coverage
- Introduce the concept of a WARP `File` and `Chunk`s
- Make chunk compression configurable
- Make `Type` objects class field unboxed (decreases memory pressure)
- Use standard directory structure for Rust API
- Move tests to `tests` directory for more easy discovery
- Remove almost all uses of `unwrap` (needed for server-side parsing)
- Refactor `TypeMetadata`
- Add `mock` module for easy mocking in tests and examples
- Make `Symbol` space optimized
- Switch to using `.warp` extension to represent general analysis data instead of just signatures
- Add format version to `File` and `Chunk` (allow for breaking changes later)
- Make analysis data (signatures and types) copy on write (See `ChunkHandler` impl's)

This work is being done to allow for networked WARP information and generally to make the WARP format more usable in a wider set of scenarios.

After this commit any breaking changes to the format will be held off for 2.0, if that ever becomes a thing.
1 parent aa84f00 commit 547b964

145 files changed: +11906 -4320 lines changed

.gitignore

Lines changed: 8 additions & 1 deletion
@@ -56,4 +56,11 @@ generated/
 **/out
 
 *.vsix
-*.deb
+*.deb
+
+# cargo-mutants
+mutants*
+
+*.warp
+*.sbin
+!fixtures/*.warp

Cargo.toml

Lines changed: 9 additions & 37 deletions
@@ -1,28 +1,13 @@
-[package]
-name = "warp"
-version = "0.1.0"
-edition = "2021"
-license = "Apache-2.0"
+[workspace]
+resolver = "2"
+members = [
+    "rust",
+    "rust/fuzz",
+    "warp_cli"
+]
 
-[lib]
-path = "rust/lib.rs"
-
-[dependencies]
-flatbuffers = "24.3.25"
-bon = "2.3.0"
-uuid = { version = "1.11.0", features = ["v5"]}
-rand = "0.8.5"
-flate2 = "1.0.34"
-
-[features]
-default = []
-gen_flatbuffers = ["dep:flatbuffers-build"]
-
-[dev-dependencies]
-criterion = "0.5.1"
-
-[build-dependencies]
-flatbuffers-build = { git = "https://github.com/emesare/flatbuffers-build", features = ["vendored"], optional = true }
+[workspace.dependencies]
+warp = { path = "rust" }
 
 [profile.release]
 panic = "abort"
@@ -31,16 +16,3 @@ debug = "full"
 
 [profile.bench]
 lto = true
-
-[[example]]
-name = "simple"
-path = "rust/examples/simple.rs"
-
-[[example]]
-name = "random"
-path = "rust/examples/random.rs"
-
-[[bench]]
-name = "void"
-path = "rust/benches/void.rs"
-harness = false

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Copyright 2020-2024 Vector 35 Inc.
+Copyright 2020-2025 Vector 35 Inc.
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

README.md

Lines changed: 47 additions & 23 deletions
@@ -15,16 +15,16 @@ common functions within any binary efficiently and accurately.
 
 ### Integration Requirements
 
-To integrate with **WARP** function matching you must be able to:
+To integrate with **WARP** function matching, you must be able to:
 
 1. Disassemble instructions
 2. Identify basic blocks that make up a function
 3. Identify register groups with implicit extend operation
-4. Identify relocatable instructions (see [What is considered a relocatable operand?](#what-is-considered-a-relocatable-operand))
+4. Identify relocatable instructions (see [What is considered a relocatable instruction?](#what-is-considered-a-relocatable-instruction))
 
 ### Creating a Function GUID
 
-The function GUID is the UUIDv5 of the basic block GUID's (sorted highest to lowest start address) that make up the function.
+The function GUID is the UUIDv5 of the basic block GUIDs (sorted highest to lowest start address) that make up the function.
 
 #### Example
 
@@ -56,44 +56,66 @@ function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes)
 
 #### What is the UUIDv5 namespace?
 
-The namespace for Function GUID's is `0192a179-61ac-7cef-88ed-012296e9492f`.
+The namespace for Function GUIDs is `0192a179-61ac-7cef-88ed-012296e9492f`.
 
 ### Creating a Basic Block GUID
 
 The basic block GUID is the UUIDv5 of the byte sequence of the instructions (sorted in execution order) with the following properties:
 
-1. Zero out all instructions containing a relocatable operand.
+1. Zero out all relocatable instructions.
 2. Exclude all NOP instructions.
 3. Exclude all instructions that set a register to itself if they are effectively NOPs.
 
 #### When are instructions that set a register to itself removed?
 
-To support hot-patching we must remove them as they can be injected by the compiler at the start of a function (see: [1] and [2]).
+To support hot-patching, we must remove them as they can be injected by the compiler at the start of a function (see: [1] and [2]).
 This does not affect the accuracy of the function GUID as they are only removed when the instruction is a NOP:
 
 - Register groups with no implicit extension will be removed (see: [3] (under 3.4.1.1))
 
 For the `x86_64` architecture this means `mov edi, edi` will _not_ be removed, but it _will_ be removed for the `x86` architecture.
 
-#### What is considered a relocatable operand?
+#### What is considered a relocatable instruction?
 
-An operand that is used as a pointer to a mapped region.
+An instruction with an operand that is used as a _constant_ pointer to a mapped region.
 
-For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed.
+- For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed.
+
+An instruction which is used to calculate a _constant_ pointer to a mapped region.
+
+- For the `aarch64` architecture the instruction `21403c91` (or `add x1, x1, #0xf10`) would be zeroed if the incoming `x1` was a pointer into a mapped region.
 
 #### What is the UUIDv5 namespace?
 
-The namespace for Basic Block GUID's is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`.
+The namespace for Basic Block GUIDs is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`.
 
-### Function Constraints
+### Constraints
 
-Function constraints allow us to further disambiguate between functions with the same GUID, when creating the functions we store information about the following:
+Constraints allow us to further disambiguate between functions with the same GUID; when creating the functions, we retrieve extra information
+that is consistent between versions of the same function, some examples are:
 
 - Called functions
 - Caller functions
 - Adjacent functions
 
-Each entry in the lists above is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.
+Each extra piece of information is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.
+
+#### Creating a Constraint
+
+Constraints are made up of a GUID and optionally, a matching offset. Adding a matching offset is preferred to give locality to the constraints,
+for example, if you have a function `A` which calls into function `B` that is one constraint, but if the function `B` is also adjacent to function `A`
+without a matching offset the two constraints may be merged into a single one, reducing the number of matching constraints.
+
+- The adjacent function `B` as a constraint: `(9F188A12-3EA1-477D-B368-361936EEA213, -30)`
+- The call to function `B` as a constraint: `(9F188A12-3EA1-477D-B368-361936EEA213, 48)`
+
+#### Creating a Constraint GUID
+
+The constraint GUID is the UUIDv5 of the relevant bytes that would be computable at creation time and lookup time.
+
+##### What is the UUIDv5 namespace?
+
+The namespace for Constraint GUIDs is `019701f3-e89c-7afa-9181-371a5e98a576`.
 
 ##### Why don't we require matching on constraints for trivial functions?
 
@@ -111,44 +133,46 @@ The main difference between **WARP** and **FLIRT** is the approach to identifica
 #### Function Identification
 
 - **WARP** the function identification is described [here](#function-identification).
-- **FLIRT** uses incomplete function byte sequence with a mask where there is a single function entry (see: [IDA FLIRT Documentation] for a full description).
+- **FLIRT** uses an incomplete function byte sequence with a mask where there is a single function entry (see: [IDA FLIRT Documentation] for a full description).
 
-What this means in practice is **WARP** will have less false positives based solely off the initial function identification.
+What this means in practice is **WARP** will have fewer false positives based solely off the initial function identification.
 When the returned set of functions is greater than one, we can use the list of [Function Constraints](#function-constraints) to select the best possible match.
 However, that comes at the cost of requiring a computed GUID to be created whenever the lookup is requested and that the function GUID is _**always**_ the same.
 
 ### WARP vs SigKit
 
-Because WARP is a replacement for SigKit it makes sense to not only talk about the function identification approach, but also the integration with [Binary NInja].
+Because WARP is a replacement for SigKit it makes sense to not only talk about the function identification approach, but also the integration with [Binary Ninja].
 
 #### SigKit Function Identification
 
-SigKit is rooted as a FLIRT-like signature matcher so to not repeat what is said above, see [here](#function-identification).
+SigKit's function identification is similar to FLIRT so to not repeat what is said above, see [here](#function-identification).
+
+One difference to point out is SigKit relies on relocations during signature generation. Because of this, firmware or other types of binaries lacking relocations will likely fail to mask off the required instructions.
 
 #### Binary Ninja Integration
 
 The two main processes that exist for both SigKit and WARP integration with Binary Ninja are the function lookup process and the signature generation process.
 
 ##### Function lookup
 
-SigKit's function lookup process is integrated as a core component to Binary Ninja as such it is not open source, however the process is described [here](https://binary.ninja/2020/03/11/signature-libraries.html).
+SigKit's function lookup process is integrated as a core component to Binary Ninja as such it is not open source, however, the process is described [here](https://binary.ninja/2020/03/11/signature-libraries.html).
 
-What this means is **WARP** unlike SigKit can identify a greater number of smaller functions, ones which would be required to be pruned in generation process.
+What this means is **WARP** unlike SigKit can identify a greater number of smaller functions, ones which would be required to be pruned in the generation process.
 After looking up a function and successfully matching **WARP** will also be able to apply type information.
 
 ##### Signature generation
 
 SigKit's signature generation is provided through user python scripts located [here](https://github.com/Vector35/sigkit/tree/master).
 
-Because of the separation of the signature generation and the core integration the process becomes very cumbersome, specifically the process is too convoluted for smaller samples, and too slow for bigger samples.
+Because of the separation of the signature generation and the core integration, the process becomes very cumbersome, specifically the process is too convoluted for smaller samples, and too slow for bigger samples.
 
 #### What does this mean?
 
-WARP can match on a greater number of functions which otherwise would be pruned at the generation process, this is obviously not without its tradeoffs, we generate this function UUID on both ends, meaning that the algorithm must be carefully upgraded to ensure that previously generate UUID's are no longer valid.
+WARP can match on a greater number of functions which otherwise would be pruned at the generation process, this is not without its tradeoffs, we generate this function UUID on both ends, meaning that the algorithm must be carefully upgraded to ensure that previously generated UUID's are no longer valid.
 
-Aside from just the matching of functions, we _never_ prune functions when added to the dataset this means we actually can store multiple functions for any given UUID, this is a major advantage for users who can now identify exactly what causes a collision and override, or otherwise understand more about the function.
+Aside from just the matching of functions, we _never_ prune functions when added to the dataset this means we actually can store multiple functions for any given UUID. This is a major advantage for users who can now identify exactly what causes a collision and override, or otherwise understand more about the function.
 
-After matching on a function successfully we can reconstruct the function signature not just the symbol name. SigKit has no information about the function calling convention or the function type.
+After matching on a function successfully, we can reconstruct the function signature, not just the symbol name. SigKit has no information about the function calling convention or the function type.
 
 [1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=9583
 [2]: https://devblogs.microsoft.com/oldnewthing/20221109-00/?p=107373
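
The README rules for basic block and function GUIDs can be read as two byte-preparation steps that happen before any UUIDv5 hashing. The sketch below is illustrative only and is not the `warp` crate's API: the `Instruction` struct, its flags, and both function names are hypothetical, and it assumes that "zero out" means replacing every byte of a relocatable instruction with `0` while keeping its length. The UUIDv5 step itself (with the namespaces quoted in the README) is left out so the example needs only the standard library.

```rust
/// Hypothetical decoded instruction. A real integration would fill these
/// fields from its own disassembler and relocation analysis.
#[derive(Debug, Clone, Default)]
struct Instruction {
    bytes: Vec<u8>,
    /// True for architecture NOPs.
    is_nop: bool,
    /// True for a register-to-itself move with no implicit extension.
    is_self_move_nop: bool,
    /// True when the instruction uses or computes a constant pointer
    /// into a mapped region.
    is_relocatable: bool,
}

/// Bytes that would be hashed to form one basic block GUID.
/// Assumption: zeroing keeps the instruction length.
fn basic_block_guid_input(instructions: &[Instruction]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in instructions {
        if instr.is_nop || instr.is_self_move_nop {
            continue; // rules 2 and 3: excluded entirely
        }
        if instr.is_relocatable {
            // rule 1: append one zero byte per instruction byte
            out.resize(out.len() + instr.bytes.len(), 0);
        } else {
            out.extend_from_slice(&instr.bytes);
        }
    }
    out
}

/// Bytes that would be hashed to form the function GUID: the basic block
/// GUIDs concatenated, sorted from highest to lowest start address.
fn function_guid_input(blocks: &[(u64, [u8; 16])]) -> Vec<u8> {
    let mut sorted = blocks.to_vec();
    sorted.sort_by(|a, b| b.0.cmp(&a.0));
    sorted
        .iter()
        .flat_map(|(_, guid)| guid.iter().copied())
        .collect()
}
```

With this shape, an `x86` block containing `mov edi, edi`, `push ebp`, `nop`, `call 0x15bba`, and `ret` would contribute only the `push` byte, five zero bytes for the call, and the `ret` byte.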

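The new "Creating a Constraint" section in the README argues that a matching offset keeps two constraints on the same GUID from merging. A minimal sketch of that effect, assuming constraints are deduplicated by value: the `Constraint` struct and `distinct_constraints` helper are hypothetical names for illustration, not the crate's types, and the GUID is kept as text for readability.

```rust
use std::collections::HashSet;

/// Hypothetical constraint: a GUID plus an optional matching offset.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Constraint {
    guid: &'static str,
    offset: Option<i64>,
}

/// Number of constraints that remain once identical entries are merged.
fn distinct_constraints(constraints: &[Constraint]) -> usize {
    constraints.iter().collect::<HashSet<_>>().len()
}
```

Using the README's example GUID `9F188A12-3EA1-477D-B368-361936EEA213`, the pair with offsets `-30` (adjacent) and `48` (call) stays as two distinct constraints, while the same pair with no offsets collapses into one.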
about.toml

Lines changed: 3 additions & 1 deletion
@@ -3,6 +3,7 @@ accepted = [
     "Apache-2.0",
     "MIT",
     "Unicode-DFS-2016",
+    "Unicode-3.0",
     "OFL-1.1",
     "BSL-1.0",
     "BSD-3-Clause",
@@ -11,5 +12,6 @@ accepted = [
     "NOASSERTION",
     "ISC",
     "Zlib",
-    "OpenSSL"
+    "OpenSSL",
+    "NCSA"
 ]

rust/Cargo.toml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+[package]
+name = "warp"
+version = "1.0.0"
+edition = "2021"
+license = "Apache-2.0"
+
+[dependencies]
+flatbuffers = "25.2.10"
+bon = "3.6.3"
+uuid = { version = "1.11.0", features = ["v5"]}
+flate2 = "1.0.34"
+itertools = "0.14"
+
+[features]
+default = []
+gen_flatbuffers = ["dep:flatbuffers-build"]
+
+[dev-dependencies]
+criterion = "0.6.0"
+insta = { version = "1.43.1", features = ["yaml"] }
+
+[build-dependencies]
+flatbuffers-build = { git = "https://github.com/emesare/flatbuffers-build", rev = "44410b9", features = ["vendored"], optional = true }
+
+[[example]]
+name = "type_builder"
+
+[[example]]
+name = "dumper"
+
+[[bench]]
+name = "type"
+harness = false
+
+[[bench]]
+name = "chunk"
+harness = false

rust/benches/chunk.rs

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
use criterion::{criterion_group, criterion_main, Criterion};
2+
use warp::chunk::{Chunk, ChunkKind, CompressionType};
3+
use warp::mock::{mock_function, mock_function_type_class, mock_type};
4+
use warp::r#type::chunk::TypeChunk;
5+
use warp::signature::chunk::SignatureChunk;
6+
use warp::{WarpFile, WarpFileHeader};
7+
8+
pub fn chunk_benchmark(c: &mut Criterion) {
9+
let count = 10000;
10+
// Fill out a signature chunk with functions.
11+
let mut functions = Vec::new();
12+
for i in 0..count {
13+
functions.push(mock_function(&format!("function_{}", i)));
14+
}
15+
let _signature_chunk = SignatureChunk::new(&functions).expect("Failed to create chunk");
16+
let signature_chunk = Chunk::new(
17+
ChunkKind::Signature(_signature_chunk),
18+
CompressionType::None,
19+
);
20+
21+
// Fill out a type chunk with types.
22+
let mut types = Vec::new();
23+
for i in 0..count {
24+
types.push(mock_type(
25+
&format!("type_{}", i),
26+
mock_function_type_class(),
27+
));
28+
}
29+
let _type_chunk = TypeChunk::new(&types).expect("Failed to create chunk");
30+
let type_chunk = Chunk::new(ChunkKind::Type(_type_chunk), CompressionType::Zstd);
31+
let file = WarpFile::new(WarpFileHeader::new(), vec![signature_chunk, type_chunk]);
32+
c.bench_function("file to bytes", |b| {
33+
b.iter(|| {
34+
file.to_bytes();
35+
})
36+
});
37+
}
38+
39+
criterion_group!(benches, chunk_benchmark);
40+
criterion_main!(benches);

rust/benches/type.rs

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
use criterion::{criterion_group, criterion_main, Criterion};
2+
use warp::mock::{mock_int_type_class, mock_type};
3+
use warp::r#type::class::{StructureClass, StructureMember, TypeClass};
4+
use warp::r#type::guid::TypeGUID;
5+
use warp::r#type::Type;
6+
7+
pub fn void_benchmark(c: &mut Criterion) {
8+
let void_type = Type::builder()
9+
.name("my_void".to_owned())
10+
.class(TypeClass::Void)
11+
.build();
12+
13+
c.bench_function("uuid void", |b| {
14+
b.iter(|| {
15+
let _ = TypeGUID::from(&void_type);
16+
})
17+
});
18+
19+
c.bench_function("computed void", |b| b.iter(|| void_type.to_bytes()));
20+
}
21+
22+
pub fn struct_benchmark(c: &mut Criterion) {
23+
let int_type = mock_type("my_int", mock_int_type_class(None, false));
24+
let structure_member = StructureMember::builder()
25+
.name("member")
26+
.ty(int_type)
27+
.offset(0)
28+
.build();
29+
let struct_class = StructureClass::new(vec![structure_member]);
30+
let struct_type = Type::builder()
31+
.name("my_struct".to_owned())
32+
.class(TypeClass::Structure(struct_class))
33+
.build();
34+
35+
c.bench_function("uuid struct", |b| {
36+
b.iter(|| {
37+
let _ = TypeGUID::from(&struct_type);
38+
})
39+
});
40+
41+
c.bench_function("computed struct", |b| b.iter(|| struct_type.to_bytes()));
42+
}
43+
44+
criterion_group!(benches, void_benchmark, struct_benchmark);
45+
criterion_main!(benches);

rust/benches/void.rs

Lines changed: 0 additions & 22 deletions
This file was deleted.
