Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
Currently, all repositories are hosted on the main GitHub organization account, and bugs are tracked in GitHub issues too. We primarily use our own Mattermost instance, IRC, and Telegram for communication.
+We have a testsuite (which runs on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don’t break anything, to ensure the support of different operating systems (Linux, macOS, Windows, FreeBSD, OpenBSD) and different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions.
+We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples, we’re using ASCIInema to record the sessions.
For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.
Rizin is modular: it aims to make all its elements and features easily reusable from other projects. LGPL3 is the most restrictive license accepted for code merged into Rizin; contributors can also choose Apache, BSD, MIT, Public Domain, or other similar licenses. We exclude GPL as a valid license for the project because we aim to support proprietary software that uses Rizin while protecting our free codebase.
Participants who want to apply to the Rizin project for the Google Summer of Code 2025 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as “good first issue” for both Rizin and Cutter.
Most of Rizin is written in C (conforming to the C99 standard), so we expect participants to be familiar with the C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, knowledge of C++ and the basics of the Qt framework is required.
Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing.
Try to split the entire GSoC period into tasks and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it’ll help you understand the task deeply enough before starting and prioritize the important things to do first.
Please note how much time a day/week you can spend on this project.
Please specify which category you are applying for: a medium task or an extended-deadline one.
Specify your timezone so we can assign you a mentor in the same one to ease communication.
Submit your proposal early, not at the last minute!
Be sure to choose a “backup” idea (the second task you want to do) so that conflicts (two participants for one task) can be resolved.
Improving usability and user experience (175 hour project)
Cutter’s backend provides many features that are either not exposed in Cutter or not exposed
+efficiently. The goal of this task would be to figure out the users’ biggest pain points and
+address them by improving or reworking the interface. Some of the issues are already in our GitHub,
+while others might be identified during a cross-comparison with other tools.
Plugins and Python High Level API (175 hour project)
Our current public API for plugin authors is somewhat limited. We need to improve many aspects of our plugin support and take it a few steps further.
+This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin’s API for disassembly, analysis and other purposes, see the Rizin bindings task above.
It will greatly improve the scripting experience, make the API more consistent, and make it easier
+for the community to create Cutter plugins. Moreover, it will simplify testing of Cutter features.
1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls.
Final term: Implement a way to show the API when hovering over an interface control, and create documentation.
Multi-Tasking and Event-driven architecture (350 hour project)
The information Cutter shows about functions, strings, imports, and the analysis is all produced in Rizin and only displayed in Cutter. Currently, Cutter pulls most information from Rizin only on demand. This is problematic because the user sometimes makes changes (via plugins, the console widget, and more) that affect the information in Rizin, but Cutter doesn’t know about these changes and cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ <addr>, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -> Refresh Contents).
+The goal of this task is to use an event-driven architecture to overcome this limitation.
In addition, this task will also cover the background analysis feature, allowing the analysis performed by Rizin to run while the interface is active.
The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more.
Add events to all the relevant functions inside Rizin
Add support for these events in Cutter and refresh and update the relevant widgets per each event
Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574)
The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.
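The steps above can be pictured with a minimal, self-contained sketch of the hook-and-notify pattern. This is only an illustration under stated assumptions: the names EventBus, bus_hook, bus_send and the event kinds are hypothetical and are not Rizin’s rz_events API or Cutter’s signal code. It shows the core side announcing a change and a stand-in widget callback counting the refreshes it would perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds; NOT Rizin's real event identifiers. */
typedef enum {
	EV_FUNCTION_CREATED,
	EV_FLAG_DELETED,
	EV_COUNT
} EventType;

typedef void (*EventCallback)(EventType type, uint64_t addr, void *user);

#define MAX_HOOKS 8

typedef struct {
	EventCallback cb;
	void *user;
} Hook;

/* Hypothetical bus: a fixed table of callbacks per event kind. */
typedef struct {
	Hook hooks[EV_COUNT][MAX_HOOKS];
	size_t count[EV_COUNT];
} EventBus;

/* Register a callback for one event kind; returns 0 on success, -1 if full. */
static int bus_hook(EventBus *bus, EventType type, EventCallback cb, void *user) {
	if (bus->count[type] >= MAX_HOOKS) {
		return -1;
	}
	bus->hooks[type][bus->count[type]].cb = cb;
	bus->hooks[type][bus->count[type]].user = user;
	bus->count[type]++;
	return 0;
}

/* Called by the "core" side whenever the corresponding change happens. */
static void bus_send(EventBus *bus, EventType type, uint64_t addr) {
	for (size_t i = 0; i < bus->count[type]; i++) {
		bus->hooks[type][i].cb(type, addr, bus->hooks[type][i].user);
	}
}

/* Stand-in for a widget: counts how often it would refresh itself. */
static void on_function_created(EventType type, uint64_t addr, void *user) {
	(void)type;
	(void)addr;
	int *refreshes = user;
	(*refreshes)++;
}
```

With such a design, a widget only refreshes when an event it subscribed to is sent, instead of polling or waiting for a manual refresh. A real implementation would also need unhooking and thread-safety, which this sketch leaves out.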
Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. We would like to expand on this feature with performance improvements to the heap parsers and support for more memory allocators.
The participant will gain an understanding of how modern runtimes provide the heap for various
+programs, which will benefit their binary exploitation skills.
It will greatly improve the debugging and reverse engineering experience for complex programs,
+and provide a way to design exploitation techniques with the help of Rizin/Cutter.
Binary diffing is one of the most common tasks for a reverse engineer. There are many
+tools available, but most of them are either detached from the main RE toolbox or poorly integrated.
+Rizin provides basic diffing features out of the box with the rz-diff tool, but Cutter has no
+interface to represent similar functionality.
1st term: Expose the rz-diff features in the Cutter core and create the interface for opening
+files for diffing. Implement the diff modes for hexadecimal and disassembly views.
Final term: Implement the diff modes for graph and pseudocode views, create the documentation.
Debugger improvements and portability (175 hour project)
The Rizin debugger already supports most platforms, including native and remote debugging.
+Nevertheless, on most platforms it is limited mostly to x86/x86_64 and ARMv8, often lacking
+tests. The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux
+Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX
+debugger for NetBSD, and so on. Moreover, some information isn’t available in debugging mode, e.g. source-level breakpoints or names. It would be necessary to make sure debug commands understand those.
With the help of emulators like QEMU and OpenSIMH we could extend our CI to automatically test these
+debuggers.
FRIDA is a famous dynamic instrumentation toolkit that is immensely popular among mobile device researchers. Rizin could be easily integrated with Frida by creating a plugin that connects to a Frida instance, receives traces, sets breakpoints, and gets information and events from it.
Since modern architectures now enforce W^X, exploit writers use ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools exist to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It’s a shame that despite having RzIL, Rizin doesn’t have something similar yet. One possible solution would be to build an external plugin or tool that reuses the power of librz and rz-gg. Moreover, it makes sense to think about SROP, COOP and BROP support.
Last year (GSoC'24), one of our participants started implementing this feature, but it wasn’t finished. You can check the rz-solver repository for more details.
Also, while the rz-gg tool can create custom shellcode, a lot of work is still required.
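As a rough illustration of the first step of any ROP tool, gadget discovery, here is a minimal sketch that scans a byte buffer for the x86 `ret` opcode (0xC3) and lists candidate gadget start offsets a few bytes before it. It is an assumption-laden simplification, not how rz-gg, rz-solver or the tools named above work: the names GadgetCandidate and find_ret_gadgets are hypothetical, and a real tool would disassemble each candidate and discard the ones that do not decode into valid instructions ending in the `ret`.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_RET 0xC3 /* near return, single-byte opcode */

/* Hypothetical record: a byte range [start, ret] that ends in a ret. */
typedef struct {
	size_t start;
	size_t ret;
} GadgetCandidate;

/*
 * For every ret byte, emit one candidate per start offset located
 * 1..max_back bytes before it. Returns the number of candidates
 * written to `out` (at most out_cap). Candidates are NOT validated:
 * deciding whether the bytes decode to useful instructions is left
 * to a disassembler.
 */
static size_t find_ret_gadgets(const uint8_t *buf, size_t len, size_t max_back,
	GadgetCandidate *out, size_t out_cap) {
	size_t n = 0;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != X86_RET) {
			continue;
		}
		for (size_t back = 1; back <= max_back && back <= i; back++) {
			if (n == out_cap) {
				return n;
			}
			out[n].start = i - back;
			out[n].ret = i;
			n++;
		}
	}
	return n;
}
```

For the bytes 58 C3 90 5F C3 (pop rax; ret; nop; pop rdi; ret) and max_back set to 2, this yields three candidates: offset 0 for the first ret, and offsets 3 and 2 for the second. Chaining the selected gadgets under constraints is the harder part and is where an SMT solver or RzIL semantics would come in.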
The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.
The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance Rizin’s analysis engine.
Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (macOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn’t depend on the debugger backend, and we may be able to use different heap tools.
Class analysis for C++/ObjectiveC/Swift/Dlang/Java #416
Analysis classes, accessible under the ac command, are a relatively new feature of Rizin.
+They provide a way to both manually and automatically manage and use information about classes in the binary.
Devirtualize method calls using class vtables #414
Consider the following call: call dword [eax + 0x6c]
+Let’s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.
So there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination.
+It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.
When that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.
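The lookup described above can be sketched in a few lines of C. This is a hypothetical illustration, not Rizin’s class analysis API: the VTable struct and the vtable_resolve and vtable_resolve_all functions are invented for the example, and 0 is used as a "no destination" marker on the assumption that no method lives at address 0. For the call above on a 32-bit target, offset 0x6c with 4-byte pointers selects slot 27.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a vtable saved by class analysis. */
typedef struct {
	const char *class_name;
	const uint64_t *methods; /* method addresses in slot order */
	size_t n_methods;
} VTable;

/*
 * Resolve a byte offset inside one vtable to a method address.
 * Returns 0 if the offset is not pointer-aligned or the vtable
 * is too small for it.
 */
static uint64_t vtable_resolve(const VTable *vt, uint64_t offset, size_t ptr_size) {
	if (ptr_size == 0 || offset % ptr_size != 0) {
		return 0;
	}
	uint64_t slot = offset / ptr_size;
	return slot < vt->n_methods ? vt->methods[slot] : 0;
}

/*
 * Without a specific class: collect the possible destinations from
 * every vtable that is large enough. Returns the number of results.
 */
static size_t vtable_resolve_all(const VTable *vts, size_t n_vts, uint64_t offset,
	size_t ptr_size, uint64_t *out, size_t out_cap) {
	size_t n = 0;
	for (size_t i = 0; i < n_vts && n < out_cap; i++) {
		uint64_t dst = vtable_resolve(&vts[i], offset, ptr_size);
		if (dst != 0) {
			out[n++] = dst;
		}
	}
	return n;
}
```

The second command mentioned above would only differ in where the offset comes from: it would read the displacement from the decoded instruction at the current seek and then call the same lookup.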
It is required to solve numerous issues, along with improving parallel execution and performance.
+The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.
Almost one thousand tests are marked as “broken” in our testsuite. The task is to take any of those,
+investigate why it fails, and determine whether the test still makes sense or is no longer relevant. Then try to
+fix some of the broken tests.
Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes
+on the Rizin side.
Also note that most of these issues should be paired with a test to verify they will not break in
+the future.
+
\ No newline at end of file
diff --git a/gsoc/index.html b/gsoc/index.html
index 51c1dc0..c00f24e 100644
--- a/gsoc/index.html
+++ b/gsoc/index.html
@@ -1,6 +1,9 @@
Rizin ❤️ Google Summer of Code | Rizin
TL;DR Jump to the Ideas list.
+Introduction This year, we participate again, effectively continuing the tradition since 2015.
+Mentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
+Anton Kochkov Mattermost: xvilka – @akochkov Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too....
GSoC 2024
TL;DR Jump to the Ideas list.
Introduction This year, we participate again, effectively continuing the tradition since 2015.
Mentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
Anton Kochkov Mattermost: xvilka – @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too....
GSoC 2023
TL;DR Jump to the Ideas list.
diff --git a/gsoc/index.xml b/gsoc/index.xml
index e26485c..5512e97 100644
--- a/gsoc/index.xml
+++ b/gsoc/index.xml
@@ -10,7 +10,19 @@
https://rizin.re/images/rizin_preview.png
Hugo -- gohugo.io
- Tue, 23 Jan 2024 00:00:00 +0000
+ Sun, 26 Jan 2025 00:00:00 +0000
+
+ GSoC 2025
+ https://rizin.re/gsoc/2025/
+ Sun, 26 Jan 2025 00:00:00 +0000
+
+ https://rizin.re/gsoc/2025/
+ TL;DR Jump to the Ideas list.
+Introduction This year, we participate again, effectively continuing the tradition since 2015.
+Mentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
+Anton Kochkov Mattermost: xvilka – @akochkov Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too.
+
+
GSoC 2024
https://rizin.re/gsoc/2024/
diff --git a/index.json b/index.json
index faf7258..c5f9975 100644
--- a/index.json
+++ b/index.json
@@ -1 +1 @@
-[{"content":"This year we focused mainly on the \u0026ldquo;backbone\u0026rdquo; of the Rizin framework and all related tools, including Cutter. This will become a foundation of the future work we plan to finish in 2025. The major goal is to release 0.8.0 in upcoming months. As for the longer term you can see our roadmap for details.\nReleases Rizin versions 0.7.x and Cutter 2.3.3-2.3.4 80% of work is done for Rizin 0.8.0 Capstone A bulk of our effort was spent towards Capstone improvements as it is the core dependency and the main disassembly engine for many architectures supported by Rizin. A long-going project called Auto-Sync was finally merged and largely completed. You could read more details in our corresponding article.\nUpdated PPC, ARM, AArch64 and SystemZ to LLVM 18. Tricore support by billow HPPA by r3v0lt Alpha by r3v0lt ARC PR by r3v0lt (not yet merged: #2570) MIPS, microMIPS, and nanoMIPS support by deroad Xtensa by billow Rizin Started merging RzAsm and RzAnalysis for the future RzArch MIPS update to a Capstone-based plugin by deroad Basic LoongArch support by deroad PIC MCU family RzIL uplifting by billow MSP430 RzIL uplifting by moste00 Xtensa RzIL uplifting by billow Hexagon RzIL uplifting by Rot127 Finished (almost, with one PR still not yet merged) conversion to the rzshell Rewritten text representation of ELF, PE, NE relocations by Roeegg2 Added initial support for the Alpha architecture Many bugfixes, refactorings, and performance optimizations Cutter Switch to the Qt6 by default, including for forming releases Updated Rizin Other projects rz-ghidra: add support for PIC architectures rz-ghidra: add support for Tricore architecture Miscellaneous Google Summer of Code 2024 FOSSAsia 2024 conference ","permalink":"https://rizin.re/posts/year-2024-summary/","summary":"An overview of the work done in 2024","title":"2024 Year Summary"},{"content":"Hello, I’m Mostafa. 
I graduated with Excellence from Cairo University’s Faculty of Engineering, Computer Engineering Department, class of 2023. I write C++ for a living. I love systems programming, metaprogramming \u0026amp; DSLs, as well as Compilers \u0026amp; VMs. You can find me @Github, and @Linkedin.\nI was honored to participate again as a contributor in the 2024 GSoC with the Rizin Organization. The original project was implementing binary lifting techniques for RISC-V instructions onto Rizin\u0026rsquo;s custom internal representation, called RzIL. However, updating the RISC-V Capstone disassembler (originally a small task in the project preamble) turned out to need much more work than expected, and blocked the rest of the project.\nLet\u0026rsquo;s start at the beginning.\nRISC-V… Lifting? Lifting is a term of art in compiler research and implementation, it refers to any process that takes as input a low-level machine code program and outputs a higher-level program. The reverse process, lowering, is what compilers do when they compile from a relatively high level language like C or LLVM IR to machine code. So you could simply think of lifting as a synonym for “Decompiling” or “Reverse-Compiling”.\nIn the context of Rizin, lifting refers to transforming a machine code program written for any of the hardware architectures that Rizin understands (x86, RISC-V, 6502, etc\u0026hellip;) to a Rizin-specific intermediate language called RzIL.\nBy doing this, Rizin’s developers can write generic analysis algorithms that interpret RzIL instructions, and a generic VM that executes them, only once. 
Then, for each architecture that Rizin supports, a lifter that transforms machine code written for that architecture into RzIL is written, and as a result we get all the analysis algorithms and VM execution capabilities “for free”.\nIn a nutshell, RzIL is the universal “Lingua Franca” for Rizin, like English is for Software Engineering.\nFigure 1: Without RzIL, there is no smarter way to perform N operations for M assembly languages other than doing an NxM amount of work, implementing the N operations over and over again per each language/architecture.\nFigure 2: With RzIL, the amount of work to support N operations for M architectures is N+M, the N operations are written exactly once for the intermediate language, then M transformers are written to lift each of the M architectures to the intermediate language.\nFor want of a disassembler So the original plan was to write the grey arrow in the figure above: a lifter from RISC-V machine code into RzIL. However, the first step in doing that is to “parse” RISC-V instructions from their binary form into a convenient data structure. We call that “parsing” step disassembly, or, more accurately, decoding.\nSide Note: lots of people, when “disassembly” and “assembly” are mentioned, will probably think of the following diagram:\nThis is not wrong for most purposes. However, in the context of this writeup it’s better to have the following and more detailed picture in mind:\nIn this writeup I’m more interested in the left-to-right flow: decoding from a binary to a structured (e.g. C struct) representation of the instruction, then assembling the structured representation of the instruction into a string form. Confusingly, sometimes “Disassembly” is used to include both Disassembly and Decoding, for example in Capstone the structured representation includes as a member its own toString serialization. 
It will often be clear from context what step is meant, and decoding is often far more important than disassembly.\nWhere were we? Ah yes, we were supposed to “parse” (i.e. decode) an instruction from its binary form into a convenient data structure, so that we can write elegant code that easily and robustly lifts it into RzIL.\nThe good news is that Rizin already has a RISC-V decoder/disassembler, since it uses as a library the project Capstone, which is a general-purpose disassembler framework for multiple architectures, including RISC-V.\nThe bad news? The RISC-V disassembler was incomplete and out of date.\nYou won\u0026rsquo;t catch it missing a variant of an ADD or a SUB, not even MULs or DIVs, but you can catch it missing the zba, clz, or xnor instructions, for example. Those, respectively, accelerate array indexing, count leading zeros, and perform an exclusive NOR. Capstone\u0026rsquo;s current RISC-V disassembler includes none of those instructions. We could argue whether those instructions are really \u0026ldquo;Useful\u0026rdquo; or \u0026ldquo;Common\u0026rdquo; in real software: but at the end of the day they\u0026rsquo;re part of RISC-V, and any compliant RISC-V tool must be aware of them. Capstone sometimes also chokes on quite basic instructions, like LOAD.\nRISC-V has a somewhat unusual approach to ISA evolution: it embraces extensions openly in its standard. Most architectures define new \u0026ldquo;versions\u0026rdquo; or \u0026ldquo;editions\u0026rdquo; whenever they change, RISC-V instead defines self-contained \u0026ldquo;modules\u0026rdquo; of behaviour and ISA state, even opening the door to vendors (companies selling SoCs and other products with RISC-V cores) to make their own vedor-defined extensions that co-exist with the rest of the architecture and its standard extensions. Each extension as well as the base architecture could evolve through different versions indepedently from other extensions. 
The RISC-V architecture is thus more of a family of architectures specified together rather than a single one.\nCapstone was originally written based on codegen logic from LLVM. It’s essentially a port of LLVM disassembly logic from C++ to C (along with much simplification and cleaning up). Unfortunately, LLVM keeps updating that logic to reflect the fast-moving development and evolution of the architectures; those updates are not magically reflected back into Capstone! To make matters even worse, even LLVM proper can’t completely keep up with all the updates that happen to all the architectures it supports, it lags.\nThe Capstone project maintains an update tool called Auto-Sync, which can semi-automatically synchronize changes from LLVM to Capstone (using Tree-sitter magic). Alas, it can only do that for some architectures, and RISC-V is not among the supported ones. Also, we already saw how even LLVM is not completely on top of all updates. Fortunately, the solution exists, just hiding elswhere.\nTo Sail the high seas and RISC it all The problem of describing Instruction Set Architectures (ISAs) accurately so that we can do plenty of useful things to them (assembly, disassembly, emulation, codegen, etc…) faces many projects and researchers, so much so that some smart people have developed an entirely new special language for it, Sail. Sail is a language designed specifically to address the problem of describing all aspects of ISAs: how the instructions are encoded into binary, how they execute, etc…\nNow, if only there was a project that used Sail to describe RISC-V… wait, there is! It’s called Sail-RISCV. 
It’s such a complete and up-to-date description of RISC-V that the RISC-V foundation adopted it as the official source of truth for the architecture, this means that however the Sail code behaves, is - by definition - how RISC-V should behave.\nOther architectures modelled in Sail are several versions of ARM, a considerable part of x86, and a research version of MIPS called CHERI-MIPS, which includes hardware extensions to assist and accelerate memory safe pointers. The ARM and x86 models are auto-generated from other descriptions, and all 3 models are much less active than RISC-V\u0026rsquo;s.\nLet’s see a snippet of what Sail looks like in practice, here’s the definition of RISC-V IType (immediate) instructions:\nThe rule might be as cryptic as latin if you’re not used to pattern-matching constructs from functional languages, but what it’s saying is simply the following:\nIf the first (least-significant) 7 bits of the 32-bit instruction are: 0010011 And if the 3 bits from bit 12 to bit 14 are in the table encdec_iop Then, the operation specified by this instruction is whatever enum corresponding to bits 12:14 in the encdec_iop table, and the args are: The 5-bit register index rd in bits 7 through 11 The 5-bit register index rs1 in bits 15 through 19 The 12-bit literal imm in bits 20 through 31 Otherwise, if (1) and (2) are not true, keep checking the 32-bit binary instruction against other rules In case you’re wondering what regidx is, it’s an alias for the type of 5-bit integers (or, as Sail calls them, bitvectors). 
Sail-RISCV uses it to refer to registers everywhere because register files in RISC-V always have 32 registers.\nIn case you’re wondering about the double arrow, that’s because this rule is a clause in a “Mapping”, a Sail innovation that basically means a bidirectional function: it can be used to decode binary instructions into structured objects, and to encode structured objects into binary instructions (that’s why it’s called encdec!).\nSail-RISCV contains ~280-290 rules of this form, and hundreds of other rules, sub-rules, and varied logic describing how RISC-V instructions are assembled, executed, how memory is accessed, how privliege levels and syscalls work, and so on.\nWe want this information, and we want it in C. We could port it by hand into C (good luck finishing in 2/3 years :(), or… we can use trusty code generation.\nRISC-V Auto-Sync So this is what my GSoC project this year was all about:\nUse Sail’s compiler (written in OCaml) as a library to load, parse, and typecheck Sail-RISCV Process the AST of Sail-RISCV and generate data structures representing the important logic Generate C code from those data structures That is, the code I wrote transformed the rule above into the following C code:\n// ---------------------------ITYPE------------------------------- { if (((binary_stream \u0026amp; 0x000000000000007F) == 0x13)) { uint64_t op = 0xFFFFFFFFFFFFFFFF; switch ((binary_stream \u0026amp; 0x0000000000007000) \u0026gt;\u0026gt; 12) { case 0x7: op = RISCV_ANDI; break; case 0x3: op = RISCV_SLTIU; break; case 0x2: op = RISCV_SLTI; break; case 0x6: op = RISCV_ORI; break; case 0x4: op = RISCV_XORI; break; case 0x0: op = RISCV_ADDI; break; } if (op != 0xFFFFFFFFFFFFFFFF) { uint64_t rd = (binary_stream \u0026amp; 0x0000000000000F80) \u0026gt;\u0026gt; 7; uint64_t rs1 = (binary_stream \u0026amp; 0x00000000000F8000) \u0026gt;\u0026gt; 15; uint64_t imm = (binary_stream \u0026amp; 0x00000000FFF00000) \u0026gt;\u0026gt; 20; tree-\u0026gt;ast_node_type = 
RISCV_ITYPE; tree-\u0026gt;ast_node.itype.imm = imm; tree-\u0026gt;ast_node.itype.rs1 = rs1; tree-\u0026gt;ast_node.itype.rd = rd; tree-\u0026gt;ast_node.itype.op = op; return; } } } //------------------------------------------------------------ This low-level soup of shifts and masks performs the exact logic described in the Sail snippet earlier, just in C. It continues on like that for 9K lines of generated code (#includeing approximately 2K lines of generated AST definition).\nIn addition to this, the logic that disassembles the decoded structured objects into strings is also translated. Overall, the generated code is about 20K of C, but it’s still not finished yet.\nLoose Ends The 20K of generated C code is still not merged into Capstone. Before merging, it must first incorporate additional logic into the generated decode functions, it must also infer the type of each operand (whether it’s a register, a memory address, or a literal. If it’s a register, is it floating point or integer, etc…). Those details can easily consume another writeup, but they\u0026rsquo;re all managable.\nThe generator itself is ~2500 lines of idiomatic (I hope) OCaml. But since Sail is a complex language, the tool must make some assumptions about the input that might not survive the evolution of Sail-RISCV, the application to other Sail models, or the active evolution of Sail itself.\nFinally, we haven’t addressed the original problem yet! I hope to eventually and finally write the grey arrow in the first diagram: Lifting RISC-V instructions to RzIL code.\nConclusion Let’s summarize this rollercoaster journey:\nWe just wanted to write a binary lifter for RISC-V instructions into RzIL, Rizin’s intermediate language. But in order to do that, we first have to have an up-to-date RISC-V decoder/disassembler. Rizin depends on Capstone for RISC-V disassembly, but Capstone RISC-V disassembly logic is ported from old LLVM logic that is not up-to-date. 
Even modern LLVM is not completely up-to-date with RISC-V. But Sail-RISCV is, and it\u0026rsquo;s adopted by the RISC-V foundation as the most authoritative model of the RISC-V architecture. And thus, we can generate a Capstone disassembler module from Sail-RISCV, by depending on the Sail compiler as a library. It was fun. Frustrating and long-winded at times, but what kind of programming isn’t? That’s part of the thrill anyway!\nThat\u0026rsquo;s all and Happy Holidays! Keep coding through the wind and the snow.\n","permalink":"https://rizin.re/posts/gsoc-2024-auto-sync-sail/","summary":"A description of the original GSoC 2024 task plans of updating RISC-V disassembler in Capstone, updating it in Rizin, and implementing RzIL uplifting and the actual progress","title":"GSoC 2024 - RISC-V Capstone auto-sync and RzIL uplifting"},{"content":"We are grateful to Google for being able to participate in Google Summer of Code 2024. We received many applications and are happy that the project has substantial interest. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us with the platform to attract new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue.\nThis summer, the accepted projects aim to improve the ROP gadget searching capability, add an ROP compiler in Rizin, and uplift more architectures to our next-generation intermediate language - RzIL.\nz3phyr: Exploitation Capabilities Improvements Hi! I’m Giridhar Prasath Rajendran (a.k.a z3phyr). I’m a student pursuing my Master\u0026rsquo;s in Cybersecurity at the University of Maryland, College Park. I have experience developing user-space networks and web applications. I enjoy playing binary exploitation CTF challenges and am passionate about anything low-level.\nI started contributing to Rizin in January 2024 by fixing a UI issue PR#4095. 
As a part of my microtask, fixing Issue#1259 seemed perfect as I learned more about Linux heap exploitation. PR#4355, PR#4426 were merged as a part of fixing this issue. This helped me strengthen my knowledge about TLS and Glibc heap internals. I was also using this feature in my binary exploitation class assignments.\nI will integrate the ROP chain generation feature with the rz-gg tool. This involves:\nGadget Analysis: Implementing functionality to analyze raw assembly gadgets, categorize them based on their semantics (e.g., load, store, syscall), and store this information in a Gadget DB. Gadget Selection and Chaining: Developing an API to process constraints (e.g., set register RDI to \u0026ldquo;/bin/sh\u0026rdquo;, set RSI to NULL) using an SMT solver. This API will automatically select appropriate gadgets from the database to construct a functional ROP chain. Interaction: Provide an interface in rz-gg tool to allow users to generate ROP chains based on the specified constraints The initial support will cover architectures like x86, x86-64, and ARM.\nI look forward to a great summer of contributing and learning. I would like to thank the maintainers for their support and Google for providing this opportunity.\nmoste00: Uplifting RISC-V Instructions to RzIL Hello, my name is Mostafa Mahmoud (aka moste00 @ github, kotlinenjoyer @ mattermost). I graduated from Cairo University, Faculty of Engineering, a Computer Engineer. I\u0026rsquo;m passionate about anything and everything involving metaprogramming, macro systems, compilers, developer tools, IDEs, debuggers, virtualization \u0026amp; virtual machines, operating systems, computer architectures and hardware, plugin systems, and many other things! 
If I were to summarize my interests in one sentence, I would say that I like programs that \u0026ldquo;manage\u0026rdquo; or \u0026ldquo;provide services for\u0026rdquo; other programs, programs that serve as the bottom layers in a software stack, programs that manipulate other programs as data, and programs that other programs \u0026ldquo;run on top of\u0026rdquo;, so to speak.\nI started to contribute to Rizin in early March 2024. My first PR was a cleaning up of and removing dead code in the source for the database SDB, then I cleaned up the code for the Z80 assembler by removing all global variables and moving them into a state struct. The great people at Rizin were consistently helpful at every step, first providing tips on how to build Rizin, set up its dev env, navigate its large codebase, and provide helpful code reviews when I had the PRs ready.\nOn advice from my mentor in Rizin, I started to write a RzIL lifter for the very simple architecture MSP430, the purpose being to get used to writing IL lifters and encountering the challenges that people typically face while writing one. The incomplete lifter is here. It still needs to be finished as of the time of writing this, but I hope it will soon be!\nThe MSP430 pull request is itself just a micro-task and preparation for the main task I will do in GSoC 2024: Lifting the RISC-V architecture assembly into the RzIL intermediate language, a 350-hour project. I have set the MVP for this project at the point where the RV-32I and the RV-32F (Integer instructions and Floating-Point instructions, respectively) subsets of RISC-V are both lifted and tested using trace-testing, but see my proposal for stretch goals as well as more details, such as the proposed road plan.\nIn conclusion, by the end of my GSoC project in approximately October or early November of 2024, I hope that Rizin will have the capability to transform into RzIL the assembly for two new architectures, the MSP430, and the RISC-V instruction set. 
My goal is to implement this in a clean, efficient, and easy-to-reason-about and maintain fashion. I hope that I get more competent and knowledgeable about different Computer architectures and IL compilers because of working with all the smart and knowledgeable people around me in Rizin. I hope Rizin continues to attract contributors interested in low-level programming and reverse-engineering dev tools.\nIt\u0026rsquo;s always a pleasure to participate in GSOC; I look forward to a summer full of hacking :\u0026rsquo;).\n","permalink":"https://rizin.re/posts/gsoc-2024-announcement/","summary":"An announcement of the Google Summer of Code 2024. Two accepted candidates.","title":"Google Summer of Code 2024 Announcement"},{"content":"Updated on 2024.10.01\nA disassembler is obviously a must-have tool to do any reversing task. But using just any disassembler, especially for frameworks like Rizin, doesn\u0026rsquo;t really do it.\nThere are several capabilities which would be nice to have.\nIt should:\nBe correct. And if it isn\u0026rsquo;t, it should be easy to test and spot the error (in our case we want to compare the output directly to llvm-objdump). Provide a single API for multiple architectures. Support niche architectures or make it relatively easy to add them. Apart from the text disassembly, provide additional information about the operands and other meta-data. Be easy to update when new processor extensions come out. Relatively lightweight. Written in C or any other language that is easy to integrate into C/C++ software (specifically needed by Rizin/Cutter) One of the first disassembler engines which was capable of some of those points was Capstone. Quynh Nguyen Anh, the author of Capstone, figured that all the information we need, is basically already there in compiler projects like LLVM.\nObviously, for compilation you need the same and much more information you would need for disassembling. 
And you need them in a well-defined and machine-readable way.\nSo, what Capstone did, was re-implementing the LLVM disassembler logic in C, add meta-data for each instruction from the architecture definitions (also given in the LLVM-project) and add a single API to interact with it.\nTo summarize, Capstone is in the end:\nA more lightweight API than LLVM because it re-implements only the necessary code for disassembly from LLVM. Can support as many architectures as LLVM supports (if someone ports them to Capstone). Provides more information than the textual disassembly one gets via llvm-objdump. Relies on a well maintained and large project, which will (likely) be there even in 10+ years and is managed by people who know more about the architectures. The big problem with Capstone was though, that it hadn\u0026rsquo;t a working update mechanism. There were a bunch of Python scripts and very little documentation. Definitely an unsustainable solution.\nDue to this, Capstone became outdated over the years and most disassembler modules didn\u0026rsquo;t support modern processor extensions.\nWhat can be done? Besides LLVM we attempted once to generate a disassembler module for the Hexagon architecture (a DSP architecture from Qualcomm). But instead of LLVM, we used the ISA PDF for our first try. We parsed it and generated the decoding tables for the instructions. This worked, but was a little messy. Parsing PDF files is not fun and as soon as the PDF file changes somehow, stuff is broken again. Also, it is hard to test if you actually extracted the encoding information from the PDF correctly.\nOur second attempt uses LLVM. LLVM provides a way to get the definitions of an architecture in JSON format (llvm-tblgen --dump-json). The JSON dump has all the details about instructions you can wish for. Opcodes, operand types, read/write info and more. Pretty much anything you could wish for. 
This experience showed us that LLVM is a good source for disassembler generation.\nBuilding on it, we decided to extend Capstone with a proper updater. The alternative, implementing something Capstone-like from scratch in a new project, did not really seem like a good idea. Capstone already has a large user base, and we would need to migrate to the new tool as well. The last point is maybe annoying but doable. Getting a user base again is a way harder task.\nSo over the last two years we added an updater to Capstone. With it, we updated some core modules (ARM, AArch64, PPC + Paired Single, SystemZ, Mips + NanoMips) and added new ones (Alpha, TriCore, Xtensa, HPPA). And to our delight jiegec and FurryAcetylCoA added support for LoongArch.\nIn this blog post we\u0026rsquo;ll not just explain the update procedure in detail, but also reflect on some challenges and problems you run into when you generate disassemblers.\nHow LLVM generates its disassemblers LLVM is the ground truth Capstone is built on. So let\u0026rsquo;s start with it.\nHow is the LLVM disassembler generated and how does it work?\nLLVM defines its various supported architectures in a language specifically designed for this purpose. Each target\u0026rsquo;s instructions, instruction operands, scheduling information and more are written in the TableGen language. The definitions can be found in llvm/lib/Target/\u0026lt;TARGET-NAME\u0026gt;/*.td.\nPlease note that from now on we use \u0026ldquo;target\u0026rdquo; and \u0026ldquo;architecture\u0026rdquo; as interchangeable terms. In the LLVM realm we speak about a target. In the Capstone context, about an architecture.\nSince each target is defined in the same way, LLVM can apply the same procedures to them to generate C++ code. This is way better than implementing every target directly in C++, because otherwise a disassembler would have to be implemented again and again for each target.
This would of course be too much effort. With TableGen every instruction is already well-defined in the td files. So, LLVM uses a universal method to generate a decoder for each of them. There is still some manual work left, e.g. handling edge cases and parsing operand bits. But the core logic is generated.\nTo use the content of the td files in a programmable way, the llvm-tblgen tool parses them, converts their content into C++ classes and saves them in a RecordKeeper. The RecordKeeper class is TableGen\u0026rsquo;s internal representation of the td files\u0026rsquo; content, which can now be used to generate arbitrary code. These classes basically hold all the td file information in a uniform and programmable way. The C++ classes logically belong to the so-called CodeGen layer, which, as the name already says, is used to generate code.\nExample:\nDefinition of the ARM setend instruction in TableGen:\ndef SETEND : AXI\u0026lt;(outs), (ins setend_op:$end), MiscFrm, NoItinerary, \u0026#34;setend\\t$end\u0026#34;, []\u0026gt;, Requires\u0026lt;[IsARM]\u0026gt;, Deprecated\u0026lt;HasV8Ops\u0026gt; { bits\u0026lt;1\u0026gt; end; let Inst{31-10} = 0b1111000100000001000000; let Inst{9} = end; let Inst{8-0} = 0; } becomes a C++ class of this form:\nSETEND {\t// InstructionEncoding Instruction InstTemplate Encoding InstARM XI AXI Requires Deprecated // Instruction bits, the operand bits are marked (in this case only one bit for \u0026#34;end\u0026#34;).
field bits\u0026lt;32\u0026gt; Inst = { 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, end{0}, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; field bits\u0026lt;32\u0026gt; Unpredictable = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; field bits\u0026lt;32\u0026gt; SoftFail = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; int Size = 4; string DecoderNamespace = \u0026#34;ARM\u0026#34;; list\u0026lt;Predicate\u0026gt; Predicates = [IsARM]; string DecoderMethod = \u0026#34;\u0026#34;; bit hasCompleteDecoder = 1; string Namespace = \u0026#34;ARM\u0026#34;; // Note this list of in and out operands. Setend has only an operand which is read and no operands it writes. dag OutOperandList = (outs); dag InOperandList = (ins setend_op:$end); string AsmString = \u0026#34;setend\t$end\u0026#34;; list\u0026lt;dag\u0026gt; Pattern = []; list\u0026lt;Register\u0026gt; Uses = []; list\u0026lt;Register\u0026gt; Defs = []; int CodeSize = 0; int AddedComplexity = 0; bit isPreISelOpcode = 0; bit isReturn = 0; bit isBranch = 0; ... bit isBarrier = 0; bit isCall = 0; bit isAdd = 0; bit isTrap = 0; bit canFoldAsLoad = 0; bit mayLoad = ?; bit mayStore = ?; bit mayRaiseFPException = 0; ... bit doubleWidthResult = 0; SubtargetFeature DeprecatedFeatureMask = HasV8Ops; bits\u0026lt;1\u0026gt; end = { ? }; } Note, that the C++ class above has the same structure for each target. Hence, LLVM\u0026rsquo;s code generation, can reason on them without the need to know specific target details.\nNow, what code is actually generated? This depends on what you need. TableGen has several backends. Each of them uses the RecordKeeper's content to generate different files. For example, the RegisterInfo backend generates several tables with information about target registers. As mentioned above, the RegisterInfo backend doesn\u0026rsquo;t need to know details about targets specific registers. 
It just implements methods to generate an enumeration with all register names. Or it generates tables which map registers to their aliases, or lookup tables which map bits to a register ID.\nGenerating enumerations is nice, but more complex C++ code is, of course, also generated. Most relevant for us is the decoding logic, which decodes a byte sequence into a Machine Code instruction. A Machine Code instruction (MCInst) is the class which represents a target\u0026rsquo;s decoded instruction. It holds the ID of the instruction, its operands, some flags (isBranch etc.), and more.\nDecoding procedures from bytes to MCInst are the same for each target (except x86, for historical reasons, I guess). In the CodeGen layer we still know the encoding of each instruction of a target. Another backend, the DecoderEmitter, consumes these encodings and builds a state machine over them. The generated state machine simply checks certain bits and transitions into states. The end state is either an identified instruction or a disassembly failure. After the instruction ID is decoded, a big switch case is walked over to call the different decoder methods of the instruction\u0026rsquo;s operands.\nCheck out [`PPCGenDisassemblerTables.inc`](https://github.com/capstone-engine/capstone/blob/next/arch/PowerPC/PPCGenDisassemblerTables.inc). It contains this state machine. Or see the examples below. The key is: the state machine table and the big switch cases can be generated independently of the target. Each target still needs to implement the operand decoders, because those are unique, but this is essentially it.
It saves quite some work compared to implementing the decoding logic again and again.\nExcerpt from state machine\nstatic const uint8_t DecoderTableARM32[] = { // What to do in the state | Bits to check or to extract /* 0 */ MCD::OPC_ExtractField, 25, 3, // Extract 3 bits at offset 25 from byte sequence /* 3 */ MCD::OPC_FilterValue, 0, 47, 14, 0, // Check certain bits for properties and transition to another state depending on the result. /* 8 */ MCD::OPC_ExtractField, 21, 1, /* 11 */ MCD::OPC_FilterValue, 0, 110, 7, 0, /* 16 */ MCD::OPC_ExtractField, 24, 1, /* 19 */ MCD::OPC_FilterValue, 0, 139, 1, 0, /* 24 */ MCD::OPC_ExtractField, 4, 1, /* 27 */ MCD::OPC_FilterValue, 0, 123, 0, 0, /* 32 */ MCD::OPC_ExtractField, 22, 2, /* 35 */ MCD::OPC_FilterValue, 0, 25, 0, 0, /* 40 */ MCD::OPC_CheckPredicate, 0, 11, 0, 0, // Check if a predicate is fulfilled (CPU feature X enabled etc.) and transition into a certain state depending on the result. ... Excerpt of the operand decoding switch statement\nswitch (Idx) { default: llvm_unreachable(\u0026#34;Invalid index!\u0026#34;); case 0: // Extract bits of operand tmp = fieldFromInstruction(insn, 12, 4); // Decode a GPR register and check if it worked. if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 16, 4); if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 0, 4); if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 28, 4); if (!Check(S, DecodePredicateOperand(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 20, 1); if (!Check(S, DecodeCCOutOperand(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } return S; A target\u0026rsquo;s disassembler module in LLVM effectively consists of two parts.
These are the generated logic (written to .inc files) and the handwritten decoder and printing methods. Decoders for operands like DecodeGPRRegisterClass from above need to be implemented per target. They cannot be generated currently.\nThe handwritten code is in files like \u0026lt;ARCH\u0026gt;Disassembler.cpp or \u0026lt;ARCH\u0026gt;AsmWriter.cpp in their respective target source directories.\nLLVM to Capstone Capstone simply copies the LLVM disassembler and enriches the output. Because we do not want to build LLVM just to build Capstone (LLVM is a huge dependency), we have to tackle two problems:\nLLVM code is in C++, Capstone in C. The LLVM disassembler has no knowledge about read/write access of operands or instruction groups. Theoretically it could, but it is not implemented. For Capstone we need this information though. C++ and C For Capstone we need the C++ files in C. We also need the generated *.inc files, as well as handwritten disassembler components (\u0026lt;ARCH\u0026gt;Disassembler.cpp and \u0026lt;ARCH\u0026gt;AsmWriter.cpp from above).\nLet\u0026rsquo;s see how to get them in C.\nGenerate C code with TableGen We already described the generation procedure of the .inc files above in detail. What we have not mentioned, though, is the way the actual code is emitted. The problem with the TableGen backends is that they do not separate the generation of their data from the actual printing of the code.\nFor example, the backend which generates the AsmWriter (the module which prints an asm string of an instruction) mixes its table generation with emitting code. There is no clear separation between generating abstract objects, like state machines and tables, and printing them into code. It is all intermingled.\nThis goes so far that it is even allowed to specify custom code in the td files for operands or instructions.\nSo, if we want TableGen backends to emit C, we either need to redesign and rewrite them from scratch or patch them.
Redesigning them from scratch is a rather complex task and needs a lot of thought (see this discussion). Simply because of time constraints and because we don\u0026rsquo;t know if it will be merged, we sided with patching.\nOur patched TableGen backends work in a pretty straightforward way. We add two new classes which only emit code: PrinterLLVM and PrinterCapstone. PrinterLLVM emits the standard C++ code from LLVM. PrinterCapstone emits our C code. Each backend gets one of those printer classes assigned. And whenever it emits code, it calls the corresponding method of the printer. In practice, we simply moved the emitting code from the backend to the printer classes.\nGeneral design problems with TableGen backends The problem is, it is ugly. Although we are now able to emit C, it only works because C and C++ are so similar. An array initialization in C++ is almost the same as in C. So the code structure is basically the same. But there is no way to emit the same information (tables, functions etc.) in a different order or in a fundamentally different language (think of Lisp). This is a direct consequence of how the backends were built. Because there is no clear separation between generating logic and printing it as code, the backends in their current form cannot be refactored nicely to emit code in languages other than C++.\nMost of it has also been untouched for 10 years and was never modernized. This is understandable, since it never really was necessary. It works for the current use case (generate code for LLVM tools). But it doesn\u0026rsquo;t allow using the generated logic in any other way.\nFor example, the state machine for decoding bytes to instructions is useful logic, also for non-LLVM projects. It could be written once and used by everyone else. But it is pretty much hard-coded to provide the state machine only in C++.\nThis is unfortunate. LLVM is a huge project, and many tools use the information about architectures it provides.
Providing these kinds of often-used algorithms in an accessible way would be a nice addition.\nTranslating C++ to C But back to the problem at hand. While we now have the generated C code, we still have handwritten code in C++. As mentioned before, the operand decoder and printer methods are handwritten in LLVM. Additionally, some edge cases are handled there as well. These files have to be translated from C++ to C.\nDoing this by hand is a tedious task. We need to do it for every architecture module again and again, because those files are not shared between targets. And if we add a new architecture module from LLVM to Capstone, we would need to translate several thousand lines of C++ to C.\nThis, of course, is not particularly fun and discourages people from doing it at all. Hence, we built the Auto-Sync framework to do most of the annoying work.\nThe translation process follows a simple procedure. We have a bunch of patches defined. Each patch replaces certain syntax in a C++ file with its C equivalent.\nTo find the patterns we want to replace we use tree-sitter. It allows us to query for specific syntax in the abstract syntax tree (AST) of the file. And since we translate source code, it is way easier to search in an AST than in the file content itself.\nTo control the patching, we have a controller called CppTranslator. It simply:\nOpens each source file. Reads and parses the file with tree-sitter into an AST. For each Patch: matches the Patch\u0026rsquo;s tree-sitter query in the AST; if it found something, gets the equivalent C code from the Patch; replaces the C++ code with the C equivalent. Example:\nMI::addImm(int(10)); Let\u0026rsquo;s say we want to patch int(10) to its C equivalent of (int)(10). The Patch for this has a tree-sitter query for this pattern:\n// The @ names elements in the query. (call_expression // Matches a call expression. (primitive_type) @cast_type // Matches primitive types like int, unsigned etc.
(argument_list) @cast_target // Matches anything within the () brackets ) @cast If the CppTranslator finds a substring matching the pattern, it passes it as a capture to the Patch. A capture is just a dictionary with the named sub-strings found. In our example it contains cast: \u0026quot;int(10)\u0026quot;, cast_type: \u0026quot;int\u0026quot; and cast_target: \u0026quot;(10)\u0026quot;.\nNow it is trivial to concatenate the sub-strings to (int)(10) and return it. The CppTranslator now replaces int(10) in the source file with (int)(10).\nThe result:\nMI::addImm((int)(10)); This is done with most C++ syntax. Of course, there are exceptions. Some C++ concepts are so complex to replace that we implement special scripts for them (e.g. C++ templates). But the end result is a source file which has very little C++ syntax left.\nDiffing Note: The diffing step is still unstable and is not yet reproducible. Automatic translation of C++ files only gets you so far. After all patches have been applied to the file, it will likely not compile. Some syntax issues are just too difficult to fix automatically.\nFixing a handful of issues by hand again and again is a tedious task, especially if you need to run the whole translation procedure multiple times. We can hardly ask users to do the fixes by hand again every time they run the translator.\nThe mechanism to solve this annoyance is diffing. It basically works as you know it from git. The difference is that it isn\u0026rsquo;t file-focused like git but diffs the AST nodes matched by tree-sitter queries. You can decide what to diff in the configuration, but let\u0026rsquo;s go over it in an example:\nSo let\u0026rsquo;s assume we update a fictional architecture.\nWe have an old \u0026lt;ARCH\u0026gt;Disassembler.c file which might need an update.
In there we have a function which decodes an operand:\nvoid decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { if (MCInst_isPredicable(MI)) { MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } The same function in the original C++ file looks like this:\nvoid decodeOperandA(MCInst \u0026amp;MI, unsigned OpNum, unsigned Val) { if (MI.isPredicable()) { MI.createImm(Val + 1); } MI.createImm(Val); } Now, after we ran the CppTranslator the result is almost valid C. But the translation was not perfect and there is still a method invocation left:\nvoid decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { if (MI.isPredicable()) { // The isPredicable method call was not translated. MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } Capstone\u0026rsquo;s MCInst struct doesn\u0026rsquo;t have a callback member isPredicable(). So this code would not compile. Instead, we need to replace it with the function call MCInst_isPredicable(MI).\nFor whatever reason no Patch was added, and we now have to fix it by hand. Note though that the old file (see above) already has the correct function implementation. So instead of fixing it again by hand, we diff the previous function to the newly translated code and let the user decide what to do.\nPatch: 15/230 Node: \u0026#34;Some Node\u0026#34; +Color: NEW FILE - (Just translated) -Color: OLD FILE - (Currently in Capstone) ⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼ void decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { -\tif (MCInst_isPredicable(MI)) { +\tif (MI.isPredicable()) { MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } ═════════════════════════════════════════════════════════════════════════════════════════════════════════ Choice: O, o, n, s (none) , e, p, q, ? \u0026gt; ? 
O\t- Accept ALL old diffs o\t- Accept old diff n\t- Accept new diff e\t- Edit diff (not yet implemented) s (none) - Select saved choice p\t- Ignore and go to previous diff q\t- Quit (previous selections will be saved) ?\t- Show this help The user can accept the version from the old file or accept the version from the new file. The version from the new file would not compile, but it can be fixed by hand later.\nIn most cases though, the old version is the correct one, because it was fixed by someone before.\nThe diffing happens for each translated function which doesn\u0026rsquo;t match the old code. Of course, you can diff not just functions but any nodes in the AST of a file. And for convenience the choices are saved as well. So if the update is run again and nothing changed, the user doesn\u0026rsquo;t have to redo previous decisions. They are applied automatically.\nThis diffing step saves a lot of time.\nAdding new architectures Adding a new architecture module works pretty much the same as above.\nGenerally though it gives us a standardized way of doing it. And if you know one architecture module in Capstone, you know them all. If LLVM doesn\u0026rsquo;t support your architecture, you may find a fork which does (this is the case for TriCore or EVM).\nIn fact, we added two niche architectures this way. TriCore was only implemented in a fork and never upstreamed. And the DEC Alpha architecture support was dropped in LLVM 4. We just added the td files again, and here we go, we have support for Alpha in Capstone.\nLast overview To give you a last overview of which components were updated and how they interact in Capstone, take a look at this diagram:\n[Diagram, boxes from left to right: CS Core -> ARCH Module via disasm(bytes) -> ARCH Mapping via ARCH_getInstr(bytes) -> LLVM Disassembler via ARCH_LLVM_getInstr(bytes) -> LLVM Decoder State Machine (Decode Instr.). The state machine calls back into the LLVM Disassembler with decode(Op0) and decode(Op1), and the result is returned through ARCH Mapping to the ARCH Module.]\nThe Capstone Core, Arch Module and Arch_Mapping provide the API to the LLVM disassembler logic. We have not spoken about those because they are irrelevant for the topic of generating disassemblers.\nThe two boxes on the right are the code copies from LLVM, which do the actual decoding work. The LLVM Disassembler component decodes single operands and handles special cases. It was translated by the CppTranslator, while the LLVM Decoder State Machine was generated by our patched LLVM backends.\nThe same structure applies to the printing of the asm text, though we have only scratched the surface of it here for the sake of brevity.\nWrap up If one looks at the whole update procedure, it is still rather complicated. But the result is worth it.\nThe amount of time someone has to spend on updating an architecture module in Capstone went down from \u0026ldquo;no one did it\u0026rdquo; to roughly 6-29 hours.
To update the ARM architecture module to LLVM 16, for example, the times were:\nRebasing patched backends to the new LLVM release = ~1-3h; Running the update scripts and diffing = 5min-1h; Fixing the rest of the build errors by hand = ~30min-5h; Handling new operands on the CS side (filling the detail info and tests) = 3-10h; Bug fixing = 2-10h. (Please be aware, though, that the time estimates above don\u0026rsquo;t include the \u0026ldquo;read into\u0026rdquo; time someone has to spend).\nThe lower estimates are for small changes on the LLVM side, the upper for many changes (spanning multiple LLVM releases).\nHere is the process described above in action:\nFuture plans We have a list of features and architectures which will come. Most architectures are already updated, but ARC, BPF and SPARC are still on the list.\nWhile working on the updater we found many shortcomings or flaws in the target definitions in LLVM. Fixes for those will be upstreamed to LLVM eventually.\nIn the very long run we would like to participate with the LLVM folks in redesigning the TableGen backends. It would be nice to have the problems we mentioned above solved.\nAnd of course, if you want to update an existing Capstone module now or add support for a new one, feel free to drop a message in issue #2015. We are happy to hear about it and will guide you through the process.\nReferences Auto-Sync progress issue Auto-Sync documentation TableGen documentation Capstone\u0026rsquo;s LLVM fork ","permalink":"https://rizin.re/posts/auto-sync/","summary":"Auto-Sync","title":"Auto-Sync - Generating disassembler plugins"},{"content":"Hi! I\u0026rsquo;m Billow, and I had the privilege of participating in GSoC 2023, working on improving DWARF support for the Rizin project. In this blog post, I\u0026rsquo;m excited to share my journey, the challenges I faced, and my future plans for this project.
Let\u0026rsquo;s dive right in!\nOver the past few months, my primary focus has been on enhancing the Debugging With Arbitrary Record Formats (DWARF) support within Rizin. DWARF is a crucial standard for debugging information in binary files. My work brings significant improvements, including the introduction of exprloc, compressed debug sections and composite variable storage.\nTo showcase some of my achievements, I\u0026rsquo;m comparing the disassembly output obtained using the pdf command for the write_fmt\u0026lt;Write\u0026gt; function in the ELF file dwarf_rust_bubble ↗ before and after my DWARF contributions were integrated. The enhanced output demonstrates Rizin\u0026rsquo;s improved ability to parse DWARF debugging information and precisely locate variables.\n[0x00005180]\u0026gt; pdf @ dbg.write_fmt_Write ... ┌ dbg.write_fmt\u0026lt;Write\u0026gt;(); │ ; var int64_t var_28h @ stack - 0x28 │ ; var int64_t var_18h @ stack - 0x18 │ 0x00010270 push rbx ; impls.rs:155 ; struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt); ... [0x00005180]\u0026gt; pdf @ dbg.write_fmt_Write ... ┌ struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt) ... │ ; arg struct Box\u0026lt;Write\u0026gt; *self @ rsi │ ; arg struct Arguments fmt @ ... │ 0x00010270 push rbx ; impls.rs:155 ; struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt) ... Another example, the iterPreorder function in the ELF file dwarf_go_tree ↗ shows arguments and local variables precisely located on the stack. Most notably, the tree argument is represented as a composite variable spread across multiple stack locations. This new composite storage capability handles complex DWARF types. 
Overall, the improved output matches the original DWARF debugging data much more closely, rather than using generic unknown types. This showcases Rizin’s significantly upgraded DWARF parsing including features like composite variables and the immense value delivered to reverse engineers through my work.\n[0x0045d5a0]\u0026gt; pdf @ dbg.main.tree.iterPreorder ... ┌ dbg.main.tree.iterPreorder(unknown_t visit, unknown_t t); ... │ ; arg unknown_t t @ stack + 0x10 │ ; arg unknown_t visit @ stack + 0x20 │ ; var unknown_t traverse @ stack + 0x40 │ ┌─\u0026gt; 0x00491ce0 mov rcx, qword fs:[0xfffffffffffffff8] ; tree.go:26 ; void main.tree.iterPreorder(struct main.tree t, func(int) visit); │ ╎ 0x00491ce9 cmp rsp, qword [rcx + 0x10] [0x0045d5a0]\u0026gt; pdf @ dbg.main.tree.iterPreorder ... ┌ void main.tree.iterPreorder(main.tree t, func(int) visit) ... │ ; var func(int) traverse @ stack - 0x40 │ ; arg main.tree t @ composite: [(.0, 64): stack + 0x8, (.0, 64): stack + 0x10, (.0, 64): stack + 0x18] │ ; arg func(int) visit @ stack + 0x20 │ ┌─\u0026gt; 0x00491ce0 mov rcx, qword fs:[0xfffffffffffffff8] ; tree.go:26 ; void main.tree.iterPreorder(main.tree t, func(int) visit) While I\u0026rsquo;m proud of these accomplishments, the project did not come without its challenges. Working on a project of this magnitude required grappling with the complexity of the DWARF5 standard and ensuring compatibility across architectures and binary formats through rigorous testing. Additionally, collaborating remotely with the Rizin community was essential but posed communication and coordination challenges.\nHowever, through dedication and guidance from my mentors, I was able to overcome these hurdles. Moreover, my journey with Rizin is far from over. 
Looking ahead, there are several exciting plans on the horizon:\nContinued DWARF5 Improvements: I will keep an eye on DWARF developments and ensure Rizin remains up-to-date with future revisions.\nPerformance Optimization: There is always room for performance optimization. I will explore ways to make Rizin even more efficient when dealing with DWARF information. The main idea is that we can make DWARF load only when needed, instead of loading all DWARF directly.\nUnifying Debug Information 1: As the reverse engineering landscape continues to evolve, the need for unified support of various debuginfo formats like DWARF, PDB, and others becomes increasingly evident. In the spirit of unification, I am excited to take on the challenge of integrating and harmonizing these diverse debuginfo standards within Rizin. This ambitious endeavor aims to provide a seamless experience for developers and analysts working with different binary formats, making Rizin an even more versatile and indispensable tool in the field of reverse engineering. Stay tuned for updates on this exciting journey towards unified debuginfo support!\nDWARF Call Frame Information: To utilize Call Frame Information (CFI) and Canonical Frame Address (CFA) data to accurately locate variables and function arguments on the stack. This will build on my previous work enhancing DWARF parsing as described in issues 2 and 3. Additional background on implementing stack unwinding with CFI and CFA can be found in this blog post 4. ↗ The goal is to leverage the debugging information already present in DWARF to reconstruct calling conventions and provide users more precise variable information during disassembly and analysis.\nIn conclusion, participating in GSoC 2023 has been an invaluable learning experience. I\u0026rsquo;ve expanded my skills, contributed to the Rizin project, and become part of a vibrant open-source community. 
There is still work to be done, but I\u0026rsquo;m excited about the future and making reverse engineering more efficient for everyone through enhancements like unified debuginfo support. Thank you to the Rizin community and my mentors for making this journey possible!\nUnify code of source information access for DWARF, PDB, dSYM\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nLoad function types and arguments from DWARF when CFA and CFI information is used\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nARMv7 failure to load register arguments when subroutine uses CFA\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nStack unwinding\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://rizin.re/posts/gsoc-2023-dwarf/","summary":"In this article, I discuss my experience enhancing DWARF support in Rizin for Google Summer of Code 2023.","title":"GSoC 2023 - Enhancing DWARF Support"},{"content":"Like all previous years, we are grateful to Google for being able to participate in Google Summer of Code 2023. We received many applications, and we are happy that the project has substantial interest. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us with the platform for attracting new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue.\nThis summer, the accepted projects aim to improve the quality of handling debugging information in Rizin and uplifting more architectures to our next-generation intermediate language - RzIL.\nbillow: Debug information handling improvements Hello. I’m billow. A silent open-source enthusiast. I will be working on improving the handling of debugging information in Rizin.\nI started contributing to Rizin in November 2021. It took some time to get to know the code base, but the maintainers were helpful and supportive. Without their help, I wouldn\u0026rsquo;t have accomplished this. 
Since November 2021, I have worked on various issues and features, like uplifting 8051 architecture to RzIL, refactoring graph-related code, porting the commands parser to the tree-sitter-based one, and fixing windows and cpp compatibility, EBCDIC character support. I am also working on uplifting TriCore architecture to RzIL (#3478).\nThe plan, based on the project description and tracking issue #1285, involves several key objectives. Firstly, we will enable support for loading DWARF information from separate files 1 and debuginfod 2. Next, we will unify access to source lines/types information for DWARF, PDB, dSYM, and refactor or fix any parsing code as needed. This will be followed by the integration of source line and types/variables information with the \u0026ldquo;p\u0026rdquo; commands in debug mode for seamless printing. Furthermore, we will integrate this information with breakpoint commands and APIs to enhance overall functionality. Finally, we will implement parsing performance improvements to optimize the project\u0026rsquo;s efficiency.\nbrightprogrammer: Uplifting MIPS to RzIL Hi! I’m Siddharth Mishra (a.k.a brightprogrammer). I\u0026rsquo;m a student of Mathematics \u0026amp; Computing at the Birla Institute of Technology, Mesra. I like developing software in C/C++, and I love Reverse Engineering \u0026amp; Malware Analysis. My summer project this year is to convert the MIPS assembly code to RzIL (Rizin\u0026rsquo;s Intermediate Language). This conversion will help improve Rizin\u0026rsquo;s analysis module, which can help enhance reverse engineering efforts.\nI started contributing around two months before the application submission deadline. I helped convert the old rizin shell to a new tree-sitter-based rizin shell for a command group named cmd_help (PR#3421 and PR#3452). I\u0026rsquo;ve never had a formal course in Compilers and Formal Languages/Grammars, and this was the first time I saw how grammars are written! I was amazed by this. 
I also got a chance to write some tests for some changes I made. This was also a first-time experience. During this work period, I started to build a good bond with my mentors and other awesome contributors/maintainers.\nThe uplifting process is divided into two parts:\nThe first part is to convert MIPS to RzIL. The second part uses uplifted code to migrate analysis from ESIL (old intermediate language) to RzIL. By improving analysis, I mean better function detection, type detection, structures\u0026rsquo; detection, better control flow graphs etc.\nOther small tasks are involved, too, like adding support in Rizin to help visualize RzIL and to update/implement Cutter widgets to use the new RzIL code. Each of these steps needs to be heavily tested by updating and using rz-tracetest, which is used to generate instruction traces using BAP\u0026rsquo;s QEMU and then compare the trace with RzIL\u0026rsquo;s execution. If they match, and if we strongly assume that QEMU is emulating the code correctly, then RzIL is also working correctly!\nI\u0026rsquo;m starting work early to complete the project on time. I want to work hard this summer and learn a lot. Besides, working on this project will improve my knowledge of IL which I need for a personal project that I intend to work on, which will act as a personal motivation for me.\nDWARF is the debuginfo standard used on most operating systems today. Separate debuginfo files are files that contain debugging information extracted from executable binaries and shared libraries. These files help developers debug and diagnose issues in their software without bloating the primary executable or library with debug symbols. Debuginfo files typically have a .debug extension and are generated using tools like \u0026lsquo;objcopy\u0026rsquo; or \u0026lsquo;eu-strip\u0026rsquo;. 
They enable a more apparent separation between production code and debugging data, resulting in smaller binaries and improved performance in production environments, while still providing necessary debugging information when needed.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDebuginfod is a service that provides a convenient way to access debugging information for software components, such as executables, shared libraries, and separate debuginfo files. It is part of the ELFutils project and is designed to simplify the process of debugging and tracing software by automatically locating and retrieving the required debugging data over HTTP. Debuginfod works with various debugger tools like GDB, LLDB, and SystemTap, allowing developers to focus on debugging their code without worrying about managing debuginfo files.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://rizin.re/posts/gsoc-2023-announcement/","summary":"An announcement of the Google Summer of Code 2023. Two accepted candidates.","title":"Google Summer of Code 2023 Announcement"},{"content":"Hi! I\u0026rsquo;m DMaroo, a GSoC 2022 mentee, working on IL migration for x86 ISA in Rizin. For the past few months, I have worked on implementing x86 instructions (from 8086, 80186, 80286 and 80386 instruction sets) in Rizin\u0026rsquo;s intermediate language, RzIL.\nThe following article covers all the work, design decisions, challenges and future plans of the work that I\u0026rsquo;ve been doing. The RzIL can be accessed using aez commands (RzIL emulation).\nRelevant code Implementation of RzIL lifting for x86 instructions:\nPull request: rizinorg/rizin#2747 Commit: ce80a13 Files: x86_il.c, x86_il.h Adding tracing for x86 emulation in BAP\u0026rsquo;s QEMU: BinaryAnalysisPlatform/qemu#21\nAbstract RzIL is Rizin\u0026rsquo;s intermediate language. It is designed for improved analysis of binaries by serving as an intermediate language to run the analysis loop on. 
This removes the need to write architecture-specific analysis code. Having an intermediate language also allows for many other features like taint analysis, symbolic execution and de-obfuscation.\nMy goal for the project was to \u0026ldquo;lift\u0026rdquo; the x86 architecture to RzIL. Lifting here means implementing the x86 instructions from the x86 ISA using opcodes present in the IL. The IL is largely based on BAP\u0026rsquo;s Core Theory.\nThroughout my GSoC period, I have lifted the majority of the instructions belonging to the 8086, 80186, 80286 and 80386 instruction sets. These include almost all the commonly used instructions one can find in modern x86 binaries, and hence would be enough to do fairly satisfactory analysis. I outline my work in detail below.\nWork Getting instructions from Capstone Rizin uses Capstone for the disassembly of some of the instruction sets, including x86. So I had to start off by figuring out all the information that Capstone gives about an instruction. Capstone returns a cs_x86 struct which contains the instruction bytes, operands, prefixes and other data. I wrapped it in an X86ILIns struct for convenience. Once I had the wrappers ready, I had to write general helper functions for generating the opcodes for a variety of tasks, such as getting and setting the operands, registers, memory and flags, and handling arithmetic overflow.\nSetting up the helper functions I have conceptually outlined the rough thought process for the IL lifting in one of the posts on my blog about SuperH IL lifting. Following a similar design process, I set up the functions to get and set the various entities involved. I also set up the functions for overflow/underflow and carry/borrow. The challenges faced while setting up these functions are covered in the challenges section. 
Once these were set up, the lifting was just reading the implementations off of the ISA manual and translating them to the IL.\nImplementing the instructions This is probably the main part of the whole project. Now, I was supposed to go through the instruction manual\u0026rsquo;s implementation for all the instructions and convert them to the IL. I could not, however, just blindly copy them, since the x86 instruction set frequently contains instructions with very weird special cases. Also, the x86 architecture is not simple: it contains multiple modes of operation and also has segmentation and paging support. All of this makes it non-trivial to implement the instructions. Many of the instructions are not practically possible to implement given the current scope of the IL and the disassembler.\nHowever, I did implement the majority of the instructions in the 8086, 80186, 80286 and 80386 instruction sets. That constitutes the meat of all the instructions used in x86 binaries. There are many more highly specific and exotic instructions, but their relevance is low. 
A sample implementation looks as follows:\n/** * ADD dest, src * (ADD family of instructions) * Add * dest = dest + src * Possible encodings: * - I * - MI * - MR * - RM */ IL_LIFTER(add) { RzILOpEffect *op1 = SETL(\u0026#34;op1\u0026#34;, x86_il_get_op(0)); RzILOpEffect *op2 = SETL(\u0026#34;op2\u0026#34;, x86_il_get_op(1)); RzILOpEffect *sum = SETL(\u0026#34;sum\u0026#34;, ADD(VARL(\u0026#34;op1\u0026#34;), VARL(\u0026#34;op2\u0026#34;))); RzILOpEffect *set_dest = x86_il_set_op(0, VARL(\u0026#34;sum\u0026#34;)); RzILOpEffect *set_res_flags = x86_il_set_result_flags(VARL(\u0026#34;sum\u0026#34;)); RzILOpEffect *set_arith_flags = x86_il_set_arithmetic_flags(VARL(\u0026#34;sum\u0026#34;), VARL(\u0026#34;op1\u0026#34;), VARL(\u0026#34;op2\u0026#34;), true); return SEQ6(op1, op2, sum, set_dest, set_res_flags, set_arith_flags); } And the generated opcode looks something like this:\nadd byte [eax], al (seq (set op1 (loadw 0 8 (+ (var eax) (bv 32 0x0)))) (set op2 (cast 8 false (var eax))) (set sum (+ (var op1) (var op2))) (storew 0 (+ (var eax) (bv 32 0x0)) (var sum)) (set _result (var sum)) (set _popcnt (bv 8 0x0)) (set _val (cast 8 false (var _result))) (repeat (is_zero (var _val)) (seq (set _popcnt (+ (var _popcnt) (ite (lsb (var _val)) (bv 8 0x1) (bv 8 0x0)))) (set _val (\u0026gt;\u0026gt; (var _val) (bv 8 0x1) false)))) (set pf (is_zero (smod (var _popcnt) (bv 8 0x2)))) (set zf (is_zero (var _result))) (set sf (msb (var _result))) (set _result (var sum)) (set _x (var op1)) (set _y (var op2)) (set cf (|| (|| (\u0026amp;\u0026amp; (msb (var _x)) (msb (var _y))) (\u0026amp;\u0026amp; (! (msb (var _result))) (msb (var _y)))) (\u0026amp;\u0026amp; (msb (var _x)) (! (msb (var _result)))))) (set of (|| (\u0026amp;\u0026amp; (\u0026amp;\u0026amp; (! (msb (var _result))) (msb (var _x))) (msb (var _y))) (\u0026amp;\u0026amp; (\u0026amp;\u0026amp; (msb (var _result)) (! (msb (var _x)))) (! 
(msb (var _y)))))) (set af (|| (|| (\u0026amp;\u0026amp; (msb (cast 4 false (var _x))) (msb (cast 4 false (var _y)))) (\u0026amp;\u0026amp; (! (msb (cast 4 false (var _result)))) (msb (cast 4 false (var _y))))) (\u0026amp;\u0026amp; (msb (cast 4 false (var _x))) (! (msb (cast 4 false (var _result)))))))) As you can see, the IL is quite large even for a simple instruction like ADD. This is because of all the extra work which needs to be done other than just adding the operands. The operand needs to be loaded from the proper memory location (which requires using the correct segment base register, correct scale and correct offset). Once the addition is done, the flags need to be set. Setting the flags is not a simple operation since it requires finding the parity bit (which requires XORing the bits in a loop). Once all the flag bits are set, the result is written back to the memory.\nAs of this post, 100+ such instructions have been lifted to the IL. There are more instructions to be lifted as well, but as I stated above, these are enough for a start and for testing.\nOnce I was done with implementing the instructions, I added IL tests for the implemented instructions to the tests in tests/db/asm/x86_*. This part ensures that the generated IL code for the instructions is type-safe and doesn\u0026rsquo;t have any memory issues.\nEnabling tracing in QEMU Now to verify the semantics of the IL instructions, we use traces generated by QEMU and compare them with the effects of the IL when executed by the RzIL VM. We use a fork of QEMU, so that we can add the tracing code and modify the QEMU source if needed. To add the tracing code, I had to familiarize myself with QEMU\u0026rsquo;s code, and specifically the TCG (Tiny Code Generator). Currently, the tracing support is almost done. 
However, there are just some very minor issues which need to be resolved.\nChallenges This is more of an interesting-stuff kinda section ;)\nRegisters Choosing the correct registers without a chain of if-else statements or switch-case statements was one of the initial issues I faced. Not only do I have to resolve to the correct register, I should also be able to store and load from the same global IL variable when I reference different sections of a register (i.e. AL, AX, EAX and RAX overlap over the same global IL variable). Along with this, I should also have the ability to choose the largest register of a certain bitness (for example, RAX for 64-bit, EAX for 32-bit and AX for 16-bit) without using switch-case statements. I decided to do this using a statically defined array of registers and their get and set functions (gpr_lookup_table, array of struct gpr_lookup_helper_t). This meant no if-else or switch-case constructs were needed, and I could reuse the same functions for getting and setting the same sections of a register.\nOperands Also, there had to be a general interface for accessing the operands, irrespective of whether they were a register, or a memory location, or an immediate value. Writing the helper functions is easy, but providing ease of usage and removing duplicated code requires thought. I have also slightly touched upon this in the SuperH IL lifting post on my blog.\nIL implementation The descriptions in the x86 manual for some instructions get pretty complex and intense. Making sure that they are implemented correctly is not easy. This is exactly why we have tracetesting to verify the semantics, but manual verification never hurts. Also, the IL gets pretty large for many instructions, and to avoid this, it is important to choose a semantically correct, yet concise implementation. This involves removing unnecessary casting, using loops, removing duplication and so on. 
In fact, removing duplication by using variables brings down the IL code size by a lot.\nQEMU QEMU\u0026rsquo;s codebase is great for learning purposes. It consists of complex C constructs (tons of non-trivial preprocessor macros), but at the same time it is quite well-written, and also reasonably well documented. However, debugging the codebase is not that easy, since the code generates a buffer (code_gen_buffer), which then executes more code.\nFuture Work The work I have done during my GSoC period is just the core part of the lifting. The lifting is far from being stable and ready-to-use. It is more of a pre-alpha release than a finished feature. The next steps will be as follows.\nTracetest: Thoroughly tracetest the implementations to make sure the semantics are right. API: Add Rizin commands and API, which would be wrappers around the IL, so as to visualize and access the IL. A similar graphical widget should also be added to Cutter. Documentation: Document the IL and its usage in the Rizin book. Analysis: Integrate the IL in the analysis loop to lead to a better analysis. Instructions: Add lifting for more x86 instructions. I haven\u0026rsquo;t thoroughly thought out the last two options yet, but they do seem to be reasonable future goals. Alongside, there are also plans to lift other architectures to RzIL and to extend RzIL by adding floating point and transcendental operations support.\nClosure I would say that I had a very educative experience throughout the period. I learnt quite a lot about the x86 architecture and the instruction set in the past few months. I was not able to meet all my goals for my GSoC project, but I think the work done until now is a good enough checkpoint and I plan to continue working on this.\nI would like to thank my mentors, Anton Kochkov, Deroad and Florian Märkl, for all the guidance they have provided me throughout my project. 
I look forward to continuing to work with them :)\n","permalink":"https://rizin.re/posts/gsoc-2022-x86-il/","summary":"A summary of all the work done on RzIL lifting for x86 ISA, and how you can use it.","title":"GSoC 2022 - x86 ISA lifting for RzIL"},{"content":"Hello. I\u0026rsquo;m wingdeans, a participant in GSoC 2022 with Rizin. For the past few months, I\u0026rsquo;ve been working on creating rz-bindgen - a framework for making Rizin scriptable from other languages.\nThis document covers some of the design decisions and internals of the tool. To get started with the bindings, see the usage documentation.\nRationale Rz-pipe, the currently recommended way to script Rizin, only works with commands exposed to the Rizin shell. Although it can do everything the Rizin shell can, it cannot match the full Rizin C API in performance, feature-completeness, or type guarantees. The C API, on the other hand, is more difficult to work with, especially for one-off scripts. Rz-bindgen seeks to be a middle ground, making the C API accessible from other programming languages. Python is the primary target for rz-bindgen, as it is usable for both scripts and plugins, and has been incorporated successfully in other reverse-engineering tools.\nDesign Many languages already have tools for creating bindings to C/C++, such as rust-bindgen for Rust or CLIF for Python. However, these tools often rely on mapping C++ constructs to their own, and require extra work to create idiomatic bindings for plain C code. Like many of these tools, rz-bindgen parses C headers and generates bindings as output. However, rz-bindgen targets one project and multiple languages, rather than one language and multiple projects. 
This allows rz-bindgen to make use of Rizin-specific annotations, such as the RZ_NULLABLE and RZ_DEPRECATE C macros.\nSee this post on the Rizin blog for more details on the thought process behind my proposal and my implementation ideas from before I started the task.\nImplementation I considered my primary options for parsing the C headers to be tree-sitter and libclang. Even though I wrote about tree-sitter in the Rizin GSoC announcement blogpost, the integrated preprocessor and semantic analysis led me to choose libclang\u0026rsquo;s Python bindings.\nC Structs and Functions Once a header is parsed, C data structures are grouped with functions that operate on them. In this snippet from rz-bindgen, the RzAnalysis struct from the rz_analysis.h header is grouped with functions that have the rz_analysis_ prefix. In the generated Python bindings, these groupings are mapped to object-oriented classes, with the RzAnalysis class containing the grouped functions as its methods. The RzAnalysis class also makes all the fields of the C struct accessible except for leaddrs (which is ignored as per the ignore_fields argument) and type_links (which is renamed as per the rename_fields argument).\nrz_analysis = Class( analysis_h, typedef=\u0026#34;RzAnalysis\u0026#34;, ignore_fields={\u0026#34;leaddrs\u0026#34;}, rename_fields={\u0026#34;type_links\u0026#34;: \u0026#34;_type_links\u0026#34;}, ) rz_analysis.add_method(\u0026#34;rz_analysis_reflines_get\u0026#34;, rename=\u0026#34;get_reflines\u0026#34;) rz_analysis.add_prefixed_methods(\u0026#34;rz_analysis_\u0026#34;) rz_analysis.add_prefixed_funcs(\u0026#34;rz_analysis_\u0026#34;) Generation Rz-bindgen is designed to support multiple backends to generate bindings for a variety of languages. A backend takes the Class objects created in the transformation step and generates output. 
There are, at the time of writing, a SWIG backend and a Sphinx backend.\nThe SWIG backend is currently only used for Python bindings, but SWIG targets other languages too, such as Java and OCaml. Supporting them in rz-bindgen should be relatively simple. The Sphinx backend generates documentation for the Python bindings and can be viewed here.\nGenerics One of the main challenges in translating the C headers was the existence of generic container types. Rizin uses types like RzList and RzVector to represent a linked-list and dynamic array respectively and, being written in C, uses void* for the type of the data contained within. This means that trying to use these types from Python would be difficult, as their elements lack the type information to generate methods. Fortunately, Rizin developers were already annotating the types of these functions for developer ergonomics using comments such as RzList /*\u0026lt;RzAnalysisBlock *\u0026gt;*/ *bbs.\nThis allows bindings to use container types in a type-safe manner. In this Python example from rz-bindgen, a specialized RzList_RzBinSymbol is created, and RzBinSymbols are appended to it. Appending any other type will result in an error.\nsyms = rizin.RzList_RzBinSymbol() for sym in self.loader.main_object.symbols: binsym = rizin.RzBinSymbol() binsym.thisown = False binsym.name = sym.name binsym.type = rizin.RZ_BIN_TYPE_FUNC_STR binsym.paddr = sym.linked_addr binsym.vaddr = sym.rebased_addr binsym.size = sym.size syms.append(binsym) Additional Features The snippet above is from an example of implementing an RzBinPlugin in Python. See the bin_plugin documentation for more details.\nThe Python bindings also make it easier to access Rizin internals when writing scripts, as can be seen in the rz_cmd example (see the cmd documentation for more details). 
One key feature is the ability to register a Rizin command backed by a Python function, like so:\ndef print_function_info(fn: rizin.RzAnalysisFunction): print(\u0026#34;name:\u0026#34;, fn.name) print(\u0026#34;number of xrefs from:\u0026#34;, len(fn.get_xrefs_from())) print(\u0026#34;number of xrefs to:\u0026#34;, len(fn.get_xrefs_to())) return True core.register_group(\u0026#34;u\u0026#34;, \u0026#34;A custom group for user-defined commands\u0026#34;) core.register_command(\u0026#34;uf\u0026#34;, print_function_info) The Rizin plugin registers Python as an RzLang, allowing users to load Python scripts on the fly. It also adds a core variable to the rizin Python module, allowing scripts that import it to access Rizin\u0026rsquo;s own RzCore.\nReflections The coverage of the bindings is currently lacking - it is not yet possible to use every bit of the C API. I hope this will change as I get more eyes on the project. I also hope to improve the Rizin plugin and finalize the Cutter plugin.\nIn the long term, I hope to add bindings for extensions such as rz-ghidra which expose their functions. This could allow access to Ghidra\u0026rsquo;s P-Code and decompiler once implemented.\nI would like to thank my GSoC mentors XVilka and megabeets, as well as Rizin core contributors ret2libc and deroad.\nIf you need help with rz-bindgen or wish to build a project using the generated bindings, feel free to reach me on the Rizin mattermost @wingdeans (we have an IRC bridge too).\n","permalink":"https://rizin.re/posts/gsoc-2022-rz-bindgen/","summary":"An overview of rz-bindgen\u0026rsquo;s design, implementation, features, and future.","title":"GSoC 2022 - rz-bindgen"},{"content":"Linux If your distribution ships Rizin from the official repositories, use that. 
We are currently aware of the following Linux distributions shipping an up-to-date Rizin:\nArch Linux Fedora Gentoo If your distribution is not in the list above, but it does ship Rizin/Cutter, please let us know and we will fix it! If you cannot find Rizin/Cutter in the official repositories, we provide install instructions for some other distributions through OBS. Follow the instructions here (select the \u0026ldquo;Add repository and install manually\u0026rdquo; option).\nWindows You can install Rizin through the installer for your architecture provided in the latest release (e.g. rizin_installer-vX.Y.Z-x86_64.exe).\nOtherwise, you can download the portable builds, which run without any installation on your system: just extract the archive to the path you want and execute Rizin from there (e.g. rizin-windows-share64-vX.Y.Z.zip).\nYou can find Cutter for Windows in the latest Cutter release. The archive can be extracted anywhere on your system and Cutter can be executed from there.\nMacOS You can install both Rizin and Cutter through Homebrew:\n$ brew install rizin $ brew install --cask cutter Alternatively, you can find Pkg/DMG files for both Rizin and Cutter.\nOpenBSD Rizin and Cutter are available in stable releases starting with OpenBSD-7.3.\n# pkg_add rizin cutter Android Statically compiled binaries for some common Android architectures are built and attached to all releases. We currently support aarch64, armv7, and x86_64. You can find the artifacts for Android on the latest Rizin release.\nThose files are named rizin-\u0026lt;version\u0026gt;-android-\u0026lt;architecture\u0026gt;.tar.gz. 
Files within the archive can be extracted anywhere on your Android device because Rizin is compiled in a \u0026ldquo;portable\u0026rdquo; way, so the whole directory can be moved anywhere.\nRizin also has a package on Termux and can be installed using the Termux package manager, pkg:\npkg install rizin Building from source Source code for Rizin and Cutter can be downloaded from GitHub:\nRizin repository Cutter repository Build instructions can be found in the README.md files.\nInstall Rizin plugins To install Rizin plugins you can use our package manager, rz-pm, which will compile and install packages for you in the right locations.\nGet the latest version for your system here, make the file executable and you are good to go!\nThe list of currently supported plugins is available in the rz-pm-db repository.\n","permalink":"https://rizin.re/install/","summary":"How to install Rizin and Cutter","title":"Install"},{"content":"Google Summer of Code 2022 is here and we are excited to participate! 🎉 This is the second year we are participating in GSoC as the Rizin organization.\nWe received many applications, and we are happy that there is substantial interest in the project. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us with the platform for attracting new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue in the future.\nThis summer, the accepted projects aim to improve the quality of analysis of Rizin by employing our next generation intermediate language - RzIL, along with making the scripting and automation easier with Rizin and Cutter by improving the API, especially the Python one.\nDhruv: RzIL uplifting migration Hey, I\u0026rsquo;m Dhruv Maroo (DMaroo)! I am a computer science and engineering undergraduate student from the Indian Institute of Technology, Madras. 
I am excited to work on RzIL migration of x86 architecture as a contributor under GSoC 2022.\nI have been interested in computer security ever since I started learning computer science and engineering. Out of all the various subdomains within security, binary exploitation and reverse engineering appealed to me the most. I was also fascinated with systems engineering and low-level programming. Recently, I had also been exposed to the relevance of intermediate languages in symbolic execution. Given these interests, working on RzIL under Rizin was a golden opportunity for me.\nI started contributing to Rizin in August 2021. It took a fair bit of time to get to know the code base, but the maintainers were very helpful and supportive. Without their help, I wouldn\u0026rsquo;t have accomplished this. Since August 2021, I have worked on a variety of issues and features, like project compression, seeking and autocompletion for global variables, breakpoint serialization, type pretty printing API, porting db (debug breakpoint), c (compare) and shell commands to newshell, fixing Coverity scan issues and memory leaks, RzIL refactoring, DWARF attribute type checking, and lots of other miscellaneous issues. I am also working on migrating the SuperH ISA to RzIL (#2518).\nThe RzIL uplifting project is a 350-hour project, spanning 5 months, starting from the second week of June. During the project, I plan to implement the 8086, 80186, 80286, 80386 and 80486 instructions from the x86 instruction set. Along with these, I will also be implementing Pentium and MMX instructions. This will be followed by testing the migration using rz-tracetest and getting the traces. I will also be adding API commands in Rizin to visualize and interface with the IL tree. These commands will be augmented with corresponding widgets in Cutter. Then, I am planning to document all of this in the Rizin book, to allow others to easily use it and contribute as well. 
One of the optional (but very interesting) goals is to use the power of RzIL in the binary analysis loop, to provide better analysis and new features. Other optional goals involve migrating SSE and AVX instructions to RzIL and migrating other architectures to RzIL as well.\nI am looking forward to starting work on the project and improving Rizin. I would again like to thank the maintainers for helping me through my contributions, and I would like to thank GSoC for giving me such a great learning opportunity.\nwingdeans: Automated Python Bindings Hello. I\u0026rsquo;m wingdeans, a second-year Computer Science major at the University of Florida.\nI\u0026rsquo;ll be working on exposing the Rizin native API to Python, as an alternative to the current rz-pipe command-based API. This 175-hour project will involve semi-automatically generating Python bindings from the Rizin C headers, as well as abstractions and documentation to increase developer ergonomics.\nRationale Python has seen widespread adoption among the infosec and reversing communities, with several reverse engineering platforms integrating Python as a scripting and plugin language. Although rz-pipe exposes all the Rizin commands to a number of languages, including Python, the integration is not as robust as Rizin\u0026rsquo;s native C API.\nThe completion of this task will make scripting with Rizin more pleasant. It will also help Cutter, which supports Python GUI plugins.\nConsiderations The bindings should feel Pythonic. At the bare minimum, this will involve creating classes for the various structs in Rizin, and creating methods out of any C functions that manipulate those structs. Python features like keyword arguments should also be used when appropriate.\nThe bindings should be somewhat automated. This will ensure that the bindings do not become out of sync with the Rizin core API. 
In addition, many Rizin functions already contain annotations about nullability (RZ_NULLABLE/RZ_NONNULL) and ownership (RZ_OWN/RZ_BORROW). Parsing this information will be useful in managing memory.\nRough Plans The generator will first parse the Rizin headers, perhaps using tree-sitter, and extract function information. It will then create classes and methods from a user-specified list of functions and emit source code to be integrated with additional manually-written modules.\nAbout Me I\u0026rsquo;ve been doing CTFs for about a year now, with a focus on cryptography and reversing. For reversing challenges, I primarily use Cutter, which was why I ended up applying here for this year\u0026rsquo;s GSoC. I’m interested in enhancing Rizin’s scripting capabilities for use in custom reversing tools and plugins.\nFor my microtask, I\u0026rsquo;m implementing support for dotnet binaries. Although the overall file format is the same as in PE files, there are additional headers, streams, and tables that need to be parsed. I also ended up writing a disassembler plugin, and an analysis plugin is in the works.\nI look forward to working with the Rizin team to implement a better scripting API, and I hope these additions will encourage users to extend Rizin and Cutter for their own reversing needs.\n","permalink":"https://rizin.re/posts/gsoc-2022-announcement/","summary":"An announcement of the Google Summer of Code 2022. Two accepted candidates.","title":"Google Summer of Code 2022 Announcement"},{"content":"Rizin is an interactive command line tool and as such it provides a nice shell, where you can execute rizin-specific commands to perform all kinds of actions like analyzing functions, getting information from a binary, showing sections and symbols, and much more.\nIt has a lot of commands that you must know in order to use it properly and for this reason we believe its shell should be as powerful and as discoverable as possible. 
In this post we are going to talk a bit about its basics and what we have done in Rizin to improve it even further, for both users and developers (skip the Background section if you just care about the latter!).\nBackground Since the creation of radare2, the framework Rizin originally emerged from, more and more commands were added to the list of supported actions, help messages were written to help users navigate those commands and various constructs were developed to make the language recognized by the shell more and more powerful (and complicated, at times!). It is possible to execute a command and temporarily switch the architecture defined in Rizin, temporarily seek to another address in the address space, iterate over defined sections, functions or symbols, create macros, aliases and much more.\nCommands are organized in a tree, where each letter represents a node in that tree. For example, a has several sub-commands like aa, ab, af, ah, ao, av, etc. af groups yet other sub-commands, like afr, af+, af-, afl, afv, etc. You get the idea. This tree can be explored by users thanks to the ? suffix, which is used as a way to get help about commands.\n[0x00000000]\u0026gt; afvb? Usage: afvb [idx] [name] ([type]) | afvb list base pointer based arguments, locals | afvb* same as afvb but in r2 commands | afvb [idx] [name] ([type]) define base pointer based arguments, locals | afvbj return list of base pointer based arguments, locals in JSON format | afvb- [name] delete argument/locals at the given name | afvbg [idx] [addr] define var get reference | afvbs [idx] [addr] define var set reference ? can be used alone to get the sub-commands of the root node.\n[0x00000000]\u0026gt; ? Usage: [.][times][cmd][~grep][@[@iter]addr!size][|\u0026gt;pipe] ; ... Append \u0026#39;?\u0026#39; to any char command to get detailed help Prefix with number to repeat command N times (f.ex: 3x) | %var=value alias for \u0026#39;env\u0026#39; command | *[?] 
off[=[0x]value] pointer read/write data/values (see ?v, wx, wv) | (macro arg0 arg1) manage scripting macros | .[?] [-|(m)|f|!sh|cmd] Define macro or load r2, cparse or rlang file | ,[?] [/jhr] create a dummy table import from file and query it to filter/sort | _[?] Print last output | =[?] [cmd] send/listen for remote commands (rap://, raps://, udp://, http://, \u0026lt;fd\u0026gt;) [output truncated] | t[?] types, noreturn, signatures, C parser and more | T[?] [-] [num|msg] Text log utility (used to chat, sync, log, ...) | u[?] uname/undo seek/write | v panels mode | V visual mode (Vv = func/var anal, VV = graph mode, ...) | w[?] [str] multiple write operations | x[?] [len] alias for \u0026#39;px\u0026#39; (print hexadecimal) | y[?] [len] [[[@]addr Yank/paste bytes from/to memory Usually the letter of the group is representative of what the sub-commands do. So p stands for \u0026ldquo;print\u0026rdquo; and pd stands for \u0026ldquo;print disassembly\u0026rdquo; and so on. After some time, it becomes much easier to navigate this structure.\nAs mentioned above, you can create statements that in one way or another modify the behaviour of a command (or multiple ones). Suppose you are analyzing a 32-bit x86 binary, but you know that some pieces of code will actually be executed in 64-bit mode. You could do:\n[0x00006b60]\u0026gt; s 0x6c00 [0x00006c00]\u0026gt; e asm.bits=64 [0x00006c00]\u0026gt; pd 2 ;-- entry.fini0: 0x00006c00 f30f1efa endbr64 0x00006c04 803d75b60100. cmp byte [section..bss], 0 ; [0x22280:1]=0 [0x00006c00]\u0026gt; e asm.bits=32 [0x00006c00]\u0026gt; s 0x6b60 Or you can simply apply some modifiers to the pd command with pd 2 @b:64 @0x6c00. This statement will temporarily switch asm.bits and the current seek to execute pd 2 within the right context. There are also other kinds of statements that allow you to redirect the output/error of a command to a file, or that provide a foreach-like behaviour. 
For example, if you want to execute a command on all segments, you could do \u0026lt;cmd\u0026gt; @@iSS. Iterating over all basic blocks of all functions recognized by Rizin could be done with \u0026lt;cmd\u0026gt; @@b @@F. See @? and @@? for more info.\nImprovements in Rizin shell We rewrote from scratch most of the code dealing with parsing and handling of commands, moving from a simple and scattered approach to a more centralized and uniform one. We started this effort in radare2 with what was called cfg.newshell, which has since become the default shell of Rizin and has improved even further. Some of the issues we had with the previous implementation, and that led to the rewrite of this code, were: manually written command help strings that were inconsistently displayed, a hand-written parser with no clear, formal, global grammar defined, the impossibility of dynamically registering/deregistering commands in the shell, command handler code mixing core logic with input handling, manually implemented autocompletion of commands and only partial autocompletion of arguments. These are just some of the things we solved.\nRizin keeps track of all commands that can be executed in its shell, their place in the commands tree, the brief summary of what they do, a possibly longer description, additional details that may be useful, the number of arguments a command accepts, their types and whether they are optional.\nThis information (and more) is stored by Rizin, making it possible to write code in a uniform way. For example, the list of sub-commands available under z? is automatically generated from the data Rizin has available. Autocompletion of commands can be easily performed as soon as a developer adds a new command. 
Argument autocompletion is straightforward as well, because when a developer adds a new command with its arguments (and types, which are mandatory), Rizin already has all the information to perform the autocompletion.\nAlso error reporting is more uniform, because when the shell detects a non-existing command it can immediately report that to the user. It is also possible to extend this behaviour to propose similar commands that a user wanted to type or provide help for the most similar command available. Similar errors can be reported when a user uses a command in the wrong way, for example by not providing enough arguments or by providing too many. Rizin, knowing how many arguments a command accepts, can immediately return an error (and this behaviour could be extended as well to provide the help of the command, to make life easier for users).\nAnother important aspect of having a database of commands which are available to the Rizin shell is that commands can be easily registered and deregistered at runtime by Rizin plugins (e.g. rz-ghidra). Of course you want to see the commands provided by a plugin only when that plugin is actually loaded. You also want your new command, defined by an external plugin, to behave similarly to the internal commands: you want the plugin commands and their arguments to be autocompleted and you want uniform error reporting. All this is possible because these operations are all automatically performed by the Rizin shell and not by each individual command.\nWe also recognize that sometimes having just a short summary of what a command does is not enough. You would want to have a much longer description of what operations the command performs, which config variables affect its behaviour, etc. This is why we added the possibility to provide a description to each command, that can be shown by using the ?? suffix.\n[0x00000000]\u0026gt; wv?? 
Usage: wv \u0026lt;value\u0026gt; # Write value as 4-bytes/8-bytes based on value Write the number passed as argument at the current offset as a 4 - bytes value or 8 - bytes value if the input is bigger than UT32_MAX, respecting the cfg.bigendian variable Examples: | wv 0xdeadbeef # Write the value 0xdeadbeef at current offset | wv2 0xdead # Write the word 0xdead at current offset | wv1 0xde # Write the byte 0xde at current offset [0x00000000]\u0026gt; Only a few commands have such description right now, but we very much welcome pull requests to improve our user documentation. If you want to help us, see the next section to know which files to touch.\nWe also changed the way commands and statements are parsed: instead of relying on a very simple hand-written parser, we switched to a tree-sitter based parser, where we just have to write a formal grammar and tree-sitter automatically generates the parser for us. This approach ensures that commands and their arguments are parsed in a consistent way. For example, all new commands accept quoted strings. Wrapping multiple words in single or double quotes would make the shell consider those words as a single argument for the command. If you need to pass ; as an argument of a command (semi-colons usually represent the separator between commands), you can just quote it or escape it with \\; and expect this to work for all (new) commands. Using tree-sitter and defining the grammar in a single file forced us to think about the grammar as a whole in a very uniform way across all commands and not just as the union of different grammars for each command.\nHow it works File rz_cmd.h has all the API to register, deregister, execute commands with a list of arguments and get help for a tree of commands. 
To add new commands, developers (of a plugin, for example) have to use rz_cmd_desc_argv_new by specifying the parent group in the commands tree, the handler of the command and a structure of type RzCmdDescHelp that describes the command: its summary, an optional longer description, a list of detailed sub-sections in the help and the list of arguments the command accepts, including information on their types and whether they are optional or not.\nAs soon as the command is registered (e.g. a Core plugin with a command is loaded), the Rizin shell becomes aware of the new command: the command can now be executed, it is shown in the help tree in the right place, and it is autocompleted as necessary, including its arguments. As an example, you can see rz-ghidra or jsdec.\nPlugin developers have to use all the C API and data structures mentioned above and more, included in the rz_cmd.h file. To avoid a lot of boilerplate code and to make changes to commands easier for non-developers too, we decided to auto-generate the C structures like RzCmdDescHelp from a list of YAML files. 
Commands are all described in YAML files that mimic the final tree structure, like below:\n- name: tc summary: List loaded types in C format subcommands: - name: tc cname: type_list_c summary: List loaded types in C format with newlines args: - name: type type: RZ_CMD_ARG_TYPE_ANY_TYPE optional: true - name: tcc summary: Manage calling convention types subcommands: - name: tcc cname: type_cc_list summary: List all calling conventions modes: - RZ_OUTPUT_MODE_STANDARD - RZ_OUTPUT_MODE_LONG - RZ_OUTPUT_MODE_SDB - RZ_OUTPUT_MODE_RIZIN - RZ_OUTPUT_MODE_JSON args: - name: type type: RZ_CMD_ARG_TYPE_STRING optional: true - name: tcc- cname: type_cc_del summary: Remove the calling convention args: - name: type type: RZ_CMD_ARG_TYPE_STRING While building Rizin, these YAML files are used to automatically generate a .c and a .h file containing all the data structures and C API calls necessary to construct the commands tree as described by the developer.\nThis approach ensures that commands shown in the help are only those that can be executed and that commands that can be executed are listed in the help as well. In the past, because help messages were just strings manually written for each command, it was too easy to forget to update a help message and end up with a hidden command or with a wrong help message that referenced a command which did not exist anymore.\nAnother big change for developers is that commands are no longer implemented in huge switch-cases like before; instead, each command handler has its own function with a signature similar to the main function of a C program, including argc/argv arguments. 
We believe this makes our codebase much cleaner and easier to understand, with short (and less indented) command handlers that just have to deal with the core logic of the command, without having to add boilerplate code just to parse/split arguments like it was done before.\nConclusion We think we are making big changes towards a more usable, discoverable and descriptive shell and although these changes required a lot of time, we have reached a good point. Rizin is now in a mixed state, with some commands still following the old behaviour and other commands being switched to the new way described in this blog post. We are porting new commands approximately every week, but any help is appreciated: you can provide more accurate and descriptive summaries/descriptions to the already converted commands in https://github.com/rizinorg/rizin/tree/dev/librz/core/cmd_descs or you can help us port commands following the old structure to the new approach, so that they can benefit from everything that is explained here (look at #1342 to know which commands are missing). Also have a look at rzshell.md to know more about the shell.\nIf you have issues, bugs, ideas or want to discuss this approach or others with us, feel free to join us on Mattermost.\n","permalink":"https://rizin.re/posts/rzshell/","summary":"Rizin shell","title":"Rizin shell"},{"content":"Rizin Summer of Code 2021 Summary RSoC 2021 is officially finished and we are happy to congratulate both participants for passing the program and completing the most important parts of their tasks.\nBasstorm: Types analysis Hello, I am Basstorm. Over the past two months, I had a fulfilling summer as one of the participants of RSoC. The main subject of RSoC was to improve the Type module.\nAt first, I fixed several bugs in the new tree-sitter based type parser. The new type parser brings us the ability to parse a C type defined as a string. 
After that, I migrated the type constraints from RzAnalysis to the new RzType module, which makes the type constraints management easier.\n[0x00000530]\u0026gt; e analysis.types.constraint=true [0x00000530]\u0026gt; aaa [x] Analyze all flags starting with sym. and entry0 (aa) [x] Analyze function calls (aac) [x] Analyze len bytes of instructions for references (aar) [x] Check for classes [x] Type matching analysis for all functions (aaft) [x] Propagate noreturn information [x] Use -AA or aaaa to perform additional experimental analysis. [0x00000530]\u0026gt; s sym.range_small [0x0000063a]\u0026gt; pdf ; CALL XREF from main @ 0x720 / sym.range_small (int64_t arg1); | ; var int64_t var_14h { \u0026gt; 0x0 \u0026amp;\u0026amp; \u0026lt;= 0x9} @ rbp-0x14 ;constraint | ; var int64_t var_8h @ rbp-0x8 | ; var int64_t var_4h { } @ rbp-0x4 | ; arg int64_t arg1 @ rdi | 0x0000063a push rbp | 0x0000063b mov rbp, rsp | 0x0000063e sub rsp, 0x20 | 0x00000642 mov dword [var_14h], edi ; arg1 For historical reasons, Rizin has never had support for global variables, which means we can\u0026rsquo;t identify and set a certain global variable, which is detrimental to our analysis. I have added support for global variables so that we can easily manipulate a global variable from the command line.\n[0x00000000]\u0026gt; avg? 
Usage: avg[jadmnt] # Global variables | avg[j] [\u0026lt;var_name\u0026gt;] # show global variables | avga \u0026lt;var_name\u0026gt; \u0026lt;addr\u0026gt; \u0026lt;type\u0026gt; # add global variable manually | avgd \u0026lt;addr\u0026gt; # delete the global variable at the addr | avgm \u0026lt;name\u0026gt; # delete global variable with name | avgn \u0026lt;old_var_name\u0026gt; \u0026lt;new_var_name\u0026gt; # rename the global variable | avgt \u0026lt;var_name\u0026gt; \u0026lt;type\u0026gt; # change the global variable type [0x00000000]\u0026gt; avga foo 0x100 char [0x00000000]\u0026gt; avg global char foo @ 0x100 [0x00000000]\u0026gt; avgt foo int [0x00000000]\u0026gt; avg global int foo @ 0x100 [0x00000000]\u0026gt; In addition, I completely refactored PDB Parser to make it better cross-platform. Previously, PDB Parser had a lot of problems with its functionality, such as missing information, parsing errors, and unused types. All these problems are solved in this refactoring.\n$ rizin Project1.exe -- Use scr.accel to browse the file faster! 
[0x00401703]\u0026gt; idpi ./Project1.pdb ········· struct std::_Char_traits\u0026lt;char32_t,unsigned int\u0026gt; { char32_t char_type; uint32_t int_type; int64_t off_type; char32_t copy(char32_t * arg0, const char32_t * arg1, const uint32_t arg2); char32_t _Copy_s(char32_t * arg0, const uint32_t arg1, const char32_t * arg2, const uint32_t arg3); char32_t move(char32_t * arg0, const char32_t * arg1, const uint32_t arg2); int32_t compare(const char32_t * arg0, const char32_t * arg1, uint32_t arg2); uint32_t length(const char32_t * arg0); const char32_t * find(const char32_t * arg0, uint32_t arg1, const char32_t * arg2); bool eq(const char32_t * arg0, const char32_t * arg1); bool lt(const char32_t * arg0, const char32_t * arg1); char32_t to_char_type(const uint32_t * arg0); uint32_t to_int_type(const char3_t * arg0); bool eq_int_type(const uint32_t * arg0, const uint32_t * arg1); uint32_t not_eof(const uint32_t * arg0); uint32_t eof(); } ········· [0x00401703]\u0026gt; tuc ········· union __m64 { uint64_t m64_u64; float m64_f32[8]; unsigned char m64_i8[8]; int16_t m64_i16[8]; int32_t m64_i32[8]; int64_t m64_i64; unsigned char m64_u8[8]; uint16_t m64_u16[8]; uint32_t m64_u32[8]; }; ········· Currently, Heersin has completed the new RzIL, but it still lacks support for many architectures. So I am now porting the 8051 architecture from the old ESIL to the new RzIL, and I will be working with Heersin to port more architectures to the new IL afterwards.\nDuring this RSoC, I grew a lot and learned a lot of development skills that I would not normally be exposed to. I would like to especially thank my mentor Anton Kochkov for his selfless help. I would also like to thank all the community members for their help!\nHeersin: New Rizin IL Hi, I\u0026rsquo;m Heersin. I participated in RSoC this summer to introduce a new Intermediate Language and refactor ESIL-related code. Rizin previously used ESIL (a stack-based IL) as its IL to analyse binaries. 
In fact, ESIL is neither user-friendly nor developer-friendly, which is one of the reasons that led to this work. We adopted BAP\u0026rsquo;s Core Theory as our new IL because it is designed to be similar to SMT, and it may be the latest IL (the most \u0026ldquo;fashionable\u0026rdquo; one) we can trust for now.\nIn the first few days, I didn\u0026rsquo;t have any clue about implementing a Core Theory VM, so I started to work on some basic data structures (Bool/BitVector/Array) used in the VM. They are basic types in Core Theory, and we can emulate other types (ut8/ut16/ut32/ut64) by using bitvector and bool.\nAfter that, I focused on the concepts in the VM and the execution procedure. In short, there are Variables and Values in the Core Theory VM. A Variable is a symbol while a Value represents the result of evaluating an expression. read register is used to get the value of a variable and write register is used to assign a value to a variable. Memory is a Hashtable (kv-map), where the address is the key and the data is the value. The Memory concept is similar to the SMT Arrays theory where both values and indexes are Bitvectors.\nThen, I uplifted brainfuck to test the new IL. Here is the uplifted expression.\n# print mode # ++++++++++[\u0026gt;+++++++\u0026gt;++++++++++\u0026gt;+++\u0026gt;+\u0026lt;\u0026lt;\u0026lt;\u0026lt;-]\u0026gt;++.\u0026gt;+.+++++++..+++.\u0026gt;++.\u0026lt;\u0026lt;+++++++++++++++.\u0026gt;.+++.------.--------.\u0026gt;+.\u0026gt;. 
(STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (BRANCH (LOAD (VAR ptr)) \u0026lt;NOP\u0026gt; (GOTO ]0)) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) ... (SET ptr (SUB (VAR ptr) (INT 1))) (STORE (VAR ptr) (SUB (LOAD (VAR ptr)) (INT 1))) (BRANCH (INV (LOAD (VAR ptr))) \u0026lt;NOP\u0026gt; (GOTO [0)) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) ... (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (GOTO write) Next step is to integrate new IL with Rizin, and porting analysis_bf to use it. The analysis code is huge and ESIL is tightly integrated within it. I removed some dead code and reorganized the directory structure with the help from community members. Moreover, I added new structures for the trace and stat (they are used to collect info about reg/mem read/write) to replace the sdb approach with vectors and make it easier to understand. Then I continued to integrate the new IL; added aezi and aezs commands for init VM and step emulate respectively.\n[0x00000000]\u0026gt; aezi [0x00000000]\u0026gt; aezs 390 Hello World! [0x00000000]\u0026gt; aezi Porting more architectures will be a huge work. 
I will continue to contribute to Rizin and improve the IL part.\nI am grateful for such an opportunity to participate in RSoC and contribute to Rizin. There is a friendly atmosphere and I learned a lot. I want to give special thanks to my mentor XVilka for his guidance and help, and also to ret2libc, Ivg, Wargio, Pelijah, Thestr4ng3r and 08A for answering my questions and giving feedback on my PRs.\n","permalink":"https://rizin.re/posts/rsoc-2021-summary/","summary":"Rizin Summer of Code 2021 Summary RSoC 2021 is officially finished and we are happy to congratulate both participants for passing the program and completing the most important parts of their tasks.\nBasstorm: Types analysis Hello, I am Basstorm. Over the past two months, I had a fulfilling summer as one of the participants of RSoC. The main subject of RSoC was to improve the Type module.\nAt first, I fixed several bugs in the new tree-sitter based type parser.","title":"Rizin Summer of Code 2021 Summary"},{"content":"Google Summer of Code 2021 Summary GSoC 2021 is officially finished and we are happy to congratulate all 3 participants for passing the program and completing the most important parts of their tasks. It brought us some long-needed code cleanup and user-visible changes in the analysis and binary/heap parsing. See what students wrote themselves:\n08A: Refactoring ELF binaries loading This summer I have been doing the GSoC for Rizin. The subject of the GSoC was to refactor and improve how elf binaries are loaded by Rizin.\nI have added support for the elf hash table and gnu hash table. These two data structures are used to deduce the number of dynamic symbols in the file, which replaced the old way of doing it (assuming that the data is a symbol until there is an error).\nMoreover, I have changed the source of trust used to load symbols\u0026rsquo; versions (from sections information to dynamic section\u0026rsquo;s information). 
So Rizin is now able to read symbols\u0026rsquo; versions even if there is no section.\n\u0026gt; rz-bin -V bins/elf/analysis/clark WARNING: Invalid section header (check array failed). Version symbols has 9 entries: Addr: 0x080482c2 Offset: 0x000002c2 0x00000000: 0 (*local*) 0x00000001: 2 (GLIBC_2.0) 0x00000002: 2 (GLIBC_2.0) 0x00000003: 0 (*local*) 0x00000004: 2 (GLIBC_2.0) 0x00000005: 2 (GLIBC_2.0) 0x00000006: 2 (GLIBC_2.0) 0x00000007: 2 (GLIBC_2.0) 0x00000008: 1 (*global*) Version need has 1 entries: Addr: 0x080482d4 Offset: 0x000002d4 0x000002d4: Version: 1 File: libc.so.6 Cnt: 1 0x000002e4: Name: GLIBC_2.0 Flags: none Version: 2 There was a hard-coded maximum length for all string found in any elf string table. This limitation was removed and some small check of the string table integrity were added.\n\u0026gt; rizin bins/elf/long-symbol.elf WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Add custom Have you setup your ~/.rizinrc today? [0x00001040]\u0026gt; is~AAA 28 0x00001139 0x00001139 GLOBAL FUNC 15 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA The main problem with how symbols and imports were loaded, was their mutual dependency during the loading phase. So both processes were split and heavily refactored. 
As a side effect, an old bug in the symbols loading was fixed.\nThe call to the function system is correctly identified:\n\u0026gt; rizin bins/elf/analysis/phdr-override WARNING: The segment 3 at 0x774 seems to be invalid. WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Change your fortune types with \u0026#39;e cfg.fortunes.file=fun,tips\u0026#39; in your ~/.rizinrc [0x004003f0]\u0026gt; s main [0x004004e6]\u0026gt; af [0x004004e6]\u0026gt; pdf ┌ int main (int argc, char **argv, char **envp); │ ; var int64_t var_10h @ rbp-0x10 │ ; var int64_t var_4h @ rbp-0x4 │ ; arg int argc @ rdi │ ; arg char **argv @ rsi │ 0x004004e6 push rbp │ 0x004004e7 mov rbp, rsp │ 0x004004ea sub rsp, 0x10 │ 0x004004ee mov dword [var_4h], edi ; argc │ 0x004004f1 mov qword [var_10h], rsi ; argv │ 0x004004f5 mov rax, qword [var_10h] │ 0x004004f9 add rax, 8 │ 0x004004fd mov rax, qword [rax] │ 0x00400500 mov rdi, rax │ 0x00400503 mov eax, 0 │ 0x00400508 call sym.imp.system ; int system(const char *string) │ 0x0040050d mov eax, 0 │ 0x00400512 leave └ 0x00400513 ret During the loading phase, sections and segment information checks have been added to verify the integrity of the data. Those checks are stricter than those of the elf loader, so three configuration variables were implemented to allow the user to customize how segments and sections are loaded.\n\u0026gt; rizin bins/elf/analysis/phdr-override WARNING: The segment 3 at 0x774 seems to be invalid. WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... 
WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Press \u0026#39;C\u0026#39; in visual mode to toggle colors [0x004003f0]\u0026gt; \u0026gt; rizin -e elf.checks.segments=false bins/elf/analysis/phdr-override WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Add colors to your screen with \u0026#39;e scr.color=X\u0026#39; where 1 is 16 colors, 2 is 256 colors and 3 is 16M colors [0x004003f0]\u0026gt; There is still a lot of work to do, especially on the elf plugin interface. If you want to follow updates on this, you can use this link: Refactoring the elf plugin interface\nIn conclusion, the GSoC was an incredible source of motivation to contribute to the Open Source community. It also helped me improve my knowledge of elf internals. I would like to thank my mentors Anton Kochkov and Florian Märkl for their help during the GSoC.\nPulak: Heap viewer for Cutter Hi, I am Pulak Malhotra. Over the past few months, I participated in GSoC with the Rizin organization. My main contributions revolve around the heap parsing code for Rizin and the GUI implementation of heap viewer for Cutter. The initial work started with improving the output format of the dmh family of commands. I made them much more readable, taking inspiration from gdb gef. I added a new command, dmhd, which prints concise information about different bins of a given arena. I also refactored and rewrote a significant part of the Glibc heap codebase, making it more modular and maintainable, including porting it to the new shell. I added new Rizin API calls and used them in Cutter to implement the GUI version of the heap viewer. 
Heap viewer in Cutter has many features, like getting a list of heap chunks in an arena, editing the heap chunks, getting information about bins in the arena, and visualizations for linked lists of the bins. I encourage everyone to give it a try in their next heap exploitation hack! After Glibc heap, I made some contributions towards the Windows heap and Windows heap widget. Some of the changes have been merged, like the Rizin API and the new shell port. I\u0026rsquo;ll try my best to ship the other modifications to production soon.\nGSoC was one of my first experiences working on a real-world project, and I learned and grew a lot. I want to give special thanks to my mentors Yossizap and Megabeets, and the Rizin community members XVilka, Ret2libc, Deroad, Gustavolcr, and Thestr4ng3r, who were always there to answer my questions and review and give feedback for my PRs.\nAswin: Support for CPU and platform profiles Hello everybody!\nI\u0026rsquo;m Aswin and this is a brief summary of the work I did in the summer of 2021 with Rizin on adding support for CPU and platform profiles. Rizin previously relied upon manually written code to add a new CPU or an IO port, which was tedious to maintain and not user friendly. Providing a level of abstraction in handling this entropy in embedded systems by adding support for editable CPU and platform profiles was the goal of this project.\nAfter getting accepted, the first thing I did was to remove the existing implementation of RzSyscallPorts - the module which took care of the architecture and CPU specific system registers. Here, I made two new modules: RzSysregsDB and RzSysregItem to make this happen. RzSysregsDB just housed a hashtable which paired the address of the port with an RzSysregItem which contained the comment, type and all the other information related to it.\nThen, I started working on CPU profiles. 
The whole idea of CPU profiles is to store all the CPU specifics in one file, parse it and use it at places like analysis, emulation and wherever it\u0026rsquo;s needed. Inside CPU profiles, we store information like the size of the ROM, the size of the RAM and other CPU specifics, which are parsed and stored into various data structures inside RzArchProfile, where RzArchTarget houses the name of the CPU and architecture and a pointer to RzArchProfile. Information about the CPU IO registers and Extended IO registers can also be added in CPU profiles. During the analysis loop, they are added as flags (labels) at their corresponding offsets. A feature to map the ROM as sections (iS) was also added with it.\nThis is how the IO and extended IO registers are defined in the SDB files:\nSPH=reg SPH.address=0x3e SPH.comment=Stack higher bits SP8 SP10 After that, I added support for platform profiles. Platform profiles were introduced to handle the platform specific differences. These files contain the name, offset and a short description of each port or register, which are parsed and added as flags and comments. Support for platforms like BCM2835, which one of the Raspberry Pi models runs on, BCM2711 and OMAP 3430 was added subsequently, along with the x86 IO ports.\nA new configuration variable asm.platform was also added to choose the platform profile. This will let the user choose the name of the profile they want to load and Rizin will load the profile based upon the CPU and the architecture that the user has previously set. For that, I added a new variable platforms to RzAsmPlugin which will hold the list of all supported platforms of that architecture.\nPlatform Profiles also follow a format similar to the CPU profiles that you saw earlier. 
Here\u0026rsquo;s an excerpt from the BCM2835 platform profile:\nAUX_MU_IER_REG=name AUX_MU_IER_REG.address=0x7e215044 AUX_MU_IER_REG.comment=Mini UART Interrupt Enable AUX_MU_IIR_REG=name AUX_MU_IIR_REG.address=0x7e215048 AUX_MU_IIR_REG.comment=Mini UART Interrupt Identify Then, I worked on porting uefi_r2, a tool used to analyze UEFI modules, to Rizin. This tool works by analyzing the firmware using Rizin\u0026rsquo;s RzAnalysis utilities and inspecting its functions, strings and other particulars, for example by searching for the UEFI GUIDs inside the analyzed strings. Here, the tool is a Python package and all the interaction with rizin is done through rz-pipe\u0026rsquo;s Python module. Overall, this was not particularly challenging but it was indeed very informative. UEFI is insanely complex!\nLater, I continued to work on improving the SVD parser plugin I had started making during the microtask. SVD files are files containing information about a device\u0026rsquo;s peripherals, MMIO registers and other particulars. They are usually made by the manufacturer. This plugin loads the data from an SVD file into Rizin, mainly each register\u0026rsquo;s name, size, base address and offset, and adds them as flags and comments.\nI would like to thank my mentors xvilka and deroad for their guidance. I was regularly in touch with them and they were constantly trying to make sure that everything was going smoothly.\nAlso kudos to all the folks at #Rizin-dev, #gsoc-2021 and the other channels where my questions were answered.\n","permalink":"https://rizin.re/posts/gsoc-2021-summary/","summary":"Google Summer of Code 2021 Summary GSoC 2021 is officially finished and we are happy to congratulate all 3 participants for passing the program and completing the most important parts of their tasks. It brought us some long-needed code cleanup and user-visible changes in the analysis and binary/heap parsing. 
See what students wrote themselves:\n08A: Refactoring ELF binaries loading This summer I have been doing the GSoC for Rizin. The subject of the GSoC was to refactor and improve how elf binaries are loaded by Rizin.","title":"Google Summer of Code 2021 Summary"},{"content":"Google Summer of Code 2021 is here and we are excited to participate! 🎉. This is the 2nd internship program we are running this year, along with RSoC 2021.\nWe received many applications, and we are happy that there is a substantial interest in the project. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us the platform for attracting new contributors. Many of the past participants stayed with the project after GSoC, and we sincerely hope it will continue in the future.\nThis summer, the accepted projects aim to improve the correctness and quality of the Rizin and Cutter output, along with advancing user experience for embedded reverse engineering and exploitation.\nAswin: Support for CPU and platform profiles Hey, I\u0026rsquo;m Aswin! I\u0026rsquo;m a sophomore year undergraduate from India and I\u0026rsquo;m thrilled to be working with Rizin this summer. For Google Summer of Code, I will be working on adding support for CPU and platform profiles.\nI have always had a passion for reverse engineering as well as on how computers work at a low level. I hope to learn more about reverse engineering and hardware platforms by participating in GSoC at Rizin. I chose this as my project as it will help Rizin be more compatible with obscure hardware platforms, architectures and chips.\nFor the microtask, I was working on writing an SVD parser for Rizin. It was basically a plugin which lets you open SVD files inside Rizin and make use of all the information about the peripherals and registers present inside the file. 
While working on this, I learned a lot about microcontrollers, Memory Mapped IO, registers, and how platforms work. Over the summer, I\u0026rsquo;m going to do the following tasks:\nGet the configuration variables related to the CPU and the platform to be dynamically populated by listing the available dedicated SDB files and add them to the analysis loop.\nProvide rz-lang and rz-pipe bindings so that the users can choose through scripts, as well.\nAs an additional goal, I\u0026rsquo;ll be working on improving the SVD parser and adding it to Rizin as a core plugin, so that it\u0026rsquo;ll be shipped with Rizin and Cutter, as well. At the end, I also hope to write a good how-to article on how to add and use a profile.\nI am engrossed by the wonderful feeling of community I have felt while contributing to Rizin. I have gained amazing insights and skills and feel very grateful to be working with and learning more from some of the smartest, kindest and most knowledgeable people I\u0026rsquo;ve ever known.\nHoping to accomplish great things and have a really great summer. I wish the very best to all the folks who got in!\nPulak: Heap viewer for Cutter Hi, I am Pulak Malhotra from India. I am an undergraduate student and researcher at IIIT Hyderabad. GSoC provides me with an excellent opportunity to work on real-world codebases, contribute to the open-source community, meet new people and learn new things. I am relatively new to reverse engineering. In the past, I enjoyed working on low-level systems in my university courses which drew me towards reverse engineering. Rizin\u0026rsquo;s welcoming and helpful community is a significant factor that makes me want to contribute to the project.\nMy GSoC project aims to deliver widgets that provide information about the heap state while debugging programs in Cutter. 
These widgets will give information regarding chunks in a heap and the bins in which free chunks are located. I also aim to deliver graphical visualization of the linked list of the arena bins. My project also includes refactoring the heap codebase in Rizin, so the heap allocator is dynamically selected based on the binary. Currently, the heap allocators are selected at compile time in Rizin. I will also try to add support for more heap allocators in Rizin.\nI contributed the Pull Request #810 as my microtask. I improved the output of existing heap-related commands like dmh and dmhf in this microtask, and I created new commands dmhv and dmhd. dmhv is similar to the dmh command but provides extra information about the heap chunks, and the dmhd command provides concise information about the bins of the main arena. At various points, while solving the issue, I felt lost, especially when I was not familiar with the codebase. To get over this, I would make small changes, rerun the code, and note the difference in output. At every step, many members of Rizin gave me detailed advice and feedback and guided me. I also came across some suggestions which I pursued further in PR #912 and issue #1088. Overall, it was a fantastic experience contributing to Rizin.\n08A: Refactoring ELF binaries loading Hi, I\u0026rsquo;m 08A from France. I\u0026rsquo;m an undergraduate student at EPITA, majoring in Systems, Networks and Security (SRS).\nSome friends of mine are Free Software gurus, and they motivated me to contribute to Open Source software. So I chose to contribute to a tool I already used.\nI started working on the code base with Radare2, and after the fork I switched to Rizin. The majority of my time was allocated to fixing various issues found by clang-analysis and refactoring how Rizin parses ELF information. The overall experience was like going down a rabbit hole: the code base is huge and some parts are rusty. 
But the community is awesome and I learned a lot of things.\nFor this summer of code, I will be working on refactoring the ELF loading feature. The main challenge will be to fix the imported function detection. If you want additional information, you can check this link.\n","permalink":"https://rizin.re/posts/gsoc-2021-announcement/","summary":"An announcement of the Google Summer of Code 2021. Three accepted candidates.","title":"Google Summer of Code 2021 Announcement"},{"content":"We are excited to announce RSoC 2021! Rizin Summer of Code is a summer internship program we organize together with KeenLab of Tencent. We provide an opportunity for students to work full-time on Rizin and RzGhidra. We use the experience we gathered by participating in Google Summer of Code as an organization and organizing our own RSoC as part of the radare2 project.\nThe application period continued through all of April and at its end we chose two students. We wish them the best of luck and are happy to give them this stage to introduce themselves.\nHeersin: Intermediate language improvements Hello, I\u0026rsquo;m Heersin from China, an undergraduate student majoring in information security. At the very first, I was looking for a handy RE tool on the Linux platform, and then I met radare2. I have been using it to solve some basic CTF tasks and to analyze dumps from some malware. I found there are some imperfect features (the concept of project, ESIL, search\u0026hellip;) and wanted to contribute to it. After learning that there was a new fork named Rizin, which aims at refactoring radare2, I started to get involved.\nDuring my spare time, I\u0026rsquo;ve done some work for rizin, including:\nUpdate some out-of-date pages in the rizin documentation and add more examples. 
Fix some bugs in the pyc plugin Add support for luac format Extend the testsuite to cover more platforms For RSoC this year, I will be working on ESIL, following issue #277 to refactor it, and will add support for floating point and bitvectors. I will also try to fix some issues in ESIL (e.g. prevent rizin from getting stuck when hitting some C-library functions).\nIt will be an exciting and challenging summer, looking forward to it!\nBasstorm: Types analysis Hi, I\u0026rsquo;m basstorm from China, and I am a 21-year-old student. Over the last couple of weeks, I have done some bug fixes and improved the class analysis module:\nFixed display of duplicate vtables in acll command when using aaa command to analyze more than 2 times. Improved the output of the acll command to be more concise and clear. Implemented the integration of data from the RzBin and RzAnalysis modules, which makes the results of class analysis more accurate. Implemented constructor and destructor detection based on the function name. This summer, I am going to do the following tasks:\nImprove the support of PDB Structure. Implement new features in RzTypes. Continue to implement new features or bug fixes around class analysis. ","permalink":"https://rizin.re/posts/rsoc-2021-announcement/","summary":"An announcement of the Rizin Summer of Code 2021. Two accepted candidates.","title":"Rizin Summer of Code 2021 Announcement"},{"content":"As developers, we think it is essential to have a build system that eases our work, allows us to compile Rizin quickly on a wide range of devices, is easy to understand and to modify, and provides a nice set of features one would usually expect from a full-fledged build system. Since its inception, Rizin has focused on improving its Meson build files and making its support first-class while deprecating the original build system used in radare2. 
In the following article, we will explain the reasons behind this choice and the key benefits of Meson.\nTL;DR Meson is declarative and easy to understand Ninja is fast, no files are recompiled if not necessary Meson keeps your source directory clean with out-of-source builds Meson makes it easy to build and run multiple versions of Rizin Meson simplifies dependency handling and switching from internal dependencies to system-provided ones A bit of context Historically radare2 has been compiled with the usual ./configure; make approach. This essentially consists of a shell script, configure, and a set of Makefiles. configure allows the user to customize the compilation and installation process performed by make by setting, for example, the destination directories where executables, libraries, etc. are installed on the system. It is also used to enable or disable specific features (e.g. the debugger) or to check for the existence of specific libraries, header files, functions, compiler or linker arguments.\nTo some, this may be very similar to what is done by Autotools. However, in the radare2/Rizin case, configure is generated by another shell script, acr, by parsing a configure.acr file. acr is a tool developed by the original author of radare2, and it is an Autoconf replacement.\nOver the years some attempts were made to introduce other build systems, like Jam and a NodeJS-based build system. It was only in 2017 that radare2 started introducing Meson. Since then, many people have improved this system to compile on several platforms and to make sure it is (almost) feature-wise on par with the ACR/Make build system.\nRizin has chosen to deprecate the use of ACR/Make and switch to Meson as the main build system. We believe this will make the overall build process more standard, easy to understand, and easy to integrate with other tools/libraries. 
Other very valid alternatives such as CMake were considered, however we preferred to keep working with Meson, which was already tested and tried with Rizin for a long time, rather than starting completely from scratch with another build system.\nProblems with ACR/Make There are of course several reasons for this choice, so let\u0026rsquo;s first see what we have identified as the problems of the historical approach:\nACR is essentially a one-person project, with mostly only radare2 and other radare-related tools using it. This by itself is not a bad thing, but it comes with the downside that you find no help or documentation online and if you have issues or missing features, you have to rely on the one person who understands its internals. Moreover, the features you find are usually just the ones used by the radare2 project (e.g. not long ago, it was not possible to easily check if the compiler supported a particular compilation flag, because it was never necessary for radare2). The configure script needs a sh shell, which makes it hard to use on platforms such as Windows. There are of course ways to use it, but they may involve installing MinGW or similar, which may not be ideal for Windows users who usually work within Visual Studio. Makefiles can be written in a very flexible way and they can be used to perform any sort of action, from simply compiling a C file to running scp, various scripts, and much more. Flexibility shall not be abused though. Otherwise, it may become hard to understand how things are actually done. For example, understanding how librz_io.so is compiled involves looking at the Makefile in libr/io, which includes config.mk that sets up some variables based on other variables defined in the Makefile and then it includes rules.mk, which uses those variables to actually compile the library. Inside rules.mk you find, hidden with various environment variables, the commands used to build the object files, and then the library. 
You can look at the compilation command here, which we think is hard to grasp from a quick look even for people familiar with radare2/Rizin codebase (you may wonder where to find config.mk mentioned above: it is auto-generated). It is \u0026ldquo;low-level\u0026rdquo;, which means that the Makefiles define the specific commands, flags, and options that you have to use to actually compile a binary, a library, or an object file. This provides a lot of power, but it may also be overwhelming having to remember to add specific compilation/linking flags for compiling a single file. For example, it is not possible yet to compile radare2/Rizin within a directory with spaces in the name due to limitations within GNU Make. ACR/Make cannot be used as-is to compile Rizin on Windows systems. What we like about the Meson Build System It is declarative, which means you don\u0026rsquo;t have to remember or care about how to actually compile a shared library or a static library on Linux, Windows, BSD, etc. or how to link an executable with some other libraries or make sure include paths are right. As an example, look at this piece of meson.build:\nlibrary(\u0026#39;io\u0026#39;, [\u0026#39;file1.cpp\u0026#39;, \u0026#39;file2.cpp\u0026#39;], dependencies: [util_dep], install: true, soversion: rz_asm_lib.version() ) You don\u0026rsquo;t need to know how meson is going to build your library, but it is going to do it by compiling two source files (e.g. file1.cpp and file2.cpp), name the library io (e.g. on Linux the library would be called libio.so, but the full name and the extensions might be different on Windows) and give it the proper API version, make sure the dependency specified by util_dep, whatever it is, is used to compile this library, by adding the proper include paths and link directives.\nIt is fast. 
This is extremely important for developers, as while developing a feature or fixing a bug they may need to compile Rizin multiple times and we want this process to be as fast as possible. Meson/Ninja performs quite well compared to other build systems (https://mesonbuild.com/Simple-comparison.html). It forces you to list all source files used to compile a target and it is able to automatically compute other dependencies between targets. In ACR/Make, due to its complexity as implemented in radare2/Rizin and to the low-level approach, it is easy to mess with the dependencies between targets and to recompile the same files multiple times even when there are no changes. For example, until very recently, running make multiple times caused the recompilation of several objects even if no file was changed (in the last few months this problem was caused by wrong dependencies of sdb, in the past due to wrong dependencies of the capstone target).\nmeson can run everywhere python3 can. This includes a very wide range of platforms nowadays. It automatically provides a very powerful scripting language, python, that you are guaranteed to find on the build machine. Moreover, it can be used with various backends, like Ninja, Visual Studio and Xcode, which means it can be used to generate a Visual Studio solution that you can import there.\nIt forces you to build out-of-source, meaning that no changes (mostly) will be done to your source directory, which must contain only the source files of your project and not be mixed with other auto-generated files like executables or object files. This also allows you to have the project compiled with different options or with slightly different code, cleanly separated in different directories.\nDue to its declarative nature, it does not matter whether a dependency is in one path or another or if it comes from the system or it was bundled with the source code. 
You just define the capstone_dep variable properly in one of your meson.build files and you reference it wherever it is needed, leaving all the details to meson itself. This encourages splitting the repository into sub-projects when it makes sense, in contrast with the ACR/Make system where even a small change to e.g. SDB path would require rewriting several Makefiles. If in the future some systems ship their own version of SDB, we would just need to change a few lines in the definition of sdb_dep to actually take the system library instead of the bundled one and no other place would need to be changed to make sure everything is compiled/linked with the right headers/libraries.\nIn case of problems with meson there is a healthy community out there ready to help you, extensive documentation and active developers that improve the system with new releases. New developers who want to work on our build system can easily find other examples online and have available documentation to get them up to speed.\nMany complex low-level pure C projects recently switched to Meson: Mesa, Wayland, PipeWire, QEMU, and many others. We are not alone in this!\nExamples of using meson Development process As a developer when you download Rizin, you can install it for your user in ~/.local, so you don\u0026rsquo;t need root access to install files. You can do this with meson --prefix=~/.local build; ninja -C build install. After that, you can change the source code however you need and then run ninja again with ninja -C build. Only the changed files are re-built.\nMoreover, running ninja by default builds files with explicit RPATHs, which means that the executables and libraries contain direct references to the paths of dependent libraries they are linked against so the loader can then always find them without having to specify LD_LIBRARY_PATH or similar. 
For this reason, most of the time you will not need to re-install the Rizin files, but while developing you can just run rizin from ./build/binrz/rizin/rizin.\nRPATHs are not, of course, always good. Indeed they are usually removed during the installation process. However, when you install Rizin in a place that is not /usr, we have chosen to keep RPATHs to make the installation process as simple as possible, without requiring users to mess with their environment to make sure the binaries can find the proper libraries. Packagers, who usually use /usr as a prefix, should not be affected by this decision, but they can anyway disable it by specifying -Dlocal=false when running meson.\nReviewing a PR and testing changes When testing a PR with a fix or comparing multiple changes, you need to have access to multiple versions of Rizin. Doing this with ACR/Make is of course possible, but it usually involves installing everything in separate directories and making sure your environment variables (e.g. PATH, LD_LIBRARY_PATH, etc.) are correctly set. With meson, you can build one version (e.g. from dev branch) with meson --prefix=~/.local build-dev; ninja -C build-dev, then switch branches with git checkout my-other-branch and build Rizin again with meson --prefix=~/.local build-pr; ninja -C build-pr. Due to the RPATH used by default, as mentioned above, each build directory can be used without installation to actually run the Rizin tools. At that point, you can quickly compare the results of ./build-dev/binrz/rizin/rizin and ./build-pr/binrz/rizin/rizin.\nConclusion Of course it\u0026rsquo;s not all perfect with meson either. Right now the meson build system is missing some features that were only available with ACR/Make.\nTo uninstall Rizin you have to run ninja -C build uninstall from the same build directory you used to run the install step, otherwise, it will not uninstall files. However, if during the install step we add any custom installation script (e.g. 
to sign your rizin binary in macOS), there is no counterpart to actually have an uninstall script. That said, nothing prevents us from having a custom target similar to what ACR/Makefile system does to manually remove, with a script, the installed files, but we believe proper file tracking should be done by distributions and packages.\nMeson is quite new and, although rare, you may find issues from time to time. That said, its community is healthy and active so you can count on them to fix these problems as soon as possible or provide help, also thanks to the many big projects that have switched to meson in recent years.\nAll in all, we hope to make it easier for our developers and users to build Rizin. We are trying to build a good Reverse Engineering Framework and we want to focus our efforts on this rather than dealing with the limitations of a niche build system.\nIf you find issues or find particular installation setups difficult or missing, feel free to open a bug on GitHub and we will be happy to either guide you through a solution or develop the fix according to our roadmap.\n","permalink":"https://rizin.re/posts/why-meson/","summary":"Why we switched to Meson/Ninja as our main build system.","title":"Why we chose Meson as our build system"},{"content":"When manually analyzing a complex binary, possibly over the course of days, weeks or even months, it is crucial to be able to keep track of the gained knowledge through annotations such as comments, function and variable names. As such, the tool one is working with also needs to provide a reliable and future-proof way to save and restore this information. One of the biggest additions in Rizin surely is the new projects feature, which provides exactly this functionality in both rizin on the command line and Cutter. 
In this article, we would like to give an overview of how it was designed, what exactly it promises to you, as well as the current limitations you should be aware of when using it right now.\ntl;dr Projects can be used in rizin using the Ps [\u0026lt;project.rzdb\u0026gt;] and Po \u0026lt;project.rzdb\u0026gt; commands and in Cutter through its regular user interface. Projects are currently in beta, including in any 0.x.y releases of Rizin, and will be considered stable starting with release 1.0.0. Beta means that all functionality is implemented and ready to use, but there is no guarantee that the format itself will not further change slightly and thus maybe break loading a project saved right now in a future version of Rizin. Stable means that the format is finalized and all changes inside of it will come with migrations and tests ensuring that all projects saved before are still loaded correctly. Projects may be conceptually split into two parts: the binary that is being analyzed, and any info that has been put on top by automatic analysis or the user. Saving and loading of all analysis data on top of a binary, including flags, functions, variables, types, comments is implemented. Automatic reloading of the underlying binary is currently limited to only a single binary available as a regular file, but this will be extended to arbitrarily complex IO mappings in the future. However, even with the current state, it is possible to manually reconstruct more complex mappings and then load any analysis data on top using the Poo \u0026lt;project.rzdb\u0026gt; command. Wait, weren\u0026rsquo;t there already projects in Radare2 before? Indeed, there has been a projects feature in Radare2 since 2017. 
This has been removed entirely from Rizin and is now replaced by the new implementation, which has been re-designed from scratch and shares no code with the old one.\nTo understand why such a radical change was necessary, let us take a closer look at how old projects were designed. They primarily consisted of a single rc file, which was a radare2 script containing regular commands that would reconstruct the session state when run. As an example, a part of such a script to load one function could look like this:\n\u0026#34;f main 127 0x080485f5\u0026#34; \u0026#34;af+ 0x080485f5 main s n\u0026#34; afb+ 0x080485f5 0x080485f5 54 0x08048655 0x0804862b afb+ 0x080485f5 0x0804862b 24 0x08048655 0x08048643 afb+ 0x080485f5 0x08048643 18 0x08048665 0xffffffffffffffff afb+ 0x080485f5 0x08048655 16 0x08048665 0xffffffffffffffff afb+ 0x080485f5 0x08048665 15 0xffffffffffffffff 0xffffffffffffffff We can see it is first creating a flag (f), then creating a function (af+) and finally adding basic blocks to it (afb+).\nWhile this general approach can work in theory, it comes with several implications:\nCommands can have side effects. As an example, until only very recently, the afb+ command would trigger a heavy function analysis loop after adding a basic block in some circumstances, creating variables, X-Refs and other information. The information coming out of this side effect would then mix with the rest of the restored session, resulting for example in unwanted variables being present after loading. Commands and their semantics can change over time. Simple changes include command name changes or the order of arguments, more complex ones may involve major restructuring of underlying concepts, thus requiring entirely different command sequences to achieve the same results. Of course, since the saving instance can not predict the future, it would be solely the responsibility of the loading instance to account for such changes. 
However with the project being an unstructured sequence of commands that may not even be part of the codebase anymore at this point, performing such a migration is far from trivial and highly error-prone. Moreover, before rizin\u0026rsquo;s new command parser was created, there was no formal specification of the command syntax. You can see in the above example that the first af+ command is enclosed in \u0026quot;...\u0026quot;, which is to account for cases such as the function name being ma;in where otherwise the ; would be interpreted as a separator for a new command, similar as in an SQL injection, eventually resulting in broken project loading. However, this quoting scheme still fails for names such as ma\u0026quot;in. As mentioned, this could have been eventually fixed using the new command parser, which has a well-defined escaping syntax, but it still has been the source of many bugs in the past.\nOn top of all these fundamental issues comes the fact that these projects were never tested apart from very few integration tests covering only a tiny fraction of the information potentially included in a session. All these aspects combined led to a high density of bugs and uncertainty when working with this feature. If you were very lucky, the project would save and load as expected. If you were less lucky, the loading would simply result in an error. But, and this has been the most likely case, if you were unlucky, the project would load seemingly correctly, but you would notice only later that the loaded data was deeply corrupted.\nDespite some of these issues being theoretically possible to fix, the conceptual problems of using commands for projects remain. Because the ability to save a session is only even remotely useful when it can also be relied upon to always correctly restore it in the future, a different approach had to be taken here, hence requiring an entire rewrite of the feature. 
This new approach, detailed in the following section, takes concrete learnings from the mistakes of the previous approach and thus avoids all problems mentioned above right from the start.\nDesign Projects take a classic, fully declarative approach to store their information, saving and loading a direct dump of the internal state.\nSerialization All relevant modules and data structures now have serialization and deserialization functions added, commonly prefixed with rz_serialize_ and implemented in files called serialize_*.c, as for example serialize_flag.c in the case of flags.\nFor the target data structure, SDB is being used, which is a database that is also used in other parts of rizin. What makes SDB special is its simplicity: One SDB is simply a mapping from arbitrary string keys to string values, and multiple SDBs can be nested in a tree of namespaces. This restricted design makes SDB unsuitable for many applications, but for our projects it turned out to fit very well. Inside such an SDB, when more complex structures are needed, JSON is used. 
This combination of well-defined formats means we can rely on them and forget about escaping or sanitizing strings in our actual serialization code.\nFor example, the same function as in the previous example would now be serialized like this:\n/core/analysis/functions 0x80485f5={\u0026#34;name\u0026#34;:\u0026#34;main\u0026#34;,\u0026#34;bits\u0026#34;:32,\u0026#34;type\u0026#34;:4,\u0026#34;cc\u0026#34;:\u0026#34;cdecl\u0026#34;,\u0026#34;stack\u0026#34;:16,\u0026#34;maxstack\u0026#34;:32,\u0026#34;ninstr\u0026#34;:43,\u0026#34;bp_frame\u0026#34;:true,\u0026#34;bp_off\u0026#34;:8,\u0026#34;diff\u0026#34;:{},\u0026#34;bbs\u0026#34;:[134514165,134514219,134514243,134514261,134514277],\u0026#34;vars\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;argv\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;char **\u0026#34;,\u0026#34;kind\u0026#34;:\u0026#34;s\u0026#34;,\u0026#34;delta\u0026#34;:4,\u0026#34;arg\u0026#34;:true,\u0026#34;accs\u0026#34;:[{\u0026#34;off\u0026#34;:0,\u0026#34;type\u0026#34;:\u0026#34;r\u0026#34;,\u0026#34;sp\u0026#34;:4,\u0026#34;reg\u0026#34;:\u0026#34;esp\u0026#34;}]},{\u0026#34;name\u0026#34;:\u0026#34;var_8h\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;int32_t\u0026#34;,\u0026#34;kind\u0026#34;:\u0026#34;b\u0026#34;,\u0026#34;delta\u0026#34;:-16,\u0026#34;accs\u0026#34;:[{\u0026#34;off\u0026#34;:117,\u0026#34;type\u0026#34;:\u0026#34;r\u0026#34;,\u0026#34;sp\u0026#34;:18446744073709551608,\u0026#34;reg\u0026#34;:\u0026#34;ebp\u0026#34;}]}]} /core/analysis/blocks 0x80485f5={\u0026#34;size\u0026#34;:54,\u0026#34;jump\u0026#34;:134514261,\u0026#34;fail\u0026#34;:134514219,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:18,\u0026#34;op_pos\u0026#34;:[4,7,10,11,13,14,15,17,20,25,30,33,36,41,46,49,52],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:0,\u0026#34;cmpval\u0026#34;:1} 
0x804862b={\u0026#34;size\u0026#34;:24,\u0026#34;jump\u0026#34;:134514261,\u0026#34;fail\u0026#34;:134514243,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:9,\u0026#34;op_pos\u0026#34;:[3,6,8,11,12,17,20,22],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:16} 0x8048643={\u0026#34;size\u0026#34;:18,\u0026#34;jump\u0026#34;:134514277,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:5,\u0026#34;op_pos\u0026#34;:[3,8,13,16],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:16} 0x8048655={\u0026#34;size\u0026#34;:16,\u0026#34;jump\u0026#34;:134514277,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:4,\u0026#34;op_pos\u0026#34;:[3,8,13],\u0026#34;parent_stackptr\u0026#34;:16} 0x8048665={\u0026#34;size\u0026#34;:15,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:7,\u0026#34;op_pos\u0026#34;:[5,8,9,10,11,14],\u0026#34;parent_stackptr\u0026#34;:0} While this certainly is harder to read for humans, it follows a clearly defined structure and all relevant information can be extracted from it directly. This kind of serialization design now also allows unit tests to be written easily and in fact all currently implemented serializations already come with such tests, aiming to ensure that all internal state is correctly saved and loaded, down to even subtle details and corner cases.\nWhat you see above is already an example of how the serialization will eventually be saved to a file. It is a simple, text-based format that stores the SDB entries line by line and takes care of any necessary escaping. While such a text-based format may not be the most efficient representation, it turned out to be more than good enough for even larger projects and in addition has certain nice properties, which we will make use of further down. 
However, due to the simplicity of SDB, other file formats to store the same data are theoretically feasible too.\nVersioning An important aspect is that the possibility to correctly load a project will survive even significant updates of the software. To ensure this, a simple version-based migration approach is used: The project code contains a version number defined as RZ_DB_PROJECT_VERSION, which is simply an integer that is increased every time there is a change in the format. This number is then simply saved into every project\u0026rsquo;s metadata namespace.\nLater, when loading the same project in newer rizin that also has a higher internal project version number, it will be able to know exactly the kind of format that the old project was saved with and will be able to upgrade it by successively applying migrations, which will be implemented along every increase of the project version number.\nAt the current point in time, the version number is 1 and there are no migrations. This is because at the moment, the projects feature is considered to be in a Beta phase, allowing it to be tested thoroughly and still receive changes to the format that might turn out sensible without the additional engineering overhead of implementing migrations.\nThis means that right now, everybody is highly encouraged to test projects and report any issues that might come up, but be aware of the fact that compatibility with later rizin versions may not be guaranteed and might require small manual edits in the serialized file.\nThe Beta phase will continue throughout all 0.x.y versions of rizin and end by version 1.0.0 where projects will be considered stable, meaning that all projects should always be properly loaded in all future versions and if a case is discovered where this promise is not held, it will be considered a bug and shall be fixed.\nRe-loading of underlying binaries One of the trickiest aspects of serializing a rizin session is handling the actual underlying binary that 
is being analyzed. In fact, speaking of \u0026ldquo;the binary\u0026rdquo; in this context is a crude underapproximation of what is actually present in Rizin.\nIgnoring debug, three modules are working together to load files: RzIO provides a generic IO layer, which can map data coming from plugins in a 64-bit address space. RzBin takes raw files from RzIO, parses their binary file formats such as ELF or PE, also using an independent plugin for each, and eventually provides information on how to lay out the contained sections in RzIO again, along with a list of symbols and other information parsed from the binary. RzCore controls how these modules are created and work together.\nThis design makes rizin\u0026rsquo;s loading mechanism very powerful and flexible, but imposes certain challenges on serialization: How to handle all the different IO plugins? Next to the one that simply loads a regular file, there are plugins for files in zip, malloc, http, shared memory, \u0026hellip; that all need individual reconstruction logic. For regular files, how to relocate the actual file when the project is moved to another machine? From RzBin, should the symbols information also be serialized or re-parsed?\nBecause this part needs to be designed properly first and might even require some refactoring in the respective modules, its implementation has been postponed for now. But the preliminary, rough plan is the following: Every IO plugin itself provides callbacks for (de)serialization of maps created with it. All IO maps are serialized to the file using these callbacks. Information in RzBin will not be serialized but re-parsed on top of the deserialized IO maps.\nHowever, despite this full implementation being postponed, a very simple temporary solution has been implemented, which is strictly limited to the case where only a single binary is loaded from a regular file with the default loading settings, i.e. without explicitly specifying the base address for example. 
This makes it possible to use projects conveniently right now for the majority of use-cases. More complex cases are also already possible, as long as the loading process is done manually and the project is then loaded on top using the Poo \u0026lt;file.rzdb\u0026gt; command, as shown in the following section.\nUsage Saving and loading projects from rizin is as simple as it can be:\n[0x00000000]\u0026gt; P? Usage: P\u0026lt;so?\u0026gt; # Project management | Ps [\u0026lt;project.rzdb\u0026gt;] # Save a project | Po \u0026lt;project.rzdb\u0026gt; # Open a project | Poo \u0026lt;project.rzdb\u0026gt; # Open a project on top of currently loaded binaries Use Ps [\u0026lt;project.rzdb\u0026gt;] from a running session to save it and Po \u0026lt;project.rzdb\u0026gt; to discard the current session and load the saved one. Alternatively, a project can also be loaded directly when starting rizin like rz -p project.rzdb.\nPo and -p will also take care of loading the single, underlying binary as explained in the previous section. If this is not desired, you can use the Poo \u0026lt;project.rzdb\u0026gt; command to keep all current state of IO mappings and parsed binaries in place and only load the analysis information on top.\nIn Cutter, simply use the File -\u0026gt; Save Project... menu entry or Ctrl+s shortcut to save and the Projects tab in the initial dialog to open a project: Cutter will also ask you to save the project before quitting so no work will get lost by accident.\nFor the case explained before, where the project depends on more complex mappings than a single binary file, or if the same project should be loaded on top of another binary, the Poo \u0026lt;project.rzdb\u0026gt; can be used. 
For example, this is how a project can be loaded on top of two files:\n$ rizin -- # start rizin without any file [0x00000801]\u0026gt; on crackme.bin 0x7ff # load first file at 0x7ff [0x00000801]\u0026gt; on kernal.bin 0xe000 # load second file at 0xe000 [0x00000801]\u0026gt; Poo crackme.bin.rzdb # load project on top [0x00000815]\u0026gt; pd 1 # disassemble inside the first file 0x00000815 jsr CHROUT_in_kernal ; this is a call from crackme.bin into kernal [0x00000815]\u0026gt; pd 1 @ CHROUT_in_kernal # disassemble inside the second file ;-- CHROUT_in_kernal: 0x0000ffd2 jmp (0x0326) Version Control and Collaboration If you have used Ghidra before, you might have come across its \u0026ldquo;shared project\u0026rdquo; and Ghidra server, which are its strong, built-in features for collaborative reverse engineering with version control. Rizin takes a different approach to provide this functionality that is more in line with its UNIX-like focus. It does not implement version control itself, but instead creates project files in a way that they can work well with existing version control systems like git, which are well-tested and likely to already be familiar for users.\nBeing text files where independent content is generally split by lines, git already knows how to deal with tracking differences and merging for these files most of the time. 
This is, for example, a diff of a project where the current seek was changed and a comment was added:\ndiff --git a/megabeets_0x1.rzdb b/megabeets_0x1.rzdb index 9c828f4..aed7e64 100644 --- a/megabeets_0x1.rzdb +++ b/megabeets_0x1.rzdb @@ -4,7 +4,7 @@ version=1 /core blocksize=0x100 -offset=0x8048370 +offset=0x8048600 /core/analysis @@ -158,6 +158,7 @@ watcom=cc 0x804859a=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;char *dest\u0026#34;}] 0x80485db=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s2\u0026#34;}] 0x80485e2=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s1\u0026#34;}] +0x8048600=[{\u0026#34;type\u0026#34;:\u0026#34;C\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;I am putting a comment here!\u0026#34;}] 0x8048609=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] 0x8048619=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] 0x8048646=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] Examining these JSON-based diffs is surely not the most convenient way to view differences for every user, but it provides a working compromise between readability for both humans and software at the same time without requiring any programs except git up to this point. In addition, we are investigating implementing custom diff- and mergetools that could be integrated into git and are fully aware of the meaning of data in project files to present and merge differences in the best way possible while still relying on an existing version control system.\nRegarding the binary that is being analyzed in a project, if desired, it can also be put into the same git repository as the project. 
Since projects contain a reference to the binary file relative to the project file, it can still be re-loaded when moved to another machine.\nConclusion We hope you will enjoy using rizin with its new projects feature. If you are interested, we highly encourage you to try it out, put it through its paces, and report any potentially upcoming issues, so we will be able to iron them out until the end of the beta phase!\n","permalink":"https://rizin.re/posts/introducing-projects/","summary":"An overview of the new projects feature in Rizin to save and load reversing sessions. Its design, promises and future.","title":"Introducing Projects in Rizin"},{"content":"We are excited to announce Rizin — a free and open-source Reverse Engineering framework, providing a complete binary analysis experience with features like Disassembler, Hexadecimal editor, Emulation, Binary inspection, Debugger, and more.\nRizin is a fork of radare2 with a focus on usability, stability, and working features, which strives to provide a welcoming environment for developers and users alike. Rizin was founded by a group of the core developers of radare2 and Cutter who contributed to the project in one way or the other in the past years and together constructed the Core group of radare2. With the establishment of Rizin, we are committed to creating an environment and a project which will be aligned with our values and vision.\nDuring recent years, the environment that was created in radare2 was one where many of us felt stressed, disrespected, and unwelcome. Moreover, the number of users of radare2 grew every year, and we held the ultimate responsibility to provide them a stable, usable framework. As the core developer team, we have come to the conclusion that it is impossible for us to continue to pursue the goal of making radare2 better under the current circumstances and environment, and we decided to move forward on our own and fork the project. 
Cutter, the Graphical User Interface for radare2, and its entire team will also join Rizin and will use it as its backend.\nRizin is a newborn project that was created from radare2, hence more and more changes and differences will appear over time. A lot of effort was put into improving our workflows, putting more tests in place, improving the API, removing redundant features, and more. We hope to provide better consistency between releases, making the framework more trustworthy to users.\nWe are also working to create a more inclusive and diverse community that will be inviting for new contributors and users. As an initial step, we adopted a Code of Conduct that we believe is aligned with our values and with the community we want to create around Rizin.\nFinally, we know and understand that now it is our turn to prove that Rizin can become a tool you can trust and enjoy using, and a community in which you feel welcome. We invite you to read our answers to your Frequently Asked Questions and join our communities on Mattermost and other chat platforms.\n","permalink":"https://rizin.re/posts/announcing-rizin/","summary":"We are excited to announce Rizin — a free and open-source Reverse Engineering framework, providing a complete binary analysis experience.","title":"Announcing Rizin! 🎉"},{"content":"Who are you? We are a group of developers and security enthusiasts who contributed to radare2 in one way or the other in the past years. Some of us got involved with radare2 up to 8 years ago. We were, together with pancake — the original author — the maintainers of the radare2 project. We developed, handled issues, pull requests, review, CI and more. Some of us are the team who lead and maintain the Cutter project, a popular Graphic User Interface for the radare2 project. Among others, we started the development and integration of popular decompilation plugins for radare2 such as r2ghidra and r2dec.\nWhy did you fork radare2? 
Over the years, the direction in which radare2 was led was not aligned with what we believed to be best for the project and the community. These disagreements covered many of the aspects involved in creating an open source project: technical, interpersonal, and managerial.\nWith time, the environment that was created was one where many of us felt stressed, disrespected, and unwelcome. An environment that for years affected users, contributors, and core members.\nRadare2 as a project evolved and could no longer be treated as a toy tool. With the number of users growing every year, we bear the ultimate responsibility to provide them with a stable, usable framework. As the core developer team, we have come to the conclusion that it is impossible for us to continue to pursue the goal of making radare2 better under the current circumstances and environment.\nIt is natural for Open Source projects to part ways and follow different journeys with different visions. We all want to participate and contribute to projects we are passionate about, which we believe in, feel safe and welcome, and enjoy working on. For the aforementioned reasons and others, we believe that it is better for us to move forward on our own and fork the project.\nWhat are the differences between Rizin and radare2? Rizin is a newborn project that was created from radare2, hence more and more changes and differences will appear over time. With the establishment of Rizin, we are committed to creating an environment and a project which will be aligned with our values and vision for an open source project and community.\nWe see it as our ultimate responsibility to provide the users with a stable and usable program that they can rely on. We will put effort into releasing stable versions of Rizin and improving our test suite.\nIt is also our obligation to create an environment where developers, contributors and users feel welcome and safe. 
For this, we put in place multiple instruments that will allow us to enforce such behavior. We adopted the Contributor Covenant Code of Conduct as we believe it is aligned with our values and with the community we want to create around Rizin. We will follow the code of conduct and enforce it on our different platforms. We have started cleaning the source code of phrases that can\u0026rsquo;t be part of the environment we want to create. In addition, we will put effort into creating a more inclusive and diverse community and welcome new contributors.\nTechnically speaking, Rizin already contains many changes that do not exist in radare2. Some of them are noted below:\nNew Projects: we replaced the existing project functionality with a new one, developed entirely from scratch, that is based on serialization of existing objects instead of replication of commands. A blog post about this new feature will soon be published, so stay tuned if you want to know more! Removal of less tested/stable features: As we strive to provide a stable tool that you can trust, we chose to remove some features that we believe are not widely used, are old or are not tested at all and thus do not provide any value in their current state. This includes features such as the embedded WebUI, m commands, old projects, the pdc command, T commands, and others. Switch to Git submodules instead of copy-pasted code: this will allow us to better track the external code used in Rizin. Deprecation of ACR/Makefile build system in favor of Meson: experience has shown that a more declarative approach as used by Meson is easier to maintain and understand. Although at the moment, the ACR/Makefile build system contains some features that Meson in Rizin is missing, it is also slow (in terms of compilation time), complicated to edit and does not support out-of-source builds. If more additions are needed, we will be able to implement them in Meson. 
New shell behavior and overall commands handling: We recently developed in radare2 a new way to parse user commands, register them and develop them. This feature is called cfg.newshell and it will both make the user experience more consistent and the developer experience smoother. For this reason we have improved and enabled this by default in Rizin. We will publish a separate blog post about this soon! What will happen to radare2 now? We don\u0026rsquo;t know. radare2 is a popular project with many contributors and users. The maintainer of radare2 will decide how things will proceed. Such a big move will naturally cause changes and we wish to work together to resolve them while causing the least amount of discomfort to the members of the radare2 community and the users.\nWe wish the radare2 project the best of luck.\nWhat about Cutter? The Core team of Cutter, who was also a part of radare2 Core team, left radare2 and co-founded Rizin. Following this, Cutter is switching from radare2 to Rizin as its backend. For the users of Cutter, nothing major should change. Development on Cutter will continue as usual. Changes in the organization and policies (e.g., Code of Conduct) will also apply to Cutter. Radare2 may or may not fork Cutter back to support radare2 instead and that is up to the radare2 maintainers.\nWill you contribute to radare2? As we are forking radare2, we will stop contributing to the original project, though we expect patches to be imported from one project to the other for some time. In some cases, like a discovery of security vulnerabilities in mutual code, we would love to notify the radare2 team so users of the project will be protected.\nCan I take part and contribute to Rizin? Absolutely! We are thrilled to help you start and join Rizin. Please read our initial documentation for new contributors. Please join our Mattermost chat or #rizindev IRC channel on Libera.Chat! 
We hope to create better on-boarding guides for new contributors in the coming months, but in the meantime, we are here for any questions you have.\nWhat actions will you make to keep Rizin a safe environment for contributors and users? The Rizin organization believes that contributors, developers and users should enjoy their time around the community and feel safe and welcome. We adopted a Code of Conduct that we believe is aligned with our values and with the community we want to create around Rizin. We will enforce it on our different platforms.\nWe have started cleaning the source code of offensive phrases and comments. In addition, we will put effort into creating a more inclusive and diverse community and welcome new contributors.\nFinally, we created the concept of teams that will be responsible for different aspects of Rizin. Such teams will also include a Community team that, among other things, will be a point of contact for requests and complaints from community members.\nWhat is the future of Rizin? We intend to make Rizin a stable project you can trust for your reverse engineering tasks and a welcoming environment where people can work together on something they care about. We will release a roadmap with the features we want to work on and the direction we will take. In the short run, you can expect refinements to the new projects and to the shell.\nHow to pronounce \u0026ldquo;Rizin\u0026rdquo;? Thanks for asking! I have more questions, where can I ask? We would love to answer your question. You can send us a message on Mattermost or email us. Please note that we do not guarantee to answer all questions, as some topics are personal or ones that we prefer to keep to ourselves.\n","permalink":"https://rizin.re/posts/faq/","summary":"Who are you? Why did you fork radare2? What will happen to Cutter now? 
Our answers to your frequently asked questions.","title":"Frequently Asked Questions"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub under the main organization account and bugs are tracked on GitHub issues too. We are primarily using our own Mattermost instance, IRC, and Telegram for communication. We have a test suite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. 
For complex bugs and examples, we\u0026rsquo;re using ASCIInema to record the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin while protecting our free codebase.\nInstructions for participants Participants who want to apply to the Rizin project for the Google Summer of Code 2024 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming to the C99 standard), and hence, we expect participants to be familiar with the C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the projects from the list of ideas that you\u0026rsquo;re interested in (or propose your own). 
Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you understand the task deep enough before starting and prioritize important things to do first. Please note how much time a day/week you can spend on this project. Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so we can assign you a mentor in the same one to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do) so that conflicts (two participants for one task) can be resolved. Project Ideas Cutter Improving usability and user experience (175 hour project) Cutter\u0026rsquo;s backend provides many features that are either not exposed in Cutter or not exposed efficiently. The goal of this task would be to figure out the users\u0026rsquo; biggest pain points and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be identified during the cross-comparison with other tools.\nTask Add a scrollbar to the disassembly and hexdump widgets Better syntax highlight and theming Managing window/widget overlays Add information about status of the analysis, signature searching, and other operations Address various small UI problems that make user\u0026rsquo;s life harder than necessary Skills The participant should be comfortable with C++ and be familiar with the Qt framework. 
Basics of the design/UX would be a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating a comfortable and efficient user interface with C++/Qt.\nBenefits for the project It will make interface and user experience more consistent, on par with Rizin itself, and other tools.\nAssess requirements for midterm/final evaluation 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight Final term: Managing widgets layouts, docking; provide action status information Mentors thestr4ng3r xvilka Megabeets Links/Resources User Experience project for Cutter User Experience project for Rizin Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it a few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with the Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. 
Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter shows about functions, strings, and imports, as well as the analysis itself, is all produced in Rizin and only displayed in Cutter. Currently, Cutter pulls most information from Rizin only on demand. This is problematic because sometimes the user makes changes (via plugins, the console widget, and more) that affect the information in Rizin, but Cutter doesn\u0026rsquo;t know about these changes and cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. 
For example, add an event for function creation, section creation, flag deletion, name changes, and more.\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++.\nBenefits for the project It will make working on big files in Cutter effortless and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. 
We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka deroad Links/Resources Issue #1104 BinDiff Diaphora Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP\u0026rsquo;s Core Theory was implemented. In the following years it was improved and some of the architectures were ported to use RZIL instead of ESIL. 
The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add RzIL support for architectures that have no uplifting at all.\nTasks Implement RzIL uplifting for any non-trivial architecture, preferably one that is already supported by ESIL Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Because uplifting is sensitive to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions where it\u0026rsquo;s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in rizin. For example a simple decryption loop to check the overall integration in rizin, or for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with basics of the program analysis. 
Experience with other intermediate languages, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research and its relation to mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improve the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to an even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, add more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debugger improvements and portability (175 hour project) Rizin debugger already supports most of the platforms, including native and remote debugging. Nevertheless, for most platforms it\u0026rsquo;s limited mostly to the x86/x86_64 and ARMv8, often lacking the tests. The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX debugger for NetBSD, and so on. Moreover, some information isn\u0026rsquo;t available during the debugging mode, e.g. 
source-level breakpoints or names; it would be necessary to make sure debug commands understand those.\nWith the help of emulators like QEMU and OpenSIMH we could extend our CI to automatically test these debuggers.\nTask Integrate source-level information loaded from DWARF or PDB into debug commands and print p commands Support for missing architectures that are supported by Rizin statically in the Linux native debugger Support for missing architectures that are supported by Rizin statically in the BSD native debugger Cover more platforms supported by the debugger with automated tests, with CI whenever it\u0026rsquo;s possible Fix the bugs in debuggers, minor refactorings of the code Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Difficulty Hard\nBenefits for the participant Participant will understand how debugging works on the low level, and will gain experience with a variety of different platforms and operating systems.\nAssess requirements for midterm/final evaluation 1st term: SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD Mentors xvilka thestr4ng3r ret2libc Links/Resources Debug-labeled issues RzDebug-labeled issues New Platform support New Architecture support FRIDA integration (175 hour project) FRIDA is a famous dynamic instrumentation toolkit that is immensely popular among mobile device researchers. 
Rizin could be easily integrated with Frida by creating a plugin that connects to the Frida instance, receives traces, sets breakpoints, and gets information and events from it.\nTask Create the basic plugin that allows attaching, spawning, launching processes within Frida locally Support remote connection Add feature to receive information from the Frida instance Add breakpoints and run/step/continue features Support calling functions and scripts in the context of the instrumented process Skills Participant should know C as well as have the experience of working with debuggers.\nDifficulty Hard\nBenefits for the participant Participant will understand and learn how to use Frida toolkit, also the internals of the debugging and instrumentation processes.\nAssess requirements for midterm/final evaluation 1st term: Implement core of the FRIDA plugin, allowing local and remote debugging features Final term: Add support for extended features like calling functions or scripts within the context Mentors xvilka thestr4ng3r wargio Links/Resources FRIDA FRIDA (GitHub) r2frida Rewriting GPL-only code (175 hour project) Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite the C++ demangler and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good examples of such architectures are:\nSPARC (there is already capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than binutils-based one, capstone update is also required) Xtensa (It\u0026rsquo;s better to implement/update it in Capstone) ARC (Same, better to implement/update it in Capstone) Lanai CRIS VAX Skills Participant should know C and basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant Participant will understand how C++ type information is stored in the name of the methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under less restrictive license. Final term: At least one binutils-based architecture is reimplemented with more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rizin: rewrite/remove GPL-only code rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why tools exist to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having RzIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. 
Moreover, it makes sense to think about SROP, COOP and BROP support.\nThe rz-gg tool can create custom shellcode, but its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses an SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. 
Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Microtasks When taking any of the microtasks, please make sure someone isn\u0026rsquo;t already working on it, and let us know if you are going to work on a particular one.\nCutter UX improvements There are many small issues and missing features that when implemented will improve the user experience significantly:\nAllow adding new flags from hexdump Scrollbar inside disassembly windows Variables and values popup widgets on mouse hover Allow to set RzRun profiles from the GUI during debugging Double-click on the type in Disasm and Graph widgets should switch to the Types windows and show the selected type Set breakpoint inside X-Refs window Unified dialogue to set debug symbols servers See full list at our User Experience project covering all parts of RizinOrg: Rizin, Cutter, RzGhidra, rz-pm.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nTwo notable examples are updating existing bytecode plugins to support newer versions of the respective languages:\nSupport for the Lua 5.2 language changes Support for the Python 3.11 and 3.12 language changes Analysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. 
Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nRizin legacy code refactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. 
If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nUnbreaking broken tests Almost one thousand tests are marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. The task is to take any of those, investigate why it fails, and decide whether the test still makes sense or is already irrelevant today. 
Then to try to fix some of the broken tests.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2024/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too.","title":"GSoC 2024"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year is the third time we participate as Rizin, effectively continuing the tradition since the year 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’23. They were already guiding the participants for the GSoC and RSoC in past years. 
Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on our main GitHub organization account, and bugs are tracked on GitHub issues too. We mostly use our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. 
The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for participants It is a requirement that participants who want to apply to the Rizin project for the Google Summer of Code 2023 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks. To help participants to understand how to contribute to the project there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from the list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deep enough before starting, and prioritize important things to do first. Please note how much time a day/week you are able to spend on this project. 
Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so that we can assign you a mentor in the same one, to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two participants for one task) can be resolved. Project Ideas Cutter Improving usability and user experience (175 hour project) Cutter\u0026rsquo;s backend provides a lot of features that are either not exposed in Cutter or not exposed efficiently. The goal of this task would be to figure out the biggest pain points of the users and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be identified during the cross-comparison with other tools.\nTask Add a scrollbar to the disassembly and hexdump widgets Better syntax highlight and theming Managing window/widget overlays Add information about status of the analysis, signature searching, and other operations Address various small UI problems that make user\u0026rsquo;s life harder than necessary Skills The participant should be comfortable with C++ and be familiar with the Qt framework. 
Basics of the design/UX would be a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating comfortable and efficient user interface with C++/Qt.\nBenefits for the project It will make interface and user experience more consistent, on par with Rizin itself, and other tools.\nAssess requirements for midterm/final evaluation 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight Final term: Managing widgets layouts, docking; provide action status information Mentors thestr4ng3r xvilka Megabeets Links/Resources User Experience project for Cutter User Experience project for Rizin Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugins authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. 
Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter shows about functions, strings, and imports, as well as the analysis itself, is all produced in Rizin and only displayed in Cutter. Currently, Cutter pulls most information from Rizin only on demand. This is problematic because sometimes the user makes changes (via plugins, the console widget, and more) that affect the information in Rizin, but Cutter doesn\u0026rsquo;t know about these changes and cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. 
For example, add an event for function creation, section creation, flag deletion, name changes, and more.\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++.\nBenefits for the project It will make working on big files in Cutter effortless and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. 
We would like to expand on this feature with performance improvements to the heap parsers and support for more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the Windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with C++ and be familiar with the Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an understanding of how modern runtimes provide the heap for various programs, which will be beneficial for binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, and will also provide a way to design exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin tests and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka deroad Links/Resources Issue #1104 BinDiff Diaphora Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP\u0026rsquo;s Core Theory was implemented. In the following years it was improved and some of the architectures were ported to use RZIL instead of ESIL. 
The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add RzIL support for architectures that have no uplifting at all.\nTasks Implement RZIL uplifting for any non-trivial architecture, preferably one that is already supported by ESIL Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Because uplifting is sensitive to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions where it\u0026rsquo;s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in rizin. For example a simple decryption loop to check the overall integration in rizin, or for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with the basics of program analysis. 
Experience with other intermediate languages, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research and its relation to mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improve the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to an even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, add more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debug information handling improvements (175 hour project) Rizin already supports most of the DWARF and PDB features, including cross-platform parsing of both. However, this information is usually just printed to aid the reverse engineering process and is not used to its full potential. For example, you can\u0026rsquo;t use it to configure a breakpoint, nor can it be used to access variables within a function during debugging. Moreover, it is becoming more and more common to store DWARF information in separate files, either shipped as a separate file or downloaded on the fly with debuginfod. 
Rizin does not support this kind of DWARF file yet.\nYour task would be to improve the parsing support of both by fixing smaller bugs, adding support for separate DWARF files and debuginfod, and enhancing breakpoint integration and variable/structure printing in debugging mode with the source information gathered from DWARF/PDB.\nTask Support loading DWARF information from separate files and debuginfod Unify source lines/types information access for DWARF, PDB, dSYM and refactor/fix parsing code as necessary Integrate source line and types/variables information with the analysis (optional) Integrate source line and types/variables with printing via the p commands in debug mode Integrate source line and types/variables with breakpoint commands and APIs Parsing performance improvements Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Basic knowledge of at least one of the following formats: ELF, DWARF, PDB, PE Difficulty Hard\nBenefits for the participant Participant will understand how high-level features of debuggers work as well as gain skills in the field of software architecture of a large, modular C project.\nAssess requirements for midterm/final evaluation 1st term: debuginfod and source line information refactoring are implemented Final term: Integration of variable information with the debug and printing commands is implemented Mentors xvilka thestr4ng3r ret2libc Links/Resources Loading debug information from debuginfod Unify code of source information access for DWARF, PDB, dSYM Ghidra issue: support DWARF in MinGW PE binaries Debuginfod Debian Debuginfod Fedora Debuginfod DWARF-labeled issues PDB-labeled issues Debugger improvements and portability (175 hour project) The Rizin debugger already supports most platforms, including native and remote debugging. Nevertheless, for most platforms it\u0026rsquo;s limited mostly to x86/x86_64 and ARMv8, often lacking tests. 
The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX debugger for NetBSD, and so on.\nWith the help of emulators like QEMU and SIMH we could extend our CI to automatically test these debuggers.\nTask Support for missing architectures that are supported by Rizin statically in the Linux native debugger Support for missing architectures that are supported by Rizin statically in the BSD native debugger Cover more platforms supported by the debugger with automated tests, with CI whenever it\u0026rsquo;s possible Fix the bugs in debuggers, minor refactorings of the code Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Difficulty Hard\nBenefits for the participant Participant will understand how debugging works on the low level, and will gain experience with a variety of different platforms and operating systems.\nAssess requirements for midterm/final evaluation 1st term: SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD Mentors xvilka thestr4ng3r ret2libc Links/Resources Debug-labeled issues RzDebug-labeled issues New Platform support New Architecture support Thread-safety and multithreading (175 hour project) Currently, Rizin is not completely thread-safe, either internally or when used as a library in a multithreaded application. The goal of this project is to eliminate global states and use contexts, eliminate singletons, e.g. 
RzCons, and use thread-safe external functions and dependencies.\nTask Migrate from thread-unsafe system and external dependencies Eliminate global state inside RzCons and use of the singleton Make RzBin thread-safe Make RzAnalysis thread-safe Make RzCore thread-safe Add tests for using multiple RzCore and RzAnalysis instances Parallelize some of the RzAnalysis function using the threading API Skills Participant should know C as well as have the experience of developing multithreaded applications.\nDifficulty Hard\nBenefits for the participant Participant will understand the hurdles of multithreaded programming, data synchronization, locks and debugging of such code.\nAssess requirements for midterm/final evaluation 1st term: Eliminate thread-unsafe dependencies and remove global state from RzCons and RzBin Final term: Make RzAnalysis and RzCore (optionally) thread-safe Mentors xvilka thestr4ng3r wargio Links/Resources Migrate from wcstombs() function since it\u0026rsquo;s not thread-safe Rewriting GPL-only code (175 hour project) Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite the C++ demangler and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good examples of such architectures are:\nSPARC (there is already capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than binutils-based one) Xtensa ARC HPPA (PA-RISC) Skills Participant should know C and basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant Participant will understand how C++ type information is stored in the name of the methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under less restrictive license. Final term: At least one binutils-based architecture is reimplemented with more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. 
Moreover it makes sense to think about SROP, COOP and BROP support.\nWhile the rz-gg tool can create custom shellcode, its shellcode database is outdated, so updating it is crucial for the tool to remain relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses an SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Microtasks When taking any of the microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. 
See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nAnalysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. 
The same thing should be implemented for classes from analysis.\nRefactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run a Rizin command. While this is a useful shortcut for developers, it is a likely source of bugs whenever the command syntax or behavior changes. Such changes are invisible to the compiler, so it cannot warn about them, unlike a change in the number or types of a function\u0026rsquo;s arguments. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nUnbreaking broken tests Almost one thousand tests are marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. 
The task is to take any of those, investigate why it fails and whether the test still makes sense or is no longer relevant, and then try to fix it.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes on the Rizin side.\nAlso note that most of these fixes should be paired with a test to verify that they will not break in the future.\n","permalink":"https://rizin.re/gsoc/2023/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year is the third time we participate as Rizin, effectively continuing the tradition since the year 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’23. They were already guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them in case you need any help in selecting a project.","title":"GSoC 2023"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year is the second time we participate as a fork - Rizin, effectively continuing the tradition since the year 2015 (as the radare2 project).\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’22. They were already guiding the participants for the GSoC and RSoC in past years. 
Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad Yossi Zapesochini Mattermost/Telegram: @yossizap And many others Development methodology Currently, all repositories are hosted on GitHub main organization account, and bugs are tracked on GitHub issues too. We are mostly using our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. 
The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for participants It is a requirement that participants who want to apply to the Rizin project for the Google Summer of Code 2022 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks. To help participants to understand how to contribute to the project there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from the list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deep enough before starting, and prioritize important things to do first. Please note how much time a day/week you are able to spend on this project. 
Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so we can assign you a mentor in the same one to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two participants for one task) can be resolved. Project Ideas Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP\u0026rsquo;s Core Theory was implemented. In the following months it was improved and some of the architectures were ported to use RZIL instead of ESIL. The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add RzIL support for architectures that have no uplifting at all.\nTasks Implement RZIL uplifting for any non-trivial architecture, preferably one that is already supported by ESIL Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Because uplifting is sensitive to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. 
It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions where it\u0026rsquo;s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in rizin. For example a simple decryption loop to check the overall integration in rizin, or for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with the basics of program analysis. Experience with other intermediate languages, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research and its relation to mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improve the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to an even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, add more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debug information handling improvements (175 hour project) Rizin already supports most of the DWARF and PDB features, including cross-platform parsing of both. 
However, this information is usually just printed to aid the reverse engineering process and is not used to its full potential. For example, you can\u0026rsquo;t use it to configure a breakpoint, nor can it be used to access variables within a function during debugging. Moreover, it is becoming more and more common to store DWARF information in separate files, either shipped as a separate file or downloaded on the fly with debuginfod. Rizin does not support this kind of DWARF file yet.\nYour task would be to improve the parsing support of both by fixing smaller bugs, adding support for separate DWARF files and debuginfod, and enhancing breakpoint integration and variable/structure printing in debugging mode with the source information gathered from DWARF/PDB.\nTask Support loading DWARF information from separate files and debuginfod Unify source lines/types information access for DWARF, PDB, dSYM and refactor/fix parsing code as necessary Integrate source line and types/variables information with the analysis (optional) Integrate source line and types/variables with printing via the p commands in debug mode Integrate source line and types/variables with breakpoint commands and APIs Parsing performance improvements Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Basic knowledge of at least one of the following formats: ELF, DWARF, PDB, PE Difficulty Hard\nBenefits for the participant Participant will understand how high-level features of debuggers work as well as gain skills in the field of software architecture of a large, modular C project.\nAssess requirements for midterm/final evaluation 1st term: debuginfod and source line information refactoring are implemented Final term: Integration of variable information with the debug and printing commands is implemented Mentors xvilka thestr4ng3r ret2libc Links/Resources Loading debug information from debuginfod Unify code of source information access for DWARF, PDB, dSYM Ghidra 
issue: support DWARF in MinGW PE binaries Debuginfod Debian Debuginfod Fedora Debuginfod DWARF-labeled issues PDB-labaled issues Thread-safety and multithreading (175 hour project) Currently Rizin is not thread safe completely internally and as a library for a multithreaded application. The goal of this project is to eliminate global states and use contexts, eliminate singletons, e.g. RzCons, and use thread-safe external functions and dependencies.\nTask Migrate from thread-unsafe system and external dependencies Eliminate global state inside RzCons and use of the singleton Make RzBin thread-safe Make RzAnalysis thread-safe Make RzCore thread-safe Add tests for using multiple RzCore and RzAnalysis instances Parallelize some of the RzAnalysis function using the threading API Skills Participant should know C as well as have the experience of developing multithreaded applications.\nDifficulty Hard\nBenefits for the participant Participant will understand the hurdles of multithreaded programming, data synchronization, locks and debugging of such code.\nAssess requirements for midterm/final evaluation 1st term: Eliminate thread-unsafe dependencies and remove global state from RzCons and RzBin Final term: Make RzAnalysis and RzCore (optionally) thread-safe Mentors xvilka thestr4ng3r wargio Links/Resources Migrate from wcstombs() function since it\u0026rsquo;s not thread-safe Rewriting GPL-only code (175 hour project) Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite C++ demangler to and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good example of such architectures are:\nSPARC (there is already capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than binutils-based one) Xtensa Tricore SH HPPA (PA-RISC) Skills Participant should know C and basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant Participant will understand how C++ type information is stored in the name of the methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under less restrictive license. Final term: At least one binutils-based architecture is reimplemented with more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building ROP chain by hand can be tedious, this is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper.There exist even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse power of librz and rz-gg. 
Moreover it makes sense to think about SROP, COOP and BROP support.\nThe rz-gg tool can create custom shellcode, but its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Bindings for languages other than C/C++ (175 hour project) Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. 
While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C api that Rizin offers. Using this API is directly possible in C and C++, as it is done in Cutter for example, but for other languages no generic bindings exist so far. The goal of this task is to use a bindings generator such as SWIG to expose Rizin\u0026rsquo;s C API to languages such as Python, Java or OCaml.\nTask Integrate SWIG-generated bindings into Rizin\u0026rsquo;s build system Write SWIG interfaces for all mature parts of Rizin\u0026rsquo;s C API Integrate the Python bindings into Cutter\u0026rsquo;s Python support Skills The participant should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.\nAssess requirements for midterm/final evaluation 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable. Final term: All relevant parts of the API can be used through bindings and also from within Cutter\u0026rsquo;s Python interpreter. Mentors thestr4ng3r xvilka Links/Resources SWIG Website SWIG 4.0 Documentation Small PoC of bindings generated in Rizin\u0026rsquo;s build system Article about Rizin\u0026rsquo;s build system design Cutter Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. 
For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer to plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter shows about functions, strings, imports, and analysis is all computed in Rizin and only displayed in Cutter. Currently, it is pulling most information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that affect the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes, so it cannot apply them to the UI. 
For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, section creation, flag deletion, name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will make it possible to work on big files effortlessly in Cutter and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. 
Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets yossizap Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka Megabeets Links/Resources Issue #1104 BinDiff Diaphora Microtasks When taking any of microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nELF binary parsing. Rizin parses a lot of information about the ELF but doesn\u0026rsquo;t print everything.\nMoreover, some information about PLT stubs not being resolved correctly.\nAnalysis The current code analysis has many caveats and issues which need addressing. 
Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues or the \u0026ldquo;Analysis\u0026rdquo; project on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nRefactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. 
While it is a useful shortcut for developers, it is a likely source of bugs whenever the command syntax or behavior changes. Such changes are invisible to the compiler, so it cannot warn about them. By contrast, a change in a function\u0026rsquo;s argument count or types is caught at compile time. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. A good example is to allow better filtering of the test types to run, for example to ignore debug tests. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nAnother important part of improving the test suite is expanding it to cover more formats and cases. See issue #114 for more details on how it can be done.\nUnbreaking broken tests Almost one thousand tests are marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. 
The task is to take any of those and investigate why it fails and whether the test still makes sense or is already irrelevant today. Then try to fix some of the broken tests.\nBetter portability Due to mistakes in handling data for big-endian platforms in Rizin code, a lot of tests still don\u0026rsquo;t pass on our System Z CI worker. Most of the broken tests are related to parsing the formats, in particular reading integers in a portable way. See #297 for details on these formats. In most cases the solution would be to use rz_read_*() API functions: Developers Guide: Manage Endianess.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes on the Rizin side.\nAlso note that most of these issues should be paired with a test to verify that they will not break in the future.\n","permalink":"https://rizin.re/gsoc/2022/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year is the second time we participate as a fork - Rizin, effectively continuing the tradition since the year 2015 (as the radare2 project).\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’22. They were already guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them in case you need any help in selecting a project.","title":"GSoC 2022"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction Each year since 2015, we have participated in Google Summer of Code as the Radare2 project and accomplished many goals. 
This year we participate as a fork, Rizin, effectively continuing the same process with the same mentors.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide students for GSoC’21. They were already guiding the students for the GSoC and RSoC in past years as part of the Radare2 project. Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Shirone Mattermost: ret2libc Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Antide Petit IRC/Telegram: xarkes \u0026ndash; @xarkes_ Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad Yossi Zapesochini Mattermost/Telegram: @yossizap And many others Development methodology Currently, all repositories are hosted on GitHub main organization account, and bugs are tracked on GitHub issues too. We are mostly using our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (that is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. 
For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for students It is a requirement that students who want to apply to the Rizin project for the Google Summer of Code 2021 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect students to be familiar with C programming language. For some of our tasks or microtasks, such as collaborative RE or rz-pm, students should know the Go programming language. For the Cutter tasks, students should know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. 
Student proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deeply enough before starting, and to prioritize the important things to do first. Please note how much time per day/week you are able to spend on this project. Specify your timezone so that we can assign you a mentor in the same one, to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two students for one task) can be resolved. Project Ideas Rizin Type Analysis Improvements Currently we have types support in Rizin, including the basic (low-level) ability to edit types with pf and higher-level, C-like types with the t command. It is possible to parse C type definitions from C headers, for example, or load them from a \u0026ldquo;precompiled\u0026rdquo; SDB file. However, despite such features being present, many of them still lack the right structures and connections between them to make complex automated and manual analysis of code using types convenient. The goal of this task is to build upon the currently available features and re-think or re-design some of them to fit into the bigger picture of the entire framework. The overall plan for this project is tracked at https://github.com/rizinorg/rizin/projects/3. 
There are certain dependencies between some of the subtasks, but as long as these are respected, subtasks to be taken for GSoC can be picked by preference.\nTask (proposal) Bundle all types functionality in a new module RzTypes #369 Refactor some base type accesses to use the RzAnalysisBaseType API #368 Replace the current TCC-based C types parser by a Tree-sitter based one #275 Skills Student should know C as well as be familiar with basics of the program analysis. They should also be passionate about software architecture.\nDifficulty Hard\nBenefits for the student Student will understand modern program analysis problems related to type analysis, as well as gain skills in the field of software architecture of a large, modular C project with complex dependencies between modules.\nAssess requirements for midterm/final evaluation 1st term: RzTypes module exists and contains all relevant code. Final term: Tree-sitter based C parser is implemented and integrated into the analysis framework. Mentors xvilka thestr4ng3r Links/Resources Type Analysis Improvements Project C++ grammar for tree-sitter CPU/Platform profiles While instruction set defines architecture, it is common that particular CPU or SoC models implement only a subset of it or extend it with custom instructions and registers. Moreover, various SoC modifications can define peripheral devices interaction through ports (rare), registers or MMIO spaces. All this helps the reverse engineering process, because a lot of the code will make sense upon a glance once you see it accesses certain registers (if named) or peripheral devices (when MMIO area is defined). 
A common example is SVD loading for ARM architecture.\nA good example how CPU profile should look like:\nasm.cpu to be dynamically populated by listing the available CPU dedicated plaintext/sdb files Add RAM_SIZE info in the CPU files and remove hardcoding from .h and .c files Add ROM_SIZE info in the CPU files and remove hardcoding from .h and .c files Add INTERRUPT_VECTOR_SIZE info in the CPU files and remove hardcoding from .h and .c files Add IO_REGISTERS, EXTENDED_IO_REGISTER, MMIO_REGISTER, coprocessor register info in the CPU files and remove hardcoding from .h and .c files Task Implement support for CPU profiles Implement support for platform profiles Add support for register and MMIO specific setups Integrate these in analysis loop, handling register and memory accesses. Implement tests and documentation in Rizin book Provide an API for setting these values from rz-pipe and lang-* plugins Skills Student should know C and understand basics of the hardware platforms, architectures and chips.\nDifficulty Medium\nBenefits for the student The student will improve familiarity with reverse engineering for various architectures and platforms, along with the improving the efficiency of Rizin.\nBenefits for the project Huge benefits for end users in UX and better support for extension.\nAssess requirements for evaluations 1st term: CPU and platform profiles, some most common profiles, integration with the analysis loop Final term: Support for more platforms, regression and unit tests, documentation (including Rizin book). Mentors xvilka deroad Links/Resources Issue #103 SVD loader for Ghidra How to use SVD loader with Ghidra SVD parser in Rust CMSIS-SVD repository Rz-diff improvements Rizin has had the ability to perform binary diffing for over a decade. Nevertheless the support is quite basic and there is room for improvement. One of the most important tasks is to deepen the integration with analysis loop. 
Integration with the analysis loop will allow Rizin to find and highlight the difference between arguments count, local variables count, their types and other analysis metainformation. The next big task is to modernize rz-diff (and corresponding parts in RCore) in terms of performance and user interface. And of course - cover the rz-diff and rizin diffing features with regression tests and unit tests.\nTasks Support diffing of the different parts of the same buffer/file Split view for hexadecimal view and disassembly diffing mode Improve the integration with analysis (variables and types differences) Integrate ESIL and decompilation (rz-ghidra, jsdec) pseudocode as options for binary diffing Implement the most important diffing strategies from Diaphora Write the test cases for Rizin regression tests and improve the results. Skills Student should know C as well as be familiar with basics of the program analysis. Having experience with other binary diffing software is a plus.\nDifficulty Medium\nBenefits for the student Student will understand modern program analysis problems in application to binary diffing, and how to improve the performance of patch analysis.\nBenefits for the project This feature will make Rizin usable for day-to-day patch analysis of modern software, as well as improve the automation and performance of this task.\nAssess requirements for midterm/final evaluation 1st term: rz-diff/rizin should support highlighting types, arguments, and variables differences between functions. Final term: Implement split-view for hex, disassembly, and graph modes. Improve their interface and performance. Write the regression tests for all implemented features, add the documentation in Rizin book. 
Mentors xvilka Megabeets Links/Resources rz-diff-labeled issues Signature-labeled issues Cutter: Diffing interface feature request #1104 PatchDiff2 BinDiff Diaphora SimHash Exploitation capabilities improvements Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper. There even exist tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. Moreover it makes sense to think about SROP, COOP and BROP support.\nThe rz-gg tool can create custom shellcode, but its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The student should be comfortable with the C language, know some assembly and a high-level language. 
Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the student The student will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Bindings for languages other than C/C++ Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C API that Rizin offers. Using this API is directly possible in C and C++, as it is done in Cutter for example, but for other languages no generic bindings exist so far. 
The goal of this task is to use a bindings generator such as SWIG to expose Rizin\u0026rsquo;s C API to languages such as Python, Java or OCaml.\nTask Integrate SWIG-generated bindings into Rizin\u0026rsquo;s build system Write SWIG interfaces for all mature parts of Rizin\u0026rsquo;s C API Integrate the Python bindings into Cutter\u0026rsquo;s Python support Skills The student should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.\nDifficulty Advanced\nBenefits for the student The student will gain experience in exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.\nAssess requirements for midterm/final evaluation 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable. Final term: All relevant parts of the API can be used through bindings and also from within Cutter\u0026rsquo;s Python interpreter. Mentors thestr4ng3r xvilka Links/Resources SWIG Website SWIG 4.0 Documentation Small PoC of bindings generated in Rizin\u0026rsquo;s build system Article about Rizin\u0026rsquo;s build system design Cutter Plugins and Python High Level API We currently have almost no API for plugin authors to use. We need to improve a lot of things about our plugin support and take it a few steps ahead. This task is only about improving the Python interface in Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer to plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. 
Skills The student should be comfortable with the C++ and Python languages, and be familiar with the Qt framework\nDifficulty Advanced\nBenefits for the student The student will gain experience in creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture Cutter is a reverse engineering framework that is powered by Rizin. The information it gets about functions, strings, imports, and the analysis are all performed in Rizin and displayed in Cutter. Currently, Cutter is pulling information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes to apply them to the UI. 
For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (edit -\u0026gt; Refresh Contents).\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The student should be comfortable with C++ for Cutter and C for Rizin. The student should be familiar with the Qt framework.\nDifficulty Advanced\nBenefits for the student The student will gain experience in creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will allow working on big files effortlessly in Cutter and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Karliss Heap viewer We already have a nice heap (and memory map) parser and visualizer in Rizin (dm and dmh commands). 
After debugging becomes a first-class citizen in cutterland it would be awesome to have memory map and heap visualizations.\nTask Expose Rizin API/commands for Cutter to use for visualization Design and implement heap navigation and inspection widgets Provide the integration with current debugging mode in Cutter Make the implementation work with both local (native) and remote debugging modes Skills The student should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the student The student will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode Binary diffing is one of the most common tasks for the reverse engineer. There are many various tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with the rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The student should be comfortable with the C++ language, and be familiar with the Qt framework\nDifficulty Advanced\nBenefits for the student The student will gain experience in creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka Megabeets Links/Resources Issue #1104 BinDiff Diaphora Microtasks When taking any of the microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nELF binary parsing. Rizin parses a lot of information about the ELF but doesn\u0026rsquo;t print everything. 
Thus, improving the output of the i* commands and the rz-bin tool to match readelf is important (for example, adding file offset and memory alignment to the segment information in the iSS command)\nMoreover, some information about PLT stubs is not resolved correctly.\nAnalysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues or the \u0026ldquo;Analysis\u0026rdquo; project on our GitHub dashboard.\nBasefind #413 There are plenty of external scripts and plugins for finding the most probable base for raw firmware images. Opening raw firmwares with rizin is a common use case, so it makes sense to implement it as a part of rizin core.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. 
It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nSignatures Rizin has a good support for loading and creating signatures, but it is not yet complete, thus some problems remain, for example: #272.\nAs Rizin supports FLIRT signatures loading from IDA Pro, not all of them are supported yet - e.g. version 5 compression.\nRefactoring Use \u0026ldquo;newshell\u0026rdquo; instead of old switch/case handling Rizin is in the middle of the switch from the old style switch/case manual parsing of every command to the centralized Tree-Sitter-based parser, providing every command handler argc/argv arguments. Best candidates for the initial switch are:\nlibrz/core/cmd_egg.c librz/core/cmd_hash.c librz/core/cmd_plugins.c A good example of transition is in these pull requests for t (types) command conversion:\nMigrating Types to the Newshell (1) Migrating Types to the Newshell (2) Migrating Types to the Newshell (3) Adding autocompletion for types commands Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. 
If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nImproving the uplifting of the code to IL Rizin has its own intermediate language, ESIL, but does not yet support it for all architectures. So the task is to add ESIL support to any architecture that doesn\u0026rsquo;t have it yet.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. A good example is to allow better filtering of the test types to run, for example to ignore debug tests. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nAnother important part of improving the test suite is expanding it to cover more formats and cases. See the #114 issue for more details on how it can be done.\nRzGhidra There are many small issues in the decompiler output:\nString detection problem and one more. 
Show function arguments in calls pdgsd commands showing incorrect P-code Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2021/","summary":"TL;DR Jump to the Ideas list.\nIntroduction Each year since 2015, we have participated in Google Summer of Code as the Radare2 project and accomplished many goals. This year we participate as a fork - Rizin, but effectively continuing the same process and the same mentors.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide students for GSoC’21. They were already guiding the students for the GSoC and RSoC in past years as part of the Radare2 project.","title":"GSoC 2021"},{"content":"Concrete high-level feature areas and changes.\n0.9 Add support for missing members of H8 MCU family, and implement RzIL uplifting of them Complete migration from ESIL to RzIL for all supported architectures and features Improve FreeBSD, NetBSD, and OpenBSD debugging Improve ARM64 and PowerPC debugging Migrate from Capstone to Zydis for x86 architecture to address long-standing problems with unsupported x86 instructions Support STABS (pre-DWARF) debug information loading Add support for proper preprocessor in the type parser Refactor types to introduce type scope Rewrite RzNum to support proper formulas, bitvectors, floats, and so on Remove concept of the \u0026ldquo;block\u0026rdquo; in favor of direct transparent IO access Full milestone is at https://github.com/rizinorg/rizin/milestone/21\n1.0 Add KB (Knowledge Base) support for storing metainformation in logic fact-based form Stable and documented API Refactor and merge various visual modes Refactor native debugger Big files loading support Remove GPL-only code in favor 
of LGPL Create documentation for the framework structure and all modules Create RzIL specification Full milestone is at https://github.com/rizinorg/rizin/milestone/5\n","permalink":"https://rizin.re/roadmap/","summary":"Concrete high-level feature areas and changes.\n0.9 Add support for missing members of H8 MCU family, and implement RzIL uplifting of them Complete migration from ESIL to RzIL for all supported architectures and features Improve FreeBSD, NetBSD, and OpenBSD debugging Improve ARM64 and PowerPC debugging Migrate from Capstone to Zydis for x86 architecture to address long-standing problems with unsupported x86 instructions Support STABS (pre-DWARF) debug information loading Add support for proper preprocessor in the type parser Refactor types to introduce type scope Rewrite RzNum to support proper formulas, bitvectors, floats, and so on Remove concept of the \u0026ldquo;block\u0026rdquo; in favor of direct transparent IO access Full milestone is at https://github.","title":""},{"content":"Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.\nWe pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.\nOur Standards Examples of behavior that contributes to a positive environment for our community include:\nDemonstrating empathy and kindness toward other people Being respectful of differing opinions, viewpoints, and experiences Giving and gracefully accepting constructive feedback Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience Focusing on what is best not just for us as individuals, 
but for the overall community Examples of unacceptable behavior include:\nThe use of sexualized language or imagery, and sexual attention or advances of any kind Trolling, insulting or derogatory comments, and personal or political attacks Public or private harassment Publishing others\u0026rsquo; private information, such as a physical or email address, without their explicit permission Other conduct which could reasonably be considered inappropriate in a professional setting Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.\nCommunity leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.\nScope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.\nEnforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at moderation@rizin.re. All complaints will be reviewed and investigated promptly and fairly.\nAll community leaders are obligated to respect the privacy and security of the reporter of any incident.\nEnforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:\n1. 
Correction Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.\nConsequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.\n2. Warning Community Impact: A violation through a single incident or series of actions.\nConsequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.\n3. Temporary Ban Community Impact: A serious violation of community standards, including sustained inappropriate behavior.\nConsequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.\n4. 
Permanent Ban Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.\nConsequence: A permanent ban from any sort of public interaction within the community.\nAttribution This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.\nCommunity Impact Guidelines were inspired by Mozilla\u0026rsquo;s code of conduct enforcement ladder.\nFor answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.\n","permalink":"https://rizin.re/code-of-conduct/","summary":"Rizin\u0026rsquo;s Code of Conduct","title":"Code Of Conduct"},{"content":"","permalink":"https://rizin.re/community/","summary":"community","title":"Community"},{"content":"","permalink":"https://rizin.re/teams/community/","summary":"","title":"Community Team"},{"content":"","permalink":"https://rizin.re/teams/core/","summary":"","title":"Core Team"},{"content":"","permalink":"https://rizin.re/teams/cutter-core/","summary":"","title":"Cutter Core Team"},{"content":"","permalink":"https://rizin.re/teams/distributions-and-packaging/","summary":"","title":"Distributions and Packaging Team"},{"content":"","permalink":"https://rizin.re/teams/documentation/","summary":"","title":"Documentation Team"},{"content":"","permalink":"https://rizin.re/teams/infrastructure/","summary":"","title":"Infrastructure Team"},{"content":"","permalink":"https://rizin.re/organization/","summary":"organization","title":"Organization"},{"content":"","permalink":"https://rizin.re/teams/package-manager-and-plugins/","summary":"","title":"Package Manager and Plugins 
Team"},{"content":"","permalink":"https://rizin.re/teams/security/","summary":"","title":"Security Team"}]
\ No newline at end of file
+[{"content":"This year we focused mainly on the \u0026ldquo;backbone\u0026rdquo; of the Rizin framework and all related tools, including Cutter. This will become a foundation of the future work we plan to finish in 2025. The major goal is to release 0.8.0 in upcoming months. As for the longer term you can see our roadmap for details.\nReleases Rizin versions 0.7.x and Cutter 2.3.3-2.3.4 80% of work is done for Rizin 0.8.0 Capstone A bulk of our effort was spent towards Capstone improvements as it is the core dependency and the main disassembly engine for many architectures supported by Rizin. A long-going project called Auto-Sync was finally merged and largely completed. You could read more details in our corresponding article.\nUpdated PPC, ARM, AArch64 and SystemZ to LLVM 18. Tricore support by billow HPPA by r3v0lt Alpha by r3v0lt ARC PR by r3v0lt (not yet merged: #2570) MIPS, microMIPS, and nanoMIPS support by deroad Xtensa by billow Rizin Started merging RzAsm and RzAnalysis for the future RzArch MIPS update to a Capstone-based plugin by deroad Basic LoongArch support by deroad PIC MCU family RzIL uplifting by billow MSP430 RzIL uplifting by moste00 Xtensa RzIL uplifting by billow Hexagon RzIL uplifting by Rot127 Finished (almost, with one PR still not yet merged) conversion to the rzshell Rewritten text representation of ELF, PE, NE relocations by Roeegg2 Added initial support for the Alpha architecture Many bugfixes, refactorings, and performance optimizations Cutter Switch to the Qt6 by default, including for forming releases Updated Rizin Other projects rz-ghidra: add support for PIC architectures rz-ghidra: add support for Tricore architecture Miscellaneous Google Summer of Code 2024 FOSSAsia 2024 conference ","permalink":"https://rizin.re/posts/year-2024-summary/","summary":"An overview of the work done in 2024","title":"2024 Year Summary"},{"content":"Hello, I’m Mostafa. 
I graduated with Excellence from Cairo University’s Faculty of Engineering, Computer Engineering Department, class of 2023. I write C++ for a living. I love systems programming, metaprogramming \u0026amp; DSLs, as well as Compilers \u0026amp; VMs. You can find me @Github, and @Linkedin.\nI was honored to participate again as a contributor in the 2024 GSoC with the Rizin Organization. The original project was implementing binary lifting techniques for RISC-V instructions onto Rizin\u0026rsquo;s custom internal representation, called RzIL. However, updating the RISC-V Capstone disassembler (originally a small task in the project preamble) turned out to need much more work than expected, and blocked the rest of the project.\nLet\u0026rsquo;s start at the beginning.\nRISC-V… Lifting? Lifting is a term of art in compiler research and implementation, it refers to any process that takes as input a low-level machine code program and outputs a higher-level program. The reverse process, lowering, is what compilers do when they compile from a relatively high level language like C or LLVM IR to machine code. So you could simply think of lifting as a synonym for “Decompiling” or “Reverse-Compiling”.\nIn the context of Rizin, lifting refers to transforming a machine code program written for any of the hardware architectures that Rizin understands (x86, RISC-V, 6502, etc\u0026hellip;) to a Rizin-specific intermediate language called RzIL.\nBy doing this, Rizin’s developers can write generic analysis algorithms that interpret RzIL instructions, and a generic VM that executes them, only once. 
Then, for each architecture that Rizin supports, a lifter that transforms machine code written for that architecture into RzIL is written, and as a result we get all the analysis algorithms and VM execution capabilities “for free”.\nIn a nutshell, RzIL is the universal “Lingua Franca” for Rizin, like English is for Software Engineering.\nFigure 1: Without RzIL, there is no smarter way to perform N operations for M assembly languages other than doing an NxM amount of work, implementing the N operations over and over again per each language/architecture.\nFigure 2: With RzIL, the amount of work to support N operations for M architectures is N+M, the N operations are written exactly once for the intermediate language, then M transformers are written to lift each of the M architectures to the intermediate language.\nFor want of a disassembler So the original plan was to write the grey arrow in the figure above: a lifter from RISC-V machine code into RzIL. However, the first step in doing that is to “parse” RISC-V instructions from their binary form into a convenient data structure. We call that “parsing” step disassembly, or, more accurately, decoding.\nSide Note: lots of people, when “disassembly” and “assembly” are mentioned, will probably think of the following diagram:\nThis is not wrong for most purposes. However, in the context of this writeup it’s better to have the following and more detailed picture in mind:\nIn this writeup I’m more interested in the left-to-right flow: decoding from a binary to a structured (e.g. C struct) representation of the instruction, then assembling the structured representation of the instruction into a string form. Confusingly, sometimes “Disassembly” is used to include both Disassembly and Decoding, for example in Capstone the structured representation includes as a member its own toString serialization. 
It will often be clear from context what step is meant, and decoding is often far more important than disassembly.\nWhere were we? Ah yes, we were supposed to “parse” (i.e. decode) an instruction from its binary form into a convenient data structure, so that we can write elegant code that easily and robustly lifts it into RzIL.\nThe good news is that Rizin already has a RISC-V decoder/disassembler, since it uses as a library the project Capstone, which is a general-purpose disassembler framework for multiple architectures, including RISC-V.\nThe bad news? The RISC-V disassembler was incomplete and out of date.\nYou won\u0026rsquo;t catch it missing a variant of an ADD or a SUB, not even MULs or DIVs, but you can catch it missing the zba, clz, or xnor instructions, for example. Those, respectively, accelerate array indexing, count leading zeros, and perform an exclusive NOR. Capstone\u0026rsquo;s current RISC-V disassembler includes none of those instructions. We could argue whether those instructions are really \u0026ldquo;Useful\u0026rdquo; or \u0026ldquo;Common\u0026rdquo; in real software: but at the end of the day they\u0026rsquo;re part of RISC-V, and any compliant RISC-V tool must be aware of them. Capstone sometimes also chokes on quite basic instructions, like LOAD.\nRISC-V has a somewhat unusual approach to ISA evolution: it embraces extensions openly in its standard. Most architectures define new \u0026ldquo;versions\u0026rdquo; or \u0026ldquo;editions\u0026rdquo; whenever they change, RISC-V instead defines self-contained \u0026ldquo;modules\u0026rdquo; of behaviour and ISA state, even opening the door to vendors (companies selling SoCs and other products with RISC-V cores) to make their own vendor-defined extensions that co-exist with the rest of the architecture and its standard extensions. Each extension as well as the base architecture could evolve through different versions independently from other extensions. 
The RISC-V architecture is thus more of a family of architectures specified together rather than a single one.\nCapstone was originally written based on codegen logic from LLVM. It’s essentially a port of LLVM disassembly logic from C++ to C (along with much simplification and cleaning up). Unfortunately, LLVM keeps updating that logic to reflect the fast-moving development and evolution of the architectures; those updates are not magically reflected back into Capstone! To make matters even worse, even LLVM proper can’t completely keep up with all the updates that happen to all the architectures it supports, it lags.\nThe Capstone project maintains an update tool called Auto-Sync, which can semi-automatically synchronize changes from LLVM to Capstone (using Tree-sitter magic). Alas, it can only do that for some architectures, and RISC-V is not among the supported ones. Also, we already saw how even LLVM is not completely on top of all updates. Fortunately, the solution exists, just hiding elsewhere.\nTo Sail the high seas and RISC it all The problem of describing Instruction Set Architectures (ISAs) accurately so that we can do plenty of useful things to them (assembly, disassembly, emulation, codegen, etc…) faces many projects and researchers, so much so that some smart people have developed an entirely new special language for it, Sail. Sail is a language designed specifically to address the problem of describing all aspects of ISAs: how the instructions are encoded into binary, how they execute, etc…\nNow, if only there was a project that used Sail to describe RISC-V… wait, there is! It’s called Sail-RISCV. 
It’s such a complete and up-to-date description of RISC-V that the RISC-V foundation adopted it as the official source of truth for the architecture, this means that however the Sail code behaves, is - by definition - how RISC-V should behave.\nOther architectures modelled in Sail are several versions of ARM, a considerable part of x86, and a research version of MIPS called CHERI-MIPS, which includes hardware extensions to assist and accelerate memory safe pointers. The ARM and x86 models are auto-generated from other descriptions, and all 3 models are much less active than RISC-V\u0026rsquo;s.\nLet’s see a snippet of what Sail looks like in practice, here’s the definition of RISC-V IType (immediate) instructions:\nThe rule might be as cryptic as latin if you’re not used to pattern-matching constructs from functional languages, but what it’s saying is simply the following:\nIf the first (least-significant) 7 bits of the 32-bit instruction are: 0010011 And if the 3 bits from bit 12 to bit 14 are in the table encdec_iop Then, the operation specified by this instruction is whatever enum corresponding to bits 12:14 in the encdec_iop table, and the args are: The 5-bit register index rd in bits 7 through 11 The 5-bit register index rs1 in bits 15 through 19 The 12-bit literal imm in bits 20 through 31 Otherwise, if (1) and (2) are not true, keep checking the 32-bit binary instruction against other rules In case you’re wondering what regidx is, it’s an alias for the type of 5-bit integers (or, as Sail calls them, bitvectors). 
Sail-RISCV uses it to refer to registers everywhere because register files in RISC-V always have 32 registers.\nIn case you’re wondering about the double arrow, that’s because this rule is a clause in a “Mapping”, a Sail innovation that basically means a bidirectional function: it can be used to decode binary instructions into structured objects, and to encode structured objects into binary instructions (that’s why it’s called encdec!).\nSail-RISCV contains ~280-290 rules of this form, and hundreds of other rules, sub-rules, and varied logic describing how RISC-V instructions are assembled, executed, how memory is accessed, how privilege levels and syscalls work, and so on.\nWe want this information, and we want it in C. We could port it by hand into C (good luck finishing in 2/3 years :(), or… we can use trusty code generation.\nRISC-V Auto-Sync So this is what my GSoC project this year was all about:\nUse Sail’s compiler (written in OCaml) as a library to load, parse, and typecheck Sail-RISCV Process the AST of Sail-RISCV and generate data structures representing the important logic Generate C code from those data structures That is, the code I wrote transformed the rule above into the following C code:\n// ---------------------------ITYPE------------------------------- { if (((binary_stream \u0026amp; 0x000000000000007F) == 0x13)) { uint64_t op = 0xFFFFFFFFFFFFFFFF; switch ((binary_stream \u0026amp; 0x0000000000007000) \u0026gt;\u0026gt; 12) { case 0x7: op = RISCV_ANDI; break; case 0x3: op = RISCV_SLTIU; break; case 0x2: op = RISCV_SLTI; break; case 0x6: op = RISCV_ORI; break; case 0x4: op = RISCV_XORI; break; case 0x0: op = RISCV_ADDI; break; } if (op != 0xFFFFFFFFFFFFFFFF) { uint64_t rd = (binary_stream \u0026amp; 0x0000000000000F80) \u0026gt;\u0026gt; 7; uint64_t rs1 = (binary_stream \u0026amp; 0x00000000000F8000) \u0026gt;\u0026gt; 15; uint64_t imm = (binary_stream \u0026amp; 0x00000000FFF00000) \u0026gt;\u0026gt; 20; tree-\u0026gt;ast_node_type = 
RISCV_ITYPE; tree-\u0026gt;ast_node.itype.imm = imm; tree-\u0026gt;ast_node.itype.rs1 = rs1; tree-\u0026gt;ast_node.itype.rd = rd; tree-\u0026gt;ast_node.itype.op = op; return; } } } //------------------------------------------------------------ This low-level soup of shifts and masks performs the exact logic described in the Sail snippet earlier, just in C. It continues on like that for 9K lines of generated code (#includeing approximately 2K lines of generated AST definition).\nIn addition to this, the logic that disassembles the decoded structured objects into strings is also translated. Overall, the generated code is about 20K lines of C, but it’s still not finished yet.\nLoose Ends The 20K lines of generated C code are still not merged into Capstone. Before merging, additional logic must be incorporated into the generated decode functions, and the type of each operand must be inferred (whether it’s a register, a memory address, or a literal; if it’s a register, whether it’s floating point or integer; etc…). Those details can easily consume another writeup, but they\u0026rsquo;re all manageable.\nThe generator itself is ~2500 lines of idiomatic (I hope) OCaml. But since Sail is a complex language, the tool must make some assumptions about the input that might not survive the evolution of Sail-RISCV, the application to other Sail models, or the active evolution of Sail itself.\nFinally, we haven’t addressed the original problem yet! I hope to eventually write the grey arrow in the first diagram: Lifting RISC-V instructions to RzIL code.\nConclusion Let’s summarize this rollercoaster journey:\nWe just wanted to write a binary lifter for RISC-V instructions into RzIL, Rizin’s intermediate language. But in order to do that, we first need an up-to-date RISC-V decoder/disassembler. Rizin depends on Capstone for RISC-V disassembly, but Capstone RISC-V disassembly logic is ported from old LLVM logic that is not up-to-date. 
Even modern LLVM is not completely up-to-date with RISC-V. But Sail-RISCV is, and it\u0026rsquo;s adopted by the RISC-V foundation as the most authoritative model of the RISC-V architecture. And thus, we can generate a Capstone disassembler module from Sail-RISCV, by depending on the Sail compiler as a library. It was fun. Frustrating and long-winded at times, but what kind of programming isn’t? That’s part of the thrill anyway!\nThat\u0026rsquo;s all and Happy Holidays! Keep coding through the wind and the snow.\n","permalink":"https://rizin.re/posts/gsoc-2024-auto-sync-sail/","summary":"A description of the original GSoC 2024 task plans of updating RISC-V disassembler in Capstone, updating it in Rizin, and implementing RzIL uplifting and the actual progress","title":"GSoC 2024 - RISC-V Capstone auto-sync and RzIL uplifting"},{"content":"We are grateful to Google for being able to participate in Google Summer of Code 2024. We received many applications and are happy that the project has substantial interest. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us with the platform to attract new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue.\nThis summer, the accepted projects aim to improve the ROP gadget searching capability, add an ROP compiler in Rizin, and uplift more architectures to our next-generation intermediate language - RzIL.\nz3phyr: Exploitation Capabilities Improvements Hi! I’m Giridhar Prasath Rajendran (a.k.a z3phyr). I’m a student pursuing my Master\u0026rsquo;s in Cybersecurity at the University of Maryland, College Park. I have experience developing user-space networks and web applications. I enjoy playing binary exploitation CTF challenges and am passionate about anything low-level.\nI started contributing to Rizin in January 2024 by fixing a UI issue PR#4095. 
As a part of my microtask, fixing Issue#1259 seemed perfect as I learned more about Linux heap exploitation. PR#4355, PR#4426 were merged as a part of fixing this issue. This helped me strengthen my knowledge about TLS and Glibc heap internals. I was also using this feature in my binary exploitation class assignments.\nI will integrate the ROP chain generation feature with the rz-gg tool. This involves:\nGadget Analysis: Implementing functionality to analyze raw assembly gadgets, categorize them based on their semantics (e.g., load, store, syscall), and store this information in a Gadget DB. Gadget Selection and Chaining: Developing an API to process constraints (e.g., set register RDI to \u0026ldquo;/bin/sh\u0026rdquo;, set RSI to NULL) using an SMT solver. This API will automatically select appropriate gadgets from the database to construct a functional ROP chain. Interaction: Provide an interface in rz-gg tool to allow users to generate ROP chains based on the specified constraints The initial support will cover architectures like x86, x86-64, and ARM.\nI look forward to a great summer of contributing and learning. I would like to thank the maintainers for their support and Google for providing this opportunity.\nmoste00: Uplifting RISC-V Instructions to RzIL Hello, my name is Mostafa Mahmoud (aka moste00 @ github, kotlinenjoyer @ mattermost). I graduated from Cairo University, Faculty of Engineering, a Computer Engineer. I\u0026rsquo;m passionate about anything and everything involving metaprogramming, macro systems, compilers, developer tools, IDEs, debuggers, virtualization \u0026amp; virtual machines, operating systems, computer architectures and hardware, plugin systems, and many other things! 
If I were to summarize my interests in one sentence, I would say that I like programs that \u0026ldquo;manage\u0026rdquo; or \u0026ldquo;provide services for\u0026rdquo; other programs, programs that serve as the bottom layers in a software stack, programs that manipulate other programs as data, and programs that other programs \u0026ldquo;run on top of\u0026rdquo;, so to speak.\nI started to contribute to Rizin in early March 2024. My first PR was a cleaning up of and removing dead code in the source for the database SDB, then I cleaned up the code for the Z80 assembler by removing all global variables and moving them into a state struct. The great people at Rizin were consistently helpful at every step, first providing tips on how to build Rizin, set up its dev env, navigate its large codebase, and provide helpful code reviews when I had the PRs ready.\nOn advice from my mentor in Rizin, I started to write a RzIL lifter for the very simple architecture MSP430, the purpose being to get used to writing IL lifters and encountering the challenges that people typically face while writing one. The incomplete lifter is here. It still needs to be finished as of the time of writing this, but I hope it will soon be!\nThe MSP430 pull request is itself just a micro-task and preparation for the main task I will do in GSoC 2024: Lifting the RISC-V architecture assembly into the RzIL intermediate language, a 350-hour project. I have set the MVP for this project at the point where the RV-32I and the RV-32F (Integer instructions and Floating-Point instructions, respectively) subsets of RISC-V are both lifted and tested using trace-testing, but see my proposal for stretch goals as well as more details, such as the proposed road plan.\nIn conclusion, by the end of my GSoC project in approximately October or early November of 2024, I hope that Rizin will have the capability to transform into RzIL the assembly for two new architectures, the MSP430, and the RISC-V instruction set. 
My goal is to implement this in a clean, efficient, and easy-to-reason-about and maintain fashion. I hope that I get more competent and knowledgeable about different Computer architectures and IL compilers because of working with all the smart and knowledgeable people around me in Rizin. I hope Rizin continues to attract contributors interested in low-level programming and reverse-engineering dev tools.\nIt\u0026rsquo;s always a pleasure to participate in GSOC; I look forward to a summer full of hacking :\u0026rsquo;).\n","permalink":"https://rizin.re/posts/gsoc-2024-announcement/","summary":"An announcement of the Google Summer of Code 2024. Two accepted candidates.","title":"Google Summer of Code 2024 Announcement"},{"content":"Updated on 2024.10.01\nA disassembler is obviously a must-have tool to do any reversing task. But using just any disassembler, especially for frameworks like Rizin, doesn\u0026rsquo;t really do it.\nThere are several capabilities which would be nice to have.\nIt should:\nBe correct. And if it isn\u0026rsquo;t, it should be easy to test and spot the error (in our case we want to compare the output directly to llvm-objdump). Provide a single API for multiple architectures. Support niche architectures or make it relatively easy to add them. Apart from the text disassembly, provide additional information about the operands and other meta-data. Be easy to update when new processor extensions come out. Relatively lightweight. Written in C or any other language that is easy to integrate into C/C++ software (specifically needed by Rizin/Cutter) One of the first disassembler engines which was capable of some of those points was Capstone. Quynh Nguyen Anh, the author of Capstone, figured that all the information we need, is basically already there in compiler projects like LLVM.\nObviously, for compilation you need the same and much more information you would need for disassembling. 
And you need them in a well-defined and machine-readable way.\nSo, what Capstone did, was re-implementing the LLVM disassembler logic in C, add meta-data for each instruction from the architecture definitions (also given in the LLVM-project) and add a single API to interact with it.\nTo summarize, Capstone is in the end:\nA more lightweight API than LLVM because it re-implements only the necessary code for disassembly from LLVM. Can support as many architectures as LLVM supports (if someone ports them to Capstone). Provides more information than the textual disassembly one gets via llvm-objdump. Relies on a well maintained and large project, which will (likely) be there even in 10+ years and is managed by people who know more about the architectures. The big problem with Capstone was though, that it hadn\u0026rsquo;t a working update mechanism. There were a bunch of Python scripts and very little documentation. Definitely an unsustainable solution.\nDue to this, Capstone became outdated over the years and most disassembler modules didn\u0026rsquo;t support modern processor extensions.\nWhat can be done? Besides LLVM we attempted once to generate a disassembler module for the Hexagon architecture (a DSP architecture from Qualcomm). But instead of LLVM, we used the ISA PDF for our first try. We parsed it and generated the decoding tables for the instructions. This worked, but was a little messy. Parsing PDF files is not fun and as soon as the PDF file changes somehow, stuff is broken again. Also, it is hard to test if you actually extracted the encoding information from the PDF correctly.\nOur second attempt uses LLVM. LLVM provides a way to get the definitions of an architecture in JSON format (llvm-tblgen --dump-json). The JSON dump has all the details about instructions you can wish for. Opcodes, operand types, read/write info and more. Pretty much anything you could wish for. 
With this experience we concluded that LLVM is a good source for disassembler generation.\nSo we decided to extend Capstone with a proper updater. The alternative, implementing something Capstone-like from scratch in a new project, did not seem like a good idea. Capstone already has a large user base, and we would need to migrate to the new tool as well. The migration is maybe annoying but doable. Building up a user base again is a much harder task.\nSo over the last two years we added an updater to Capstone. With it, we updated some core modules (ARM, AArch64, PPC + Paired Single, SystemZ, Mips + NanoMips) and added new ones (Alpha, TriCore, Xtensa, HPPA). And to our delight jiegec and FurryAcetylCoA added support for LoongArch.\nIn the following blog post we\u0026rsquo;ll not just explain the update procedure in detail, but also reflect on some challenges and problems you run into when you generate disassemblers.\nHow LLVM generates its disassemblers LLVM is the ground truth Capstone is built on. So let\u0026rsquo;s start with it.\nHow is the LLVM disassembler generated and how does it work?\nLLVM defines its various supported architectures in a language specifically designed for this purpose. Each target\u0026rsquo;s instructions, instruction operands, scheduling information and more are written in the TableGen language. The definitions can be found in llvm/lib/Target/\u0026lt;TARGET-NAME\u0026gt;/*.td.\nPlease note that from now on we use \u0026ldquo;target\u0026rdquo; and \u0026ldquo;architecture\u0026rdquo; as interchangeable terms. In the LLVM realm we speak about a target, in the Capstone context about an architecture.\nSince each target is defined in the same way, LLVM can apply the same procedures to all of them to generate C++ code. This is way better than implementing every target directly in C++, because otherwise a disassembler would have to be implemented again and again for each target. 
This would of course be too much effort. With TableGen every instruction is already well-defined in the td files. So, LLVM uses a universal method to generate a decoder for each of them. There is still some manual work left, e.g. handling edge cases and parsing operand bits. But the core logic is generated.\nTo use the content of the td files in a programmable way, the llvm-tblgen tool parses them, converts their content into C++ classes and saves them in a RecordKeeper. The RecordKeeper class is TableGen\u0026rsquo;s internal representation of the td files\u0026rsquo; content, which can now be used to generate arbitrary code. These classes basically hold all the td file information in a uniform and programmable way. The C++ classes belong logically to the so-called CodeGen layer, which, as the name already says, is used to generate code.\nExample:\nDefinition of the ARM setend instruction in TableGen:\ndef SETEND : AXI\u0026lt;(outs), (ins setend_op:$end), MiscFrm, NoItinerary, \u0026#34;setend\\t$end\u0026#34;, []\u0026gt;, Requires\u0026lt;[IsARM]\u0026gt;, Deprecated\u0026lt;HasV8Ops\u0026gt; { bits\u0026lt;1\u0026gt; end; let Inst{31-10} = 0b1111000100000001000000; let Inst{9} = end; let Inst{8-0} = 0; } becomes a C++ class of the form of this\nSETEND {\t// InstructionEncoding Instruction InstTemplate Encoding InstARM XI AXI Requires Deprecated // Instruction bits, the operand bits are marked (in this case only one bit for \u0026#34;end\u0026#34;). 
field bits\u0026lt;32\u0026gt; Inst = { 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, end{0}, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; field bits\u0026lt;32\u0026gt; Unpredictable = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; field bits\u0026lt;32\u0026gt; SoftFail = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; int Size = 4; string DecoderNamespace = \u0026#34;ARM\u0026#34;; list\u0026lt;Predicate\u0026gt; Predicates = [IsARM]; string DecoderMethod = \u0026#34;\u0026#34;; bit hasCompleteDecoder = 1; string Namespace = \u0026#34;ARM\u0026#34;; // Note this list of in and out operands. Setend has only an operand which is read and no operands it writes. dag OutOperandList = (outs); dag InOperandList = (ins setend_op:$end); string AsmString = \u0026#34;setend\t$end\u0026#34;; list\u0026lt;dag\u0026gt; Pattern = []; list\u0026lt;Register\u0026gt; Uses = []; list\u0026lt;Register\u0026gt; Defs = []; int CodeSize = 0; int AddedComplexity = 0; bit isPreISelOpcode = 0; bit isReturn = 0; bit isBranch = 0; ... bit isBarrier = 0; bit isCall = 0; bit isAdd = 0; bit isTrap = 0; bit canFoldAsLoad = 0; bit mayLoad = ?; bit mayStore = ?; bit mayRaiseFPException = 0; ... bit doubleWidthResult = 0; SubtargetFeature DeprecatedFeatureMask = HasV8Ops; bits\u0026lt;1\u0026gt; end = { ? }; } Note, that the C++ class above has the same structure for each target. Hence, LLVM\u0026rsquo;s code generation, can reason on them without the need to know specific target details.\nNow, what code is actually generated? This depends on what you need. TableGen has several backends. Each of them uses the RecordKeeper's content to generate different files. For example, the RegisterInfo backend generates several tables with information about target registers. As mentioned above, the RegisterInfo backend doesn\u0026rsquo;t need to know details about targets specific registers. 
It just implements methods to generate an enumeration with all register names. Or it generates tables which map registers to their aliases, or lookup tables which map bits to a register ID.\nGenerating enumerations is nice, but more complex C++ code is, of course, also generated. Most relevant for us is the decoding logic, which decodes a byte sequence into a Machine Code instruction. A Machine Code instruction (MCInst) is the class which represents a target\u0026rsquo;s decoded instruction. It holds the ID of the instruction, its operands, some flags (isBranch etc.), and more.\nDecoding procedures from bytes to MCInst are the same for each target (except x86, for historical reasons I guess). In the CodeGen layer we still know the encoding of each instruction of a target. Another backend, the DecoderEmitter, consumes these encodings and builds a state machine over them. The generated state machine simply checks certain bits and transitions into states. The end state is either an identified instruction or a disassembly failure. After the instruction ID is decoded, a big switch case is walked over to call the different decoder methods of the instruction\u0026rsquo;s operands.\nCheck out [`PPCGenDisassemblerTables.inc`](https://github.com/capstone-engine/capstone/blob/next/arch/PowerPC/PPCGenDisassemblerTables.inc). It contains this state machine. Or see the examples below. The key is: the state machine table and the big switch cases can be generated independently of the target. Each target still needs to implement the operand decoders, because those are unique, but this is essentially it. 
It saves quite some work, compared to implementing the decoding logic every time again and again.\nExcerpt from state machine\nstatic const uint8_t DecoderTableARM32[] = { // What to do in the state | Bits to check or to extract /* 0 */ MCD::OPC_ExtractField, 25, 3, // Extract 3 bits at offset 25 from byte sequence /* 3 */ MCD::OPC_FilterValue, 0, 47, 14, 0 // Check the certain bits for properties and transition to another state depending on the result. /* 8 */ MCD::OPC_ExtractField, 21, 1, /* 11 */ MCD::OPC_FilterValue, 0, 110, 7, 0 /* 16 */ MCD::OPC_ExtractField, 24, 1, /* 19 */ MCD::OPC_FilterValue, 0, 139, 1, 0 /* 24 */ MCD::OPC_ExtractField, 4, 1, /* 27 */ MCD::OPC_FilterValue, 0, 123, 0, 0 /* 32 */ MCD::OPC_ExtractField, 22, 2, /* 35 */ MCD::OPC_FilterValue, 0, 25, 0, 0 /* 40 */ MCD::OPC_CheckPredicate, 0, 11, 0, 0 // Check if a predicate is fulfilled (CPU feature X enabled etc.) and transision in a certain state depending on the result. ... Excerpt of the operand decoding switch statement\nswitch (Idx) { default: llvm_unreachable(\u0026#34;Invalid index!\u0026#34;); case 0: // Extract bits of operand tmp = fieldFromInstruction(insn, 12, 4); // Decode a GPR register and check if it worked. if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 16, 4); if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 0, 4); if (!Check(S, DecodeGPRRegisterClass(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 28, 4); if (!Check(S, DecodePredicateOperand(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } tmp = fieldFromInstruction(insn, 20, 1); if (!Check(S, DecodeCCOutOperand(MI, tmp, Address, Decoder))) { return MCDisassembler::Fail; } return S; A target\u0026rsquo;s disassembler module in LLVM consists effectively of two parts. 
The generated logic (those are written to .inc files) and handwritten decoder and printing methods. Decoders for operands like DecodeGPRRegisterClass from above, need to be implemented per target. They cannot be generated currently.\nThe handwritten code is in files like \u0026lt;ARCH\u0026gt;Disassembler.cpp or \u0026lt;ARCH\u0026gt;AsmWriter.cpp in their respective target source directories.\nLLVM to Capstone Capstone simply copies the LLVM disassembler and enriches the output. Because we do not want to build LLVM just to build Capstone (LLVM is a huge dependency), we have to tackle two problems:\nLLVM code is in C++, Capstone in C. The LLVM disassembler has no knowledge about read/write access of operands or instruction groups. Theoretically it could, but it is not implemented. For Capstone we need this information though. C++ and C For Capstone we need the C++ files in C. We also need the generated *.inc files, as well as handwritten disassembler components (\u0026lt;ARCH\u0026gt;Disassembler.cpp and \u0026lt;ARCH\u0026gt;AsmWriter.cpp from above).\nLets see how to get them in C.\nGenerate C code with TableGen We already described the generation procedure of the .inc files above in detail. Though what we have not mentioned is the way the actual code is emitted. The problem with the TableGen backends is, they do not separate the generation of their data from the actual printing of the code.\nFor example, the backend which generates the AsmWriter (the module which prints an asm string of an instruction) mixes it\u0026rsquo;s table generation with emitting code. There is no clear separation between generating abstract objects, like state machines and tables, and printing them into code. It is all intermingled.\nThis goes so far that it is even allowed to specify custom code in the td files for operands or instructions.\nSo, if we want TableGen backends emit C, we either need to redesign and rewrite them from scratch or patch them. 
Designing it from scratch is a rather complex task and needs a lot of thought (see this discussion). Simply because of time constraints and because we don\u0026rsquo;t know if it will be merged, we sided with patching.\nOur patched TableGen backends work in a pretty straightforward way. We add two new classes which only emit code: PrinterLLVM and PrinterCapstone. The PrinterLLVM emits the standard C++ code from LLVM. PrinterCapstone emits our C code. Each backend gets one of those printer classes assigned. And whenever it emits code, it calls the corresponding method of the printer. In practice, we simply moved the emitting code from the backend to the printer classes.\nGeneral design problems with TableGen backends The problem is, it is ugly. Although we are now able to emit C, it only works because C and C++ are so similar. An array initialization in C++ is almost the same as in C. So the code structure is basically the same. But there is no way to emit the same information (tables, functions etc.) in a different order or in a fundamentally different language (think of Lisp). This is a direct consequence of how the backends were built. Because there is no clear separation between generating logic and printing it as code, the backends in the current form cannot be refactored nicely to emit code in other languages than C++.\nMost of it has also been untouched for 10 years and was never modernized. This is understandable, since it never really was necessary. It works for the current use case (generate code for LLVM tools). But it doesn\u0026rsquo;t allow using the generated logic in any other way.\nFor example, the state machine for decoding bytes to instructions is useful logic, also for non-LLVM projects. It could be written once and used by everyone else. But it is pretty much hard-coded to provide the state machine only in C++.\nThis is unfortunate. LLVM is a huge project, and many tools use the information about architectures it provides. 
Providing these kinds of often used algorithms in an accessible way would be a nice addition.\nTranslating C++ to C But back to the problem at hand. While we now have the generated C code, we still have handwritten code in C++. As mentioned before, the operand decoder and printer methods are handwritten in LLVM. Additionally, some edge cases are handled there as well. These files have to be translated from C++ to C.\nDoing this by hand is a tedious task. We need to do it for every architecture module again and again, because those files are not shared between targets. And if we add a new architecture module from LLVM to Capstone, we would need to translate multiple thousand lines of C++ to C.\nThis of course is not particularly fun and keeps people from doing it at all. Hence, we built the Auto-Sync framework to do most of the annoying work.\nThe translation process follows a simple procedure. We have a bunch of patches defined. Each patch replaces certain syntax in a C++ file with its C equivalent.\nTo find the patterns we want to replace we use tree-sitter. It allows us to query for specific syntax in the abstract syntax tree (AST) of the file. And since we translate source code, it is way easier to search in an AST instead of in the file content itself.\nTo control the patching, we have a controller called CppTranslator. It simply:\nOpens each source file Reads and parses the file with tree-sitter into an AST for each Patch: Match the Patch\u0026rsquo;s tree-sitter query in the AST. If it found something, get the equivalent C code from the Patch. Replace the C++ code with the C equivalent. Example:\nMI::addImm(int(10)); Let\u0026rsquo;s say we want to patch int(10) to its C equivalent of (int)(10). The Patch for this has a tree-sitter query for this pattern:\n// The @ names elements in the query. (call_expression // Matches a call expression. (primitive_type) @cast_type // Matches primitive types like int, unsigned etc. 
(argument_list) @cast_target // Matches anything within the () brackets ) @cast If the CppTranslator finds a substring matching the pattern, it passes it as a capture to the Patch. A capture is just a dictionary with the named sub-strings found. In our example it contains cast: \u0026quot;int(10)\u0026quot;, cast_type: \u0026quot;int\u0026quot; and cast_target: \u0026quot;(10)\u0026quot;.\nNow it is trivial to concatenate sub-strings to (int)(10) and return it. The CppTranslator now replaces int(10) in the source file with (int)(10).\nThe result:\nMI::addImm((int)(10)); This is done with most C++ syntax. Of course, there are exceptions. Some C++ concepts are so complex to replace that we implement special scripts for them (e.g. C++ templates). But the end result is a source file which has very little C++ syntax left.\nDiffing Note: The diffing step is still unstable and is not yet reproducible. Automatic translation of C++ files only gets you so far. After all patches were applied to the file, it will likely not compile. Some syntax issues are just too difficult to fix automatically.\nFixing a handful of issues by hand again and again is a tedious task, especially if you need to run the whole translation procedure multiple times. We can hardly ask users to do the fixes by hand again every time they run the translator.\nThe mechanism to solve this annoyance is diffing. It basically works as you know it from git. The difference is that it isn\u0026rsquo;t file focused like git but diffs the nodes matched by tree-sitter queries. You can decide what to diff in the configuration, but let\u0026rsquo;s go over it in an example:\nSo let\u0026rsquo;s assume we update a fictional architecture.\nWe have an old \u0026lt;ARCH\u0026gt;Disassembler.c file which might need an update. 
In there we have a function which decodes an operand:\nvoid decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { if (MCInst_isPredicable(MI)) { MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } The same function in the original C++ file looks like this:\nvoid decodeOperandA(MCInst \u0026amp;MI, unsigned OpNum, unsigned Val) { if (MI.isPredicable()) { MI.createImm(Val + 1); } MI.createImm(Val); } Now, after we ran the CppTranslator the result is almost valid C. But the translation was not perfect and there is still a method invocation left:\nvoid decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { if (MI.isPredicable()) { // The isPredicable method call was not translated. MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } Capstone\u0026rsquo;s MCInst struct doesn\u0026rsquo;t have a callback member isPredicable(). So this code would not compile. Instead, we need to replace it with the function call MCInst_isPredicable(MI).\nFor whatever reason no Patch was added, and we now have to fix it by hand. Note though that the old file (see above) already has the correct function implementation. So instead of fixing it again by hand, we diff the previous function to the newly translated code and let the user decide what to do.\nPatch: 15/230 Node: \u0026#34;Some Node\u0026#34; +Color: NEW FILE - (Just translated) -Color: OLD FILE - (Currently in Capstone) ⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼⎼ void decodeOperandA(MCInst *MI, unsigned OpNum, unsigned Val) { -\tif (MCInst_isPredicable(MI)) { +\tif (MI.isPredicable()) { MCOperand_CreateImm0(MI, Val + 1); } MCOperand_CreateImm0(MI, Val); } ═════════════════════════════════════════════════════════════════════════════════════════════════════════ Choice: O, o, n, s (none) , e, p, q, ? \u0026gt; ? 
O\t- Accept ALL old diffs o\t- Accept old diff n\t- Accept new diff e\t- Edit diff (not yet implemented) s (none) - Select saved choice p\t- Ignore and go to previous diff q\t- Quit (previous selections will be saved) ?\t- Show this help The user can accept the version from the old file or accept the version from the new file. The version from the new file would not compile, but it can be fixed by hand later.\nIn most cases though, the old version is the correct one, because it was fixed by someone before.\nThe diffing happens for each translated function which doesn\u0026rsquo;t match the old code. Of course, you can diff not just functions but any node in the AST of a file. And for convenience the choices are saved as well. So if the update is run again and nothing changed, the user doesn\u0026rsquo;t have to redo previous decisions. It just automatically applies them.\nThis diffing step saves a lot of time.\nAdding new architectures Adding a new architecture module works pretty much the same as above.\nGenerally though it gives us a standardized way of doing it. And if you know one architecture module in Capstone, you know them all. If LLVM doesn\u0026rsquo;t support your architecture you may find a fork which does (this is the case for TriCore or EVM).\nIn fact, we added two niche architectures this way. TriCore was only implemented in a fork and never upstreamed. And the DEC Alpha architecture support was dropped in LLVM 4. We just added the td files again, and here we go, we have support for Alpha in Capstone.\nLast overview To give you a last overview of which components were updated and how they interact in Capstone, take a look at this diagram:\nARCH_LLVM_getInstr( ARCH_getInstr(bytes) ┌───┐ bytes) ┌─────────┐ ┌──────────┐ ┌──────────────────────►│ A ├──────────────────► │ ├───────────►│ ├────┐ │ │ R │ │ LLVM │ │ LLVM │ │ Decode │ │ C │ │ │ │ │ │ Instr. 
│ │ H │ │ │decode(Op0) │ │◄───┘ ┌────────┐ disasm(bytes) ┌──────────┴──┐ │ │ │ Disass- │ ◄──────────┤ Decoder │ │CS Core ├──────────────►│ ARCH Module │ │ │ │ embler ├──────────► │ State │ └────────┘ └─────────────┘ │ M │ │ │ │ Machine │ ▲ │ A │ │ │decode(Op1) │ │ │ │ P │ │ │ ◄──────────┤ │ │ │ P │ │ ├──────────► │ │ │ │ I │ │ │ │ │ │ │ N │ │ │ │ │ └───────────────────────┤ G │◄───────────────────┤ │◄───────────┤ │ └───┘ └─────────┘ └──────────┘ The Capstone Core, Arch Module and Arch_Mapping provide the API to the LLVM disassembler logic. We have not spoken about those because they are irrelevant for the topic of generating disassemblers.\nThe two boxes on the right are the code copies from LLVM, which do the actual decoding work. The LLVM Disassembler component decodes single operands and handles special cases. This one was translated by the CppTranslator, while the LLVM Decoder State Machine was generated by our patched LLVM backends.\nThe same structure applies to the printing of the asm text, though we have only scratched the surface of it here for the sake of brevity.\nWrap up If one looks at the whole update procedure, it is still rather complicated. But the result is worth it.\nThe amount of time someone has to spend for updating an architecture module in Capstone went down from \u0026ldquo;no one did it\u0026rdquo; to roughly 6-29 hours. 
To update the ARM architecture module to LLVM 16 for example, the times were:\nRebasing patched backends to new LLVM release = ~1-3h Running the update scripts and diffing = 5min - 1h Fixing rest of build errors by hand = ~30min - 5h Handle new operands on the CS side (filling the detail info and tests) - 3-10h Bug fixing - 2h-10h (Please be aware though, that the time estimates above don\u0026rsquo;t include the \u0026ldquo;read into\u0026rdquo; time someone has to spend).\nThe lower estimates are for small changes on the LLVM side, the upper for many changes (spanning multiple LLVM releases).\nHere is the process described above in action:\nFuture plans We have a list of features and architectures which will come. Most architectures are already updated, but ARC, BPF and SPARC are still on the list.\nWhile working on the updater we found many shortcomings or flaws in the target definitions in LLVM. Those will be upstreamed to LLVM eventually.\nIn the very long run we would like to work with the LLVM folks on redesigning the TableGen backends. It would be nice to have the problems we mentioned above solved.\nAnd of course, if you want to update an existing Capstone module now or add support for a new one, feel free to drop a message in issue #2015. We are happy to hear about it and will guide you through the process.\nReferences Auto-Sync progress issue Auto-Sync documentation TableGen documentation Capstone\u0026rsquo;s LLVM fork ","permalink":"https://rizin.re/posts/auto-sync/","summary":"Auto-Sync","title":"Auto-Sync - Generating disassembler plugins"},{"content":"Hi! I\u0026rsquo;m Billow, and I had the privilege of participating in GSoC 2023, working on improving DWARF support for the Rizin project. In this blog post, I\u0026rsquo;m excited to share my journey, the challenges I faced, and my future plans for this project. 
Let\u0026rsquo;s dive right in!\nOver the past few months, my primary focus has been on enhancing the Debugging With Arbitrary Record Formats (DWARF) support within Rizin. DWARF is a crucial standard for debugging information in binary files. My work brings significant improvements, including the introduction of exprloc, compressed debug sections and composite variable storage.\nTo showcase some of my achievements, I\u0026rsquo;m comparing the disassembly output obtained using the pdf command for the write_fmt\u0026lt;Write\u0026gt; function in the ELF file dwarf_rust_bubble ↗ before and after my DWARF contributions were integrated. The enhanced output demonstrates Rizin\u0026rsquo;s improved ability to parse DWARF debugging information and precisely locate variables.\n[0x00005180]\u0026gt; pdf @ dbg.write_fmt_Write ... ┌ dbg.write_fmt\u0026lt;Write\u0026gt;(); │ ; var int64_t var_28h @ stack - 0x28 │ ; var int64_t var_18h @ stack - 0x18 │ 0x00010270 push rbx ; impls.rs:155 ; struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt); ... [0x00005180]\u0026gt; pdf @ dbg.write_fmt_Write ... ┌ struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt) ... │ ; arg struct Box\u0026lt;Write\u0026gt; *self @ rsi │ ; arg struct Arguments fmt @ ... │ 0x00010270 push rbx ; impls.rs:155 ; struct Result\u0026lt;(), std::io::error::Error\u0026gt; write_fmt\u0026lt;Write\u0026gt;(struct Box\u0026lt;Write\u0026gt; *self, struct Arguments fmt) ... Another example, the iterPreorder function in the ELF file dwarf_go_tree ↗ shows arguments and local variables precisely located on the stack. Most notably, the tree argument is represented as a composite variable spread across multiple stack locations. This new composite storage capability handles complex DWARF types. 
Overall, the improved output matches the original DWARF debugging data much more closely, rather than using generic unknown types. This showcases Rizin’s significantly upgraded DWARF parsing including features like composite variables and the immense value delivered to reverse engineers through my work.\n[0x0045d5a0]\u0026gt; pdf @ dbg.main.tree.iterPreorder ... ┌ dbg.main.tree.iterPreorder(unknown_t visit, unknown_t t); ... │ ; arg unknown_t t @ stack + 0x10 │ ; arg unknown_t visit @ stack + 0x20 │ ; var unknown_t traverse @ stack + 0x40 │ ┌─\u0026gt; 0x00491ce0 mov rcx, qword fs:[0xfffffffffffffff8] ; tree.go:26 ; void main.tree.iterPreorder(struct main.tree t, func(int) visit); │ ╎ 0x00491ce9 cmp rsp, qword [rcx + 0x10] [0x0045d5a0]\u0026gt; pdf @ dbg.main.tree.iterPreorder ... ┌ void main.tree.iterPreorder(main.tree t, func(int) visit) ... │ ; var func(int) traverse @ stack - 0x40 │ ; arg main.tree t @ composite: [(.0, 64): stack + 0x8, (.0, 64): stack + 0x10, (.0, 64): stack + 0x18] │ ; arg func(int) visit @ stack + 0x20 │ ┌─\u0026gt; 0x00491ce0 mov rcx, qword fs:[0xfffffffffffffff8] ; tree.go:26 ; void main.tree.iterPreorder(main.tree t, func(int) visit) While I\u0026rsquo;m proud of these accomplishments, the project did not come without its challenges. Working on a project of this magnitude required grappling with the complexity of the DWARF5 standard and ensuring compatibility across architectures and binary formats through rigorous testing. Additionally, collaborating remotely with the Rizin community was essential but posed communication and coordination challenges.\nHowever, through dedication and guidance from my mentors, I was able to overcome these hurdles. Moreover, my journey with Rizin is far from over. 
Looking ahead, there are several exciting plans on the horizon:\nContinued DWARF5 Improvements: I will keep an eye on DWARF developments and ensure Rizin remains up-to-date with future revisions.\nPerformance Optimization: There is always room for performance optimization. I will explore ways to make Rizin even more efficient when dealing with DWARF information. The main idea is that we can make DWARF load only when needed, instead of loading all DWARF directly.\nUnifying Debug Information 1: As the reverse engineering landscape continues to evolve, the need for unified support of various debuginfo formats like DWARF, PDB, and others becomes increasingly evident. In the spirit of unification, I am excited to take on the challenge of integrating and harmonizing these diverse debuginfo standards within Rizin. This ambitious endeavor aims to provide a seamless experience for developers and analysts working with different binary formats, making Rizin an even more versatile and indispensable tool in the field of reverse engineering. Stay tuned for updates on this exciting journey towards unified debuginfo support!\nDWARF Call Frame Information: To utilize Call Frame Information (CFI) and Canonical Frame Address (CFA) data to accurately locate variables and function arguments on the stack. This will build on my previous work enhancing DWARF parsing as described in issues 2 and 3. Additional background on implementing stack unwinding with CFI and CFA can be found in this blog post 4. ↗ The goal is to leverage the debugging information already present in DWARF to reconstruct calling conventions and provide users more precise variable information during disassembly and analysis.\nIn conclusion, participating in GSoC 2023 has been an invaluable learning experience. I\u0026rsquo;ve expanded my skills, contributed to the Rizin project, and become part of a vibrant open-source community. 
There is still work to be done, but I\u0026rsquo;m excited about the future and making reverse engineering more efficient for everyone through enhancements like unified debuginfo support. Thank you to the Rizin community and my mentors for making this journey possible!\nUnify code of source information access for DWARF, PDB, dSYM\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nLoad function types and arguments from DWARF when CFA and CFI information is used\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nARMv7 failure to load register arguments when subroutine uses CFA\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nStack unwinding\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://rizin.re/posts/gsoc-2023-dwarf/","summary":"In this article, I discuss my experience enhancing DWARF support in Rizin for Google Summer of Code 2023.","title":"GSoC 2023 - Enhancing DWARF Support"},{"content":"Like all previous years, we are grateful to Google for being able to participate in Google Summer of Code 2023. We received many applications, and we are happy that the project has substantial interest. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us with the platform for attracting new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue.\nThis summer, the accepted projects aim to improve the quality of handling debugging information in Rizin and uplifting more architectures to our next-generation intermediate language - RzIL.\nbillow: Debug information handling improvements Hello. I’m billow. A silent open-source enthusiast. I will be working on improving the handling of debugging information in Rizin.\nI started contributing to Rizin in November 2021. It took some time to get to know the code base, but the maintainers were helpful and supportive. Without their help, I wouldn\u0026rsquo;t have accomplished this. 
Since November 2021, I have worked on various issues and features, such as uplifting the 8051 architecture to RzIL, refactoring graph-related code, porting the command parser to the tree-sitter-based one, fixing Windows and C++ compatibility, and adding EBCDIC character support. I am also working on uplifting the TriCore architecture to RzIL (#3478).\nThe plan, based on the project description and tracking issue #1285, involves several key objectives. Firstly, we will enable support for loading DWARF information from separate files 1 and debuginfod 2. Next, we will unify access to source line and type information for DWARF, PDB and dSYM, and refactor or fix any parsing code as needed. This will be followed by the integration of source line and types/variables information with the \u0026ldquo;p\u0026rdquo; commands in debug mode for seamless printing. Furthermore, we will integrate this information with breakpoint commands and APIs to enhance overall functionality. Finally, we will implement parsing performance improvements to optimize the project\u0026rsquo;s efficiency.\nbrightprogrammer: Uplifting MIPS to RzIL Hi! I’m Siddharth Mishra (a.k.a. brightprogrammer). I\u0026rsquo;m a student of Mathematics \u0026amp; Computing at the Birla Institute of Technology, Mesra. I like developing software in C/C++, and I love Reverse Engineering \u0026amp; Malware Analysis. My summer project this year is to convert the MIPS assembly code to RzIL (Rizin\u0026rsquo;s Intermediate Language). This conversion will help improve Rizin\u0026rsquo;s analysis module, which can help enhance reverse engineering efforts.\nI started contributing around two months before the application submission deadline. I helped convert the old Rizin shell to a new tree-sitter-based Rizin shell for a command group named cmd_help (PR#3421 and PR#3452). I\u0026rsquo;ve never had a formal course in Compilers and Formal Languages/Grammars, and this was the first time I saw how grammars are written! I was amazed by this. 
I also got a chance to write some tests for some of the changes I made. This was also a first-time experience. During this work period, I started to bond well with my mentors and other awesome contributors/maintainers.\nThe uplifting process is divided into two parts:\nThe first part is to convert MIPS to RzIL. The second part uses uplifted code to migrate analysis from ESIL (old intermediate language) to RzIL. By improving analysis, I mean better function detection, type detection, structure detection, better control flow graphs etc.\nOther small tasks are involved, too, like adding support in Rizin to help visualize RzIL and updating/implementing Cutter widgets to use the new RzIL code. Each of these steps needs to be heavily tested by updating and using rz-tracetest, which is used to generate instruction traces using BAP\u0026rsquo;s QEMU and then compare the trace with RzIL\u0026rsquo;s execution. If they match, and if we strongly assume that QEMU is emulating the code correctly, then RzIL is also working correctly!\nI\u0026rsquo;m starting work early to complete the project on time. I want to work hard this summer and learn a lot. Besides, working on this project will improve my knowledge of IL, which I need for a personal project that I intend to work on, and that is a personal motivation for me.\nDWARF is the debuginfo standard used on most operating systems today. Separate debuginfo files are files that contain debugging information extracted from executable binaries and shared libraries. These files help developers debug and diagnose issues in their software without bloating the primary executable or library with debug symbols. Debuginfo files typically have a .debug extension and are generated using tools like \u0026lsquo;objcopy\u0026rsquo; or \u0026rsquo;eu-strip\u0026rsquo;. 
They enable a more apparent separation between production code and debugging data, resulting in smaller binaries and improved performance in production environments, while still providing necessary debugging information when needed.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDebuginfod is a service that provides a convenient way to access debugging information for software components, such as executables, shared libraries, and separate debuginfo files. It is part of the ELFutils project and is designed to simplify the process of debugging and tracing software by automatically locating and retrieving the required debugging data over HTTP. Debuginfod works with various debugger tools like GDB, LLDB, and SystemTap, allowing developers to focus on debugging their code without worrying about managing debuginfo files.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://rizin.re/posts/gsoc-2023-announcement/","summary":"An announcement of the Google Summer of Code 2023. Two accepted candidates.","title":"Google Summer of Code 2023 Announcement"},{"content":"Hi! I\u0026rsquo;m DMaroo, a GSoC 2022 mentee, working on IL migration for x86 ISA in Rizin. For the past few months, I have worked on implementing x86 instructions (from 8086, 80186, 80286 and 80386 instruction sets) in Rizin\u0026rsquo;s intermediate language, RzIL.\nThe following article covers all the work, design decisions, challenges and future plans of the work that I\u0026rsquo;ve been doing. The RzIL can be accessed using aez commands (RzIL emulation).\nRelevant code Implementation of RzIL lifting for x86 instructions:\nPull request: rizinorg/rizin#2747 Commit: ce80a13 Files: x86_il.c, x86_il.h Adding tracing for x86 emulation in BAP\u0026rsquo;s QEMU: BinaryAnalysisPlatform/qemu#21\nAbstract RzIL is Rizin\u0026rsquo;s intermediate language. It is designed for improved analysis of binaries by serving as an intermediate language to run the analysis loop on. 
This removes the need to write architecture-specific analysis code. Having an intermediate language also allows for many other features like taint analysis, symbolic execution and de-obfuscation.\nMy goal for the project was to \u0026ldquo;lift\u0026rdquo; the x86 architecture to RzIL. Lifting here means implementing the x86 instructions from the x86 ISA using opcodes present in the IL. The IL is largely based on BAP\u0026rsquo;s Core Theory.\nThroughout my GSoC period, I have lifted the majority of the instructions belonging to the 8086, 80186, 80286 and 80386 instruction sets. These include almost all the commonly used instructions one can find in modern x86 binaries, and hence would be enough to do fairly satisfactory analysis. I outline my work in detail below.\nWork Getting instructions from Capstone Rizin uses Capstone for the disassembly of some of the instruction sets, including x86. So I had to start off by figuring out all the information that Capstone gives about an instruction. Capstone returns a cs_x86 struct which contains the instruction bytes, operands, prefixes and other data. I wrapped it in an X86ILIns struct for convenience. Once I had the wrappers ready, I had to write general helper functions for generating the opcodes for a variety of tasks, such as getting and setting the operands, registers, memory and flags, detecting arithmetic overflow, and so on.\nSetting up the helper functions I have conceptually outlined the rough thought process for the IL lifting in one of the posts on my blog about SuperH IL lifting. Following a similar design process, I set up the functions to get and set the various entities involved. I also set up the functions for overflow/underflow and carry/borrow. The challenges faced while setting up these functions are covered in the challenges section. 
Once these were set up, the lifting was just reading the implementations off of the ISA manual and translating them to the IL.\nImplementing the instructions This is probably the main part of the whole project. Now, I was supposed to go through the instruction manual\u0026rsquo;s implementation for all the instructions and convert them to the IL. However, I could not just blindly copy them, since the x86 instruction set frequently contains instructions with very weird special cases. Also, the x86 architecture is not simple: it has multiple modes of operation, as well as segmentation and paging support. All of this makes it non-trivial to implement the instructions. Many of the instructions are not practically possible to implement given the current scope of the IL and the disassembler.\nHowever, I did implement the majority of the instructions in the 8086, 80186, 80286 and 80386 instruction sets. That constitutes the meat of all the instructions used in x86 binaries. There are many more very specific and exotic instructions, but their relevance is diminishingly low. 
A sample implementation looks as follows:\n/** * ADD dest, src * (ADD family of instructions) * Add * dest = dest + src * Possible encodings: * - I * - MI * - MR * - RM */ IL_LIFTER(add) { RzILOpEffect *op1 = SETL(\u0026#34;op1\u0026#34;, x86_il_get_op(0)); RzILOpEffect *op2 = SETL(\u0026#34;op2\u0026#34;, x86_il_get_op(1)); RzILOpEffect *sum = SETL(\u0026#34;sum\u0026#34;, ADD(VARL(\u0026#34;op1\u0026#34;), VARL(\u0026#34;op2\u0026#34;))); RzILOpEffect *set_dest = x86_il_set_op(0, VARL(\u0026#34;sum\u0026#34;)); RzILOpEffect *set_res_flags = x86_il_set_result_flags(VARL(\u0026#34;sum\u0026#34;)); RzILOpEffect *set_arith_flags = x86_il_set_arithmetic_flags(VARL(\u0026#34;sum\u0026#34;), VARL(\u0026#34;op1\u0026#34;), VARL(\u0026#34;op2\u0026#34;), true); return SEQ6(op1, op2, sum, set_dest, set_res_flags, set_arith_flags); } And the generated opcode looks something like this:\nadd byte [eax], al (seq (set op1 (loadw 0 8 (+ (var eax) (bv 32 0x0)))) (set op2 (cast 8 false (var eax))) (set sum (+ (var op1) (var op2))) (storew 0 (+ (var eax) (bv 32 0x0)) (var sum)) (set _result (var sum)) (set _popcnt (bv 8 0x0)) (set _val (cast 8 false (var _result))) (repeat (is_zero (var _val)) (seq (set _popcnt (+ (var _popcnt) (ite (lsb (var _val)) (bv 8 0x1) (bv 8 0x0)))) (set _val (\u0026gt;\u0026gt; (var _val) (bv 8 0x1) false)))) (set pf (is_zero (smod (var _popcnt) (bv 8 0x2)))) (set zf (is_zero (var _result))) (set sf (msb (var _result))) (set _result (var sum)) (set _x (var op1)) (set _y (var op2)) (set cf (|| (|| (\u0026amp;\u0026amp; (msb (var _x)) (msb (var _y))) (\u0026amp;\u0026amp; (! (msb (var _result))) (msb (var _y)))) (\u0026amp;\u0026amp; (msb (var _x)) (! (msb (var _result)))))) (set of (|| (\u0026amp;\u0026amp; (\u0026amp;\u0026amp; (! (msb (var _result))) (msb (var _x))) (msb (var _y))) (\u0026amp;\u0026amp; (\u0026amp;\u0026amp; (msb (var _result)) (! (msb (var _x)))) (! 
(msb (var _y)))))) (set af (|| (|| (\u0026amp;\u0026amp; (msb (cast 4 false (var _x))) (msb (cast 4 false (var _y)))) (\u0026amp;\u0026amp; (! (msb (cast 4 false (var _result)))) (msb (cast 4 false (var _y))))) (\u0026amp;\u0026amp; (msb (cast 4 false (var _x))) (! (msb (cast 4 false (var _result)))))))) As you can see, the IL is quite large even for a simple instruction like ADD. This is because of all the extra work which needs to be done other than just adding the operands. The operand needs to be loaded from the proper memory location (which requires using the correct segment base register, correct scale and correct offset). Once the addition is done, the flags need to be set. Setting the flags is not a simple operation since it requires finding the parity bit (which requires XORing the bits in a loop). Once all the flag bits are set, the result is written back to the memory.\nAs of this post, 100+ such instructions have been lifted to the IL. There are more instructions to be lifted as well, but as I stated above, these are enough for a start and testing.\nOnce I was done with implementing the instructions, I added IL tests for the implemented instructions to the tests in tests/db/asm/x86_*. This part ensures that the generated IL code for the instructions is type-safe and doesn\u0026rsquo;t have any memory issues.\nEnabling tracing in QEMU Now to verify the semantics of the IL instructions, we use traces generated by QEMU and compare them with the effects of the IL when executed by the RzIL VM. We use a fork of QEMU, so that we can add the tracing code and modify QEMU source if needed. To add the tracing code, I had to familiarize myself with QEMU\u0026rsquo;s code, and specifically the TCG (Tiny Code Generator). Currently, adding the tracing is almost done. 
However, there are just some very minor issues which need to be resolved.\nChallenges This is more of an interesting-stuff kinda section ;)\nRegisters Choosing the correct registers without a chain of if-else statements or switch-case statements was one of the initial issues I faced. Not only do I have to resolve to the correct register, I should also be able to store and load from the same global IL variable when I reference different sections of a register (e.g. AL, AX, EAX and RAX all map onto the same global IL variable). Along with this, I should also have the ability to choose the largest register of a certain bitness (for example, RAX for 64-bit, EAX for 32-bit and AX for 16-bit) without using switch-case statements. I decided to do this using a statically defined array of registers and their get and set functions (gpr_lookup_table, array of struct gpr_lookup_helper_t). This avoided any if-else or switch-case constructs, and I could reuse the same functions for getting and setting the same sections of a register.\nOperands Also, there had to be a general interface for accessing the operands, irrespective of whether they were a register, or a memory location, or an immediate value. Writing the helper functions is easy, but providing ease of use and removing code duplication requires thought. I have also slightly touched upon this in the SuperH IL lifting post on my blog.\nIL implementation The descriptions in the x86 manual for some instructions get pretty complex and intense. Making sure that they are implemented correctly is not easy. This is exactly why we have tracetesting to verify the semantics, but manual verification never hurts. Also, the IL gets pretty large for many instructions, and to avoid this, it is important to choose a semantically correct, yet concise implementation. This involves removing unnecessary casting, using loops, removing duplication and so on. 
In fact, removing duplication by using variables brings down the IL code size by a lot.\nQEMU QEMU\u0026rsquo;s codebase is great for learning purposes. It consists of complex C constructs (tons of non-trivial preprocessor macros), but at the same time it is quite well-written, and also reasonably well documented. However, debugging the codebase is not that easy, since QEMU generates code into a buffer (code_gen_buffer) and then executes it.\nFuture Work The work I have done during my GSoC period is just the core part of the lifting. The lifting is far from being stable and ready-to-use. It is more of a pre-alpha release than a finished feature. The next steps will be as follows.\nTracetest: Thoroughly tracetest the implementations to make sure the semantics are right. API: Add Rizin commands and API, which would be wrappers around the IL, so as to visualize and access the IL. A similar graphical widget should also be added to Cutter. Documentation: Document the IL and its usage in the Rizin book. Analysis: Integrate the IL in the analysis loop to improve the analysis. Instructions: Add lifting for more x86 instructions. The last two options haven\u0026rsquo;t been thoroughly thought out by me yet, but they do seem to be reasonable future goals. Alongside, there are also plans to lift other architectures to RzIL and to extend RzIL by adding floating point and transcendental operations support.\nClosure I would say that I had a very educational experience throughout the period. I learnt quite a lot about the x86 architecture and the instruction set in the past few months. I was not able to meet all my goals for my GSoC project, but I think the work done until now is a good enough checkpoint and I plan to continue working on this.\nI would like to thank my mentors, Anton Kochkov, Deroad and Florian Märkl, for all the guidance they have provided me throughout my project. 
I look forward to continuing to work with them :)\n","permalink":"https://rizin.re/posts/gsoc-2022-x86-il/","summary":"A summary of all the work done on RzIL lifting for x86 ISA, and how you can use it.","title":"GSoC 2022 - x86 ISA lifting for RzIL"},{"content":"Hello. I\u0026rsquo;m wingdeans, a participant of GSoC 2022 with Rizin. For the past few months, I\u0026rsquo;ve been working on creating rz-bindgen - a framework for making Rizin scriptable from other languages.\nThis document covers some of the design decisions and internals of the tool. To get started with the bindings, see the usage documentation.\nRationale Rz-pipe, the currently recommended way to script Rizin, only works with commands exposed to the Rizin shell. Although it can do everything the Rizin shell can, it cannot match the full Rizin C API in performance, feature-completeness, or type guarantees. The C API, on the other hand, is more difficult to work with, especially for one-off scripts. Rz-bindgen seeks to be a middle-ground, making the C API accessible from other programming languages. Python is the primary target for rz-bindgen, as it is usable for both scripts and plugins, and has been incorporated successfully in other reverse-engineering tools.\nDesign Many languages already have tools for creating bindings to C/C++, such as rust-bindgen for Rust or CLIF for Python. However, these tools often rely on mapping C++ constructs to their own, and require extra work to create idiomatic bindings for plain C code. Like many of these tools, rz-bindgen parses C headers and generates bindings as output. However, rz-bindgen targets one project and multiple languages, rather than one language and multiple projects. 
This allows rz-bindgen to make use of Rizin-specific annotations, such as the RZ_NULLABLE and RZ_DEPRECATE C macros.\nSee this post on the Rizin blog for more details on the thought process behind my proposal and my implementation ideas from before I started the task.\nImplementation I considered my primary options for parsing the C headers to be tree-sitter and libclang. Even though I wrote about tree-sitter in the Rizin GSoC announcement blogpost, the integrated preprocessor and semantic analysis led me to choose libclang\u0026rsquo;s Python bindings.\nC Structs and Functions Once a header is parsed, C data structures are grouped with functions that operate on them. In this snippet from rz-bindgen, the RzAnalysis struct from the rz_analysis.h header is grouped with functions that have the rz_analysis_ prefix. In the generated Python bindings, these groupings are mapped to object-oriented classes, with the RzAnalysis class containing the grouped functions as its methods. The RzAnalysis class also makes all the fields of the C struct accessible except for leaddrs (which is ignored as per the ignore_fields argument) and type_links (which is renamed as per the rename_fields argument).\nrz_analysis = Class( analysis_h, typedef=\u0026#34;RzAnalysis\u0026#34;, ignore_fields={\u0026#34;leaddrs\u0026#34;}, rename_fields={\u0026#34;type_links\u0026#34;: \u0026#34;_type_links\u0026#34;}, ) rz_analysis.add_method(\u0026#34;rz_analysis_reflines_get\u0026#34;, rename=\u0026#34;get_reflines\u0026#34;) rz_analysis.add_prefixed_methods(\u0026#34;rz_analysis_\u0026#34;) rz_analysis.add_prefixed_funcs(\u0026#34;rz_analysis_\u0026#34;) Generation Rz-bindgen is designed to support multiple backends to generate bindings for a variety of languages. A backend takes the Class objects created in the transformation step and generates output. 
There are, at the time of writing, a SWIG backend and a Sphinx backend.\nThe SWIG backend is currently only used for Python bindings, but SWIG targets other languages too, such as Java and OCaml. Supporting them in rz-bindgen should be relatively simple. The Sphinx backend generates documentation for the Python bindings and can be viewed here.\nGenerics One of the main challenges in translating the C headers was the existence of generic container types. Rizin uses types like RzList and RzVector to represent a linked-list and dynamic array respectively and, being written in C, uses void* for the type of the data contained within. This means that trying to use these types from Python would be difficult, as their elements lack the type information to generate methods. Fortunately, Rizin developers were already annotating the types of these functions for developer ergonomics using comments such as RzList /*\u0026lt;RzAnalysisBlock *\u0026gt;*/ *bbs.\nThis allows bindings to use container types in a type-safe manner. In this Python example from rz-bindgen, a specialized RzList_RzBinSymbol is created, and RzBinSymbols are appended to it. Appending any other type will result in an error.\nsyms = rizin.RzList_RzBinSymbol() for sym in self.loader.main_object.symbols: binsym = rizin.RzBinSymbol() binsym.thisown = False binsym.name = sym.name binsym.type = rizin.RZ_BIN_TYPE_FUNC_STR binsym.paddr = sym.linked_addr binsym.vaddr = sym.rebased_addr binsym.size = sym.size syms.append(binsym) Additional Features The snippet above is from an example of implementing an RzBinPlugin in Python. See the bin_plugin documentation for more details.\nThe Python bindings also make it easier to access Rizin internals when writing scripts, as can be seen in the rz_cmd example (see the cmd documentation for more details). 
One key feature is the ability to register a Rizin command backed by a Python function, like so:\ndef print_function_info(fn: rizin.RzAnalysisFunction): print(\u0026#34;name:\u0026#34;, fn.name) print(\u0026#34;number of xrefs from:\u0026#34;, len(fn.get_xrefs_from())) print(\u0026#34;number of xrefs to:\u0026#34;, len(fn.get_xrefs_to())) return True core.register_group(\u0026#34;u\u0026#34;, \u0026#34;A custom group for user-defined commands\u0026#34;) core.register_command(\u0026#34;uf\u0026#34;, print_function_info) The Rizin plugin registers Python as an RzLang, allowing users to load Python scripts on the fly. It also adds a core variable to the rizin Python module, allowing scripts that import it to access Rizin\u0026rsquo;s own RzCore.\nReflections The coverage of the bindings is currently lacking - it is not yet possible to use every bit of the C API. I hope this will change as I get more eyes on the project. I also hope to improve the Rizin plugin and finalize the Cutter plugin.\nIn the long term, I hope to add bindings for extensions such as rz-ghidra which expose their functions. This could allow access to Ghidra\u0026rsquo;s P-Code and decompiler once implemented.\nI would like to thank my GSoC mentors XVilka and megabeets, as well as Rizin core contributors ret2libc and deroad.\nIf you need help with rz-bindgen or wish to build a project using the generated bindings, feel free to reach me on the Rizin mattermost @wingdeans (we have an IRC bridge too).\n","permalink":"https://rizin.re/posts/gsoc-2022-rz-bindgen/","summary":"An overview of rz-bindgen\u0026rsquo;s design, implementation, features, and future.","title":"GSoC 2022 - rz-bindgen"},{"content":"Linux If your distribution ships Rizin from the official repositories, use that. 
We are currently aware of the following Linux distributions shipping an up-to-date Rizin:\nArch Linux Fedora Gentoo If your distribution is not in the list above, but it does ship Rizin/Cutter, please let us know and we will fix it! If you cannot find Rizin/Cutter in the official repositories, we provide install instructions for some other distributions through OBS. Follow the instructions here (select the \u0026ldquo;Add repository and install manually\u0026rdquo; option).\nWindows You can install Rizin through the installer for your architecture provided in the latest release (e.g. rizin_installer-vX.Y.Z-x86_64.exe).\nOtherwise, you can download the portable builds, which run without any installation on your system: just extract the archive to the path you want and execute Rizin from there (e.g. rizin-windows-share64-vX.Y.Z.zip).\nYou can find Cutter for Windows in the latest Cutter release. The archive can be extracted anywhere on your system and Cutter can be executed from there.\nMacOS You can install both Rizin and Cutter through Homebrew:\n$ brew install rizin $ brew install --cask cutter Alternatively, you can find Pkg/DMG files for both Rizin and Cutter.\nOpenBSD Rizin and Cutter are available in stable releases starting with OpenBSD-7.3.\n# pkg_add rizin cutter Android Statically compiled binaries for some common architectures on which Android runs are built and attached to all releases. We currently support aarch64, armv7, and x86_64. You can find the artifacts for Android on the latest Rizin release.\nThose files are named rizin-\u0026lt;version\u0026gt;-android-\u0026lt;architecture\u0026gt;.tar.gz. 
Files within the archive can be extracted anywhere on your Android device because Rizin is compiled in a \u0026ldquo;portable\u0026rdquo; way, so the whole directory can be moved anywhere.\nRizin also has a package on Termux and can be installed using the Termux package manager, pkg:\npkg install rizin Building from source Source code for Rizin and Cutter can be downloaded from GitHub:\nRizin repository Cutter repository Build instructions can be found in the README.md files.\nInstall Rizin plugins To install Rizin plugins you can use our package manager, rz-pm, which will compile and install packages for you in the right locations.\nGet the latest version for your system here, make the file executable and you are good to go!\nThe list of currently supported plugins is available in the rz-pm-db repository.\n","permalink":"https://rizin.re/install/","summary":"How to install Rizin and Cutter","title":"Install"},{"content":"Google Summer of Code 2022 is here and we are excited to participate! 🎉. This is the 2nd year we participate in GSoC as a Rizin organization.\nWe received many applications, and we are happy that there is a substantial interest in the project. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us the platform for attracting new contributors. Many of the past participants stayed with the project after previous GSoC and RSoC programs, and we sincerely hope it will continue in the future.\nThis summer, the accepted projects aim to improve the quality of analysis of Rizin by employing our next generation intermediate language - RzIL, along with making the scripting and automation easier with Rizin and Cutter by improving the API, especially the Python one.\nDhruv: RzIL uplifting migration Hey, I\u0026rsquo;m Dhruv Maroo (DMaroo)! I am a computer science and engineering undergraduate student from the Indian Institute of Technology, Madras. 
I am excited to work on RzIL migration of x86 architecture as a contributor under GSoC 2022.\nI have been interested in computer security ever since I started learning computer science and engineering. Out of all the various subdomains within security, binary exploitation and reverse engineering appealed to me the most. I was also fascinated with systems engineering and low-level programming. Recently, I had also been exposed to the relevance of intermediate languages in symbolic execution. Given these interests, working on RzIL under Rizin was a golden opportunity for me.\nI started contributing to Rizin in August 2021. It took a fair bit of time to get to know the code base, but the maintainers were very helpful and supportive. Without their help, I wouldn\u0026rsquo;t have accomplished this. Since August 2021, I have worked on a variety of issues and features, like project compression, seeking and autocompletion for global variables, breakpoint serialization, type pretty printing API, porting db (debug breakpoint), c (compare) and shell commands to newshell, fixing Coverity scan issues and memory leaks, RzIL refactoring, DWARF attribute type checking, and lots of other miscellaneous issues. I am also working on migrating the SuperH ISA to RzIL (#2518).\nThe RzIL uplifting project is a 350-hour project, spanning 5 months, starting from the second week of June. During the project, I plan to implement the 8086, 80186, 80286, 80386 and 80486 instructions from the x86 instruction set. Along with these, I will also be implementing Pentium and MMX instructions. This will be followed by testing the migration using rz-tracetest and getting the traces. I will also be adding API commands in Rizin to visualize and interface with the IL tree. These commands will be augmented with corresponding widgets in Cutter. Then, I am planning to document all of this in the Rizin book, to allow others to easily use it and contribute as well. 
One of the optional (but very interesting) goals is to use the power of RzIL in the binary analysis loop, to provide better analysis and new features. Other optional goals involve migrating SSE and AVX instructions to RzIL and migrating other architectures to RzIL as well.\nI am looking forward to starting work on the project and improving Rizin. I would again like to thank the maintainers for helping me through my contributions, and I would like to thank GSoC for giving me such a great learning opportunity.\nwingdeans: Automated Python Bindings Hello. I\u0026rsquo;m wingdeans, a second-year Computer Science major at the University of Florida.\nI\u0026rsquo;ll be working on exposing the Rizin native API to Python, as an alternative to the current rz-pipe command-based API. This 175-hour project will involve semi-automatically generating Python bindings from the Rizin C headers, as well as abstractions and documentation to increase developer ergonomics.\nRationale Python has seen widespread adoption among the infosec and reversing communities, with several reverse engineering platforms integrating Python as a scripting and plugin language. Although rz-pipe exposes all the Rizin commands to a number of languages, including Python, the integration is not as robust as Rizin\u0026rsquo;s native C API.\nThe completion of this task will make scripting with Rizin more pleasant. It will also help Cutter, which supports Python GUI plugins.\nConsiderations The bindings should feel Pythonic. At the bare minimum, this will involve creating classes for the various structs in Rizin, and creating methods out of any C functions that manipulate those structs. Python features like keyword arguments should also be used when appropriate.\nThe bindings should be somewhat automated. This will ensure that the bindings do not fall out of sync with the Rizin core API. 
In addition, many Rizin functions already contain annotations about nullability (RZ_NULLABLE/RZ_NONNULL) and ownership (RZ_OWN/RZ_BORROW). Parsing this information will be useful in managing memory.\nRough Plans The generator will first parse the Rizin headers, perhaps using tree-sitter, and extract function information. It will then create classes and methods from a user-specified list of functions and emit source code to be integrated with additional manually-written modules.\nAbout Me I\u0026rsquo;ve been doing CTFs for about a year now, with a focus on cryptography and reversing. For reversing challenges, I primarily use Cutter, which was why I ended up applying here for this year\u0026rsquo;s GSoC. I’m interested in enhancing Rizin’s scripting capabilities for use in custom reversing tools and plugins.\nFor my microtask, I\u0026rsquo;m implementing support for dotnet binaries. Although the overall file format is the same as in PE files, there are additional headers, streams, and tables that need to be parsed. I also ended up writing a disassembler plugin, and an analysis plugin is in the works.\nI look forward to working with the Rizin team to implement a better scripting API, and I hope these additions will encourage users to extend Rizin and Cutter for their own reversing needs.\n","permalink":"https://rizin.re/posts/gsoc-2022-announcement/","summary":"An announcement of the Google Summer of Code 2022. Two accepted candidates.","title":"Google Summer of Code 2022 Announcement"},{"content":"Rizin is an interactive command line tool and as such it provides a nice shell, where you can execute rizin-specific commands to perform all kinds of actions like analyzing functions, getting information from a binary, showing sections and symbols, and much more.\nIt has a lot of commands that you must know in order to use it properly and for this reason we believe its shell should be as powerful and as discoverable as possible. 
In this post we are going to talk a bit about its basics and what we have done in Rizin to improve even further, for both users and developers (skip the Background section if you just care about the latter!).\nBackground Since the creation of radare2, the framework Rizin originally emerged from, more and more commands were added to the list of supported actions, help messages were written to help users navigate those commands and various constructs were developed to make the language recognized by the shell more and more powerful (and complicated, at times!). It is possible to execute a command and temporarily switch the architecture defined in Rizin, temporarily seek to another address in the address space, iterate over defined sections, functions or symbols, create macro, aliases and much more.\nCommands are organized in a tree, where each letter represents a node in that tree. For example, a has several sub-commands like aa, ab, af, ah, ao, av, etc. af groups yet other sub-commands, like afr, af+, af-, afl, afv, etc. You get the idea. This tree can be explored by users thanks to the ? suffix, which is used as a way to get help about commands.\n[0x00000000]\u0026gt; afvb? Usage: afvb [idx] [name] ([type]) | afvb list base pointer based arguments, locals | afvb* same as afvb but in r2 commands | afvb [idx] [name] ([type]) define base pointer based arguments, locals | afvbj return list of base pointer based arguments, locals in JSON format | afvb- [name] delete argument/locals at the given name | afvbg [idx] [addr] define var get reference | afvbs [idx] [addr] define var set reference ? can be used alone to get the sub-commands of the root node.\n[0x00000000]\u0026gt; ? Usage: [.][times][cmd][~grep][@[@iter]addr!size][|\u0026gt;pipe] ; ... Append \u0026#39;?\u0026#39; to any char command to get detailed help Prefix with number to repeat command N times (f.ex: 3x) | %var=value alias for \u0026#39;env\u0026#39; command | *[?] 
off[=[0x]value] pointer read/write data/values (see ?v, wx, wv) | (macro arg0 arg1) manage scripting macros | .[?] [-|(m)|f|!sh|cmd] Define macro or load r2, cparse or rlang file | ,[?] [/jhr] create a dummy table import from file and query it to filter/sort | _[?] Print last output | =[?] [cmd] send/listen for remote commands (rap://, raps://, udp://, http://, \u0026amp;lt;fd\u0026gt;) [output truncated] | t[?] types, noreturn, signatures, C parser and more | T[?] [-] [num|msg] Text log utility (used to chat, sync, log, ...) | u[?] uname/undo seek/write | v panels mode | V visual mode (Vv = func/var anal, VV = graph mode, ...) | w[?] [str] multiple write operations | x[?] [len] alias for \u0026#39;px\u0026#39; (print hexadecimal) | y[?] [len] [[[@]addr Yank/paste bytes from/to memory Usually the letter of the group is representative of what the sub-commands do. So p stands for \u0026ldquo;print\u0026rdquo; and pd stands for \u0026ldquo;print disassembly\u0026rdquo; and so on. After some time, it becomes much easier to navigate this structure.\nAs mentioned above, you can create statements that in one way or another modify the behaviour of a command (or multiple ones). Suppose you are analyzing a x86-32bit binary, but you know that some pieces of code will actually be executed in 64bit mode, you could do:\n[0x00006b60]\u0026gt; s 0x6c00 [0x00006c00]\u0026gt; e asm.bits=64 [0x00006c00]\u0026gt; pd 2 ;-- entry.fini0: 0x00006c00 f30f1efa endbr64 0x00006c04 803d75b60100. cmp byte [section..bss], 0 ; [0x22280:1]=0 [0x00006c00]\u0026gt; e asm.bits=32 [0x00006c00]\u0026gt; s 0x6b60 Or you can simply apply some modifiers to the pd command with pd 2 @b:64 @0x6c00. This statement will temporarily switch asm.bits and the current seek to execute pd 2 within the right context. There are also other kinds of statements that allow you to redirect the output/error of a command to a file, or that provides a foreach-like behaviour. 
For example, if you want to execute a command on all segments, you could do \u0026lt;cmd\u0026gt; @@iSS. Iterating over all basic blocks of all functions recognized by Rizin could be done with \u0026lt;cmd\u0026gt; @@b @@F. See @? and @@? for more info.\nImprovements in Rizin shell We rewrote from scratch most of the code dealing with parsing and handling of commands, moving from a simple and scattered approach to a more centralized and uniform one. We started this effort in radare2 with what was called cfg.newshell which since then became the default shell of Rizin and has improved even further. Some of the issues we had with the previous implementation and that led to the rewrite of this code: manually written command help strings inconsistently displayed, hand-written parser with a non clear formal and global grammar defined, impossibility of dynamically registering/deregistering commands in the shell, commands handlers code mixing core logic with input handling, manually implemented autocompletion of commands and partial autocompletion of arguments. These are just some of the things we solved.\nRizin keeps track of all commands that can be executed in its shell, their place in the commands tree, the brief summary of what they do, a possibly longer description, additional details that may be useful, the number of arguments a command accepts, their types and whether they are optional.\nThis information (and more) are stored by Rizin, making it possible to write code in a uniform way. For example, the list of sub-commands available under z? is automatically generated from the data Rizin has available. Autocompletion of commands can be easily performed as soon as a developer adds a new command. 
Argument autocompletion is straightforward as well, because when a developer adds a new command with its arguments (and types, which are mandatory), Rizin already has all the information to perform the autocompletion.\nAlso error reporting is more uniform, because when the shell detects a non-existing command it can immediately report that to the user. It is also possible to extend this behaviour to propose similar commands that a user wanted to type or provide help for the most similar command available. Similar errors can be reported when a user uses a command in the wrong way, for example by not providing enough arguments or by providing too many. Rizin, knowing how many arguments a command accepts, can immediately return an error (and this behaviour could be extended as well to provide the help of the command, to make life easier for users).\nAnother important aspect of having a database of commands which are available to the Rizin shell is that commands can be easily registered and deregistered at runtime by Rizin plugins (e.g. rz-ghidra). Of course you want to see the commands provided by a plugin only when that plugin is actually loaded. You also want your new command, defined by an external plugin, to behave similarly to the internal commands: you want the plugin commands and their arguments to be autocompleted and you want uniform error reporting. All this is possible because these operations are all automatically performed by the Rizin shell and not by each individual command.\nWe also recognize that sometimes having just a short summary of what a command does is not enough. You would want to have a much longer description of what operations the command performs, which config variables affect its behaviour, etc. This is why we added the possibility to provide a description to each command, that can be shown by using the ?? suffix.\n[0x00000000]\u0026gt; wv?? 
Usage: wv \u0026lt;value\u0026gt; # Write value as 4-bytes/8-bytes based on value Write the number passed as argument at the current offset as a 4 - bytes value or 8 - bytes value if the input is bigger than UT32_MAX, respecting the cfg.bigendian variable Examples: | wv 0xdeadbeef # Write the value 0xdeadbeef at current offset | wv2 0xdead # Write the word 0xdead at current offset | wv1 0xde # Write the byte 0xde at current offset [0x00000000]\u0026gt; Only a few commands have such description right now, but we very much welcome pull requests to improve our user documentation. If you want to help us, see the next section to know which files to touch.\nWe also changed the way commands and statements are parsed: instead of relying on a very simple hand-written parser, we switched to a tree-sitter based parser, where we just have to write a formal grammar and tree-sitter automatically generates the parser for us. This approach ensures that commands and their arguments are parsed in a consistent way. For example, all new commands accept quoted strings. Wrapping multiple words in single or double quotes would make the shell consider those words as a single argument for the command. If you need to pass ; as an argument of a command (semi-colons usually represent the separator between commands), you can just quote it or escape it with \\; and expect this to work for all (new) commands. Using tree-sitter and defining the grammar in a single file forced us to think about the grammar as a whole in a very uniform way across all commands and not just as the union of different grammars for each command.\nHow it works File rz_cmd.h has all the API to register, deregister, execute commands with a list of arguments and get help for a tree of commands. 
To add new commands, developers (of a plugin, for example) have to use rz_cmd_desc_argv_new by specifying the parent group in the commands tree, the handler of the command and a structure of type RzCmdDescHelp that describes the command: its summary, an optional longer description, a list of detailed sub-sections in the help and the list of arguments the command accepts, including information on their types and whether they are optional or not.\nAs soon as the command is registered (e.g. a Core plugin with a command is loaded), the Rizin shell becomes aware of the new command: the command can now be executed, it is shown in the help tree in the right place, and it is autocompleted as necessary, including its arguments. As an example, you can see rz-ghidra or jsdec.\nPlugin developers have to use all the C API and data structures mentioned above and more, included in the rz_cmd.h file. To avoid a lot of boilerplate code and to make changes to commands easier for non-developers too, we thought about auto-generating the C structures like RzCmdDescHelp from a list of YAML files. 
Commands are all described in YAML files that mimic the final tree structure, like below:\n- name: tc summary: List loaded types in C format subcommands: - name: tc cname: type_list_c summary: List loaded types in C format with newlines args: - name: type type: RZ_CMD_ARG_TYPE_ANY_TYPE optional: true - name: tcc summary: Manage calling convention types subcommands: - name: tcc cname: type_cc_list summary: List all calling conventions modes: - RZ_OUTPUT_MODE_STANDARD - RZ_OUTPUT_MODE_LONG - RZ_OUTPUT_MODE_SDB - RZ_OUTPUT_MODE_RIZIN - RZ_OUTPUT_MODE_JSON args: - name: type type: RZ_CMD_ARG_TYPE_STRING optional: true - name: tcc- cname: type_cc_del summary: Remove the calling convention args: - name: type type: RZ_CMD_ARG_TYPE_STRING While building Rizin, these YAML files are used to automatically generate a .c and a .h file containing all the data structures and C API calls necessary to construct the commands tree as described by the developer.\nThis approach ensures that commands shown in the help are only those that can be executed and that commands that can be executed are listed in the help as well. In the past, because help messages were just strings manually written for each command, it was too easy to forget to update a help message and end up with a hidden command or with a wrong help text that referenced a command which no longer existed.\nAnother big change for developers is that commands are no longer implemented in huge switch-cases like before; instead, each command handler has its own function with a signature similar to the main function of a C program, including argc/argv arguments. 
We believe this makes our codebase much cleaner and easier to understand, with short (and less indented) command handlers that just have to deal with the core logic of the command, without having to add boilerplate code just to parse/split arguments as was done before.\nConclusion We think we are making big changes towards a more usable, discoverable and descriptive shell and although these changes required a lot of time, we have reached a good point. Rizin is now in a mixed state, with some commands still following the old behaviour and other commands being switched to the new way described in this blog post. We are porting new commands approximately every week, but any help is appreciated: you can provide more accurate and descriptive summaries/descriptions for the already converted commands in https://github.com/rizinorg/rizin/tree/dev/librz/core/cmd_descs or you can help us port commands following the old structure to the new approach, so that they can benefit from everything that is explained here (look at #1342 to see which commands are missing). Also have a look at rzshell.md to learn more about the shell.\nIf you have issues, bugs, ideas or want to discuss this approach or others with us, feel free to join us on Mattermost.\n","permalink":"https://rizin.re/posts/rzshell/","summary":"Rizin shell","title":"Rizin shell"},{"content":"Rizin Summer of Code 2021 Summary RSoC 2021 is officially finished and we are happy to congratulate both participants for passing the program and completing the most important parts of their tasks.\nBasstorm: Types analysis Hello, I am Basstorm. Over the past two months, I had a fulfilling summer as one of the participants of RSoC. The main subject of RSoC was to improve the Type module.\nAt first, I fixed several bugs in the new tree-sitter based type parser. The new type parser brings us the ability to parse a C type defined as a string. 
After that, I migrated the type constraints from RzAnalysis to the new RzType module, which makes the type constraints management easier.\n[0x00000530]\u0026gt; e analysis.types.constraint=true [0x00000530]\u0026gt; aaa [x] Analyze all flags starting with sym. and entry0 (aa) [x] Analyze function calls (aac) [x] Analyze len bytes of instructions for references (aar) [x] Check for classes [x] Type matching analysis for all functions (aaft) [x] Propagate noreturn information [x] Use -AA or aaaa to perform additional experimental analysis. [0x00000530]\u0026gt; s sym.range_small [0x0000063a]\u0026gt; pdf ; CALL XREF from main @ 0x720 / sym.range_small (int64_t arg1); | ; var int64_t var_14h { \u0026gt; 0x0 \u0026amp;\u0026amp; \u0026lt;= 0x9} @ rbp-0x14 ;constraint | ; var int64_t var_8h @ rbp-0x8 | ; var int64_t var_4h { } @ rbp-0x4 | ; arg int64_t arg1 @ rdi | 0x0000063a push rbp | 0x0000063b mov rbp, rsp | 0x0000063e sub rsp, 0x20 | 0x00000642 mov dword [var_14h], edi ; arg1 For historical reasons, Rizin has never had support for global variables, which means we can\u0026rsquo;t identify and set a certain global variable, which is detrimental to our analysis. I have added support for global variables so that we can easily manipulate a global variable from the command line.\n[0x00000000]\u0026gt; avg? 
Usage: avg[jadmnt] # Global variables | avg[j] [\u0026lt;var_name\u0026gt;] # show global variables | avga \u0026lt;var_name\u0026gt; \u0026lt;addr\u0026gt; \u0026lt;type\u0026gt; # add global variable manually | avgd \u0026lt;addr\u0026gt; # delete the global variable at the addr | avgm \u0026lt;name\u0026gt; # delete global variable with name | avgn \u0026lt;old_var_name\u0026gt; \u0026lt;new_var_name\u0026gt; # rename the global variable | avgt \u0026lt;var_name\u0026gt; \u0026lt;type\u0026gt; # change the global variable type [0x00000000]\u0026gt; avga foo 0x100 char [0x00000000]\u0026gt; avg global char foo @ 0x100 [0x00000000]\u0026gt; avgt foo int [0x00000000]\u0026gt; avg global int foo @ 0x100 [0x00000000]\u0026gt; In addition, I completely refactored PDB Parser to make it better cross-platform. Previously, PDB Parser had a lot of problems with its functionality, such as missing information, parsing errors, and unused types. All these problems are solved in this refactoring.\n$ rizin Project1.exe -- Use scr.accel to browse the file faster! 
[0x00401703]\u0026gt; idpi ./Project1.pdb ········· struct std::_Char_traits\u0026lt;char32_t,unsigned int\u0026gt; { char32_t char_type; uint32_t int_type; int64_t off_type; char32_t copy(char32_t * arg0, const char32_t * arg1, const uint32_t arg2); char32_t _Copy_s(char32_t * arg0, const uint32_t arg1, const char32_t * arg2, const uint32_t arg3); char32_t move(char32_t * arg0, const char32_t * arg1, const uint32_t arg2); int32_t compare(const char32_t * arg0, const char32_t * arg1, uint32_t arg2); uint32_t length(const char32_t * arg0); const char32_t * find(const char32_t * arg0, uint32_t arg1, const char32_t * arg2); bool eq(const char32_t * arg0, const char32_t * arg1); bool lt(const char32_t * arg0, const char32_t * arg1); char32_t to_char_type(const uint32_t * arg0); uint32_t to_int_type(const char3_t * arg0); bool eq_int_type(const uint32_t * arg0, const uint32_t * arg1); uint32_t not_eof(const uint32_t * arg0); uint32_t eof(); } ········· [0x00401703]\u0026gt; tuc ········· union __m64 { uint64_t m64_u64; float m64_f32[8]; unsigned char m64_i8[8]; int16_t m64_i16[8]; int32_t m64_i32[8]; int64_t m64_i64; unsigned char m64_u8[8]; uint16_t m64_u16[8]; uint32_t m64_u32[8]; }; ········· Currently, Heersin has completed the new RzIL, but it still lacks support for many architectures. So I am now porting the 8051 architecture from the old ESIL to the new RzIL, and I will be working with Heersin to port more architectures to the new IL afterwards.\nDuring this RSoC, I grew a lot and learned a lot of development skills that I would not normally be exposed to. I would like to especially thank my mentor Anton Kochkov for his selfless help. I would also like to thank all the community members for their help!\nHeersin: New Rizin IL Hi, I\u0026rsquo;m Heersin. I participated in RSoC this summer to introduce a new Intermediate Language and refactor ESIL related code. Rizin previously used ESIL (a stack-based IL) as its IL to analyse binaries. 
In fact, ESIL is neither user-friendly nor developer-friendly, which is one of the reasons that led to this work. We chose BAP\u0026rsquo;s Core Theory as our new IL because it\u0026rsquo;s designed to be similar to SMT, and it may be the latest IL (the most \u0026ldquo;fashionable\u0026rdquo; one) we can trust for now.\nIn the first few days, I didn\u0026rsquo;t have any clue about implementing a Core Theory VM, so I started to work on some basic data structures (Bool/BitVector/Array) used in the VM. They are the basic types in Core Theory; we can emulate other types (ut8/ut16/ut32/ut64) by using bitvectors and bools.\nAfter that, I focused on the concepts in the VM and the execution procedure. In short, there are Variables and Values in the Core Theory VM. A Variable is a symbol while a Value represents the result of evaluating an expression. read register is used to get the value of a variable and write register is used to assign a value to a variable. Memory is a Hashtable (kv-map), where the address is the key and the data is the value. The Memory concept is similar to the SMT Arrays theory where both values and indexes are Bitvectors.\nThen, I uplifted brainfuck to test the new IL. Here is the uplifted expression.\n# print mode # ++++++++++[\u0026gt;+++++++\u0026gt;++++++++++\u0026gt;+++\u0026gt;+\u0026lt;\u0026lt;\u0026lt;\u0026lt;-]\u0026gt;++.\u0026gt;+.+++++++..+++.\u0026gt;++.\u0026lt;\u0026lt;+++++++++++++++.\u0026gt;.+++.------.--------.\u0026gt;+.\u0026gt;. 
(STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (BRANCH (LOAD (VAR ptr)) \u0026lt;NOP\u0026gt; (GOTO ]0)) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) ... (SET ptr (SUB (VAR ptr) (INT 1))) (STORE (VAR ptr) (SUB (LOAD (VAR ptr)) (INT 1))) (BRANCH (INV (LOAD (VAR ptr))) \u0026lt;NOP\u0026gt; (GOTO [0)) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) ... (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (STORE (VAR ptr) (ADD (LOAD (VAR ptr)) (INT 1))) (GOTO write) (SET ptr (ADD (VAR ptr) (INT 1))) (GOTO write) Next step is to integrate new IL with Rizin, and porting analysis_bf to use it. The analysis code is huge and ESIL is tightly integrated within it. I removed some dead code and reorganized the directory structure with the help from community members. Moreover, I added new structures for the trace and stat (they are used to collect info about reg/mem read/write) to replace the sdb approach with vectors and make it easier to understand. Then I continued to integrate the new IL; added aezi and aezs commands for init VM and step emulate respectively.\n[0x00000000]\u0026gt; aezi [0x00000000]\u0026gt; aezs 390 Hello World! [0x00000000]\u0026gt; aezi Porting more architectures will be a huge work. 
I will continue to contribute to Rizin and improve the IL part.\nI am grateful for such an opportunity to participate in RSoC and contribute to Rizin. There is a friendly atmosphere and I learned a lot. I want to give special thanks to my mentor XVilka for his guidance and help, and also to ret2libc, Ivg, Wargio, Pelijah, Thestr4ng3r and 08A for answering my questions and giving feedback on my PRs.\n","permalink":"https://rizin.re/posts/rsoc-2021-summary/","summary":"Rizin Summer of Code 2021 Summary RSoC 2021 is officially finished and we are happy to congratulate both participants for passing the program and completing the most important parts of their tasks.\nBasstorm: Types analysis Hello, I am Basstorm. Over the past two months, I had a fulfilling summer as one of the participants of RSoC. The main subject of RSoC was to improve the Type module.\nAt first, I fixed several bugs in the new tree-sitter based type parser.","title":"Rizin Summer of Code 2021 Summary"},{"content":"Google Summer of Code 2021 Summary GSoC 2021 is officially finished and we are happy to congratulate all 3 participants for passing the program and completing the most important parts of their tasks. It brought us some long-needed code cleanup and user-visible changes in the analysis and binary/heap parsing. See what students wrote themselves:\n08A: Refactoring ELF binaries loading This summer I have been doing the GSoC for Rizin. The subject of the GSoC was to refactor and improve how ELF binaries are loaded by Rizin.\nI have added support for the ELF hash table and the GNU hash table. Those 2 data structures are used to deduce the number of dynamic symbols in the file, which replaced the old way of doing it (assuming that the data is a symbol until there is an error).\nMoreover, I have changed the source of trust used to load symbols\u0026rsquo; versions (from sections information to dynamic section\u0026rsquo;s information). 
So Rizin is now able to read symbols\u0026rsquo; versions even if there is no section.\n\u0026gt; rz-bin -V bins/elf/analysis/clark WARNING: Invalid section header (check array failed). Version symbols has 9 entries: Addr: 0x080482c2 Offset: 0x000002c2 0x00000000: 0 (*local*) 0x00000001: 2 (GLIBC_2.0) 0x00000002: 2 (GLIBC_2.0) 0x00000003: 0 (*local*) 0x00000004: 2 (GLIBC_2.0) 0x00000005: 2 (GLIBC_2.0) 0x00000006: 2 (GLIBC_2.0) 0x00000007: 2 (GLIBC_2.0) 0x00000008: 1 (*global*) Version need has 1 entries: Addr: 0x080482d4 Offset: 0x000002d4 0x000002d4: Version: 1 File: libc.so.6 Cnt: 1 0x000002e4: Name: GLIBC_2.0 Flags: none Version: 2 There was a hard-coded maximum length for all string found in any elf string table. This limitation was removed and some small check of the string table integrity were added.\n\u0026gt; rizin bins/elf/long-symbol.elf WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Add custom Have you setup your ~/.rizinrc today? [0x00001040]\u0026gt; is~AAA 28 0x00001139 0x00001139 GLOBAL FUNC 15 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA The main problem with how symbols and imports were loaded, was their mutual dependency during the loading phase. So both processes were split and heavily refactored. 
As a side effect, an old bug in the symbols loading was fixed.\nThe call to the function system is correctly identified:\n\u0026gt; rizin bins/elf/analysis/phdr-override WARNING: The segment 3 at 0x774 seems to be invalid. WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Change your fortune types with \u0026#39;e cfg.fortunes.file=fun,tips\u0026#39; in your ~/.rizinrc [0x004003f0]\u0026gt; s main [0x004004e6]\u0026gt; af [0x004004e6]\u0026gt; pdf ┌ int main (int argc, char **argv, char **envp); │ ; var int64_t var_10h @ rbp-0x10 │ ; var int64_t var_4h @ rbp-0x4 │ ; arg int argc @ rdi │ ; arg char **argv @ rsi │ 0x004004e6 push rbp │ 0x004004e7 mov rbp, rsp │ 0x004004ea sub rsp, 0x10 │ 0x004004ee mov dword [var_4h], edi ; argc │ 0x004004f1 mov qword [var_10h], rsi ; argv │ 0x004004f5 mov rax, qword [var_10h] │ 0x004004f9 add rax, 8 │ 0x004004fd mov rax, qword [rax] │ 0x00400500 mov rdi, rax │ 0x00400503 mov eax, 0 │ 0x00400508 call sym.imp.system ; int system(const char *string) │ 0x0040050d mov eax, 0 │ 0x00400512 leave └ 0x00400513 ret During the loading phase, sections and segment information checks have been added to verify the integrity of the data. Those checks are stricter than the elf loader. So 3 configurations variable were implemented to allow the user to customize how segments and sections are loaded.\n\u0026gt; rizin bins/elf/analysis/phdr-override WARNING: The segment 3 at 0x774 seems to be invalid. WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... 
WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Press \u0026#39;C\u0026#39; in visual mode to toggle colors [0x004003f0]\u0026gt; \u0026gt; rizin -e elf.checks.segments=false bins/elf/analysis/phdr-override WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... WARNING: Neither hash nor gnu_hash exist. Falling back to heuristics for deducing the number of dynamic symbols... -- Add colors to your screen with \u0026#39;e scr.color=X\u0026#39; where 1 is 16 colors, 2 is 256 colors and 3 is 16M colors [0x004003f0]\u0026gt; There is still a lot of work to do, specially on the elf plugin interface. If you want to follow the update on this, you can use this link: Refactoring the elf plugin interface\nIn conclusion, the GSoC was an incredible source of motivation to contribute to the Open Source community. And it helped me improve my knowledge of elf internals. I would like to thank my mentors Anton Kochkov and Florian Märkl for their help during the GSoC.\nPulak: Heap viewer for Cutter Hi, I am Pulak Malhotra. Over the past few months, I participated in GSoC with the Rizin organization. My main contributions revolve around the heap parsing code for Rizin and the GUI implementation of heap viewer for Cutter. The initial work started with improving the output format of the dmh family of commands. I made them much more readable, taking inspiration from gdb gef. I added a new command, dmhd, which prints concise information about different bins of a given arena. I also refactored and rewrote a significant part of the Glibc heap codebase, making it more modular and maintainable, including porting it to the new shell. I added new Rizin API calls and used them in Cutter to implement the GUI version of the heap viewer. 
The heap viewer in Cutter has many features, like getting a list of heap chunks in an arena, editing the heap chunks, getting information about bins in the arena, and visualizations for linked lists of the bins. I encourage everyone to give it a try in their next heap exploitation hack! After the Glibc heap, I made some contributions towards the Windows heap and the Windows heap widget. Some of the changes have been merged, like the Rizin API and the new shell port. I\u0026rsquo;ll try my best to ship the other modifications to production soon.\nGSoC was one of my first experiences working on a real-world project, and I learned and grew a lot. I want to give special thanks to my mentors Yossizap and Megabeets, and the Rizin community members XVilka, Ret2libc, Deroad, Gustavolcr, and Thestr4ng3r, who were always there to answer my questions and review and give feedback for my PRs.\nAswin: Support for CPU and platform profiles Hello everybody!\nI\u0026rsquo;m Aswin and this is a brief summary of the work I did in the summer of 2021 with Rizin on adding support for CPU and platform profiles. Rizin previously relied on manually written code for adding a new CPU or an IO port, which was tedious to maintain and not user friendly. Providing a level of abstraction in handling this entropy in embedded systems by adding support for editable CPU and platform profiles was the goal of this project.\nAfter getting accepted, the first thing I did was to remove the existing implementation of RzSyscallPorts - the module which took care of the architecture and CPU specific system registers. Here, I made two new modules: RzSysregsDB and RzSysregItem to make this happen. RzSysregsDB just housed a hashtable which paired the address of the port with an RzSysregItem which contained the comment, type and all the other information related to it.\nThen, I started working on CPU profiles. 
The whole idea of CPU profiles is to store all the CPU specifics in one file, parse it and use it in places like analysis, emulation and wherever else it\u0026rsquo;s needed. Inside CPU profiles, we store information like size of the ROM, size of the RAM CPU and other things, which are parsed and stored into various data structures inside RzArchProfile, where RzArchTarget houses the name of the CPU and architecture and a pointer to RzArchProfile. Information about the CPU IO registers and Extended IO registers can also be added in CPU profiles. During the analysis loop, they are added as flags (labels) at their corresponding offsets. A feature to map the ROM as sections (iS) was also added with it.\nThis is how the IO and extended IO registers are defined in the SDB files:\nSPH=reg SPH.address=0x3e SPH.comment=Stack higher bits SP8 SP10 After that, I added support for platform profiles. Platform profiles were introduced to handle the platform specific differences. These files contain the name, offset and a short description of each port or register, which are parsed and added as flags and comments. Support for platforms like BCM2835 (which one of the Raspberry Pi models runs on), BCM2711 and OMAP 3430 was added, and the x86 IO ports were added subsequently.\nA new configuration variable asm.platform was also added to choose the platform profile. This lets the user choose the name of the profile they want to load, and Rizin will load the profile based upon the CPU and the architecture that the user has previously set. For that, I added a new variable platforms to RzAsmPlugin which holds the list of all supported platforms of that architecture.\nPlatform Profiles also follow a format similar to the CPU profiles that you saw earlier. 
Here\u0026rsquo;s an excerpt from BCM2835\u0026rsquo;s platform profile:\nAUX_MU_IER_REG=name AUX_MU_IER_REG.address=0x7e215044 AUX_MU_IER_REG.comment=Mini UART Interrupt Enable AUX_MU_IIR_REG=name AUX_MU_IIR_REG.address=0x7e215048 AUX_MU_IIR_REG.comment=Mini UART Interrupt Identify Then, I worked on porting uefi_r2, a tool used to analyze UEFI modules, to Rizin. This tool works by analyzing the firmware using Rizin\u0026rsquo;s RzAnalysis utilities and inspecting its functions, strings and other particulars - for example, by searching for the UEFI GUIDs inside the analyzed strings. Here, the tool is a Python package and all the interaction with rizin is done through rz-pipe\u0026rsquo;s Python module. Overall, this was not particularly challenging but it was indeed very informative. UEFI is insanely complex!\nLater, I continued to work on improving the SVD parser plugin I had started making during the microtask. SVD files contain information about a device\u0026rsquo;s peripherals, MMIO registers and other particulars. They are usually made by the manufacturer. This plugin loads the data from an SVD file into Rizin, mainly each register\u0026rsquo;s name, size, base address and offset, and adds them as flags and comments.\nI would like to thank my mentors xvilka and deroad for their guidance. I was regularly in touch with them and they were constantly trying to make sure that everything was going smoothly.\nAlso kudos to all the folks at #Rizin-dev, #gsoc-2021 and the other channels where my questions were answered.\n","permalink":"https://rizin.re/posts/gsoc-2021-summary/","summary":"Google Summer of Code 2021 Summary GSoC 2021 is officially finished and we are happy to congratulate all 3 participants for passing the program and completing the most important parts of their tasks. It brought us some long-needed code cleanup and user-visible changes in the analysis and binary/heap parsing. 
See what students wrote themselves:\n08A: Refactoring ELF binaries loading This summer I have been doing the GSoC for Rizin. The subject of the GSoC was to refactor and improve how elf binaries are loaded by Rizin.","title":"Google Summer of Code 2021 Summary"},{"content":"Google Summer of Code 2021 is here and we are excited to participate! 🎉. This is the 2nd internship program we are running this year, along with RSoC 2021.\nWe received many applications, and we are happy that there is a substantial interest in the project. We thank every participant and wish them luck in their future endeavors. We also thank Google for providing us the platform for attracting new contributors. Many of the past participants stayed with the project after GSoC, and we sincerely hope it will continue in the future.\nThis summer, the accepted projects aim to improve the correctness and quality of the Rizin and Cutter output, along with advancing user experience for embedded reverse engineering and exploitation.\nAswin: Support for CPU and platform profiles Hey, I\u0026rsquo;m Aswin! I\u0026rsquo;m a sophomore year undergraduate from India and I\u0026rsquo;m thrilled to be working with Rizin this summer. For Google Summer of Code, I will be working on adding support for CPU and platform profiles.\nI have always had a passion for reverse engineering as well as on how computers work at a low level. I hope to learn more about reverse engineering and hardware platforms by participating in GSoC at Rizin. I chose this as my project as it will help Rizin be more compatible with obscure hardware platforms, architectures and chips.\nFor the microtask, I was working on writing an SVD parser for Rizin. It was basically a plugin which lets you open SVD files inside Rizin and make use of all the information about the peripherals and registers present inside the file. 
While working on this, I learned a lot about microcontrollers, Memory Mapped IO, registers, and how platforms work in general. Over the summer, I\u0026rsquo;m going to do the following tasks:\nGet the configuration variables related to the CPU and the platform to be dynamically populated by listing the available dedicated SDB files and add them to the analysis loop.\nProvide rz-lang and rz-pipe bindings so that the users can choose through scripts, as well.\nAs an additional goal, I\u0026rsquo;ll be working on improving the SVD parser and adding it to Rizin as a core plugin, so that it\u0026rsquo;ll be shipped with Rizin and Cutter, as well. At the end, I also hope to write a good how-to article on how to add and use a profile.\nI am engrossed by the wonderful feeling of community I have felt while contributing to Rizin. I have gained amazing insights and skills and feel very grateful to be working with and learning from some of the smartest, kindest and most knowledgeable people I\u0026rsquo;ve ever known.\nHoping to accomplish great things and have a really great summer. I wish the very best to all the folks who got in!\nPulak: Heap viewer for Cutter Hi, I am Pulak Malhotra from India. I am an undergraduate student and researcher at IIIT Hyderabad. GSoC provides me an excellent opportunity to work on real-world codebases, contribute to the open-source community, meet new people and learn new things. I am relatively new to reverse engineering. In the past, I enjoyed working on low-level systems in my university courses, which drew me towards reverse engineering. Rizin\u0026rsquo;s welcoming and helpful community is a significant factor that makes me want to contribute to the project.\nMy GSoC project aims to deliver widgets that provide information about the heap state while debugging programs in Cutter. 
These widgets will give information regarding chunks in a heap and the bins in which free chunks are located. I also aim to deliver graphical visualization of the linked list of the arena bins. My project also includes refactoring the heap codebase in Rizin, so the heap allocator is dynamically selected based on the binary. Currently, the heap allocators are selected at compile time in Rizin. I will also try to add support for more heap allocators in Rizin.\nI contributed Pull Request #810 as my microtask. I improved the output of existing heap-related commands like dmh and dmhf in this microtask, and I created new commands dmhv and dmhd. dmhv is similar to the dmh command but provides extra information about the heap chunks, and the dmhd command provides concise information about the bins of the main arena. At various points, while solving the issue, I felt lost, especially when I was not familiar with the codebase. To get over this, I would make small changes, rerun the code, and note the difference in output. At every step, many members of Rizin gave me detailed advice and feedback and guided me. I also came across some suggestions which I pursued further in PR #912 and issue #1088. Overall, it was a fantastic experience contributing to Rizin.\n08A: Refactoring ELF binaries loading Hi, I\u0026rsquo;m 08A from France. I\u0026rsquo;m an undergraduate student at EPITA, majoring in Systems, Networks and Security (SRS).\nSome friends of mine are Free Software gurus, and they motivated me to contribute to Open Source software. So I chose to contribute to a tool I already used.\nI started working on the code base with Radare2, and after the fork I switched to Rizin. The majority of my time was allocated to fixing various issues found by clang-analysis and refactoring how Rizin parses ELF information. The overall experience was like going down a rabbit hole: the code base is huge and some parts are rusty. 
But the community is awesome and I learned a lot of things.\nFor this summer of code, I will be working on refactoring the ELF loading feature. The main challenge will be to fix the imported function detection. If you want additional information, you can check this link.\n","permalink":"https://rizin.re/posts/gsoc-2021-announcement/","summary":"An announcement of the Google Summer of Code 2021. Three accepted candidates.","title":"Google Summer of Code 2021 Announcement"},{"content":"We are excited to announce RSoC 2021! Rizin Summer of Code is a summer internship program we organize together with KeenLab of Tencent. We provide an opportunity for students to work full-time on Rizin and RzGhidra. We use the experience we gathered by participating in Google Summer of Code as an organization and organizing our own RSoC as part of the radare2 project.\nThe application period continued through all of April and at its end we finally chose two students. We wish them the best of luck and are happy to give them this stage to introduce themselves.\nHeersin: Intermediate language improvements Hello, I\u0026rsquo;m Heersin from China, an undergraduate student majoring in information security. At the very first, I was looking for a handy RE tool on the Linux platform, and then I met radare2. I have been using it to solve some basic CTF tasks and analyze dumps from some malware. I found there are some imperfect features (the concept of project, ESIL, search\u0026hellip;) and wanted to contribute to it. After learning there is a new fork named Rizin, which aims at refactoring radare2, I started to get involved.\nDuring my spare time, I\u0026rsquo;ve done some work for rizin, including:\nUpdate some out-of-date pages in the rizin documentation and add more examples. 
Fix some bugs in the pyc plugin. Add support for the luac format. Extend the testsuite to cover more platforms.\nFor RSoC this year, I will be working on ESIL, following issue #277 to refactor it, and will add support for floating point and bitvectors. I will also try to fix some issues in ESIL (e.g. prevent rizin from getting stuck when hitting some C-library functions).\nIt will be an exciting and challenging summer, looking forward to it!\nBasstorm: Types analysis Hi, I\u0026rsquo;m basstorm from China, and I am a 21-year-old student. Over the last couple of weeks, I have done some bug fixes and improved the class analysis module:\nFixed display of duplicate vtables in the acll command when analyzing with the aaa command more than twice. Improved the output of the acll command to be more concise and clear. Implemented the integration of data from the RzBin and RzAnalysis modules, which makes the results of class analysis more accurate. Implemented constructor and destructor detection based on the function name. This summer, I am going to do the following tasks:\nImprove the support of PDB Structure. Implement new features in RzTypes. Continue to implement new features or bug fixes around class analysis. ","permalink":"https://rizin.re/posts/rsoc-2021-announcement/","summary":"An announcement of the Rizin Summer of Code 2021. Two accepted candidates.","title":"Rizin Summer of Code 2021 Announcement"},{"content":"As developers, we think it is essential to have a build system that eases our work, allows us to compile Rizin quickly on a wide range of devices, is easy to understand and to modify, and provides a nice set of features one would usually expect from a full-fledged build system. Since its inception, Rizin has focused on improving its Meson build files and making its support first-class while deprecating the original build system used in radare2. 
In the following article, we will explain the reasons behind this choice and the key benefits of Meson.\nTL;DR Meson is declarative and easy to understand. Ninja is fast; no files are recompiled if not necessary. Meson keeps your source directory clean with out-of-source builds. Meson makes it easy to build and run multiple versions of Rizin. Meson simplifies dependency handling and switching from internal dependencies to system-provided ones. A bit of context Historically radare2 has been compiled with the usual ./configure; make approach. This essentially consists of a shell script, configure, and a set of Makefiles. configure allows the user to customize the compilation and installation process performed by make by setting, for example, the destination directories where executables, libraries, etc. are installed on the system. It is also used to enable or disable specific features (e.g. the debugger) or to check for the existence of specific libraries, header files, functions, compiler or linker arguments.\nTo some, this may look very similar to what is done by Autotools. However, in the radare2/Rizin case, configure is generated by another shell script, acr, by parsing a configure.acr file. acr is a tool developed by the original author of radare2, and it is an Autoconf replacement.\nOver the years some attempts were made to introduce other build systems, like Jam and a NodeJS-based build system. It was only in 2017 that radare2 started introducing Meson. Since then, many people have improved this system to compile on several platforms and to make sure it is (almost) on par feature-wise with the ACR/Make build system.\nRizin has chosen to deprecate the use of ACR/Make and switch to Meson as the main build system. We believe this will make the overall build process more standard, easy to understand, and easy to integrate with other tools/libraries. 
Other very valid alternatives such as CMake were considered; however, we preferred to keep working with Meson, which had already been tried and tested with Rizin for a long time, rather than starting completely from scratch with another build system.\nProblems with ACR/Make There are of course several reasons for this choice, so let\u0026rsquo;s first see what we have identified as the problems of the historical approach:\nACR is essentially a one-person project, with mostly only radare2 and other radare-related tools using it. This by itself is not a bad thing, but it comes with the downside that you find no help or documentation online and, if you have issues or missing features, you have to rely on the one person who understands its internals. Moreover, the features you find are usually just the ones used by the radare2 project (e.g. not long ago, it was not possible to easily check if the compiler supported a particular compilation flag, because it was never necessary for radare2). The configure script needs a sh shell, which makes it hard to use on platforms such as Windows. There are of course ways to use it, but they may involve installing MinGW or similar, which may not be ideal for Windows users who usually work within Visual Studio. Makefiles can be written in a very flexible way and they can be used to perform any sort of action, from simply compiling a C file to running scp, various scripts, and much more. Flexibility should not be abused though. Otherwise, it may become hard to understand how things are actually done. For example, understanding how librz_io.so is compiled involves looking at the Makefile in libr/io, which includes config.mk, which sets up some variables based on other variables defined in the Makefile, and then includes rules.mk, which uses those variables to actually compile the library. Inside rules.mk you find, hidden behind various environment variables, the commands used to build the object files, and then the library. 
You can look at the compilation command here, which we think is hard to grasp from a quick look even for people familiar with radare2/Rizin codebase (you may wonder where to find config.mk mentioned above: it is auto-generated). It is \u0026ldquo;low-level\u0026rdquo;, which means that the Makefiles define the specific commands, flags, and options that you have to use to actually compile a binary, a library, or an object file. This provides a lot of power, but it may also be overwhelming having to remember to add specific compilation/linking flags for compiling a single file. For example, it is not possible yet to compile radare2/Rizin within a directory with spaces in the name due to limitations within GNU Make. ACR/Make cannot be used as-is to compile Rizin on Windows systems. What we like about the Meson Build System It is declarative, which means you don\u0026rsquo;t have to remember or care about how to actually compile a shared library or a static library on Linux, Windows, BSD, etc. or how to link an executable with some other libraries or make sure include paths are right. As an example, look at this piece of meson.build:\nlibrary(\u0026#39;io\u0026#39;, [\u0026#39;file1.cpp\u0026#39;, \u0026#39;file2.cpp\u0026#39;], dependencies: [util_dep], install: true, soversion: rz_asm_lib.version() ) You don\u0026rsquo;t need to know how meson is going to build your library, but it is going to do it by compiling two source files (e.g. file1.cpp and file2.cpp), name the library io (e.g. on Linux the library would be called libio.so, but the full name and the extensions might be different on Windows) and give it the proper API version, make sure the dependency specified by util_dep, whatever it is, is used to compile this library, by adding the proper include paths and link directives.\nIt is fast. 
This is extremely important for developers, as while developing a feature or fixing a bug they may need to compile Rizin multiple times and we want this process to be as fast as possible. Meson/Ninja performs quite well compared to other build systems (https://mesonbuild.com/Simple-comparison.html). It forces you to list all source files used to compile a target and it is able to automatically compute other dependencies between targets. In ACR/Make, due to its complexity as implemented in radare2/Rizin and to the low-level approach, it is easy to mess up the dependencies between targets and to recompile the same files multiple times even when there are no changes. For example, until very recently, running make multiple times caused the recompilation of several objects even if no file was changed (in the last few months this problem was caused by wrong dependencies of sdb, in the past by wrong dependencies of the capstone target).\nmeson can run everywhere python3 can. This includes a very wide range of platforms nowadays. It automatically provides a very powerful scripting language, python, that you are guaranteed to find on the build machine. Moreover, it can be used with various backends, like Ninja, Visual Studio and Xcode, which means it can be used to generate a Visual Studio solution that you can import there.\nIt forces you to build out-of-source, meaning that (mostly) no changes will be made to your source directory, which must contain only the source files of your project and not be mixed with other auto-generated files like executables or object files. This also allows you to have the project compiled with different options or with slightly different code, cleanly separated in different directories.\nDue to its declarative nature, it does not matter whether a dependency is in one path or another, or whether it comes from the system or was bundled with the source code. 
You just define the capstone_dep variable properly in one of your meson.build files and you reference it wherever it is needed, leaving all the details to meson itself. This encourages splitting the repository into sub-projects when it makes sense, in contrast with the ACR/Make system where even a small change to e.g. the SDB path would require rewriting several Makefiles. If in the future some systems ship their own version of SDB, we would just need to change a few lines in the definition of sdb_dep to actually take the system library instead of the bundled one, and no other place would need to be changed to make sure everything is compiled/linked with the right headers/libraries.\nIn case of problems with meson there is a healthy community out there ready to help you, nice and extensive documentation, and active developers who improve the system with new releases. New developers who want to work on our build system can easily find other examples online and have documentation available to get them up to speed.\nMany complex low-level pure C projects recently switched to Meson: Mesa, Wayland, PipeWire, QEMU, and many others. We are not alone in this!\nExamples of using meson Development process As a developer, when you download Rizin, you can install it for your user in ~/.local, so you don\u0026rsquo;t need root access to install files. You can do this with meson --prefix=~/.local build; ninja -C build install. After that, you can change the source code however you need and then run ninja again with ninja -C build. Only the changed files are re-built.\nMoreover, running ninja by default builds files with explicit RPATHs, which means that the executables and libraries contain direct references to the paths of dependent libraries they are linked against so the loader can then always find them without having to specify LD_LIBRARY_PATH or similar. 
For this reason, most of the time you will not need to re-install the Rizin files, but while developing you can just run rizin from ./build/binrz/rizin/rizin.\nRPATHs are not, of course, always good. Indeed they are usually removed during the installation process. However, when you install Rizin in a place that is not /usr, we have chosen to keep RPATHs to make the installation process as simple as possible, without requiring users to mess with their environment to make sure the binaries can find the proper libraries. Packagers, who usually use /usr as a prefix, should not be affected by this decision, but they can anyway disable it by specifying -Dlocal=false when running meson.\nReviewing a PR and testing changes When testing a PR with a fix or comparing multiple changes, you need to have access to multiple versions of Rizin. Doing this with ACR/Make is of course possible, but it usually involves installing everything in separate directories and making sure your environment variables (e.g. PATH, LD_LIBRARY_PATH, etc.) are correctly set. With meson, you can build one version (e.g. from the dev branch) with meson --prefix=~/.local build-dev; ninja -C build-dev, then switch branches with git checkout my-other-branch and build Rizin again with meson --prefix=~/.local build-pr; ninja -C build-pr. Due to the RPATH used by default, as mentioned above, each build directory can be used without installation to actually run the Rizin tools. At that point, you can quickly compare the results of ./build-dev/binrz/rizin/rizin and ./build-pr/binrz/rizin/rizin.\nConclusion Of course it\u0026rsquo;s not all perfect with meson either. Right now the meson build system is missing some features that were only available with ACR/Make.\nTo uninstall Rizin you have to run ninja -C build uninstall from the same build directory you used to run the install step, otherwise it will not uninstall files. However, if during the install step we add any custom installation script (e.g. 
to sign your rizin binary on macOS), there is no counterpart uninstall script. That said, nothing prevents us from having a custom target, similar to what the ACR/Makefile system does, that manually removes the installed files with a script, but we believe proper file tracking should be done by distributions and packages.\nMeson is quite new and, although it is rare, you may run into issues from time to time. That said, its community is healthy and active, so you can count on them to fix these problems as soon as possible or provide help, also thanks to the many big projects that have switched to meson in recent years.\nAll in all, we hope to make it easier for our developers and users to build Rizin. We are trying to build a good Reverse Engineering Framework and we want to focus our efforts on this rather than dealing with the limitations of a niche build system.\nIf you find issues or find particular installation setups difficult or missing, feel free to open a bug on GitHub and we will be happy to either guide you through a solution or develop the fix according to our roadmap.\n","permalink":"https://rizin.re/posts/why-meson/","summary":"Why we switched to Meson/Ninja as our main build system.","title":"Why we chose Meson as our build system"},{"content":"When manually analyzing a complex binary, possibly over the course of days, weeks or even months, it is crucial to be able to keep track of the gained knowledge through annotations such as comments, function names, and variable names. As such, the tool one is working with also needs to provide a reliable and future-proof way to save and restore this information. One of the biggest additions in Rizin is surely the new projects feature, which provides exactly this functionality in both rizin on the command line and Cutter. 
In this article, we would like to give an overview of how it was designed, what exactly it promises to you, as well as the current limitations you should be aware of when using it right now.\ntl;dr Projects can be used in rizin using the Ps [\u0026lt;project.rzdb\u0026gt;] and Po \u0026lt;project.rzdb\u0026gt; commands and in Cutter through its regular user interface. Projects are currently in beta, including in any 0.x.y releases of Rizin, and will be considered stable starting with release 1.0.0. Beta means that all functionality is implemented and ready to use, but there is no guarantee that the format itself will not change slightly again, which might break loading a project saved right now in a future version of Rizin. Stable means that the format is finalized and all changes inside of it will come with migrations and tests ensuring that all projects saved before can still be loaded correctly. Projects may be conceptually split into two parts: the binary that is being analyzed, and any info that has been put on top by automatic analysis or the user. Saving and loading of all analysis data on top of a binary, including flags, functions, variables, types, and comments, is implemented. Automatic reloading of the underlying binary is currently limited to only a single binary available as a regular file, but this will be extended to arbitrarily complex IO mappings in the future. However, even with the current state, it is possible to manually reconstruct more complex mappings and then load any analysis data on top using the Poo \u0026lt;project.rzdb\u0026gt; command. Wait, weren\u0026rsquo;t there already projects in Radare2 before? Indeed, there has been a projects feature in Radare2 since 2017. 
This has been removed entirely from Rizin and is now entirely replaced by the new implementation, which has been re-designed from scratch and shares no code with the old one.\nTo understand why such a radical change was necessary, let us take a closer look at how old projects were designed. They primarily consisted of a single rc file, which was a radare2 script containing regular commands that would reconstruct the session state when run. As an example, a part of such a script to load one function could look like this:\n\u0026#34;f main 127 0x080485f5\u0026#34; \u0026#34;af+ 0x080485f5 main s n\u0026#34; afb+ 0x080485f5 0x080485f5 54 0x08048655 0x0804862b afb+ 0x080485f5 0x0804862b 24 0x08048655 0x08048643 afb+ 0x080485f5 0x08048643 18 0x08048665 0xffffffffffffffff afb+ 0x080485f5 0x08048655 16 0x08048665 0xffffffffffffffff afb+ 0x080485f5 0x08048665 15 0xffffffffffffffff 0xffffffffffffffff We can see it is first creating a flag (f), then creating a function (af+) and finally adding basic blocks to it (afb+).\nWhile this general approach can work in theory, it comes with several implications:\nCommands can have side effects. As an example, until only very recently, the afb+ command would trigger a heavy function analysis loop after adding a basic block in some circumstances, creating variables, X-Refs and other information. The information coming out of this side effect would then mix with the rest of the restored session, resulting for example in unwanted variables being present after loading. Commands and their semantics can change over time. Simple changes include command name changes or the order of arguments, more complex ones may involve major restructuring of underlying concepts, thus requiring entirely different command sequences to achieve the same results. Of course, since the saving instance can not predict the future, it would be solely the responsibility of loading instance to account for such changes. 
However with the project being an unstructured sequence of commands that may not even be part of the codebase anymore at this point, performing such a migration is far from trivial and highly error-prone. Moreover, before rizin\u0026rsquo;s new command parser was created, there was no formal specification of the command syntax. You can see in the above example that the first af+ command is enclosed in \u0026quot;...\u0026quot;, which is to account for cases such as the function name being ma;in where otherwise the ; would be interpreted as a separator for a new command, similar as in an SQL injection, eventually resulting in broken project loading. However, this quoting scheme still fails for names such as ma\u0026quot;in. As mentioned, this could have been eventually fixed using the new command parser, which has a well-defined escaping syntax, but it still has been the source of many bugs in the past.\nOn top of all these fundamental issues comes the fact that these projects were never tested apart from very few integration tests covering only a tiny fraction of the information potentially included in a session. All these aspects combined led to a high density of bugs and uncertainty when working with this feature. If you were very lucky, the project would save and load as expected. If you were less lucky, the loading would simply result in an error. But, and this has been the most likely case, if you were unlucky, the project would load seemingly correctly, but you would notice only later that the loaded data was deeply corrupted.\nDespite some of these issues being theoretically possible to fix, the conceptual problems of using commands for projects remain. Because the ability to save a session is only even remotely useful when it can also be relied upon to always correctly restore it in the future, a different approach had to be taken here, hence requiring an entire rewrite of the feature. 
This new approach, detailed in the following section, takes concrete learnings from the mistakes of the previous approach and thus avoids all problems mentioned above right from the start.\nDesign Projects take a classic, fully declarative approach to store their information, saving and loading a direct dump of the internal state.\nSerialization All relevant modules and data structures now have serialization and deserialization functions added, commonly prefixed with rz_serialize_ and implemented in files called serialize_*.c, as for example serialize_flag.c in the case of flags.\nFor the target data structure, SDB is being used, which is a database that is also used in other parts of rizin. What makes SDB special is its simplicity: One SDB is simply a mapping from arbitrary string keys to string values, and multiple SDBs can be nested in a tree of namespaces. This restricted design makes SDB unsuitable for many applications, but for our projects it turned out to fit very well. Inside such an SDB, when more complex structures are needed, JSON is used. 
This combination of well-defined formats means we can rely on them and forget about escaping or sanitizing strings in our actual serialization code.\nFor example, the same function as in the previous example would now be serialized like this:\n/core/analysis/functions 0x80485f5={\u0026#34;name\u0026#34;:\u0026#34;main\u0026#34;,\u0026#34;bits\u0026#34;:32,\u0026#34;type\u0026#34;:4,\u0026#34;cc\u0026#34;:\u0026#34;cdecl\u0026#34;,\u0026#34;stack\u0026#34;:16,\u0026#34;maxstack\u0026#34;:32,\u0026#34;ninstr\u0026#34;:43,\u0026#34;bp_frame\u0026#34;:true,\u0026#34;bp_off\u0026#34;:8,\u0026#34;diff\u0026#34;:{},\u0026#34;bbs\u0026#34;:[134514165,134514219,134514243,134514261,134514277],\u0026#34;vars\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;argv\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;char **\u0026#34;,\u0026#34;kind\u0026#34;:\u0026#34;s\u0026#34;,\u0026#34;delta\u0026#34;:4,\u0026#34;arg\u0026#34;:true,\u0026#34;accs\u0026#34;:[{\u0026#34;off\u0026#34;:0,\u0026#34;type\u0026#34;:\u0026#34;r\u0026#34;,\u0026#34;sp\u0026#34;:4,\u0026#34;reg\u0026#34;:\u0026#34;esp\u0026#34;}]},{\u0026#34;name\u0026#34;:\u0026#34;var_8h\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;int32_t\u0026#34;,\u0026#34;kind\u0026#34;:\u0026#34;b\u0026#34;,\u0026#34;delta\u0026#34;:-16,\u0026#34;accs\u0026#34;:[{\u0026#34;off\u0026#34;:117,\u0026#34;type\u0026#34;:\u0026#34;r\u0026#34;,\u0026#34;sp\u0026#34;:18446744073709551608,\u0026#34;reg\u0026#34;:\u0026#34;ebp\u0026#34;}]}]} /core/analysis/blocks 0x80485f5={\u0026#34;size\u0026#34;:54,\u0026#34;jump\u0026#34;:134514261,\u0026#34;fail\u0026#34;:134514219,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:18,\u0026#34;op_pos\u0026#34;:[4,7,10,11,13,14,15,17,20,25,30,33,36,41,46,49,52],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:0,\u0026#34;cmpval\u0026#34;:1} 
0x804862b={\u0026#34;size\u0026#34;:24,\u0026#34;jump\u0026#34;:134514261,\u0026#34;fail\u0026#34;:134514243,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:9,\u0026#34;op_pos\u0026#34;:[3,6,8,11,12,17,20,22],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:16} 0x8048643={\u0026#34;size\u0026#34;:18,\u0026#34;jump\u0026#34;:134514277,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:5,\u0026#34;op_pos\u0026#34;:[3,8,13,16],\u0026#34;stackptr\u0026#34;:16,\u0026#34;parent_stackptr\u0026#34;:16} 0x8048655={\u0026#34;size\u0026#34;:16,\u0026#34;jump\u0026#34;:134514277,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:4,\u0026#34;op_pos\u0026#34;:[3,8,13],\u0026#34;parent_stackptr\u0026#34;:16} 0x8048665={\u0026#34;size\u0026#34;:15,\u0026#34;traced\u0026#34;:true,\u0026#34;ninstr\u0026#34;:7,\u0026#34;op_pos\u0026#34;:[5,8,9,10,11,14],\u0026#34;parent_stackptr\u0026#34;:0} While this certainly is harder to read for humans, it follows a clearly defined structure and all relevant information can be extracted from it directly. This kind of serialization design now also allows unit tests to be written easily and in fact all currently implemented serializations already come with such tests, aiming to ensure that all internal state is correctly saved and loaded, down to even subtle details and corner cases.\nWhat you see above is already an example of how the serialization will eventually be saved to a file. It is a simple, text-based format that stores the SDB entries line by line and takes care of any necessary escaping. While such a text-based format may not be the most efficient representation, it turned out to be more than good enough for even larger projects and in addition has certain nice properties, which we will make use of further down. 
However, due to the simplicity of SDB, other file formats to store the same data are theoretically feasible too.\nVersioning An important aspect is that the possibility to correctly load a project will survive even significant updates of the software. To ensure this, a simple version-based migration approach is used: The project code contains a version number defined as RZ_DB_PROJECT_VERSION, which is simply an integer that is increased every time there is a change in the format. This number is then simply saved into every project\u0026rsquo;s metadata namespace.\nLater, when loading the same project in newer rizin that also has a higher internal project version number, it will be able to know exactly the kind of format that the old project was saved with and will be able to upgrade it by successively applying migrations, which will be implemented along every increase of the project version number.\nAt the current point in time, the version number is 1 and there are no migrations. This is because at the moment, the projects feature is considered to be in a Beta phase, allowing it to be tested thoroughly and still receive changes to the format that might turn out sensible without the additional engineering overhead of implementing migrations.\nThis means that right now, everybody is highly encouraged to test projects and report any issues that might come up, but be aware of the fact that compatibility with later rizin versions may not be guaranteed and might require small manual edits in the serialized file.\nThe Beta phase will continue throughout all 0.x.y versions of rizin and end by version 1.0.0 where projects will be considered stable, meaning that all projects should always be properly loaded in all future versions and if a case is discovered where this promise is not held, it will be considered a bug and shall be fixed.\nRe-loading of underlying binaries One of the trickiest aspects of serializing a rizin session is handling the actual underlying binary that 
is being analyzed. In fact, speaking of \u0026ldquo;the binary\u0026rdquo; in this context is a crude underapproximation of what is actually present in Rizin.\nIgnoring debug, three modules are working together to load files: RzIO provides a generic IO layer, which can map data coming from plugins in a 64-bit address space. RzBin takes raw files from RzIO, parses their binary file formats such as ELF or PE, also using an independent plugin for each, and eventually provides information on how to lay out the contained sections in RzIO again, along with a list of symbols and other information parsed from the binary. RzCore controls how these modules are created and work together.\nThis design makes rizin\u0026rsquo;s loading mechanism very powerful and flexible, but imposes certain challenges on serialization: How to handle all the different IO plugins? Next to the one that simply loads a regular file, there are plugins for files in zip, malloc, http, shared memory, \u0026hellip; that all need individual reconstruction logic. For regular files, how to relocate the actual file when the project is moved to another machine? From RzBin, should the symbols information also be serialized or re-parsed?\nBecause this part needs to be designed properly first and might even require some refactoring in the respective modules, its implementation has been postponed for now. But the preliminary, rough plan is the following: Every IO plugin itself provides callbacks for (de)serialization of maps created with it. All IO maps are serialized to the file using these callbacks. Information in RzBin will not be serialized but re-parsed on top of the deserialized IO maps.\nHowever, despite this full implementation being postponed, a very simple temporary solution has been implemented, which is strictly limited to the case where only a single binary is loaded from a regular file with the default loading settings, i.e. without explicitly specifying a base address, for example. 
This makes it possible to use projects conveniently right now for the majority of use-cases. More complex cases are also already possible, as long as the loading process is done manually and the project is then loaded on top using the Poo \u0026lt;file.rzdb\u0026gt; command, as shown in the following section.\nUsage Saving and loading projects from rizin is as simple as it can be:\n[0x00000000]\u0026gt; P? Usage: P\u0026lt;so?\u0026gt; # Project management | Ps [\u0026lt;project.rzdb\u0026gt;] # Save a project | Po \u0026lt;project.rzdb\u0026gt; # Open a project | Poo \u0026lt;project.rzdb\u0026gt; # Open a project on top of currently loaded binaries Use Ps [\u0026lt;project.rzdb\u0026gt;] from a running session to save it and Po \u0026lt;project.rzdb\u0026gt; to discard the current session and load the saved one. Alternatively, a project can also be loaded directly when starting rizin like rz -p project.rzdb.\nPo and -p will also take care of loading the single, underlying binary as explained in the previous section. If this is not desired, you can use the Poo \u0026lt;project.rzdb\u0026gt; command to keep all current state of IO mappings and parsed binaries in place and only load the analysis information on top.\nIn Cutter, simply use the File -\u0026gt; Save Project... menu entry or Ctrl+s shortcut to save and the Projects tab in the initial dialog to open a project: Cutter will also ask you to save the project before quitting so no work will get lost by accident.\nFor the case explained before, where the project depends on more complex mappings than a single binary file, or if the same project should be loaded on top of another binary, the Poo \u0026lt;project.rzdb\u0026gt; can be used. 
For example, this is how a project can be loaded on top of two files:\n$ rizin -- # start rizin without any file [0x00000801]\u0026gt; on crackme.bin 0x7ff # load first file at 0x7ff [0x00000801]\u0026gt; on kernal.bin 0xe000 # load second file at 0xe000 [0x00000801]\u0026gt; Poo crackme.bin.rzdb # load project on top [0x00000815]\u0026gt; pd 1 # disassemble inside the first file 0x00000815 jsr CHROUT_in_kernal ; this is a call from crackme.bin into kernal [0x00000815]\u0026gt; pd 1 @ CHROUT_in_kernal # disassemble inside the second file ;-- CHROUT_in_kernal: 0x0000ffd2 jmp (0x0326) Version Control and Collaboration If you have used Ghidra before, you might have come across its \u0026ldquo;shared project\u0026rdquo; and Ghidra server, which are its strong, built-in features for collaborative reverse engineering with version control. Rizin takes a different approach to provide this functionality that is more in line with its UNIX-like focus. It does not implement version control itself, but instead creates project files in a way that they can work well with existing version control systems like git, which are well-tested and likely to already be familiar for users.\nBeing text files where independent content is generally split by lines, git already knows how to deal with tracking differences and merging for these files most of the time. 
This is for example a diff of a project where the current seek was changed and a comment added:\ndiff --git a/megabeets_0x1.rzdb b/megabeets_0x1.rzdb index 9c828f4..aed7e64 100644 --- a/megabeets_0x1.rzdb +++ b/megabeets_0x1.rzdb @@ -4,7 +4,7 @@ version=1 /core blocksize=0x100 -offset=0x8048370 +offset=0x8048600 /core/analysis @@ -158,6 +158,7 @@ watcom=cc 0x804859a=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;char *dest\u0026#34;}] 0x80485db=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s2\u0026#34;}] 0x80485e2=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s1\u0026#34;}] +0x8048600=[{\u0026#34;type\u0026#34;:\u0026#34;C\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;I am putting a comment here!\u0026#34;}] 0x8048609=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] 0x8048619=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] 0x8048646=[{\u0026#34;type\u0026#34;:\u0026#34;t\u0026#34;,\u0026#34;str\u0026#34;:\u0026#34;const char *s\u0026#34;}] Examining these JSON-based diffs is surely not the most convenient way to view differences for every user, but it provides a working compromise between readability for both humans and software at the same time without requiring any programs except git up to this point. In addition, we are investigating implementing custom diff- and mergetools that could be integrated into git and are fully aware of the meaning of data in project files to present and merge differences in the best way possible while still relying on an existing version control system.\nRegarding the binary that is being analyzed in a project, if desired, it can also be put into the same git repository as the project. 
Since projects contain a reference to the binary file relative to the project file, it can still be re-loaded when moved to another machine.\nConclusion We hope you will enjoy using rizin with its new projects feature. If you are interested, we highly encourage you to try it out, put it through its paces, and report any potentially upcoming issues, so we will be able to iron them out until the end of the beta phase!\n","permalink":"https://rizin.re/posts/introducing-projects/","summary":"An overview of the new projects feature in Rizin to save and load reversing sessions. Its design, promises and future.","title":"Introducing Projects in Rizin"},{"content":"We are excited to announce Rizin — a free and open-source Reverse Engineering framework, providing a complete binary analysis experience with features like Disassembler, Hexadecimal editor, Emulation, Binary inspection, Debugger, and more.\nRizin is a fork of radare2 with a focus on usability, stability, and working features, which strives to provide a welcoming environment for developers and users alike. Rizin was founded by a group of the core developers of radare2 and Cutter who contributed to the project in one way or the other in the past years and together constructed the Core group of radare2. With the establishment of Rizin, we are committed to creating an environment and a project which will be aligned with our values and vision.\nDuring recent years, the environment that was created in radare2 was one where many of us felt stressed, disrespected, and unwelcome. Moreover, the number of users of radare2 grew every year, and we held the ultimate responsibility to provide them a stable, usable framework. As the core developer team, we have come to the conclusion that it is impossible for us to continue to pursue the goal of making radare2 better under the current circumstances and environment, and we decided to move forward on our own and fork the project. 
Cutter, the Graphical User Interface for radare2, and its entire team will also join Rizin and will use it as its backend.\nRizin is a newborn project that was created from radare2, hence more and more changes and differences will appear over time. A lot of efforts were put into improving our workflows, putting more tests in place, improving the API, removing redundant features, and more. We hope to provide better consistency between releases, making the framework more trustworthy to users.\nWe are also working to create a more inclusive and diverse community that will be inviting for new contributors and users. As an initial step, we adopted a Code of Conduct that we believe is aligned with our values and with the community we want to create around Rizin.\nFinally, we know and understand that now it is our turn to prove that Rizin can become a tool you can trust and enjoy using, and a community in which you feel welcome. We invite you to read our answers to your Frequently Asked Questions and join our communities on Mattermost and other chat platforms.\n","permalink":"https://rizin.re/posts/announcing-rizin/","summary":"We are excited to announce Rizin — a free and open-source Reverse Engineering framework, providing a complete binary analysis experience.","title":"Announcing Rizin! 🎉"},{"content":"Who are you? We are a group of developers and security enthusiasts who contributed to radare2 in one way or the other in the past years. Some of us got involved with radare2 up to 8 years ago. We were, together with pancake — the original author — the maintainers of the radare2 project. We developed, handled issues, pull requests, review, CI and more. Some of us are the team who lead and maintain the Cutter project, a popular Graphic User Interface for the radare2 project. Among others, we started the development and integration of popular decompilation plugins for radare2 such as r2ghidra and r2dec.\nWhy did you fork radare2? 
Over the years, the direction in which radare2 was led was not aligned with what we believed was best for the project and the community. These disagreements covered many of the aspects involved in creating an open source project: technical, interpersonal, and managerial.\nWith time, the environment that was created was one where many of us felt stressed, disrespected, and unwelcome. It was an environment that for years affected users, contributors, and core members.\nRadare2 as a project evolved and could no longer be treated as a toy tool. With the number of users growing every year, we hold the ultimate responsibility to provide them a stable, usable framework. As the core developer team, we have come to the conclusion that it is impossible for us to continue to pursue the goal of making radare2 better under the current circumstances and environment.\nIt is natural for Open Source projects to split and follow different journeys with different visions. We all want to participate in and contribute to projects that we are passionate about, that we believe in, where we feel safe and welcome, and that we enjoy working on. For the aforementioned reasons and others, we believe that it is better for us to move forward on our own and fork the project.\nWhat are the differences between Rizin and radare2? Rizin is a newborn project that was created from radare2, hence more and more changes and differences will appear over time. With the establishment of Rizin, we are committed to creating an environment and a project which will be aligned with our values and vision for an open source project and community.\nWe see it as our ultimate responsibility to provide the users with a stable and usable program that they can rely on. We will put effort into releasing stable versions of Rizin and improving our test suite.\nIt is also our obligation to create an environment where developers, contributors and users feel welcome and safe. 
For this, we put in place multiple instruments that will allow us to enforce such behavior. We adopted the Contributor Covenant Code of Conduct as we believe it is aligned with our values and with the community we want to create around Rizin. We will follow the code of conduct and enforce it on our different platforms. We started efforts of cleaning the source code from phrases that can\u0026rsquo;t be part of the environment we want to create. In addition, we will put efforts in creating a more inclusive and diverse community and welcome new contributors.\nTechnically speaking, Rizin already contains many changes that do not exist in radare2. Some of them are noted below:\nNew Projects: we replaced the existing project functionality with a new one, developed entirely from scratch, that is based on serialization of existing objects instead of replication of commands. A blog post about this new feature will soon be published, so stay tuned if you want to know more! Removal of less tested/stable features: As we strive to provide a stable tool that you can trust, we chose to remove some features that we believe are not widely used, are old or are not tested at all and thus do not provide any value in their current state. This includes features such as the embedded WebUI, m commands, old projects, the pdc command, T commands, and others. Switch to Git submodules instead of copy-pasted code: this will allow us to better track the external code used in Rizin. Deprecation of ACR/Makefile build system in favor of Meson: experience has shown that a more declarative approach as used by Meson is easier to maintain and understand. Although at the moment, the ACR/Makefile build system contains some features that Meson in Rizin is missing, it is also slow (in terms of compilation time), complicated to edit and does not support out-of-source builds. If more additions are needed, we will be able to implement them in Meson. 
New shell behavior and overall commands handling: We recently developed in radare2 a new way to parse user commands, register them and develop them. This feature is called cfg.newshell and it will both make the user experience more consistent and the developer experience smoother. For this reason we have improved and enabled this by default in Rizin. We will publish a separate blog post about this soon! What will happen to radare2 now? We don\u0026rsquo;t know. radare2 is a popular project with many contributors and users. The maintainer of radare2 will decide how things will proceed. Such a big move will naturally cause changes and we wish to work together to resolve them while causing the least amount of discomfort to the members of the radare2 community and the users.\nWe wish the radare2 project the best of luck.\nWhat about Cutter? The Core team of Cutter, which was also a part of the radare2 Core team, left radare2 and co-founded Rizin. Following this, Cutter is switching from radare2 to Rizin as its backend. For the users of Cutter, nothing major should change. Development on Cutter will continue as usual. Changes in the organization and policies (e.g., Code of Conduct) will also apply to Cutter. Radare2 may or may not fork Cutter back to support radare2 instead and that is up to the radare2 maintainers.\nWill you contribute to radare2? As we are forking radare2, we will stop contributing to the original project, though we expect patches to be imported from one project to the other for some time. In some cases, like a discovery of security vulnerabilities in shared code, we would love to notify the radare2 team so users of the project will be protected.\nCan I take part and contribute to Rizin? Absolutely! We are thrilled to help you start and join Rizin. Please read our initial documentation for new contributors. Please join our Mattermost chat or #rizindev IRC channel on Libera.Chat! 
We hope to create better on-boarding guides for new contributors in the coming months, but in the meantime, we are here for any question you have.\nWhat actions will you take to keep Rizin a safe environment for contributors and users? The Rizin organization believes that contributors, developers and users should enjoy their time around the community and feel safe and welcome. We adopted a Code of Conduct that we believe is aligned with our values and with the community we want to create around Rizin. We will enforce it on our different platforms.\nWe started efforts to clean the source code of offensive phrases and comments. In addition, we will put effort into creating a more inclusive and diverse community and welcoming new contributors.\nFinally, we created the concept of teams that will be responsible for different aspects of Rizin. Such teams will also include a Community team that, among other things, will be an address for requests and complaints from community members.\nWhat is the future of Rizin? We intend to make Rizin a stable project you can trust for your reverse engineering tasks and a welcoming environment where people can work together on something they care about. We will release a roadmap with the features we want to work on and the direction we will take. In the short run, you can expect refinements to the new projects and to the shell.\nHow to pronounce \u0026ldquo;Rizin\u0026rdquo;? Thanks for asking! I have more questions, where can I ask? We would love to answer your question. You can send us a message on Mattermost or email us. Please note that we do not guarantee to answer all questions, as some topics are personal or ones we prefer to keep to ourselves.\n","permalink":"https://rizin.re/posts/faq/","summary":"Who are you? Why did you fork radare2? What will happen to Cutter now? 
Our answers to your frequently asked questions.","title":"Frequently Asked Questions"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too. We primarily use our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (which runs on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. 
For complex bugs and examples, we\u0026rsquo;re using ASCIInema to record the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin while protecting our free codebase.\nInstructions for participants Participants who want to apply to the Rizin project for the Google Summer of Code 2025 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming to the C99 standard), and hence, we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the projects from the list of ideas that you\u0026rsquo;re interested in (or propose your own). 
Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you understand the task deep enough before starting and prioritize important things to do first. Please note how much time a day/week you can spend on this project. Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so we can assign you a mentor in the same one to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do) so that conflicts (two participants for one task) can be resolved. Project Ideas Cutter Improving usability and user experience (175 hour project) The Cutter\u0026rsquo;s backend provides many features that are not exposed or exposed in Cutter efficiently. The goal of this task would be to figure out the users\u0026rsquo; biggest pain points and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be figured during the cross-comparison with other tools.\nTask Add a scrollbar to the disassembly and hexdump widgets Better syntax highlight and theming Managing window/widget overlays Add information about status of the analysis, signature searching, and other operations Address various small UI problems that make user\u0026rsquo;s life harder than necessary Skills The participant should be comfortable with the C++ and be familiar with Qt framework. 
Basics of the design/UX would be a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating comfortable and efficient user interface with C++/Qt.\nBenefits for the project It will make interface and user experience more consistent, on par with Rizin itself, and other tools.\nAssess requirements for midterm/final evaluation 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight Final term: Managing widgets layouts, docking; provide action status information Mentors thestr4ng3r xvilka Megabeets Links/Resources User Experience project for Cutter User Experience project for Rizin Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. 
Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter gets about functions, strings, imports, and the analysis are all performed in Rizin and only displayed in Cutter. Currently, it is pulling most information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes to apply the to the UI. For example, if a user will define a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user will refresh the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. 
For example, add an event for function creation, section creation, flag deletion, name changes, and more.\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets for each event Support analysis in the background and allow users to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++.\nBenefits for the project It will make working on big files in Cutter effortless and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. 
We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka deroad Links/Resources Issue #1104 BinDiff Diaphora Rizin Debugger improvements and portability (175 hour project) Rizin debugger already supports most of the platforms, including native and remote debugging. Nevertheless, for most platforms it\u0026rsquo;s limited mostly to the x86/x86_64 and ARMv8, often lacking the tests. The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX debugger for NetBSD, and so on. Moreover, some information isn\u0026rsquo;t available during the debugging mode, e.g. 
source-level breakpoints or names, so it is necessary to make sure debug commands understand those.\nWith the help of emulators like QEMU and OpenSIMH we could extend our CI to automatically test these debuggers.\nTask Integrate source-level information loaded from DWARF or PDB into debug commands and print p commands Support for missing architectures that are supported by Rizin statically in the Linux native debugger Support for missing architectures that are supported by Rizin statically in the BSD native debugger Cover more platforms supported by the debugger with automated tests, with CI whenever it\u0026rsquo;s possible Fix the bugs in debuggers, minor refactorings of the code Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Difficulty Hard\nBenefits for the participant The participant will understand how debugging works at a low level, and will gain experience with a variety of different platforms and operating systems.\nAssess requirements for midterm/final evaluation 1st term: SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD Mentors xvilka thestr4ng3r ret2libc Links/Resources Debug-labeled issues RzDebug-labeled issues New Platform support New Architecture support FRIDA integration (175 hour project) FRIDA is a well-known dynamic instrumentation toolkit that is immensely popular among mobile device researchers. 
Rizin could be easily integrated with Frida by creating a plugin that connects to a Frida instance, receives traces, sets breakpoints, and gets information and events from it.\nTask Create the basic plugin that allows attaching, spawning, launching processes within Frida locally Support remote connection Add feature to receive information from the Frida instance Add breakpoints and run/step/continue features Support calling functions and scripts in the context of the instrumented process Skills The participant should know C and have experience working with debuggers.\nDifficulty Hard\nBenefits for the participant The participant will learn how to use the Frida toolkit, as well as the internals of the debugging and instrumentation processes.\nAssess requirements for midterm/final evaluation 1st term: Implement core of the FRIDA plugin, allowing local and remote debugging features Final term: Add support for extended features like calling functions or scripts within the context Mentors xvilka thestr4ng3r wargio Links/Resources FRIDA FRIDA (GitHub) r2frida Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having RzIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. Moreover, it makes sense to think about SROP, COOP and BROP support.\nLast year (GSoC'24) one of our participants started implementing this feature, but it wasn\u0026rsquo;t finished. 
You could check the rz-solver repository for more details.\nAlso, the rz-gg tool while has the ability to create a custom shellcode but there is still a lot of work required.\nTask Fix rz-gg issues Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain: #4563. Support main architectures - x86, ARM, MIPS, PowerPC at the very least Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. 
Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Binary case reduce tool Similar to Csmith/Creduce but operating on the binary files, to reduce the size of the test and to avoid sharing proprietary/classified files.\nIt can perform these operations:\ncut bytes shift zero/0xFF/mask bytes remove section Since it requires some knowledge of the file format, existing libraries like LIEF could be used.\nLinks/Resources https://github.com/rizinorg/ideas/issues/52 Microtasks When taking any of the microtasks, please make sure someone isn\u0026rsquo;t already working on it, and let us know if you are going to work on a particular one.\nCutter UX improvements There are many small issues and missing features that, when implemented, will improve the user experience significantly:\nAllow adding new flags from hexdump Scrollbar inside disassembly windows Variables and values popup widgets on mouse hover Allow setting RzRun profiles from the GUI during debugging Double-click on the type in Disasm and Graph widgets should switch to the Types window and show the selected type Set breakpoint inside X-Refs window Unified dialogue to set debug symbols servers See full list at our User Experience project covering all parts of RizinOrg: Rizin, Cutter, RzGhidra, rz-pm.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. 
See New-Architecture label for pending issues.\nTwo notable examples are updating existing bytecode plugins to support newer versions of the respective languages:\nSupport for the Lua 5.2 language changes Support for the Python 3.11 and 3.12 language changes Analysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. 
The same thing should be implemented for classes from analysis.\nRizin legacy code refactoring Miscellaneous Improving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to setup and reuse Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nUnbreaking broken tests Almost one thousand of tests marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. The task is to take any of those, investigate why it fails, if the test makes sense now or already irrelevant today. Then to try to fix some of the broken tests.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2025/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. 
Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too.","title":"GSoC 2025"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too. We are primarily using our own Mattermost instance, IRC, and Telegram) for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that a pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. 
For complex bugs and examples, we\u0026rsquo;re using ASCIInema to record the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin while protecting our free codebase.\nInstructions for participants Participants who want to apply to the Rizin project for the Google Summer of Code 2024 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming to the C99 standard), and hence, we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the projects from the list of ideas that you\u0026rsquo;re interested in (or propose your own). 
Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you understand the task deep enough before starting and prioritize important things to do first. Please note how much time a day/week you can spend on this project. Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so we can assign you a mentor in the same one to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do) so that conflicts (two participants for one task) can be resolved. Project Ideas Cutter Improving usability and user experience (175 hour project) The Cutter\u0026rsquo;s backend provides many features that are not exposed or exposed in Cutter efficiently. The goal of this task would be to figure out the users\u0026rsquo; biggest pain points and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be figured during the cross-comparison with other tools.\nTask Add a scrollbar to the disassembly and hexdump widgets Better syntax highlight and theming Managing window/widget overlays Add information about status of the analysis, signature searching, and other operations Address various small UI problems that make user\u0026rsquo;s life harder than necessary Skills The participant should be comfortable with the C++ and be familiar with Qt framework. 
Basics of the design/UX would be a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating comfortable and efficient user interface with C++/Qt.\nBenefits for the project It will make interface and user experience more consistent, on par with Rizin itself, and other tools.\nAssess requirements for midterm/final evaluation 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight Final term: Managing widgets layouts, docking; provide action status information Mentors thestr4ng3r xvilka Megabeets Links/Resources User Experience project for Cutter User Experience project for Rizin Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. 
Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter gets about functions, strings, imports, and the analysis are all performed in Rizin and only displayed in Cutter. Currently, it is pulling most information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes to apply the to the UI. For example, if a user will define a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user will refresh the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. 
For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will allow working on big files effortlessly in Cutter and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. 
We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka deroad Links/Resources Issue #1104 BinDiff Diaphora Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP\u0026rsquo;s Core Theory was implemented. In the following years it was improved and some of the architectures were ported to use RZIL instead of ESIL. 
The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL, or to add RzIL support for architectures that have no uplifting at all.\nTasks Implement a RZIL uplifting for any non-trivial architecture, preferably one that is already supported by ESIL Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Due to the sensitivity of uplifting to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions, for which it\u0026rsquo;s hard to be sure that they were implemented correctly, on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in Rizin, for example a simple decryption loop to check the overall integration in Rizin, or tests for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with the basics of program analysis. 
Having an experience with other intermediate language, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research, it\u0026rsquo;s relation to the mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with the both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improving the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, added more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debugger improvements and portability (175 hour project) Rizin debugger already supports most of the platforms, including native and remote debugging. Nevertheless, for most platforms it\u0026rsquo;s limited mostly to the x86/x86_64 and ARMv8, often lacking the tests. The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX debugger for NetBSD, and so on. Moreover, some information isn\u0026rsquo;t available during the debugging mode, e.g. 
source-level breakpoints or names; it would be necessary to make sure debug commands understand those.\nWith the help of emulators like QEMU and OpenSIMH we could extend our CI to automatically test these debuggers.\nTask Integrate source-level information loaded from DWARF or PDB into debug commands and print p commands Support for missing architectures that are supported by Rizin statically in the Linux native debugger Support for missing architectures that are supported by Rizin statically in the BSD native debugger Cover more platforms supported by the debugger with automated tests, with CI whenever it\u0026rsquo;s possible Fix the bugs in debuggers, minor refactorings of the code Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Difficulty Hard\nBenefits for the participant The participant will understand how debugging works at a low level and will gain experience with a variety of different platforms and operating systems.\nAssess requirements for midterm/final evaluation 1st term: SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD Mentors xvilka thestr4ng3r ret2libc Links/Resources Debug-labeled issues RzDebug-labeled issues New Platform support New Architecture support FRIDA integration (175 hour project) FRIDA is a famous dynamic instrumentation toolkit that is immensely popular among mobile device researchers. 
Rizin could be easily integrated with Frida by creating a plugin that allows connecting to a Frida instance, receiving traces, setting breakpoints, and getting information and events from it.\nTask Create the basic plugin that allows attaching, spawning, and launching processes within Frida locally Support remote connection Add a feature to receive information from the Frida instance Add breakpoints and run/step/continue features Support calling functions and scripts in the context of the instrumented process Skills The participant should know C and have experience working with debuggers.\nDifficulty Hard\nBenefits for the participant The participant will learn how to use the Frida toolkit and understand the internals of the debugging and instrumentation processes.\nAssess requirements for midterm/final evaluation 1st term: Implement core of the FRIDA plugin, allowing local and remote debugging features Final term: Add support for extended features like calling functions or scripts within the context Mentors xvilka thestr4ng3r wargio Links/Resources FRIDA FRIDA (GitHub) r2frida Rewriting GPL-only code (175 hour project) Currently some of the main Rizin features rely on GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite the C++ demangler and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good examples of such architectures are:\nSPARC (there is already a capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than the binutils-based one, a capstone update is also required) Xtensa (It\u0026rsquo;s better to implement/update it in Capstone) ARC (Same, better to implement/update it in Capstone) Lanai CRIS VAX Skills The participant should know C and the basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant The participant will understand how C++ type information is stored in the names of methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under a less restrictive license. Final term: At least one binutils-based architecture is reimplemented with a more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rizin: rewrite/remove GPL-only code rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having RzIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. 
Moreover it makes sense to think about SROP, COOP and BROP support.\nThe rz-gg tool while has the ability to create a custom shellcode has the outdated database of the shellcodes, so updating them is crucial for the tool to be relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. 
Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Microtasks When taking any of the microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nCutter UX improvements There are many small issues and missing features that when implemented will improve the user experience significantly:\nAllow adding new flags from hexdump Scrollbar inside disassembly windows Variables and values popup widgets on mouse hover Allow setting RzRun profiles from the GUI during debugging Double-click on the type in Disasm and Graph widgets should switch to the Types windows and show the selected type Set breakpoint inside X-Refs window Unified dialogue to set debug symbols servers See full list at our User Experience project covering all parts of RizinOrg: Rizin, Cutter, RzGhidra, rz-pm.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nTwo notable examples are updating existing bytecode plugins to support newer versions of the respective languages:\nSupport for the Lua 5.2 language changes Support for the Python 3.11 and 3.12 language changes Analysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance Rizin\u0026rsquo;s analysis engine.\nSee these issues on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. 
Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nRizin legacy code refactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. 
If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for the rz_core_cmd pattern anywhere inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nUnbreaking broken tests Almost one thousand tests are marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. The task is to take any of those and investigate why it fails and whether the test still makes sense or is already irrelevant today. 
Then to try to fix some of the broken tests.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2024/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year, we participate again, effectively continuing the tradition since 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too.","title":"GSoC 2024"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year is the third time we participate as Rizin, effectively continuing the tradition since the year 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’23. They were already guiding the participants for the GSoC and RSoC in past years. 
Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc @RickySkiro Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account, bugs are tracked on GitHub issues too. We are mostly using our own Mattermost instance, IRC, and Telegram) for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that a pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. 
The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for participants It is a requirement that participants who want to apply to the Rizin project for the Google Summer of Code 2023 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks. To help participants to understand how to contribute to the project there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from the list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deep enough before starting, and prioritize important things to do first. Please note how much time a day/week you are able to spend on this project. 
Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so that we can assign you a mentor in the same one, to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two participants for one task) can be resolved. Project Ideas Cutter Improving usability and user experience (175 hour project) Cutter\u0026rsquo;s backend provides a lot of features that are either not exposed in Cutter or not exposed efficiently. The goal of this task would be to figure out the biggest pain points of the users and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be identified during the cross-comparison with other tools.\nTask Add a scrollbar to the disassembly and hexdump widgets Better syntax highlight and theming Managing window/widget overlays Add information about status of the analysis, signature searching, and other operations Address various small UI problems that make the user\u0026rsquo;s life harder than necessary Skills The participant should be comfortable with C++ and be familiar with the Qt framework. 
Basics of the design/UX would be a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating comfortable and efficient user interface with C++/Qt.\nBenefits for the project It will make interface and user experience more consistent, on par with Rizin itself, and other tools.\nAssess requirements for midterm/final evaluation 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight Final term: Managing widgets layouts, docking; provide action status information Mentors thestr4ng3r xvilka Megabeets Links/Resources User Experience project for Cutter User Experience project for Rizin Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugins authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. 
Review and implement all missing API functions that are accessible as interface controls. Final term: Implement a way to show the API when hovering over an interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter shows about functions, strings, and imports, as well as the analysis itself, is all produced in Rizin and only displayed in Cutter. Currently, Cutter pulls most information from Rizin only on demand. This is problematic because sometimes the user makes changes (via plugins, the console widget, and more) that affect the information in Rizin, but Cutter doesn\u0026rsquo;t know about these changes and so cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. 
For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain experience in creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will allow working on big files effortlessly in Cutter and will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. 
We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka deroad Links/Resources Issue #1104 BinDiff Diaphora Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP\u0026rsquo;s Core Theory was implemented. In the following years it was improved and some of the architectures were ported to use RZIL instead of ESIL. 
The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add RzIL support for the architectures that had no uplifting at all.\nTasks Implement a RZIL uplifting for any non-trivial architecture, preferably that is supported by ESIL already Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Due to the sensitivity of uplifting to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions where it\u0026rsquo;s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in rizin, for example a simple decryption loop to check the overall integration in rizin, or tests for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with the basics of program analysis. 
Having an experience with other intermediate language, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research, it\u0026rsquo;s relation to the mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with the both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improving the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, added more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debug information handling improvements (175 hour project) Rizin already supports most of the DWARF and PDB features, including cross-platform parsing of both. However information are usually just printed to aid the reverse engineering process, but they are not actually used at their best. For example, you can\u0026rsquo;t use them to configure a breakpoint, nor they can be used to access variables within a function during debugging. Moreover, it is becoming more and more common to store DWARF information in separate files, either shipped as separate file or downloaded on the fly with debuginfod. 
Rizin does not support this kind of DWARF file yet.\nYour task would be to improve the parsing support of both by fixing smaller bugs, adding support for separate DWARF files and debuginfod, and enhancing breakpoint integration and variable/structure printing in debugging mode with the source information gathered from DWARF/PDB.\nTask Support loading DWARF information from separate files and debuginfod Unify source lines/types information access for DWARF, PDB, dSYM and refactor/fix parsing code as necessary Integrate source line and types/variables information with the analysis (optional) Integrate source line and types/variables with printing with p commands in the debug mode Integrate source line and types/variables with breakpoint commands and APIs Parsing performance improvements Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Basic knowledge of at least one of the following formats: ELF, DWARF, PDB, PE Difficulty Hard\nBenefits for the participant Participant will understand how high-level features of debuggers work as well as gain skills in the field of software architecture of a large, modular C project.\nAssess requirements for midterm/final evaluation 1st term: debuginfod and source line information refactoring are implemented Final term: Integration of variable information with the debug and printing commands is implemented Mentors xvilka thestr4ng3r ret2libc Links/Resources Loading debug information from debuginfod Unify code of source information access for DWARF, PDB, dSYM Ghidra issue: support DWARF in MinGW PE binaries Debuginfod Debian Debuginfod Fedora Debuginfod DWARF-labeled issues PDB-labeled issues Debugger improvements and portability (175 hour project) Rizin debugger already supports most of the platforms, including native and remote debugging. Nevertheless, for most platforms it\u0026rsquo;s limited mostly to x86/x86_64 and ARMv8, often lacking tests. 
The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux native debugger, ARMv7/ARMv8 to FreeBSD, a System Z debugger for Linux, an HPPA debugger for Linux, a VAX debugger for NetBSD, and so on.\nWith the help of emulators like QEMU and SIMH we could extend our CI to automatically test these debuggers.\nTask Support for missing architectures that are supported by Rizin statically in the Linux native debugger Support for missing architectures that are supported by Rizin statically in the BSD native debugger Cover more platforms supported by the debugger with automated tests, with CI whenever it\u0026rsquo;s possible Fix the bugs in debuggers, minor refactorings of the code Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Difficulty Hard\nBenefits for the participant Participant will understand how debugging works on the low level, and will gain experience with a variety of different platforms and operating systems.\nAssess requirements for midterm/final evaluation 1st term: SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD Mentors xvilka thestr4ng3r ret2libc Links/Resources Debug-labeled issues RzDebug-labeled issues New Platform support New Architecture support Thread-safety and multithreading (175 hour project) Currently Rizin is not completely thread-safe, either internally or as a library for a multithreaded application. The goal of this project is to eliminate global states and use contexts, eliminate singletons, e.g. 
RzCons, and use thread-safe external functions and dependencies.\nTask Migrate from thread-unsafe system and external dependencies Eliminate global state inside RzCons and use of the singleton Make RzBin thread-safe Make RzAnalysis thread-safe Make RzCore thread-safe Add tests for using multiple RzCore and RzAnalysis instances Parallelize some of the RzAnalysis function using the threading API Skills Participant should know C as well as have the experience of developing multithreaded applications.\nDifficulty Hard\nBenefits for the participant Participant will understand the hurdles of multithreaded programming, data synchronization, locks and debugging of such code.\nAssess requirements for midterm/final evaluation 1st term: Eliminate thread-unsafe dependencies and remove global state from RzCons and RzBin Final term: Make RzAnalysis and RzCore (optionally) thread-safe Mentors xvilka thestr4ng3r wargio Links/Resources Migrate from wcstombs() function since it\u0026rsquo;s not thread-safe Rewriting GPL-only code (175 hour project) Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite the C++ demangler and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good examples of such architectures are:\nSPARC (there is already a capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than the binutils-based one) Xtensa ARC HPPA (PA-RISC) Skills Participant should know C and basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant Participant will understand how C++ type information is stored in the name of the methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under a less restrictive license. Final term: At least one binutils-based architecture is reimplemented with a more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. 
Moreover it makes sense to think about SROP, COOP and BROP support.\nWhile the rz-gg tool can create custom shellcode, its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses an SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with an SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Microtasks When taking any of the microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. 
See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nAnalysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. 
The same thing should be implemented for classes from analysis.\nRefactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to setup and reuse Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nUnbreaking broken tests Almost one thousand of tests marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. 
The task is to take any of those, investigate why it fails, if the test makes sense now or already irrelevant today. Then to try to fix some of the broken tests.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2023/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year is the third time we participate as Rizin, effectively continuing the tradition since the year 2015.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’23. They were already guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them in case you need any help in selecting a project.","title":"GSoC 2023"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction This year is the second time we participate as a fork - Rizin, effectively continuing the tradition since the year 2015 (as the radare2 project).\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’22. They were already guiding the participants for the GSoC and RSoC in past years. 
Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Schirone Mattermost: ret2libc Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad Yossi Zapesochini Mattermost/Telegram: @yossizap And many others Development methodology Currently, all repositories are hosted on the GitHub main organization account, and bugs are tracked on GitHub issues too. We are mostly using our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. 
The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for participants It is a requirement that participants who want to apply to the Rizin project for the Google Summer of Code 2022 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks. To help participants to understand how to contribute to the project there are issues marked as \u0026ldquo;good first issue\u0026rdquo; for both Rizin and Cutter.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from the list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. Participant proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split the entire GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deep enough before starting, and prioritize important things to do first. Please note how much time a day/week you are able to spend on this project. 
Please specify which category you apply for - medium task or extended deadline one. Specify your timezone so we can assign you a mentor in the same one, to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two participants for one task) can be resolved. Project Ideas Rizin RzIL uplifting migration (350 hour project) Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on BAP\u0026rsquo;s Core Theory, was implemented. In the following months it was improved and some of the architectures were ported to use RZIL instead of ESIL. The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add RzIL support for the architectures that had no uplifting at all.\nTasks Implement a RZIL uplifting for any non-trivial architecture, preferably that is supported by ESIL already Improve the integration with analysis (variables and types differences) for the chosen architecture Write the test cases for Rizin regression tests and improve the results. Update and use rz-tracetest for the chosen architectures Implement necessary commands and APIs in Rizin for visual representation of the IL tree Implement standard and graph views in Cutter for the IL output (optional) Due to the sensitivity of uplifting to precision, it\u0026rsquo;s important to follow these steps:\nFor every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling). Run rz-tracetest on real traces. 
It\u0026rsquo;s also possible to write custom assembly programs that execute specific obscure instructions where it\u0026rsquo;s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest. Add a few rz-test command tests that emulate some code snippets in rizin, for example a simple decryption loop to check the overall integration in rizin, or tests for specific edge cases (like running a division by zero). Skills The participant should know C and bits of C++ as well as be familiar with the basics of program analysis. Experience with other intermediate languages, SAT/SMT, and mathematical logic is a plus.\nDifficulty Medium\nBenefits for the participant The participant will understand the state of the art of intermediate languages research and its relation to mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with both symbolic and concrete emulation during the implementation process.\nBenefits for the project Migrating most architectures will help to deprecate and remove outdated ESIL and will help improve the analysis precision. Adding uplifting for new architectures that weren\u0026rsquo;t even supported by ESIL will improve the analysis to an even greater degree.\nAssess requirements for midterm/final evaluation 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests Final term: implement all changes in the analysis code, add more complex integration tests with types analysis Mentors xvilka thestr4ng3r Links/Resources RZIL-labeled issues ESIL-labeled issues ESIL to RZIL conversion tracking issue Cutter: IL output and graph visual representation Debug information handling improvements (175 hour project) Rizin already supports most of the DWARF and PDB features, including cross-platform parsing of both. 
However information are usually just printed to aid the reverse engineering process, but they are not actually used at their best. For example, you can\u0026rsquo;t use them to configure a breakpoint, nor they can be used to access variables within a function during debugging. Moreover, it is becoming more and more common to store DWARF information in separate files, either shipped as separate file or downloaded on the fly with debuginfod. Rizin does not support these kind of DWARF files yet.\nYour task would be to improve the parsing support of both by fixing smaller bugs, add support for separate DWARF files and debuginfod and enhance breakpoint integration and variable/structure printing in debugging mode with the source information gathered from DWARF/PDB.\nTask Support loading DWARF information from separate files and debuginfod Unify source lines/types information access for DWARF, PDB, dSYM and refactor/fix parsing code as necessary Integrate source line and types/variables information with the analysis (optional) Integrate source line and types/variables with printing with p commands in the debug mode Integrate source line and types/variables with breakpoint commands and APIs Parsing performance improvements Skills Good knowledge of the C language Some experience in debugging with GDB or LLDB Basic knowledge of at least one of the following formats: ELF, DWARF, PDB, PE Difficulty Hard\nBenefits for the participant Participant will understand how high-level features of debuggers work as well as gain skills in the field of software architecture of a large, modular C project.\nAssess requirements for midterm/final evaluation 1st term: debuginfod and source line information refactoring are implemented Final term: Integration of variable information with the debug and printing commands is implemented Mentors xvilka thestr4ng3r ret2libc Links/Resources Loading debug information from debuginfod Unify code of source information access for DWARF, PDB, dSYM Ghidra 
issue: support DWARF in MinGW PE binaries Debuginfod Debian Debuginfod Fedora Debuginfod DWARF-labeled issues PDB-labaled issues Thread-safety and multithreading (175 hour project) Currently Rizin is not thread safe completely internally and as a library for a multithreaded application. The goal of this project is to eliminate global states and use contexts, eliminate singletons, e.g. RzCons, and use thread-safe external functions and dependencies.\nTask Migrate from thread-unsafe system and external dependencies Eliminate global state inside RzCons and use of the singleton Make RzBin thread-safe Make RzAnalysis thread-safe Make RzCore thread-safe Add tests for using multiple RzCore and RzAnalysis instances Parallelize some of the RzAnalysis function using the threading API Skills Participant should know C as well as have the experience of developing multithreaded applications.\nDifficulty Hard\nBenefits for the participant Participant will understand the hurdles of multithreaded programming, data synchronization, locks and debugging of such code.\nAssess requirements for midterm/final evaluation 1st term: Eliminate thread-unsafe dependencies and remove global state from RzCons and RzBin Final term: Make RzAnalysis and RzCore (optionally) thread-safe Mentors xvilka thestr4ng3r wargio Links/Resources Migrate from wcstombs() function since it\u0026rsquo;s not thread-safe Rewriting GPL-only code (175 hour project) Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. 
It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.\nTasks Rewrite the C++ demangler and remove the GPL code Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code Good examples of such architectures are:\nSPARC (there is already a capstone-based RzAsm and RzAnalysis plugin but it\u0026rsquo;s less complete than the binutils-based one) Xtensa Tricore SH HPPA (PA-RISC) Skills Participant should know C and basics of C++ for understanding the mangling scheme\nDifficulty Medium\nBenefits for the participant Participant will understand how C++ type information is stored in the name of the methods and classes.\nAssess requirements for midterm/final evaluation 1st term: Basic demangling for C++ is rewritten under a less restrictive license. Final term: At least one binutils-based architecture is reimplemented with a more permissive license. Mentors xvilka thestr4ng3r wargio Links/Resources rz-libdemangle: rewrite/remove GPL-only code Update binutils code to latest Exploitation capabilities improvements (175 hour project) Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse the power of librz and rz-gg. 
Moreover it makes sense to think about SROP, COOP and BROP support.\nWhile the rz-gg tool can create custom shellcode, its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses an SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the participant The participant will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with an SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Bindings for languages other than C/C++ (175 hour project) Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. 
While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C api that Rizin offers. Using this API is directly possible in C and C++, as it is done in Cutter for example, but for other languages no generic bindings exist so far. The goal of this task is to use a bindings generator such as SWIG to expose Rizin\u0026rsquo;s C API to languages such as Python, Java or OCaml.\nTask Integrate SWIG-generated bindings into Rizin\u0026rsquo;s build system Write SWIG interfaces for all mature parts of Rizin\u0026rsquo;s C API Integrate the Python bindings into Cutter\u0026rsquo;s Python support Skills The participant should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.\nAssess requirements for midterm/final evaluation 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable. Final term: All relevant parts of the API can be used through bindings and also from within Cutter\u0026rsquo;s Python interpreter. Mentors thestr4ng3r xvilka Links/Resources SWIG Website SWIG 4.0 Documentation Small PoC of bindings generated in Rizin\u0026rsquo;s build system Article about Rizin\u0026rsquo;s build system design Cutter Plugins and Python High Level API (175 hour project) Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. 
For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. Skills The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture (350 hour project) The information Cutter gets about functions, strings, imports, and the analysis are all performed in Rizin and only displayed in Cutter. Currently, it is pulling most information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes to apply them to the UI. 
For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -\u0026gt; Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start its session while Rizin is analyzing (see #1856, #1574) Skills The participant should be comfortable with the C++ for Cutter and C for Rizin. They should also be familiar with Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches is a plus.\nDifficulty Advanced\nBenefits for the participant The participant will gain an experience of creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will allow to work on big files effortlessly in Cutter, will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. 
Mentors thestr4ng3r Heap viewer completion (175 hour project) Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.\nTask Complete Cutter\u0026rsquo;s implementation of the windows heap widget #2723 Improve the performance of the Windows heap parser Fix Windows heap parsing errors Make the implementation work with remote debugging modes Skills The participant should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets yossizap Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode (175 hour project) Binary diffing is one of the most common tasks for the reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in the Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The participant should be comfortable with the C++ language, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the participant The participant will gain an experience of creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka Megabeets Links/Resources Issue #1104 BinDiff Diaphora Microtasks When taking any of microtasks please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nELF binary parsing. Rizin parses a lot of information about the ELF but doesn\u0026rsquo;t print everything.\nMoreover, some information about PLT stubs not being resolved correctly.\nAnalysis The current code analysis has many caveats and issues which need addressing. 
Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues or the \u0026ldquo;Analysis\u0026rdquo; project on our GitHub dashboard.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, is a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nRefactoring Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. 
While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for rz_core_cmd pattern in any place inside librz/.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. Good example is to allow better filtering of the test types to run, for example to ignore debug tests. The next interesting idea is to setup and reuse Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nAnother important part of the improving test suite is to cover more different formats and cases with expanding it. See the #114 issue with more details on how it can be done.\nUnbreaking broken tests Almost one thousand of tests marked as \u0026ldquo;broken\u0026rdquo; in our testsuite. 
The task is to take any of those, investigate why it fails, and decide whether the test still makes sense or is already irrelevant. Then try to fix some of the broken tests.\nBetter portability Due to the mistakes in handling data for big-endian platforms in Rizin code a lot of tests still don\u0026rsquo;t pass on our System Z CI worker. Most of the broken tests are related to parsing the formats, in particular reading the integers in a portable way. See #297 for details on these formats. In most cases the solution would be to use rz_read_*() API functions: Developers Guide: Manage Endianess.\nRzGhidra There are many small issues in the decompiler output:\npdgsd commands showing incorrect P-code Improvements in recovering jump tables rz-ghidra can\u0026rsquo;t detect string Ghidra Decompiler Error: Could not finish collapsing block structure Mishandled tail jump with relocation inside the jump function Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes on the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2022/","summary":"TL;DR Jump to the Ideas list.\nIntroduction This year is the second time we participate as a fork - Rizin, effectively continuing the tradition since the year 2015 (as the radare2 project).\nMentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’22. They were already guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them in case you need any help in selecting a project.","title":"GSoC 2022"},{"content":"TL;DR Jump to the Ideas list.\nIntroduction Each year since 2015, we have participated in Google Summer of Code as the Radare2 project and accomplished many goals. 
This year we participate as a fork - Rizin, but effectively continuing the same process and the same mentors.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide students for GSoC’21. They were already guiding the students for the GSoC and RSoC in past years as part of the Radare2 project. Please feel free to reach out to any of them in case you need any help in selecting a project.\nAnton Kochkov Mattermost: xvilka \u0026ndash; @akochkov Riccardo Shirone Mattermost: ret2libc Florian Märkl Mattermost/Telegram: @thestr4ng3r \u0026ndash; @thestr4ng3r Antide Petit IRC/Telegram: xarkes \u0026ndash; @xarkes_ Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_ Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad Yossi Zapesochini Mattermost/Telegram: @yossizap And many others Development methodology Currently, all repositories are hosted on GitHub main organization account, bugs are tracked on GitHub issues too. We are mostly using our own Mattermost instance, IRC, and Telegram for communication. We have a testsuite (that is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that pull requests or commits don\u0026rsquo;t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. 
For complex bugs and examples we\u0026rsquo;re using ASCIInema for recording the sessions.\nSee also our guides for corresponding projects:\nRizin Contributing Guide and Developers Intro Cutter Contributing Guide and Developers Intro For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.\nLicense Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin, while protecting our free codebase.\nInstructions for students It is a requirement that students who want to apply to the Rizin project for the Google Summer of Code 2021 should submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Though you can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task, and still small enough to be finished no more than in a couple of weeks.\nProgramming languages Most of Rizin is written in C (conforming C99 standard) and hence we expect students to be familiar with C programming language. For some of our tasks or microtasks, such as collaborative RE or rz-pm, students should know the Go programming language. For the Cutter tasks, students should know C++ and Qt framework basics.\nRecommended steps Read Google\u0026rsquo;s instructions for participating Grab any of the project from list of ideas that you\u0026rsquo;re interested in (or propose your own). Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you. Submit it using Google\u0026rsquo;s web interface. 
Student proposal guidelines Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing. Try to split GSoC period into tasks, and each task into subtasks. It helps us to understand how you plan to accomplish your goals, but more importantly, it\u0026rsquo;ll help you to understand the task deep enough before starting, and prioritize important things to do first. Please note how much time a day/week you are able to spend on this project. Specify your timezone so that we can assign you a mentor in the same one, to ease communication. Submit your proposal early, not at the last minute! Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two students for one task) can be resolved. Project Ideas Rizin Type Analysis Improvements Currently we have types support in Rizin, including basic (low-level) ability to edit type with pf and higher-level, C-like types with t command. It is possible to parse the C type definition from C headers for example, or load from \u0026ldquo;precompiled\u0026rdquo; SDB file. However, despite such features being present, many of them still lack the right structures and connections between them to make complex automated and manual analysis of code using types convenient. The goal of this task is to build upon the currently available features and re-think or re-design some of them to fit into the bigger picture of the entire framework. The overall plan for this project is tracked at https://github.com/rizinorg/rizin/projects/3. 
There are certain dependencies between some of the subtasks, but as long as these are respected, subtasks to be taken for GSoC can be picked by preference.\nTask (proposal) Bundle all types functionality in a new module RzTypes #369 Refactor some base type accesses to use the RzAnalysisBaseType API #368 Replace the current TCC-based C types parser by a Tree-sitter based one #275 Skills Student should know C as well as be familiar with basics of the program analysis. They should also be passionate about software architecture.\nDifficulty Hard\nBenefits for the student Student will understand modern program analysis problems related to type analysis, as well as gain skills in the field of software architecture of a large, modular C project with complex dependencies between modules.\nAssess requirements for midterm/final evaluation 1st term: RzTypes module exists and contains all relevant code. Final term: Tree-sitter based C parser is implemented and integrated into the analysis framework. Mentors xvilka thestr4ng3r Links/Resources Type Analysis Improvements Project C++ grammar for tree-sitter CPU/Platform profiles While instruction set defines architecture, it is common that particular CPU or SoC models implement only a subset of it or extend it with custom instructions and registers. Moreover, various SoC modifications can define peripheral devices interaction through ports (rare), registers or MMIO spaces. All this helps the reverse engineering process, because a lot of the code will make sense upon a glance once you see it accesses certain registers (if named) or peripheral devices (when MMIO area is defined). 
A common example is SVD loading for ARM architecture.\nA good example how CPU profile should look like:\nasm.cpu to be dynamically populated by listing the available CPU dedicated plaintext/sdb files Add RAM_SIZE info in the CPU files and remove hardcoding from .h and .c files Add ROM_SIZE info in the CPU files and remove hardcoding from .h and .c files Add INTERRUPT_VECTOR_SIZE info in the CPU files and remove hardcoding from .h and .c files Add IO_REGISTERS, EXTENDED_IO_REGISTER, MMIO_REGISTER, coprocessor register info in the CPU files and remove hardcoding from .h and .c files Task Implement support for CPU profiles Implement support for platform profiles Add support for register and MMIO specific setups Integrate these in analysis loop, handling register and memory accesses. Implement tests and documentation in Rizin book Provide an API for setting these values from rz-pipe and lang-* plugins Skills Student should know C and understand basics of the hardware platforms, architectures and chips.\nDifficulty Medium\nBenefits for the student The student will improve familiarity with reverse engineering for various architectures and platforms, along with the improving the efficiency of Rizin.\nBenefits for the project Huge benefits for end users in UX and better support for extension.\nAssess requirements for evaluations 1st term: CPU and platform profiles, some most common profiles, integration with the analysis loop Final term: Support for more platforms, regression and unit tests, documentation (including Rizin book). Mentors xvilka deroad Links/Resources Issue #103 SVD loader for Ghidra How to use SVD loader with Ghidra SVD parser in Rust CMSIS-SVD repository Rz-diff improvements Rizin has had the ability to perform binary diffing for over a decade. Nevertheless the support is quite basic and there is room for improvement. One of the most important tasks is to deepen the integration with analysis loop. 
Integration with the analysis loop will allow Rizin to find and highlight the difference between arguments count, local variables count, their types and other analysis metainformation. The next big task is to modernize rz-diff (and corresponding parts in RCore) in terms of performance and user interface. And of course - cover the rz-diff and rizin diffing features with regression tests and unit tests.\nTasks Support diffing of the different parts of the same buffer/file Split view for hexadecimal view and disassembly diffing mode Improve the integration with analysis (variables and types differences) Integrate ESIL and decompilation (rz-ghidra, jsdec) pseudocode as options for binary diffing Implement the most important diffing strategies from Diaphora Write the test cases for Rizin regression tests and improve the results. Skills Student should know C as well as be familiar with basics of the program analysis. Having an experience with other binary diffing software is a plus.\nDifficulty Medium\nBenefits for the student Student will understand modern program analysis problems in application to binary diffing, and how to improve the performance of patch analysis.\nBenefits for the project This feature will make Rizin usable for day-to-day patch analysis of modern software, as well as improve the automation and performance of this task.\nAssess requirements for midterm/final evaluation 1st term: rz-diff/rizin should support highlighting types, arguments, and variables differences between functions. Final term: Implement split-view for hex, disassembly, and graph modes. Their interface and performance improvements. Write the regression tests for all implemented features, add the documentation in Rizin book. 
Mentors xvilka Megabeets Links/Resources rz-diff-labeled issues Signature-labeled issues Cutter: Diffing interface feature request #1104 PatchDiff2 BinDiff Diaphora SimHash Exploitation capabilities improvements Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It\u0026rsquo;s a shame that despite having ESIL, Rizin doesn\u0026rsquo;t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse power of librz and rz-gg. Moreover, it makes sense to think about SROP, COOP and BROP support.\nThe rz-gg tool can create custom shellcode, but its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.\nTask Update the shellcodes database, improve rz-gg features and documentation Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like: register reg1 = 0; register reg2 = whatever; register reg3 = reg1 + reg2; system(reg3); Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain. Support main architectures - x86, ARM, MIPS, PowerPC Skills The student should be comfortable with the C language, know some assembly and a high-level language. 
Also, knowing a little bit of automatic binary analysis wouldn’t hurt.\nDifficulty Advanced\nBenefits for the student The student will improve their skills in software exploitation and solvers.\nBenefits for the project This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)\nAssess requirements for evaluation 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver Final term: Working ropchain compiler, covered by tests and documented in the Rizin book. Mentors xvilka ret2libc Links/Resources ROPGadget Ropper Angrop ROPC exrop roper2 mona.py from corelan Hunting for ROP Gadgets in Style (2012) dropper a BARF-based rop chain generator Materials about the exploitation workshop at Hack.lu 2014 Slides for the exploitation part of workshop at Hack.lu 2015 RzEgg related bugs Bindings for languages other than C/C++ Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C API that Rizin offers. Using this API is directly possible in C and C++, as it is done in Cutter for example, but for other languages no generic bindings exist so far. 
The goal of this task is to use a bindings generator such as SWIG to expose Rizin\u0026rsquo;s C API to languages such as Python, Java or OCaml.\nTask Integrate SWIG-generated bindings into Rizin\u0026rsquo;s build system Write SWIG interfaces for all mature parts of Rizin\u0026rsquo;s C API Integrate the Python bindings into Cutter\u0026rsquo;s Python support Skills The student should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.\nDifficulty Advanced\nBenefits for the student The student will gain an experience of exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.\nAssess requirements for midterm/final evaluation 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable. Final term: All relevant parts of the API can be used through bindings and also from within Cutter\u0026rsquo;s Python interpreter. Mentors thestr4ng3r xvilka Links/Resources SWIG Website SWIG 4.0 Documentation Small PoC of bindings generated in Rizin\u0026rsquo;s build system Article about Rizin\u0026rsquo;s build system design Cutter Plugins and Python High Level API We currently have almost no API for plugin authors to use. We need to improve a lot of things about our Plugins support and take it a few steps ahead. This task is only about improving the Python interface in Cutter, specifically its graphical user interface components. For a task about exposing Rizin\u0026rsquo;s API for disassembly, analysis and other purposes, see the Rizin bindings task above.\nTask Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc. Accessing everything from Python (like Blender) - see issue #1662 Python integration and IPython console. 
Skills The student should be comfortable with the C++ and Python languages, and be familiar with Qt framework\nDifficulty Advanced\nBenefits for the student The student will gain an experience of creating a suitable API for scripting graphical interface programs.\nBenefits for the project It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.\nAssess requirements for midterm/final evaluation 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls. Final term: Implement the way to show the API when hovered over some interface control, create documentation. Mentors thestr4ng3r Megabeets Links/Resources SDB Module/API for Cutter Python/Jupyter integration Jupyter plugin for Cutter Multi-Tasking and Event-driven architecture Cutter is a reverse engineering framework that is powered by Rizin. The information it gets about functions, strings, imports, and the analysis are all performed in Rizin and displayed in Cutter. Currently, Cutter is pulling information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn\u0026rsquo;t know about these changes to apply them to the UI. 
For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ \u0026lt;addr\u0026gt;, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (edit -\u0026gt; Refresh Contents).\nIn addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.\nTasks The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more\nAdd events to all the relevant functions inside Rizin Add support for these events in Cutter and refresh and update the relevant widgets per each event Support analysis in the background and allow the user to start its session while Rizin is analyzing (see #1856, #1574) Skills The student should be comfortable with the C++ for Cutter and C for Rizin. The student should be familiar with Qt framework.\nDifficulty Advanced\nBenefits for the student The student will gain an experience of creating complex event-driven software in both C and C++ languages.\nBenefits for the project It will allow to work on big files effortlessly in Cutter, will improve analysis quality as well.\nAssess requirements for midterm/final evaluation 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter. Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background. Mentors thestr4ng3r Karliss Heap viewer We already have a nice heap (and memory map) parser and visualizer in Rizin (dm and dmh commands). 
After debugging becomes a first-class citizen in cutterland it would be awesome to have memory map and heap visualizations.\nTask Expose Rizin API/commands for Cutter to use for visualization Design and implement heap navigation and inspection widgets Provide the integration with current debugging mode in Cutter Make the implementation work with both local (native) and remote debugging modes Skills The student should be comfortable with the C++, and be familiar with Qt framework\nDifficulty Medium\nBenefits for the student The student will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.\nBenefits for the project It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.\nAssess requirements for midterm/final evaluation 1st term: Design and implement heap visualization widgets, add Rizin test and fixes Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation. Mentors xvilka Megabeets Links/Resources Issue #1041 Heap Viewer plugin for IDA Pro Heap parsing for MacOS, tmalloc, jmalloc Dynamic Allocator Detection \u0026ldquo;heap\u0026rdquo;-marked Rizin issues Diffing mode Binary diffing is one of the most common tasks for the reverse engineer. There are many various tools available, but most of them are either detached from the main RE toolbox or poorly integrated. 
Rizin provides basic diffing features out of the box with the rz-diff tool, but Cutter has no interface to represent similar functionality.\nTask Expose basic rz-diff features in Cutter Create the interface to choose two files for diffing Create the way to show the differences in all main widgets: Hexadecimal view Disassembly view Graph view Pseudocode view Skills The student should be comfortable with the C++ language, and be familiar with the Qt framework\nDifficulty Advanced\nBenefits for the student The student will gain experience in creating efficient graphical interfaces.\nBenefits for the project It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.\nAssess requirements for midterm/final evaluation 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. Final term: Implement the diff modes for graph and pseudocode views, create the documentation. Mentors xvilka Megabeets Links/Resources Issue #1104 BinDiff Diaphora Microtasks When taking any of the microtasks, please be sure someone isn\u0026rsquo;t already working on them, and let us know if you are going to work on a particular one.\nFile formats Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.\nDisassemblers and assemblers Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.\nELF binary parsing. Rizin parses a lot of information about the ELF but doesn\u0026rsquo;t print everything. 
Thus, improving the output of i* commands and the rz-bin tool is important to match up with readelf (Add file offset and memory alignment for segments information (iSS command))\nMoreover, some information about PLT stubs is not resolved correctly.\nAnalysis The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin\u0026rsquo;s analysis engine.\nSee these issues or the \u0026ldquo;Analysis\u0026rdquo; project on our GitHub dashboard.\nBasefind #413 There are plenty of external scripts and plugins for finding the most probable base for raw firmware images. Opening raw firmwares with rizin is a common use case, so it makes sense to implement it as a part of rizin core.\nHeap analysis #157 Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn\u0026rsquo;t depend on the debugger backend, and we may be able to use different heap tools.\nClass analysis for C++/ObjectiveC/Swift/Dlang/Java #416 Analysis classes, accessible under the ac command, are a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.\nDevirtualize method calls using class vtables #414 Consider the following call: call dword [eax + 0x6c] Let\u0026rsquo;s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.\nSo there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. 
It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.\nWhen that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.\nAdd classes list to Vb Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.\nSignatures Rizin has a good support for loading and creating signatures, but it is not yet complete, thus some problems remain, for example: #272.\nAs Rizin supports FLIRT signatures loading from IDA Pro, not all of them are supported yet - e.g. version 5 compression.\nRefactoring Use \u0026ldquo;newshell\u0026rdquo; instead of old switch/case handling Rizin is in the middle of the switch from the old style switch/case manual parsing of every command to the centralized Tree-Sitter-based parser, providing every command handler argc/argv arguments. Best candidates for the initial switch are:\nlibrz/core/cmd_egg.c librz/core/cmd_hash.c librz/core/cmd_plugins.c A good example of transition is in these pull requests for t (types) command conversion:\nMigrating Types to the Newshell (1) Migrating Types to the Newshell (2) Migrating Types to the Newshell (3) Adding autocompletion for types commands Use internal API instead of commands Currently, Rizin\u0026rsquo;s source code is rife with calls to rz_core_cmd()-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn\u0026rsquo;t the case of changed function arguments count or type. Thus, all these calls eventually should be substituted with direct calls to the corresponding API functions. 
If there is no corresponding API function, then one should be created. Good examples of such cases are:\nRefactor Graph processing from commands to the API use Refactor Visual mode from commands to the API use Refactor Panels mode from commands to the API use In general you can just search for the rz_core_cmd pattern in any place inside librz/.\nImproving the uplifting of the code to IL Rizin has its own intermediate language - ESIL, but does not yet support it for all architectures. So the task is to add ESIL support to any architecture that doesn\u0026rsquo;t have it yet.\nMiscellaneous Shell (dietline) improvements Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the \u0026ldquo;dietline\u0026rdquo;-labeled issues.\nImproving regression suite and testing It is required to solve numerous issues, along with improving parallel execution and performance. A good example is to allow better filtering of the test types to run, for example to ignore debug tests. The next interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.\nAnother important part of improving the test suite is expanding it to cover more formats and cases. See the #114 issue for more details on how it can be done.\nRzGhidra There are many small issues in the decompiler output:\nString detection problem and one more. 
Show function arguments in calls pdgsd commands showing incorrect P-code Prioritize keeping vars with lower addresses Minor improvements for the SLEIGH plugin Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.\nAlso note that most of these issues should be paired with the test to verify it will not break in the future.\n","permalink":"https://rizin.re/gsoc/2021/","summary":"TL;DR Jump to the Ideas list.\nIntroduction Each year since 2015, we have participated in Google Summer of Code as the Radare2 project and accomplished many goals. This year we participate as a fork - Rizin, but effectively continuing the same process and the same mentors.\nMentors Members of the Rizin and Cutter core teams have volunteered to guide students for GSoC’21. They were already guiding the students for the GSoC and RSoC in past years as part of the Radare2 project.","title":"GSoC 2021"},{"content":"Concrete high-level feature areas and changes.\n0.9 Add support for missing members of H8 MCU family, and implement RzIL uplifting of them Complete migration from ESIL to RzIL for all supported architectures and features Improve FreeBSD, NetBSD, and OpenBSD debugging Improve ARM64 and PowerPC debugging Migrate from Capstone to Zydis for x86 architecture to address long-standing problems with unsupported x86 instructions Support STABS (pre-DWARF) debug information loading Add support for proper preprocessor in the type parser Refactor types to introduce type scope Rewrite RzNum to support proper formulas, bitvectors, floats, and so on Remove concept of the \u0026ldquo;block\u0026rdquo; in favor of direct transparent IO access Full milestone is at https://github.com/rizinorg/rizin/milestone/21\n1.0 Add KB (Knowledge Base) support for storing metainformation in logic fact-based form Stable and documented API Refactor and merge various visual modes Refactor native debugger Big files loading support Remove GPL-only code in favor 
of LGPL Create documentation for the framework structure and all modules Create RzIL specification Full milestone is at https://github.com/rizinorg/rizin/milestone/5\n","permalink":"https://rizin.re/roadmap/","summary":"Concrete high-level feature areas and changes.\n0.9 Add support for missing members of H8 MCU family, and implement RzIL uplifting of them Complete migration from ESIL to RzIL for all supported architectures and features Improve FreeBSD, NetBSD, and OpenBSD debugging Improve ARM64 and PowerPC debugging Migrate from Capstone to Zydis for x86 architecture to address long-standing problems with unsupported x86 instructions Support STABS (pre-DWARF) debug information loading Add support for proper preprocessor in the type parser Refactor types to introduce type scope Rewrite RzNum to support proper formulas, bitvectors, floats, and so on Remove concept of the \u0026ldquo;block\u0026rdquo; in favor of direct transparent IO access Full milestone is at https://github.","title":""},{"content":"Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.\nWe pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.\nOur Standards Examples of behavior that contributes to a positive environment for our community include:\nDemonstrating empathy and kindness toward other people Being respectful of differing opinions, viewpoints, and experiences Giving and gracefully accepting constructive feedback Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience Focusing on what is best not just for us as individuals, 
but for the overall community Examples of unacceptable behavior include:\nThe use of sexualized language or imagery, and sexual attention or advances of any kind Trolling, insulting or derogatory comments, and personal or political attacks Public or private harassment Publishing others\u0026rsquo; private information, such as a physical or email address, without their explicit permission Other conduct which could reasonably be considered inappropriate in a professional setting Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.\nCommunity leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.\nScope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.\nEnforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at moderation@rizin.re. All complaints will be reviewed and investigated promptly and fairly.\nAll community leaders are obligated to respect the privacy and security of the reporter of any incident.\nEnforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:\n1. 
Correction Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.\nConsequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.\n2. Warning Community Impact: A violation through a single incident or series of actions.\nConsequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.\n3. Temporary Ban Community Impact: A serious violation of community standards, including sustained inappropriate behavior.\nConsequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.\n4. 
Permanent Ban Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.\nConsequence: A permanent ban from any sort of public interaction within the community.\nAttribution This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.\nCommunity Impact Guidelines were inspired by Mozilla\u0026rsquo;s code of conduct enforcement ladder.\nFor answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.\n","permalink":"https://rizin.re/code-of-conduct/","summary":"Rizin\u0026rsquo;s Code of Conduct","title":"Code Of Conduct"},{"content":"","permalink":"https://rizin.re/community/","summary":"community","title":"Community"},{"content":"","permalink":"https://rizin.re/teams/community/","summary":"","title":"Community Team"},{"content":"","permalink":"https://rizin.re/teams/core/","summary":"","title":"Core Team"},{"content":"","permalink":"https://rizin.re/teams/cutter-core/","summary":"","title":"Cutter Core Team"},{"content":"","permalink":"https://rizin.re/teams/distributions-and-packaging/","summary":"","title":"Distributions and Packaging Team"},{"content":"","permalink":"https://rizin.re/teams/documentation/","summary":"","title":"Documentation Team"},{"content":"","permalink":"https://rizin.re/teams/infrastructure/","summary":"","title":"Infrastructure Team"},{"content":"","permalink":"https://rizin.re/organization/","summary":"organization","title":"Organization"},{"content":"","permalink":"https://rizin.re/teams/package-manager-and-plugins/","summary":"","title":"Package Manager and Plugins 
Team"},{"content":"","permalink":"https://rizin.re/teams/security/","summary":"","title":"Security Team"}]
\ No newline at end of file
diff --git a/index.xml b/index.xml
index b0b5035..012aad0 100644
--- a/index.xml
+++ b/index.xml
@@ -10,7 +10,7 @@
https://rizin.re/images/rizin_preview.png
Hugo -- gohugo.io
- Thu, 02 Jan 2025 00:00:00 +0000
+ Sun, 26 Jan 2025 00:00:00 +00002024 Year Summary
https://rizin.re/posts/year-2024-summary/
@@ -185,6 +185,18 @@ At first, I fixed several bugs in the new tree-sitter based type parser.Who are you? Why did you fork radare2? What will happen to Cutter now? Our answers to your frequently asked questions.
+
+ GSoC 2025
+ https://rizin.re/gsoc/2025/
+ Sun, 26 Jan 2025 00:00:00 +0000
+
+ https://rizin.re/gsoc/2025/
+ TL;DR Jump to the Ideas list.
+Introduction This year, we participate again, effectively continuing the tradition since 2015.
+Mentors Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’25. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
+Anton Kochkov Mattermost: xvilka – @akochkov Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad And many others Development methodology Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too.
+
+
GSoC 2024
https://rizin.re/gsoc/2024/
diff --git a/sitemap.xml b/sitemap.xml
index 418014a..54ca8ee 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -58,6 +58,15 @@
https://rizin.re/posts/faq/2020-12-05T00:00:00+00:00
+
+ https://rizin.re/gsoc/2025/
+ 2025-01-26T00:00:00+00:00
+
+ https://rizin.re/
+ 2025-01-26T00:00:00+00:00
+
+ https://rizin.re/gsoc/
+ 2025-01-26T00:00:00+00:00https://rizin.re/tags/capstone/2025-01-02T00:00:00+00:00
@@ -70,9 +79,6 @@
https://rizin.re/tags/rizin/2025-01-02T00:00:00+00:00
-
- https://rizin.re/
- 2025-01-02T00:00:00+00:00https://rizin.re/tags/rzil/2025-01-02T00:00:00+00:00
@@ -88,9 +94,6 @@
https://rizin.re/gsoc/2024/2024-01-23T00:00:00+00:00
-
- https://rizin.re/gsoc/
- 2024-01-23T00:00:00+00:00https://rizin.re/tags/dwarf/2023-09-06T00:00:00+00:00