add codemap script #2658
Hello @williballenthin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces a new script, codemap.py, designed to display the layout of a program. It visualizes metadata, sections, libraries, and a list of functions, including xrefs, API calls, strings, calls from the function, and optionally capa matches. The script leverages the lancelot and rich libraries for analysis and presentation. It accepts a BinExport2 file as input, and optionally capa JSON results and Assemblage JSONL files to enhance the analysis.
Highlights
- Script Addition: Adds a new script scripts/codemap.py for visualizing program layout.
- Dependency Management: Specifies dependencies like protobuf, python-lancelot, and rich within the script's header.
- Data Extraction: Extracts and presents key program information such as metadata, sections, libraries, and functions.
- Capa Integration: Optionally integrates with capa results to display rule matches within functions.
- Assemblage Integration: Optionally integrates with Assemblage data to update function names.
- Thunk Resolution: Resolves thunks to provide more accurate call graph information.
- Output Formatting: Uses the rich library to format the output with colors and indentation for better readability.
Changelog
- scripts/codemap.py
  - Adds a new script to display the layout of a program.
  - Includes metadata, sections, libraries, and function details.
  - Supports optional integration with capa and Assemblage data.
  - Uses lancelot for BinExport2 analysis.
  - Uses rich for formatted output.
A map of code,
Functions, calls, a winding road,
Insights we find.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Code Review
This PR introduces a script to display the layout of a program, including metadata, sections, libraries, functions, xrefs, API calls, strings, and capa matches. The script leverages lancelot and rich for analysis and presentation. Overall, the script provides a valuable tool for understanding program structure. However, there are a few areas that could be improved for clarity, efficiency, and robustness.
Summary of Findings
- Error Handling: The script uses a broad except clause to catch google.protobuf.message.DecodeError. It would be better to catch a more specific exception or add additional checks to ensure that the file is a valid BinExport2 file before attempting to parse it. This will prevent unexpected behavior if the script is run on an invalid file.
- Thunk Resolution: The script contains logic to resolve thunks, but there are several places where thunks are handled differently or not at all. It would be beneficial to consolidate the thunk resolution logic into a single function or class to ensure consistency and reduce code duplication.
- Assemblage Location Handling: The script assumes that the base address is the lowest mapped page. This assumption may not always be correct, especially for more complex binaries. It would be better to either explicitly determine the base address or provide a way for the user to specify it.
- Missing Documentation: The script lacks documentation for some of the key functions and classes, such as Renderer and AssemblageLocation. Adding docstrings to these functions and classes would improve the script's readability and maintainability.
Merge Readiness
The script provides a useful tool for analyzing program layouts. However, the identified issues related to error handling, thunk resolution, base address guessing, and documentation should be addressed before merging. I am unable to approve this pull request, and recommend that it not be merged until the critical and high severity issues are addressed, and that others review and approve this code before merging.
else:
    raise ValueError("unexpected section name")
# we don't know which function this is.
# hopefully its a function recognized in our BinExport analysis.
# *shrug*
for call_target_address in instruction.call_target:
    if call_target_address in idx.thunks:
        call_target_address = idx.thunks[call_target_address]
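One way the thunk handling could be consolidated, as the review summary suggests, is a single helper that every call site uses. This is only a sketch: it assumes `idx.thunks` behaves like a dict mapping a thunk address to its target address, and `resolve_thunk` and `max_depth` are hypothetical names that do not appear in the PR.

```python
from typing import Mapping


def resolve_thunk(thunks: Mapping[int, int], address: int, max_depth: int = 8) -> int:
    """Follow thunk -> target mappings until a non-thunk address is reached.

    Stops early on a cycle or after max_depth hops, returning the last
    address seen, so a malformed mapping cannot loop forever.
    """
    seen = set()
    for _ in range(max_depth):
        if address not in thunks or address in seen:
            break
        seen.add(address)
        address = thunks[address]
    return address


# Example: 0x1000 is a thunk to 0x2000, which is itself a thunk to 0x3000.
thunks = {0x1000: 0x2000, 0x2000: 0x3000}
resolved = resolve_thunk(thunks, 0x1000)
unchanged = resolve_thunk(thunks, 0x4000)
```

With a helper like this, the loop above reduces to `call_target_address = resolve_thunk(idx.thunks, call_target_address)`, and chained thunks are handled the same way everywhere.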
for call_target_address in instruction.call_target:
    call_target_index = idx.vertex_index_by_address[call_target_address]
    call_target_vertex = be2.call_graph.vertex[call_target_index]
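If `idx.vertex_index_by_address` is a plain dict (an assumption; the PR excerpt does not show its type), the subscript lookup above raises `KeyError` for any call target that has no vertex in the call graph. A defensive variant might look like the following sketch, where `lookup_vertex_index` is a hypothetical helper and the sample addresses are made up.

```python
from typing import Mapping, Optional


def lookup_vertex_index(vertex_index_by_address: Mapping[int, int], address: int) -> Optional[int]:
    # Return the call graph vertex index for an address, or None when the
    # address has no known vertex (instead of raising KeyError).
    return vertex_index_by_address.get(address)


# Made-up sample data standing in for idx.vertex_index_by_address.
index = {0x401000: 0, 0x401020: 1}

found = []
for target in (0x401020, 0x999999):
    vertex_index = lookup_vertex_index(index, target)
    if vertex_index is None:
        continue  # skip call targets outside the recovered call graph
    found.append((target, vertex_index))
```

Whether to skip, log, or fail on a missing vertex is a design choice for the script's author; the sketch only shows the skip option.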
9fb8434 to d0bafd6
This PR adds a script that displays the layout of a program:
for example:
This was originally developed to help with research segmenting a program into its object files, but it turns out to be an interesting overview of programs generally.
Under the hood, this program uses lancelot to process the program into a BinExport2 representation, and then works with that as the IR.
Checklist