Merge upstream PRs and update to Ghidra 11.4.2 #1
+added compiler plugin, set Java ver to 22
This commit adds a "Decompile Timeout" option to the MCP plugin that sets a custom timeout for `decompileFunction`. It is particularly useful to set it to 0 or a large value when dealing with large functions that take a while to decompile. The default value is 30 seconds. In addition, as part of this commit, the request timeout from the MCP bridge to the plugin is made configurable with the `--ghidra-timeout` argument. The default value is 5 seconds.
Allow specifying exclusive upper bounds for similarity and confidence in BSim function queries. Filter results accordingly and update pagination and formatting logic.
Co-authored-by: Donovoi <[email protected]>
Co-authored-by: Donovoi <[email protected]>
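The exclusive upper bounds mentioned in the BSim query commit above can be illustrated with a small, self-contained sketch. It is not the plugin's code: the `Match` class, the `filter_matches` helper, and the parameter names `max_similarity` and `max_confidence` are hypothetical and exist only to show what "exclusive" means for the filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Match:
    """Hypothetical stand-in for one BSim match (not a Ghidra type)."""
    name: str
    similarity: float
    confidence: float


def filter_matches(
    matches: Iterable[Match],
    max_similarity: Optional[float] = None,
    max_confidence: Optional[float] = None,
) -> List[Match]:
    """Keep matches strictly below each upper bound that is set."""
    kept = []
    for m in matches:
        # Exclusive bound: a value equal to the bound is rejected.
        if max_similarity is not None and m.similarity >= max_similarity:
            continue
        if max_confidence is not None and m.confidence >= max_confidence:
            continue
        kept.append(m)
    return kept


matches = [Match("a", 1.0, 50.0), Match("b", 0.9, 20.0), Match("c", 0.8, 50.0)]
# With max_similarity=1.0, the exact match "a" is excluded.
print([m.name for m in filter_matches(matches, max_similarity=1.0)])
```

One plausible use of an exclusive bound is to skip identical functions (similarity of exactly 1.0) while still returning near matches.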
Pull Request Overview
This PR adds BSim (Binary Similarity) integration to the GhidraMCPPlugin, enabling function similarity matching against BSim databases. The update also upgrades Ghidra dependencies from version 11.3.2 to 11.4.2 and introduces a configurable decompilation timeout.
Key Changes:
- BSim database connectivity with support for H2 and PostgreSQL databases
- Function similarity querying (single function and batch processing)
- Configurable decompilation timeout to improve flexibility
- Upgrade to Ghidra 11.4.2 with new BSim-related dependencies
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/main/java/com/lauriewired/GhidraMCPPlugin.java | Core BSim functionality including database connection, querying, and match retrieval; configurable decompile timeout; refactored comment type handling to use enum |
| pom.xml | Updated Ghidra dependencies to 11.4.2; added BSim and required libraries (H2, PostgreSQL, Apache Commons); updated Maven plugin versions |
| bridge_mcp_ghidra.py | Added BSim-related MCP tools and configurable request timeout parameter |
| README.md | Updated documentation with BSim feature description and required library dependencies |
| .gitignore | Added Python cache directory exclusion |
    // Use built-in scanFunctions to scan all at once
    try {
        gensig.scanFunctions(funcManager.getFunctions(true), 30, new ConsoleTaskMonitor());
The timeout in scanFunctions is hardcoded to 30 seconds, while the decompile timeout is configurable via this.decompileTimeout. For consistency, consider using a configurable timeout here as well, especially since scanning all functions can be time-consuming in large programs.
Suggested change:
    - gensig.scanFunctions(funcManager.getFunctions(true), 30, new ConsoleTaskMonitor());
    + gensig.scanFunctions(funcManager.getFunctions(true), this.decompileTimeout, new ConsoleTaskMonitor());
    // then limit to maxMatches. Setting query.max too early might exclude valid matches.
    QueryNearest query = new QueryNearest();
    query.manage = descManager;
    query.max = Integer.MAX_VALUE; // Get all potential matches
Setting query.max to Integer.MAX_VALUE could cause performance issues with large BSim databases. Consider using a more reasonable upper bound or making this configurable to prevent excessive memory usage and query time.
    // then limit to maxMatchesPerFunction. Setting query.max too early might exclude valid matches.
    QueryNearest query = new QueryNearest();
    query.manage = descManager;
    query.max = Integer.MAX_VALUE; // Get all potential matches
Setting query.max to Integer.MAX_VALUE when querying all functions could lead to severe performance degradation and memory issues, especially when scanning multiple functions. Consider implementing a per-function limit or a total result cap to prevent resource exhaustion.
Suggested change:
    - query.max = Integer.MAX_VALUE; // Get all potential matches
    + // Limit the total number of results to avoid resource exhaustion
    + int HARD_RESULT_CAP = 100_000;
    + int totalMax = Math.min(queriedFunctions * maxMatchesPerFunction, HARD_RESULT_CAP);
    + query.max = totalMax;
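The arithmetic behind the suggested cap can be checked with a small sketch. It is written in Python for illustration only; the function name `total_query_max` is hypothetical, and the cap of 100,000 is simply the value from the suggestion, not a tuned limit.

```python
HARD_RESULT_CAP = 100_000  # illustrative cap, taken from the suggestion above


def total_query_max(queried_functions: int,
                    max_matches_per_function: int,
                    hard_cap: int = HARD_RESULT_CAP) -> int:
    """Return the total result limit: the per-function budget summed
    over all queried functions, but never more than the hard cap."""
    requested = queried_functions * max_matches_per_function
    return min(requested, hard_cap)


print(total_query_max(10, 50))      # small query: the budget is used as-is
print(total_query_max(5000, 100))   # large query: the hard cap applies
```

Python integers do not overflow, so the product is always exact here. In Java, multiplying two `int` values can overflow before `Math.min` is applied, so a Java version of this calculation may need to widen one operand to `long` first.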
    validMatchCount++;

    // Early stopping: if we've reached maxMatches valid matches, remove all remaining
    if (validMatchCount > maxMatches) {
The early stopping logic removes matches when validMatchCount exceeds maxMatches, but it should stop at exactly maxMatches. This condition should be validMatchCount >= maxMatches after incrementing, or the increment should happen before the check. Currently, this keeps one extra match beyond the limit.
Suggested change:
    - if (validMatchCount > maxMatches) {
    + if (validMatchCount >= maxMatches) {
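The off-by-one described in this comment can be reproduced in isolation. The sketch below is not the plugin's loop: `keep_first_valid` is a hypothetical helper, and it assumes that "remove all remaining" drops only the matches after the current one, so the current match is always kept. Under a different removal strategy the conclusion could differ.

```python
from typing import Callable, List


def keep_first_valid(matches: List[str],
                     is_valid: Callable[[str], bool],
                     max_matches: int,
                     inclusive: bool) -> List[str]:
    """Keep valid matches until the early-stop condition fires.

    inclusive=False mimics `validMatchCount > maxMatches`;
    inclusive=True mimics `validMatchCount >= maxMatches`.
    """
    kept = []
    valid_count = 0
    for m in matches:
        if not is_valid(m):
            continue  # invalid matches are dropped
        kept.append(m)
        valid_count += 1
        if inclusive:
            limit_hit = valid_count >= max_matches
        else:
            limit_hit = valid_count > max_matches
        if limit_hit:
            break  # assumption: everything after the current match is removed
    return kept


candidates = ["a", "b", "c", "d"]
print(keep_first_valid(candidates, lambda m: True, 2, inclusive=False))
print(keep_first_valid(candidates, lambda m: True, 2, inclusive=True))
```

With a limit of 2, the `>` variant stops only after the third valid match has been kept, while the `>=` variant stops after the second.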
    options.registerOption(DECOMPILE_TIMEOUT_OPTION_NAME, DEFAULT_DECOMPILE_TIMEOUT,
        null,
        "Decompilation timeout. " +
The description should specify the unit of the timeout value (seconds). Without this information, users may not know what value to enter.
Suggested change:
    - "Decompilation timeout. " +
    + "Decompilation timeout (in seconds). " +
    parser.add_argument("--transport", type=str, default="stdio", choices=["stdio", "sse"],
                        help="Transport protocol for MCP, default: stdio")
    parser.add_argument("--ghidra-timeout", type=int, default=DEFAULT_REQUEST_TIMEOUT,
                        help=f"MCP requests timeout, default: {DEFAULT_REQUEST_TIMEOUT}")
The help text should specify the unit of the timeout value (seconds) to clarify what the numeric value represents.
Suggested change:
    - help=f"MCP requests timeout, default: {DEFAULT_REQUEST_TIMEOUT}")
    + help=f"MCP requests timeout in seconds, default: {DEFAULT_REQUEST_TIMEOUT} seconds")
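For reference, a minimal standalone sketch of the argument parsing discussed here, with the unit spelled out in the help text. The `build_parser` wrapper is hypothetical and only there to make the snippet self-contained; the value of `DEFAULT_REQUEST_TIMEOUT` is assumed to be 5, matching the default stated in the commit message.

```python
import argparse

# Assumed value: the commit message gives a default of 5 seconds.
DEFAULT_REQUEST_TIMEOUT = 5


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two options shown in the diff context."""
    parser = argparse.ArgumentParser(description="MCP bridge argument sketch")
    parser.add_argument("--transport", type=str, default="stdio",
                        choices=["stdio", "sse"],
                        help="Transport protocol for MCP, default: stdio")
    parser.add_argument("--ghidra-timeout", type=int,
                        default=DEFAULT_REQUEST_TIMEOUT,
                        help="MCP requests timeout in seconds, "
                             f"default: {DEFAULT_REQUEST_TIMEOUT} seconds")
    return parser


# argparse exposes --ghidra-timeout as the attribute ghidra_timeout.
args = build_parser().parse_args(["--ghidra-timeout", "60"])
print(args.ghidra_timeout)
```

Because the option uses `type=int`, a non-numeric value is rejected by argparse before the bridge starts.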
    - `Ghidra/Features/BSim/lib/BSim.jar lib/BSim.jar`
    - `Ghidra/Features/BSim/lib/commons-dbcp2-2.9.0.jar lib/commons-dbcp2-2.9.0.jar`
    - `Ghidra/Features/BSim/lib/commons-logging-1.2.jar lib/commons-logging-1.2.jar`
    - `Ghidra/Features/BSim/lib/commons-pool2-2.11.1.jar lib/commons-pool2-2.11.1.jar`
    - `Ghidra/Features/BSim/lib/h2-2.2.220.jar lib/h2-2.2.220.jar`
    - `Ghidra/Features/BSim/lib/postgresql-42.7.6.jar lib/postgresql-42.7.6.jar`
The copy instructions appear to have the destination path duplicated at the end instead of showing source -> destination. The format should match the pattern used in lines 107-115 where only the destination path is shown, or clarify the copy command syntax (e.g., 'cp source dest').
Suggested change:
    - - `Ghidra/Features/BSim/lib/BSim.jar lib/BSim.jar`
    - - `Ghidra/Features/BSim/lib/commons-dbcp2-2.9.0.jar lib/commons-dbcp2-2.9.0.jar`
    - - `Ghidra/Features/BSim/lib/commons-logging-1.2.jar lib/commons-logging-1.2.jar`
    - - `Ghidra/Features/BSim/lib/commons-pool2-2.11.1.jar lib/commons-pool2-2.11.1.jar`
    - - `Ghidra/Features/BSim/lib/h2-2.2.220.jar lib/h2-2.2.220.jar`
    - - `Ghidra/Features/BSim/lib/postgresql-42.7.6.jar lib/postgresql-42.7.6.jar`
    + - `Ghidra/Features/BSim/lib/BSim.jar`
    + - `Ghidra/Features/BSim/lib/commons-dbcp2-2.9.0.jar`
    + - `Ghidra/Features/BSim/lib/commons-logging-1.2.jar`
    + - `Ghidra/Features/BSim/lib/commons-pool2-2.11.1.jar`
    + - `Ghidra/Features/BSim/lib/h2-2.2.220.jar`
    + - `Ghidra/Features/BSim/lib/postgresql-42.7.6.jar`
Merges three upstream PRs from LaurieWired/GhidraMCP: Ghidra 11.4.2 compatibility (PR LaurieWired#95), configurable timeouts (PR LaurieWired#74), and BSim support (PR LaurieWired#92).
Changes
Ghidra 11.4.2 Compatibility
- Replaced `CodeUnit` constants with `CommentType` enum
- Updated `searchByNameInAllCategories()` to return first exact match instead of first case-insensitive match
Configurable Timeouts
- `decompileTimeout` plugin option (default: 30s)
- `--ghidra-timeout` CLI argument for HTTP requests (default: 5s)
BSim Integration
- MCP tools: `bsim_select_database`, `bsim_query_function`, `bsim_query_all_functions`, `bsim_disconnect`, `bsim_status`
Build Configuration
Excluded
PR LaurieWired#90 (56 MCP tools, 165 files, 62k+ lines) was excluded to preserve project structure and maintainability.