Skip to content

Create directory to group native samples #2623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

hcnpeiris
Copy link

@hcnpeiris hcnpeiris commented Mar 11, 2025

Create directory to group native samples

Fixes capa#1787
Depends on: mandiant/capa-testfiles#279

Since after changing the native samples path, pytest file paths have been updated accordingly

However, one expected failure test (xfailed) unexpectedly passed.
I ran pytest on the latest master branch, and it produced the same result.

Pytest Results After Updating File Paths

Image 2025-03-11 at 18 21

Pytest Results for latest master branch

Image 2025-03-11 at 16 51

Request for Feedback

Can I get feedback on resolving this issue?

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @hcnpeiris, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request addresses issue capa#1787 by creating a directory structure to group native samples within the test data. The primary change involves modifying file paths in tests/fixtures.py to reflect the new directory structure, which includes segregating samples into benign, malware, and pma_labs subdirectories. Additionally, file paths in tests/test_binexport_accessors.py, tests/test_elffile_features.py, and tests/test_scripts.py have been updated to align with the new directory structure. The author notes that one expected failure test unexpectedly passed after the file path updates and is seeking feedback on this issue.

Highlights

  • Directory Restructuring: Creates a new directory structure under tests/data to group native samples into benign, malware, and pma_labs subdirectories.
  • File Path Updates: Updates file paths in tests/fixtures.py, tests/test_binexport_accessors.py, tests/test_elffile_features.py, and tests/test_scripts.py to reflect the new directory structure.
  • Unexpected Test Result: An expected failure test unexpectedly passed after the file path updates, requiring further investigation.

Changelog

Click here to see the changelog
  • tests/fixtures.py
    • Modified search paths for binexport2 extractor to include data/malware.
    • Updated file paths for various test samples to reflect the new directory structure (e.g., moving mimikatz to data/benign, kernel32 to data/malware, and PMA labs to data/pma_labs).
    • Updated file paths for various malware samples.
  • tests/test_binexport_accessors.py
    • Updated the file path for the mimikatz binexport extractor to point to the new location in data/benign.
  • tests/test_elffile_features.py
    • Updated sample paths for ELF files to reflect the new location in data/malware.
  • tests/test_scripts.py
    • Updated the binary file path and binexport2 file path to reflect the new locations in data/malware and data/benign respectively.
    • Updated the CAPA_SAMPLES_DIR environment variable to point to the data/benign directory for binexport scripts.
    • Updated the source file path for bulk processing tests to reflect the new location in data/malware.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


The tests did run, the paths aligned,
A structure new, carefully designed.
But a test that failed, now sees the light,
A puzzle to solve, both day and night.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request aims to improve the organization of test files by grouping native samples into a dedicated directory. The changes primarily involve updating file paths in the test suite. The author has identified an unexpected test pass and is seeking feedback on resolving it. Overall, the changes seem reasonable, but the identified test issue needs further investigation.

Summary of Findings

  • Inconsistent file categorization: The categorization of files as 'benign' or 'malware' seems inconsistent. For example, mimikatz is categorized as benign, which might not be accurate in all contexts. Review the categorization to ensure it aligns with the intended use case.
  • Unexpected test pass: The pull request description mentions that one expected failure test unexpectedly passed. While the author notes that this also occurs on the latest master branch, it's important to investigate and address this issue to ensure tests are behaving as expected.

Merge Readiness

The pull request should not be merged until the unexpected test pass is investigated and resolved. While the file path updates seem correct, the test results indicate a potential underlying issue that needs to be addressed. I am unable to approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issue should be addressed before merging.

Comment on lines 333 to 334
if name == "mimikatz":
return CD / "data" / "mimikatz.exe_"
return CD / "data" / "benign" / "mimikatz.exe_"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Is mimikatz truly benign? While it can be used for legitimate purposes, it's often associated with malicious activity. Consider whether it should be categorized as malware or have a separate 'potentially unwanted program' category.

@@ -593,7 +593,7 @@ def test_pattern_matching_not_stack():
assert match_address_with_be2(BE2_EXTRACTOR_687, queries, 0x107918) is None


BE2_EXTRACTOR_MIMI = fixtures.get_binexport_extractor(CD / "data" / "binexport2" / "mimikatz.exe_.ghidra.BinExport")
BE2_EXTRACTOR_MIMI = fixtures.get_binexport_extractor(CD / "data" / "benign" / "binexport2" / "mimikatz.exe_.ghidra.BinExport")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous comment, is mimikatz truly benign in this context? Ensure the categorization aligns with the test's purpose.

@@ -96,7 +96,7 @@ def test_scripts(script, args):
)
def test_binexport_scripts(script, args):
# define sample bytes location
os.environ["CAPA_SAMPLES_DIR"] = str(Path(CD / "data"))
os.environ["CAPA_SAMPLES_DIR"] = str(Path(CD / "data" / "benign" ))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Why is CAPA_SAMPLES_DIR set to the benign directory here? Does this mean that the binexport scripts are only intended to be run on benign samples?

@hcnpeiris
Copy link
Author

hcnpeiris commented Mar 17, 2025

@mr-tz
I changed the path of the test file in the test scripts to match the capa-testfiles PR: capa-testfiles PR #279

Here is the pytest result. However, tests/test_scripts.py::test_proto_conversion, which was expected to fail, was passed.
I also checked pytest against the latest master branch, and it shows the same result.
Screenshot 2025-03-16 at 20 51 06
Can I get feedback on this?

Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you.

@mr-tz
Copy link
Collaborator

mr-tz commented Mar 20, 2025

I'll take another closer look and think about the naming. Maybe we add OS or arch but I'd like to keep it simple.

@hcnpeiris
Copy link
Author

Hi @mr-tz, just checking in — any updates on the PR or anything you’d like me to adjust?

@mr-tz
Copy link
Collaborator

mr-tz commented Mar 27, 2025

Hey, sorry it may take a bit on my end. Thanks for the ping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Testfiles: create directory to group native samples
2 participants