This repository was archived by the owner on Nov 4, 2025. It is now read-only.

@ryanhoangt

Reference Issues/PRs

Fix SWE-bench#377

What does this implement/fix? Explain your changes.

When running patch evaluation on Modal, I see that for some instances the captured test output is written out of order, so the test summary falls outside the >>>>> Start Test Output and >>>>> End Test Output markers. A sample log file is attached below.

test_output_astropy__astropy-12907.txt

This PR removes the content slicing line and uses the whole file content for parsing.
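
To illustrate the difference, here is a minimal sketch, not the repository's actual code. The two marker strings come from the description above; `parse_statuses`, `slice_between_markers`, and the sample log are hypothetical stand-ins, and the real log parsers are repo-specific. It shows how a summary line that lands after the end marker is lost when the content is sliced, but picked up when the whole file is parsed.

```python
START_TEST_OUTPUT = ">>>>> Start Test Output"
END_TEST_OUTPUT = ">>>>> End Test Output"


def parse_statuses(content: str) -> dict[str, str]:
    """Hypothetical parser: map test name -> status from pytest-style lines."""
    statuses = {}
    for line in content.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in {"PASSED", "FAILED", "ERROR"}:
            statuses[parts[1]] = parts[0]
    return statuses


def slice_between_markers(content: str) -> str:
    """Slicing approach (assumed): keep only the text between the two markers."""
    return content.split(START_TEST_OUTPUT)[1].split(END_TEST_OUTPUT)[0]


# Made-up out-of-order log: one result line lands after the end marker.
log = "\n".join(
    [
        START_TEST_OUTPUT,
        "collected 2 items",
        "PASSED tests/test_a.py::test_one",
        END_TEST_OUTPUT,
        "FAILED tests/test_a.py::test_two",
    ]
)

sliced = parse_statuses(slice_between_markers(log))  # misses test_two
whole = parse_statuses(log)  # sees both tests
```

The trade-off is that parsing the whole file relies on the parser ignoring unrelated lines outside the markers, which this toy parser does by matching only lines that start with a status keyword.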

Any other comments?

🧡 Thanks for contributing!

sedrick-keh-tri and others added 30 commits March 24, 2025 23:28
Some of the repo_setup.sh scripts leave the working tree in a dirty state, which can make it difficult to generate a patch that applies cleanly. This change commits any outstanding changes so that any patch generated with `git diff` applies cleanly to a newly launched container during the evaluation step.
* Simplify installation guidelines for inference submodule

* Fixes SWE-bench#368

* Update version
* add docs

* Add leaderboard

* Remove unused import

* Update docs

* Update version

* Update: docs
* Support multilingual evaluation

* CI: Fix documentation building vs deploying

* Minor fixes

* Remove some redundancy

* Update dataset ref

---------

Co-authored-by: Kilian Lieret <[email protected]>
Co-authored-by: John Yang <[email protected]>
…h#358)

* fix: preserve all issue references with same keyword in PRs

* Modified extract_resolved_issues to use a set instead of list to store references
…t.py SWE-bench#368 (SWE-bench#369)

* fix prompt_col from text_inputs to text

* update log

---------

Co-authored-by: changqingai <[email protected]>
Match the documentation for installing additional dependencies with the contents of `pyproject.toml`
carlosejimenez and others added 18 commits June 1, 2025 21:07
This action fails if more than one instance is running at the same time
(which happens if you merge multiple PRs in quick succession). The fix is
to disable concurrency, so the runs just queue up.
SWE-bench#417)

* fix(build): fix python base images requirement types-setuptools incorrect version when replacing

* Update clean_requirements and clean_environment_yml patterns to remove version specs safely

---------

Co-authored-by: baixuran <[email protected]>
Co-authored-by: carlose <[email protected]>

Development

Successfully merging this pull request may close these issues.

Inconsistent evaluation in Modal vs without using Modal