benchmark: commit0 instances with num_passed 0 and num_tests 0 #112

@juanmichelini

Description

Observed on: #82 (update to commit hash once merged)

Steps to reproduce:

Run:
uv run commit0-infer .llm_config/sonnet-4.json --repo-split lite --n-limit 16 --num-workers 16 --max-iterations 100 --note "test-commit0-$(date +%Y%m%d%H%M%S)"

Check the output.jsonl file: some instances have "num_tests":0 and "num_passed":0, for example:

{"instance_id":"parsel","test_result":{"eval_result":{"name":"parsel","sum":0,"passed":0,"num_passed":0,"num_tests":0}}, ...}
This should be addressed in commit0/run_infer.py.
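
To see which instances are affected, the output file can be scanned for records whose eval_result reports zero tests. The following is a minimal sketch, assuming every line of output.jsonl is one JSON object shaped like the example above; the helper name find_zero_test_instances is hypothetical and not part of the benchmark code.

```python
import json
from typing import Iterable, List


def find_zero_test_instances(lines: Iterable[str]) -> List[str]:
    """Return the instance_id of every record that reports num_tests == 0.

    Hypothetical helper: assumes each non-empty line is a JSON object with
    the test_result -> eval_result -> num_tests layout shown in this issue.
    """
    affected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        eval_result = (record.get("test_result") or {}).get("eval_result") or {}
        # A missing num_tests is treated like 0, since no tests were reported.
        if eval_result.get("num_tests", 0) == 0:
            affected.append(record.get("instance_id", "<unknown>"))
    return affected


# Usage (the path is an assumption; use the output.jsonl of your run):
#     with open("output.jsonl") as f:
#         print(find_zero_test_instances(f))
```

Counting a missing num_tests field as zero is a choice made for this sketch, so that records with no eval_result at all are also listed.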

The instances that currently work do so because of this fix:


        # Install pytest and required plugins for test reporting
        plugin_install_cmd = f"cd /workspace/{workspace_dir_name} && (uv pip install pytest pytest-json-report pytest-cov || pip install pytest pytest-json-report pytest-cov)"
        res = workspace.execute_command(plugin_install_cmd, timeout=600)
        if res.exit_code != 0:
            raise RuntimeError(f"Failed to install pytest and plugins: {res.stderr}")
        logger.info("Installed pytest and required plugins")

        # Verify pytest and plugin installation
        verify_pytest_cmd = f"cd /workspace/{workspace_dir_name} && pytest --version"
        verify_pytest_res = workspace.execute_command(verify_pytest_cmd, timeout=60)
        logger.info(f"Pytest verification exit code: {verify_pytest_res.exit_code}")
        if verify_pytest_res.exit_code == 0:
            logger.info(f"Pytest available: {verify_pytest_res.stdout.strip()}")
        else:
            logger.warning(f"Pytest verification failed: {verify_pytest_res.stderr}")

        verify_plugin_cmd = f"cd /workspace/{workspace_dir_name} && python -c 'import pytest_jsonreport; print(\"Plugin available\")'"
        verify_plugin_res = workspace.execute_command(verify_plugin_cmd, timeout=60)
        logger.info(f"Plugin verification exit code: {verify_plugin_res.exit_code}")
        if verify_plugin_res.exit_code == 0:
            logger.info("pytest-json-report plugin verified successfully")
        else:
            logger.warning(f"Plugin verification failed: {verify_plugin_res.stderr}")

These steps could be refactored into a function, prepare_instance_test_framework, and extended so that they work for all instances.
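
One possible shape for that refactor is sketched below. It makes several assumptions: workspace exposes execute_command(cmd, timeout=...) and returns an object with exit_code, stdout and stderr, as in the snippet above, and the function signature and the REQUIRED_PACKAGES constant are hypothetical. Unlike the current code, which only logs a warning, this sketch raises when verification fails. That is a design choice for illustration, not a confirmed fix for the affected instances.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Assumption: the same three packages the existing fix installs.
REQUIRED_PACKAGES = ("pytest", "pytest-json-report", "pytest-cov")


def prepare_instance_test_framework(workspace: Any, workspace_dir_name: str) -> None:
    """Install pytest and its reporting plugins, then verify they are usable.

    Hypothetical refactor of the inline steps in commit0/run_infer.py.
    """
    repo_dir = f"/workspace/{workspace_dir_name}"
    packages = " ".join(REQUIRED_PACKAGES)

    # Try uv first and fall back to pip, as the existing fix does.
    install_cmd = (
        f"cd {repo_dir} && "
        f"(uv pip install {packages} || pip install {packages})"
    )
    res = workspace.execute_command(install_cmd, timeout=600)
    if res.exit_code != 0:
        raise RuntimeError(f"Failed to install pytest and plugins: {res.stderr}")
    logger.info("Installed pytest and required plugins")

    # Verification commands, keyed by the component they check.
    checks = {
        "pytest": "pytest --version",
        "pytest-json-report": "python -c 'import pytest_jsonreport'",
    }
    for name, check in checks.items():
        res = workspace.execute_command(f"cd {repo_dir} && {check}", timeout=60)
        if res.exit_code != 0:
            # Fail loudly so the run is not recorded with num_tests == 0.
            raise RuntimeError(f"{name} verification failed: {res.stderr}")
        logger.info("%s verified", name)
```

Raising here would make a broken test environment show up as an error for the instance instead of a result with "num_tests":0. Whether that is the desired behavior for the benchmark is left open.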
