Skip to content

Add SubprocessEvaluator for process-isolated evaluation#77

Open
odelliab wants to merge 4 commits into
skydiscover-ai:mainfrom
odelliab:feature/subprocess-evaluator
Open

Add SubprocessEvaluator for process-isolated evaluation#77
odelliab wants to merge 4 commits into
skydiscover-ai:mainfrom
odelliab:feature/subprocess-evaluator

Conversation

@odelliab

@odelliab odelliab commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds SubprocessEvaluator — a new evaluator that runs each candidate in a separate
Python subprocess, providing process-level isolation without requiring Docker.

Problem

When evaluating candidate programs that can corrupt process state (e.g. CUDA kernels with illegal
memory access, C extensions that segfault, memory corruption), the in-process
Evaluator allows one bad candidate to poison all subsequent evaluations. For GPU
workloads, cudaErrorIllegalAddress is sticky — once triggered, the CUDA context is
permanently corrupted and all further operations fail.

Solution

SubprocessEvaluator provides a middle ground between Evaluator (fast, no isolation) and
ContainerizedEvaluator (full Docker):

Evaluator Isolation Overhead Setup
Evaluator None ~0ms None
SubprocessEvaluator Process ~100-200ms None
ContainerizedEvaluator Container High Dockerfile required
  • Each evaluate() call spawns a fresh Python subprocess
  • Child process gets its own CUDA context / address space
  • If a candidate crashes, only the subprocess dies
  • Same evaluate(program_path) -> dict interface as Evaluator

Usage

set evaluator.subprocess_isolation: true in config YAML.

The auto-detection in create_evaluator() checks this flag after Harbor/Container detection but before falling back to in-process.

##Test
Includes 6 tests covering: successful evaluation, noisy stdout parsing, crash isolation, exception handling, crash-then-success recovery, and timeout behavior.

Usage

evaluator:
  subprocess_isolation: true
  evaluation_file: my_evaluator.py
  timeout: 300

When evaluating candidate programs that can corrupt process state (e.g.
CUDA kernels with illegal memory access, C extensions that segfault),
the in-process Evaluator allows one bad candidate to poison all
subsequent evaluations within the same CUDA context.

SubprocessEvaluator provides a middle ground between the in-process
Evaluator (no isolation) and ContainerizedEvaluator (requires Docker):

- Each evaluate() call spawns a fresh Python subprocess
- Child process gets its own CUDA context / address space
- If a candidate crashes, only the subprocess dies
- ~100-200ms overhead per evaluation for process startup
- Same evaluate(program_path) -> dict interface as Evaluator

Usage: set `evaluator.subprocess_isolation: true` in config YAML.

The auto-detection in create_evaluator() checks this flag after
Harbor/Container detection but before falling back to in-process.

Includes 6 tests covering: successful evaluation, noisy stdout parsing,
crash isolation, exception handling, crash-then-success recovery, and
timeout behavior.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces SubprocessEvaluator to run candidate evaluations in isolated Python subprocesses, preventing crashes from affecting the parent process. Feedback on the implementation highlights several key areas for improvement: handling non-JSON-serializable return types (such as EvaluationResult or numpy arrays) in the wrapper script, fixing a potential resource leak and NameError during temporary file creation, replacing the blocking run_in_executor pattern with a non-blocking asyncio.create_subprocess_exec to avoid orphaned processes on timeout, and removing unnecessary sys.path modifications in the parent process.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread skydiscover/evaluation/subprocess_evaluator.py
Comment thread skydiscover/evaluation/subprocess_evaluator.py
Comment thread skydiscover/evaluation/subprocess_evaluator.py
Comment thread skydiscover/evaluation/subprocess_evaluator.py Outdated
odelliab and others added 3 commits June 7, 2026 12:30
- Wrapper template: handle EvaluationResult objects via to_dict(),
  use default=str for non-serializable types (numpy arrays etc.)
- Fix temp file leak: assign temp_path before write, guard cleanup
  with `if temp_path and os.path.exists(temp_path)`
- Replace run_in_executor + subprocess.run with asyncio.create_subprocess_exec
  for proper timeout handling (proc.kill() + await proc.wait() on timeout)
- Remove unnecessary sys.path modification in parent process
  (child gets eval_dir via PYTHONPATH env var)
- Remove unused subprocess import
Reuse the existing SafeJSONEncoder from checkpoint_manager (with an
inline fallback if the import fails in the subprocess) instead of the
generic default=str approach.
- Fix black violations in subprocess_evaluator.py (argument-per-line,
  slice spacing)
- Replace sys.modules stubbing in test_subprocess_evaluator.py with
  normal imports — the stub for skydiscover.config poisoned the module
  cache and caused ImportError in tests collected afterwards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant