Add SubprocessEvaluator for process-isolated evaluation by odelliab · Pull Request #77 · skydiscover-ai/skydiscover

odelliab · 2026-06-07T09:13:08Z

Summary

Adds SubprocessEvaluator — a new evaluator that runs each candidate in a separate
Python subprocess, providing process-level isolation without requiring Docker.

Problem

When evaluating candidate programs that can corrupt process state (e.g. CUDA kernels with illegal
memory access, C extensions that segfault or corrupt memory), the in-process
Evaluator allows one bad candidate to poison all subsequent evaluations. For GPU
workloads, cudaErrorIllegalAddress is sticky: once triggered, the CUDA context is
permanently corrupted and all further operations fail.

Solution

SubprocessEvaluator provides a middle ground between Evaluator (fast, no isolation) and
ContainerizedEvaluator (full Docker):

Evaluator	Isolation	Overhead	Setup
Evaluator	None	~0ms	None
SubprocessEvaluator	Process	~100-200ms	None
ContainerizedEvaluator	Container	High	Dockerfile required

- Each evaluate() call spawns a fresh Python subprocess
- Child process gets its own CUDA context / address space
- If a candidate crashes, only the subprocess dies
- Same evaluate(program_path) -> dict interface as Evaluator
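
The isolation model listed above can be sketched with the standard library alone. This is a minimal illustration and not the PR's implementation: `evaluate_isolated`, the wrapper string, and the shape of the error dict are assumptions made for the example.

```python
import json
import subprocess
import sys
import textwrap

# Hypothetical wrapper executed in the child interpreter: it imports the
# evaluation file, calls its evaluate(program_path), and prints the result
# as JSON on stdout.
_WRAPPER = textwrap.dedent(
    """
    import importlib.util, json, sys
    spec = importlib.util.spec_from_file_location("user_evaluator", sys.argv[1])
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(json.dumps(module.evaluate(sys.argv[2])))
    """
)


def evaluate_isolated(evaluation_file: str, program_path: str, timeout: float = 60.0) -> dict:
    """Evaluate one candidate in a fresh interpreter and return a metrics dict."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _WRAPPER, evaluation_file, program_path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"error": "timeout"}
    if proc.returncode != 0:
        # The child died (exception, abort, segfault); the parent is unaffected.
        return {"error": f"subprocess exited with code {proc.returncode}"}
    return json.loads(proc.stdout)
```

Because every call starts a new interpreter, a candidate that kills its process leaves the next call untouched, which is the crash-then-success behavior the PR describes. The cost is interpreter startup on each evaluation.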

Usage

Set `evaluator.subprocess_isolation: true` in the config YAML.

The auto-detection in create_evaluator() checks this flag after Harbor/Container detection but before falling back to in-process.
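
The precedence described here can be illustrated with a small selection function. It is a sketch under stated assumptions: only `subprocess_isolation` comes from the PR text, while `EvaluatorSettings`, `harbor_detected`, `container_detected`, and the returned labels are hypothetical names and do not reflect the real `create_evaluator()` signature.

```python
from dataclasses import dataclass


@dataclass
class EvaluatorSettings:
    # Only subprocess_isolation comes from the PR description; the other two
    # flags are hypothetical stand-ins for Harbor and container detection.
    harbor_detected: bool = False
    container_detected: bool = False
    subprocess_isolation: bool = False


def choose_evaluator_kind(settings: EvaluatorSettings) -> str:
    """Return which evaluator would be picked, in the order the PR describes."""
    if settings.harbor_detected:
        return "harbor"
    if settings.container_detected:
        return "containerized"
    if settings.subprocess_isolation:
        return "subprocess"
    return "in_process"
```

Under this ordering, a config that both triggers container detection and sets the flag would still get the containerized evaluator; the flag only matters when neither earlier check matches.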

Tests

Includes 6 tests covering: successful evaluation, noisy stdout parsing, crash isolation, exception handling, crash-then-success recovery, and timeout behavior.
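
One way to make stdout parsing tolerant of noise is to scan the child's output from the end and take the last line that decodes to a JSON object. The function below is a hedged sketch of that idea; `extract_result` is a hypothetical helper, and the PR may parse its output differently.

```python
import json


def extract_result(stdout: str) -> dict:
    """Return the last stdout line that parses as a JSON object."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # log lines, progress bars, warnings
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("no JSON object found in subprocess stdout")
```

For example, `extract_result('loading...\n{"score": 0.5}\ndone\n')` returns the dict from the middle line and ignores the surrounding log output.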

Usage

evaluator:
  subprocess_isolation: true
  evaluation_file: my_evaluator.py
  timeout: 300

When evaluating candidate programs that can corrupt process state (e.g. CUDA kernels with illegal memory access, C extensions that segfault), the in-process Evaluator allows one bad candidate to poison all subsequent evaluations within the same CUDA context.

SubprocessEvaluator provides a middle ground between the in-process Evaluator (no isolation) and ContainerizedEvaluator (requires Docker):

- Each evaluate() call spawns a fresh Python subprocess
- Child process gets its own CUDA context / address space
- If a candidate crashes, only the subprocess dies
- ~100-200ms overhead per evaluation for process startup
- Same evaluate(program_path) -> dict interface as Evaluator

Usage: set `evaluator.subprocess_isolation: true` in config YAML. The auto-detection in create_evaluator() checks this flag after Harbor/Container detection but before falling back to in-process.

Includes 6 tests covering: successful evaluation, noisy stdout parsing, crash isolation, exception handling, crash-then-success recovery, and timeout behavior.

gemini-code-assist

Code Review

This pull request introduces SubprocessEvaluator to run candidate evaluations in isolated Python subprocesses, preventing crashes from affecting the parent process. Feedback on the implementation highlights several key areas for improvement: handling non-JSON-serializable return types (such as EvaluationResult or numpy arrays) in the wrapper script, fixing a potential resource leak and NameError during temporary file creation, replacing the blocking run_in_executor pattern with a non-blocking asyncio.create_subprocess_exec to avoid orphaned processes on timeout, and removing unnecessary sys.path modifications in the parent process.
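
The temporary-file concern raised in this review can be shown in isolation. If the path variable is first bound inside the `try` block and file creation fails, the cleanup code hits a NameError; if cleanup is skipped on error, the file leaks. The sketch below binds the name first and guards the cleanup. `run_with_temp_script` and its `runner` callback are hypothetical names used only for illustration.

```python
import os
import tempfile


def run_with_temp_script(source: str, runner):
    """Write source to a temp .py file, pass its path to runner, always clean up."""
    temp_path = None  # bound before anything can fail, so cleanup never sees an undefined name
    try:
        fd, temp_path = tempfile.mkstemp(suffix=".py")
        with os.fdopen(fd, "w") as handle:
            handle.write(source)
        return runner(temp_path)
    finally:
        # Runs on success and on error; the guard covers a failed mkstemp.
        if temp_path and os.path.exists(temp_path):
            os.unlink(temp_path)
```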

- Wrapper template: handle EvaluationResult objects via to_dict(), use default=str for non-serializable types (numpy arrays etc.)
- Fix temp file leak: assign temp_path before write, guard cleanup with `if temp_path and os.path.exists(temp_path)`
- Replace run_in_executor + subprocess.run with asyncio.create_subprocess_exec for proper timeout handling (proc.kill() + await proc.wait() on timeout)
- Remove unnecessary sys.path modification in parent process (child gets eval_dir via PYTHONPATH env var)
- Remove unused subprocess import
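
The timeout handling named in this commit message (`proc.kill()` followed by `await proc.wait()`) can be sketched as follows. This is an illustrative pattern built on the standard asyncio API, not the code from the PR; `run_with_timeout` and its return tuple are assumptions.

```python
import asyncio
import sys


async def run_with_timeout(args, timeout: float):
    """Run a command; return (returncode, stdout, stderr), with returncode None on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()        # stop the child instead of leaving it orphaned
        await proc.wait()  # reap it so no zombie process remains
        return None, "", ""
    return proc.returncode, stdout.decode(), stderr.decode()


if __name__ == "__main__":
    # A child that sleeps longer than the timeout is killed and reported as None.
    print(asyncio.run(run_with_timeout([sys.executable, "-c", "print('ok')"], 10.0)))
```

Unlike a blocking `subprocess.run` inside `run_in_executor`, the event loop stays free while the child runs, and the coroutine itself owns the child's lifetime when the timeout fires.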

Reuse the existing SafeJSONEncoder from checkpoint_manager (with an inline fallback if the import fails in the subprocess) instead of the generic default=str approach.
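
The import-with-fallback pattern described in this commit message might look like the sketch below. The real SafeJSONEncoder is not shown in this thread, so `FallbackJSONEncoder` is a hypothetical stand-in whose behavior is assumed, and the import path is a placeholder and not the project's actual module path.

```python
import json


class FallbackJSONEncoder(json.JSONEncoder):
    """Hypothetical inline fallback; not the real SafeJSONEncoder."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj, key=str)  # sets have no JSON form; emit a stable list
        if hasattr(obj, "tolist"):
            return obj.tolist()  # numpy arrays and numpy scalars expose tolist()
        return str(obj)  # last resort, same effect as default=str


try:
    # Placeholder import path (assumption); the real location is not shown here.
    from checkpoint_manager import SafeJSONEncoder
except ImportError:
    SafeJSONEncoder = FallbackJSONEncoder
```

The child would then serialize with `json.dumps(result, cls=SafeJSONEncoder)`. Compared with a blanket `default=str`, a dedicated encoder keeps arrays and sets as JSON lists, so they do not become opaque strings.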

- Fix black violations in subprocess_evaluator.py (argument-per-line, slice spacing)
- Replace sys.modules stubbing in test_subprocess_evaluator.py with normal imports; the stub for skydiscover.config poisoned the module cache and caused ImportError in tests collected afterwards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
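
The module-cache poisoning mentioned in this commit message can be reproduced in a few lines. The sketch uses the standard-library module `colorsys` purely as a stand-in for the stubbed module; `demo_stub_poisoning` is a hypothetical name.

```python
import sys
import types


def demo_stub_poisoning() -> str:
    """Show how an empty stub left in sys.modules breaks a later normal import."""
    saved = sys.modules.pop("colorsys", None)
    # An earlier test installs an empty stub and never removes it.
    sys.modules["colorsys"] = types.ModuleType("colorsys")
    try:
        # A later test does an ordinary import and gets the cached stub.
        from colorsys import rgb_to_hsv  # noqa: F401
        return "import succeeded"
    except ImportError as exc:
        return f"ImportError: {exc}"
    finally:
        # Undo the stub so this demo does not poison the cache itself.
        del sys.modules["colorsys"]
        if saved is not None:
            sys.modules["colorsys"] = saved
```

Because `sys.modules` is shared by everything in the test process, a stub installed at import time in one test file is visible to every test collected afterwards. Importing the real module avoids the problem.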

gemini-code-assist Bot reviewed Jun 7, 2026

Comment thread skydiscover/evaluation/subprocess_evaluator.py

Comment thread skydiscover/evaluation/subprocess_evaluator.py Outdated

odelliab and others added 3 commits June 7, 2026 12:30

Use SafeJSONEncoder in wrapper template for proper numpy/set handling

a324da8

odelliab wants to merge 4 commits into skydiscover-ai:main from odelliab:feature/subprocess-evaluator
