Core Architecture
1. Jobs System Flow:
- Jobs creates one Interview instance per combination of agent, scenario, and model
- JobsRunnerAsyncio orchestrates the asynchronous execution of these interviews
- AsyncInterviewRunner runs the interviews concurrently, subject to a concurrency limit
- Results are accumulated into a Results object containing individual Result objects
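The flow above can be sketched in miniature. The class and method names below mirror the ones described, but the shapes are assumptions for illustration; the real classes carry far richer state.

```python
import asyncio
from dataclasses import dataclass
from itertools import product

@dataclass
class Interview:
    # Hypothetical miniature of an Interview: one (agent, scenario, model) triple.
    agent: str
    scenario: str
    model: str

    async def run(self) -> dict:
        # Stand-in for an actual model call.
        return {"agent": self.agent, "scenario": self.scenario, "model": self.model}

@dataclass
class Jobs:
    agents: list
    scenarios: list
    models: list

    def interviews(self) -> list:
        # One Interview per combination, as described above.
        return [Interview(a, s, m)
                for a, s, m in product(self.agents, self.scenarios, self.models)]

async def run_all(jobs: Jobs, max_concurrent: int = 2) -> list:
    sem = asyncio.Semaphore(max_concurrent)  # concurrency limit

    async def bounded(iv: Interview) -> dict:
        async with sem:
            return await iv.run()

    # All results are gathered into one in-memory list.
    return await asyncio.gather(*(bounded(iv) for iv in jobs.interviews()))

results = asyncio.run(run_all(Jobs(["a1"], ["s1", "s2"], ["m1"])))
```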
2. Memory Accumulation Points:
- All results are stored in memory in the data list inside Results
- Each Result object contains comprehensive data about an interview including:
  - Agent, scenario, and model details
  - All answers and prompts
  - Raw model responses (potentially large)
  - Token usage statistics
  - Cache keys and metadata
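A hypothetical sketch of the per-interview payload, mirroring the fields listed above (the real Result class has more structure and methods):

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Illustrative shape only; field names follow the list above.
    agent: dict
    scenario: dict
    model: dict
    answers: dict
    prompts: dict
    raw_model_response: dict  # can be large: full API payloads are retained
    token_usage: dict
    cache_keys: dict

    def approx_size_bytes(self) -> int:
        # Very rough memory estimate via the repr; a serious estimate would
        # walk the object graph (e.g. recursively with sys.getsizeof).
        return len(repr(self))

r = Result({}, {}, {}, {}, {}, {"raw": "x" * 100}, {}, {})
size = r.approx_size_bytes()
```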
3. Chunking Implementation:
- AsyncInterviewRunner does process interviews in chunks (lines 172-200 in async_interview_runner.py)
- but this chunking only controls execution concurrency; it does not bound memory usage
- All results are still collected in memory (line 218-219 in jobs_runner_asyncio.py)
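The pattern described above can be reduced to a minimal sketch: chunking bounds how many interviews run at once, but every result still lands in a single in-memory list.

```python
import asyncio

async def fake_interview(i: int) -> int:
    await asyncio.sleep(0)  # stand-in for a model call
    return i

async def run_chunked(n: int, chunk_size: int = 4) -> list:
    results = []  # grows without bound: memory scales with n, not chunk_size
    for start in range(0, n, chunk_size):
        chunk = [fake_interview(i) for i in range(start, min(start + chunk_size, n))]
        # The chunk bounds concurrency only; results are still all retained.
        results.extend(await asyncio.gather(*chunk))
    return results

out = asyncio.run(run_chunked(10))
```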
Memory Issues
1. No Streaming Support:
- The entire result set is accumulated in memory before returning
- There's no mechanism to stream results to disk or process them incrementally
- The Results data structure isn't designed for partial loading/processing
2. Full In-Memory Collection:
- In run_async and run methods, all results are appended to a list (lines 114-116, 217-219)
- This list grows unbounded with job size
3. Inefficient Data Duplication:
- Constructing the final Results object duplicates data
- First, results = Results(...) is created (line 118)
- Then a second Results object containing the same data is created (lines 124-128)
4. No Memory Bounds:
- There are no parameters for limiting memory usage
- No early cleanup of completed interview resources
Options for Improvement
1. Stream Results:
- Implement a streaming approach where results are processed incrementally
- Process/persist chunks of results instead of keeping everything in memory
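One way to sketch this option is an async generator that yields each result as it completes, so the caller can process or persist it and let it be garbage-collected. The function names are assumptions, not the project's API.

```python
import asyncio
from typing import AsyncIterator

async def fake_interview(i: int) -> dict:
    await asyncio.sleep(0)  # stand-in for a model call
    return {"id": i}

async def stream_results(n: int, max_concurrent: int = 4) -> AsyncIterator[dict]:
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(i: int) -> dict:
        async with sem:
            return await fake_interview(i)

    # Task objects still scale with n, but result payloads are handed to the
    # caller one at a time instead of being accumulated here.
    tasks = [asyncio.create_task(bounded(i)) for i in range(n)]
    for task in asyncio.as_completed(tasks):
        yield await task

async def main() -> int:
    seen = 0
    async for result in stream_results(10):
        seen += 1  # process/persist here, then drop the result
    return seen

count = asyncio.run(main())
```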
2. Memory Threshold Parameter:
- Add a memory threshold parameter to limit in-memory results
- When threshold is reached, persist results to disk and clear memory
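A minimal sketch of the spill-to-disk behavior, assuming a simple count-based threshold and a JSON-lines file as the persistence target (names are illustrative):

```python
import json
import os
import tempfile

class SpillingBuffer:
    # Hypothetical helper: keeps at most `threshold` results in memory,
    # appending them to a JSONL file whenever the buffer fills.
    def __init__(self, path: str, threshold: int = 100):
        self.path = path
        self.threshold = threshold
        self.buffer = []

    def add(self, result: dict) -> None:
        self.buffer.append(result)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        with open(self.path, "a") as f:
            for r in self.buffer:
                f.write(json.dumps(r) + "\n")
        self.buffer.clear()  # release the in-memory copies

path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
buf = SpillingBuffer(path, threshold=3)
for i in range(7):
    buf.add({"id": i})
buf.flush()  # flush the remainder at the end of the run
with open(path) as f:
    lines = f.readlines()
```

A real implementation would more likely trigger on estimated bytes rather than a result count, but the flush-and-clear shape is the same.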
3. Result Iterator Pattern:
- Convert Results to use an iterator pattern that only loads data when needed
- Allow lazy evaluation of results rather than loading everything upfront
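The iterator pattern can be sketched as a lazy view over persisted rows, so only one result is deserialized at a time. The class name and file format here are assumptions.

```python
import json
import os
import tempfile

class LazyResults:
    # Hypothetical lazy Results view: rows are read from disk on demand
    # rather than materialized as one big list.
    def __init__(self, path: str):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:  # one row in memory at a time
                yield json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "rows.jsonl")
with open(path, "w") as f:
    for i in range(5):
        f.write(json.dumps({"id": i}) + "\n")

total = sum(row["id"] for row in LazyResults(path))
```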
4. Disk-Based Results:
- Create a disk-backed variant of Results that pages data in/out as needed
- Could leverage SQLite or another lightweight database to manage result data
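A SQLite-backed variant, as suggested above, could look roughly like this (schema and class name are illustrative, not the project's design):

```python
import json
import sqlite3

class SQLiteResults:
    # Hypothetical disk-backed Results store: rows live in SQLite and are
    # paged in one at a time on iteration.
    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (id INTEGER PRIMARY KEY, data TEXT)"
        )

    def append(self, result: dict) -> None:
        self.conn.execute(
            "INSERT INTO results (data) VALUES (?)", (json.dumps(result),)
        )
        self.conn.commit()

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

    def __iter__(self):
        # Only the current row is deserialized; the rest stay on disk.
        for (data,) in self.conn.execute("SELECT data FROM results ORDER BY id"):
            yield json.loads(data)

store = SQLiteResults()
for i in range(4):
    store.append({"id": i})
ids = [r["id"] for r in store]
```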
5. Result Processing Callbacks:
- Add support for result processing callbacks that handle results as they arrive
- Allow custom handling/aggregation logic without storing every raw result
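A sketch of the callback approach, with an assumed signature: the runner hands each raw result to the callback and retains nothing, so only the caller's aggregate survives.

```python
import asyncio
from typing import Callable

async def fake_interview(i: int) -> dict:
    await asyncio.sleep(0)  # stand-in for a model call
    return {"id": i, "tokens": i * 10}

async def run_with_callback(n: int, on_result: Callable[[dict], None]) -> None:
    # Each raw result is handed off as it arrives, never stored here.
    for coro in asyncio.as_completed([fake_interview(i) for i in range(n)]):
        on_result(await coro)

totals = {"count": 0, "tokens": 0}

def aggregate(result: dict) -> None:
    # Custom aggregation logic: keep counts, drop the raw payload.
    totals["count"] += 1
    totals["tokens"] += result["tokens"]

asyncio.run(run_with_callback(5, aggregate))
```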
The core issue is that the system is designed to accumulate all results in memory before returning them to the caller, with no mechanism for streaming results or limiting memory usage during execution.
(The above is Claude's analysis of the issue.)