Commit 699736f
Persist Conversation Trajectory from the remote runtime (#108)
* Update SDK submodule to 49c42ee (main)
* Capture conversation archive from remote workspace during eval
* Capture remote conversations dir via base64 tar
* Fix import order to pass pre-commit checks
Co-authored-by: openhands <[email protected]>
* Move conversation archive capture to Evaluation base class
This change moves the conversation trajectory persistence logic from the
SWE-Bench-specific implementation to the Evaluation base class, making it
automatically available to all benchmarks.
Benefits:
- Automatic conversation capture for all benchmarks (SWE-Bench, GAIA, etc.)
- Consistent behavior across all evaluation workflows
- Better reproducibility and debugging capabilities
- Per-instance archives support parallel execution
- No need to manually add this to each new benchmark
The logic is now called in _process_one_mp after successful evaluation,
ensuring that conversation archives are captured for every instance across
all benchmarks. Archives are stored per-instance as conversations/{instance_id}.tar.gz
to support parallel workers without race conditions.
Co-authored-by: openhands <[email protected]>
* Revert sdk submodule to main
---------
Co-authored-by: openhands <[email protected]>
1 parent 5793428 commit 699736f
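The diff body for this commit is not reproduced here, so the following is only a minimal sketch of the idea the commit message describes: pack a conversations directory as a base64-encoded tar.gz, then decode it and store it as conversations/{instance_id}.tar.gz. The function names, directory layout, and the local stand-in for the remote workspace are all assumptions for illustration, not the repository's actual code.

```python
import base64
import io
import tarfile
import tempfile
from pathlib import Path


def pack_dir_to_base64(src_dir: Path) -> str:
    """Tar+gzip src_dir in memory and return the archive as base64 text.

    Hypothetical helper: stands in for whatever runs on the remote side.
    Base64 keeps the binary archive safe to pass through a text channel.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    return base64.b64encode(buf.getvalue()).decode("ascii")


def save_instance_archive(b64_text: str, output_dir: Path, instance_id: str) -> Path:
    """Decode the payload and write conversations/{instance_id}.tar.gz.

    Hypothetical helper: one file per instance means parallel workers
    never write to the same path.
    """
    dest = output_dir / "conversations" / f"{instance_id}.tar.gz"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(base64.b64decode(b64_text))
    return dest


def demo() -> list:
    """Round-trip a fake conversations directory and list the archive members."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Assumed layout: a local directory plays the role of the remote workspace.
        remote = root / "remote" / "conversations"
        remote.mkdir(parents=True)
        (remote / "events.json").write_text('{"event": "hello"}')

        payload = pack_dir_to_base64(remote)
        archive = save_instance_archive(payload, root / "eval_output", "instance-001")

        with tarfile.open(archive, mode="r:gz") as tar:
            return sorted(tar.getnames())


if __name__ == "__main__":
    print(demo())
```

Because each archive path is derived from the instance ID, two workers processing different instances touch different files, which is the property the commit message relies on to avoid race conditions.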
1 file changed
+60 -0 lines changed
Added lines (new file numbering): 5, 12, 104-157, 409-412