[reward] fix: restore timeout in math_verify via ProcessPoolExecutor#5839
[reward] fix: restore timeout in math_verify via ProcessPoolExecutor#5839MaxwellJryao wants to merge 2 commits intoverl-project:mainfrom
Conversation
…timeout math_verify uses signal.alarm() for timeout, which only works in the main thread. In verl's RewardLoopWorker, compute_score is called from a thread pool (asyncio run_in_executor), causing signal.alarm() to raise ValueError. The previous fix (verl-project#5635) disabled timeout entirely (parsing_timeout=None), but this can cause hangs on adversarial inputs. Run verification in a subprocess via ProcessPoolExecutor where signal.alarm() works normally, with an outer future.result(timeout=30) as a fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request refactors the math_verify utility to execute scoring within a subprocess using ProcessPoolExecutor, which prevents issues with signal.alarm() in multi-threaded environments like Ray. The review feedback highlights several critical improvements: defining a placeholder for TimeoutException to avoid NameError when the library is missing, using the spawn multiprocessing context to prevent deadlocks, and replacing a silent exception handler with basic logging to aid debugging.
| from math_verify.errors import TimeoutException | ||
| from math_verify.grader import verify | ||
| from math_verify.parser import ExprExtractionConfig, LatexExtractionConfig, parse | ||
| except ImportError: | ||
| print("To use Math-Verify, please install it first by running `pip install math-verify`.") |
There was a problem hiding this comment.
If math_verify is not installed, TimeoutException will not be defined in the global scope. This will lead to a NameError when compute_score attempts to catch it in the except block at line 58. Since this NameError is then caught by the broad except Exception at line 60, the function will silently return 0.0 without any indication of the missing dependency or the underlying error. Defining a placeholder exception ensures the code remains robust even when the package is missing.
| from math_verify.errors import TimeoutException | |
| from math_verify.grader import verify | |
| from math_verify.parser import ExprExtractionConfig, LatexExtractionConfig, parse | |
| except ImportError: | |
| print("To use Math-Verify, please install it first by running `pip install math-verify`.") | |
| from math_verify.errors import TimeoutException | |
| except ImportError: | |
| class TimeoutException(Exception): | |
| pass | |
| print("To use Math-Verify, please install it first by running pip install math-verify.") |
| if _pool is None: | ||
| with _pool_lock: | ||
| if _pool is None: | ||
| _pool = ProcessPoolExecutor(max_workers=4) |
There was a problem hiding this comment.
Using ProcessPoolExecutor with the default start method (which is fork on Linux) can be dangerous in multi-threaded environments like Ray workers, as it may lead to deadlocks or inconsistent state in the child processes if the parent process has active threads during the fork. It is safer to use the spawn start method by providing an explicit multiprocessing context.
| _pool = ProcessPoolExecutor(max_workers=4) | |
| import multiprocessing | |
| _pool = ProcessPoolExecutor(max_workers=4, mp_context=multiprocessing.get_context('spawn')) |
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The broad except Exception: pass block swallows all potential errors, including critical ones like PicklingError (if arguments cannot be sent to the subprocess) or the NameError mentioned previously. This makes debugging extremely difficult in a distributed training environment. It is recommended to at least log the exception or catch more specific error types.
| except Exception: | |
| pass | |
| except Exception as e: | |
| print(f"Error in math_verify compute_score: {e}") |
- Define TimeoutException placeholder when math_verify is not installed - Use spawn mp_context to avoid fork deadlocks in threaded environments - Log exceptions instead of silently swallowing them
What does this PR do?
Follow-up to #5635. That PR fixed the
signal.alarm()crash by disabling all timeouts (parsing_timeout=None,timeout_seconds=None). However, this causes the reward worker to hang indefinitely on adversarial or complex model outputs that trigger pathological parsing inmath_verify.This PR restores timeout protection by running
math_verifyin a subprocess viaProcessPoolExecutor, wheresignal.alarm()works normally (each subprocess has its own main thread). An outerfuture.result(timeout=30)provides a fallback guarantee.Checklist Before Starting
https://github.com/verl-project/verl/pulls?q=is%3Apr+math_verify+signal
[{modules}] {type}: {description}Test
Validated on Qwen2.5-3B-Instruct GRPO training with MATH500 val set on 4-node cluster:
parsing_timeout=None): Training hangs at step 2 — a singlemath_verifycall blocks theRewardLoopWorkerindefinitely. All GPUs idle.API and Usage Example
No API changes — same
compute_score(model_output, ground_truth)signature. New optionaltimeoutparameter (default 30s).Design & Code Changes
parse()+verify()in a subprocess viaProcessPoolExecutor—signal.alarm()works in each subprocess's main threadfuture.result(timeout=30)as fallbackmath_verifyinside subprocess function to avoid pickling issuesChecklist Before Submitting