added staleness metrics #1204
base: main
Conversation
measure the training steps between:
- adding prompt to queue
- starting generation of prompt
- using prompt for training

rename prompt_param_Q to just prompt_Q since it doesn't contain params anymore
Summary of Changes

Hello @mnoukhov, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the training pipeline by introducing a comprehensive system for measuring and reporting prompt staleness. By meticulously tracking prompts from their initial queuing, through generation, and finally to their consumption in training, the changes provide valuable insights into the efficiency and potential bottlenecks of the prompt processing flow. This improved visibility aims to optimize the overall training process. Additionally, a variable name for a key queue was updated for better clarity and consistency.
Code Review
This pull request introduces staleness metrics to measure the time prompts spend in different stages of the pipeline, and also refactors the naming of prompt_param_Q to prompt_Q. The refactoring is well-executed. The logic for adding staleness metrics is mostly sound, but I've identified a critical bug in open_instruct/grpo_fast.py within the staleness calculation logic that would lead to a runtime error. I've provided a detailed comment and a code suggestion to fix this. Additionally, I've included a medium-severity comment for a minor code simplification in the same file. The other changes in actor_manager.py, queue_types.py, vllm_utils.py, and the test files look good and correctly support the new feature.
```python
# calculate staleness metrics
queue_steps = batch_stats.pop("staleness_queue_steps", [])
generation_steps = batch_stats.pop("staleness_generation_start_steps", [])
consumed_steps = batch_stats.pop("staleness_consumed_step", [])

queue_to_generation = []
generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)

staleness_metrics = {
    "staleness/queue_to_generation": queue_to_generation,
    "staleness/queue_to_generation_mean": np.mean(queue_to_generation),
    "staleness/generation_to_consume": generation_to_consume,
    "staleness/generation_to_consume_mean": np.mean(generation_to_consume),
}
```
There are a few issues in this block that will cause runtime errors and incorrect behavior:

1. AttributeError: You are calling `.pop()` on `batch_stats`, which is a dataclass instance and does not have a `pop` method. You should be using `batch_metrics`, the dictionary created from `asdict(batch_stats)`.
2. TypeError: `batch_metrics.pop("staleness_consumed_step", [])` will return an integer. The subsequent `zip` call attempts to iterate over this integer (`consumed_steps`), which will raise a `TypeError`. The consumed step is the same for all items in the batch and should be treated as a scalar.
3. `None` values: The `queue_steps` and `generation_steps` lists can contain `None` values. The subtraction `generation_step - queue_step` will raise a `TypeError` if either value is `None`. These cases should be handled.
4. RuntimeWarning from `np.mean`: If `queue_to_generation` or `generation_to_consume` is an empty list (e.g., if all steps were `None`), `np.mean` will return `NaN` and raise a `RuntimeWarning`. It's better to handle this case and return `0.0`.

I've provided a suggestion that fixes these issues by using `batch_metrics`, handling the scalar `consumed_step` correctly, checking for `None` before subtraction, and safely calculating the mean.
Current:

```python
# calculate staleness metrics
queue_steps = batch_stats.pop("staleness_queue_steps", [])
generation_steps = batch_stats.pop("staleness_generation_start_steps", [])
consumed_steps = batch_stats.pop("staleness_consumed_step", [])
queue_to_generation = []
generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)
staleness_metrics = {
    "staleness/queue_to_generation": queue_to_generation,
    "staleness/queue_to_generation_mean": np.mean(queue_to_generation),
    "staleness/generation_to_consume": generation_to_consume,
    "staleness/generation_to_consume_mean": np.mean(generation_to_consume),
}
```

Suggested change:

```python
# calculate staleness metrics
queue_steps = batch_metrics.pop("staleness_queue_steps", [])
generation_steps = batch_metrics.pop("staleness_generation_start_steps", [])
consumed_step = batch_metrics.pop("staleness_consumed_step", None)
queue_to_generation = []
generation_to_consume = []
if consumed_step is not None:
    for q_step, g_step in zip(queue_steps, generation_steps):
        if q_step is not None and g_step is not None:
            queue_to_generation.append(g_step - q_step)
            generation_to_consume.append(consumed_step - g_step)
staleness_metrics = {
    "staleness/queue_to_generation": queue_to_generation,
    "staleness/queue_to_generation_mean": np.mean(queue_to_generation) if queue_to_generation else 0.0,
    "staleness/generation_to_consume": generation_to_consume,
    "staleness/generation_to_consume_mean": np.mean(generation_to_consume) if generation_to_consume else 0.0,
}
```
```python
staleness_queue_steps=staleness_queue_steps if staleness_queue_steps else [],
staleness_generation_start_steps=staleness_generation_start_steps if staleness_generation_start_steps else [],
```
The conditional assignments here are redundant. The variables staleness_queue_steps and staleness_generation_start_steps are initialized as empty lists. If they remain empty, the expression ... if [] else [] evaluates to []. If they are populated, the expression ... if [1, 2] else [] evaluates to [1, 2]. In both cases, the result is the same as the original variable. You can simplify this by directly assigning the variables.
Current:

```python
staleness_queue_steps=staleness_queue_steps if staleness_queue_steps else [],
staleness_generation_start_steps=staleness_generation_start_steps if staleness_generation_start_steps else [],
```

Suggested change:

```python
staleness_queue_steps=staleness_queue_steps,
staleness_generation_start_steps=staleness_generation_start_steps,
```
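The identity can be checked directly. A minimal sketch (the `normalize` helper is hypothetical and only mirrors the `x if x else []` pattern under review):

```python
def normalize(steps):
    # For a list-valued argument, `steps if steps else []` is a no-op:
    # an empty list is falsy, so the else branch returns an equal empty
    # list; a non-empty list is truthy, so it is returned unchanged.
    return steps if steps else []

print(normalize([]) == [])                # True
print(normalize([3, 5, 8]) == [3, 5, 8])  # True
```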
```python
generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)
```
Bug: Zip Error: Single Value Not Iterable
The staleness_consumed_step field in BatchStatistics is a single int | None value, but the code tries to zip() it with two lists (queue_steps and generation_steps). This causes a runtime error because zip() expects an iterable, not a single integer. The consumed step should be repeated for each prompt or the code should be restructured to handle the single value properly.
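One way to restructure, sketched here with a hypothetical helper (not the PR's code), is to broadcast the batch-level scalar with `itertools.repeat` so it pairs with each per-prompt entry:

```python
from itertools import repeat

def pair_steps(queue_steps, generation_steps, consumed_step):
    # consumed_step is a single int for the whole batch, so repeat it
    # alongside the per-prompt step lists rather than zipping it directly.
    return list(zip(queue_steps, generation_steps, repeat(consumed_step)))

print(pair_steps([1, 2], [4, 5], 7))  # [(1, 4, 7), (2, 5, 7)]
```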
```python
generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)
```
Bug: Staleness Calculation: Robustness Against Missing Data
The staleness calculation performs arithmetic on queue_step, generation_step, and consume_step values that can be None. When _get_current_training_step() fails or when metadata keys are missing, these values default to None, causing TypeError: unsupported operand type(s) for -: 'NoneType' and 'int' when attempting subtraction. The code needs to handle None values before performing arithmetic operations.
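A self-contained sketch of None-safe handling (the function name and return shape are illustrative assumptions, not the PR's code):

```python
import numpy as np

def staleness_deltas(queue_steps, generation_steps, consumed_step):
    # Skip prompts whose queue or generation step was never recorded (None),
    # and bail out entirely if the batch-level consumed_step is missing.
    queue_to_generation, generation_to_consume = [], []
    if consumed_step is not None:
        for q_step, g_step in zip(queue_steps, generation_steps):
            if q_step is not None and g_step is not None:
                queue_to_generation.append(g_step - q_step)
                generation_to_consume.append(consumed_step - g_step)

    def safe_mean(values):
        # np.mean([]) warns and yields NaN; report 0.0 for empty inputs.
        return float(np.mean(values)) if values else 0.0

    return safe_mean(queue_to_generation), safe_mean(generation_to_consume)
```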
Note

Adds staleness metrics (queue → generation → consume) wired through ActorManager, the vLLM actor, batching, and logs; renames `param_prompt_Q` to `prompt_Q`.

- `GenerationResult`: adds `queued_training_step` and `generation_started_training_step`.
- `ActorManager`: stores and exposes the current training step; the main loop updates it each step. `process_completed_request` sets the new fields.
- `BatchStatistics`: adds staleness fields; computes `staleness/queue_to_generation` and `staleness/generation_to_consume` (plus means) and logs them in the training metrics.
- Renames `param_prompt_Q` → `prompt_Q` across code paths (`add_prompt_to_generator`, `data_preparation_thread`, `run_training`, `create_model_and_optimizer`, `main`).

Written by Cursor Bugbot for commit ac22910.
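As a rough sketch of the shape of the change: the two fields named in the summary could be carried on the result object like this (illustrative only; the real `GenerationResult` in open_instruct has many more fields):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationResult:
    # Training step at which the prompt was placed on prompt_Q.
    queued_training_step: Optional[int] = None
    # Training step at which the vLLM actor began generating for it.
    generation_started_training_step: Optional[int] = None
```

The difference between these two fields, and between the consuming step and the second field, is what the staleness metrics report.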