Refactor mdlm, bd3lm, ppl tracking, and minor improvements#68

Merged: ZHZisZZ merged 8 commits into dev from dev-refactor on Jan 1, 2026
ZHZisZZ (Owner) commented Jan 1, 2026

This pull request refactors and improves the BD3LM and MDLM trainer and sampler code, focusing on configuration management, loss computation, and sampling robustness. The changes introduce dataclass-based configuration, clarify and correct masking logic, and enhance per-sequence EOS handling in the sampler. Additionally, metric tracking is unified and modernized. The most important changes are grouped below:

Configuration and Trainer Refactoring:

  • Introduced MDLMConfig and BD3LMConfig dataclasses for structured configuration of trainers, replacing multiple positional arguments and improving clarity and maintainability (dllm/core/trainers/mdlm.py, dllm/core/trainers/bd3lm.py).
  • Updated trainer initialization to use the new config dataclasses and to unify metric tracking via a new OnEvaluateMetricsCallback instead of the previous EpochPPLMeter (dllm/core/trainers/mdlm.py, dllm/core/trainers/bd3lm.py).
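
The nested-config pattern described above could look roughly like the following. This is a minimal sketch, not the repository's code: `TrainingArguments` is a stand-in dataclass rather than the real base class, the placement of `loss_norm_type` on `MDLMConfig` is an assumption, and the defaults (`"token"`, `32`) are illustrative. Only the names `MDLMConfig`, `BD3LMConfig`, `loss_norm_type`, and `block_size` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class TrainingArguments:
    """Stand-in for the shared base arguments class (fields are assumed)."""
    output_dir: str = "outputs"
    learning_rate: float = 1e-4


class MDLMTrainer:
    @dataclass
    class MDLMConfig(TrainingArguments):
        # Assumed default; the PR only names the field.
        loss_norm_type: str = "token"

        def __post_init__(self):
            allowed = {"batch", "sequence", "token"}
            if self.loss_norm_type not in allowed:
                raise ValueError(f"loss_norm_type must be one of {sorted(allowed)}")

    def __init__(self, args: "MDLMTrainer.MDLMConfig", scheduler=None):
        self.args = args
        # The scheduler is a separate keyword argument, not a config field.
        self.scheduler = scheduler


class BD3LMTrainer(MDLMTrainer):
    @dataclass
    class BD3LMConfig(MDLMTrainer.MDLMConfig):
        block_size: int = 32  # assumed default

        def __post_init__(self):
            # Chain to the parent so its validation still runs.
            super().__post_init__()
            if self.block_size <= 0:
                raise ValueError("block_size must be positive")


cfg = BD3LMTrainer.BD3LMConfig(block_size=16)
trainer = BD3LMTrainer(args=cfg)
```

Because each config subclasses its parent's config, a script-level arguments class can inherit from the most specific one and pick up every field, and a forgotten `super().__post_init__()` call is the kind of inheritance-chain bug the commit log mentions fixing.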

Loss Computation and Masking Logic:

  • Standardized masking logic by renaming variables (e.g., masked_indices to masked_mask, and token_cnt_per_seq to maskable_mask), and updated all related logic to use these clearer names. This change ensures consistency and correctness in how loss is computed and normalized (dllm/core/trainers/mdlm.py, dllm/core/trainers/bd3lm.py).
  • Improved degenerate case handling (when no tokens are masked) to ensure gradients remain valid, and updated metric tracking to use the new unified callback (dllm/core/trainers/bd3lm.py).
  • Updated loss normalization to use the new config field loss_norm_type and fixed normalization logic for batch, sequence, and token modes (dllm/core/trainers/bd3lm.py).
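
To make the three normalization modes concrete, here is a small sketch using plain Python lists in place of tensors. The exact semantics in dllm are not shown in this PR, so the definitions below are assumptions: "token" divides the summed loss by the number of masked tokens, "sequence" divides each sequence's loss by its maskable-token count and then averages over sequences, and "batch" divides by the batch size. `normalize_loss` is a hypothetical helper name; `masked_mask`, `maskable_mask`, and `loss_norm_type` are the names used in the PR.

```python
def normalize_loss(token_nll, masked_mask, maskable_mask, loss_norm_type="token"):
    """All inputs are lists of rows shaped [batch][seq_len]; masks hold 0/1."""
    # Only positions that were actually masked contribute to the loss.
    per_seq_loss = [
        sum(nll for nll, m in zip(row, mask_row) if m)
        for row, mask_row in zip(token_nll, masked_mask)
    ]
    total = sum(per_seq_loss)
    batch_size = len(token_nll)

    if loss_norm_type == "batch":
        return total / max(batch_size, 1)
    if loss_norm_type == "sequence":
        # Divide each sequence by its own maskable-token count, then average.
        per_seq = [
            loss / max(sum(row), 1)
            for loss, row in zip(per_seq_loss, maskable_mask)
        ]
        return sum(per_seq) / max(batch_size, 1)
    if loss_norm_type == "token":
        n_masked = sum(sum(row) for row in masked_mask)
        # max(..., 1) guards the degenerate case where nothing was masked.
        return total / max(n_masked, 1)
    raise ValueError(f"unknown loss_norm_type: {loss_norm_type!r}")


token_nll = [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
masked_mask = [[1, 0, 1], [1, 0, 0]]
maskable_mask = [[1, 1, 1], [1, 1, 0]]
loss_token = normalize_loss(token_nll, masked_mask, maskable_mask, "token")
loss_empty = normalize_loss(token_nll, [[0, 0, 0], [0, 0, 0]], maskable_mask, "token")
```

The clamped denominators keep the result finite when no token is masked. In a tensor framework the degenerate branch would also have to stay connected to the computation graph for gradients to remain valid; this list-based sketch only shows the arithmetic.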

Sampler Improvements:

  • Enhanced block sampling in BD3LM sampler to pad prompt lengths to block size multiples, ensuring proper alignment and simplifying block scheduling (dllm/core/samplers/bd3lm.py).
  • Added robust per-sequence EOS handling to allow early stopping of generation for sequences that have reached EOS, improving efficiency and correctness (dllm/core/samplers/bd3lm.py).
  • Renamed internal functions for clarity (e.g., build_staircase_attention_mask to _prepare_for_sampling, diffusion_step_block to _diffusion_step_block) (dllm/core/samplers/bd3lm.py).
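
The block-alignment and per-sequence EOS ideas above can be sketched without any model. Everything here is illustrative: `padded_length`, `generate_blocks`, and the `sample_block` callback are hypothetical names, left-padding is an assumption (the PR does not say which side is padded), and plain lists stand in for tensors.

```python
def padded_length(length, block_size):
    """Round a length up to the next multiple of block_size."""
    return -(-length // block_size) * block_size


def generate_blocks(prompts, block_size, max_blocks, eos_id, pad_id, sample_block):
    """sample_block(seq, block_size) -> list of new token ids (hypothetical callback)."""
    # Pad every prompt to one shared, block-aligned length (left-padding assumed).
    target = padded_length(max(len(p) for p in prompts), block_size)
    seqs = [[pad_id] * (target - len(p)) + list(p) for p in prompts]
    done = [False] * len(seqs)

    for _ in range(max_blocks):
        if all(done):
            break  # every sequence has emitted EOS, so stop early
        for i, seq in enumerate(seqs):
            if done[i]:
                # Finished rows only receive padding so all rows stay the same length.
                seq.extend([pad_id] * block_size)
                continue
            block = sample_block(seq, block_size)
            seq.extend(block)
            done[i] = eos_id in block
    return seqs, done


PAD_ID, EOS_ID = 0, 2


def stub_sample_block(seq, block_size):
    # Toy sampler: a sequence whose last token is 7 finishes immediately.
    if seq[-1] == 7:
        return [EOS_ID] + [PAD_ID] * (block_size - 1)
    return [1] * block_size


seqs, done = generate_blocks(
    [[5, 6, 7], [5, 6, 8, 9, 9]],
    block_size=4,
    max_blocks=2,
    eos_id=EOS_ID,
    pad_id=PAD_ID,
    sample_block=stub_sample_block,
)
```

With block-aligned prompts, every generated block starts on a block boundary, so the schedule reduces to a simple loop over whole blocks.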

Documentation and API Consistency:

  • Updated example SFT preprocessing commands in README.md to use the correct mapping function (dllm.utils.default_sft_map_fn) for consistency with code changes (README.md).

Other Code Quality Improvements:

  • Removed unused imports and improved type annotations for better code clarity and maintainability (dllm/core/trainers/bd3lm.py).
  • Renamed and refactored attention mask construction function for BD3LM training for better clarity (dllm/core/trainers/bd3lm.py).

These changes collectively modernize the codebase, improve maintainability, and ensure more robust and correct training and sampling behavior.

ZHZisZZ and others added 7 commits December 26, 2025 23:25
* refactor train args

* Initial plan

* Fix training arguments import and inheritance chain bugs

- Fix incorrect import of TrainingArguments in mdlm.py (was importing from dllm.utils.data, should be dllm.utils.configs)
- Fix DreamConfig.__post_init__ to call super().__post_init__() to ensure proper inheritance chain
- Fix typo in sl.py: 'arg=args' -> 'args=args' for MDLMTrainer.__init__ and BD3LMTrainer.__init__
- Fix BD3LMAnDSLConfig to inherit from BD3LMTrainer.BD3LMConfig instead of MDLMTrainer.MDLMConfig (to include block_size field)

Co-authored-by: ZHZisZZ <[email protected]>

* Move scheduler from MDLMConfig to __init__ kwarg

- Remove scheduler field from MDLMConfig dataclass
- Add scheduler as keyword argument in MDLMTrainer.__init__ with default LinearAlphaScheduler()
- Propagate scheduler parameter through BD3LMTrainer.__init__
- Propagate scheduler parameter through MDLMAnDSLTrainer.__init__ and BD3LMAnDSLTrainer.__init__
- DreamTrainer inherits the scheduler kwarg automatically from MDLMTrainer

Co-authored-by: ZHZisZZ <[email protected]>

* Simplify: let inherited trainers pass scheduler via **kwargs

Removed unnecessary explicit scheduler parameter from inherited trainers
(BD3LMTrainer, MDLMAnDSLTrainer, BD3LMAnDSLTrainer) since **kwargs
passes it through automatically to MDLMTrainer.__init__.

Co-authored-by: ZHZisZZ <[email protected]>

* Align EditFlowTrainer with other trainers' pattern

- Create EditFlowConfig nested class inside EditFlowTrainer with config fields
- Move time_epsilon, normalize_per_position, max_w from __init__ kwargs to config
- Update EditFlowTrainer.__init__ to accept args: EditFlowConfig
- Update examples/editflow/sft.py and pt.py to inherit from EditFlowTrainer.EditFlowConfig
- Simplify trainer instantiation to just pass args (config handles the rest)

Co-authored-by: ZHZisZZ <[email protected]>

* Add scheduler_cls back to EditFlowConfig for user customization

Restored scheduler_cls field to EditFlowConfig so users can specify
which kappa scheduler to use (e.g., "LinearKappaScheduler", "CubicKappaScheduler").
The scheduler is created from args.scheduler_cls if no scheduler is passed directly.

Co-authored-by: ZHZisZZ <[email protected]>

* Move scheduler_cls to example scripts' TrainingArguments

Keep EditFlowConfig clean with only trainer-specific params. The
scheduler_cls field is now in the example scripts' TrainingArguments
(which inherits from EditFlowConfig), and is used to create the
scheduler that is passed to the EditFlowTrainer.

Co-authored-by: ZHZisZZ <[email protected]>

* Initial plan

* Clean up unused imports and redundant boolean comparison (#5)

* Initial plan

* Address review comments: remove unused imports and fix redundant comparison

Co-authored-by: ZHZisZZ <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ZHZisZZ <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
* fix bd3lm sampler

* andi temp save

* andi pt

* temp save

* Trainer refactor (#3)

* fix editflow scripts lacking main

* update

* Fix critical bugs in trainer metrics refactoring (#67)

* Initial plan

* Fix critical bugs in trainer refactoring: shape mismatch, division by zero, and data integrity

Co-authored-by: ZHZisZZ <[email protected]>

* Ensure dtype consistency in meter.update() calls

Co-authored-by: ZHZisZZ <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ZHZisZZ <[email protected]>

* update

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @ZHZisZZ, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on a significant refactoring of the training and sampling components, aiming to improve code structure, consistency, and maintainability. Key changes include standardizing trainer configurations using dataclasses, overhauling the metric logging system with torchmetrics for more robust and DDP-safe reporting, and enhancing the BD3LM sampler with better padding and sequence termination logic. These changes streamline development and ensure a more reliable and extensible framework.

Highlights

  • Trainer Configuration Refactor: Introduced dedicated Config dataclasses (e.g., MDLMConfig, BD3LMConfig, DreamConfig, EditFlowConfig) within each trainer, inheriting from a base TrainingArguments class. This centralizes and standardizes trainer-specific arguments, improving clarity and maintainability across different training pipelines.
  • Metric Logging System Overhaul: Replaced the custom EpochPPLMeter with a more robust and flexible BaseMetricsCallback and OnEvaluateMetricsCallback system, leveraging torchmetrics. This new system provides DDP-safe metric accumulation, per-split metric independence, and standardized logging for NLL and Perplexity.
  • BD3LM Sampler Enhancements: Improved the bd3lm sampler by renaming internal utility functions for clarity, adding per-sequence End-of-Sequence (EOS) stopping logic, and refining prompt padding to align with block boundaries, leading to more efficient and controlled generation.
  • Codebase Cleanup and Renaming: Numerous internal utility functions and variables were renamed (e.g., build_staircase_attention_mask to _prepare_for_sampling, default_mdlm_sft_map_fn to default_sft_map_fn, masked_indices to masked_mask) to enhance code readability and consistency. GPT2 A2D model support was also removed.
  • New Dataset Utility and Dependency: Added a new utility (dllm/data/s1k.py) for loading and mapping the 's1K' dataset. The pyproject.toml was updated to include torchmetrics as a core dependency and a new rl optional dependency group.
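
The per-split NLL and perplexity tracking mentioned in the highlights can be illustrated with a dependency-free sketch. This is not the torchmetrics-based implementation from the PR: `WeightedMean` and `SplitMeters` are hypothetical names, and cross-process (DDP) reduction, which torchmetrics handles for registered metric states, is not modelled here.

```python
import math
from collections import defaultdict


class WeightedMean:
    """Accumulates sum(value * weight) and sum(weight)."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def update(self, value, weight):
        self.weighted_sum += float(value) * float(weight)
        self.total_weight += float(weight)

    def compute(self):
        # Return NaN for an empty meter instead of dividing by zero.
        if self.total_weight <= 0:
            return float("nan")
        return self.weighted_sum / self.total_weight


class SplitMeters:
    """One independent accumulator per split (e.g. "train", "eval")."""

    def __init__(self):
        self._meters = defaultdict(WeightedMean)

    def update(self, split, value, weight):
        self._meters[split].update(value, weight)

    def nll(self, split):
        return self._meters[split].compute()

    def perplexity(self, split):
        # Perplexity is the exponential of the token-weighted mean NLL.
        return math.exp(self.nll(split))


meters = SplitMeters()
# Each update passes a per-batch mean NLL weighted by its token count.
meters.update("train", value=6.0 / 3, weight=3)
meters.update("train", value=1.0 / 1, weight=1)
train_nll = meters.nll("train")
train_ppl = meters.perplexity("train")
```

Weighting each batch mean by its token count makes the result equal to the total NLL divided by the total token count, so batches with few tokens do not skew the average.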

ZHZisZZ (Owner, Author) commented Jan 1, 2026

@copilot Please check for potential bugs and verify whether the changes are functionally equivalent before and after.

Copilot AI (Contributor) commented Jan 1, 2026

@ZHZisZZ I've opened a new pull request, #69, to work on those changes. Once the pull request is ready, I'll request review from you.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and high-quality refactoring across the codebase. Key improvements include:

  • A shift to structured dataclass-based configurations for trainers, enhancing maintainability.
  • The implementation of a new, robust metrics system using torchmetrics, replacing the older manual approach.
  • Cleaner and more efficient loss computation logic in the trainers.
  • Numerous code cleanups, such as function renaming, deduplication, and making example scripts self-contained.
  • Correctness improvements in the block diffusion sampler for batch generation.

The overall changes greatly improve the quality, clarity, and maintainability of the code. I have a few suggestions for further improvement, including a potential correctness issue regarding a removed chat template and a couple of minor style fixes.

Comment on lines +186 to 189
elif issubclass(model_cls, (A2DQwen2LMHeadModel, A2DQwen3LMHeadModel)):
    tokenizer.add_special_tokens({"mask_token": "<|mask|>"})
    tokenizer.eot_token = "<|im_end|>"
    tokenizer.eot_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eot_token)


high

This refactoring combines the logic for A2DQwen2LMHeadModel and A2DQwen3LMHeadModel, which is great for deduplication. However, the very long and specific chat_template that was previously set for A2DQwen3LMHeadModel has been removed. This template is crucial for the model's chat and tool-use functionality. Its removal will likely lead to incorrect behavior. If this template is not being set by some other mechanism (e.g., from the tokenizer's config on the Hub), it should be restored, perhaps conditionally for A2DQwen3LMHeadModel.
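
One way to act on this concern, sketched under assumptions: apply a template only when the tokenizer does not already carry one, so a template loaded from the tokenizer's own config is never overwritten. `ensure_chat_template` and `FALLBACK_CHAT_TEMPLATE` are hypothetical names, the template string is a trivial placeholder rather than the one removed in this PR, and a `SimpleNamespace` stands in for a real tokenizer object.

```python
from types import SimpleNamespace

# Hypothetical placeholder; the template removed in this PR is not reproduced here.
FALLBACK_CHAT_TEMPLATE = "{% for m in messages %}{{ m['content'] }}{% endfor %}"


def ensure_chat_template(tokenizer, fallback=FALLBACK_CHAT_TEMPLATE):
    """Set a chat template only when the tokenizer does not already have one."""
    if not getattr(tokenizer, "chat_template", None):
        tokenizer.chat_template = fallback
    return tokenizer


# A tokenizer without a template receives the fallback.
bare = ensure_chat_template(SimpleNamespace(chat_template=None))
# A tokenizer that already has a template keeps it.
preset = ensure_chat_template(SimpleNamespace(chat_template="existing"))
```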

Comment on lines +325 to +330
# if done.any():
#     new_block = torch.where(
#         done.unsqueeze(1),
#         torch.full_like(new_block, pad_id),
#         new_block,
#     )


medium

This commented-out block is a valuable optimization. By filling new blocks with pad_id for sequences that are already done, you can avoid wasteful computation during the diffusion steps. The current implementation processes these completed sequences unnecessarily. I recommend re-enabling this logic to improve efficiency in batch generation.

Suggested change
- # if done.any():
- #     new_block = torch.where(
- #         done.unsqueeze(1),
- #         torch.full_like(new_block, pad_id),
- #         new_block,
- #     )
+ if done.any():
+     new_block = torch.where(
+         done.unsqueeze(1),
+         torch.full_like(new_block, pad_id),
+         new_block,
+     )
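
For readers unfamiliar with the broadcasting in this suggestion: `done` has one flag per sequence, and `unsqueeze(1)` reshapes it from `[batch]` to `[batch, 1]` so that each flag applies to a whole row of `new_block`. A list-based sketch of the same row-wise selection, with `pad_finished_rows` as a hypothetical name and nested lists standing in for tensors:

```python
def pad_finished_rows(new_block, done, pad_id):
    """Replace every row whose sequence is done with a row of pad_id."""
    return [
        [pad_id] * len(row) if is_done else list(row)
        for row, is_done in zip(new_block, done)
    ]


result = pad_finished_rows([[5, 6], [7, 8]], done=[True, False], pad_id=0)
```

Here the first row is replaced by padding because its sequence is finished, while the second row is left unchanged.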


# # Where you previously passed nll_sum / token_cnt, pass it like this instead:
# meter.update("train", value=(nll_sum / token_cnt.clamp_min(1)), weight=token_cnt)
# meter.update("eval", value=(nll_sum / token_cnt.clamp_min(1)), weight=token_cnt)
No newline at end of file


medium

This file is missing a final newline character. It's a common Python convention (recommended by PEP 8) to end files with a single newline. This improves consistency and can prevent issues with some tools and file concatenations.

class PerplexityMetric(NLLMetric):
    def compute(self) -> torch.Tensor:
        mean_nll = super().compute()
        return torch.exp(mean_nll)
No newline at end of file


medium

This file is missing a final newline character. It's a common Python convention (recommended by PEP 8) to end files with a single newline. This improves consistency and can prevent issues with some tools and file concatenations.

ZHZisZZ changed the title from "Dev refactor" to "Refactor mdlm, bd3lm, ppl tracking, and minor improvements" on Jan 1, 2026
@ZHZisZZ ZHZisZZ merged commit b3d0d57 into dev Jan 1, 2026
ZHZisZZ added a commit that referenced this pull request Jan 5, 2026
…and BD3LM sampler improvements (#70)

* Refactor mdlm, bd3lm, ppl tracking, and minor improvements (#68)

* fix bd3lm sampler

* andi temp save

* andi pt

* temp save

* Trainer refactor (#3)

* fix editflow scripts lacking main

* update (#66)

* update

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>

* Remove unused utils

* minor fix

* fix meters and others

* minor fix

* minor fix

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>