
ModularML Roadmap

v1.0.0 — Core ModularML Pipeline

Progress

Target Release: Q1 2026


Data Structures & Serialization

  • Refactor FeatureSet, FeatureSubset, Batch, and related structures to use PyArrow tables
  • Implement zero-copy subset & sampler views over parent FeatureSet tables
  • Ensure data loads into memory only when needed for ModelGraph execution
  • Make all components fully serializable (FeatureSets, ModelGraphs, Stages, Samplers, Losses, Phases)
  • Support exporting Experiments as:
    • Full state (post-training, weights included)
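The zero-copy subset idea above can be sketched with a lazy view that stores only row indices into its parent and materializes on demand. This is a hypothetical illustration (the real pipeline would back the parent with a `pyarrow.Table`, where `Table.slice` is already zero-copy); a plain dict-of-columns stands in so the example is self-contained, and the class name is an assumption.

```python
# Hypothetical sketch of a zero-copy subset view over a parent table.
# A dict-of-columns stands in for a pyarrow.Table here.

class FeatureSubsetView:
    """Holds row indices into a parent table; copies rows only on demand."""

    def __init__(self, parent: dict, indices: list):
        self._parent = parent      # no data copied at construction
        self._indices = indices

    def __len__(self):
        return len(self._indices)

    def materialize(self) -> dict:
        """Copy selected rows out only when ModelGraph execution needs them."""
        return {col: [vals[i] for i in self._indices]
                for col, vals in self._parent.items()}


table = {"x": [1.0, 2.0, 3.0, 4.0], "y": [0, 1, 0, 1]}
view = FeatureSubsetView(table, indices=[1, 3])
print(len(view))            # 2 -- no rows copied yet
print(view.materialize())   # {'x': [2.0, 4.0], 'y': [1, 1]}
```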

Experiment Context & Tracking

  • Implement automatic Experiment context binding for all defined components
  • Add conflict detection for mismatched component/Experiment associations
  • Store all outputs (loss curves, metrics, results, figures) linked to their source phase
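One way to realize automatic context binding with conflict detection is a context variable that components read at definition time; a minimal sketch, assuming hypothetical `Experiment` and `Component` names rather than the shipped API:

```python
# Hypothetical sketch: components defined inside a `with Experiment(...)`
# block bind to it automatically; rebinding to another Experiment raises.
import contextvars

_current = contextvars.ContextVar("current_experiment", default=None)

class Experiment:
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        self._token = _current.set(self)
        return self
    def __exit__(self, *exc):
        _current.reset(self._token)

class Component:
    def __init__(self):
        self.experiment = _current.get()   # auto-bind at definition time
    def attach_to(self, experiment):
        # conflict detection: refuse to rebind to a different Experiment
        if self.experiment is not None and self.experiment is not experiment:
            raise ValueError(
                f"component already bound to {self.experiment.name!r}")
        self.experiment = experiment

with Experiment("exp-A"):
    sampler = Component()

print(sampler.experiment.name)   # exp-A
```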

FeatureSet / Splitting / Sampling

FeatureSet

  • Fully structured feature–target–tag schema
  • Per-column scaling/normalization with tracked transform pipelines
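Tracked transform pipelines could record each fitted step alongside its parameters so a column's scaling is replayable at inference time. A minimal sketch with hypothetical names:

```python
# Illustrative per-column pipeline that logs each applied transform
# (name + fitted parameters) so it can be replayed or inverted later.

class ColumnPipeline:
    def __init__(self):
        self.steps = []   # tracked (name, params) history

    def fit_minmax(self, values):
        lo, hi = min(values), max(values)
        self.steps.append(("minmax", {"lo": lo, "hi": hi}))
        return [(v - lo) / (hi - lo) for v in values]

pipe = ColumnPipeline()
out = pipe.fit_minmax([10.0, 20.0, 30.0])
print(out)          # [0.0, 0.5, 1.0]
print(pipe.steps)   # [('minmax', {'lo': 10.0, 'hi': 30.0})]
```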

Splitting

  • Ratio-based random splits
  • Rule-based conditional splits (user-defined criteria)
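The two planned split styles can be sketched side by side; the function names are assumptions for illustration:

```python
# Sketch of ratio-based random splits and rule-based conditional splits.
import random

def ratio_split(indices, ratios=(0.8, 0.2), seed=0):
    """Shuffle indices, then cut into len(ratios) groups by proportion."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    out, start = [], 0
    for r in ratios[:-1]:
        n = round(r * len(shuffled))
        out.append(shuffled[start:start + n])
        start += n
    out.append(shuffled[start:])
    return out

def rule_split(rows, predicate):
    """Partition rows by a user-defined criterion."""
    hit = [r for r in rows if predicate(r)]
    miss = [r for r in rows if not predicate(r)]
    return hit, miss

train, test = ratio_split(list(range(10)))
print(len(train), len(test))   # 8 2
old, new = rule_split([{"year": 2019}, {"year": 2024}],
                      lambda r: r["year"] < 2020)
```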

Sampling

  • Sample-wise batching
  • N-Sampler-based paired sampling
  • N-Sampler-based triplet sampling
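The triplet case reduces to drawing an anchor, a same-label positive, and a different-label negative; a stdlib sketch (the real N-Sampler machinery would generalize this, and the function name is hypothetical):

```python
# Sketch of label-driven triplet sampling: (anchor, positive, negative).
import random

def sample_triplet(labels, rng):
    """Return one (anchor, positive, negative) index triple."""
    anchor = rng.randrange(len(labels))
    positives = [i for i, y in enumerate(labels)
                 if y == labels[anchor] and i != anchor]
    negatives = [i for i, y in enumerate(labels) if y != labels[anchor]]
    return anchor, rng.choice(positives), rng.choice(negatives)

labels = [0, 0, 1, 1, 0, 1]
a, p, n = sample_triplet(labels, random.Random(42))
assert labels[a] == labels[p] and labels[a] != labels[n]
```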

ModelGraph

  • Support sequential, branching, and merging DAGs
  • Validate graph connectivity before training
  • Add graph visualization utility (Graphviz/Dot/Mermaid)
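Pre-training connectivity validation amounts to rejecting cycles and producing an execution order, which Kahn's topological sort gives directly. A sketch assuming a hypothetical edge-list representation of the ModelGraph:

```python
# Sketch of graph validation before training: Kahn's algorithm rejects
# cycles and yields a topological execution order for the DAG.
from collections import deque

def validate_dag(nodes, edges):
    """edges: list of (src, dst). Raises on a cycle; returns a topo order."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
        indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("ModelGraph contains a cycle")
    return order

# branching + merging DAG: one input feeds two branches that merge
order = validate_dag(["in", "a", "b", "merge"],
                     [("in", "a"), ("in", "b"),
                      ("a", "merge"), ("b", "merge")])
print(order[0], order[-1])   # in merge
```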

ModelNode

  • Unified wrappers for PyTorch, TensorFlow, and scikit-learn
  • Built-in PyTorch models (Sequential MLP, CNN encoder)
  • Backend-agnostic forward, training-step, and eval-step APIs
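The backend-agnostic API can be expressed as an abstract interface that each wrapper implements; a sketch with a toy stand-in subclass (concrete subclasses would wrap PyTorch, TensorFlow, or scikit-learn models, and the names here are assumptions):

```python
# Sketch of a unified ModelNode interface with forward / training-step /
# eval-step methods; ScaleNode is a toy stand-in for a wrapped backend.
from abc import ABC, abstractmethod

class ModelNode(ABC):
    @abstractmethod
    def forward(self, batch): ...
    @abstractmethod
    def training_step(self, batch, targets): ...
    @abstractmethod
    def eval_step(self, batch, targets): ...

class ScaleNode(ModelNode):
    def __init__(self, k):
        self.k = k
    def forward(self, batch):
        return [self.k * x for x in batch]
    def training_step(self, batch, targets):
        preds = self.forward(batch)
        return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(batch)
    def eval_step(self, batch, targets):
        return {"mse": self.training_step(batch, targets)}

node = ScaleNode(2.0)
print(node.forward([1.0, 2.0]))   # [2.0, 4.0]
```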

MergeNodes

  • Support merging of multiple ModelGraph branches
  • ConcatNode for concatenating features of multiple inputs
    • Add non-concat aggregation strategies for targets and tags
    • Support padding of data with misaligned shapes
  • Make merging backend-aware to prevent PyTorch autograd breakage
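Concatenation with padding for misaligned shapes can be sketched on plain lists; everything here is illustrative (a real ConcatNode would operate on backend tensors to keep autograd intact):

```python
# Sketch of a ConcatNode: pad ragged rows within each branch to that
# branch's max width, then concatenate features row-wise across branches.

def pad_to(rows, width, fill=0.0):
    return [row + [fill] * (width - len(row)) for row in rows]

def concat_branches(*branches):
    """Concatenate per-row features from several branches, padding ragged rows."""
    padded = []
    for rows in branches:
        width = max(len(r) for r in rows)
        padded.append(pad_to(rows, width))
    return [sum(parts, []) for parts in zip(*padded)]

left = [[1.0, 2.0], [3.0]]           # ragged branch: second row is short
right = [[9.0], [8.0]]
print(concat_branches(left, right))  # [[1.0, 2.0, 9.0], [3.0, 0.0, 8.0]]
```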

Experiment / TrainingPhase / EvaluationPhase

  • Experiment holds static FeatureSets, splits, and ModelGraph
  • Support multiple independent Training and Evaluation phases
  • Each phase configurable with samplers, losses, optimizers, and trackers
  • Store and version phase results in the Experiment instance

Unit Testing

  • Add nox-based automated unit, integration, example, and doc test routines
  • Increase code coverage to ≥ 90%
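A nox-based layout for those routines might look like the following `noxfile.py`; the session names, paths, and coverage flags are assumptions for illustration, not the project's actual configuration:

```python
# noxfile.py -- illustrative session layout (hypothetical paths/flags).
import nox

@nox.session
def unit(session):
    session.install("pytest", "pytest-cov")
    session.run("pytest", "tests/unit", "--cov", "--cov-fail-under=90")

@nox.session
def integration(session):
    session.install("pytest")
    session.run("pytest", "tests/integration")

@nox.session
def doctests(session):
    session.install("pytest")
    session.run("pytest", "--doctest-modules", "src/")
```

Each session runs `nox -s <name>` in its own virtualenv, which keeps unit, integration, and doc-test dependencies isolated.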

v1.1.0 — Multi-Experiment Container & Comparison

Progress

Target Release: Q3 2026

  • Multi-input/output Samplers

    • Samplers can accept multiple FeatureSets
      • Must support sample alignment (separate from BatchSchedulingPolicy)
    • Samplers can produce multiple output streams
  • Add higher-level ExperimentCollection container

  • Support grouping Experiments for shared evaluation pipelines

  • Provide unified comparison utilities across Experiments (metrics, plots, tables)

  • Enable rapid testing of alternative ModelGraphs, architectures, or FeatureSets within the same task
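The comparison utilities could be anchored on the planned ExperimentCollection container; a minimal sketch (the container name matches the roadmap, everything else is assumed):

```python
# Sketch of an ExperimentCollection that groups experiments and ranks
# them on a shared metric for side-by-side comparison.

class ExperimentCollection:
    def __init__(self):
        self.experiments = {}   # name -> {metric: value}

    def add(self, name, metrics):
        self.experiments[name] = metrics

    def compare(self, metric):
        """Return (name, value) pairs sorted best-first (lower is better)."""
        rows = [(n, m[metric]) for n, m in self.experiments.items()
                if metric in m]
        return sorted(rows, key=lambda r: r[1])

coll = ExperimentCollection()
coll.add("mlp", {"val_loss": 0.31})
coll.add("cnn", {"val_loss": 0.24})
print(coll.compare("val_loss"))   # [('cnn', 0.24), ('mlp', 0.31)]
```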