feature: Add Python-native DataPipe interface for fluent data preprocessing #115

mathysgrapotte · 2025-02-19T14:50:23Z

Is your feature request related to a problem? Please describe.

Python users currently need to rely on YAML/config files for defining data preprocessing pipelines, which can be cumbersome for interactive experimentation and native Python workflows.

Describe the solution you'd like

A fluent Python interface (DataPipe) that allows data processing operations (split, transform, encode) to be chained while remaining compatible with existing config-based workflows. Pipeline construction should follow a clear pattern, similar to Nextflow processes but in native Python.
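A minimal sketch of the fluent pattern, assuming rows are plain Python dicts rather than a DataFrame and that each operation takes a callable or a few parameters instead of dedicated splitter/transform/encoder classes. `DataPipe` here is hypothetical and is not stimulus's API. Each method queues a step and returns `self`, and nothing runs until `build()`, so the chain can be inspected before any data is touched.

```python
from __future__ import annotations

import random
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


class DataPipe:
    """Hypothetical fluent builder: each method queues a step and returns self."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows
        self._steps: list[Step] = []

    def split(self, ratios: list[float], seed: int = 0) -> DataPipe:
        # Shuffle row positions, then cut them into consecutive chunks sized by
        # `ratios`; each row gets a "split" index (0, 1, ...).
        def step(rows: list[Row]) -> list[Row]:
            order = list(range(len(rows)))
            random.Random(seed).shuffle(order)
            total, acc, bounds = sum(ratios), 0.0, []
            for r in ratios:
                acc += r
                bounds.append(round(acc / total * len(rows)))
            bounds[-1] = len(rows)  # guard against float rounding
            split_of: dict[int, int] = {}
            start = 0
            for idx, end in enumerate(bounds):
                for pos in order[start:end]:
                    split_of[pos] = idx
                start = end
            return [{**row, "split": split_of[i]} for i, row in enumerate(rows)]

        self._steps.append(step)
        return self

    def transform(self, fn: Callable[[Any], Any], columns: list[str]) -> DataPipe:
        # Apply `fn` element-wise to the listed columns (new dicts, no mutation).
        def step(rows: list[Row]) -> list[Row]:
            return [{**r, **{c: fn(r[c]) for c in columns}} for r in rows]

        self._steps.append(step)
        return self

    def encode(self, column: str) -> DataPipe:
        # Map each distinct label in `column` to an integer, in first-seen order.
        def step(rows: list[Row]) -> list[Row]:
            mapping: dict[Any, int] = {}
            for r in rows:
                mapping.setdefault(r[column], len(mapping))
            return [{**r, column: mapping[r[column]]} for r in rows]

        self._steps.append(step)
        return self

    def build(self) -> list[Row]:
        # Run the queued steps in the order they were chained.
        rows = self._rows
        for step in self._steps:
            rows = step(rows)
        return rows
```

Usage mirrors the chained style: `DataPipe(rows).split([0.8, 0.2]).transform(lambda x: x * 2, columns=["ecg"]).encode("diagnosis").build()`. Deferring execution to `build()` is a design choice here, not a requirement stated in the issue.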


mathysgrapotte · 2025-02-19T14:51:00Z

Very vague suggestion:

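```python
# Proposed interface only; names and signatures are not final.
pipe = (DataPipe(raw_df, loader)
        .split(RandomSplitter, ratios=[0.8, 0.2])        # split rows 80/20
        .transform(AddNoise, columns=['ecg'], std=0.1)   # add noise (std 0.1) to 'ecg'
        .encode(LabelEncoder, column='diagnosis')        # encode the 'diagnosis' labels
        .build())

# Hand the processed tensors to the torch handler to get a dataset.
dataset = HandlerTorch(pipe.get_tensors()).to_dataset()
```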

But we need to revisit this once the refactoring is done.
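One way the "compatible with existing config-based workflows" requirement could be met is to have the fluent calls record a declarative spec instead of executing anything, so the same chain can be dumped to, or loaded from, a config file. The sketch below assumes a hypothetical `PipeSpec` class with `to_config()`/`from_config()` methods and stores step names such as `"RandomSplitter"` as plain strings; resolving those strings to real classes is left out and is an assumption, not stimulus's behavior.

```python
from __future__ import annotations

import json
from typing import Any

ALLOWED_OPS = {"split", "transform", "encode"}


class PipeSpec:
    """Hypothetical declarative counterpart of a fluent pipeline.

    Each call records an {"op", "params"} entry, so the pipeline is plain data
    that round-trips through JSON (or YAML).
    """

    def __init__(self) -> None:
        self.steps: list[dict[str, Any]] = []

    def _add(self, op: str, **params: Any) -> PipeSpec:
        self.steps.append({"op": op, "params": params})
        return self

    def split(self, name: str, **params: Any) -> PipeSpec:
        return self._add("split", name=name, **params)

    def transform(self, name: str, **params: Any) -> PipeSpec:
        return self._add("transform", name=name, **params)

    def encode(self, name: str, **params: Any) -> PipeSpec:
        return self._add("encode", name=name, **params)

    def to_config(self) -> dict[str, Any]:
        # Only builtins inside, so json.dumps (or a YAML dumper) can write it.
        return {"steps": self.steps}

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> PipeSpec:
        # Replay each recorded step through the same fluent methods.
        spec = cls()
        for step in config["steps"]:
            if step["op"] not in ALLOWED_OPS:
                raise ValueError(f"unknown op: {step['op']!r}")
            getattr(spec, step["op"])(**step["params"])
        return spec


spec = (PipeSpec()
        .split("RandomSplitter", ratios=[0.8, 0.2])
        .transform("AddNoise", columns=["ecg"], std=0.1)
        .encode("LabelEncoder", column="diagnosis"))

text = json.dumps(spec.to_config())
restored = PipeSpec.from_config(json.loads(text))
```

Because both directions go through the same methods, a pipeline written in Python and one loaded from a config file end up as the same list of steps.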

mathysgrapotte added this to the stimulus v1.0.0 release milestone Feb 19, 2025

mathysgrapotte added this to Stimulus v1.0 Feb 19, 2025

mathysgrapotte moved this to Todo - long issues in Stimulus v1.0 Feb 19, 2025

mathysgrapotte moved this from Todo - long issues to Todo - depend on other issues in Stimulus v1.0 Feb 19, 2025

mathysgrapotte moved this from Todo - depend on other issues to Todo in Stimulus v1.0 Feb 21, 2025