Develop LeRobotDataset tools #2326

@michel-aractingi

🧰 Call for Contributions: Expanding Dataset Tools in LeRobotDataset

Currently, the dataset tools provided in src/lerobot/datasets/dataset_tools.py allow users to:

  • Delete episodes from a dataset
  • Split a dataset into multiple sub-datasets
  • Merge multiple datasets into one
  • Modify dataset features by removing existing features and/or adding new ones

You can try these tools by running lerobot-edit-dataset with the appropriate configuration.

While these tools fill an important gap by enabling manipulation of existing datasets, there’s still a lot of functionality missing.
This issue is a call to the community to help develop and expand the dataset tools for LeRobotDataset.


🚧 Some Missing Tools / Future Ideas

🗒️ 1. Modify language instructions at the episode level

Once a task description is written in the dataset, it’s currently not possible to edit or extend it (e.g., to fix typos or add a more detailed description).
We could add tools to update or rewrite the natural language instructions associated with specific episodes.
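
As a rough illustration of what such a tool might do, here is a minimal sketch that rewrites the instruction of selected episodes. It assumes a simplified in-memory layout (a list of task strings plus an episode-to-task-index mapping); this layout and the name `update_episode_tasks` are hypothetical and do not reflect the actual LeRobotDataset metadata format or API.

```python
def update_episode_tasks(tasks, episode_to_task, updates):
    """Return a new (tasks, episode_to_task) pair with some episodes rewritten.

    tasks:            list of instruction strings, indexed by task index (hypothetical layout)
    episode_to_task:  dict mapping episode index -> task index
    updates:          dict mapping episode index -> new instruction text
    """
    # Resolve every episode to its current instruction text.
    texts = {ep: tasks[idx] for ep, idx in episode_to_task.items()}

    # Apply the requested edits, refusing unknown episodes.
    for ep, new_text in updates.items():
        if ep not in texts:
            raise KeyError(f"unknown episode index: {ep}")
        texts[ep] = new_text

    # Rebuild the task list so each instruction still in use appears exactly once.
    new_tasks = []
    new_mapping = {}
    for ep in sorted(texts):
        text = texts[ep]
        if text not in new_tasks:
            new_tasks.append(text)
        new_mapping[ep] = new_tasks.index(text)
    return new_tasks, new_mapping


# Example: fix a typo shared by episodes 0 and 1.
tasks = ["pick up teh cube", "open the drawer"]
episode_to_task = {0: 0, 1: 0, 2: 1}
fixed_tasks, fixed_mapping = update_episode_tasks(
    tasks, episode_to_task, {0: "pick up the cube", 1: "pick up the cube"}
)
```

Rebuilding the task list, rather than editing strings in place, keeps episodes that share a task index from being changed unintentionally and drops instructions that no episode uses anymore.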


➕ 2. Add support for add_feature in lerobot_edit_dataset.py

Check the EditDatasetConfig class in the editing script. The add_feature option is not supported yet.
This would make it possible to easily add new computed or external features without manually modifying parquet files.
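
To make the idea concrete, the following sketch adds a computed per-frame feature to a dataset represented as plain column lists. The column-dictionary layout and the function name `add_feature_column` are assumptions made for illustration; they are not the `add_feature` option of `EditDatasetConfig` or any existing LeRobot function.

```python
def add_feature_column(columns, name, compute, *, overwrite=False):
    """Return a copy of `columns` with one extra per-frame feature.

    columns:  dict mapping feature name -> list of per-frame values (hypothetical layout)
    name:     key of the feature to add
    compute:  callable taking one frame (dict of feature -> value) and returning the new value
    """
    if not columns:
        raise ValueError("dataset has no columns")

    # All existing columns must describe the same number of frames.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("columns have inconsistent lengths")

    if name in columns and not overwrite:
        raise ValueError(f"feature {name!r} already exists")

    num_frames = lengths.pop()
    new_values = []
    for i in range(num_frames):
        frame = {key: values[i] for key, values in columns.items()}
        new_values.append(compute(frame))

    # Leave the input untouched and return an extended copy.
    result = dict(columns)
    result[name] = new_values
    return result


# Example: derive an absolute-action feature from an existing column.
columns = {"timestamp": [0.0, 0.1, 0.2], "action": [1.0, -2.0, 0.5]}
extended = add_feature_column(columns, "action_abs", lambda frame: abs(frame["action"]))
```

A real implementation would also need to record the new feature's dtype and shape in the dataset metadata, which this sketch leaves out.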


🔄 3. Merge datasets with different feature keys

Currently, the merge tool requires all datasets to share the exact same feature keys.
This limits merging datasets that are semantically similar but differ in naming conventions.

Possible improvements:

  • Allow merging datasets with semantically matching features using different keys
    (e.g., dataset1: "observation.image.camera1" vs dataset2: "observation.image.top")
  • Introduce a feature mapping dictionary to align features across datasets
  • Implement a rename_features tool to rename features within a dataset. This would be useful both for merging and as a standalone editing tool.
  • (Advanced) Support merging datasets with different feature sets by creating a union of all features and padding missing ones appropriately.
    This would be more complex and could be considered a lower priority.
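
The ideas in this list can be sketched together: a per-dataset mapping dictionary renames features to a shared key, and the merge then takes the union of all keys and pads whatever is missing. Frames are modeled here as plain dictionaries, and both `rename_features` and `merge_with_union` are hypothetical helpers written for illustration, not existing functions in dataset_tools.py.

```python
def rename_features(frames, mapping):
    """Rename feature keys in every frame according to `mapping` (old key -> new key)."""
    renamed = []
    for frame in frames:
        new_frame = {}
        for key, value in frame.items():
            new_key = mapping.get(key, key)
            if new_key in new_frame:
                raise ValueError(f"renaming {key!r} collides with existing key {new_key!r}")
            new_frame[new_key] = value
        renamed.append(new_frame)
    return renamed


def merge_with_union(datasets, mappings=None, pad_value=None):
    """Merge datasets after aligning keys; missing features are filled with `pad_value`."""
    if mappings is None:
        mappings = [{} for _ in datasets]
    if len(mappings) != len(datasets):
        raise ValueError("need exactly one mapping per dataset")

    aligned = [rename_features(ds, m) for ds, m in zip(datasets, mappings)]

    # Collect the union of feature keys, preserving first-seen order.
    all_keys = []
    for ds in aligned:
        for frame in ds:
            for key in frame:
                if key not in all_keys:
                    all_keys.append(key)

    # Emit every frame with the full key set, padding absent features.
    return [
        {key: frame.get(key, pad_value) for key in all_keys}
        for ds in aligned
        for frame in ds
    ]


# Example: align "camera1" with "top", then pad the feature only dataset2 has.
dataset1 = [{"observation.image.camera1": "img_a", "action": 1}]
dataset2 = [{"observation.image.top": "img_b", "action": 2, "gripper": 0.5}]
merged = merge_with_union(
    [dataset1, dataset2],
    mappings=[{"observation.image.camera1": "observation.image.top"}, {}],
)
```

Padding with a single placeholder value is the simplest choice; real tensors would likely need a shape-aware pad (for example zeros) plus a mask feature marking which entries are padded.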

💡 Open for Ideas

If you’d like to propose a new tool or enhancement, please comment below or open a PR. Contributions are highly encouraged!

Your help will make LeRobotDataset a more flexible and powerful framework for robotic data management 🚀

Original PR: #2100

Metadata

Labels

dataset (Issues regarding data inputs, processing, or datasets), enhancement (Suggestions for new features or improvements)
