Closed as not planned
Labels
epic (larger tracking issue encompassing multiple smaller issues), jira
Description
The generate_data function currently bundles data generation with several other responsibilities, including taxonomy data ingestion, preprocessing, and mixing. This coupling makes the function hard to maintain and test. We propose refactoring it into a clean, dedicated Python API that handles only data generation. The separation will improve modularity and make further development easier.
Objectives
- Extract the generation logic from the existing implementation and encapsulate it within a new Python API.
- Ensure this API is compatible with both standalone use and integration into the CLI.
- Maintain the integrity of the existing codebase while simplifying the generation process.
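To make the objectives above concrete, here is a minimal sketch of what a generation-only entry point could look like. Every name in it (`generate`, `PipelineRunner`, the argument names) is hypothetical and not taken from the existing codebase. The sketch also assumes a JSONL dataset, and it takes pipeline execution as an injected callable because the real pipeline format is not specified here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Union

Sample = Dict[str, str]
# Hypothetical stand-in for real pipeline execution: it receives the pipeline
# path and the input samples, and returns the generated samples.
PipelineRunner = Callable[[Path, List[Sample]], List[Sample]]


def generate(
    dataset_path: Union[str, Path],
    output_path: Union[str, Path],
    pipeline_path: Union[str, Path],
    run_pipeline: PipelineRunner,
) -> int:
    """Run generation only: read samples, run the pipeline, write results.

    No taxonomy ingestion, preprocessing, or mixing happens here.
    Returns the number of generated samples written to output_path.
    """
    dataset_path = Path(dataset_path)
    output_path = Path(output_path)
    pipeline_path = Path(pipeline_path)

    # Fail early with a clear error instead of deep inside the pipeline.
    for path in (dataset_path, pipeline_path):
        if not path.is_file():
            raise FileNotFoundError(f"Required input not found: {path}")

    # Assumption: the dataset is JSONL, one sample per non-empty line.
    with dataset_path.open(encoding="utf-8") as f:
        samples = [json.loads(line) for line in f if line.strip()]

    generated = run_pipeline(pipeline_path, samples)

    # Create the output directory on demand, then write JSONL.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as f:
        for row in generated:
            f.write(json.dumps(row) + "\n")

    return len(generated)
```

Because the function takes plain paths and has no CLI dependencies, it can be called from a standalone script or wrapped by a command-line front end without modification.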
Acceptance Criteria
- Define the New API
  - Develop a Python API that focuses solely on the data generation process.
  - Include additional parameters such as dataset path, output save path, and pipeline path.
  - Utilize the API within a CLI context to ensure seamless integration.
- Independent SDG CLI
  - Use click for CLI development, providing options to configure the generation process directly from the command line.
  - Ensure that the existing ilab CLI uses this new API, passing all necessary parameters through command-line options.
- Testing and Debugging
  - Write comprehensive unit tests for the new API to ensure it works as expected under various configurations.
- Documentation and Examples
  - Because the new SDG CLI will require users to pass their own dataset and pipeline, update the project documentation with detailed instructions on how to use the new API and CLI.
  - Provide example commands and configurations to help users get started with the new setup.
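As a rough illustration of the CLI criteria above, the following sketch wraps a generation function in a click command. The option names (`--dataset-path`, `--output-path`, `--pipeline-path`), the command name `main`, and the placeholder `generate` function are all assumptions made for this example; the actual CLI surface would be decided during implementation.

```python
import click


def generate(dataset_path: str, output_path: str, pipeline_path: str) -> str:
    """Hypothetical placeholder for the generation API; it only echoes its inputs."""
    return f"dataset={dataset_path} pipeline={pipeline_path} output={output_path}"


@click.command()
@click.option(
    "--dataset-path",
    required=True,
    type=click.Path(exists=True, dir_okay=False),
    help="Path to the input dataset file.",
)
@click.option(
    "--output-path",
    required=True,
    type=click.Path(dir_okay=False),
    help="Where to save the generated data.",
)
@click.option(
    "--pipeline-path",
    required=True,
    type=click.Path(exists=True, dir_okay=False),
    help="Path to the pipeline definition file.",
)
def main(dataset_path: str, output_path: str, pipeline_path: str) -> None:
    """Generate data from a user-supplied dataset and pipeline."""
    # The command only parses options and forwards them to the API,
    # so the same function stays usable without the CLI.
    summary = generate(dataset_path, output_path, pipeline_path)
    click.echo(summary)
```

Keeping the command this thin means most unit tests can target the API directly, while a handful of tests using `click.testing.CliRunner` cover option parsing. To run the sketch as a script, call `main()` under the usual `if __name__ == "__main__":` guard.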