[Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216

Closed
3 tasks done
narrowizard opened this issue Nov 25, 2024 · 2 comments · Fixed by #8218
Labels
improvement type/feature-request This issue is a proposal for something new

Comments

@narrowizard
Collaborator

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Use case

As a DevLake user leveraging the Customize plugin to upload issues and issue_repo_commits data for further analysis, I need the ability to perform incremental CSV uploads. This would allow me to append new data to existing records without overwriting or replacing the entire dataset.

Description

Currently, the Customize plugin in DevLake only supports full data uploads, which replace all existing data with the new data from the uploaded CSV file. While this functionality works for initial data loads, it poses significant challenges as the dataset grows over time:

  1. Data Integrity Risks: Full uploads may inadvertently overwrite or lose historical data, compromising the dataset's accuracy and completeness.
  2. File Maintenance Overhead: CSV files become increasingly large as time progresses, making them cumbersome to maintain and manage.

To address these challenges, I propose adding incremental upload support to the Customize plugin. This feature would enable users to append new records from CSV files to the existing dataset without requiring a complete overwrite.

Benefits:

  • Enhanced Data Integrity: Ensures existing data remains untouched while appending new entries.
  • Improved Scalability: Reduces the need to maintain and manage increasingly large CSV files.
  • Better User Experience: Simplifies data upload workflows for users.

I envision this feature functioning as follows:

  1. Users upload a new CSV file containing only new data entries.
  2. The Customize plugin compares the uploaded data with existing records.
  3. New records are appended to the domain layer, while existing records remain unchanged.

This functionality would greatly improve the usability of the Customize plugin and make it more suitable for long-term data collection and analysis workflows.
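
To make the three steps above concrete, here is a minimal Python sketch of an append-only merge. It is only an illustration, not the plugin's code: the function name `append_new_records`, the in-memory dictionary standing in for the domain layer, and the `id` key column are all assumptions made for the example.

```python
import csv
import io


def append_new_records(existing, csv_text, key="id"):
    """Append rows whose key is not yet present; leave existing rows untouched.

    `existing` maps a record key to a record dict (a stand-in for the
    domain-layer table). Hypothetical helper, for illustration only.
    """
    merged = dict(existing)  # copy so the caller's data is not mutated
    appended = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] not in merged:  # step 2: compare with existing records
            merged[row[key]] = row  # step 3: append only new records
            appended.append(row[key])
    return merged, appended


existing = {"1": {"id": "1", "title": "Login fails"}}
upload = "id,title\n1,Login fails (edited)\n2,Add dark mode\n"
merged, appended = append_new_records(existing, upload)
print(appended)
print(merged["1"]["title"])
```

In this sketch a row whose key already exists is ignored, which matches "existing records remain unchanged" as described here. An upsert-style variant would overwrite such a row instead.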

Let me know if additional details or clarifications are needed!

Related issues

No

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@narrowizard narrowizard added the type/feature-request This issue is a proposal for something new label Nov 25, 2024
@dosubot dosubot bot added the improvement label Nov 25, 2024
@narrowizard
Collaborator Author

narrowizard commented Nov 25, 2024

Plan

The implementation involves two APIs for uploading issues.csv and issue_repo_commits.csv data, respectively.

For issues.csv:

  • Parameters: boardId, boardName, file
  • Existing Logic:
    1. Update data in the boards table based on boardId.
    2. Clear data in the issues and issue_labels tables (clearing logic: remove records associated only with the given boardId).
    3. Clear data in the board_issues table.
    4. Create or update records in the issue_labels table based on the uploaded file (primary key: issue_id, label_name).
    5. Create or update records in the board_issues table based on the uploaded file (primary key: board_id, issue_id).
    6. Create new records in the issues table based on the uploaded file.
  • Incremental Update Logic:
    1. Add a new parameter: incremental (default: false).
    2. When incremental = true:
       • Skip the data-clearing steps (steps 2 and 3) from the existing logic.
       • Modify the issues table operation to CreateOrUpdate (primary key: id).
       • Retain all other existing logic.
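
As a rough illustration of the issues.csv flow above, the following Python sketch models the tables as in-memory dictionaries. Everything here is an assumption for the sake of the example (the function name `upload_issues`, the row shape, and the dictionary layout); it is not the plugin's implementation, and the per-board clearing rule is one possible reading of "associated only with the given boardId".

```python
def upload_issues(db, board_id, board_name, rows, incremental=False):
    """Hypothetical model of the issues.csv upload; tables are plain dicts."""
    # Step 1: update the boards table based on boardId.
    db["boards"][board_id] = {"id": board_id, "name": board_name}

    if not incremental:
        # Steps 2-3: clear issues, issue_labels and board_issues for this board.
        linked = {i for (b, i) in db["board_issues"] if b == board_id}
        shared = {i for (b, i) in db["board_issues"] if b != board_id}
        for issue_id in linked - shared:  # only issues tied to this board alone
            db["issues"].pop(issue_id, None)
            for k in [k for k in db["issue_labels"] if k[0] == issue_id]:
                del db["issue_labels"][k]
        for k in [k for k in db["board_issues"] if k[0] == board_id]:
            del db["board_issues"][k]

    for row in rows:
        # Step 4: create or update issue_labels (key: issue_id, label_name).
        for label in row.get("labels", []):
            db["issue_labels"][(row["id"], label)] = True
        # Step 5: create or update board_issues (key: board_id, issue_id).
        db["board_issues"][(board_id, row["id"])] = True
        # Step 6: dict assignment behaves like CreateOrUpdate keyed on id.
        db["issues"][row["id"]] = {"id": row["id"], "title": row["title"]}


db = {"boards": {}, "issues": {}, "issue_labels": {}, "board_issues": {}}
upload_issues(db, "b1", "Board 1", [{"id": "1", "title": "A", "labels": ["bug"]}])
upload_issues(db, "b1", "Board 1", [{"id": "2", "title": "B"}], incremental=True)
after_incremental = sorted(db["issues"])  # both issues are kept

upload_issues(db, "b1", "Board 1", [{"id": "3", "title": "C"}])
after_full = sorted(db["issues"])  # a full upload replaces the board's issues
```

The only difference between the two modes in this sketch is whether the clearing block runs, which mirrors the plan: with incremental = true the earlier rows survive, and writes to the issues table become upserts.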

For issue_repo_commits.csv:

  • Parameters: boardId, file
  • Existing Logic:
    1. Clear data in the issue_repo_commits and issue_commits tables (clearing logic: remove records associated only with the given boardId).
    2. Create new records in the issue_repo_commits and issue_commits tables based on the uploaded file.
  • Incremental Update Logic:
    1. Add a new parameter: incremental (default: false).
    2. When incremental = true:
       • Skip the data-clearing step (step 1) from the existing logic.
       • Modify the issue_repo_commits table operation to CreateOrUpdate (primary key: issue_id, repo_url, commit_sha).
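
The CreateOrUpdate behaviour on a composite primary key can be sketched with Python's built-in sqlite3 module. This is a simplified stand-in, not DevLake's schema or code: the `board_id` column is an assumption added so that per-board clearing can be shown in one table, the function name `upload_commits` is hypothetical, and only issue_repo_commits is modeled (issue_commits is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE issue_repo_commits (
           issue_id   TEXT,
           repo_url   TEXT,
           commit_sha TEXT,
           board_id   TEXT,  -- assumed column, for illustration only
           PRIMARY KEY (issue_id, repo_url, commit_sha)
       )"""
)


def upload_commits(conn, board_id, rows, incremental=False):
    """Hypothetical model of the issue_repo_commits.csv upload."""
    with conn:  # run the whole upload in one transaction
        if not incremental:
            # Existing logic, step 1: clear this board's records first.
            conn.execute(
                "DELETE FROM issue_repo_commits WHERE board_id = ?", (board_id,)
            )
        # INSERT OR REPLACE acts as an upsert on the composite primary key.
        conn.executemany(
            "INSERT OR REPLACE INTO issue_repo_commits "
            "(issue_id, repo_url, commit_sha, board_id) VALUES (?, ?, ?, ?)",
            [(issue, repo, sha, board_id) for (issue, repo, sha) in rows],
        )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM issue_repo_commits").fetchone()[0]


repo = "https://example.com/repo.git"
upload_commits(conn, "b1", [("1", repo, "aaa")])
upload_commits(conn, "b1", [("1", repo, "aaa"), ("2", repo, "bbb")], incremental=True)
count_after_incremental = count_rows(conn)  # the repeated row is not duplicated

upload_commits(conn, "b1", [("3", repo, "ccc")])
count_after_full = count_rows(conn)  # a full upload clears the board first
```

Re-uploading a row that is already present does not fail or duplicate it in incremental mode, because the write is keyed on (issue_id, repo_url, commit_sha).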

This plan ensures support for incremental updates while retaining the full upload functionality when needed.

@dosubot Pls review the plan above.

@narrowizard narrowizard self-assigned this Nov 25, 2024
@klesh
Contributor

klesh commented Nov 25, 2024

Excellent, it is very descriptive. Sounds great to me.
Looking forward to your PR.

narrowizard added a commit that referenced this issue Nov 25, 2024
…es.csv

[Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
narrowizard added a commit that referenced this issue Nov 25, 2024
…_repo_commits.csv

    [Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
narrowizard added a commit that referenced this issue Nov 25, 2024
…_repo_commits.csv

    [Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
narrowizard added a commit that referenced this issue Nov 26, 2024
…_repo_commits.csv

    [Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
narrowizard added a commit that referenced this issue Nov 26, 2024
…ustomize plugin (#8218)

* feat: support incremental import for 

- /plugins/customize/csvfiles/issues.csv
- /plugins/customize/csvfiles/issue_repo_commits.csv

[Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
github-actions bot pushed a commit that referenced this issue Nov 26, 2024
…ustomize plugin (#8218)

* feat: support incremental import for 

- /plugins/customize/csvfiles/issues.csv
- /plugins/customize/csvfiles/issue_repo_commits.csv

[Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216
narrowizard added a commit that referenced this issue Nov 26, 2024
…ustomize plugin (#8218) (#8220)

* feat: support incremental import for 

- /plugins/customize/csvfiles/issues.csv
- /plugins/customize/csvfiles/issue_repo_commits.csv

[Feature][Customize] Add Support for Incremental CSV Upload in the Customize Plugin #8216

Co-authored-by: NaRro
2 participants