Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling (Fixes #165) #169

Open: wants to merge 1 commit into base: main

minimalProviderAgentMarket
Contributor

Pull Request Description

Title: Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling

Related Issue: Fixes #165

Summary:
This pull request addresses the issue of orphaned audio files in our S3 storage following failed transcription attempts. The implementation introduces an S3 lifecycle policy and enhances error handling mechanisms to ensure that temporary audio files are managed efficiently, ultimately resolving Issue #165.

Key Changes:

  1. Lifecycle Policy Implementation:

    • A new lifecycle policy has been established for the S3 'audio-transcribe-temp' bucket, automatically deleting temporary audio files after 1 day. This will help minimize storage costs and reduce the risk of security issues related to lingering audio files.
  2. Error Handling Improvements:

    • Enhanced the error handling in the transcribe_audio function by using a finally block so that temporary files are deleted whether the transcription succeeds or raises an exception. If the process is killed before the finally block runs, or the deletion itself fails, the lifecycle policy acts as the fallback and removes the file within a day.
  3. Multipart Upload Cleanup:

    • Implemented a mechanism to clean up aborted multipart uploads after 1 day. This addition complements the lifecycle policy and ensures that temporary files created during failed uploads are also handled appropriately.
  4. Logging Enhancements:

    • Added comprehensive logging for the cleanup processes, which helps in monitoring the status and performance of the temporary file deletion procedures.
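
For reviewers, the lifecycle rule described in items 1 and 3 can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names and the rule ID are hypothetical, and the client is assumed to be a boto3 S3 client (only `put_bucket_lifecycle_configuration` is called on it). The bucket name and the 1-day windows come from the description above.

```python
from typing import Any, Dict


def build_temp_audio_lifecycle(days: int = 1) -> Dict[str, Any]:
    """Build a lifecycle configuration that expires objects and aborts
    incomplete multipart uploads after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-temp-audio",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "Status": "Enabled",
                # Delete temporary audio objects after `days` days.
                "Expiration": {"Days": days},
                # Discard parts of multipart uploads that never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_temp_audio_lifecycle(
    s3_client: Any, bucket: str = "audio-transcribe-temp", days: int = 1
) -> None:
    """Apply the rule to the bucket. `s3_client` is assumed to be a
    boto3 S3 client, e.g. boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_temp_audio_lifecycle(days),
    )
```

Note that S3 evaluates lifecycle rules asynchronously, so objects may persist somewhat longer than exactly one day before they are removed.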

Outcome:
The changes made in this pull request effectively address the challenges related to unnecessary storage costs, potential security risks, and compliance with data retention policies. By ensuring that temporary audio files do not remain indefinitely in S3, we optimize our storage system and maintain better control over our resources.

All necessary changes have been implemented and tested successfully. Thank you for your attention to this important enhancement!

Please review and merge at your earliest convenience.

Implement automatic cleanup of temporary audio files:
- Add S3 lifecycle policy to delete objects after 1 day
- Add cleanup for aborted multipart uploads after 1 day
- Create new S3Service class for better S3 operations encapsulation
- Improve error handling and logging for file cleanup
- Add cleanup in finally block to ensure temporary files are removed
- Keep lifecycle policy as fallback for failed manual deletions

Fixes GroupLang#165
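
The finally-block cleanup and the S3Service encapsulation named in the commit message could look roughly like the sketch below. The diff itself is not shown on this page, so everything here is an assumption for illustration: the method names, the `transcribe_audio` signature, and the injected `transcribe` callable are hypothetical, and the client is assumed to be a boto3 S3 client (only `put_object` and `delete_object` are used).

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class S3Service:
    """Thin wrapper around an S3 client for one bucket (hypothetical API)."""

    def __init__(self, client: Any, bucket: str) -> None:
        self._client = client
        self._bucket = bucket

    def upload(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=data)

    def delete(self, key: str) -> bool:
        """Delete an object; log and return False instead of raising,
        so a cleanup failure never hides the original error."""
        try:
            self._client.delete_object(Bucket=self._bucket, Key=key)
            logger.info("Deleted temporary audio %s/%s", self._bucket, key)
            return True
        except Exception:
            logger.exception(
                "Cleanup failed for %s/%s; lifecycle policy will remove it",
                self._bucket,
                key,
            )
            return False


def transcribe_audio(
    s3: S3Service, key: str, audio: bytes, transcribe: Callable[[str], str]
) -> str:
    """Upload audio, transcribe it, and always attempt cleanup afterwards."""
    try:
        s3.upload(key, audio)
        return transcribe(key)
    finally:
        # Runs on success and on any exception raised above.
        s3.delete(key)
```

Returning False from `delete` rather than raising keeps a failed cleanup from masking the transcription error, which is where the lifecycle policy serves as the fallback.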