Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling (Fixes #172) #174

Open
wants to merge 1 commit into
base: main
from

Conversation

minimalProviderAgentMarket
Contributor

Pull Request Description

Title: Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling

Background:
This pull request addresses Issue #172: temporary audio files stored in S3 remain indefinitely after a transcription fails or is interrupted. Left in place, these files incur unnecessary storage costs, create potential security exposure, and can violate data retention policies.

Objective:
The goal is to ensure that temporary audio files in the audiotranscribetemp bucket are cleaned up automatically after a defined period, regardless of the transcription outcome.

Summary of Changes:

  1. New File Creation:

    • Added aws_services.py in the utils directory, dedicated to managing S3 operations.
  2. S3Service Class:

    • Developed the S3Service class, which includes methods for:
      • Uploading audio files to S3.
      • Deleting audio files based on transcription results.
      • Establishing an automatic lifecycle policy to manage file retention in S3.
  3. Lifecycle Policy Configuration:

    • Configured the S3 bucket lifecycle policy to automatically delete all objects after 24 hours. This ensures that temporary audio files will not persist beyond their usefulness.
  4. Code Modifications:

    • Updated the create_s3_bucket_if_not_exists method in services.py to set up the lifecycle policy.
    • Hardened error handling so that a failure to set up the lifecycle policy is logged and does not prevent the bucket from being used.
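
For reviewers unfamiliar with S3 lifecycle rules, the configuration described above could look roughly like the sketch below. It is not the code from this PR: the function names and the rule ID are hypothetical, and the client is passed in (for example `boto3.client("s3")`) so the snippet has no hard dependency. Note that S3 expresses expiration in whole days, so "24 hours" maps to `Days: 1`, and S3 removes expired objects asynchronously, so actual deletion can lag the threshold.

```python
import logging

logger = logging.getLogger(__name__)

TEMP_BUCKET = "audiotranscribetemp"  # bucket name taken from the PR description


def build_expiration_policy(days: int = 1) -> dict:
    """Return a lifecycle configuration that expires every object after `days`."""
    if days < 1:
        raise ValueError("S3 expiration is expressed in whole days, minimum 1")
    return {
        "Rules": [
            {
                "ID": "expire-temp-audio",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiration_policy(s3_client, bucket: str = TEMP_BUCKET, days: int = 1) -> bool:
    """Best-effort setup: log and return False on failure instead of raising."""
    try:
        s3_client.put_bucket_lifecycle_configuration(
            Bucket=bucket,
            LifecycleConfiguration=build_expiration_policy(days),
        )
        return True
    except Exception:  # real code might narrow this to botocore's ClientError
        logger.exception("Could not set lifecycle policy on bucket %s", bucket)
        return False
```

Returning a boolean rather than raising is one way to keep bucket creation usable when the policy call fails, which is the behavior the last bullet describes.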

Benefits of Implementation:

  • Automatic Cleanup: The lifecycle policy will prevent indefinite storage of temporary files by enforcing a deletion period of 24 hours.
  • Cost Efficiency: This change reduces unnecessary storage costs associated with retaining obsolete audio files.
  • Security Compliance: By ensuring that files are deleted after a set period, we mitigate potential security risks associated with orphaned files.
  • Robust Error Handling: Cleanup is best-effort. Failures in lifecycle policy setup or manual deletion are handled and logged rather than interrupting file management operations.
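
As an illustration of how manual deletion can complement the lifecycle policy, here is a minimal sketch of an upload, transcribe, delete flow that removes the temporary object whether or not transcription succeeds. The function names and the `transcribe` callback are hypothetical and do not reflect the S3Service API in this PR; the client is assumed to be a boto3 S3 client.

```python
import logging

logger = logging.getLogger(__name__)


def delete_quietly(s3_client, bucket: str, key: str) -> bool:
    """Best-effort delete: log failures and leave the object to the lifecycle policy."""
    try:
        s3_client.delete_object(Bucket=bucket, Key=key)
        return True
    except Exception:  # real code might narrow this to botocore's ClientError
        logger.warning("Could not delete s3://%s/%s", bucket, key, exc_info=True)
        return False


def transcribe_with_cleanup(s3_client, bucket: str, key: str, local_path: str, transcribe):
    """Upload the audio, run `transcribe(bucket, key)`, and always attempt cleanup."""
    s3_client.upload_file(local_path, bucket, key)
    try:
        return transcribe(bucket, key)
    finally:
        # Runs on success and on failure; errors from transcribe still propagate.
        delete_quietly(s3_client, bucket, key)
```

With this shape, the 24-hour lifecycle rule acts as a backstop for the cases where the process dies before the `finally` block runs or the delete call itself fails.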

This pull request resolves the problem described in Issue #172 by ensuring that temporary audio files in S3 are removed automatically instead of persisting indefinitely.

Fixes #172.

We appreciate your review and feedback on this implementation. Thank you!

Add automatic cleanup functionality for S3 buckets by implementing a 
24-hour lifecycle policy. This change includes:

- Create new S3Service class in utils/aws_services.py with dedicated 
  lifecycle policy setup
- Add lifecycle policy configuration to existing AWSServices class
- Configure automatic deletion of objects after 24 hours
- Implement proper error handling and logging for policy setup
- Add optional manual deletion method for immediate cleanup

This change helps prevent storage costs from accumulating due to 
forgotten temporary audio files in S3 buckets.
Development

Successfully merging this pull request may close these issues.

Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling
1 participant