Implement S3 Lifecycle Policy for Temporary Audio Cleanup and Error Handling #164

Closed
vadanrod14 opened this issue Feb 17, 2025 · 0 comments · May be fixed by #167


Problem

The current implementation in aws_services.py creates temporary audio files
in S3 but deletes them only after a successful transcription. If transcription
fails or the process is interrupted, these files remain in S3 indefinitely,
which can:

  • Lead to unnecessary storage costs
  • Create potential security risks with stored audio files
  • Violate data retention policies

Current Behavior

  • Audio files are uploaded to the 'audio-transcribe-temp' bucket
  • Files are only deleted after successful transcription
  • Failed transcriptions leave orphaned files
  • No automated cleanup mechanism exists

Proposed Solution

  1. Implement an S3 bucket lifecycle policy to automatically delete objects
    after 24 hours
  2. Add proper error handling to ensure cleanup in failure scenarios
  3. Consider implementing a monitoring system for bucket usage
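
As a rough sketch of item 1, the lifecycle rule could be applied with boto3's `put_bucket_lifecycle_configuration`. The rule ID and the helper names below are hypothetical, and the client is passed in rather than created here. Note that S3 expiration is expressed in whole days and runs asynchronously, so "24 hours" is approximate: an object may live somewhat longer than one day before it is removed.

```python
from typing import Any, Dict

BUCKET_NAME = "audio-transcribe-temp"  # bucket name taken from this issue


def build_lifecycle_configuration(expire_after_days: int = 1) -> Dict[str, Any]:
    """Build a lifecycle configuration that expires every object in the bucket."""
    if expire_after_days < 1:
        raise ValueError("S3 lifecycle expiration is expressed in whole days (>= 1)")
    return {
        "Rules": [
            {
                "ID": "expire-temp-audio",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
                # Also discard partial uploads left behind by interrupted transfers.
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": expire_after_days
                },
            }
        ]
    }


def apply_lifecycle_policy(s3_client: Any, bucket: str = BUCKET_NAME) -> None:
    """Apply the rule; s3_client is assumed to be boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_configuration(),
    )
```

This would be a one-time setup step, for example `apply_lifecycle_policy(boto3.client("s3"))`. Be aware that the call replaces any lifecycle configuration already present on the bucket.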

Technical Details

Affected files:

  • aws_services.py
  • Specifically the transcribe_audio() function
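
One possible shape for the error handling, shown as a sketch rather than the actual code in aws_services.py: wrap the transcription step in `try`/`finally` so the temporary object is deleted whether or not transcription succeeds. The function name `transcribe_with_cleanup` and the `run_transcription` callback are assumptions made for illustration; the real `transcribe_audio()` may be structured differently.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def transcribe_with_cleanup(
    s3_client: Any,
    local_path: str,
    key: str,
    run_transcription: Callable[[str, str], str],  # hypothetical: (bucket, key) -> text
    bucket: str = "audio-transcribe-temp",
) -> str:
    """Upload the audio, transcribe it, and always attempt to delete the upload."""
    s3_client.upload_file(local_path, bucket, key)
    try:
        return run_transcription(bucket, key)
    finally:
        # Runs on success, on failure, and on most interruptions.
        try:
            s3_client.delete_object(Bucket=bucket, Key=key)
            logger.info("Deleted temporary audio s3://%s/%s", bucket, key)
        except Exception:
            # Do not mask the original transcription error; the lifecycle
            # policy remains the fallback for objects that could not be deleted.
            logger.exception("Failed to delete s3://%s/%s", bucket, key)
```

A hard kill of the process would still skip the `finally` block, which is why the lifecycle rule is proposed alongside this change.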

Implementation Notes

  • Use AWS S3 lifecycle rules
  • Consider adding CloudWatch metrics for bucket usage
  • Add logging for cleanup operations
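
For the CloudWatch note, one low-effort option might be to read the daily storage metrics that S3 already publishes in the `AWS/S3` namespace instead of emitting custom ones. The sketch below assumes a `boto3.client("cloudwatch")` is passed in; the helper name `latest_object_count` is hypothetical. These metrics are reported roughly once per day, so they suit trend monitoring, not real-time alerting.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def latest_object_count(
    cloudwatch_client: Any, bucket: str = "audio-transcribe-temp"
) -> Optional[float]:
    """Return the most recent NumberOfObjects datapoint, or None if there is none."""
    now = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="NumberOfObjects",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": "AllStorageTypes"},
        ],
        StartTime=now - timedelta(days=2),  # wide window: the metric is daily
        EndTime=now,
        Period=86400,
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Average"]
```

A count that keeps growing while the lifecycle rule is enabled would suggest the rule is not matching the uploaded keys.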