yixuanzhou/End-to-End-Audio-Recognition

End-to-End-Audio-Recognition

The web app is deployed on 121.40.161.184 and can be accessed directly at http://121.40.161.184:8484/music_voice.html (audio-to-text transcription) and http://121.40.161.184:8484/search.html (keyword search over the database).

Dependencies

  • Python 2.x
  • HBase
  • For downloading m3u8 audio streams and converting them to wav files:
  • For audio classification and segmentation tasks:
  • For segmenting an audio file (wav) into pieces:
  • For audio recognition and transcription to text:

Workflow

  1. Prepare audio files for training the model (train-model.py)
  2. Use the pre-trained model to classify the target audio segments (audio-classifier.py)
  3. Filter the classified segments to obtain an optimized set of audio segments (audio_filter.py)
  4. Segment the audio file into pieces according to the segment points (audio-segmenter.py)
  5. Transcribe each audio segment to text (audio_recognition.py)
  6. Save the results in HBase (pythrift.py)
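
Steps 3 and 4 can be illustrated with a minimal sketch that uses only the Python standard library. It is not the code in audio_filter.py or audio-segmenter.py: the function names `filter_segments` and `split_wav`, the `(start_sec, end_sec, label)` tuple format, and the `min_duration` threshold are all assumptions made for illustration. The filter merges touching segments that share a label and drops short ones; the splitter writes one wav file per remaining segment.

```python
import wave


def filter_segments(segments, min_duration=1.0):
    """Merge touching segments with the same label, then drop short ones.

    `segments` is a list of (start_sec, end_sec, label) tuples sorted by
    start time. This format is hypothetical, not the repository's own.
    """
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label and merged[-1][1] >= start:
            # Same label and no gap: extend the previous segment.
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return [s for s in merged if s[1] - s[0] >= min_duration]


def split_wav(src_path, segments, out_prefix):
    """Write one wav file per segment and return the output paths."""
    paths = []
    src = wave.open(src_path, "rb")
    try:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end, label) in enumerate(segments):
            # Seek to the segment start and read its frames.
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            out_path = "%s_%03d_%s.wav" % (out_prefix, i, label)
            dst = wave.open(out_path, "wb")
            try:
                dst.setparams(params)
                dst.writeframes(frames)  # frame count in header is fixed on close
            finally:
                dst.close()
            paths.append(out_path)
    finally:
        src.close()
    return paths
```

A call such as `split_wav("stream.wav", filter_segments(points), "out/seg")` would then produce files like `out/seg_000_speech.wav`, each of which could be passed to the recognition step. Segment boundaries past the end of the file raise `wave.Error`, so real input would need bounds checking.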

About

Classify and segment an audio stream and convert it into text.
