🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

jiaxp3144/tensorflow-speech-recognition

This branch is 3 commits behind pannous/tensorflow-speech-recognition:master.


Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition, see there for some background.

Update: Mozilla released DeepSpeech

They achieve good error rates. Free Speech is in good hands; go there if you are an end user. For now, this project is maintained only for educational purposes.

Ultimate goal

Create a decent standalone speech recognizer for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram, That's what she said, too laid?

Sample spectrogram, Karen uttering 'zero' at 160 words per minute.
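
For readers new to spectrograms, here is a minimal sketch of how such an image can be computed from a waveform. It is not taken from this repository: the function name `spectrogram`, the frame length, the hop size, and the synthetic 440 Hz tone standing in for a recording are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration; assumes len(signal) >= frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; keep magnitudes of the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) on silent frames.
    return np.log(magnitude + 1e-10)

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
# The strongest frequency bin, averaged over time, should sit near 440 Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

Each row of `spec` is one time frame and each column one frequency bin, so plotting the transposed array as an image gives the familiar time-versus-frequency picture.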

Installation

clone code

git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git

pyaudio

Requires PortAudio from http://www.portaudio.com/

git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
# To make these settings permanent, add the export lines to ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
source ~/.bashrc

install pyaudio

pip install pyaudio
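
To check that the install worked, a small script along the following lines may help. It is a sketch, not part of this repository: the helper name `list_input_devices` is hypothetical. It assumes only the standard PyAudio calls `PyAudio()`, `get_device_count()`, `get_device_info_by_index()` and `terminate()`, and returns `None` instead of failing when PyAudio or PortAudio cannot be loaded.

```python
def list_input_devices():
    """Return the names of audio input devices, or None if PyAudio is unusable.

    Hypothetical helper for illustration only.
    """
    try:
        import pyaudio  # fails if PyAudio or the PortAudio library is missing
    except ImportError:
        return None
    try:
        audio = pyaudio.PyAudio()
    except Exception:
        # PortAudio could not initialise (for example, no audio backend).
        return None
    try:
        names = []
        for index in range(audio.get_device_count()):
            info = audio.get_device_info_by_index(index)
            # Keep only devices that can record.
            if info.get("maxInputChannels", 0) > 0:
                names.append(info["name"])
        return names
    finally:
        audio.terminate()

devices = list_input_devices()
if devices is None:
    print("PyAudio is not usable; check the PortAudio install and library paths.")
else:
    print("Input devices:", devices)
```

An empty list means PyAudio loaded but found no recording device, which is a different problem from a failed build.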

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Sample spectrogram or record.py

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.

Fun tasks for newcomers

Extensions

Extensions to current tensorflow which are probably needed:

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to info@pannous.com

Languages

  • Python 98.0%
  • Swift 2.0%