Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition; see that project for some background.

Update: Mozilla released DeepSpeech

They achieve good error rates. Free Speech is in good hands; go there if you are an end user. For now, this project is maintained only for educational purposes.

Ultimate goal

Create a decent standalone speech recognition system for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram: Karen uttering 'zero' at 160 words per minute.

Installation

clone code

git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git

pyaudio

Requires PortAudio from http://www.portaudio.com/

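git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
# let the compiler, linker and runtime loader find the local PortAudio build
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
# to make the exports permanent, add them to ~/.bashrc and reload it
source ~/.bashrc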

install pyaudio

pip install pyaudio
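
To confirm that the install worked, a quick check along the following lines may help. This is only a sketch: the helper name `pyaudio_available` is made up for illustration, and the device count it prints depends entirely on the local audio setup.

```python
import importlib.util


def pyaudio_available():
    """Return True if the pyaudio module can be found (hypothetical helper)."""
    return importlib.util.find_spec("pyaudio") is not None


if __name__ == "__main__":
    if pyaudio_available():
        import pyaudio

        # Creating a PyAudio instance initializes the underlying PortAudio library.
        pa = pyaudio.PyAudio()
        try:
            print("PortAudio devices found:", pa.get_device_count())
        finally:
            pa.terminate()
    else:
        print("pyaudio is not importable; check the PortAudio paths exported above")
```

If the import fails even though `pip install pyaudio` succeeded, the library paths exported in the previous step are a likely place to look.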

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.

Fun tasks for newcomers

Watch the video: https://www.youtube.com/watch?v=u9FPqkuoEJ8
Understand and correct the corresponding code: lstm-tflearn.py
Data augmentation: modulate the data on the fly, e.g. increase the speech frequency, add background noise, or alter the pitch.
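
As a starting point for the data augmentation task, here is a minimal NumPy sketch. It is not code from this repository: the function names, the 16 kHz sample rate, and the parameter ranges are all assumptions chosen for illustration. The speed change uses plain linear interpolation, so it shifts tempo and pitch together; an independent pitch shift would need a different technique.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Mix in white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens the clip.

    Played back at the original sample rate, the result is faster and
    higher pitched (tempo and pitch change together).
    """
    n_out = max(1, int(round(len(signal) / factor)))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_positions, old_positions, signal)


def augment(signal, rng):
    """Apply a random speed change, then noise at a random level.

    The ranges below are illustrative assumptions, not tuned values.
    """
    factor = rng.uniform(0.9, 1.1)
    snr_db = rng.uniform(10.0, 30.0)
    return add_noise(change_speed(signal, factor), snr_db, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16000  # assumed sample rate
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    print("augmented length:", len(augment(tone, rng)))
```

Calling `augment` once per training example and epoch yields a different variant each time, which is what "on the fly" amounts to here.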

Extensions

Extensions to current tensorflow which are probably needed:

WarpCTC on the GPU (see issue)
Incremental collaborative snapshots ('P2P learning')
Modular graphs/models + persistence

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to info@pannous.com
