# tercpp-embeddings
tercpp-embeddings: an open-source Translation Edit Rate (TER) scorer for Machine Translation and a Word Error Rate (WER) scorer for Automatic Speech Recognition, both using word embeddings (WER-E and WER-S).
Repository layout:
- `papers/`: research papers
- `src/`: source code of the library
- `test/`: source code of the main program
See INSTALL for build instructions. You first need to install the multivec library: https://github.com/eske/multivec/
Usage with multivec binary models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -emb binary_embeddings_model.fr

with multivec text models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -embtxt text_embeddings_model.fr

with word2vec binary models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -w2v binary_embeddings_model.fr

The same commands accept the --deeper option:

with multivec binary models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -emb binary_embeddings_model.fr --deeper

with multivec text models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -embtxt text_embeddings_model.fr --deeper

with word2vec binary models:

    tercpp-embeddings --WER --noTxtIds -r reference.txt -h hypothesis.txt -w2v binary_embeddings_model.fr --deeper
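For readers new to WER scoring, the sketch below (illustrative only, not this tool's implementation) shows the classical word-level WER: a Levenshtein alignment between reference and hypothesis tokens, normalised by the reference length, with a pluggable substitution cost. The example sentences and function names are placeholders; the actual embedding-based costs used by WER-E and WER-S are the ones described in the INTERSPEECH 2016 paper and implemented in src/.

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Levenshtein-style WER with a pluggable substitution cost.
double wer(const std::vector<std::string>& ref,
           const std::vector<std::string>& hyp,
           const std::function<double(const std::string&, const std::string&)>& subCost) {
    const std::size_t n = ref.size();
    const std::size_t m = hyp.size();
    std::vector<std::vector<double>> d(n + 1, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 1; i <= n; ++i) d[i][0] = static_cast<double>(i);  // deletions
    for (std::size_t j = 1; j <= m; ++j) d[0][j] = static_cast<double>(j);  // insertions
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            d[i][j] = std::min({d[i - 1][j] + 1.0,                                    // deletion
                                d[i][j - 1] + 1.0,                                    // insertion
                                d[i - 1][j - 1] + subCost(ref[i - 1], hyp[j - 1])});  // match/substitution
        }
    }
    return d[n][m] / static_cast<double>(n);  // normalised by the reference length
}

int main() {
    const std::vector<std::string> ref = {"the", "cat", "sat"};
    const std::vector<std::string> hyp = {"the", "feline", "sat"};
    // Classical WER: a substitution costs 1, a match costs 0.
    auto plain = [](const std::string& a, const std::string& b) { return a == b ? 0.0 : 1.0; };
    std::cout << "WER = " << wer(ref, hyp, plain) << std::endl;  // prints 0.333333
    // WER-E replaces `plain` with a cost derived from word-embedding similarity,
    // so that near-synonyms are penalised less than unrelated words.
    return 0;
}
```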
Text embedding models follow the word2vec text format: the first line gives the vocabulary size and the vector dimension, then each line contains a word followed by its vector:
83388 50
saluting -1.35397 0.586016 2.79672 -0.418376 -0.602821 1.21831 -1.93253 -1.04009 -0.370907 0.604416 0.336976 0.641255 0.334868 0.190173 0.0872742 0.883609 0.211205 0.80019 0.48248 0.824242 -0.346953 -1.07881 -0.521974 -0.111079 0.516307 1.86109 0.737816 0.225595 -0.282824 0.387372 0.523839 0.81536 1.03448 0.662487 0.315265 -2.33873 -0.0998709 -0.809018 -0.157693 -0.348118 0.733476 -0.694376 -0.411821 0.923461 -0.36941 0.576924 -0.00253894 -0.211869 0.738205 0.261968
liga 0.240736 0.374699 -0.456721 0.0247067 -0.500141 -0.0672075 -1.15694 -1.043 0.321601 -0.072098 0.402517 0.124218 0.706794 -0.565653 -0.0125166 0.0923419 0.738372 -1.34922 0.00461746 0.480944 -0.157564 -0.685169 -0.379456 0.114373 -0.497862 0.201393 0.804827 -0.173226 -0.403769 -0.529033 0.764493 0.0586693 0.373357 0.780176 0.0847483 0.608413 0.695131 -2.18405 -0.337933 0.0852144 -0.315775 0.484913 -0.0181489 0.439723 0.878166 -0.447149 -0.926107 0.124669 0.258716 1.17472
congressmen -2.69759 1.41468 1.14476 -2.2755 -0.324732 1.39088 -2.3428 3.7577 -1.63871 3.99818 1.87524 -2.47587 -0.0758982 -2.32292 -2.73987 1.70998 -1.05869 0.4186 -1.83306 -1.39962 -0.229668 -1.3621 0.61785 -0.712404 -0.570894 0.367588 -1.02264 1.85515 -0.0886673 -0.721675 -0.631928 -0.198751 -4.25625 0.131479 -0.33592 -0.565107 -0.926477 -3.46209 -4.15838 0.272748 0.925235 0.186114 1.2844 -1.43701 0.570687 -1.71726 0.88729 -1.34935 0.680968 2.594
outgrown 0.710253 -0.233596 -0.0494852 0.857347 0.25338 0.227812 -0.250028 -0.45428 0.290734 0.12975 0.251975 -0.0849167 0.530985 -0.340551 0.266533 0.584058 -0.0273177 0.77235 -0.285059 -0.306495 -0.592585 -0.212158 -0.211939 -0.723026 -0.010724 -0.165841 -0.140728 0.0779976 -0.860507 1.28739 0.560873 0.149085 0.0701273 0.00534706 -0.141197 0.703732 -0.58476 -0.352017 0.558202 0.0132576 0.067365 -0.00563633 0.0699992 -0.573648 -0.0673231 0.551693 0.294159 -0.0474721 -0.00882039 0.611732
causally 0.728588 0.113795 -0.654911 -0.125335 -0.307491 -0.518799 0.521385 -0.00640315 0.153376 -0.320903 0.0846396 0.289929 0.344029 -0.697442 -0.361158 -0.354603 0.536638 -0.318213 0.259182 0.123059 -1.07387 -0.193267 0.204612 -0.227474 -0.40505 -0.61406 -0.399437 -0.186055 -0.508952 0.119864 -0.44636 -0.853974 -0.122938 0.160827 -0.022631 0.291231 -0.608337 -0.157452 -0.196376 0.628901 0.0446108 -0.225483 -0.0513283 -0.00311536 0.598802 0.152529 0.0245117 0.483343 -0.0328817 -0.023467
flesch -1.52651 0.00303828 2.04088 -1.77761 0.911405 0.696672 -0.0195003 -1.37895 1.61164 -1.85726 0.466313 3.26666 1.01065 -2.95666 -0.957727 -0.649299 0.289477 -4.81147 -1.02689 1.77689 -2.67837 -3.17847 -0.932097 -0.839573 0.0256079 1.04358 2.61236 1.12533 -0.851274 -0.190656 0.838955 0.396768 1.58905 2.45082 2.24205 0.343338 0.152801 -1.40638 1.62371 -1.43038 -0.583923 -2.22495 0.455508 0.00134699 1.02266 -1.09808 1.04223 -3.45179 0.0491755 -2.18821
...
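For illustration, the following sketch (not the tool's actual model loader) reads this word2vec-style text format, with the vocabulary size and vector dimension on the first line and one word plus its vector per line, and computes a cosine similarity between two of the words listed above. The file name is a placeholder taken from the usage examples.

```cpp
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Reads a word2vec-style text model: a header line with the vocabulary size
// and the vector dimension, then one word followed by its vector per line.
std::unordered_map<std::string, std::vector<double>> loadTextModel(const std::string& path) {
    std::ifstream in(path);
    std::size_t vocabSize = 0;
    std::size_t dim = 0;
    in >> vocabSize >> dim;  // e.g. "83388 50" in the excerpt above
    std::unordered_map<std::string, std::vector<double>> model;
    model.reserve(vocabSize);
    std::string word;
    while (in >> word) {
        std::vector<double> vec(dim);
        for (std::size_t k = 0; k < dim; ++k) in >> vec[k];
        model.emplace(word, std::move(vec));
    }
    return model;
}

// Cosine similarity between two embedding vectors of the same dimension.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        dot += a[k] * b[k];
        na += a[k] * a[k];
        nb += b[k] * b[k];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

int main() {
    // Placeholder file name taken from the usage examples above.
    const auto model = loadTextModel("text_embeddings_model.fr");
    std::cout << cosine(model.at("saluting"), model.at("congressmen")) << std::endl;
    return 0;
}
```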
All the data used for the experiments in our paper accepted at INTERSPEECH 2016 are available here: https://github.com/besacier/WCE-SLT-LIG/tree/master/IS2016
This toolkit is part of the KEHATH project (https://kehath.imag.fr/), funded by the French National Research Agency.
Copyright 2015-2016, Christophe Servan, GETALP-LIG, University of Grenoble, France
This toolkit was partially supported under the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022, and by the European Commission under the EuroMatrixPlus project.
Copyright 2010-2013, Christophe Servan, LIUM, University of Le Mans, France
Contact: [email protected]
To reference tercpp in your publications, please cite these articles:

    @InProceedings{LeIS2016,
      author    = {Ngoc-Tien Le and Christophe Servan and Benjamin Lecouteux and Laurent Besacier},
      title     = {Better Evaluation of ASR in Speech Translation Context Using Word Embeddings},
      booktitle = {INTERSPEECH 2016},
      year      = {2016}
    }

    @article{servanPBML2011,
      title     = {Optimising multiple metrics with mert},
      author    = {Servan, Christophe and Schwenk, Holger},
      journal   = {The Prague Bulletin of Mathematical Linguistics},
      volume    = {96},
      number    = {1},
      pages     = {109--117},
      year      = {2011},
      publisher = {Versita}
    }
The TER code is based on Snover's algorithm provided at http://www.cs.umd.edu/~snover/tercom
References:
- Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla and John Makhoul, "A Study of Translation Edit Rate with Targeted Human Annotation," Proceedings of Association for Machine Translation in the Americas, 2006.
- Matthew Snover, Bonnie J. Dorr, Richard Schwartz, John Makhoul, Linnea Micciulla and Ralph Weischedel, "A Study of Translation Error Rate with Targeted Human Annotation," LAMP-TR-126, CS-TR-4755, UMIACS-TR-2005-58, University of Maryland, College Park, MD July, 2005.