Rhyme model evaluation appears to extract unstressed final syllables

https://github.com/jhlau/deepspeare/blob/90489bc78271bd36629c836d3bca6c3dab49a99d/util.py#L412

I'm referencing the paper and code for this project in a class project. Because TensorFlow 0.12 is hard to find now and can't be used on Colab, I'm writing my own implementation in PyTorch. While reading the code that evaluates the rhyme model, so that I can reproduce the original results, I noticed that the line referenced above appears to let the `syllable_to_rhyme()` function treat CMU tokens ending in 1, 2, **or 0** as the delimiting syllable for a word-ending rhyme. If I understand the [CMU dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict) right, this is incorrect, because tokens ending in '0' mark _unstressed_ vowels, while [the paper](https://arxiv.org/pdf/1807.03491.pdf) says in section 5.1.3 that the rhyme evaluation uses the final _stressed_ syllable.
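To make the difference concrete, here is a minimal sketch of the two readings. It is not the repository's code: `rhyme_part` and its `stress_marks` parameter are hypothetical names I made up for illustration, and the pronunciation of "water" is written out by hand in CMUdict style.

```python
def rhyme_part(phonemes, stress_marks=("1", "2")):
    """Return the phonemes from the last vowel whose stress digit is in
    stress_marks through the end of the word (hypothetical helper).
    Falls back to the whole word if no token matches."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in stress_marks:
            return list(phonemes[i:])
    return list(phonemes)


# CMUdict-style pronunciation of "water": W AO1 T ER0
water = ["W", "AO1", "T", "ER0"]

# Reading 1: delimit on the final *stressed* vowel (digits 1 or 2 only).
stressed_only = rhyme_part(water)

# Reading 2: delimit on any digit-marked vowel, including unstressed 0.
any_vowel = rhyme_part(water, stress_marks=("0", "1", "2"))

print(stressed_only)  # ['AO1', 'T', 'ER0']
print(any_vowel)      # ['ER0']
```

Under the second reading, any two words ending in an unstressed "ER0" would share a rhyme part, which seems much looser than what section 5.1.3 describes.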

The CMU dictionary site does list vowels without a stress digit in its [list of symbols](http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols). However, in the current version of the dictionary available through NLTK, none of those are used (except for one instance of 'UW' in one pronunciation of the word "juanita"), so in practice every vowel token carries a stress digit.
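For anyone who wants to repeat that check, this is roughly how it can be done. Treat it as a sketch: `unnumbered_vowels` is a hypothetical helper, the vowel set is the standard ARPAbet list as I understand it, and the sample entries below are hand-written (the second one is made up) so the snippet runs without NLTK.

```python
# Bare ARPAbet vowel symbols, i.e. vowels with no stress digit attached.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def unnumbered_vowels(entries):
    """Collect (word, phoneme) pairs where a vowel has no stress digit.

    entries is an iterable of (word, phoneme_list) pairs. Tokens such as
    "AO1" are not in ARPABET_VOWELS, so only bare vowels are reported.
    """
    hits = []
    for word, phonemes in entries:
        for phoneme in phonemes:
            if phoneme in ARPABET_VOWELS:
                hits.append((word, phoneme))
    return hits


# Hand-written sample; the second entry is invented for illustration only.
sample = [
    ("water", ["W", "AO1", "T", "ER0"]),
    ("made-up", ["M", "UW", "P"]),
]
found = unnumbered_vowels(sample)
print(found)  # [('made-up', 'UW')]
```

Assuming NLTK and its `cmudict` corpus are installed, the same function can be pointed at the real dictionary with `unnumbered_vowels(nltk.corpus.cmudict.entries())`.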

Have I understood this correctly? Was delimiting on unstressed syllables intentional, or should I correct this in my own implementation? Should I expect to be able to match the paper's results without doing it this way?
