Rhyme model evaluation appears to extract unstressed final syllables #3

@MasonLilly

stresses = set(["0", "1", "2"])

I'm referencing the paper and code for this project in a class project. Because TensorFlow 0.12 is hard to find now and can't be used on Colab, I'm writing my own implementation in PyTorch. While reading the code that evaluates the rhyme model, in order to reproduce the original results, I noticed that the line referenced above appears to let the syllable_to_rhyme() function treat any CMU token ending in 0, 1, or 2 as the delimiting syllable for a word-ending rhyme. If I understand the CMU dictionary correctly, this is a mistake: tokens ending in '0' mark unstressed vowels, while section 5.1.3 of the paper says that the rhyme evaluation uses the final stressed syllable.
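To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the repository's syllable_to_rhyme(); rhyme_tail() is a hypothetical helper of my own, the pronunciations are typed in by hand in CMUdict style, and treating both "1" and "2" as stressed is my assumption about what the paper means by "stressed".

```python
# Stress digits that may end a CMUdict vowel token.
ANY_VOWEL = {"0", "1", "2"}   # what the referenced line allows
STRESSED = {"1", "2"}         # assumption: primary and secondary stress only


def rhyme_tail(phonemes, stress_marks):
    """Return the phonemes from the last vowel whose stress digit is in
    stress_marks through the end of the word (hypothetical helper)."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in stress_marks:
            return phonemes[i:]
    return phonemes  # no matching vowel: fall back to the whole word


# Hand-written CMUdict-style pronunciations, for illustration only.
water = ["W", "AO1", "T", "ER0"]
butter = ["B", "AH1", "T", "ER0"]

# Delimiting on any vowel keeps only the unstressed final syllable,
# so the two words are judged to rhyme.
any_water = rhyme_tail(water, ANY_VOWEL)
any_butter = rhyme_tail(butter, ANY_VOWEL)

# Delimiting on the final stressed vowel keeps the stressed nucleus too,
# so the two tails differ.
stressed_water = rhyme_tail(water, STRESSED)
stressed_butter = rhyme_tail(butter, STRESSED)

print(any_water == any_butter)
print(stressed_water == stressed_butter)
```

Under the first set, any two words that share an unstressed ending such as ER0 count as a rhyme, which would presumably make the evaluation more lenient than the paper describes.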

The CMU dictionary site does include vowels without stress digits in its list of symbols. However, in the current version of the dictionary available through NLTK, none of those appear (except for one instance of 'UW' in one pronunciation of the word "juanita").
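For anyone who wants to repeat that check, this is roughly how it can be done. unnumbered_vowels() is a hypothetical helper, and the small sample dictionary is made up so that the snippet runs without any downloads; it is not real CMUdict data.

```python
# The 15 ARPAbet vowel symbols used by CMUdict, without stress digits.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def unnumbered_vowels(entries):
    """Return (word, phoneme) pairs where a vowel has no stress digit.

    entries maps each word to a list of pronunciations, and each
    pronunciation is a list of phoneme strings.
    """
    hits = []
    for word, pronunciations in entries.items():
        for pronunciation in pronunciations:
            for phoneme in pronunciation:
                if phoneme in VOWELS:
                    hits.append((word, phoneme))
    return hits


# Made-up entries: the second one deliberately has a bare "UW".
sample = {
    "toy": [["T", "OY1"]],
    "fake": [["F", "UW", "K"]],
}
hits = unnumbered_vowels(sample)
print(hits)

# To run against the real dictionary (needs the nltk package and data):
#   import nltk
#   nltk.download("cmudict")
#   from nltk.corpus import cmudict
#   print(unnumbered_vowels(cmudict.dict()))
```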

Have I understood this correctly? Was delimiting on unstressed syllables intentional, or should I correct this in my own implementation? Should I expect to be able to match the paper's results without doing it this way?
