Generations from masked input sequences are garbled #6

@nytopop

Generations from padded (masked) input sequences come out garbled, and the garbling seems to increase smoothly with the number of masked tokens. Something is probably being computed without regard for the mask.

Happens on both CPU and CUDA.
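
To illustrate the kind of bug suspected here, below is a toy sketch in plain Python. It is not code from this repository: the helper names (`padding_mask`, `mean_ignoring_mask`, `mean_with_mask`) are hypothetical, and a simple mean stands in for whatever reduction the model actually performs. It only shows that a statistic computed over padded positions drifts further from the masked result as the amount of padding grows.

```python
def padding_mask(lengths, max_len):
    """True where a position holds a real token, False where it is padding."""
    return [[i < n for i in range(max_len)] for n in lengths]


def mean_ignoring_mask(values):
    # Averages over every position, padding included (the suspected bug).
    return sum(values) / len(values)


def mean_with_mask(values, mask):
    # Averages over real tokens only.
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept)


# Sequence lengths taken from the report; everything else is a toy stand-in.
lengths = [18, 37, 20, 87]
max_len = max(lengths)
masks = padding_mask(lengths, max_len)

errors = []
for n, mask in zip(lengths, masks):
    # Toy "activations": real tokens contribute 1.0, padding contributes 0.0.
    values = [1.0 if m else 0.0 for m in mask]
    error = abs(mean_with_mask(values, mask) - mean_ignoring_mask(values))
    errors.append(error)
    print(f"length={n:2d} padded={max_len - n:2d} error={error:.3f}")
```

In this toy setup the error is zero for the unpadded sequence and largest for the shortest one, which matches the pattern described above. It does not identify which operation in the model is actually at fault.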

Sample audio generated by main.py:

text = [
    "This is a test!",
    "Hey, what is going on in this play?",
    "Very smooth of you.",
    "I wonder why this is happening. Clearly, this sample is clear because it is unmasked.",
]

samples.zip

With the following input ids:

tensor([18, 37, 20, 87])
tensor([[  0,  81, 157, 102,  61,  16, 102,  68,  16,  70,  16,  62, 156,  86,
          61,  62,   5,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0],
        [  0,  50, 156,  24,   3,  16,  65, 157, 138,  62,  16, 102,  68,  16,
          92, 156,  31, 102, 112,  16, 157,  76,  56,  16, 102,  56,  16,  81,
         102,  61,  16,  58,  54, 156,  24,   6,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0],
        [  0,  64, 156,  86, 123,  51,  16,  61,  55, 156,  63,  81,  16, 138,
          64,  16,  52,  63,   4,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0],
        [  0, 157,  25,  16,  65, 156, 138,  56,  46,  83, 123,  16,  65, 157,
          25,  16,  81, 102,  61,  16, 102,  68,  16,  50, 156,  72,  58,  42,
          56, 102, 112,   4,  16,  53,  54, 156, 102, 123,  54,  51,   3,  16,
          81, 102,  61,  16,  61, 156,  72,  55,  58,  42,  54,  16, 102,  68,
          16,  53,  54, 156, 102, 123,  16,  44,  83,  53, 156, 138,  68,  16,
         102,  62,  16, 102,  68,  16, 157, 138,  56,  55, 156,  72,  61,  53,
          62,   4,   0]])

The first 3 outputs are highly garbled while the last is totally clear.
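
A quick sanity check on the numbers above, as a plain Python sketch. It assumes that id 0 is the padding id (`PAD_ID` is a name introduced here, not one from the repository). The data suggests that 0 also appears as the first and last token of each sequence, so a mask derived from `ids != 0` would not agree with one derived from the lengths tensor.

```python
PAD_ID = 0  # assumption: 0 is the padding id; it also seems to mark sequence boundaries

# Lengths and first row copied from the tensors in the report.
lengths = [18, 37, 20, 87]
first_row = [0, 81, 157, 102, 61, 16, 102, 68, 16, 70, 16, 62, 156, 86, 61, 62, 5] + [0] * 70

max_len = max(lengths)

# Padded positions per row: only the last row has none.
pad_counts = [max_len - n for n in lengths]
print(pad_counts)  # [69, 50, 67, 0]

# Two ways to build a mask for the first row.
mask_from_length = [i < lengths[0] for i in range(max_len)]
mask_from_ids = [tok != PAD_ID for tok in first_row]

print(sum(mask_from_length))  # 18 positions kept
print(sum(mask_from_ids))     # 16 positions kept: the leading and trailing 0 are dropped
```

The padding counts line up with the observation: the three garbled outputs are the padded rows, and the clear one is the only row with no padding.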
