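In the data-loading code at https://github.com/anliyuan/Ultralight-Digital-Human/blob/762e3b6de9e82b6927ce7cf414dcef67dd533ff3/syncnet.py#L84C5-L95C31, y is set to 1 every time and the ex img is never used. Doesn't that mean training only ever sees in-sync data? The model can then just output two identical vectors without looking at its inputs, and the loss computed afterwards becomes extremely small. During training, BCELoss quickly drops to 0.000xxx. That doesn't seem right.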
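To illustrate the concern, here is a minimal sketch in plain Python. It assumes the sync score is the cosine similarity between the audio and image embeddings fed into a binary cross-entropy loss (as in Wav2Lip-style sync training; the linked code may differ). `constant_encoder` and the toy batches are hypothetical and only stand in for a model that ignores its input.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def constant_encoder(_inputs):
    """Hypothetical degenerate encoder: returns the same vector for any input."""
    return [1.0, 1.0, 1.0, 1.0]

def mean_loss(audio_batch, image_batch, labels):
    losses = []
    for audio, image, y in zip(audio_batch, image_batch, labels):
        score = cosine_similarity(constant_encoder(audio), constant_encoder(image))
        losses.append(bce(score, y))
    return sum(losses) / len(losses)

# Toy inputs; their values are irrelevant because the encoder ignores them.
audio_batch = [[0.1, 0.9], [0.5, 0.2], [0.7, 0.7]]
image_batch = [[3.0], [1.0], [2.0]]

# Every label is 1: the degenerate encoder already reaches a near-zero loss.
loss_all_positive = mean_loss(audio_batch, image_batch, [1.0, 1.0, 1.0])

# One out-of-sync pair (label 0): the same encoder is now heavily penalized.
loss_mixed = mean_loss(audio_batch, image_batch, [1.0, 0.0, 1.0])

print(loss_all_positive, loss_mixed)
```

Under these assumptions, a near-zero BCELoss with all-positive labels says nothing about whether the network has learned audio-visual correspondence.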
This training method is not correct. You can refer to the lip-sync discriminator approach in wav2lip!
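For reference, the core of that idea can be sketched as follows: for each audio window, pair it with its own frame about half of the time (label 1) and with a different frame otherwise (label 0). This is a simplified illustration, not Wav2Lip's actual code; `sample_pair` and its parameters are hypothetical names, and it returns indices only rather than loading data.

```python
import random

def sample_pair(num_frames, rng):
    """Return (audio_idx, image_idx, label) for one training example.

    label is 1.0 when the image matches the audio window and 0.0 otherwise.
    """
    audio_idx = rng.randrange(num_frames)
    if rng.random() < 0.5:
        # In-sync pair: the image comes from the same position as the audio.
        return audio_idx, audio_idx, 1.0
    # Out-of-sync pair: draw uniformly from all indices except audio_idx.
    wrong_idx = rng.randrange(num_frames - 1)
    if wrong_idx >= audio_idx:
        wrong_idx += 1
    return audio_idx, wrong_idx, 0.0

rng = random.Random(0)
pairs = [sample_pair(100, rng) for _ in range(1000)]
num_positive = sum(1 for _, _, label in pairs if label == 1.0)
print(num_positive, len(pairs) - num_positive)
```

With both labels present in every epoch, the constant-output solution no longer minimizes the loss.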
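Feeding only a single frame as the image input when training this syncnet is also unreasonable: audio features spanning 16 frames are matched against 1 image frame.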
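About the ex img not being used: could it be that a randomly picked audio feature is not necessarily a true negative sample? Its mouth shape may well be similar to the positive sample's, which would make results worse, so the author left it out.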
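One common way to reduce this risk (an assumption here, not something the repository does) is to draw negatives only from frames at least some distance away from the anchor. `sample_negative_index` and `min_gap` below are hypothetical names used for illustration.

```python
import random

def sample_negative_index(anchor_idx, num_frames, min_gap, rng):
    """Pick a frame index at least min_gap frames away from anchor_idx."""
    candidates = [i for i in range(num_frames) if abs(i - anchor_idx) >= min_gap]
    if not candidates:
        raise ValueError("clip is too short for the requested min_gap")
    return rng.choice(candidates)

rng = random.Random(0)
negatives = [sample_negative_index(anchor, 50, 5, rng) for anchor in range(50)]
print(negatives[:5])
```

Temporal distance only filters out near-duplicate neighbours. It does not guarantee a different mouth shape, since similar shapes (for example a closed mouth during silence) recur throughout a clip, so some label noise remains.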
There is little difference between using syncnet and not using it, and switching to the wav2lip method makes little difference either.