Why is the logic for long video generation not entirely consistent with what is described in the paper? #76

Open
arceus-jia opened this issue Nov 20, 2024 · 1 comment

Comments

@arceus-jia

Hello, thanks for the great algorithm and code!

I noticed in your paper that you mentioned two control methods:
"UniAnimate supports human video animation using only a reference image and a target pose sequence, as well as the input of a first frame."

For long videos, you mentioned:
"For subsequent segments, we use the reference image along with the first frame of the previous segment to initiate the next generation."

Does this mean that during inference for subsequent windows, I should use the last frame of the previous result as the reference image, or should I just pass it in as local_image?

Additionally, I noticed in your code that long video processing still uses the overlap approach. What is the reasoning behind this choice?
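To make the distinction concrete, here is a minimal sketch of the two strategies as I understand them. `generate_segment` is a placeholder for the actual UniAnimate inference call; its name and signature are hypothetical, and the overlap variant simply replaces overlapped frames rather than blending them as a real implementation might:

```python
# Hypothetical stand-in for one UniAnimate inference call:
# returns one "frame" per pose, tagged with its local_image conditioning.
def generate_segment(ref_image, pose_window, local_image=None):
    return [("frame", p, local_image) for p in pose_window]

def generate_long_video_first_frame(ref_image, poses, window=16):
    """Paper-style strategy: condition each new segment on the last
    frame of the previous segment, passed in as local_image."""
    frames, local = [], None
    for start in range(0, len(poses), window):
        seg = generate_segment(ref_image, poses[start:start + window],
                               local_image=local)
        frames.extend(seg)
        local = seg[-1]  # last frame seeds the next segment
    return frames

def generate_long_video_overlap(ref_image, poses, window=16, overlap=4):
    """Code-style strategy: adjacent windows share `overlap` poses;
    here the new segment's frames simply replace the overlapped tail."""
    frames, start = [], 0
    while start < len(poses):
        seg = generate_segment(ref_image, poses[start:start + window])
        frames = frames[:-overlap] + seg if frames else seg
        start += window - overlap
    return frames
```

Both loops produce one frame per pose; the difference is only in how each window is conditioned on its predecessor.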

Thank you

@Falkonar

Do you have any tips on this? I have the same question about long videos.
