Why is the logic for long video generation not entirely consistent with what is described in the paper? #76

Open
arceus-jia opened this issue Nov 20, 2024 · 1 comment

Comments

@arceus-jia

Hello, thanks for the great algorithm and code!

I noticed in your paper that you mentioned two control methods:
"UniAnimate supports human video animation using only a reference image and a target pose sequence, as well as the input of a first frame."

For long videos, you mentioned:
"For subsequent segments, we use the reference image along with the first frame of the previous segment to initiate the next generation."

Does this mean that during inference for subsequent windows, I should use the last frame of the previous result as the reference image, or should I just pass it in as local_image?

Additionally, I noticed in your code that long video processing still uses the overlap approach. What is the reasoning behind this choice?
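To make the distinction concrete, here is a minimal sketch of the two strategies as I understand them. `generate_segment` is a placeholder for the actual UniAnimate inference call; its name and signature are hypothetical, and the overlap variant simply replaces overlapped frames rather than blending them as a real implementation might:

```python
# Hypothetical stand-in for one UniAnimate inference call:
# returns one "frame" per pose, tagged with its local_image conditioning.
def generate_segment(ref_image, pose_window, local_image=None):
    return [("frame", p, local_image) for p in pose_window]

def generate_long_video_first_frame(ref_image, poses, window=16):
    """Paper-style strategy: condition each new segment on the last
    frame of the previous segment, passed in as local_image."""
    frames, local = [], None
    for start in range(0, len(poses), window):
        seg = generate_segment(ref_image, poses[start:start + window],
                               local_image=local)
        frames.extend(seg)
        local = seg[-1]  # last frame seeds the next segment
    return frames

def generate_long_video_overlap(ref_image, poses, window=16, overlap=4):
    """Code-style strategy: adjacent windows share `overlap` poses;
    here the new segment's frames simply replace the overlapped tail."""
    frames, start = [], 0
    while start < len(poses):
        seg = generate_segment(ref_image, poses[start:start + window])
        frames = frames[:-overlap] + seg if frames else seg
        start += window - overlap
    return frames
```

Both loops produce one frame per pose; the difference is only in how each window is conditioned on its predecessor.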

Thank you

@Falkonar

Do you have any tips on this? I have the same question about long videos.
