Hi there, first of all nice work!
Secondly, I wanted to use this model in a slightly different way. From the paper, it seemed to me that one or more reference images can be used during inference, and that the diffusion model is then conditioned on them.
However, going through the code, it seems that only one image is ever used for conditioning, since `condition_index = [0]` is set in run diffusion.
I understand this is always the case when generating a video from a single image, but for nvs_sparse_view it means that only one of the available images is used for conditioning.
Thanks for your help in advance!
Sisso16 changed the title from "Reference images to use for conditioning" to "Reference images used for conditioning" on Oct 25, 2024
Thank you! Here, `condition_index = [0]` means that one of the reference images is fed into the CLIP image encoder (depicted in the pipeline figure), which extracts high-level semantic information from the input image. We tested feeding all of the reference images and feeding only one reference image to the CLIP image encoder, and found no difference in model performance. We therefore use only one input in this case.
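For anyone who wants to experiment with this, here is a minimal sketch of what the index list controls. It is not the repository's code: `select_condition_images` and the file names are hypothetical, and the real pipeline works on image tensors rather than file names. It only illustrates choosing which reference images would be passed to the image encoder.

```python
from typing import List, Sequence


def select_condition_images(
    reference_images: Sequence[str],
    condition_index: Sequence[int],
) -> List[str]:
    """Return the reference images selected by condition_index.

    Hypothetical helper for illustration only; not part of the repository.
    """
    return [reference_images[i] for i in condition_index]


# Hypothetical sparse-view input with three reference images.
reference_images = ["view_0.png", "view_1.png", "view_2.png"]

# Setting discussed in this thread: only the first reference image is selected.
single = select_condition_images(reference_images, [0])

# Alternative: select every available reference image.
all_views = select_condition_images(
    reference_images, range(len(reference_images))
)
```

Whether the rest of the pipeline accepts more than one index without further changes would need to be checked against the actual code.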