
Conversation

@Dencel-CleverAI

Processing the spatial upscaler before the temporal upscaler has about the same performance, but uses less RAM and gives better video quality, since artifacts/ghosting are reduced.
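
A minimal sketch of the reordering, assuming a (T, C, H, W) frame tensor. `interpolate_temporal_2x` is a naive frame-blending stand-in for the real RIFE interpolator, and none of these names are actual wgp.py functions:

```python
import torch
import torch.nn.functional as F

def interpolate_temporal_2x(frames: torch.Tensor) -> torch.Tensor:
    """Naive 2x temporal interpolation (frame blending) as a stand-in for RIFE."""
    mids = (frames[:-1] + frames[1:]) / 2                     # midpoints between neighbours
    out = torch.empty((frames.shape[0] * 2 - 1, *frames.shape[1:]), dtype=frames.dtype)
    out[0::2] = frames
    out[1::2] = mids
    return out

def postprocess(frames: torch.Tensor, scale: float = 1.5) -> torch.Tensor:
    """Spatial upscaling first, temporal interpolation second."""
    # 1) Spatial upscale runs on the original, shorter frame sequence.
    frames = F.interpolate(frames, scale_factor=scale, mode="bicubic", align_corners=False)
    # 2) Temporal interpolation afterwards, so intermediate frames are blended
    #    from already-upscaled frames instead of the upscaler magnifying ghosted blends.
    return interpolate_temporal_2x(frames)
```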

@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 20, 2025

  1. (Done) I added 20 FPS as an option for the model default frame rate. For a model that defaults to 16 FPS, 20 FPS makes the movement 1.25x faster, so it no longer looks like slow motion but like a realistic movement speed. The performance cost to reach the same video length of 5 s is also not too bad (81 frames at 16 FPS vs. 101 frames at 20 FPS); generation takes about 30% longer. The arithmetic behind these numbers is sketched after this list.

  2. (Failed) Moreover, I tried to add 3x temporal upsampling to reach 60 FPS, but I failed, as RIFE only supports power-of-two factors (2, 4, 8, 16, etc.). If someone could make 3x possible, that would be the cherry on top.

  3. (Failed) I also failed to crop the resolution: a 1280x720 video upscaled by a factor of 1.5 ends up at 1920x1088 instead of 1080p. It seems the save_video method always forces the resolution to be divisible by 16.

  4. (Done) I initially failed to make it possible that the last video stays untouched while the newly generated video is upsampled first and only then combined with it when "Continue Last Video" is set (solved in the follow-ups below). That way the whole long video doesn't need to be re-upsampled, which would require more and more RAM and take longer and longer; instead only the new part is processed, which stays manageable.

  5. (Failed) I haven't figured out a way to make custom checkpoints automatically usable, especially how to use integer-quantized .gguf checkpoints instead of .safetensors.
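
To make the numbers in points 1 and 3 concrete, here is a small sketch of the frame-count and resolution arithmetic. It assumes the model needs 4n+1 frames and that save_video rounds each dimension up to a multiple of 16; both are assumptions inferred from the observed numbers, not verified against the code:

```python
import math

def frames_for_duration(seconds: float, fps: int) -> int:
    """Smallest 4n+1 frame count covering the duration (assumed model constraint)."""
    n = math.ceil((seconds * fps - 1) / 4)
    return 4 * n + 1

def align16(x: int) -> int:
    """Round up to the next multiple of 16 (assumed save_video behaviour)."""
    return math.ceil(x / 16) * 16

print(frames_for_duration(5, 16))   # 81 frames at 16 FPS
print(frames_for_duration(5, 20))   # 101 frames at 20 FPS
print(align16(int(720 * 1.5)))      # 1088 instead of 1080
```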

Other problems I noticed:
A. The colors of the end frame of the first video and the start frame of the second video don't match. I think it has to do with the VAE encoding/decoding. However, when adding an end image, the video transitions to the correct color of that image, so why is the start wrong?

B. The motion does not match either, as the second video only gets the last frame of the first video as information. I tried passing multiple frames, but then it only generates weird noisy artifacts at the beginning. Something like motion vectors would be nice.

C. The quality of the video degrades further and further with each new generation. The only "trick" I found is to use an end image every 10 s (every second generation) to avoid this degradation, as the model re-anchors on this high-quality reference image.

@deepbeepmeep
Owner

Thx, nice idea. Do you have some sample videos that compare the spatial/temporal and temporal/spatial order?

@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 23, 2025

@deepbeepmeep Alright, will do.

I have seen in the code that there is a way to apply post-processing to a video (wgp.py line 4576). So can I generate the video without upsampling first and then apply the upsampling later? I don't see how to do that in the UI.

@Dencel-CleverAI
Author

I can't upload them here, as they are a bit too big.
https://drive.google.com/file/d/1o6Q9bfS9HRCi5zYlF8vv5MEsNKvwgNmm

I have now generated a few, always from scratch, with the specific spatial-vs-temporal order. The morphing and noise from the model itself (before upsampling) make it hard to see, but have a look at the dude's face in the I2V sample.

In the end, even if you don't see a difference here due to the noise, RAM usage is a bit lower in some cases.

@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 25, 2025

Belongs to point 4:
I was able to implement that the upsampling is only applied to the new generation, which is then merged with the last video. However, RAM usage still goes through the roof the longer the video gets, and the quality degrades with each further generation too. So it's still not usable to generate long upsampled videos. I have to find out what eats up the RAM.

@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 26, 2025

I exposed the Fit Canvas value, as it was hard-coded to 0, which means always "Resolution Budget (pixels will be reallocated to preserve the input's W/H ratio)". Besides that, one can now also choose "Outer Box Resolution (one dimension may be less to preserve the video's W/H ratio)" or "Output Resolution (input images will be cropped if the W/H ratio differs)".
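
A rough sketch of how I read the three fit modes; the function and the mode numbering are illustrative only, and the actual logic in wgp.py may differ:

```python
import math

def fit_canvas(in_w: int, in_h: int, out_w: int, out_h: int, mode: int) -> tuple[int, int]:
    """mode 0: resolution budget - keep the input W/H ratio, target the same pixel count.
    mode 1: outer box - fit inside out_w x out_h, one dimension may end up smaller.
    mode 2: output resolution - exactly out_w x out_h, cropping the input if ratios differ."""
    ratio = in_w / in_h
    if mode == 0:
        h = math.sqrt(out_w * out_h / ratio)
        return round(ratio * h), round(h)
    if mode == 1:
        scale = min(out_w / in_w, out_h / in_h)
        return round(in_w * scale), round(in_h * scale)
    return out_w, out_h

print(fit_canvas(1280, 720, 1920, 1080, 0))  # (1920, 1080): same ratio, full pixel budget
```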

Belongs to point 4:
I have found out that spatial upsampling RAM usage is low and temporal is higher. Processing a bigger source video uses even more, but saving all of it with save_video takes the most RAM. I think we should save the generated video to disk first and then merge it with the last video via ffmpeg, to avoid creating a huge tensor in RAM.
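
As a rough back-of-the-envelope check of where the RAM goes: the decoded video lives as an uncompressed frame tensor whose size grows linearly with the length (float32 RGB is assumed here; the actual dtype in the pipeline may differ), and torch.cat allocates a new tensor on top, so the old and the combined video briefly sit in memory at the same time:

```python
def video_tensor_gb(frames: int, width: int, height: int, bytes_per_value: int = 4) -> float:
    """Approximate size of an uncompressed (T, 3, H, W) video tensor in GB."""
    return frames * 3 * width * height * bytes_per_value / 1024**3

# 15 s at 40 FPS and 1920x1088 is already ~14 GB before any extra copy.
print(round(video_tensor_gb(15 * 40, 1920, 1088), 1))
```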

@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 27, 2025

Belongs to point 4:
I was finally able to fix the RAM problem with continuing videos. Instead of torch.cat, which creates an ever bigger tensor the longer the videos get and eats a lot of RAM, I used ffmpeg on the saved video sample (5 s) and merged it with the other video (10 s or longer). I kept the 5 s generation as a preview alongside the combined one, but only the combined one is shown in the UI, so that "Continue Last Video" works flawlessly.
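
A minimal sketch of that kind of merge, using ffmpeg's concat demuxer so the clips are streamed from disk rather than concatenated as one tensor in RAM (file names are placeholders, and this is not the literal code from the patch):

```python
import subprocess
import tempfile

def concat_videos(previous_path: str, new_path: str, output_path: str) -> None:
    """Merge two already-encoded clips with ffmpeg's concat demuxer (stream copy, no re-encode)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(f"file '{previous_path}'\nfile '{new_path}'\n")
        list_path = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", output_path],
        check=True,
    )

# concat_videos("last_video.mp4", "new_5s_segment.mp4", "combined.mp4")
```

Note that stream copy only works when both clips share the same codec parameters, resolution and frame rate.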

No more RAM spikes and much longer upsampled videos are now possible!

@Dencel-CleverAI Dencel-CleverAI changed the title Changed order of spatial and temporal upscaling Spatial and temporal upscaling order; Continue Video less RAM Dec 27, 2025
@Dencel-CleverAI
Author

Dencel-CleverAI commented Dec 30, 2025

@deepbeepmeep You're welcome to take this into the main branch now. I haven't found any more bugs, and the remaining problems are either too complex (I would have to change too much deep down in the code) or will be addressed by future AI models anyway.

These are the changes:

I. Spatial upsampling is now done before temporal upsampling -> less RAM usage and fewer ghosting artifacts

II. Added 20 FPS as a model default to choose from the dropdown -> speeds up the movement by 1.25x and makes it feel realistic instead of slow motion

III. Exposed Fit_Canvas in the UI and saved it as a model-specific setting -> the user can choose whether the video should be scaled or cropped to match the input image/video

IV. Each individual step of the video generation is exposed in more detail in the progress bar -> the user now knows better which step takes long or uses a lot of resources during generation

V. If continue (last) video is active, it first checks whether the result of the generation would have the correct resolution and FPS before starting the actual generation process; otherwise it stops and returns a user-friendly error

VI. If continue (last) video is active, only the newly generated video is upsampled, not the last video anymore -> avoids big RAM spikes, since only short videos are upsampled instead of ever longer ones

VII. If continue (last) video is active, only the newly generated video is saved and then combined with the last video via ffmpeg -> ffmpeg only uses ~2 GB of RAM and is very fast, which enables continued video generation at any length; previously, an ever bigger tensor ate up more and more RAM, took longer and longer to process, and I could not get past 15 s of video length (3x 5 s videos)

With all these changes, I am now able to create 1920x1088 videos at 40 FPS (60 would require 3x RIFEx temporal interpolation) and have easily surpassed 2 min of length without quality loss by feeding an end image every second generation (after every 10 s). Unfortunately, this doesn't fix the color discrepancy (VAE encoding/decoding) or the motion discrepancy (only mitigated by removing the last and first frame) between generations, but that is a problem for future AI models to solve.

@deepbeepmeep
Owner

I think with the latest RAM optimizations this may no longer be needed. Have you had the chance to compare?

@Dencel-CleverAI
Author

I think with the latest RAM optimizations this may no longer be needed. Have you had the chance to compare?

Here it is, if you mean this comparison:

I can't upload them here, as they are a bit too big. https://drive.google.com/file/d/1o6Q9bfS9HRCi5zYlF8vv5MEsNKvwgNmm

I have now generated a few, always from scratch, with the specific spatial-vs-temporal order. The morphing and noise from the model itself (before upsampling) make it hard to see, but have a look at the dude's face in the I2V sample.

In the end, even if you don't see a difference here due to the noise, RAM usage is a bit lower in some cases.

As long as your latest update lets you continue videos over and over again without running out of RAM, you're good to go.
