feat: support most of features in ccvfi#4
Conversation
Summary of Changes

Hello @Tohrusky, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces comprehensive support for Video Frame Interpolation (VFI). It integrates two prominent VFI models, RIFE and DRBA, providing their respective architectures, configurations, and inference mechanisms. A key aspect of this feature is seamless integration with VapourSynth, allowing for efficient video processing and frame generation. The changes also include necessary refactorings of core model and configuration classes, along with new utility functions for VFI-specific operations such as scene change detection. Additionally, a Docker Compose setup has been added to streamline the development workflow.
Code Review
This pull request introduces significant new functionality for Video Frame Interpolation (VFI), including new model architectures, configurations, and VapourSynth integration. The changes are extensive and well-structured. I've identified several critical issues related to tensor manipulation and global PyTorch settings that require attention. Additionally, I've noted some medium to high severity issues concerning potential bugs, code clarity, and best practices. Addressing these points will improve the robustness and maintainability of the new features.
```python
disable_drm = False
if (_left_scene and not _right_scene) or (not _left_scene and _right_scene):
    drm01r, drm21r = (ones_mask.clone() * 0.5 for _ in range(2))
```
Unpacking a generator expression into two targets is actually valid Python, so this line does assign the two tensor values rather than a generator object; still, the explicit form is easier to read:

```diff
-drm01r, drm21r = (ones_mask.clone() * 0.5 for _ in range(2))
+drm01r, drm21r = ones_mask.clone() * 0.5, ones_mask.clone() * 0.5
```
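For reference, tuple unpacking consumes any iterable of matching length, generator expressions included, so the original line runs without error; a minimal standalone check (not from the PR):

```python
# Unpacking a two-element generator expression binds both produced values,
# exactly as unpacking a tuple or list would.
a, b = (n * 0.5 for n in (1, 2))
print(a, b)  # 0.5 1.0
```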
```python
grid_cache = {}
batch_cache = {}
torch.set_float32_matmul_precision("medium")
torch.set_grad_enabled(False)
```
Setting torch.set_grad_enabled(False) at the module level is dangerous as it disables gradient computation globally for your entire application. This can have unintended side effects. The functions in this file are already correctly decorated with @torch.inference_mode(), which handles this locally. This global setting is unnecessary and should be removed.
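A sketch of the local alternative, using a hypothetical entry point: `torch.inference_mode()` scopes the no-grad behavior to the decorated call, leaving global autograd state untouched.

```python
import torch

@torch.inference_mode()
def interpolate(x: torch.Tensor) -> torch.Tensor:
    # Autograd is disabled only while this function runs.
    return x * 0.5

x = torch.ones(2, requires_grad=True)
print(interpolate(x).requires_grad)  # False: result is an inference tensor

# Outside the decorated call, autograd still works as usual.
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2.])
```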
```python
I0 = resize(I0, scale).unsqueeze(0)
I1 = resize(I1, scale).unsqueeze(0)
I2 = resize(I2, scale).unsqueeze(0)

inp = torch.cat([I0, I1, I2], dim=1)
```
The input tensor inp is being constructed with an incorrect shape. The unsqueeze(0) and torch.cat combination results in a shape of (1, 3*B, C, H', W'), but the DRBA model expects (B, 3, C, H', W'). You should use torch.stack to create the correct input shape.
```diff
-I0 = resize(I0, scale).unsqueeze(0)
-I1 = resize(I1, scale).unsqueeze(0)
-I2 = resize(I2, scale).unsqueeze(0)
-inp = torch.cat([I0, I1, I2], dim=1)
+I0 = resize(I0, scale)
+I1 = resize(I1, scale)
+I2 = resize(I2, scale)
+inp = torch.stack([I0, I1, I2], dim=1)
```
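The shape arithmetic behind this suggestion can be checked in isolation: `unsqueeze(0)` plus `cat` merges the frames into the second dimension under a spurious batch of 1, while `stack` inserts a new frame dimension per batch element. A small demo with assumed toy dimensions:

```python
import torch

# Three frame batches of shape (B, C, H, W) should become (B, 3, C, H, W).
B, C, H, W = 2, 3, 8, 8
I0, I1, I2 = (torch.randn(B, C, H, W) for _ in range(3))

wrong = torch.cat([I0.unsqueeze(0), I1.unsqueeze(0), I2.unsqueeze(0)], dim=1)
right = torch.stack([I0, I1, I2], dim=1)

print(wrong.shape)  # torch.Size([1, 6, 3, 8, 8]) -- batch dim collapsed
print(right.shape)  # torch.Size([2, 3, 3, 8, 8])
```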
```python
results, reuse = self.model(inp, minus_t, zero_t, plus_t, left_scene_change, right_scene_change, scale, reuse)
```

```python
results = torch.cat(tuple(de_resize(result, h, w).unsqueeze(0) for result in results), dim=1)
```
The output tensor results is being constructed with an incorrect shape. The current implementation with unsqueeze(0) and torch.cat results in a shape of (1, N*B, C, h, w). The expected output shape is (B, N, C, h, w). Using torch.stack will produce the correct shape.
```diff
-results = torch.cat(tuple(de_resize(result, h, w).unsqueeze(0) for result in results), dim=1)
+results = torch.stack([de_resize(r, h, w) for r in results], dim=1)
```
```python
if _right_scene:
    for _ in plus_t:
        zero_t = np.append(zero_t, 0)
```
```python
    torch.cat((img0[:, :3], img1[:, :3], f0, f1, timestep), 1), None, scale=scale_list[i]
)
if ensemble:
    print("warning: ensemble is not supported since RIFEv4.21")
```
```python
_drm01r[holes01r] = _drm01r[holes01r]
_drm21r[holes21r] = _drm21r[holes21r]
```
```python
elif True:
    print(strVariable, type(objValue))
```
```python
:return: 1 output frames (img0_1)
"""
if len(img_list) != 2:
    raise ValueError("IFNet img_list must contain 2 images")
```
cccv/vs/vfi.py (Outdated)
```python
return vfi_methods[num_frame](inference, clip, mapper, scale, scdet, scdet_threshold, device)


def inference_vsr_two_frame_in(
```
The function inference_vsr_two_frame_in (and inference_vsr_three_frame_in on line 160) is located in vfi.py and handles Video Frame Interpolation. The vsr in the name is misleading. Consider renaming them to inference_vfi_two_frame_in and inference_vfi_three_frame_in respectively to better reflect their purpose.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main       #4      +/-   ##
==========================================
+ Coverage   92.81%   93.05%   +0.24%
==========================================
  Files          35       40       +5
  Lines         918     1166     +248
==========================================
+ Hits          852     1085     +233
- Misses         66       81      +15
```

☔ View full report in Codecov by Sentry.
Pull Request Overview
This PR adds comprehensive support for Video Frame Interpolation (VFI) functionality to the CCCV library, expanding beyond its existing Super Resolution (SR) and Video Super Resolution (VSR) capabilities.
- Implements VFI base models and specific implementations for RIFE and DRBA architectures
- Adds VapourSynth integration for video frame interpolation processing
- Introduces utility functions for scene detection, temporal mapping, and image processing operations
Reviewed Changes
Copilot reviewed 44 out of 57 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| vs-docker-compose.yml | Adds Docker Compose configuration for development environment |
| cccv/model/vfi_base_model.py | Implements base VFI model class with video inference capabilities |
| cccv/model/vfi/rife_model.py | RIFE model implementation for two-frame interpolation |
| cccv/model/vfi/drba_model.py | DRBA model implementation for three-frame interpolation |
| cccv/vs/vfi.py | VapourSynth integration for VFI processing |
| cccv/util/misc.py | Utility functions for interpolation, scene detection, and image operations |
| tests/vfi/ | Test files for VFI functionality |
| tests/util.py | Enhanced test utilities with VFI support |
Comments suppressed due to low confidence (1)
cccv/vs/vfi.py:1
- The error message refers to 'length of the input frames' but the parameter is named 'num_frame'. The message should be: '[CCCV] The number of input frames should be odd'
```python
import math
```
```python
for k in [
    EVAL_IMG_PATH_DRBA_0,
    EVAL_IMG_PATH_DRBA_1,
    EVAL_IMG_PATH_DRBA_2,
    EVAL_IMG_PATH_DRBA_3,
    EVAL_IMG_PATH_DRBA_4,
]
```
[nitpick] The hardcoded list of DRBA evaluation image paths creates maintenance overhead. Consider using a range-based approach or storing the paths in a list to make it easier to modify the number of evaluation images.
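One possible range-based shape, with the caveat that the directory and filename pattern below are illustrative assumptions, not the repository's actual layout:

```python
from pathlib import Path

# Generate the evaluation image paths from a single count instead of
# maintaining five separate constants (names here are hypothetical).
ASSETS_DIR = Path("assets")
NUM_DRBA_EVAL_IMAGES = 5

eval_img_paths = [ASSETS_DIR / f"test_drba_{i}.png" for i in range(NUM_DRBA_EVAL_IMAGES)]

for k in eval_img_paths:
    print(k)  # iterate exactly as the hardcoded list did
```

Growing the evaluation set then means changing one number rather than adding a constant and editing the list.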
```python
def test_resize() -> None:
    img = torch.randn(1, 3, 64, 64)  # 创建一个随机的 4D 张量
```
Chinese comment should be translated to English for consistency with the rest of the codebase: '# Create a random 4D tensor'
```python
if src_fps > tar_fps:
    raise ValueError("[CCCV] The target fps should be greater than the clip fps")

if scale < 0 or not math.log2(scale).is_integer():
```
The condition scale < 0 should be scale <= 0. A scale of 0 slips past the sign check, and math.log2(0) then raises a bare ValueError (math domain error) instead of the intended validation error; a zero scale would also cause issues in subsequent calculations.
```diff
-if scale < 0 or not math.log2(scale).is_integer():
+if scale <= 0 or not math.log2(scale).is_integer():
```
cccv/util/misc.py (Outdated)
```python
# Flow distance calculator


```
[nitpick] The comment '# Flow distance calculator' is disconnected from the function it describes due to extra blank lines. Move the comment directly above the function definition for better code readability.
```python
# Flow distance calculator
```
```python
for _ in plus_t:
    zero_t = np.append(zero_t, 0)
```
Using np.append() in a loop is inefficient as it creates a new array each time. Consider extending the list with [0] * len(plus_t) or converting to list, extending, then back to numpy array if needed.
```diff
-for _ in plus_t:
-    zero_t = np.append(zero_t, 0)
+zero_t = np.concatenate([zero_t, np.zeros(len(plus_t), dtype=zero_t.dtype)])
```
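The two forms are equivalent in result but not in cost: np.append copies the whole array on every iteration, while a single concatenate allocates once. A quick equivalence check with toy arrays:

```python
import numpy as np

zero_t = np.array([0, 0, 1])
plus_t = np.array([1, 2])

# Loop version: each np.append builds a brand-new array (O(n) copy per step).
looped = zero_t.copy()
for _ in plus_t:
    looped = np.append(looped, 0)

# Vectorized version: one allocation, one copy.
vectorized = np.concatenate([zero_t, np.zeros(len(plus_t), dtype=zero_t.dtype)])

print(looped)      # [0 0 1 0 0]
print(vectorized)  # [0 0 1 0 0]
```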
No description provided.