-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add video support for Qwen2-VL #187
base: main
Are you sure you want to change the base?
Conversation
4bb020c
to
9e894fd
Compare
// Collect the frames | ||
var ciImages: [CIImage] = [] | ||
for sampledTime in sampledTimes { | ||
guard let generatedImage = try? await generator.image(at: sampledTime) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the for: [CMTime]
might be more appropriate here -- I think the at:
may have to redo work each time.
guard let generatedImage = try? await generator.image(at: sampledTime) else { | ||
continue | ||
} | ||
guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB), let convertedImage = generatedImage.image.copy(colorSpace: colorSpace) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be more efficient to let CI do this? Doing the colorspace conversion in CG space will do it on the CPU (likely) while doing it in CI is already doing colorspace conversions, e.g. into the working space.
I think these lines could just be omitted and maybe make the CIImage directly from the provided image.
Another option (if the performance isn't acceptable) is to use an AVAssetReader/AVAssetReaderOutput and consume CMSampleBuffers. It is possible that the CGImageRef here refers to an IOSurface but the CMSampleBuffer will almost certainly be the IOSurface (direct output of the decoder). This API is much simpler though :-)
print("thw: \(gridThw)") | ||
print("pixel vaues, size and actual:") | ||
print(pixelValues.size) | ||
print(pixelValues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug code or switch to logs
This looks great! Check out my comments and see what you think. The CI checks failed on swift-format -- please run the pre-commit hooks. Thank you! |
Overview
This PR adds video processing capabilities to Qwen2-VL by:
Performance
Results
The whole inference procedure, including frame extraction and processing: