
Fitting the RAFT model 16GB model on 2080Ti? #3

Open
dhruvmetha opened this issue Apr 5, 2022 · 15 comments

@dhruvmetha

Hey, a question about the optical flow training: was the RAFT model shrunk down in some way to fit on the 11GB GPU that you mention in the paper?

If so, will the code for training the RAFT optical flow model also be released?

@wenbin-lin
Owner

Hi, thanks for your interest!

We only made a few changes to the open-source RAFT implementation to adapt it to RGB-D input; no additional modifications were made for GPU memory.

@dhruvmetha
Author

dhruvmetha commented Apr 7, 2022

Thanks for the response!

Will the code for this RGB-D adaptation be released?
If not, could you give a high-level overview of how I could go about it? I'm trying to replicate the paper for RGB-D inputs using the raw RAFT code. Is it just adding the inverse of the depth as an extra channel to the RGB image, with everything else remaining the same?

This information would be of great help!

@wenbin-lin
Owner

We do not have plans to release the code for RGB-D-based RAFT training for now; it's actually quite simple to implement.
As you mentioned, we just add the inverse of the depth as an extra channel and keep the rest the same.
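
For reference, a minimal sketch of what that adaptation could look like (the helper name and the widened first convolution below are assumptions, not the authors' actual code):

```python
import torch
import torch.nn as nn

def make_rgbd_input(rgb, depth, eps=1e-6):
    """Append inverse depth as a 4th input channel (hypothetical helper).

    rgb:   (B, 3, H, W) float tensor
    depth: (B, 1, H, W) float tensor of metric depth
    """
    inv_depth = 1.0 / depth.clamp(min=eps)  # clamp avoids division by zero
    return torch.cat([rgb, inv_depth], dim=1)  # (B, 4, H, W)

# RAFT's feature and context encoders would then need their first
# convolution widened from 3 to 4 input channels, e.g.:
conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
```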

@dhruvmetha
Author

dhruvmetha commented Apr 11, 2022

Thank you! It is mentioned that you retrain on three datasets: Sintel, FlyingThings3D, and Monkaa. Do you train on them successively, in that order, for 100k iterations each? Sorry to be asking so many questions!

@wenbin-lin
Owner

We train the model successively in the order of FlyingThings3D -> Monkaa -> Sintel for 100k iterations each.
If there is any confusion about it, please feel free to let me know.
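
A sketch of that schedule in terms of RAFT's own train.py flags (--stage, --num_steps, and --restore_ckpt exist in stock RAFT; the 'monkaa' stage name and the checkpoint paths are assumptions, since stock RAFT ships no Monkaa loader):

```python
# Print the training commands for the three successive stages.
stages = [
    ("things", 100_000),  # FlyingThings3D
    ("monkaa", 100_000),  # Monkaa (hypothetical stage, needs a loader)
    ("sintel", 100_000),  # Sintel
]

checkpoint = None
for stage, num_steps in stages:
    cmd = f"python -u train.py --name raft-{stage} --stage {stage} --num_steps {num_steps}"
    if checkpoint:
        cmd += f" --restore_ckpt {checkpoint}"
    print(cmd)
    checkpoint = f"checkpoints/raft-{stage}.pth"  # assumed output path
```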

@dhruvmetha
Author

dhruvmetha commented Apr 12, 2022

Do y'all freeze the backbone after training on FlyingThings3D, or just freeze the batchnorm inside the backbone, as is done in the original RAFT paper? Also, do you use the smaller FlyingThings3D dataset (the subset used for DispNet/FlowNet2.0)? Thanks in advance, appreciate the help!

@wenbin-lin
Owner

We follow the RAFT implementation and just freeze the batchnorm after training on FlyingThings3D.
And we use the full FlyingThings3D dataset instead of the smaller subset.
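
Freezing the batchnorm looks roughly like this in PyTorch (a sketch of what RAFT's freeze_bn method does; the stand-in model is only for illustration):

```python
import torch.nn as nn

def freeze_bn(model):
    """Put every BatchNorm2d layer into eval mode so its running
    statistics stop updating during fine-tuning."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))  # stand-in
model.train()     # training mode for everything else
freeze_bn(model)  # re-apply after every call to model.train()
```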

@dhruvmetha
Author

Thanks, this has been really helpful, @wenbin-lin!

@dhruvmetha
Author

Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

@wenbin-lin
Owner

Your equation is right.
The inverse depth is 1 / depth; there can be large depth values in the background, and using the inverse depth stabilizes the values.
In addition, we use a min-max scaler for the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.
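
A quick numerical check of that equivalence: depth is proportional to 1/disparity, so the inverse depth is a positive scalar multiple of the disparity, and min-max scaling cancels that constant. (The sketch below assumes per-image scaling.)

```python
import numpy as np

def minmax(x):
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
disparity = rng.uniform(1.0, 100.0, size=(4, 4))

c = 1050.0 * 1.0        # any positive constant works here
depth = c / disparity   # depth from disparity, per the equation above
inv_depth = 1.0 / depth  # = disparity / c

# min-max scaling removes the constant, so the values are identical
assert np.allclose(minmax(inv_depth), minmax(disparity))
```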

@dhruvmetha
Author

Thank you @wenbin-lin

@dhruvmetha
Author

Do y'all have any rough evaluation results for the optical flow model through each phase of training? This would really help me verify that I'm training the model correctly!

@wenbin-lin
Owner

We are sorry that we lost the training log, but we are retraining the RGB-D-based optical flow model. When the training is done, we will share the evaluation results with you.

A rough conclusion is that the evaluation errors of the RGB-D-based method can be significantly lower than those of the RGB-based method. Perhaps you can compare your results with those of the original RGB-based RAFT; your error should be much lower.

@phamtrongthang123

phamtrongthang123 commented Jul 9, 2022

> Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

> Your equation is right. The inverse depth is 1 / depth, as there can be large depth values in the background and using the inverse depth stabilizes the values. In addition, we use a min-max scaler for the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.

Wait, it should be focal_length * baseline / disparity, right? Why do we need to multiply by image_width there?
Also, is the scaler applied to each depth map individually, or are x_min and x_max taken over the whole dataset?

@Guptajakala

@wenbin-lin Hi, is there any update on retraining the RGB-D optical flow model? I'm working on a research project and am eager to try your method out!
