Mask calculation #3
Hi,
I got it, thanks for your reply. By the way, if training is done on a 1080 Ti, 2080 Ti, or V100, why run the inference comparison on a single 1050 Ti?
I only have a 1050 Ti in my work machine (the more powerful GPUs are in the servers). I also intended this method for low-computation devices (mobile or laptops). I don't think it makes much sense to use it on very powerful GPUs, since overhead becomes a much more important factor there in order to fully utilize them. If I ever get my hands on an NVIDIA Jetson, I'd like to check the performance there.
Thanks for your patient reply. Regarding "I don't think it makes much sense to use it on very powerful GPUs, since overhead becomes a much more important factor there in order to fully utilize the GPUs": does that mean that if we run inference on a 2080 Ti or V100, we will not get as high a speedup ratio as on the 1050 Ti (60% speedup)?
I tried it now on a 1080 Ti, and with a larger batch size (128) it seems OK. Still, this work is experimental and limited to depthwise convolutions for now (e.g., as in MobileNetV2). In practice, the accuracy-speed trade-off of MobileNetV2 on powerful GPUs is barely better than that of a standard ResNet. Also, this work is not compatible with TensorRT, which would probably give a better and more consistent speedup. So this is more a proof of concept than production-ready work; ideally it would need to be integrated into low-level CUDA libraries for better support. (command used: …)
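(The exact command was cut off in the thread and is left elided above. For readers who want to reproduce this kind of comparison, here is a minimal timing sketch in PyTorch; the MobileNetV2 stand-in, input resolution, and iteration counts are assumptions, not the author's actual setup.)

```python
import time

import torch
import torchvision.models as models

# Stand-in network: MobileNetV2, since the work targets depthwise convolutions.
model = models.mobilenet_v2().cuda().eval()
x = torch.randn(128, 3, 224, 224, device="cuda")  # batch size 128, as tested above

with torch.no_grad():
    for _ in range(10):            # warm-up so CUDA kernels are compiled and cached
        model(x)
    torch.cuda.synchronize()       # wait for pending kernels before timing
    start = time.time()
    for _ in range(50):
        model(x)
    torch.cuda.synchronize()
    print(f"avg latency per batch: {(time.time() - start) / 50 * 1000:.1f} ms")
```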
Great work!!! Your patient reply helps me a lot in understanding your paper and its novel idea. With the more powerful 1080 Ti, the results show 60% speedup (batch 32) and 96% speedup (batch 128), respectively, so can we say it also makes sense on powerful GPUs? One field where I think this idea can be used is what you mention in the "Conclusion and future work" section: high-resolution images might be processed much faster.
Insightful work!!!
While studying your paper, I ran into some questions (my English is not very good, and I am not being aggressive, just a bit confused):
However, the paper says, "Note that this formulation has no logarithms or exponentials in the forward pass, typically expensive computations on hardware platforms."
So in the code, why not just use `soft >= 0`, with no sigmoid operation?
Thanks for your kind help!
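For context, the question above concerns the straight-through estimator commonly used for such hard masks. A minimal sketch follows (the variable and function names are mine, not necessarily the repository's): the forward pass uses only the cheap comparison `soft >= 0`, while the sigmoid exists solely to provide a gradient during training.

```python
import torch

def hard_mask(soft: torch.Tensor) -> torch.Tensor:
    """Straight-through mask: hard threshold forward, sigmoid gradient backward."""
    hard = (soft >= 0).float()    # forward value: no logarithms or exponentials
    smooth = torch.sigmoid(soft)  # smooth surrogate, used only for gradients
    # detach() makes the forward output equal `hard`, while autograd routes
    # gradients through `smooth`.
    return (hard - smooth).detach() + smooth

# Example: mask logits for a 1x1x4x4 feature map
logits = torch.randn(1, 1, 4, 4, requires_grad=True)
mask = hard_mask(logits)   # binary values, yet differentiable
mask.sum().backward()      # gradients reach `logits` via the sigmoid
```

In this pattern, dropping the sigmoid and keeping only `soft >= 0` would make the mask non-differentiable, so the mask logits could not be trained; at inference time the sigmoid branch contributes nothing to the output, which is consistent with the paper's claim of no logarithms or exponentials in the forward pass.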