Wonderful job! I have studied your paper and code over the past few days, and they have been very enlightening.
I have a question about the code that computes the soft mask via soft = self.maskconv(x). I'm not quite sure why this particular (conv + FC) network was chosen to compute the soft mask. Thank you for your kind help.
Hi, thanks for your interest in our work! I suppose you are referring to the classification code.
In classification, we use the same mask unit as the work we compare with (SACT). The mask unit is shown in Figure 6 of their paper ( https://arxiv.org/abs/1612.02297 ). It uses a convolution combined with global average pooling to capture image-level context. We refer to this as the "squeeze unit", as it resembles the squeeze operation of SqueezeNet.
However, on pose estimation we noticed that the squeeze unit incurred a significant inference-speed penalty (even though its FLOP count is negligible). Table 2 in our paper ( https://arxiv.org/pdf/1912.03203.pdf ) compares the accuracy and inference speed of a simple 1x1 convolution against the squeeze unit: the 1x1 convolution gives slightly lower accuracy but faster inference.
Therefore:
Classification experiments -> Squeeze unit for accuracy reasons
Pose estimation -> 1x1 convolution for inference speed reasons
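To make the difference concrete, here is a minimal NumPy sketch of the two mask units. This is not the repository's actual PyTorch code; the function names, weight shapes, and the single-output-channel assumption are all illustrative. The key point is that the 1x1 convolution is purely per-pixel, while the squeeze unit adds a global-average-pooled, fully-connected term that injects image-level context into every spatial location.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

def mask_1x1(x, w, b):
    """Simple mask unit: per-pixel soft-mask logits from a 1x1 conv alone."""
    return conv1x1(x, w, b)

def mask_squeeze(x, w_conv, b_conv, w_fc, b_fc):
    """Squeeze-style mask unit (sketch): local 1x1-conv response plus a
    global term computed by average-pooling the input and passing it
    through a fully-connected layer, broadcast over all pixels."""
    local = conv1x1(x, w_conv, b_conv)         # (C_out, H, W), per-pixel
    context = x.mean(axis=(1, 2))              # (C_in,) global average pool
    global_term = w_fc @ context + b_fc        # (C_out,) image-level context
    return local + global_term[:, None, None]  # broadcast context over H, W
```

The extra global term is cheap in FLOPs (one pooled vector and one small matrix-vector product), which is why the cost difference shows up in wall-clock inference time rather than in the FLOP count.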