
how to get the video level "weak" label #3

Open
xiaoyiming opened this issue Nov 13, 2018 · 5 comments

Comments

@xiaoyiming

Dear Mr. Gao,
Thank you so much for the great work. However, I ran into some problems when implementing this code.
As described in your paper, "For the visual frames, we use an ImageNet pre-trained ResNet-152 network [34] to make object category predictions, and we max-pool over predictions of all frames to obtain a video-level prediction. The top labels (with class probability larger than a threshold = 0.3) are used as weak 'labels' for the unlabeled video."
However, when I use the pre-trained ResNet-152 network, I only get one category prediction larger than the threshold. How can I obtain multiple labels from the pre-trained ResNet-152 network?
Should I train an object detection network, a multi-class multi-label network, or use some other solution? Thank you for your assistance.
Best regards!

@rhgao
Owner

rhgao commented Nov 13, 2018

Hi,

We didn't use all 1000 ImageNet classes, but ~20 selected audio-related classes. Then we normalize the class probabilities over these classes, so you can get multiple labels with class probability larger than the threshold. Also, 0.3 is just an empirical choice.

Thanks for your interest!
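
For readers landing here later, a minimal sketch of the procedure described above, assuming a standard torchvision ResNet-152 and a placeholder list of audio-related class indices (the actual ~20 classes used in the paper are not listed in this thread, so the indices below are hypothetical):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Hypothetical ImageNet class indices for audio-related categories; the real
# list of ~20 classes used by the authors is not given in this thread.
AUDIO_RELATED_CLASSES = [402, 486, 546, 579, 889]
THRESHOLD = 0.3  # empirical, per the comment above

model = models.resnet152(pretrained=True).eval()

def video_weak_labels(frames):
    """frames: tensor of shape (num_frames, 3, 224, 224), ImageNet-normalized."""
    with torch.no_grad():
        logits = model(frames)                    # (num_frames, 1000)
        probs = F.softmax(logits, dim=1)          # per-frame class probabilities
    video_probs, _ = probs.max(dim=0)             # max-pool over frames -> (1000,)
    selected = video_probs[AUDIO_RELATED_CLASSES]  # keep audio-related classes only
    selected = selected / selected.sum()           # renormalize over the subset
    return [cls for cls, p in zip(AUDIO_RELATED_CLASSES, selected.tolist())
            if p > THRESHOLD]
```

Because the probabilities are renormalized over the small subset of classes, several of them can exceed 0.3 for a single video, which yields the multiple weak labels asked about above.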

@rhgao rhgao closed this as completed Nov 13, 2018
@rhgao rhgao reopened this Nov 13, 2018
@xiaoyiming
Author

@rhgao
Thanks for your reply! I will try it.

@xiaoyiming
Author

Dear Mr. Gao,
Thank you so much for the great work. However, I met some further problems when implementing this code.
As described in your paper, "we collect a maximum of 3,000 basis vectors for each object category" and "In other words, we concatenate the basis vectors learnt for each detected object to construct the basis dictionary W(q). Next, in the NMF algorithm, we hold W(q) fixed, and only estimate the activation H(q) with multiplicative update rules."
However, what is the shape of the selected W(q)(j)? Is it also M x K (K = 25)? And how do you select K basis vectors from the 3,000 stored basis vectors?

@rhgao
Owner

rhgao commented Dec 3, 2018

Hi, we use all the collected basis vectors to initialize W, namely M x K with M = 3000 and K = 25. 3,000 is just a hyperparameter, and a larger number of basis vectors could potentially lead to better results.
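
For reference, a minimal NumPy sketch of this test-time step, under my own assumptions rather than the released code: the dictionary W(q) is formed by concatenating the stored per-object bases and held fixed, and only the activations H(q) are estimated with multiplicative updates (here the standard KL-divergence update; the exact cost function may differ from the paper's):

```python
import numpy as np

def estimate_activations(V, W, n_iter=200, eps=1e-10):
    """V: magnitude spectrogram (F x T); W: fixed basis dictionary (F x K).
    Returns H (K x T) such that V is approximated by W @ H."""
    K, T = W.shape[1], V.shape[1]
    H = np.abs(np.random.rand(K, T))          # non-negative random init
    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative update for H with W held fixed (KL divergence).
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    return H

# Hypothetical usage: concatenate per-object bases to form W(q), then estimate H(q).
# W_q = np.hstack([bases_for_object_j for j in detected_objects])
# H_q = estimate_activations(spectrogram, W_q)
```

Holding W fixed means the update for W is simply skipped, so the multiplicative rule only rescales H and non-negativity is preserved automatically.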

@xiaoyiming
Author

Thanks. Could you please share your train loss/mAP and val loss/mAP? My train loss is about 0.0001 and train mAP is about 0.72; my val loss is about 0.1 and val mAP is 0.65 after 300 iterations, with the same batch size and validation size as yours. Is that normal?
