cd /dev/shm/
mkdir imagenet
cd imagenet
wget -b https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar
wget -b https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
fct14_s12_64_7478_FFF
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /dev/shm/imagenet/ --model fct14_s12_64_7478_FFF -b 128 --lr 1e-3 --drop-path 0.1 --apex-amp
shuffle operation, 1 token stride 73.87799993408203
Shift speed: 6000 imgs/s
conv3x3 6000 imgs/s
conv5x5 5500 imgs/s
[B,C,W,H] [B,WH,C] conv1d(WH, WH,3)
[B,C,W,H] [B,W,C,H] conv2d (3,3) [B,H,C,W]
Overlapped Patch embedding
model10_static_shiftformer_s12-224 77.88199997802734 (epoch 288)
model10_s12_8844 78.59800013427734 (epoch 287)
model10_s24_8844 81.2499999731445
fct_s12_64_7478_TFT 79.778 3993.19/s
fct_s12_64_7118_TTT 80.168 3420.69/s
fct_s12_64_7478_TTF 79.730 3718.43/s
fct_s12_64_7478_TTT 79.724 3344.70/s
fct_s12_64_7478_FFF 79.594 4483.86/s ==fct_s12_64_7118_FFF
fct_s24_64_7118_TTT_8844 82.429
fct_s12_64_7118_TTT 80.168 3420.69/s
fct_s12_64_7118_TTT_normpatch 79.9760
fct_s12_64_7118_TTT_2DWshift 80.01799994
fct_s12_64_7118_TTT_CDWshift 79.97400008056641
fct_s12_64_7118_TTT_SDWshift 80.09399997802734 3255.53/s
fct_s12_64_7118_TTT_ECA 80.1620000756 3505.41/s
fct_s12_64_7118_TTT_lk_token 80.2600
fct_s12_64_7118_TTT_rm_cdwconv 80.069
fct_s12_64_7118_TTT_ChannelConv 80.23999
80.21000015625
33,63,133,193
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /dev/shm/imagenet/ --model fcvt_v6_64_B12 -b 128 --lr 1e-3 --drop-path 0.1 --apex-amp
speed baseline
fct_s12_64_7478_FFF 79.594 4483.86/s ==fct_s12_64_7118_FFF
remove DW in MLP 5193.52/s
CUDA_VISIBLE_DEVICES=1 python throughput.py /dev/shm/imagenet --model fct_s12_64_7478_FFF -b 128 --checkpoint /path/to/checkpoint
baseline: fct_s12_64_7478_FFF 4453.00/s
remove 5x5dilated in token-mixer 4998.38/s
change 5x5dilated in T-M to 5x5 2269.65/s
change 3x3DW in C-M to identity 2616.04/s
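The throughput ablation above is easier to read as a change relative to the baseline. A minimal sketch follows; the numbers are copied from the notes, while the helper name `relative_change` and the short variant labels are illustrative and not part of the original scripts.

```python
# Throughput figures (images/s) copied from the notes above.
BASELINE = 4453.00  # fct_s12_64_7478_FFF

variants = {
    "remove 5x5 dilated in token-mixer": 4998.38,
    "5x5 dilated -> plain 5x5 in token-mixer": 2269.65,
    "3x3 DW in channel-mixer -> identity": 2616.04,
}


def relative_change(value: float, baseline: float) -> float:
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


for name, imgs_per_s in variants.items():
    change = relative_change(imgs_per_s, BASELINE)
    print(f"{name:42s} {imgs_per_s:8.2f}/s {change:+6.1f}%")
```

Only the removal of the 5x5 dilated conv increases throughput over the baseline; the other two variants come out slower in these measurements.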
Confirmation code: AUTFSZ
demo1-6 Q: 23 83 98 143 Head: 0 3 6 9
python3 single.py --image images/demo1.JPEG --model deit_base_patch16_224
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd 25435137
salloc -N 1 -p ai-jumpstart --gres=gpu:8 --cpus-per-task=64 --mem=512Gb
squeue --format="%.18i %.9P %.50j %.8u %.2t %.10M %.6D %R" -p ai-jumpstart
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /dev/shm/imagenet/ --model fcvt_v4_s12_64_TTTT_gc_weighted -b 128 --lr 1e-3 --drop-path 0.1 --apex-amp
----------fcvt------------
useSecondTokenMix useDWconv useSpatialAtt useChannelAtt
FFFF 5106.87/s 14.0M 2.3G 80.21000015625 (epoch 296) No GC 79.819
TFFF 4216.39/s 16.4M 2.5G 81.06 (epoch 287)
FTFF 4554.38/s 14.2M 2.3G 80.41200005615234 (epoch 297)
FFTF 4478.38/s 15.2M 2.5G
FFFT 4837.12/s 14.0M 2.3G
TFTT 3407.22/s 17.6M 2.7G 81.2959999243164 (epoch 285)
------------
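The four-letter codes in the table above (FFFF, TFFF, ...) follow the column order of its header line. A small sketch of decoding such a code into named switches, assuming that order holds for this table; `decode_flags` and `FLAG_NAMES` are hypothetical names, not taken from the training code.

```python
from typing import Dict

# Column order taken from the header line of the table above.
FLAG_NAMES = ("useSecondTokenMix", "useDWconv", "useSpatialAtt", "useChannelAtt")


def decode_flags(code: str) -> Dict[str, bool]:
    """Map a four-letter T/F code such as "TFTT" to named booleans."""
    if len(code) != len(FLAG_NAMES) or set(code) - {"T", "F"}:
        raise ValueError(
            f"expected {len(FLAG_NAMES)} characters from 'T'/'F', got {code!r}"
        )
    return {name: letter == "T" for name, letter in zip(FLAG_NAMES, code)}


print(decode_flags("TFTT"))
```

Note that the later fcvt_v5 component analysis uses a different first column (GC), so this mapping should not be reused there without adjusting `FLAG_NAMES`.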
FFFF: changing mixer-1 to a dilated version has almost no effect on speed.
FFFF: increasing the mixer-1 size from 5 to 9 slows throughput from 5106.87/s to 4509.56/s.
FFFF: with mixer-1 size 9, switching to the optimized DWConv raises throughput from 4509.56/s to 5083.88/s.
18 014 398 442 373 116
fcvt_v3_s12_64_debug: mlp_ratios = [6, 6, 4, 4] layers = [2, 2, 6, 2] 14.7M 2.4G
fcvt_v3_s12_64_debug.
salloc --account=rrg-bengioy-ad --gres=gpu:v100l:4 --cpus-per-task=16 --mem=128Gb
fcvt-V4:
useSecondTokenMix False True
use_globalcontext False False
weighted_gc False False
useDWconv False False
useSpatialAtt False False
useChannelAtt False False
12.7/1.9 14.0/2.1
params = {
    "global_context": {
        "weighted_gc": True,
        "gc_reduction": 8,
        "head": 8,
    },
    "spatial_mixer": {
        "use_globalcontext": True,
        "useSecondTokenMix": True,
        "mix_size_1": 5,
        "mix_size_2": 7,
        "fc_factor": 8,
        "fc_min_value": 16,
        "useSpatialAtt": True,
    },
    "channel_mixer": {
        "useChannelAtt": True,
        "useDWconv": True,
        "DWconv_size": 3,
    },
    "spatial_att": {
        "kernel_size": 3,
        "dim_reduction": 8,
    },
    "channel_att": {
        "size_1": 3,
        "size_2": 5,
    },
}
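To tie a config like the one above back to the four-letter codes used in the fcvt throughput table, the switches can be read out of the nested dict. This is only a sketch: the dict below is a trimmed copy with just the four switches, and `encode_flags` and `FLAG_PATHS` are hypothetical names that assume the header order "useSecondTokenMix useDWconv useSpatialAtt useChannelAtt".

```python
# Trimmed copy of the params dict above, keeping only the four switches.
params = {
    "spatial_mixer": {"useSecondTokenMix": True, "useSpatialAtt": True},
    "channel_mixer": {"useChannelAtt": True, "useDWconv": True},
}

# (section, key) pairs in the assumed header order:
# useSecondTokenMix, useDWconv, useSpatialAtt, useChannelAtt.
FLAG_PATHS = (
    ("spatial_mixer", "useSecondTokenMix"),
    ("channel_mixer", "useDWconv"),
    ("spatial_mixer", "useSpatialAtt"),
    ("channel_mixer", "useChannelAtt"),
)


def encode_flags(cfg: dict) -> str:
    """Return the four-letter T/F code for a config; missing keys count as False."""
    return "".join(
        "T" if cfg.get(section, {}).get(key, False) else "F"
        for section, key in FLAG_PATHS
    )


print(encode_flags(params))  # TTTT
```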
VQ-VAE: discretization constraint / improved generalization; for GC, cite the information bottleneck from other fields
DISCRETE REPRESENTATIONS STRENGTHEN VISION TRANSFORMER ROBUSTNESS
Discrete-Valued Neural Communication
Neural message passing for quantum chemistry
MASTERING ATARI WITH DISCRETE WORLD MODELS
Concern:
Is the related work on GC insufficient?
Three main tables: Image Cls, MS COCO Det + InstSeg with Mask R-CNN, (ADE20K Semantic Seg)
Ablation:
1. Use global context vs. no GC
2. Focus on the spatial_mixer
3. Component ablation
Visualize weighted GC, loss landscape, global context vs. spatial attention (are they doing the same thing?)
fcvt_v5_32_TTFF_W_11_11 73.5340000
fcvt_v5_32_TTFF_W_9_9 73.05400 73.2400
fcvt_v5_32_TTFF_W_7_7 73.42000
fcvt_v5_32_TTFF_W_5_5 73.225999
fcvt_v5_32_TTFF_W_3_3 73.24999998
fcvt_v5_32_TTFF_W_7_9 73.3780000122
Kernel size in token mixer:
kernel  13      11      9       7_9     7       5       3
top-1   73.210  73.534  73.054  73.378  73.420  73.2259 73.2499
Effectiveness of GC:
Without GC: 25686391_fcvt_v5_32_TFFF_11_11 72.776
With Direct GC: 25663076_fcvt_v5_32_TTFF_NotW_11_11 73.0199
With dynamic GC: fcvt_v5_32_TTFF_W_11_11 73.5340000
With compete GC: 25827667fcvt_v5_32_TTFF_W_11_11_H8_compete_ema.sh 73.724000
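The four GC runs above can be compared as gains over the run without GC. A minimal sketch using the accuracies from the notes; the short dictionary keys are illustrative labels, not run names.

```python
# Top-1 accuracies copied from the "Effectiveness of GC" notes above.
results = {
    "without GC": 72.776,
    "direct GC": 73.0199,
    "dynamic GC": 73.534,
    "compete GC": 73.724,
}

baseline = results["without GC"]
# Gain in top-1 points over the no-GC run, rounded to three decimals.
gains = {name: round(acc - baseline, 3) for name, acc in results.items()}

for name, gain in gains.items():
    print(f"{name:12s} {results[name]:7.3f}  {gain:+.3f}")
```

Each GC variant improves on the previous one in these runs, with the compete variant showing the largest gain over no GC.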
Similarity groups (wrong mapping, doesn't matter):
H4 25724743 fcvt_v5_32_TTFF_W_11_11_H16_ema.sh
H1 25767404 fcvt_v5_32_TTFF_W_11_11_H1updated_ema.sh 73.2240
H4 25724742 fcvt_v5_32_TTFF_W_11_11_H4_ema.sh 73.414
H8 73.534
H16 25724740 fcvt_v5_32_TTFF_W_11_11_H1_ema.sh 73.33
H32 25724745_fcvt_v5_32_TTFF_W_11_11_H32.log 73.29400
component analysis:
GC dw-conv. spatialatt channatt
FFFF 72.776 25686391_fcvt_v5_32_TFFF_11_11
TFFF 73.724 25827667fcvt_v5_32_TTFF_W_11_11_H8_compete_ema
TTFF 74.878 25830146_fcvt_v5_32_TTFF_W_11_11_H8_compete_TFF.log
TTTF
TTFT 75.030
TFTT
TTTT
H32
25830157 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_TTF_ema.sh ma.xu1 PD 0:00 1 (Resources)
25830158 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_FTT_ema.sh ma.xu1 PD 0:00 1 (
25830159 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_TTT_ema.sh ma.xu1 PD 0:00 1
25827667 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_ema.sh ma.xu1 R 23:34:22 1 d3148
25830146 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_TFF_ema.sh ma.xu1 R 21:55:23 1 d3149
25830135 ai-jumpst fcvt_v5_32_TTFF_W_11_11_H8_compete_TFT_ema.sh ma.xu1 R 21:55:49 1 d3146
25827655 ai-jumpst bash ma.xu1 R 23:58:44 1 d3150
python3 validate.py /dev/shm/imagenet --model deit_base_patch16_224 -b 128
salloc -N 1 -p ai-jumpstart --gres=gpu:1 --cpus-per-task=64 --mem=256Gb --nodelist=d3150