Attempt to get non-standard models to work with SINQ #11
Skipping quantization for layer with incompatible shape (block must divide W). Keeping in high precision.
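The message above describes a fallback: when the block size does not divide the weight, the layer stays in high precision. A minimal sketch of such a check is shown here. The helper names `block_divides_weight` and `plan_layers` are hypothetical, and testing the last dimension of a 2-D weight is an assumption, since the thread does not say which dimension the PR checks.

```python
from typing import Dict, Sequence, Tuple


def block_divides_weight(shape: Sequence[int], block: int) -> bool:
    """Return True if `block` evenly divides the last dim of a 2-D weight.

    Assumption: only 2-D weights are candidates and the last dimension
    is the one that gets split into blocks.
    """
    if len(shape) != 2 or block <= 0:
        return False
    return shape[-1] % block == 0


def plan_layers(shapes: Dict[str, Tuple[int, ...]], block: int) -> Dict[str, str]:
    """Decide per layer whether to quantize or keep it in high precision."""
    plan = {}
    for name, shape in shapes.items():
        if block_divides_weight(shape, block):
            plan[name] = "quantize"
        else:
            # Incompatible shape: skip quantization instead of failing.
            plan[name] = "keep_high_precision"
    return plan


if __name__ == "__main__":
    demo = {"layer_a": (128, 256), "layer_b": (10, 100)}
    for name, decision in plan_layers(demo, block=64).items():
        print(name, "->", decision)
```

Falling back per layer rather than raising lets a model with a few odd-shaped layers still be quantized as a whole.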
Okay, at the moment I think it doesn't actually quantize other layers, so I'll try to add support for that too (;
Okay, I've done some tests. The code seems to work fine and also quantizes layers that are not part of the LLM. Here is a debug print for Qwen3-VL 2B: as you can see, every linear layer is saved. If some layers have to be kept in high precision, that can be arranged without any issue via skip_tensors=['tensorname1', 'tensorname2'] in the BaseQuantizeConfig.
=== [SINQ QUANTIZATION DEBUG - SAVED] ===
Total layers in model: 171
Layers processed for saving: 471
Completely skipped layers: 0
Quantized layers (with W_q): 300
Non-quantized layers (including high-precision): 171
✓ QUANTIZED LAYERS (300):
1. model.visual.blocks.0.attn.qkv
2. model.visual.blocks.0.attn.proj
3. model.visual.blocks.0.mlp.linear_fc1
4. model.visual.blocks.0.mlp.linear_fc2
5. model.visual.blocks.1.attn.qkv
6. model.visual.blocks.1.attn.proj
7. model.visual.blocks.1.mlp.linear_fc1
8. model.visual.blocks.1.mlp.linear_fc2
9. model.visual.blocks.2.attn.qkv
10. model.visual.blocks.2.attn.proj
11. model.visual.blocks.2.mlp.linear_fc1
12. model.visual.blocks.2.mlp.linear_fc2
13. model.visual.blocks.3.attn.qkv
14. model.visual.blocks.3.attn.proj
15. model.visual.blocks.3.mlp.linear_fc1
16. model.visual.blocks.3.mlp.linear_fc2
17. model.visual.blocks.4.attn.qkv
18. model.visual.blocks.4.attn.proj
19. model.visual.blocks.4.mlp.linear_fc1
20. model.visual.blocks.4.mlp.linear_fc2
21. model.visual.blocks.5.attn.qkv
22. model.visual.blocks.5.attn.proj
23. model.visual.blocks.5.mlp.linear_fc1
24. model.visual.blocks.5.mlp.linear_fc2
25. model.visual.blocks.6.attn.qkv
26. model.visual.blocks.6.attn.proj
27. model.visual.blocks.6.mlp.linear_fc1
28. model.visual.blocks.6.mlp.linear_fc2
29. model.visual.blocks.7.attn.qkv
30. model.visual.blocks.7.attn.proj
31. model.visual.blocks.7.mlp.linear_fc1
32. model.visual.blocks.7.mlp.linear_fc2
33. model.visual.blocks.8.attn.qkv
34. model.visual.blocks.8.attn.proj
35. model.visual.blocks.8.mlp.linear_fc1
36. model.visual.blocks.8.mlp.linear_fc2
37. model.visual.blocks.9.attn.qkv
38. model.visual.blocks.9.attn.proj
39. model.visual.blocks.9.mlp.linear_fc1
40. model.visual.blocks.9.mlp.linear_fc2
41. model.visual.blocks.10.attn.qkv
42. model.visual.blocks.10.attn.proj
43. model.visual.blocks.10.mlp.linear_fc1
44. model.visual.blocks.10.mlp.linear_fc2
45. model.visual.blocks.11.attn.qkv
46. model.visual.blocks.11.attn.proj
47. model.visual.blocks.11.mlp.linear_fc1
48. model.visual.blocks.11.mlp.linear_fc2
49. model.visual.blocks.12.attn.qkv
50. model.visual.blocks.12.attn.proj
51. model.visual.blocks.12.mlp.linear_fc1
52. model.visual.blocks.12.mlp.linear_fc2
53. model.visual.blocks.13.attn.qkv
54. model.visual.blocks.13.attn.proj
55. model.visual.blocks.13.mlp.linear_fc1
56. model.visual.blocks.13.mlp.linear_fc2
57. model.visual.blocks.14.attn.qkv
58. model.visual.blocks.14.attn.proj
59. model.visual.blocks.14.mlp.linear_fc1
60. model.visual.blocks.14.mlp.linear_fc2
61. model.visual.blocks.15.attn.qkv
62. model.visual.blocks.15.attn.proj
63. model.visual.blocks.15.mlp.linear_fc1
64. model.visual.blocks.15.mlp.linear_fc2
65. model.visual.blocks.16.attn.qkv
66. model.visual.blocks.16.attn.proj
67. model.visual.blocks.16.mlp.linear_fc1
68. model.visual.blocks.16.mlp.linear_fc2
69. model.visual.blocks.17.attn.qkv
70. model.visual.blocks.17.attn.proj
71. model.visual.blocks.17.mlp.linear_fc1
72. model.visual.blocks.17.mlp.linear_fc2
73. model.visual.blocks.18.attn.qkv
74. model.visual.blocks.18.attn.proj
75. model.visual.blocks.18.mlp.linear_fc1
76. model.visual.blocks.18.mlp.linear_fc2
77. model.visual.blocks.19.attn.qkv
78. model.visual.blocks.19.attn.proj
79. model.visual.blocks.19.mlp.linear_fc1
80. model.visual.blocks.19.mlp.linear_fc2
81. model.visual.blocks.20.attn.qkv
82. model.visual.blocks.20.attn.proj
83. model.visual.blocks.20.mlp.linear_fc1
84. model.visual.blocks.20.mlp.linear_fc2
85. model.visual.blocks.21.attn.qkv
86. model.visual.blocks.21.attn.proj
87. model.visual.blocks.21.mlp.linear_fc1
88. model.visual.blocks.21.mlp.linear_fc2
89. model.visual.blocks.22.attn.qkv
90. model.visual.blocks.22.attn.proj
91. model.visual.blocks.22.mlp.linear_fc1
92. model.visual.blocks.22.mlp.linear_fc2
93. model.visual.blocks.23.attn.qkv
94. model.visual.blocks.23.attn.proj
95. model.visual.blocks.23.mlp.linear_fc1
96. model.visual.blocks.23.mlp.linear_fc2
97. model.visual.merger.linear_fc1
98. model.visual.merger.linear_fc2
99. model.visual.deepstack_merger_list.0.linear_fc1
100. model.visual.deepstack_merger_list.0.linear_fc2
101. model.visual.deepstack_merger_list.1.linear_fc1
102. model.visual.deepstack_merger_list.1.linear_fc2
103. model.visual.deepstack_merger_list.2.linear_fc1
104. model.visual.deepstack_merger_list.2.linear_fc2
105. model.language_model.layers.0.self_attn.q_proj
106. model.language_model.layers.0.self_attn.k_proj
107. model.language_model.layers.0.self_attn.v_proj
108. model.language_model.layers.0.self_attn.o_proj
109. model.language_model.layers.0.mlp.gate_proj
110. model.language_model.layers.0.mlp.up_proj
111. model.language_model.layers.0.mlp.down_proj
112. model.language_model.layers.1.self_attn.q_proj
113. model.language_model.layers.1.self_attn.k_proj
114. model.language_model.layers.1.self_attn.v_proj
115. model.language_model.layers.1.self_attn.o_proj
116. model.language_model.layers.1.mlp.gate_proj
117. model.language_model.layers.1.mlp.up_proj
118. model.language_model.layers.1.mlp.down_proj
119. model.language_model.layers.2.self_attn.q_proj
120. model.language_model.layers.2.self_attn.k_proj
121. model.language_model.layers.2.self_attn.v_proj
122. model.language_model.layers.2.self_attn.o_proj
123. model.language_model.layers.2.mlp.gate_proj
124. model.language_model.layers.2.mlp.up_proj
125. model.language_model.layers.2.mlp.down_proj
126. model.language_model.layers.3.self_attn.q_proj
127. model.language_model.layers.3.self_attn.k_proj
128. model.language_model.layers.3.self_attn.v_proj
129. model.language_model.layers.3.self_attn.o_proj
130. model.language_model.layers.3.mlp.gate_proj
131. model.language_model.layers.3.mlp.up_proj
132. model.language_model.layers.3.mlp.down_proj
133. model.language_model.layers.4.self_attn.q_proj
134. model.language_model.layers.4.self_attn.k_proj
135. model.language_model.layers.4.self_attn.v_proj
136. model.language_model.layers.4.self_attn.o_proj
137. model.language_model.layers.4.mlp.gate_proj
138. model.language_model.layers.4.mlp.up_proj
139. model.language_model.layers.4.mlp.down_proj
140. model.language_model.layers.5.self_attn.q_proj
141. model.language_model.layers.5.self_attn.k_proj
142. model.language_model.layers.5.self_attn.v_proj
143. model.language_model.layers.5.self_attn.o_proj
144. model.language_model.layers.5.mlp.gate_proj
145. model.language_model.layers.5.mlp.up_proj
146. model.language_model.layers.5.mlp.down_proj
147. model.language_model.layers.6.self_attn.q_proj
148. model.language_model.layers.6.self_attn.k_proj
149. model.language_model.layers.6.self_attn.v_proj
150. model.language_model.layers.6.self_attn.o_proj
151. model.language_model.layers.6.mlp.gate_proj
152. model.language_model.layers.6.mlp.up_proj
153. model.language_model.layers.6.mlp.down_proj
154. model.language_model.layers.7.self_attn.q_proj
155. model.language_model.layers.7.self_attn.k_proj
156. model.language_model.layers.7.self_attn.v_proj
157. model.language_model.layers.7.self_attn.o_proj
158. model.language_model.layers.7.mlp.gate_proj
159. model.language_model.layers.7.mlp.up_proj
160. model.language_model.layers.7.mlp.down_proj
161. model.language_model.layers.8.self_attn.q_proj
162. model.language_model.layers.8.self_attn.k_proj
163. model.language_model.layers.8.self_attn.v_proj
164. model.language_model.layers.8.self_attn.o_proj
165. model.language_model.layers.8.mlp.gate_proj
166. model.language_model.layers.8.mlp.up_proj
167. model.language_model.layers.8.mlp.down_proj
168. model.language_model.layers.9.self_attn.q_proj
169. model.language_model.layers.9.self_attn.k_proj
170. model.language_model.layers.9.self_attn.v_proj
171. model.language_model.layers.9.self_attn.o_proj
172. model.language_model.layers.9.mlp.gate_proj
173. model.language_model.layers.9.mlp.up_proj
174. model.language_model.layers.9.mlp.down_proj
175. model.language_model.layers.10.self_attn.q_proj
176. model.language_model.layers.10.self_attn.k_proj
177. model.language_model.layers.10.self_attn.v_proj
178. model.language_model.layers.10.self_attn.o_proj
179. model.language_model.layers.10.mlp.gate_proj
180. model.language_model.layers.10.mlp.up_proj
181. model.language_model.layers.10.mlp.down_proj
182. model.language_model.layers.11.self_attn.q_proj
183. model.language_model.layers.11.self_attn.k_proj
184. model.language_model.layers.11.self_attn.v_proj
185. model.language_model.layers.11.self_attn.o_proj
186. model.language_model.layers.11.mlp.gate_proj
187. model.language_model.layers.11.mlp.up_proj
188. model.language_model.layers.11.mlp.down_proj
189. model.language_model.layers.12.self_attn.q_proj
190. model.language_model.layers.12.self_attn.k_proj
191. model.language_model.layers.12.self_attn.v_proj
192. model.language_model.layers.12.self_attn.o_proj
193. model.language_model.layers.12.mlp.gate_proj
194. model.language_model.layers.12.mlp.up_proj
195. model.language_model.layers.12.mlp.down_proj
196. model.language_model.layers.13.self_attn.q_proj
197. model.language_model.layers.13.self_attn.k_proj
198. model.language_model.layers.13.self_attn.v_proj
199. model.language_model.layers.13.self_attn.o_proj
200. model.language_model.layers.13.mlp.gate_proj
201. model.language_model.layers.13.mlp.up_proj
202. model.language_model.layers.13.mlp.down_proj
203. model.language_model.layers.14.self_attn.q_proj
204. model.language_model.layers.14.self_attn.k_proj
205. model.language_model.layers.14.self_attn.v_proj
206. model.language_model.layers.14.self_attn.o_proj
207. model.language_model.layers.14.mlp.gate_proj
208. model.language_model.layers.14.mlp.up_proj
209. model.language_model.layers.14.mlp.down_proj
210. model.language_model.layers.15.self_attn.q_proj
211. model.language_model.layers.15.self_attn.k_proj
212. model.language_model.layers.15.self_attn.v_proj
213. model.language_model.layers.15.self_attn.o_proj
214. model.language_model.layers.15.mlp.gate_proj
215. model.language_model.layers.15.mlp.up_proj
216. model.language_model.layers.15.mlp.down_proj
217. model.language_model.layers.16.self_attn.q_proj
218. model.language_model.layers.16.self_attn.k_proj
219. model.language_model.layers.16.self_attn.v_proj
220. model.language_model.layers.16.self_attn.o_proj
221. model.language_model.layers.16.mlp.gate_proj
222. model.language_model.layers.16.mlp.up_proj
223. model.language_model.layers.16.mlp.down_proj
224. model.language_model.layers.17.self_attn.q_proj
225. model.language_model.layers.17.self_attn.k_proj
226. model.language_model.layers.17.self_attn.v_proj
227. model.language_model.layers.17.self_attn.o_proj
228. model.language_model.layers.17.mlp.gate_proj
229. model.language_model.layers.17.mlp.up_proj
230. model.language_model.layers.17.mlp.down_proj
231. model.language_model.layers.18.self_attn.q_proj
232. model.language_model.layers.18.self_attn.k_proj
233. model.language_model.layers.18.self_attn.v_proj
234. model.language_model.layers.18.self_attn.o_proj
235. model.language_model.layers.18.mlp.gate_proj
236. model.language_model.layers.18.mlp.up_proj
237. model.language_model.layers.18.mlp.down_proj
238. model.language_model.layers.19.self_attn.q_proj
239. model.language_model.layers.19.self_attn.k_proj
240. model.language_model.layers.19.self_attn.v_proj
241. model.language_model.layers.19.self_attn.o_proj
242. model.language_model.layers.19.mlp.gate_proj
243. model.language_model.layers.19.mlp.up_proj
244. model.language_model.layers.19.mlp.down_proj
245. model.language_model.layers.20.self_attn.q_proj
246. model.language_model.layers.20.self_attn.k_proj
247. model.language_model.layers.20.self_attn.v_proj
248. model.language_model.layers.20.self_attn.o_proj
249. model.language_model.layers.20.mlp.gate_proj
250. model.language_model.layers.20.mlp.up_proj
251. model.language_model.layers.20.mlp.down_proj
252. model.language_model.layers.21.self_attn.q_proj
253. model.language_model.layers.21.self_attn.k_proj
254. model.language_model.layers.21.self_attn.v_proj
255. model.language_model.layers.21.self_attn.o_proj
256. model.language_model.layers.21.mlp.gate_proj
257. model.language_model.layers.21.mlp.up_proj
258. model.language_model.layers.21.mlp.down_proj
259. model.language_model.layers.22.self_attn.q_proj
260. model.language_model.layers.22.self_attn.k_proj
261. model.language_model.layers.22.self_attn.v_proj
262. model.language_model.layers.22.self_attn.o_proj
263. model.language_model.layers.22.mlp.gate_proj
264. model.language_model.layers.22.mlp.up_proj
265. model.language_model.layers.22.mlp.down_proj
266. model.language_model.layers.23.self_attn.q_proj
267. model.language_model.layers.23.self_attn.k_proj
268. model.language_model.layers.23.self_attn.v_proj
269. model.language_model.layers.23.self_attn.o_proj
270. model.language_model.layers.23.mlp.gate_proj
271. model.language_model.layers.23.mlp.up_proj
272. model.language_model.layers.23.mlp.down_proj
273. model.language_model.layers.24.self_attn.q_proj
274. model.language_model.layers.24.self_attn.k_proj
275. model.language_model.layers.24.self_attn.v_proj
276. model.language_model.layers.24.self_attn.o_proj
277. model.language_model.layers.24.mlp.gate_proj
278. model.language_model.layers.24.mlp.up_proj
279. model.language_model.layers.24.mlp.down_proj
280. model.language_model.layers.25.self_attn.q_proj
281. model.language_model.layers.25.self_attn.k_proj
282. model.language_model.layers.25.self_attn.v_proj
283. model.language_model.layers.25.self_attn.o_proj
284. model.language_model.layers.25.mlp.gate_proj
285. model.language_model.layers.25.mlp.up_proj
286. model.language_model.layers.25.mlp.down_proj
287. model.language_model.layers.26.self_attn.q_proj
288. model.language_model.layers.26.self_attn.k_proj
289. model.language_model.layers.26.self_attn.v_proj
290. model.language_model.layers.26.self_attn.o_proj
291. model.language_model.layers.26.mlp.gate_proj
292. model.language_model.layers.26.mlp.up_proj
293. model.language_model.layers.26.mlp.down_proj
294. model.language_model.layers.27.self_attn.q_proj
295. model.language_model.layers.27.self_attn.k_proj
296. model.language_model.layers.27.self_attn.v_proj
297. model.language_model.layers.27.self_attn.o_proj
298. model.language_model.layers.27.mlp.gate_proj
299. model.language_model.layers.27.mlp.up_proj
300. model.language_model.layers.27.mlp.down_proj
✗ NON-QUANTIZED LAYERS (171):
1. model.visual.patch_embed.proj
2. model.visual.pos_embed
3. model.visual.rotary_pos_emb
4. model.visual.blocks.0.norm1
5. model.visual.blocks.0.norm2
6. model.visual.blocks.1.norm1
7. model.visual.blocks.1.norm2
8. model.visual.blocks.2.norm1
9. model.visual.blocks.2.norm2
10. model.visual.blocks.3.norm1
11. model.visual.blocks.3.norm2
12. model.visual.blocks.4.norm1
13. model.visual.blocks.4.norm2
14. model.visual.blocks.5.norm1
15. model.visual.blocks.5.norm2
16. model.visual.blocks.6.norm1
17. model.visual.blocks.6.norm2
18. model.visual.blocks.7.norm1
19. model.visual.blocks.7.norm2
20. model.visual.blocks.8.norm1
21. model.visual.blocks.8.norm2
22. model.visual.blocks.9.norm1
23. model.visual.blocks.9.norm2
24. model.visual.blocks.10.norm1
25. model.visual.blocks.10.norm2
26. model.visual.blocks.11.norm1
27. model.visual.blocks.11.norm2
28. model.visual.blocks.12.norm1
29. model.visual.blocks.12.norm2
30. model.visual.blocks.13.norm1
31. model.visual.blocks.13.norm2
32. model.visual.blocks.14.norm1
33. model.visual.blocks.14.norm2
34. model.visual.blocks.15.norm1
35. model.visual.blocks.15.norm2
36. model.visual.blocks.16.norm1
37. model.visual.blocks.16.norm2
38. model.visual.blocks.17.norm1
39. model.visual.blocks.17.norm2
40. model.visual.blocks.18.norm1
41. model.visual.blocks.18.norm2
42. model.visual.blocks.19.norm1
43. model.visual.blocks.19.norm2
44. model.visual.blocks.20.norm1
45. model.visual.blocks.20.norm2
46. model.visual.blocks.21.norm1
47. model.visual.blocks.21.norm2
48. model.visual.blocks.22.norm1
49. model.visual.blocks.22.norm2
50. model.visual.blocks.23.norm1
51. model.visual.blocks.23.norm2
52. model.visual.merger.norm
53. model.visual.deepstack_merger_list.0.norm
54. model.visual.deepstack_merger_list.1.norm
55. model.visual.deepstack_merger_list.2.norm
56. model.language_model.embed_tokens
57. model.language_model.layers.0.self_attn.q_norm
58. model.language_model.layers.0.self_attn.k_norm
59. model.language_model.layers.0.input_layernorm
60. model.language_model.layers.0.post_attention_layernorm
61. model.language_model.layers.1.self_attn.q_norm
62. model.language_model.layers.1.self_attn.k_norm
63. model.language_model.layers.1.input_layernorm
64. model.language_model.layers.1.post_attention_layernorm
65. model.language_model.layers.2.self_attn.q_norm
66. model.language_model.layers.2.self_attn.k_norm
67. model.language_model.layers.2.input_layernorm
68. model.language_model.layers.2.post_attention_layernorm
69. model.language_model.layers.3.self_attn.q_norm
70. model.language_model.layers.3.self_attn.k_norm
71. model.language_model.layers.3.input_layernorm
72. model.language_model.layers.3.post_attention_layernorm
73. model.language_model.layers.4.self_attn.q_norm
74. model.language_model.layers.4.self_attn.k_norm
75. model.language_model.layers.4.input_layernorm
76. model.language_model.layers.4.post_attention_layernorm
77. model.language_model.layers.5.self_attn.q_norm
78. model.language_model.layers.5.self_attn.k_norm
79. model.language_model.layers.5.input_layernorm
80. model.language_model.layers.5.post_attention_layernorm
81. model.language_model.layers.6.self_attn.q_norm
82. model.language_model.layers.6.self_attn.k_norm
83. model.language_model.layers.6.input_layernorm
84. model.language_model.layers.6.post_attention_layernorm
85. model.language_model.layers.7.self_attn.q_norm
86. model.language_model.layers.7.self_attn.k_norm
87. model.language_model.layers.7.input_layernorm
88. model.language_model.layers.7.post_attention_layernorm
89. model.language_model.layers.8.self_attn.q_norm
90. model.language_model.layers.8.self_attn.k_norm
91. model.language_model.layers.8.input_layernorm
92. model.language_model.layers.8.post_attention_layernorm
93. model.language_model.layers.9.self_attn.q_norm
94. model.language_model.layers.9.self_attn.k_norm
95. model.language_model.layers.9.input_layernorm
96. model.language_model.layers.9.post_attention_layernorm
97. model.language_model.layers.10.self_attn.q_norm
98. model.language_model.layers.10.self_attn.k_norm
99. model.language_model.layers.10.input_layernorm
100. model.language_model.layers.10.post_attention_layernorm
101. model.language_model.layers.11.self_attn.q_norm
102. model.language_model.layers.11.self_attn.k_norm
103. model.language_model.layers.11.input_layernorm
104. model.language_model.layers.11.post_attention_layernorm
105. model.language_model.layers.12.self_attn.q_norm
106. model.language_model.layers.12.self_attn.k_norm
107. model.language_model.layers.12.input_layernorm
108. model.language_model.layers.12.post_attention_layernorm
109. model.language_model.layers.13.self_attn.q_norm
110. model.language_model.layers.13.self_attn.k_norm
111. model.language_model.layers.13.input_layernorm
112. model.language_model.layers.13.post_attention_layernorm
113. model.language_model.layers.14.self_attn.q_norm
114. model.language_model.layers.14.self_attn.k_norm
115. model.language_model.layers.14.input_layernorm
116. model.language_model.layers.14.post_attention_layernorm
117. model.language_model.layers.15.self_attn.q_norm
118. model.language_model.layers.15.self_attn.k_norm
119. model.language_model.layers.15.input_layernorm
120. model.language_model.layers.15.post_attention_layernorm
121. model.language_model.layers.16.self_attn.q_norm
122. model.language_model.layers.16.self_attn.k_norm
123. model.language_model.layers.16.input_layernorm
124. model.language_model.layers.16.post_attention_layernorm
125. model.language_model.layers.17.self_attn.q_norm
126. model.language_model.layers.17.self_attn.k_norm
127. model.language_model.layers.17.input_layernorm
128. model.language_model.layers.17.post_attention_layernorm
129. model.language_model.layers.18.self_attn.q_norm
130. model.language_model.layers.18.self_attn.k_norm
131. model.language_model.layers.18.input_layernorm
132. model.language_model.layers.18.post_attention_layernorm
133. model.language_model.layers.19.self_attn.q_norm
134. model.language_model.layers.19.self_attn.k_norm
135. model.language_model.layers.19.input_layernorm
136. model.language_model.layers.19.post_attention_layernorm
137. model.language_model.layers.20.self_attn.q_norm
138. model.language_model.layers.20.self_attn.k_norm
139. model.language_model.layers.20.input_layernorm
140. model.language_model.layers.20.post_attention_layernorm
141. model.language_model.layers.21.self_attn.q_norm
142. model.language_model.layers.21.self_attn.k_norm
143. model.language_model.layers.21.input_layernorm
144. model.language_model.layers.21.post_attention_layernorm
145. model.language_model.layers.22.self_attn.q_norm
146. model.language_model.layers.22.self_attn.k_norm
147. model.language_model.layers.22.input_layernorm
148. model.language_model.layers.22.post_attention_layernorm
149. model.language_model.layers.23.self_attn.q_norm
150. model.language_model.layers.23.self_attn.k_norm
151. model.language_model.layers.23.input_layernorm
152. model.language_model.layers.23.post_attention_layernorm
153. model.language_model.layers.24.self_attn.q_norm
154. model.language_model.layers.24.self_attn.k_norm
155. model.language_model.layers.24.input_layernorm
156. model.language_model.layers.24.post_attention_layernorm
157. model.language_model.layers.25.self_attn.q_norm
158. model.language_model.layers.25.self_attn.k_norm
159. model.language_model.layers.25.input_layernorm
160. model.language_model.layers.25.post_attention_layernorm
161. model.language_model.layers.26.self_attn.q_norm
162. model.language_model.layers.26.self_attn.k_norm
163. model.language_model.layers.26.input_layernorm
164. model.language_model.layers.26.post_attention_layernorm
165. model.language_model.layers.27.self_attn.q_norm
166. model.language_model.layers.27.self_attn.k_norm
167. model.language_model.layers.27.input_layernorm
168. model.language_model.layers.27.post_attention_layernorm
169. model.language_model.norm
170. model.language_model.rotary_emb
171. lm_head
==========================================================
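To illustrate the skip list mentioned before the log, here is a small sketch of how a name-based skip list could split linear layers into a quantized group and a high-precision group. It does not call SINQ. `partition_layers` is a hypothetical helper, and exact-name matching is an assumption, since the thread does not show how `skip_tensors` entries are matched.

```python
from typing import Iterable, List, Tuple


def partition_layers(
    linear_names: Iterable[str], skip_tensors: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split layer names into (to_quantize, kept_high_precision).

    Assumption: a layer is skipped only when its full dotted name
    appears in `skip_tensors` (exact match, no wildcards).
    """
    skip = set(skip_tensors)
    to_quantize, kept = [], []
    for name in linear_names:
        (kept if name in skip else to_quantize).append(name)
    return to_quantize, kept


if __name__ == "__main__":
    names = [
        "model.visual.blocks.0.attn.qkv",
        "model.visual.blocks.0.attn.proj",
        "model.visual.merger.linear_fc1",
    ]
    quantized, kept = partition_layers(names, ["model.visual.merger.linear_fc1"])
    print(f"Quantized layers: {len(quantized)}")
    print(f"Kept in high precision: {len(kept)}")
```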
Okay, I tried it with a more complex architecture, Ovis2.5, which needs nested registrations. That still doesn't work, but I will add support for it tomorrow.
Okay, I found an issue with the current implementation in this PR and fixed the nested registration stuff. I'll update the code tomorrow, because there is a LOT of debug stuff in it 😅
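The thread does not spell out what "nested registrations" involves. One possible reading, taken here purely as an assumption, is that linear layers sit several submodules deep and have to be found by walking the whole module tree. The sketch below shows that idea with a stand-in `Node` class instead of real model modules; `Node` and `collect_linear_names` are hypothetical names and not part of SINQ.

```python
from typing import Dict, List, Optional


class Node:
    """Stand-in for a model module: a kind label plus named children."""

    def __init__(self, kind: str = "container",
                 children: Optional[Dict[str, "Node"]] = None):
        self.kind = kind
        self.children = children or {}


def collect_linear_names(node: Node, prefix: str = "") -> List[str]:
    """Depth-first walk returning dotted names of all 'linear' nodes."""
    names = []
    for child_name, child in node.children.items():
        full = f"{prefix}.{child_name}" if prefix else child_name
        if child.kind == "linear":
            names.append(full)
        # Recurse so that layers nested at any depth are found.
        names.extend(collect_linear_names(child, full))
    return names


if __name__ == "__main__":
    attn = Node(children={"qkv": Node("linear")})
    blocks = Node(children={"0": Node(children={"attn": attn})})
    model = Node(children={
        "visual": Node(children={"blocks": blocks}),
        "lm_head": Node("linear"),
    })
    print(collect_linear_names(model))
```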
Okay, I found a weird error that I have no idea how to fix right now. When I load models that were not quantized with gemlite, it works just fine both with and without gemlite. But the moment I try to load one that was quantized with gemlite, it gives me this:
Traceback (most recent call last):
File "F:\SINQ\inference_ovis.py", line 350, in <module>
run_inference(args.model_dir, args.image, args.prompt, args.device, args.stream)
File "F:\SINQ\inference_ovis.py", line 131, in run_inference
model = AutoSINQHFModel.from_quantized_safetensors(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "f:\sinq\sinq\patch_model.py", line 1474, in from_quantized_safetensors
cls.patch_model(model, _load_module, _load_module, {k: None for k in model.linear_tags})
File "f:\sinq\sinq\patch_model.py", line 548, in patch_model
cls.patch_linearlayers(model, patch_linear_fct, patch_params, verbose=verbose)
File "f:\sinq\sinq\patch_model.py", line 486, in patch_linearlayers
patch_fct(tmp_mapping[name], patch_param),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\python\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "f:\sinq\sinq\patch_model.py", line 1468, in _load_module
t = tensor.to(device=device, non_blocking=True)
^^^^^^^^^
AttributeError: 'dict' object has no attribute 'to'
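The traceback shows `_load_module` calling `.to(...)` on a value that is a `dict` rather than a tensor. Why a dict arrives there is not explained in the thread, so the following is only a sketch of a defensive guard, not the fix used in the PR or by the maintainers. `move_to_device` and `FakeTensor` are hypothetical names, and `FakeTensor` stands in for a real tensor so the example runs without extra dependencies.

```python
class FakeTensor:
    """Minimal stand-in exposing a tensor-like `.to(device=...)` method."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


def move_to_device(obj, device):
    """Move anything with a `.to` method; recurse into dicts, lists, tuples.

    Values that are neither containers nor tensor-like are returned as-is.
    """
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(value, device) for value in obj)
    if hasattr(obj, "to"):
        return obj.to(device=device)
    return obj


if __name__ == "__main__":
    state = {"W_q": FakeTensor(), "meta": {"scales": FakeTensor(), "nbits": 4}}
    moved = move_to_device(state, "cuda:0")
    print(moved["W_q"].device, moved["meta"]["scales"].device, moved["meta"]["nbits"])
```

A guard like this would only avoid the `AttributeError`. It would not explain why models saved with gemlite produce a nested dict in the first place.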
I'll create a custom branch with the prototype code in my fork.
Well, I have no idea why that happens. I've tried everything, and the only way to prevent it is to disable gemlite for quantization. When quantization is done with the normal method, everything works fine, and the model can also be loaded with gemlite enabled without issues. It's not only custom models like Ovis that are affected in my code, but normal LLMs too.
The relevant code is here: https://github.com/wsbagnsv1/SINQ/tree/prototype. As I've said, it's not cleaned up and might contain redundant code and debug prints.
At the moment we have an issue with save/reload with gemlite, as you pointed out. We are trying to solve it. The new commit automatically avoids gemlite. Sorry for this, we will try to fix it ASAP.
Good to hear it's not me 😅
Refactor weight optimization to handle None return.
I tried to code support for non-standard models, and so far at least Qwen3-VL worked fine (;
There might be redundant code or some bugs somewhere, so I'd advise checking the code before merging, but I tried my best to mitigate that (;