
Attempt to get non standard models to work with sinq#11

Draft
wsbagnsv1 wants to merge 7 commits into huawei-csl:main from wsbagnsv1:main

Conversation

@wsbagnsv1

I tried to add support for non-standard models, and so far at least Qwen3-VL works fine (;

There might be redundant code or some bugs somewhere, so I'd advise checking the code before merging, but I tried my best to mitigate that (;

Skipping quantization for layer with incompatible shape (block must divide W). Keeping in high precision.
@wsbagnsv1
Author

Okay, at the moment I think it doesn't actually quantize the other layers, so I'll try to add support for that too (;
VLMs, for example, should work fine now though; the visual encoder is just kept in high precision.
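
A minimal sketch of the shape check behind the "block must divide W" message, assuming the block (group) size has to divide the input dimension of a 2-D weight. Which dimension SINQ actually checks is not stated in this thread, and `block_divides` and `plan_layer` are hypothetical names, not part of SINQ:

```python
def block_divides(shape, block_size):
    """Return True if block_size evenly divides the input dimension of a 2-D weight.

    Assumption: weights are laid out as (out_features, in_features).
    """
    if len(shape) != 2 or block_size <= 0:
        return False
    _out_features, in_features = shape
    return in_features % block_size == 0


def plan_layer(name, shape, block_size=64):
    """Decide whether a layer is quantized or kept in high precision."""
    if block_divides(shape, block_size):
        return "quantize"
    # Incompatible shape: fall back to the original weights instead of failing.
    print(f"Skipping quantization for {name} (block must divide W). "
          "Keeping in high precision.")
    return "keep"
```

With this kind of fallback a model with oddly shaped layers still loads and runs; those layers just stay unquantized.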

@wsbagnsv1
Author

wsbagnsv1 commented Oct 24, 2025

Okay, I've done some tests. The code seems to work fine and also quantizes layers that are not part of the LLM. Below is a debug print for Qwen3-VL 2B; as you can see, every linear layer is saved. If some layers have to be kept in high precision, that can be arranged without any issue by setting

skip_tensors=['tensorname1', 'tensorname2']

in the BaseQuantizeConfig.

=== [SINQ QUANTIZATION DEBUG - SAVED] ===
Total layers in model: 171
Layers processed for saving: 471
Completely skipped layers: 0
Quantized layers (with W_q): 300
Non-quantized layers (including high-precision): 171

✓ QUANTIZED LAYERS (300):
    1. model.visual.blocks.0.attn.qkv
    2. model.visual.blocks.0.attn.proj
    3. model.visual.blocks.0.mlp.linear_fc1
    4. model.visual.blocks.0.mlp.linear_fc2
    5. model.visual.blocks.1.attn.qkv
    6. model.visual.blocks.1.attn.proj
    7. model.visual.blocks.1.mlp.linear_fc1
    8. model.visual.blocks.1.mlp.linear_fc2
    9. model.visual.blocks.2.attn.qkv
   10. model.visual.blocks.2.attn.proj
   11. model.visual.blocks.2.mlp.linear_fc1
   12. model.visual.blocks.2.mlp.linear_fc2
   13. model.visual.blocks.3.attn.qkv
   14. model.visual.blocks.3.attn.proj
   15. model.visual.blocks.3.mlp.linear_fc1
   16. model.visual.blocks.3.mlp.linear_fc2
   17. model.visual.blocks.4.attn.qkv
   18. model.visual.blocks.4.attn.proj
   19. model.visual.blocks.4.mlp.linear_fc1
   20. model.visual.blocks.4.mlp.linear_fc2
   21. model.visual.blocks.5.attn.qkv
   22. model.visual.blocks.5.attn.proj
   23. model.visual.blocks.5.mlp.linear_fc1
   24. model.visual.blocks.5.mlp.linear_fc2
   25. model.visual.blocks.6.attn.qkv
   26. model.visual.blocks.6.attn.proj
   27. model.visual.blocks.6.mlp.linear_fc1
   28. model.visual.blocks.6.mlp.linear_fc2
   29. model.visual.blocks.7.attn.qkv
   30. model.visual.blocks.7.attn.proj
   31. model.visual.blocks.7.mlp.linear_fc1
   32. model.visual.blocks.7.mlp.linear_fc2
   33. model.visual.blocks.8.attn.qkv
   34. model.visual.blocks.8.attn.proj
   35. model.visual.blocks.8.mlp.linear_fc1
   36. model.visual.blocks.8.mlp.linear_fc2
   37. model.visual.blocks.9.attn.qkv
   38. model.visual.blocks.9.attn.proj
   39. model.visual.blocks.9.mlp.linear_fc1
   40. model.visual.blocks.9.mlp.linear_fc2
   41. model.visual.blocks.10.attn.qkv
   42. model.visual.blocks.10.attn.proj
   43. model.visual.blocks.10.mlp.linear_fc1
   44. model.visual.blocks.10.mlp.linear_fc2
   45. model.visual.blocks.11.attn.qkv
   46. model.visual.blocks.11.attn.proj
   47. model.visual.blocks.11.mlp.linear_fc1
   48. model.visual.blocks.11.mlp.linear_fc2
   49. model.visual.blocks.12.attn.qkv
   50. model.visual.blocks.12.attn.proj
   51. model.visual.blocks.12.mlp.linear_fc1
   52. model.visual.blocks.12.mlp.linear_fc2
   53. model.visual.blocks.13.attn.qkv
   54. model.visual.blocks.13.attn.proj
   55. model.visual.blocks.13.mlp.linear_fc1
   56. model.visual.blocks.13.mlp.linear_fc2
   57. model.visual.blocks.14.attn.qkv
   58. model.visual.blocks.14.attn.proj
   59. model.visual.blocks.14.mlp.linear_fc1
   60. model.visual.blocks.14.mlp.linear_fc2
   61. model.visual.blocks.15.attn.qkv
   62. model.visual.blocks.15.attn.proj
   63. model.visual.blocks.15.mlp.linear_fc1
   64. model.visual.blocks.15.mlp.linear_fc2
   65. model.visual.blocks.16.attn.qkv
   66. model.visual.blocks.16.attn.proj
   67. model.visual.blocks.16.mlp.linear_fc1
   68. model.visual.blocks.16.mlp.linear_fc2
   69. model.visual.blocks.17.attn.qkv
   70. model.visual.blocks.17.attn.proj
   71. model.visual.blocks.17.mlp.linear_fc1
   72. model.visual.blocks.17.mlp.linear_fc2
   73. model.visual.blocks.18.attn.qkv
   74. model.visual.blocks.18.attn.proj
   75. model.visual.blocks.18.mlp.linear_fc1
   76. model.visual.blocks.18.mlp.linear_fc2
   77. model.visual.blocks.19.attn.qkv
   78. model.visual.blocks.19.attn.proj
   79. model.visual.blocks.19.mlp.linear_fc1
   80. model.visual.blocks.19.mlp.linear_fc2
   81. model.visual.blocks.20.attn.qkv
   82. model.visual.blocks.20.attn.proj
   83. model.visual.blocks.20.mlp.linear_fc1
   84. model.visual.blocks.20.mlp.linear_fc2
   85. model.visual.blocks.21.attn.qkv
   86. model.visual.blocks.21.attn.proj
   87. model.visual.blocks.21.mlp.linear_fc1
   88. model.visual.blocks.21.mlp.linear_fc2
   89. model.visual.blocks.22.attn.qkv
   90. model.visual.blocks.22.attn.proj
   91. model.visual.blocks.22.mlp.linear_fc1
   92. model.visual.blocks.22.mlp.linear_fc2
   93. model.visual.blocks.23.attn.qkv
   94. model.visual.blocks.23.attn.proj
   95. model.visual.blocks.23.mlp.linear_fc1
   96. model.visual.blocks.23.mlp.linear_fc2
   97. model.visual.merger.linear_fc1
   98. model.visual.merger.linear_fc2
   99. model.visual.deepstack_merger_list.0.linear_fc1
  100. model.visual.deepstack_merger_list.0.linear_fc2
  101. model.visual.deepstack_merger_list.1.linear_fc1
  102. model.visual.deepstack_merger_list.1.linear_fc2
  103. model.visual.deepstack_merger_list.2.linear_fc1
  104. model.visual.deepstack_merger_list.2.linear_fc2
  105. model.language_model.layers.0.self_attn.q_proj
  106. model.language_model.layers.0.self_attn.k_proj
  107. model.language_model.layers.0.self_attn.v_proj
  108. model.language_model.layers.0.self_attn.o_proj
  109. model.language_model.layers.0.mlp.gate_proj
  110. model.language_model.layers.0.mlp.up_proj
  111. model.language_model.layers.0.mlp.down_proj
  112. model.language_model.layers.1.self_attn.q_proj
  113. model.language_model.layers.1.self_attn.k_proj
  114. model.language_model.layers.1.self_attn.v_proj
  115. model.language_model.layers.1.self_attn.o_proj
  116. model.language_model.layers.1.mlp.gate_proj
  117. model.language_model.layers.1.mlp.up_proj
  118. model.language_model.layers.1.mlp.down_proj
  119. model.language_model.layers.2.self_attn.q_proj
  120. model.language_model.layers.2.self_attn.k_proj
  121. model.language_model.layers.2.self_attn.v_proj
  122. model.language_model.layers.2.self_attn.o_proj
  123. model.language_model.layers.2.mlp.gate_proj
  124. model.language_model.layers.2.mlp.up_proj
  125. model.language_model.layers.2.mlp.down_proj
  126. model.language_model.layers.3.self_attn.q_proj
  127. model.language_model.layers.3.self_attn.k_proj
  128. model.language_model.layers.3.self_attn.v_proj
  129. model.language_model.layers.3.self_attn.o_proj
  130. model.language_model.layers.3.mlp.gate_proj
  131. model.language_model.layers.3.mlp.up_proj
  132. model.language_model.layers.3.mlp.down_proj
  133. model.language_model.layers.4.self_attn.q_proj
  134. model.language_model.layers.4.self_attn.k_proj
  135. model.language_model.layers.4.self_attn.v_proj
  136. model.language_model.layers.4.self_attn.o_proj
  137. model.language_model.layers.4.mlp.gate_proj
  138. model.language_model.layers.4.mlp.up_proj
  139. model.language_model.layers.4.mlp.down_proj
  140. model.language_model.layers.5.self_attn.q_proj
  141. model.language_model.layers.5.self_attn.k_proj
  142. model.language_model.layers.5.self_attn.v_proj
  143. model.language_model.layers.5.self_attn.o_proj
  144. model.language_model.layers.5.mlp.gate_proj
  145. model.language_model.layers.5.mlp.up_proj
  146. model.language_model.layers.5.mlp.down_proj
  147. model.language_model.layers.6.self_attn.q_proj
  148. model.language_model.layers.6.self_attn.k_proj
  149. model.language_model.layers.6.self_attn.v_proj
  150. model.language_model.layers.6.self_attn.o_proj
  151. model.language_model.layers.6.mlp.gate_proj
  152. model.language_model.layers.6.mlp.up_proj
  153. model.language_model.layers.6.mlp.down_proj
  154. model.language_model.layers.7.self_attn.q_proj
  155. model.language_model.layers.7.self_attn.k_proj
  156. model.language_model.layers.7.self_attn.v_proj
  157. model.language_model.layers.7.self_attn.o_proj
  158. model.language_model.layers.7.mlp.gate_proj
  159. model.language_model.layers.7.mlp.up_proj
  160. model.language_model.layers.7.mlp.down_proj
  161. model.language_model.layers.8.self_attn.q_proj
  162. model.language_model.layers.8.self_attn.k_proj
  163. model.language_model.layers.8.self_attn.v_proj
  164. model.language_model.layers.8.self_attn.o_proj
  165. model.language_model.layers.8.mlp.gate_proj
  166. model.language_model.layers.8.mlp.up_proj
  167. model.language_model.layers.8.mlp.down_proj
  168. model.language_model.layers.9.self_attn.q_proj
  169. model.language_model.layers.9.self_attn.k_proj
  170. model.language_model.layers.9.self_attn.v_proj
  171. model.language_model.layers.9.self_attn.o_proj
  172. model.language_model.layers.9.mlp.gate_proj
  173. model.language_model.layers.9.mlp.up_proj
  174. model.language_model.layers.9.mlp.down_proj
  175. model.language_model.layers.10.self_attn.q_proj
  176. model.language_model.layers.10.self_attn.k_proj
  177. model.language_model.layers.10.self_attn.v_proj
  178. model.language_model.layers.10.self_attn.o_proj
  179. model.language_model.layers.10.mlp.gate_proj
  180. model.language_model.layers.10.mlp.up_proj
  181. model.language_model.layers.10.mlp.down_proj
  182. model.language_model.layers.11.self_attn.q_proj
  183. model.language_model.layers.11.self_attn.k_proj
  184. model.language_model.layers.11.self_attn.v_proj
  185. model.language_model.layers.11.self_attn.o_proj
  186. model.language_model.layers.11.mlp.gate_proj
  187. model.language_model.layers.11.mlp.up_proj
  188. model.language_model.layers.11.mlp.down_proj
  189. model.language_model.layers.12.self_attn.q_proj
  190. model.language_model.layers.12.self_attn.k_proj
  191. model.language_model.layers.12.self_attn.v_proj
  192. model.language_model.layers.12.self_attn.o_proj
  193. model.language_model.layers.12.mlp.gate_proj
  194. model.language_model.layers.12.mlp.up_proj
  195. model.language_model.layers.12.mlp.down_proj
  196. model.language_model.layers.13.self_attn.q_proj
  197. model.language_model.layers.13.self_attn.k_proj
  198. model.language_model.layers.13.self_attn.v_proj
  199. model.language_model.layers.13.self_attn.o_proj
  200. model.language_model.layers.13.mlp.gate_proj
  201. model.language_model.layers.13.mlp.up_proj
  202. model.language_model.layers.13.mlp.down_proj
  203. model.language_model.layers.14.self_attn.q_proj
  204. model.language_model.layers.14.self_attn.k_proj
  205. model.language_model.layers.14.self_attn.v_proj
  206. model.language_model.layers.14.self_attn.o_proj
  207. model.language_model.layers.14.mlp.gate_proj
  208. model.language_model.layers.14.mlp.up_proj
  209. model.language_model.layers.14.mlp.down_proj
  210. model.language_model.layers.15.self_attn.q_proj
  211. model.language_model.layers.15.self_attn.k_proj
  212. model.language_model.layers.15.self_attn.v_proj
  213. model.language_model.layers.15.self_attn.o_proj
  214. model.language_model.layers.15.mlp.gate_proj
  215. model.language_model.layers.15.mlp.up_proj
  216. model.language_model.layers.15.mlp.down_proj
  217. model.language_model.layers.16.self_attn.q_proj
  218. model.language_model.layers.16.self_attn.k_proj
  219. model.language_model.layers.16.self_attn.v_proj
  220. model.language_model.layers.16.self_attn.o_proj
  221. model.language_model.layers.16.mlp.gate_proj
  222. model.language_model.layers.16.mlp.up_proj
  223. model.language_model.layers.16.mlp.down_proj
  224. model.language_model.layers.17.self_attn.q_proj
  225. model.language_model.layers.17.self_attn.k_proj
  226. model.language_model.layers.17.self_attn.v_proj
  227. model.language_model.layers.17.self_attn.o_proj
  228. model.language_model.layers.17.mlp.gate_proj
  229. model.language_model.layers.17.mlp.up_proj
  230. model.language_model.layers.17.mlp.down_proj
  231. model.language_model.layers.18.self_attn.q_proj
  232. model.language_model.layers.18.self_attn.k_proj
  233. model.language_model.layers.18.self_attn.v_proj
  234. model.language_model.layers.18.self_attn.o_proj
  235. model.language_model.layers.18.mlp.gate_proj
  236. model.language_model.layers.18.mlp.up_proj
  237. model.language_model.layers.18.mlp.down_proj
  238. model.language_model.layers.19.self_attn.q_proj
  239. model.language_model.layers.19.self_attn.k_proj
  240. model.language_model.layers.19.self_attn.v_proj
  241. model.language_model.layers.19.self_attn.o_proj
  242. model.language_model.layers.19.mlp.gate_proj
  243. model.language_model.layers.19.mlp.up_proj
  244. model.language_model.layers.19.mlp.down_proj
  245. model.language_model.layers.20.self_attn.q_proj
  246. model.language_model.layers.20.self_attn.k_proj
  247. model.language_model.layers.20.self_attn.v_proj
  248. model.language_model.layers.20.self_attn.o_proj
  249. model.language_model.layers.20.mlp.gate_proj
  250. model.language_model.layers.20.mlp.up_proj
  251. model.language_model.layers.20.mlp.down_proj
  252. model.language_model.layers.21.self_attn.q_proj
  253. model.language_model.layers.21.self_attn.k_proj
  254. model.language_model.layers.21.self_attn.v_proj
  255. model.language_model.layers.21.self_attn.o_proj
  256. model.language_model.layers.21.mlp.gate_proj
  257. model.language_model.layers.21.mlp.up_proj
  258. model.language_model.layers.21.mlp.down_proj
  259. model.language_model.layers.22.self_attn.q_proj
  260. model.language_model.layers.22.self_attn.k_proj
  261. model.language_model.layers.22.self_attn.v_proj
  262. model.language_model.layers.22.self_attn.o_proj
  263. model.language_model.layers.22.mlp.gate_proj
  264. model.language_model.layers.22.mlp.up_proj
  265. model.language_model.layers.22.mlp.down_proj
  266. model.language_model.layers.23.self_attn.q_proj
  267. model.language_model.layers.23.self_attn.k_proj
  268. model.language_model.layers.23.self_attn.v_proj
  269. model.language_model.layers.23.self_attn.o_proj
  270. model.language_model.layers.23.mlp.gate_proj
  271. model.language_model.layers.23.mlp.up_proj
  272. model.language_model.layers.23.mlp.down_proj
  273. model.language_model.layers.24.self_attn.q_proj
  274. model.language_model.layers.24.self_attn.k_proj
  275. model.language_model.layers.24.self_attn.v_proj
  276. model.language_model.layers.24.self_attn.o_proj
  277. model.language_model.layers.24.mlp.gate_proj
  278. model.language_model.layers.24.mlp.up_proj
  279. model.language_model.layers.24.mlp.down_proj
  280. model.language_model.layers.25.self_attn.q_proj
  281. model.language_model.layers.25.self_attn.k_proj
  282. model.language_model.layers.25.self_attn.v_proj
  283. model.language_model.layers.25.self_attn.o_proj
  284. model.language_model.layers.25.mlp.gate_proj
  285. model.language_model.layers.25.mlp.up_proj
  286. model.language_model.layers.25.mlp.down_proj
  287. model.language_model.layers.26.self_attn.q_proj
  288. model.language_model.layers.26.self_attn.k_proj
  289. model.language_model.layers.26.self_attn.v_proj
  290. model.language_model.layers.26.self_attn.o_proj
  291. model.language_model.layers.26.mlp.gate_proj
  292. model.language_model.layers.26.mlp.up_proj
  293. model.language_model.layers.26.mlp.down_proj
  294. model.language_model.layers.27.self_attn.q_proj
  295. model.language_model.layers.27.self_attn.k_proj
  296. model.language_model.layers.27.self_attn.v_proj
  297. model.language_model.layers.27.self_attn.o_proj
  298. model.language_model.layers.27.mlp.gate_proj
  299. model.language_model.layers.27.mlp.up_proj
  300. model.language_model.layers.27.mlp.down_proj

✗ NON-QUANTIZED LAYERS (171):
    1. model.visual.patch_embed.proj
    2. model.visual.pos_embed
    3. model.visual.rotary_pos_emb
    4. model.visual.blocks.0.norm1
    5. model.visual.blocks.0.norm2
    6. model.visual.blocks.1.norm1
    7. model.visual.blocks.1.norm2
    8. model.visual.blocks.2.norm1
    9. model.visual.blocks.2.norm2
   10. model.visual.blocks.3.norm1
   11. model.visual.blocks.3.norm2
   12. model.visual.blocks.4.norm1
   13. model.visual.blocks.4.norm2
   14. model.visual.blocks.5.norm1
   15. model.visual.blocks.5.norm2
   16. model.visual.blocks.6.norm1
   17. model.visual.blocks.6.norm2
   18. model.visual.blocks.7.norm1
   19. model.visual.blocks.7.norm2
   20. model.visual.blocks.8.norm1
   21. model.visual.blocks.8.norm2
   22. model.visual.blocks.9.norm1
   23. model.visual.blocks.9.norm2
   24. model.visual.blocks.10.norm1
   25. model.visual.blocks.10.norm2
   26. model.visual.blocks.11.norm1
   27. model.visual.blocks.11.norm2
   28. model.visual.blocks.12.norm1
   29. model.visual.blocks.12.norm2
   30. model.visual.blocks.13.norm1
   31. model.visual.blocks.13.norm2
   32. model.visual.blocks.14.norm1
   33. model.visual.blocks.14.norm2
   34. model.visual.blocks.15.norm1
   35. model.visual.blocks.15.norm2
   36. model.visual.blocks.16.norm1
   37. model.visual.blocks.16.norm2
   38. model.visual.blocks.17.norm1
   39. model.visual.blocks.17.norm2
   40. model.visual.blocks.18.norm1
   41. model.visual.blocks.18.norm2
   42. model.visual.blocks.19.norm1
   43. model.visual.blocks.19.norm2
   44. model.visual.blocks.20.norm1
   45. model.visual.blocks.20.norm2
   46. model.visual.blocks.21.norm1
   47. model.visual.blocks.21.norm2
   48. model.visual.blocks.22.norm1
   49. model.visual.blocks.22.norm2
   50. model.visual.blocks.23.norm1
   51. model.visual.blocks.23.norm2
   52. model.visual.merger.norm
   53. model.visual.deepstack_merger_list.0.norm
   54. model.visual.deepstack_merger_list.1.norm
   55. model.visual.deepstack_merger_list.2.norm
   56. model.language_model.embed_tokens
   57. model.language_model.layers.0.self_attn.q_norm
   58. model.language_model.layers.0.self_attn.k_norm
   59. model.language_model.layers.0.input_layernorm
   60. model.language_model.layers.0.post_attention_layernorm
   61. model.language_model.layers.1.self_attn.q_norm
   62. model.language_model.layers.1.self_attn.k_norm
   63. model.language_model.layers.1.input_layernorm
   64. model.language_model.layers.1.post_attention_layernorm
   65. model.language_model.layers.2.self_attn.q_norm
   66. model.language_model.layers.2.self_attn.k_norm
   67. model.language_model.layers.2.input_layernorm
   68. model.language_model.layers.2.post_attention_layernorm
   69. model.language_model.layers.3.self_attn.q_norm
   70. model.language_model.layers.3.self_attn.k_norm
   71. model.language_model.layers.3.input_layernorm
   72. model.language_model.layers.3.post_attention_layernorm
   73. model.language_model.layers.4.self_attn.q_norm
   74. model.language_model.layers.4.self_attn.k_norm
   75. model.language_model.layers.4.input_layernorm
   76. model.language_model.layers.4.post_attention_layernorm
   77. model.language_model.layers.5.self_attn.q_norm
   78. model.language_model.layers.5.self_attn.k_norm
   79. model.language_model.layers.5.input_layernorm
   80. model.language_model.layers.5.post_attention_layernorm
   81. model.language_model.layers.6.self_attn.q_norm
   82. model.language_model.layers.6.self_attn.k_norm
   83. model.language_model.layers.6.input_layernorm
   84. model.language_model.layers.6.post_attention_layernorm
   85. model.language_model.layers.7.self_attn.q_norm
   86. model.language_model.layers.7.self_attn.k_norm
   87. model.language_model.layers.7.input_layernorm
   88. model.language_model.layers.7.post_attention_layernorm
   89. model.language_model.layers.8.self_attn.q_norm
   90. model.language_model.layers.8.self_attn.k_norm
   91. model.language_model.layers.8.input_layernorm
   92. model.language_model.layers.8.post_attention_layernorm
   93. model.language_model.layers.9.self_attn.q_norm
   94. model.language_model.layers.9.self_attn.k_norm
   95. model.language_model.layers.9.input_layernorm
   96. model.language_model.layers.9.post_attention_layernorm
   97. model.language_model.layers.10.self_attn.q_norm
   98. model.language_model.layers.10.self_attn.k_norm
   99. model.language_model.layers.10.input_layernorm
  100. model.language_model.layers.10.post_attention_layernorm
  101. model.language_model.layers.11.self_attn.q_norm
  102. model.language_model.layers.11.self_attn.k_norm
  103. model.language_model.layers.11.input_layernorm
  104. model.language_model.layers.11.post_attention_layernorm
  105. model.language_model.layers.12.self_attn.q_norm
  106. model.language_model.layers.12.self_attn.k_norm
  107. model.language_model.layers.12.input_layernorm
  108. model.language_model.layers.12.post_attention_layernorm
  109. model.language_model.layers.13.self_attn.q_norm
  110. model.language_model.layers.13.self_attn.k_norm
  111. model.language_model.layers.13.input_layernorm
  112. model.language_model.layers.13.post_attention_layernorm
  113. model.language_model.layers.14.self_attn.q_norm
  114. model.language_model.layers.14.self_attn.k_norm
  115. model.language_model.layers.14.input_layernorm
  116. model.language_model.layers.14.post_attention_layernorm
  117. model.language_model.layers.15.self_attn.q_norm
  118. model.language_model.layers.15.self_attn.k_norm
  119. model.language_model.layers.15.input_layernorm
  120. model.language_model.layers.15.post_attention_layernorm
  121. model.language_model.layers.16.self_attn.q_norm
  122. model.language_model.layers.16.self_attn.k_norm
  123. model.language_model.layers.16.input_layernorm
  124. model.language_model.layers.16.post_attention_layernorm
  125. model.language_model.layers.17.self_attn.q_norm
  126. model.language_model.layers.17.self_attn.k_norm
  127. model.language_model.layers.17.input_layernorm
  128. model.language_model.layers.17.post_attention_layernorm
  129. model.language_model.layers.18.self_attn.q_norm
  130. model.language_model.layers.18.self_attn.k_norm
  131. model.language_model.layers.18.input_layernorm
  132. model.language_model.layers.18.post_attention_layernorm
  133. model.language_model.layers.19.self_attn.q_norm
  134. model.language_model.layers.19.self_attn.k_norm
  135. model.language_model.layers.19.input_layernorm
  136. model.language_model.layers.19.post_attention_layernorm
  137. model.language_model.layers.20.self_attn.q_norm
  138. model.language_model.layers.20.self_attn.k_norm
  139. model.language_model.layers.20.input_layernorm
  140. model.language_model.layers.20.post_attention_layernorm
  141. model.language_model.layers.21.self_attn.q_norm
  142. model.language_model.layers.21.self_attn.k_norm
  143. model.language_model.layers.21.input_layernorm
  144. model.language_model.layers.21.post_attention_layernorm
  145. model.language_model.layers.22.self_attn.q_norm
  146. model.language_model.layers.22.self_attn.k_norm
  147. model.language_model.layers.22.input_layernorm
  148. model.language_model.layers.22.post_attention_layernorm
  149. model.language_model.layers.23.self_attn.q_norm
  150. model.language_model.layers.23.self_attn.k_norm
  151. model.language_model.layers.23.input_layernorm
  152. model.language_model.layers.23.post_attention_layernorm
  153. model.language_model.layers.24.self_attn.q_norm
  154. model.language_model.layers.24.self_attn.k_norm
  155. model.language_model.layers.24.input_layernorm
  156. model.language_model.layers.24.post_attention_layernorm
  157. model.language_model.layers.25.self_attn.q_norm
  158. model.language_model.layers.25.self_attn.k_norm
  159. model.language_model.layers.25.input_layernorm
  160. model.language_model.layers.25.post_attention_layernorm
  161. model.language_model.layers.26.self_attn.q_norm
  162. model.language_model.layers.26.self_attn.k_norm
  163. model.language_model.layers.26.input_layernorm
  164. model.language_model.layers.26.post_attention_layernorm
  165. model.language_model.layers.27.self_attn.q_norm
  166. model.language_model.layers.27.self_attn.k_norm
  167. model.language_model.layers.27.input_layernorm
  168. model.language_model.layers.27.post_attention_layernorm
  169. model.language_model.norm
  170. model.language_model.rotary_emb
  171. lm_head
==========================================================
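
The `skip_tensors` option mentioned before the debug print could behave roughly like this. It is only a sketch: `QuantPlan` and `plan_quantization` are hypothetical names, and exact-name matching is an assumption, since the thread does not say how the PR matches tensor names.

```python
from dataclasses import dataclass, field


@dataclass
class QuantPlan:
    quantized: list = field(default_factory=list)
    high_precision: list = field(default_factory=list)


def plan_quantization(linear_names, skip_tensors=()):
    """Split linear layer names into quantized and high-precision groups.

    Assumption: a layer is skipped only if its full name is in skip_tensors.
    """
    skip = set(skip_tensors)
    plan = QuantPlan()
    for name in linear_names:
        if name in skip:
            plan.high_precision.append(name)
        else:
            plan.quantized.append(name)
    return plan


names = [
    "model.visual.merger.linear_fc1",
    "model.visual.merger.linear_fc2",
    "model.language_model.layers.0.mlp.up_proj",
]
plan = plan_quantization(names, skip_tensors=["model.visual.merger.linear_fc1"])
print(f"Quantized layers: {len(plan.quantized)}")
print(f"High-precision layers: {len(plan.high_precision)}")
```

Keeping the two lists separate makes it easy to print a summary like the one above.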

@wsbagnsv1
Author

Okay, I tried it with a more complex architecture, Ovis2.5, which needs nested registrations. That still doesn't work, but I will add support for it tomorrow.

@wsbagnsv1
Author

Okay, I found an issue with the current implementation in this PR and fixed the nested registration stuff. I'll update the code tomorrow, because there is a LOT of debug stuff in it 😅
But Ovis2.5 with its weird tensors now works without any issues and quantizes everything (;
So I think this should cover basically all transformers models that have a working configuration, even those that aren't supported in the transformers lib.

@wsbagnsv1
Author

Okay, I found a weird error that I have no idea how to fix right now. When I load models that were quantized without gemlite, loading works just fine both with and without gemlite. But the moment I try to load one that was quantized with gemlite, it gives me this weird error:

Traceback (most recent call last):
  File "F:\SINQ\inference_ovis.py", line 350, in <module>
    run_inference(args.model_dir, args.image, args.prompt, args.device, args.stream)
  File "F:\SINQ\inference_ovis.py", line 131, in run_inference
    model = AutoSINQHFModel.from_quantized_safetensors(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\sinq\sinq\patch_model.py", line 1474, in from_quantized_safetensors
    cls.patch_model(model, _load_module, _load_module, {k: None for k in model.linear_tags})
  File "f:\sinq\sinq\patch_model.py", line 548, in patch_model
    cls.patch_linearlayers(model, patch_linear_fct, patch_params, verbose=verbose)
  File "f:\sinq\sinq\patch_model.py", line 486, in patch_linearlayers
    patch_fct(tmp_mapping[name], patch_param),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\python\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "f:\sinq\sinq\patch_model.py", line 1468, in _load_module
    t = tensor.to(device=device, non_blocking=True)
        ^^^^^^^^^
AttributeError: 'dict' object has no attribute 'to'
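
The traceback shows `_load_module` calling `.to(...)` on a value that is a `dict` rather than a tensor. As a sketch of that failure mode only (not the actual fix, and `move_to_device` is a hypothetical helper, not part of SINQ), a loader that tolerates nested containers could recurse before calling `.to`:

```python
def move_to_device(obj, device):
    """Recursively move tensor-like leaves inside nested containers to `device`.

    Anything with a `.to` method is treated as a tensor (duck typing);
    plain values such as ints or strings are returned unchanged.
    """
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(value, device) for value in obj)
    if hasattr(obj, "to"):
        # Same call as in the traceback, now only reached for tensor-like leaves.
        return obj.to(device=device, non_blocking=True)
    return obj
```

Whether the saved gemlite state is supposed to contain nested dicts at all is a separate question that this sketch does not answer.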

@wsbagnsv1
Author

I'll create a custom branch with the prototype code in my fork.

@wsbagnsv1
Author

n gemlite quantized models it works with gemlite and without it just fine. Though the moment i try to load one that was quantized with gemlite it gives me some weird

Traceback (most recent call last):

Well, I have no idea why that happens, and I've tried everything. The only way to prevent it is to disable gemlite for quantization. When quantization is done with the normal method, everything works fine and the model can also be loaded with gemlite enabled without issues. And it's not only custom models like Ovis that are affected in my code, but normal LLMs too?

@wsbagnsv1
Author

The relevant code is here: https://github.com/wsbagnsv1/SINQ/tree/prototype but, as I've said, it's not cleaned up and might have redundant code and debug prints.

@philippebich
Collaborator

At the moment we have an issue with save/reload with gemlite, as you pointed out. We are trying to solve it. The new commit automatically avoids gemlite. Sorry about this. We will try to fix it ASAP.

@wsbagnsv1
Author

At the moment we have an issue with save/reload with gemlite, as you pointed out. We are trying to solve it. The new commit automatically avoids gemlite. Sorry about this. We will try to fix it ASAP.

Good to hear it's not me 😅
Gonna clean up my prototype code and add it to this PR then (;

2 participants