Hey,
I am very excited about the possibilities SINQ seems to offer.
When trying it out, I found that the quantization itself works easily using the provided example code. But for my use case, I need to be able to use the quantized models with ollama. Running ollama create on the produced safetensors files does not work: it seems to remove the quantization, and the resulting model uses too much memory.
Converting to GGUF using the llama.cpp tooling doesn't seem to be supported either. Is this what is meant by 'We’re actively working to add support for popular frameworks such as vLLM, SGLang, and llama.cpp' in the README? If so, that would be great!
Anyway, thanks for providing this repo. I am curious to see where it goes, and I hope it succeeds and integrates with existing tools.