Using with Ollama and llama.cpp #9

@VanGorg

Hey,
I am very excited about the possibilities SINQ seems to offer.
When trying it out, I found that the quantization itself works easily using the provided example code. But for my use case, I need to be able to use the quantized models with Ollama. Running ollama create on the produced safetensors files does not work: it seems to remove the quantization and uses too much memory.
Converting to GGUF using the llama.cpp tooling doesn't seem to be supported either. Is this what is meant by 'We’re actively working to add support for popular frameworks such as vLLM, SGLang, and llama.cpp' in the README? If so, that would be super great!
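
For reference, the ollama create attempt mentioned above can be sketched roughly like this. It is only an illustration, not the exact commands that were run: the directory `./sinq-model` and the model name `sinq-test` are hypothetical placeholders, and the sketch assumes Ollama's Modelfile import, where `FROM` points at a directory of safetensors weights.

```python
import subprocess
from pathlib import Path


def build_modelfile(model_dir: str) -> str:
    """Return Modelfile text that imports weights from a safetensors directory."""
    return f"FROM {model_dir}\n"


def ollama_create_command(model_name: str, modelfile_path: str) -> list[str]:
    """Return the argument list for `ollama create <name> -f <Modelfile>`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


if __name__ == "__main__":
    # Hypothetical placeholders: adjust to wherever the SINQ output was saved.
    model_dir = "./sinq-model"
    modelfile = Path("Modelfile")

    # Write the Modelfile, then ask Ollama to import the model from it.
    modelfile.write_text(build_modelfile(model_dir))
    subprocess.run(ollama_create_command("sinq-test", str(modelfile)), check=True)
```

The import itself goes through, but the resulting model appears to lose the quantization and needs too much memory.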

Anyway, thanks for providing this repo. I am curious to see where it goes, and I hope it succeeds and integrates with existing tools.

Labels: enhancement