Using with Ollama and llama.cpp #9

Hey,
I am very excited about the possibilities SINQ seems to offer.
When trying it out, I found that the quantization itself works easily using the provided example code. But for my use case, I need to be able to use the quantized models with Ollama. Running `ollama create` on the produced safetensors files does not work: it seems to remove the quantization, and the resulting model uses too much memory.
Converting to GGUF using the llama.cpp tooling doesn't seem to be supported either. Is this what is meant by 'We’re actively working to add support for popular frameworks such as vLLM, SGLang, and llama.cpp' in the README? If so, that would be great!

Anyway, thanks for providing this repo. I am curious to see where it goes, and I hope it will be successful and integrate well with existing tools.
