Ollama UX Needs Improvement #54

Closed
LumiWasTaken opened this issue Dec 3, 2024 · 13 comments · Fixed by #57
Labels
enhancement New feature or request

Comments

@LumiWasTaken

Using:
https://github.com/Kiln-AI/Kiln/actions/runs/12130985759

Backend:
Both normal Ollama and KoboldCPP Ollama Compatible API

Issue:
[Screenshot: the model selector dropdown shows "No model found."]

@LumiWasTaken
Author

I dug a bit deeper and it seems to be a fundamental issue with:

Ollama running, but no models available

It does not properly fetch the /api/tags model list. Sadly, despite my efforts, I could not properly debug the "why".

@scosman
Collaborator

scosman commented Dec 3, 2024

Interesting. What does a curl http://localhost:11434/api/tags return?

Do you have one of the supported Ollama models installed? llama3.1:8b, llama3.1:70b, llama3.1:405b, mistral-large, llama3.2:1b, llama3.2, llama3.2-vision, llama3.2-vision:90b, phi3.5, gemma2:2b, gemma2:9b, gemma2:27b

Also: note the (i) info link in the screenshot - fewer models are supported for structured output. Smaller models can't produce it reliably, so they are filtered out. It might be working for tasks with unstructured output, but filtered for tasks with structured output.

UI improvements I should make here:

  • UI could be better at giving you a list of supported Ollama models.
  • Dropdown should be better at showing installed but unsupported models for structured tasks.

And potentially investigate the root cause of why /api/tags is failing to parse.
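If it helps with debugging, here's a minimal sketch (assuming the default local Ollama endpoint and the requests library; this is not Kiln's actual parsing code) of listing what /api/tags reports:

```python
# Minimal debugging sketch: list the models Ollama reports on /api/tags.
# Assumes the default local endpoint; not Kiln's parsing code.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "llama3.1:8b"
```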

@LumiWasTaken
Author

Oh, I did not know there is a specific list of "supported" models, as I use custom models loaded via GGUF since mine have a slightly altered finetune.

May I ask why you added a list of "supported" models, given that people who want to create synthetic data generally know what kind of model they should choose?

@LumiWasTaken
Author

The /api/tags endpoint returns the expected response:
{"models":[{"name":"dagbs/qwen2.5-coder-14b-instruct-abliterated:q4_k_m","model":"dagbs/qwen2.5-coder-14b-instruct-abliterated:q4_k_m","modified_at":"2024-12-03T14:08:10.388678503+01:00","size":8988113261,"digest":"c729c6635cd6ed6d14ee6d5d272d7c3930cb0cb62303f2bd1c4cba65f8f47fa1","details":{"parent_model":"","format":"gguf","family":"qwen2","families":["qwen2"],"parameter_size":"14.8B","quantization_level":"Q4_K_M"}}]}

@scosman
Collaborator

scosman commented Dec 3, 2024

Yeah - that's by design (whether the design is good is another question). I want the out-of-box experience to be great, so I test every feature with every model in a big matrix. But you should be able to override it.

Try one of the supported models for now.

I'll keep this issue for improving the UX around Ollama. Issues exist already for the other two:

@scosman scosman changed the title Bug: Ollama API - Model Selector Box Empty Ollama UX Needs Improvement Dec 3, 2024
@scosman scosman added the enhancement New feature or request label Dec 4, 2024
@scosman
Collaborator

scosman commented Dec 4, 2024

@LumiWasTaken fixed this. Grab an updated version from a main build here: https://github.com/Kiln-AI/Kiln/actions/workflows/build_desktop.yml

If you have time to confirm it works for you that would be great!

@LumiWasTaken
Author

@scosman

I've tested the new version.
Ollama STILL does not connect UNLESS you have at least one supported model installed.

While this is a valid approach, consider making the error message clearer: explicitly state that unsupported models were found and prompt the user to install at least one supported model, or allow connecting directly anyway but keep the "Supported" tab (instead of leaving it empty) with a single placeholder that says "No supported models installed" to highlight that all of them are untested.

While I was writing this message, I installed llama3.2 to debug and saw that it's unsupported / structured output fails...
Maybe don't use that one as the example in your Ollama prompt then >.<

Additional bug I found:
If the model name contains a / (models from the Ollama user repo are named username/modelname, HF-style), it will only use the part before the slash. So Lumi/llama-something results in a failure: "Lumi is not installed".

Why do the models require "tools" support? Could you clarify that for me? If it's about the JSON output, their API supports "format": "json".
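For reference, a rough sketch of the JSON mode I mean, via the documented /api/generate endpoint (the model name and prompt here are just placeholders):

```python
# Sketch of Ollama's JSON mode via /api/generate; model and prompt are placeholders.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Tell a short joke as JSON with keys 'setup' and 'punchline'.",
        "format": "json",   # constrains the output to valid JSON
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(json.loads(resp.json()["response"]))
```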

Additional Note:
I saw that the "joke" example has
{
"punchline": "",
"setup": " "
}

The "Raw Output" Represents that too.
Why would the Punchline come before the setup? Or am i missing a critical thing here...
The order of the JSON output determines the path it takes e.g.
If you ask a model to respind in a {thoughts: string, response: string} order, it will first think, then speak.
Swap that around then it's thinking will rely on what response it gave before.

Wouldn't it make more sense to swap it around as the generation is Left to right / top to bottom?
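To make the ordering concrete (illustrative only, not Kiln's schema):

```python
# Illustrative only: key order shapes the generation path, since tokens are
# produced left to right.
reason_first = {"thoughts": "model reasons here first", "response": "then answers"}
answer_first = {"response": "answer committed first", "thoughts": "reasoning becomes post-hoc"}
```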

Sorry for this long-winded response! I hope this gave some insight into my thoughts as I'm eating my dinner.

@LumiWasTaken
Author

Another extension to my previous post.

I have also noticed that if you pull a specific quant, e.g.
llama3.2:1b-instruct-q3_K_M
or even
llama3.2:3b-instruct-q3_K_M

Kiln will not recognize it UNLESS you specifically pull plain llama3.2 with no tag appended.

That's pretty bad if you do not support quants.

The API responds with:
"models":[{"name":"llama3.2:3b-instruct-q3_K_M","model":"llama3.2:3b-instruct-q3_K_M","modified_at":"2024-12-04T20:23:00.519908513+01:00","size":1687174821,"digest":"fea4b467793058703bce1eaa2ffa09cb236f530e01476c9b0c2a486f6a2e2920","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"3.2B","quantization_level":"Q3_K_M"}}

So one way to check it (I am not too familiar with the code, but somewhere around

provider.provider_options["model"]

) is to split the ID by ":" and use the [0] value of that array, so
llama3.2:3b-instruct-q3_K_M
is deconstructed into
llama3.2 and 3b-instruct-q3_K_M.

ADDITIONALLY, you could compare the model name (before the ":") with "startswith" or, even better, "contains" rather than a full-text comparison.
This would allow models like
dagbs-qwen2.5-coder-14b-instruct-abliterated:q4_k_m (I replaced the / with a - here, given the previously mentioned / issue)
to be counted as qwen2.5-coder-14b based, while ignoring that it's an abliterated modification.
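A rough sketch of the idea (hypothetical helper names, not Kiln's actual code):

```python
# Hypothetical sketch of the suggestion above, not Kiln's actual code.
def base_model_name(ollama_id: str) -> str:
    # "llama3.2:3b-instruct-q3_K_M" -> "llama3.2"
    return ollama_id.split(":", 1)[0]

def matches_supported(ollama_id: str, supported_bases: set[str]) -> bool:
    # Compare only the part before the ":" against the supported base names.
    return base_model_name(ollama_id) in supported_bases

print(matches_supported("llama3.2:3b-instruct-q3_K_M", {"llama3.2", "llama3.1"}))  # True
```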

@scosman
Collaborator

scosman commented Dec 4, 2024

Fixed the issue requiring at least one supported model just now. Good call. The messaging/UI should reflect what you're asking for now. It was set up like that, I just missed the zero-supported-model case in tests.

Also fixed the issue with slashes. I literally had a TODO in the code to fix that, just missed it 🤦

Great bug reports, keep them coming!

re: tools. The prompt should be pointing you to llama 3.1 now (which supports tools). Although the docs say it should work on 3.2 as well, so I'm not sure why it's erroring. I'll look at JSON mode + Ollama soon, which might let me expand structured output support. Trying to make a bunch of providers play nice is no fun. For now we test, and flag models as unlikely to work when there are known compatibility issues. You should see a big warning in the UI under the model selector.

re: joke example - JSON dictionaries don't have order, so they might render in any order once parsed. The rendering you're seeing is after parsing (including "raw"). In general, I've seen most models are smart enough to write the setup first and the punchline second, but if they aren't, that's just a model problem. 1B-param models might not get it right. Let me know if I missed something.

re: string matching - I really hate Ollama IDs. I can't just match the prefix, because "llama3.2" != "llama3.2:1b" (the first is 3b). However, sometimes two IDs map to the same model: "llama3.2" == "llama3.2:latest". Still, I think the current behavior is working as I want it to. I only want a match if that exact version (including quant) has passed the tests for structured output and data generation. I don't want to treat q4 like fp16, or even fp16 like bf16. Your specific quants should appear in the untested section and work fine if selected (let me know if they don't, that would be a bug). They shouldn't appear in the "tested" list since those quants really aren't tested.
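Roughly, the intent as a sketch (hypothetical names and example IDs, not the actual implementation):

```python
# Sketch of the exact-match behavior described above; hypothetical code.
TESTED_IDS = {"llama3.2", "llama3.2:1b", "llama3.1:8b"}  # example entries only

def normalize(ollama_id: str) -> str:
    # "llama3.2:latest" and "llama3.2" refer to the same model.
    return ollama_id[: -len(":latest")] if ollama_id.endswith(":latest") else ollama_id

def is_tested(ollama_id: str) -> bool:
    # Only an exact (normalized) ID that passed testing counts as "tested";
    # other tags/quants still run, but land in the untested section.
    return normalize(ollama_id) in TESTED_IDS

print(is_tested("llama3.2:latest"))              # True
print(is_tested("llama3.2:3b-instruct-q3_K_M"))  # False
```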

@LumiWasTaken
Author

The quants actually did not appear in the tested section, so your approach was working as intended.
BUT I downloaded a llama 3.2 version that was a specific quant, ergo I hit the issue where it kept triggering the "you need at least one supported model" message.

Regarding the string matching, maybe you are right... maybe there is also a combination of things to check, like generally matching if the IDs match, OR if the name contains "llama3.2" AND "3b", or some weird chains.

Of course this might be an issue for a later "polishing" stage where it could slowly work into the "convenience" features. I don't know if you plan on refactoring the code some day. As I was exploring these issues I slowly got an overview of the structure and found that some files, e.g. the one with the list of AI models, might not be very long-term scalable.

Regarding the JSON / tools, I am not too sure why the models still need tool support, but maybe that's for JSON compatibility, which would be logical.
The JSON "order" itself is somewhat important: as in the "thinking" example above, it severely impacts how one field or the other is produced. Another example would be a "criteria evaluation" where the JSON has criteria 1, 2, 3, 4, 5, etc., each one explaining whether it is met or not, with the "summary" and "verdict" at the end.
If the verdict were produced at the start, the criteria would be generated under the bias that the verdict has already been made, or could even contradict it.
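For example, an illustrative schema (not from Kiln) that keeps the verdict last:

```python
# Illustrative schema only: per-criterion judgments first, verdict last, so the
# verdict is generated after (and conditioned on) the criteria.
evaluation = {
    "criterion_1": "met or not, with a short justification",
    "criterion_2": "met or not, with a short justification",
    "criterion_3": "met or not, with a short justification",
    "summary": "recap of the findings above",
    "verdict": "pass or fail, decided last",
}
```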

Other than that, thank you for the feedback. If I see more issues or things that bug me, I'll make sure to open a separate issue to keep things on-topic.

@LumiWasTaken
Author

If you'd like a bit more in-depth theorizing or experimenting, feel free to let me know; we could chat on Discord or something!

@scosman
Collaborator

scosman commented Dec 5, 2024

Just to wrap this up, I think everything is fixed? Please confirm I'm not missing anything.

  • Quants now work fine. They appear in the untested section, but they are untested, so that's correct.
  • Fixed several bugs, including the zero-supported-model bug and the forward-slash bug (if you can confirm, that would be great).
  • The ml_model_list file is cleaned up. I pulled the code out into two files. It is basically a long config file now, but that's needed for a "just works" UX that isn't a thousand options.

JSON vs tools is going to be a forever problem. Some models are better at one, and others at the other. We'll bake in guidance for the built-in models. When I add the ability to add any model, we'll have an option for this. But there isn't a universal solution, and there will probably be more modes in time. For the Kiln default install I want to focus on ease of use, with lots of options in config files under the covers for power users (not documented yet, but I'll get there).

@LumiWasTaken
Author

Yup works!
