
Error with gguf conversion. #1416

Open
StoryHack opened this issue Dec 12, 2024 · 8 comments
Labels: currently fixing (Am fixing now!)

Comments


StoryHack commented Dec 12, 2024

Here's what I get while trying to quantize my latest attempt at finetuning.

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[12], line 12
9 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
11 # Save to q4_k_m GGUF
---> 12 if True: model.save_pretrained_gguf("fictions", tokenizer, quantization_method = "q5_k")
13 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
15 # Save to multiple GGUF options - much faster if you want multiple!

File /usr/local/lib/python3.11/dist-packages/unsloth/save.py:1734, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
1731 is_sentencepiece_model = check_if_sentencepiece_model(self)
1733 # Save to GGUF
-> 1734 all_file_locations, want_full_precision = save_to_gguf(
1735 model_type, model_dtype, is_sentencepiece_model,
1736 new_save_directory, quantization_method, first_conversion, makefile,
1737 )
1739 # Save Ollama modelfile
1740 modelfile = create_ollama_modelfile(tokenizer, all_file_locations[0])

File /usr/local/lib/python3.11/dist-packages/unsloth/save.py:1069, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
1067 quantize_location = "llama.cpp/llama-quantize"
1068 else:
-> 1069 raise RuntimeError(
1070 "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"
1071 "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
1072 )
1073 pass
1075 # See #730
1076 # Filenames changed again!

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

@danielhanchen added the currently fixing (Am fixing now!) label Dec 12, 2024
@danielhanchen (Contributor) commented:

I'm trying to add a new method that should make GGUF conversions easier. I was planning to add it today, but it's more complicated than I expected; hopefully it'll come out by EOW!

In the meantime, use model.save_pretrained_merged and skip GGUF, then convert to GGUF via https://huggingface.co/spaces/ggml-org/gguf-my-repo, or manually by building llama.cpp as described in https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md
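
For anyone following along, here is a minimal sketch of that workaround in Python. It assumes the finetuned model and tokenizer from the notebook above are still in memory, that llama.cpp is cloned in the working directory, and that the directory and file names are placeholders:

import subprocess

# Save the merged 16-bit checkpoint instead of calling save_pretrained_gguf.
model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")

# Convert the merged checkpoint to GGUF with llama.cpp's conversion script.
subprocess.run([
    "python", "llama.cpp/convert_hf_to_gguf.py", "merged_model",
    "--outfile", "merged_model/model-f16.gguf",
    "--outtype", "f16",
], check = True)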

@jainpradeep commented:

The build tutorial says we need to build llama.cpp first, after which these files should be generated. I used CMake, but the files were not generated, and nothing I tried worked. I then used the Python scripts in the llama.cpp folder to convert the model to GGUF manually:

python llama.cpp/convert_hf_to_gguf.py "C:\Users\\Desktop\New folder\lora_model" --outfile "C:\Users\\Desktop\New folder\op" --outtype f16

At least model files are now being generated, but I am unable to create an Ollama model from them:

ollama create unsloth_m -f "C:\Users\wrpladmin\Desktop\New folder\op"

@danielhanchen (Contributor) commented:

@jainpradeep Did you create a Modelfile?
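
(For reference, ollama create -f expects the path to a Modelfile, not the GGUF itself; the FROM line inside the Modelfile then points at the GGUF. A minimal Modelfile sketch, with placeholder file names:

# Modelfile
FROM ./model-f16.gguf

which would then be used as: ollama create unsloth_m -f Modelfile)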

@jainpradeep commented:

@danielhanchen Yes sir.
I could convert the model into GGUF format manually, but could not create an Ollama model from the GGUF;
ollama create unsloth_m -f "C:\Users\wrpladmin\Desktop\New folder\op"
was throwing an error.

I fixed that issue by following the link, but after running the Ollama model I now get the following error:
Error: llama runner process has terminated: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file

@jhangmez commented:

@danielhanchen Any update? Or is it like #1376?

@shimmyshimmer (Collaborator) commented:

> @danielhanchen Any update? Or is it like #1376?

Still working on it!


nctu6 commented Dec 23, 2024

On Windows the executables will be *.exe,
e.g. 'llama.cpp/llama-quantize.exe' or 'llama.cpp/quantize.exe'.
The file name strings in save.py are hardcoded, so the error occurs on the Windows platform.
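
A sketch of one way the lookup in save.py could be made platform-aware; the candidate names mirror the ones mentioned in this thread, and this is an illustration rather than the actual Unsloth code:

import os
import sys

candidates = ["llama.cpp/llama-quantize", "llama.cpp/quantize"]
if sys.platform == "win32":
    # Try the .exe variants first on Windows, then fall back to the bare names.
    candidates = [c + ".exe" for c in candidates] + candidates

quantize_location = next((c for c in candidates if os.path.exists(c)), None)
if quantize_location is None:
    raise RuntimeError("No llama-quantize executable found in llama.cpp/")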


nctu6 commented Dec 23, 2024

Besides, the file name "convert-hf-to-gguf.py" in save.py is wrong.
The correct name is convert_hf_to_gguf.py.
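
The same fallback pattern would tolerate both the old and new converter names (llama.cpp renamed the script from dashes to underscores); again, an illustrative sketch, not the actual save.py code:

import os

convert_script = next(
    (s for s in ("llama.cpp/convert_hf_to_gguf.py",   # current name
                 "llama.cpp/convert-hf-to-gguf.py")   # old name
     if os.path.exists(s)),
    None,
)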
