
Newbie trying cake on Llama3.2 issue with missing lm_head.weight #38

Open

stevef1uk opened this issue Dec 7, 2024 · 2 comments

@stevef1uk
Hi,

I am hoping this project may enable me to try inference across a PC (Windows) with an NVIDIA 4070 Super GPU and a Mac mini (M2 Pro).

I used:

 huggingface-cli download meta-llama/Llama-3.2-1B --local-dir  ./Llama-3.2-1B 

then:

bash-3.2$ target/release/cake-cli --model  ~/Llama-3.2-1B/ --api 0.0.0.0:8080  
[2024-12-07T21:23:11Z INFO ] [Master] dtype=F16 device=Metal(MetalDevice(DeviceId(1))) mem=10.2 MiB
[2024-12-07T21:23:11Z WARN ] no topology file specified, the entire model will be loaded
[2024-12-07T21:23:11Z INFO ] loading configuration from /Users/stevef/Llama-3.2-1B/config.json
[2024-12-07T21:23:11Z INFO ] loading tensors from model.safetensors ...
[2024-12-07T21:23:11Z INFO ] loading embeddings ...
[2024-12-07T21:23:12Z INFO ] loading lm_head ...
Error: cannot find tensor lm_head.weight

There is no lm_head.weight tensor in the Hugging Face model repo files, so what do I need to do?

Regards

@doraemoncandy

I tried this too. I found that when I use Meta-Llama-3-8B, which has a model.safetensors.index.json file, and don't split the model, I can start the node successfully. But Llama-3.2-1B does not have this file.
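
For anyone comparing, a quick listing shows the difference in what the two repos ship (assuming both were downloaded with huggingface-cli as in the first comment):

ls ~/Llama-3.2-1B/       # a single model.safetensors plus config.json, no index file
ls ~/Meta-Llama-3-8B/    # sharded model-*.safetensors plus model.safetensors.index.json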

@parthdev99 commented Dec 12, 2024

The workaround I found is replacing the lm_head with the model.embed_tokens tensor, since the 1B and 3B models don't have an lm_head tensor in them.
To apply it, replace this piece of code in your llama.rs file

let lm_head = linear(
    config.hidden_size,
    config.vocab_size,
    var_builder.pp("lm_head"),
)?;

with this:

// Try the usual lm_head first; fall back to the embedding weights that the
// Llama 3.2 1B/3B checkpoints ship instead of a separate output head.
let lm_head = match linear(
    config.hidden_size,
    config.vocab_size,
    var_builder.pp("lm_head"),
) {
    Ok(head) => Some(head),
    Err(e) => {
        log::warn!("Could not load lm_head: {}. Attempting alternative approaches.", e);

        // Candidate tensor prefixes to try instead of "lm_head".
        let alternative_paths = vec![
            "model.embed_tokens",
            "model.embed_tokens.weight",
            "embed_tokens",
        ];

        // Return the first prefix that loads as a (vocab_size, hidden_size) linear layer.
        alternative_paths.into_iter().find_map(|path| {
            linear(
                config.hidden_size,
                config.vocab_size,
                var_builder.pp(path),
            )
            .ok()
        })
    }
};

// Use .expect() or handle the None case more explicitly.
let lm_head = lm_head.expect("Could not load lm_head after trying alternatives");

It will work.
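
For context on why the embed_tokens fallback works: Llama 3.2 1B and 3B set "tie_word_embeddings": true in config.json, meaning the output head reuses the token-embedding matrix, and model.embed_tokens.weight has shape (vocab_size, hidden_size), which is exactly the weight shape linear(hidden_size, vocab_size, ...) looks for. (If cake's linear helper behaves like candle_nn's, only the "model.embed_tokens" entry in the list above can actually resolve, since the helper appends .weight to the prefix.) Here is a minimal sketch of honoring the tie explicitly, assuming a hypothetical tie_word_embeddings field on cake's Config and candle_nn::Linear in scope, neither of which is in the current code:

use candle_nn::Linear;

// Sketch only: `tie_word_embeddings` is a hypothetical Config field mirroring
// the flag in config.json; cake's Config struct may not expose it today.
let lm_head = if config.tie_word_embeddings {
    // Reuse the (vocab_size, hidden_size) embedding matrix as the output head.
    let tied = var_builder
        .pp("model.embed_tokens")
        .get((config.vocab_size, config.hidden_size), "weight")?;
    Linear::new(tied, None) // no bias, matching the standard Llama lm_head
} else {
    linear(
        config.hidden_size,
        config.vocab_size,
        var_builder.pp("lm_head"),
    )?
};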
