Errors installing: AttributeError: 'str' object has no attribute '_name_or_path' #4

mattdeeperinsights opened this issue Aug 25, 2023 · 4 comments
mattdeeperinsights commented Aug 25, 2023

Thanks for this, I'm looking forward to getting stuck in; there are just a few teething problems getting it all installed.

Issues:

  1. Missing sentencepiece requirement
  2. Error loading model from config

...

  1. Update requirements

I needed to run pip install sentencepiece, so I think you need to update your requirements.txt to include it.

  2. Loading model

I had to update this line to:

base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)

i.e. switching from AutoModelForCausalLM.from_config to AutoModelForCausalLM.from_pretrained. from_config expects a config object rather than a path string, so it ends up calling base_model_path._name_or_path on the string 'abacusai/Giraffe-v2-13b-32k', which raises the AttributeError in the title.
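For anyone hitting the same error, here is a minimal sketch of the difference (just an illustration, assuming a standard transformers install): from_config builds an uninitialised model from a PretrainedConfig object, while from_pretrained accepts a repo id or local path and loads the trained weights.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = 'abacusai/Giraffe-v2-13b-32k'

# from_config expects a PretrainedConfig, not a string; passing the repo id
# directly is what raises "'str' object has no attribute '_name_or_path'".
config = AutoConfig.from_pretrained(model_id)
empty_model = AutoModelForCausalLM.from_config(config)  # architecture only, random weights

# from_pretrained takes the repo id (or a local path) and loads the trained weights.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
```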

Also, I think this line is an error, because delta_model_path is not given a default of None in the function signature.

Maybe the function should be updated to reflect that:

def load_model(base_model_path: str, delta_model_path: str = None, **patch_args):
    '''Helper to load a model and patch it to support a longer context.
    
    For example to load Giraffe V2 with its trained scale:
    ```python
    model = load_model('abacusai/Giraffe-v2-13b-32k', scale=8)
    ```

    To load a delta model you need the original llama v1 weights available:
    ```python
    model = load_model('abacusai/Giraffe-v1-delta-13b-scaled-4', 'path/to/llama-13b', scale=4)
    ```

    See `ScaledLlamaRotaryEmbedding.patch` for information on additional arguments.
    '''
    from .interpolate import ScaledLlamaRotaryEmbedding
    import torch
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16
    )
    
    if delta_model_path is None:
        ScaledLlamaRotaryEmbedding.patch(base_model, **patch_args)
        return base_model
    
    else:
        # Use from_pretrained here too: delta_model_path is a path string, and
        # from_config would hit the same _name_or_path error as above.
        delta_model = AutoModelForCausalLM.from_pretrained(delta_model_path, torch_dtype=torch.float16)
        for name, param in base_model.named_parameters():
            delta_param = delta_model.get_parameter(name)
            assert delta_param.shape == param.shape
            delta_param += param

        ScaledLlamaRotaryEmbedding.patch(delta_model, **patch_args)
        return delta_model
mattdeeperinsights (Author) commented:

FYI, here is how I am importing the code.

Run this in a terminal to fetch the code from git and make it importable locally:

git clone https://github.com/abacusai/Long-Context.git
pip install -r ./Long-Context/python/requirements.txt
pip install sentencepiece
cp -rp ./Long-Context/python/* ./
rm -r ./Long-Context

Edit the function in models/__init__.py as I described above.

Then in a local script I can run:

from models import load_model, load_tokenizer
tokenizer = load_tokenizer()
model = load_model('abacusai/Giraffe-v2-13b-32k', scale=8)

but I'm getting a memory error right now after successfully downloading the model.

sshh12 commented Aug 28, 2023

@mattdeeperinsights just confirming that what you posted works for me.

from models import load_model, load_tokenizer
tokenizer = load_tokenizer()
model = load_model('abacusai/Giraffe-v2-13b-32k', scale=8)
model.to('cuda')

prompt = "Question: What is 2 + 2? Answer: "
inputs = tokenizer(prompt, return_tensors="pt").to('cuda')

generate_ids = model.generate(inputs.input_ids, max_new_tokens=50)[0]
output_str = tokenizer.batch_decode([generate_ids], skip_special_tokens=True)[0]

Seeing 38GB VRAM usage.

mattdeeperinsights (Author) commented:

@sshh12 good to hear! Can you confirm that it works for long contexts up to 32,000 tokens?

sshh12 commented Aug 30, 2023

Good question -- it looks like it doesn't on an A100; tested with 23k tokens:

CUDA out of memory. Tried to allocate 41.85 GiB (GPU 0; 79.10 GiB total capacity; 28.88 GiB already allocated; 6.43 GiB free; 71.20 GiB reserved in total by PyTorch)
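For what it's worth, that allocation is roughly the size of a single layer's fp16 attention-score tensor at that length with the stock (non-FlashAttention) attention path, assuming LLaMA-13B's 40 heads; a back-of-envelope sketch:

```python
# Rough size of one layer's fp16 attention-score tensor (heads x seq x seq)
# for a LLaMA-13B-style model (40 heads) at a ~23k-token prompt.
heads = 40
seq_len = 23_000
bytes_per_fp16 = 2

score_bytes = heads * seq_len * seq_len * bytes_per_fp16
print(f"{score_bytes / 2**30:.1f} GiB")  # ~39 GiB, in the ballpark of the failed 41.85 GiB allocation
```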
