
Throwing: AttributeError: PreTrainedTokenizerFast has no attribute _pad_token. Did you mean: '_add_tokens' at runtime #1917

Open
l0r3zz opened this issue Jan 5, 2025 · 0 comments

l0r3zz commented Jan 5, 2025

Hello all,
Running on:

  • Core™ i7-11800H @ 2.30GHz × 16
  • NVIDIA GeForce RTX 3070 Laptop GPU/PCIe/SSE2 / NVIDIA Corporation GA104M
  • Memory: 64 GiB
  • OS: Pop!_OS 22.04 LTS

Startup command:
python generate.py
--base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
--score_model=None
--prompt_type=human_bot
--cli=True
--gradio_offline_level=1
--load4bit=True

Current repo version:
(base) l0r3zz@tarnover:[2025-01-05 11:18:40]-$ git log -1
commit a0fcc33 (HEAD -> main, origin/main, origin/HEAD)
Author: Jonathan C. McKinney [email protected]
Date: Tue Dec 3 23:58:28 2024 -0800

(I got here after watching: https://youtu.be/Coj72EzmX20?si=ofBAsNACnB7JAKe7)

I got through all the build issues, but after startup, once it prints:

Enter an instruction:

it blows up no matter what I enter...

(base) l0r3zz@tarnover:[2025-01-04 07:52:56]-$./model.sh
Must install langchain for transcription, disabling
Using Model h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Must install langchain for preloading embedding model, disabling
Must install DocTR and LangChain installed if enabled DocTR, disabling
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 4096. Pass if not correct
/home/l0r3zz/.local/lib/python3.12/site-packages/huggingface_hub/file_download.py:795: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Could not determine --max_seq_len, setting to 4096. Pass if not correct
Could not determine --max_seq_len, setting to 4096. Pass if not correct
device_map: {'': 0}
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 4096. Pass if not correct
Could not determine --max_seq_len, setting to 4096. Pass if not correct
Could not determine --max_seq_len, setting to 4096. Pass if not correct
device_map: {'': 0}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.62s/it]

Enter an instruction: Hello World
Traceback (most recent call last):
File "/home/l0r3zz/github/h2ogpt/generate.py", line 20, in
entrypoint_main()
File "/home/l0r3zz/github/h2ogpt/generate.py", line 16, in entrypoint_main
H2O_Fire(main)
File "/home/l0r3zz/github/h2ogpt/src/utils.py", line 79, in H2O_Fire
fire.Fire(component=component, command=args)
File "/home/l0r3zz/.local/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/.local/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/.local/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/github/h2ogpt/src/gen.py", line 2430, in main
return run_cli(**get_kwargs(run_cli, **local_kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/github/h2ogpt/src/cli.py", line 226, in run_cli
for gen_output in gener:
^^^^^
File "/home/l0r3zz/github/h2ogpt/src/gen.py", line 4165, in evaluate
stopping_criteria = get_stopping(prompt_type, prompt_dict, tokenizer, device, base_model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/github/h2ogpt/src/stopping.py", line 183, in get_stopping
if tokenizer._pad_token: # use hidden variable to avoid annoying properly logger bug
^^^^^^^^^^^^^^^^^^^^
File "/home/l0r3zz/.local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1104, in getattr
raise AttributeError(f"{self.class.name} has no attribute {key}")
AttributeError: PreTrainedTokenizerFast has no attribute _pad_token. Did you mean: '_add_tokens'?

I tried some troubleshooting but can't get anywhere...
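
For what it's worth, my guess from the traceback is that the transformers version I have installed no longer exposes the private _pad_token attribute that src/stopping.py checks, only the public pad_token property. Here is a minimal snippet that reproduces the same AttributeError outside h2ogpt (just a sketch; the model name is the one from my command line above):

```python
from transformers import AutoTokenizer

# Same tokenizer h2ogpt loads for the model above.
tok = AutoTokenizer.from_pretrained("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3")

# Public property still exists (may print None if no pad token is set).
print(tok.pad_token)

# What src/stopping.py accesses; on my transformers install this raises:
# AttributeError: PreTrainedTokenizerFast has no attribute _pad_token
print(tok._pad_token)
```

Would changing the check in src/stopping.py to the public property (e.g. getattr(tokenizer, "pad_token", None)) be a reasonable local workaround, or is h2ogpt meant to be pinned to an older transformers release?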
