No instruct dp #20

Open · wants to merge 8 commits into main

Conversation

AndrewRWilliams (Collaborator):

Adds support for using Direct Prompt with models that do not support instructions in HF.

  • Checks if the tokenizer has a chat template.
  • If not, the context is simply concatenated into one long string and fed to the model.
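The dispatch described above can be sketched roughly as follows; the function name `build_prompt` and the `context_parts` argument are illustrative assumptions, not the PR's actual API:

```python
def build_prompt(tokenizer, context_parts):
    """Use the tokenizer's chat template when it has one; otherwise
    fall back to Direct Prompt: plain concatenation of the context."""
    text = "\n".join(context_parts)
    if getattr(tokenizer, "chat_template", None):
        # Instruction-tuned model: wrap the context as a user message.
        messages = [{"role": "user", "content": text}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Base model with no template: feed one long string to the model.
    return text
```

The key design point is that the check is on the tokenizer (HF tokenizers expose a `chat_template` attribute), so no model-specific allowlist is needed.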

Comment on lines +108 to +123

# Build one pattern line per required timestamp:
# (YYYY-MM-DD HH:MM:SS,[-+]?\d+(?:\.\d+)?)
# No spaces allowed anywhere, so everything is literally "fixed"
# except for the numeric portion.
lines = [
    # NOTE: braces must be doubled inside the f-string so the regex
    # quantifiers {1,20} and {0,20} survive interpolation.
    rf"\({re.escape(ts)},[-+]?\d{{1,20}}(?:\.\d{{0,20}})?\)"
    for ts in required_timestamps
]

# Join lines with exactly one "\n".
body = r"\n".join(lines)

# Return the full pattern, ensuring a single newline
# after <forecast> and before </forecast>.
return rf"<forecast>\n{body}\n</forecast>"
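For reference, the snippet above can be wrapped into a self-contained function and sanity-checked against a sample completion. The function name `make_forecast_pattern` is an assumption; note the doubled braces needed so the quantifiers survive f-string interpolation:

```python
import re

def make_forecast_pattern(required_timestamps):
    # One pattern line per required timestamp; braces are doubled so the
    # regex quantifiers {1,20} and {0,20} survive f-string interpolation.
    lines = [
        rf"\({re.escape(ts)},[-+]?\d{{1,20}}(?:\.\d{{0,20}})?\)"
        for ts in required_timestamps
    ]
    body = r"\n".join(lines)  # regex "\n" matches a real newline
    return rf"<forecast>\n{body}\n</forecast>"

pattern = make_forecast_pattern(["2025-01-01 00:00:00", "2025-01-01 01:00:00"])
good = "<forecast>\n(2025-01-01 00:00:00,1.5)\n(2025-01-01 01:00:00,-2)\n</forecast>"
bad = "<forecast>\n\n\n</forecast>"  # runs of bare newlines, as the looser pattern allowed
```

The stricter pattern accepts `good` but rejects `bad`, which is the behavior change discussed below.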
Collaborator:

Is there a need to replace the original function? This might break reproducibility.

Collaborator (Author):

The original function allowed for returns such as \n \n \n \n \n \n \n ..., so I fixed it to this when working with Hymba. I think it would only break reproducibility if this code somehow feeds into controlling the RNG for the LLM or for the tasks, right?

Collaborator (Author):

@marcotet what are your thoughts?

Collaborator:

Hmm, all results of our paper were with the previous code so I guess it was fine - we didn't get any errors.

If you think it's better maybe good to reproduce a model's results on CiK to confirm it doesn't break reproducibility. What do you think?

Collaborator:

Does mamba need this change?

@@ -19,5 +19,6 @@ termcolor
tenacity
h5py
transformers>4.4.1
tokenizers
Collaborator:

Was this necessary for mamba or something?

Collaborator (Author):

Yeah, I can remove it though since I didn't add the mamba models

Collaborator:

Yeah, that'd be great cause it might break the already-fragile requirements otherwise :D Unless you tested that it didn't :)

ashok-arjun (Collaborator) commented Feb 14, 2025:

By the way, I'm testing if the code works for other base models. Will get back on that.
Update: It works!! Yay!
