Support vision models and function calling #8

Open
wants to merge 5 commits into main

Conversation

DePasqualeOrg
Collaborator

@DePasqualeOrg DePasqualeOrg commented Dec 11, 2024

I've added functionality from the huggingface.js implementation. This is a work in progress.

@johnmai-dev
Owner

Thank you for your PR, it's great! Could you please provide some test cases?

I may not have time until next month. I've been a bit busy lately.

Hi @pcuenca ! Do you have time to help review this PR?

@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Dec 12, 2024

I'll add some tests for images and function calling and try to polish this up a bit.

I should have also formatted the code before editing it, to make the changes more legible. After this gets merged, maybe we can add some auto-formatting.

@DePasqualeOrg DePasqualeOrg marked this pull request as draft December 12, 2024 08:35
@johnmai-dev johnmai-dev linked an issue Dec 12, 2024 that may be closed by this pull request
@pcuenca
Collaborator

pcuenca commented Dec 12, 2024

Hi @DePasqualeOrg, thanks a lot for the effort! It's a long diff, so I can try to take a look in a couple of days. Do we need everything at once, including namespaces, built-in functions and tool calling, or could this potentially be approached in a few phases?

@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review December 12, 2024 21:46
@DePasqualeOrg DePasqualeOrg mentioned this pull request Dec 26, 2024
@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 3 times, most recently from 9a43ea0 to 1c85539 Compare December 29, 2024 14:34
@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 2 times, most recently from 8365aec to 0fdf32f Compare December 29, 2024 14:57
@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Dec 29, 2024

I've rebased this after formatting the repo with swift-format, to make it easier for @pcuenca to review it.

@DePasqualeOrg DePasqualeOrg marked this pull request as draft December 29, 2024 19:09
@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 2 times, most recently from 5b4472f to a60b6b1 Compare December 29, 2024 23:12
@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Dec 29, 2024

The existing tests as well as some additional ones I added pass.

@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review December 29, 2024 23:13
@DePasqualeOrg DePasqualeOrg marked this pull request as draft December 30, 2024 08:06
@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 6 times, most recently from a982715 to 0e65018 Compare January 1, 2025 13:00
@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Jan 1, 2025

Four of the six tool use tests from the TypeScript implementation are now passing. It's quite difficult to get these tests to pass, because dictionaries in Swift don't preserve the order of the keys. I've used OrderedDictionary in the tests, which presents its own challenges, since sometimes these values need to be converted to JSON. I've disabled the two problematic tests and will focus on testing with more recent models. It already appears to work well with Llama 3.2.
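
The key-ordering problem described above can be illustrated with a minimal sketch. It uses only the standard library's `KeyValuePairs` rather than the `OrderedDictionary` used in the tests, and `escape` and `orderedJSON` are hypothetical helpers written for this illustration, not code from this PR.

```swift
// Hypothetical helper for this sketch: escapes only backslashes and double
// quotes. A real serialiser would also need to handle control characters.
func escape(_ text: String) -> String {
    var result = ""
    for character in text {
        switch character {
        case "\"": result += "\\\""
        case "\\": result += "\\\\"
        default: result.append(character)
        }
    }
    return result
}

// Hypothetical helper for this sketch: renders string pairs as a JSON object,
// keeping the keys in the order in which they were written.
func orderedJSON(_ pairs: KeyValuePairs<String, String>) -> String {
    let members = pairs.map { pair in
        "\"\(escape(pair.key))\": \"\(escape(pair.value))\""
    }
    return "{" + members.joined(separator: ", ") + "}"
}

// A plain Dictionary makes no promise about key order, so a template that
// iterates over it can render differently from the reference output.
let unordered: [String: String] = [
    "name": "get_weather",
    "description": "Look up the weather",
]
print(Array(unordered.keys))  // order is unspecified

// KeyValuePairs keeps the literal's order, so the rendered JSON is stable.
let ordered: KeyValuePairs<String, String> = [
    "name": "get_weather",
    "description": "Look up the weather",
]
print(orderedJSON(ordered))
// {"name": "get_weather", "description": "Look up the weather"}
```

This only covers flat string values. Nested tool specifications would need a recursive ordered type, which is presumably where the JSON conversion becomes awkward.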

I'll be testing this on vision models here: ml-explore/mlx-swift-examples#173

Tool use examples in LLMEval: ml-explore/mlx-swift-examples#174

@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 2 times, most recently from 7c0b020 to 9eb074b Compare January 1, 2025 16:18
@DePasqualeOrg DePasqualeOrg force-pushed the add-functionality branch 8 times, most recently from a04ae8d to 890b975 Compare January 7, 2025 19:48
@DePasqualeOrg
Collaborator Author

This adds quite a lot of functionality from Jinja2 in Python. I've added tests from the TypeScript Jinja library and from Jinja2. Some of the new tests I've added from Jinja2 aren't passing and are disabled, but they mostly seem to cover edge cases. I also created tests for vision language models and tool use.

I could keep going with this, but I think this is a good place to stop for now, since it seems to work with chat templates for vision language models and tool use. Maybe I'll polish it up a bit later on, but I've already spent several whole days on this and need to move on to other things.

@pcuenca and @johnmai-dev, I don't expect you to review every line of this, but maybe you can scan through it and let me know if anything catches your eye.

@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review January 7, 2025 20:48
@DePasqualeOrg
Collaborator Author

@pcuenca and @johnmai-dev, if you don't have any feedback, I think we should merge this so that I can move forward with huggingface/swift-transformers#151 and ml-explore/mlx-swift-examples#173.

@johnmai-dev
Owner

> @pcuenca and @johnmai-dev, if you don't have any feedback, I think we should merge this so that I can move forward with huggingface/swift-transformers#151 and ml-explore/mlx-swift-examples#173.

I'm really sorry, I've been quite busy with work lately. I will take some time over the weekend to review it. Thank you for your work.🍻

@johnmai-dev
Owner

> This adds quite a lot of functionality from Jinja2 in Python. I've added tests from the TypeScript Jinja library and from Jinja2. Some of the new tests I've added from Jinja2 aren't passing and are disabled, but they mostly seem to cover edge cases. I also created tests for vision language models and tool use.

I suggest creating a separate PR to split the Jinja2 functionality. This PR should only replicate the TypeScript version of Jinja.

Currently, I also have a branch with an incomplete replica of the TypeScript version of Jinja locally, and I'll merge your separately split PR then.

Do you think this is okay?

@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Jan 13, 2025

Honestly, no, I don't think that's a good idea. I spent several whole days trying to get as close as possible to the Python Jinja2. I think that should be our reference point, not the TypeScript version. That is what is used for chat templates in Python.

If you have tests from your implementation of the TypeScript version, you can share them and we can see if they pass on my branch. You can take my tests and see if they pass on your branch.

It was an enormous amount of work to get this working, and I would be very disappointed if it gets thrown away.

Also, to prevent duplication of work in the future, I would suggest opening a draft pull request here when you're working on something, so that your work isn't duplicated by someone else, which would be a waste of their time.

@johnmai-dev
Owner

I'm very sorry, but it won't be thrown away. I think the ported Jinja2 still needs further inspection and testing, so I want to split some of Jinja2's code into smaller modules for iterative updates.

Secondly, the original intention of Swift Jinja is to replicate TypeScript Jinja; I hope to keep it synchronized with the TypeScript Jinja version and then port Jinja2 on this basis.

Replicating first is the fastest method, and there are significant design differences between Swift Jinja and Jinja2. If we truly want to replicate Jinja2, a refactor of Swift Jinja might be necessary.

@DePasqualeOrg
Collaborator Author

DePasqualeOrg commented Jan 13, 2025

I really don't know what you have in mind. What exactly do you want to separate out from my contribution? Is there any functionality in your branch that isn't reproduced in mine?

Please also point out specific differences between the TypeScript version and the Python Jinja2 that you think are important.

My pull request has been open for more than a month, and only now am I learning about your parallel effort, which you didn't share here. Since no progress has been made for several months, I decided to take the initiative and make chat templates for vision models and function calling work. Now they're working.

If you're worried about correctness, I can remove the filters for which I disabled tests because fully implementing them would be too complex.

However, considering the huge effort required to make this all work, I don't think we're going to be able to merge your version and mine. Since I was the first to share this here, I think we should use my branch as the basis for further work. I'm happy to remove anything that you think is not ready for production (please provide tests that demonstrate it's not correct), and to add any functionality from your branch that is missing in mine.

The reason I feel confident about my work is that I've already covered a large part of the tests from the Python implementation, and they're passing.

@DePasqualeOrg
Collaborator Author

I've reviewed my PR, and there are no major architectural changes here. Therefore, I don't understand what you mean about refactoring. Furthermore, I'd like to emphasize that I first ported functionality from the TypeScript implementation and then (after almost entirely covering the TypeScript implementation) added missing functionality (mainly the filters and tests) from Jinja2 in Python. This PR is still comparable to the TypeScript implementation.

I'm hoping we can move forward with this efficiently, because I want to start building actual features in apps using function calling and vision models instead of getting bogged down with this library.

@johnmai-dev
Owner

Ok.

Before merging this PR, you still need to resolve the previous review feedback. @DePasqualeOrg

Additionally, I hope @pcuenca can also participate in the review, since swift-transformers is currently the main user of this library.

@johnmai-dev johnmai-dev requested a review from pcuenca January 14, 2025 01:46
Tests/Templates/ToolSpecs.swift (resolved)
Python/test-chat-template.ipynb (outdated, resolved)
Sources/Parser.swift (resolved)
Sources/Ast.swift (outdated, resolved)
Owner

@johnmai-dev johnmai-dev left a comment

Thank you very much @DePasqualeOrg!
I have reviewed it and think it can be merged at any time.

But we still need @pcuenca to help review it again.

@johnmai-dev johnmai-dev linked an issue Jan 14, 2025 that may be closed by this pull request
@johnmai-dev johnmai-dev added the enhancement New feature or request label Jan 14, 2025
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Handle vision language model chat templates
Parse Llama tool calls?
3 participants