GSoC 2024: Summary of LLM Hyperparameter Optimization API Project #154
base: master
Conversation
Signed-off-by: helenxie-bit <[email protected]>
Ref: kubeflow/katib#2339
Please review when you have time; any suggestions are welcome! Thanks! @andreyvelich @johnugeorge @terrytangyuan
Signed-off-by: helenxie-bit <[email protected]>
Thank you for working on this @helenxie-bit, and sorry for the late reply!
/assign @varodrig @hbelmiro @franciscojavierarceo @kubeflow/wg-training-leads @Electronic-Waste
Can you please help us with the review, so we can merge this great blog post?
This is awesome! We'll make sure to review these sooner going forward :)
/lgtm /approve
_posts/2024-09-19-gsoc-2024-summary-llm-hyperparameter-optimization-api.md
Hyperparameter optimization is a crucial but time-consuming task in fine-tuning machine learning models, especially for LLMs that involve billions of parameters. This API aims to streamline the process by abstracting away the complexity of Kubernetes infrastructure, enabling data scientists to focus on model performance instead of system configuration.
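The workflow that such an API automates can be illustrated with a small, self-contained sketch. The `grid_search` helper, the search space, and the toy objective below are all hypothetical illustrations (not part of the Katib SDK): they show what data scientists would otherwise have to script and babysit by hand, where each evaluation is a full fine-tuning run.

```python
from itertools import product

def grid_search(objective, space):
    """Evaluate every hyperparameter combination and return the best one.

    In a real setup each evaluation would be a complete fine-tuning run
    launched on Kubernetes; here the objective is a cheap stand-in for
    validation loss.
    """
    names = list(space)
    best_params, best_score = None, float("inf")
    for values in product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy search space and objective (illustrative values only).
space = {"learning_rate": [1e-5, 3e-5, 5e-5], "batch_size": [8, 16, 32]}

def objective(p):
    # Pretend the best validation loss occurs at lr=3e-5, batch_size=16.
    return abs(p["learning_rate"] - 3e-5) * 1e5 + abs(p["batch_size"] - 16) / 16

best_params, best_score = grid_search(objective, space)
print(best_params)  # {'learning_rate': 3e-05, 'batch_size': 16}
```

With Katib, each candidate configuration instead runs as a Trial on the cluster, and a search algorithm (random search, Bayesian optimization, etc.) replaces the exhaustive loop.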
It would be nice to share a little bit about the feature and why it is useful for Kubeflow Katib end-users.
Maybe we can take something from your proposal or documentation PR: kubeflow/website#3952
@andreyvelich I've added another paragraph explaining the features of this API—hope it's clear! Please take a look when you have time.
I also included a link to the user guide, but since it hasn't been merged yet, I'm unsure how to link it properly. The link I'm using now seems to be temporary. Could you provide instructions on how to link it?
@helenxie-bit @mahdikhashan Should we just merge this website PR and you can address my remaining comments in the followup PR: kubeflow/website#3952 (comment) ?
Sure, that sounds great.
@helenxie-bit Can you please create an issue in kubeflow/katib to track follow-up updates to the website PR?
@andreyvelich Sure! I've created an issue. Please have a look.
Let's also cross-reference the docs for this feature, since we will merge this PR soon: kubeflow/website#3952
Let's get this one merged soon
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: franciscojavierarceo, terrytangyuan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
+1. I'd like us to get a GenAI page as soon as possible. :) I'm happy to cut the draft PR.
@helenxie-bit Fantastic blog! I enjoyed reading it.
Thank you for working on this and I'm so glad you had such a great experience.
I added a few suggestions and recommendations.
## Motivation

The rapid advancements and rising popularity of LLMs, such as GPT and BERT, have created a growing demand for efficient LLMOps in Kubernetes. To address this, we have developed a [train API](https://www.kubeflow.org/docs/components/training/user-guides/fine-tuning/) within the Training Python SDK, simplifying the process of fine-tuning LLMs using distributed PyTorchJob workers. However, hyperparameter optimization remains a crucial yet labor-intensive task for enhancing model performance.
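A quick back-of-the-envelope count shows why this remains labor-intensive: even a modest grid of fine-tuning hyperparameters multiplies into hundreds of full training runs. The search space below is purely illustrative (the values are not from the post), but the arithmetic is the point.

```python
# Hypothetical fine-tuning search space (illustrative values only).
space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "per_device_batch_size": [4, 8, 16],
    "num_epochs": [1, 2, 3],
    "lora_rank": [4, 8, 16, 32],
    "weight_decay": [0.0, 0.01, 0.1],
}

# Every combination is one complete fine-tuning run if swept exhaustively.
n_configs = 1
for values in space.values():
    n_configs *= len(values)

print(n_configs)  # 4 * 3 * 3 * 4 * 3 = 432
```

Each of those runs is expensive for a billion-parameter model, which is why automating the search and scheduling it efficiently on Kubernetes matters.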
I have a suggestion about the link to the fine-tuning documentation: when I access the page, it says "Old Version: This page is about Kubeflow Training Operator V1; for the latest information check the Kubeflow Trainer V2 documentation." I'd add a disclaimer in your blog that the documentation is currently being updated for the new Trainer, but the functionality is still valid/current.
cc @andreyvelich for comments
This functionality is not part of Kubeflow Trainer V2, since we use other methods for Fine-Tuning.
Yeah, as Andrey mentioned, Kubeflow Trainer V2 is still a work in progress, and this API is not part of it. Do you think I should remove the link to avoid any confusion?
Let's keep it for legacy docs for now, it's ok
- **Stage 1**: Writing the project proposal and converting it into a Kubeflow Enhancement Proposal (KEP).
- **Stage 2**: Developing and implementing the high-level API.
- **Stage 3**: Implementing unit tests and end-to-end tests for the API.
- **Stage 4**: Creating documentation and presenting the work to the Kubeflow community.
Did you contribute to the API design as well? If you did, I'd include it as a separate stage.
Yeah, I contributed to the API design as well, and I see it as part of the work involved in writing the proposal. I've added it and refined the wording—please take a look when you have time!
This PR adds a detailed summary of my GSoC 2024 Project 4: Developing the LLM Hyperparameter Optimization API in Kubeflow's Katib. It highlights the motivation, goals, my contributions, and key lessons learned from the project.