Qwen3 sft collab #2355
Open
mydatascience wants to merge 15 commits into main from qwen3-sft-collab
+485
−15
Commits (author: mydatascience):
3d5efb7 Sft fro qwen
2cea782 Sft fro qwen
e1cb7e4 Refined qwen colab
59e8ebf Del old one
2742f82 Fix var name
7e68b9a colab label
dce7b0d Nicer
b51d693 Making nicer
5d05559 Added README.md
b6264b1 Restructured
81f20a5 Fix
59f9f6f Fix
053b980 Fix
5d7aea5 Fix
bfa7148 Fix install
@@ -0,0 +1,158 @@
# MaxText Examples - Setting Up Jupyter Lab or Colab to Run Them on TPU

This guide explains how to set up Jupyter Lab on a TPU VM and how to connect it to Google Colab to run the MaxText examples.

## 📑 Table of Contents

- [Prerequisites](#prerequisites)
- [Method 1: Google Colab with TPU (Recommended)](#method-1-google-colab-with-tpu-recommended)
- [Method 2: Local Jupyter Lab with TPU](#method-2-local-jupyter-lab-with-tpu)
- [Method 3: Colab + Local Jupyter Lab Hybrid](#method-3-colab--local-jupyter-lab-hybrid)
- [Available Examples](#available-examples)
- [Common Pitfalls & Debugging](#common-pitfalls--debugging)
- [Support and Resources](#support-and-resources)
- [Contributing](#contributing)

## Prerequisites

Before starting, make sure you have:

- ✅ A Google Cloud Platform (GCP) account with billing enabled
- ✅ TPU quota available in your region (check under IAM & Admin → Quotas)
- ✅ Basic familiarity with Jupyter, Python, and Git
- ✅ gcloud CLI installed locally if you plan to use Method 2 or 3
- ✅ Firewall rules open for port 8888 (Jupyter) if accessing directly

## Method 1: Google Colab with TPU (Recommended)

This is the fastest way to run MaxText without managing infrastructure.

### Step 1: Open Google Colab

1. Go to [Google Colab](https://colab.research.google.com/)
2. Sign in → New Notebook

### Step 2: Enable TPU Runtime

1. **Runtime** → **Change runtime type**
2. Set **Hardware accelerator** → **TPU**
3. Select TPU version:
   - **v5e-8** → recommended for most MaxText examples (paid option)
   - **v5e-1** → free tier option (slower, but works for the Qwen3-0.6B demo)
4. Click **Save**
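
Once the runtime is saved, it can help to confirm that the notebook actually sees the TPU before running anything heavy. The following is a minimal sketch, assuming JAX is already installed in the runtime; the helper name `describe_devices` is illustrative and not part of MaxText.

```python
def describe_devices():
    """Return (platform, device_kind) pairs for every device JAX can see.

    Returns an empty list if JAX is not installed or no backend starts.
    """
    try:
        import jax  # assumption: JAX is available in the runtime
    except ImportError:
        return []
    try:
        devices = jax.devices()
    except RuntimeError:
        return []
    return [(d.platform, d.device_kind) for d in devices]


devices = describe_devices()
if any(platform == "tpu" for platform, _ in devices):
    print(f"TPU runtime active: {len(devices)} device(s)")
else:
    print("No TPU visible; check Runtime > Change runtime type")
```

If the second message appears, re-check the hardware accelerator setting and reconnect the runtime.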

### Step 3: Upload & Prepare MaxText

Upload your notebooks, or clone the MaxText repository into the Colab session.

> **Note:** In Colab, the repo root will usually be `/content/maxtext`

**Example:**
```python
!git clone https://github.com/AI-Hypercomputer/maxtext.git
%cd maxtext
```
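
Because the repo root differs between environments (`/content/maxtext` in Colab, `~/maxtext` on a TPU VM), a notebook can look the path up instead of hard-coding it. This is a small sketch under that assumption; `resolve_repo_root` is a hypothetical helper, not something MaxText provides.

```python
from pathlib import Path


def resolve_repo_root(candidates=("/content/maxtext", "~/maxtext")):
    """Return the first candidate directory that exists, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_dir():
            return path
    return None


repo_root = resolve_repo_root()
print(repo_root if repo_root else "MaxText checkout not found; clone it first")
```

Pass your own list of candidates if you cloned the repository somewhere else.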

### Step 4: Run Examples

1. Open `src/MaxText/examples/`
2. Try:
   - `sft_qwen3_demo.ipynb`
   - `sft_llama3_demo.ipynb`
   - `grpo_llama3_demo.ipynb`

> ⚡ **Tip:** If Colab disconnects, re-enable TPU and re-run setup cells. Save checkpoints to GCS or Drive.
>
> ⚡ **Tip:** If Colab asks you to restart the session (typically after dependencies are installed), restart it and continue with the cells that follow the setup. There is no need to install the dependencies again.

## Method 2: Local Jupyter Lab with TPU

This method gives you more control and is better for long training runs.

### Step 1: Set Up TPU VM

In Google Cloud Console:

1. **Compute Engine** → **TPU** → **Create TPU Node**
2. Example config:
   - **Name:** `maxtext-tpu-node`
   - **TPU type:** `v5e-8` (or `v6e-8` for newer hardware)
   - **Runtime Version:** `tpu-ubuntu-alpha-*` (matches your VM image)

### Step 2: Connect to TPU VM

```bash
gcloud compute tpus tpu-vm ssh maxtext-tpu-node --zone=YOUR_ZONE
```

### Step 3: Install Dependencies

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install python3-pip python3-dev git -y
pip3 install jupyterlab
```

### Step 4: Start Jupyter Lab

```bash
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

Copy the URL with the token from the terminal output.

### Step 5: Secure Access

#### Option A: SSH Tunnel (Recommended)

```bash
gcloud compute tpus tpu-vm ssh maxtext-tpu-node --zone=YOUR_ZONE -- -L 8888:localhost:8888
```

Then open `http://localhost:8888` in your local browser.
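
If the page does not load, a quick first check is whether anything is listening on the forwarded port on your local machine. The sketch below uses only the Python standard library; `port_open` is an illustrative helper name, and port 8888 is assumed from the tunnel command above.

```python
import socket


def port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_open():
    print("Port 8888 is reachable; open http://localhost:8888")
else:
    print("Nothing is listening on port 8888; is the SSH tunnel still running?")
```

A `False` result usually means the SSH session has ended or Jupyter Lab is not running on the TPU VM.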

## Method 3: Colab + Local Jupyter Lab Hybrid

Set up Jupyter Lab as described in Method 2.
In Colab, open the dropdown where you select the runtime, choose "Connect to a local runtime", and paste the Jupyter Lab URL (including the token).
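
The URL that Jupyter Lab prints may contain the VM's hostname and a `/lab` path, whereas the browser reaches the server through the forwarded local port. The following sketch rewrites such a URL while keeping the token; it assumes the SSH tunnel from Method 2 forwards port 8888, and `to_local_runtime_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_local_runtime_url(jupyter_url, local_port=8888):
    """Point a Jupyter URL at localhost while keeping its query (token)."""
    parts = urlsplit(jupyter_url)
    return urlunsplit((parts.scheme, f"localhost:{local_port}", "/", parts.query, ""))


# Example with a made-up hostname and token.
print(to_local_runtime_url("http://tpu-vm:8888/lab?token=abc123"))
# prints: http://localhost:8888/?token=abc123
```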

## Available Examples

### Supervised Fine-Tuning (SFT)

- **`sft_qwen3_demo.ipynb`** → Qwen3-0.6B with Hugging Face ultrachat_200k dataset
- **`sft_llama3_demo.ipynb`** → Llama3.1-8B with Hugging Face ultrachat_200k dataset

### GRPO Training

- **`grpo_llama3_demo.ipynb`** → GRPO training on math dataset

## Common Pitfalls & Debugging

| Issue | Solution |
|-------|----------|
| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image (`tpu-ubuntu-alpha-*`) |
| ❌ Colab disconnects | Save checkpoints to GCS or Drive regularly |
| ❌ "RESOURCE_EXHAUSTED" errors | Use smaller batch size or v5e-8 instead of v5e-1 |
| ❌ Firewall blocked | Ensure port 8888 open, or always use SSH tunneling |
| ❌ Path confusion | In Colab use `/content/maxtext`; in TPU VM use `~/maxtext` |
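
For the "RESOURCE_EXHAUSTED" row, one way to find a workable batch size is to halve it until a step succeeds. The sketch below is illustrative only: `run_with_backoff` and `fake_train_step` are hypothetical names, and a real notebook would pass its own step function in place of the stand-in.

```python
def run_with_backoff(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving the size on RESOURCE_EXHAUSTED."""
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except Exception as err:
            # Re-raise anything that is not an out-of-memory style failure.
            if "RESOURCE_EXHAUSTED" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError("RESOURCE_EXHAUSTED even at the minimum batch size")


def fake_train_step(batch_size):
    """Stand-in for a real training step; fails for batches larger than 8."""
    if batch_size > 8:
        raise RuntimeError("RESOURCE_EXHAUSTED: out of memory")


print(run_with_backoff(fake_train_step, batch_size=64))  # prints 8
```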

## Support and Resources

- 📘 [MaxText Documentation](https://github.com/AI-Hypercomputer/maxtext)
- 💻 [Google Colab](https://colab.research.google.com)
- ⚡ [Cloud TPU Docs](https://cloud.google.com/tpu/docs)
- 🧩 [Jupyter Lab](https://jupyterlab.readthedocs.io)

## Contributing

If you encounter issues or have improvements for this guide, please:

1. Open an issue on the MaxText repository
2. Submit a pull request with your improvements
3. Share your experience in the discussions

---

**Happy Training! 🚀**
Also, add a note that after installing dependencies you would have to restart the session. After restarting, just run the steps following the setup (no need to install dependencies again).
Done