Open
135 commits
b837338
feat: Added initial Pytorch example to monitor batching and per layer…
LeoRoccoBreedt Feb 26, 2025
9994528
refactor: Update introduction section for more clarity on notebook use
LeoRoccoBreedt Feb 26, 2025
223fde1
chore: change how the custom run id gets automatically generated
LeoRoccoBreedt Feb 26, 2025
ceb3a24
chore: update instructions on how users can get and set their API tok…
LeoRoccoBreedt Feb 26, 2025
39f5bb9
chore: update the introduction to be more foundation model training o…
LeoRoccoBreedt Feb 26, 2025
69fdfba
chore: update dataset section with a better description
LeoRoccoBreedt Feb 26, 2025
1cde287
refactor: update training loop where grads, norms and activations are…
LeoRoccoBreedt Feb 26, 2025
2da28f4
refactor: update batch size and edit gradient norm logging code
LeoRoccoBreedt Feb 27, 2025
123fea2
chore: add data file to ignore for pytorch example
LeoRoccoBreedt Feb 27, 2025
86b6d5d
refactor: update model architecture layers and update training loop
LeoRoccoBreedt Feb 27, 2025
a39077a
refactor: update accuracy calculation to not output the percentage
LeoRoccoBreedt Feb 27, 2025
3e2d453
refactor: update model architecture layers, accuracy calculation and …
LeoRoccoBreedt Feb 28, 2025
8a3da57
feat: Added a pytorch text-based example that is used to demonstrate …
LeoRoccoBreedt Mar 3, 2025
d03fd43
refactor: add validation and test loss calcualtion for each epoch
LeoRoccoBreedt Mar 3, 2025
b7f8e08
refactor: update configs and parameters
LeoRoccoBreedt Mar 3, 2025
b113a36
refactor: update logged configs
LeoRoccoBreedt Mar 3, 2025
7258993
refactor: calculate activations per layer
LeoRoccoBreedt Mar 3, 2025
ae330e6
refactor: add tracking for grad norms
LeoRoccoBreedt Mar 3, 2025
d4d160c
refactor: add gradient tracking per epoch
LeoRoccoBreedt Mar 3, 2025
890bf26
chore: remove uneeded section
LeoRoccoBreedt Mar 3, 2025
ebd34c7
refactor: add fully connected layer to model for more complexity
LeoRoccoBreedt Mar 4, 2025
ae9abc6
chore: fix activation saving for layers
LeoRoccoBreedt Mar 4, 2025
6f23761
refactor: update packages for example
LeoRoccoBreedt Mar 5, 2025
2c82a0f
refactor: update dataset to be used in example
LeoRoccoBreedt Mar 5, 2025
48451d6
refactor: update evalution function that calculates the validation lo…
LeoRoccoBreedt Mar 5, 2025
5b5fbdd
refactor: update training loop to work with new data
LeoRoccoBreedt Mar 5, 2025
8ff6f30
chore: remove unused sections
LeoRoccoBreedt Mar 5, 2025
b32b27b
chore: re-organize notebook layout
LeoRoccoBreedt Mar 5, 2025
4d659a2
chore: cleanup and add parameters in right place
LeoRoccoBreedt Mar 5, 2025
5e7ec6e
refactor: add all debugging metrics to the same dictionary variable
LeoRoccoBreedt Mar 5, 2025
d27c247
chore: change LSTM layers to see response in logging
LeoRoccoBreedt Mar 5, 2025
dc4b63d
refactor: update location where model.train() is called in training loop
LeoRoccoBreedt Mar 6, 2025
1a8d021
refactor: update how HF dataset is downloaded to only download a subset
LeoRoccoBreedt Mar 6, 2025
fd7cbc5
refactor: update loading and processing of HF dataset to make code fa…
LeoRoccoBreedt Mar 6, 2025
3be6c3e
chore: add TODO's to address
LeoRoccoBreedt Mar 6, 2025
6e1df27
chore: update sections
LeoRoccoBreedt Mar 10, 2025
fbfb0da
refactor: move data downloading section
LeoRoccoBreedt Mar 10, 2025
84149d1
refactor: added evaluate function to model initialization cell
LeoRoccoBreedt Mar 10, 2025
2d69e01
chore: update introduction for the model architecture and helper func…
LeoRoccoBreedt Mar 10, 2025
90e0539
fix: update input for vocab_size
LeoRoccoBreedt Mar 10, 2025
46e1f37
refactor: add the vocab_size calculation to the data formatting section
LeoRoccoBreedt Mar 10, 2025
ca1b52d
fix: refactor data loading process from HF
LeoRoccoBreedt Mar 10, 2025
836c452
fix: update validation data to use test subset from HF and comment ou…
LeoRoccoBreedt Mar 10, 2025
814d56f
refactor: create a class to manage hooks for tracking gradients and a…
LeoRoccoBreedt Mar 11, 2025
31968e9
fix: gradients logging to Neptune
LeoRoccoBreedt Mar 11, 2025
a9bad61
style: remove old model architecture section
LeoRoccoBreedt Mar 11, 2025
32d46f1
refactor: change attribute names for better readability
LeoRoccoBreedt Mar 11, 2025
32a43b1
style: update sections and model architecture
LeoRoccoBreedt Mar 12, 2025
0c8497b
chore: cleanup commented code
LeoRoccoBreedt Mar 12, 2025
f61f734
chore: cleanup model outputs
LeoRoccoBreedt Mar 12, 2025
bcc344e
refactor: update quotes for dictionary keys since Colab returns an error
LeoRoccoBreedt Mar 12, 2025
df89878
style: update intro and information about debugging metrics
LeoRoccoBreedt Mar 14, 2025
23bd58a
refactor: update to be able to run model on GPUs
LeoRoccoBreedt Mar 17, 2025
0dbd341
style: update ending of notebook with follow along
LeoRoccoBreedt Mar 17, 2025
8f98b33
style: add colab link to intro
LeoRoccoBreedt Mar 18, 2025
ce66396
style: update colab link and ending section
LeoRoccoBreedt Mar 18, 2025
5eda856
style: update intro to notebook
LeoRoccoBreedt Mar 18, 2025
ffd0b0a
style: update markdown display to work with colab
LeoRoccoBreedt Mar 18, 2025
275f024
style: update HookManager section
LeoRoccoBreedt Mar 18, 2025
295c9b2
style: update sections headings
LeoRoccoBreedt Mar 19, 2025
5851842
style: condense intro
LeoRoccoBreedt Mar 19, 2025
ea30d40
style: add links for active run and update ending steps
LeoRoccoBreedt Mar 19, 2025
471beb7
refactor: log global and debugging metrics during training at each st…
LeoRoccoBreedt Mar 19, 2025
b2732cf
style: update urls to use experiment_name
LeoRoccoBreedt Mar 20, 2025
c34dca3
chore: removed unused notebook
LeoRoccoBreedt Mar 20, 2025
5fd0a0e
chore: clean up comments and code cell outputs
LeoRoccoBreedt Mar 20, 2025
23ea663
chore: move notebook into its own notebook folder
LeoRoccoBreedt Mar 20, 2025
21114f6
chore: fix comment
LeoRoccoBreedt Mar 20, 2025
c8cd676
refactor: update notebook to use envionment variables
LeoRoccoBreedt Mar 20, 2025
fe4c1a3
chore: update colab link
LeoRoccoBreedt Mar 20, 2025
6e96c5c
chore: accpet pre-commit recommendations
LeoRoccoBreedt Mar 20, 2025
b16869b
chore: add notebook to test workflow
LeoRoccoBreedt Mar 20, 2025
15357bb
TW review
szaganek Mar 27, 2025
6b04aaa
Merge commit 'e842efd5add2a4ef8a45164c8a91dda9de175925' into lr/pytor…
LeoRoccoBreedt Mar 27, 2025
34c47f3
chore: update pip installation line
LeoRoccoBreedt Mar 27, 2025
a2a2bf1
chore: update url's to use the new run methods
LeoRoccoBreedt Mar 27, 2025
3da164e
refactor: update example to start using an external class for hooking…
LeoRoccoBreedt Mar 28, 2025
f44323d
feat: create a TorchWatcher package
LeoRoccoBreedt Mar 31, 2025
2b001af
style: update intro
LeoRoccoBreedt Mar 31, 2025
fdfca80
update TocrhWatcher package to default to tracking all available laye…
LeoRoccoBreedt Mar 31, 2025
84a7ca3
chore: update readme
LeoRoccoBreedt Mar 31, 2025
2baa709
Merge commit 'ee5a98f111eb90400a373c292b591e3684762fcc' into lb/pytor…
LeoRoccoBreedt Mar 31, 2025
75265ec
chore: cleanup comments and add TODO for future improvements
LeoRoccoBreedt Apr 1, 2025
ebde389
refactor: update TorchWatcher to use a common _track_metrics() method
LeoRoccoBreedt Apr 1, 2025
9f83fbf
refactor: update the watch() method to accept boolen inputs for the m…
LeoRoccoBreedt Apr 1, 2025
603af72
feat: add ability to specify base_namespace and namespace during trai…
LeoRoccoBreedt Apr 1, 2025
6bb4879
chore: updare readme and example for namespace feature
LeoRoccoBreedt Apr 1, 2025
50b0a6d
chore: pre-commit hooks changes
LeoRoccoBreedt Apr 1, 2025
e67e1f6
refactor: update example to use the improved TorchWatcher package
LeoRoccoBreedt Apr 1, 2025
a2d0d50
chore: pre-commit cleanup
LeoRoccoBreedt Apr 3, 2025
fcf3657
feat: notebook tutorial on debugging training runs with Neptune
LeoRoccoBreedt Apr 8, 2025
82f4d8b
chore: pre-commit changes
LeoRoccoBreedt Apr 8, 2025
8c56296
chore: update colab link for branch
LeoRoccoBreedt Apr 8, 2025
247679a
refactor: keep notebook more self-contained for use in Colab
LeoRoccoBreedt Apr 8, 2025
408ba02
fix: f string compatibility in Colab
LeoRoccoBreedt Apr 8, 2025
314e8a2
fix: ensure model object is on correct device
LeoRoccoBreedt Apr 8, 2025
62227bc
chore: Add header links to GH, Neptune and docs
LeoRoccoBreedt Apr 9, 2025
f64d54d
Merge commit 'fc4bc5ee2d6e3297c8611991e60f372ea785d213' into lb/debug…
LeoRoccoBreedt Apr 9, 2025
505f41c
style: minor updates to markdown
LeoRoccoBreedt Apr 14, 2025
bed4899
chore: remove unused code
LeoRoccoBreedt Apr 14, 2025
75885bb
style: remove image reference
LeoRoccoBreedt Apr 15, 2025
33d3305
style: update ending steps and add button links to examples
LeoRoccoBreedt Apr 15, 2025
494379f
style: update text for better readability
LeoRoccoBreedt Apr 16, 2025
0b8847b
style: Update markdown section of explanations for better readability…
LeoRoccoBreedt Apr 16, 2025
69bdb16
chore: pre-commit cleanup
LeoRoccoBreedt Apr 16, 2025
0fed412
chore: removed unused code for this example
LeoRoccoBreedt Apr 16, 2025
ba14f8f
chore: update test yaml
LeoRoccoBreedt Apr 16, 2025
1ab2ed0
refactor: use synthetic simulation data in example to illustrate the …
LeoRoccoBreedt May 2, 2025
aaf96ac
Merge branch 'main' into lb/debugging_model_training
LeoRoccoBreedt May 2, 2025
3e423f8
fix: add numpy to be installed as dependency
LeoRoccoBreedt May 2, 2025
90fa8d9
chore: remove unneeded comments
LeoRoccoBreedt May 2, 2025
7e4f3ff
refactor: update model to PyTorch based
LeoRoccoBreedt May 22, 2025
12a95f8
Merge remote-tracking branch 'origin/main' into lb/debugging_model_tr…
LeoRoccoBreedt May 22, 2025
b0b3f56
chore: update readme and pre-commit accept
LeoRoccoBreedt May 22, 2025
b6eb2bd
Merge branch 'main' into lb/debugging_model_training
SiddhantSadangi May 27, 2025
c35c43d
fix: add packages dependencies
LeoRoccoBreedt Jun 2, 2025
b7fe196
refactor: update notebooks
LeoRoccoBreedt Jun 2, 2025
ebc9aee
feat: add script version of tutorial
LeoRoccoBreedt Jun 2, 2025
6b9df37
Merge remote-tracking branch 'origin/main' into lb/debugging_model_tr…
LeoRoccoBreedt Jun 2, 2025
75e88ad
chore: Add tests to GH workflows
LeoRoccoBreedt Jun 2, 2025
f0db1c2
fix: constrain numpy versions <2
LeoRoccoBreedt Jun 2, 2025
d3c8200
chore: update links for colab and GH
LeoRoccoBreedt Jun 2, 2025
4925f69
chore: minor tweaks and update links to project
LeoRoccoBreedt Jun 2, 2025
39aaa81
chore: readme links
LeoRoccoBreedt Jun 2, 2025
d9d6e77
small fixes
LeoRoccoBreedt Jun 2, 2025
0af184c
Apply suggestions from TW's review
LeoRoccoBreedt Jun 6, 2025
d31a78f
Merge commit 'c134c15e129fa782b6341895af7f1a6b002629c1' into lb/debug…
LeoRoccoBreedt Jun 6, 2025
a1ef5e8
Apply suggestions from TW's review
LeoRoccoBreedt Jun 6, 2025
f3bae1a
refactor: updates from TW and sourcery review
LeoRoccoBreedt Jun 6, 2025
d848e0c
Merge remote-tracking branch 'origin/main' into lb/debugging_model_tr…
LeoRoccoBreedt Jun 24, 2025
6c828f2
refactor: update notebook workflow and links from TW review
LeoRoccoBreedt Jun 24, 2025
3bb6f5a
Merge remote-tracking branch 'origin/main' into lb/debugging_model_tr…
LeoRoccoBreedt Jun 25, 2025
6b97205
Apply suggestions from code review
LeoRoccoBreedt Jun 26, 2025
f19dfb7
refactor: updates from TW's review
LeoRoccoBreedt Jun 26, 2025
4abd1db
Apply suggestions from code review
LeoRoccoBreedt Jun 27, 2025
1 change: 1 addition & 0 deletions .github/workflows/test-notebooks.yml
@@ -20,6 +20,7 @@ jobs:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
notebooks: # Add in alphabetical order
+ - how-to-guides/debug-model-training-runs/notebooks/debug_training_runs.ipynb
- how-to-guides/hpo/notebooks/Neptune_HPO.ipynb
- how-to-guides/quickstart/notebooks/neptune_quickstart.ipynb
os: ["${{ inputs.os }}"]
1 change: 1 addition & 0 deletions .github/workflows/test-scripts.yml
@@ -20,6 +20,7 @@ jobs:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
scripts: # Add in alphabetical order
+ - how-to-guides/debug-model-training-runs/scripts
- how-to-guides/hpo/scripts
- how-to-guides/quickstart/scripts
os: ["${{ inputs.os }}"]
19 changes: 12 additions & 7 deletions README.md
@@ -14,13 +14,14 @@ With Neptune, you can monitor thousands of per-layer metrics—losses, gradients

This repo contains tutorials and examples of how to use Neptune.

-| | Docs | Neptune | GitHub | Colab |
-| -- | :--: | :--: | :--: | :--: |
-| Quickstart | [![docs-icon]][quickstart] | [![neptune-icon]][quickstart-example] | [![github-icon]][qs-notebook] | [![colab-icon]][qs-colab] |
-| Log different types of metadata | [![docs-icon]][log-metadata] | [![neptune-icon]][log-metadata-example] | | |
-| Organize and filter runs | [![docs-icon]][runs-table] | [![neptune-icon]][runs-table-example] | | |
-| Resume run or other object | [![docs-icon]][resume-run] | | | |
-| Use Neptune in HPO jobs | [![docs-icon]][hpo] | [![neptune-icon]][hpo-example] | [![github-icon]][hpo-notebook] | [![colab-icon]][hpo-colab] |
+| | Docs | Neptune | GitHub | Colab |
+| ------------------------------- | ---------------------------- | --------------------------------------- | ------------------------------ | -------------------------- |
+| Quickstart | [![docs-icon]][quickstart] | [![neptune-icon]][quickstart-example] | [![github-icon]][qs-notebook] | [![colab-icon]][qs-colab] |
+| Log different types of metadata | [![docs-icon]][log-metadata] | [![neptune-icon]][log-metadata-example] | | |
+| Organize and filter runs | [![docs-icon]][runs-table] | [![neptune-icon]][runs-table-example] | | |
+| Resume run or other object | [![docs-icon]][resume-run] | | | |
+| Use Neptune in HPO jobs | [![docs-icon]][hpo] | [![neptune-icon]][hpo-example] | [![github-icon]][hpo-notebook] | [![colab-icon]][hpo-colab] |
+| Debug training runs | [![docs-icon]][debug] | [![neptune-icon]][debug-example] | [![github-icon]][debug-notebook] | [![colab-icon]][debug-colab] |

### Migration tools

@@ -48,6 +49,8 @@ This repo contains tutorials and examples of how to use Neptune.
[hpo-colab]: https://colab.research.google.com/github/neptune-ai/scale-examples/blob/master/how-to-guides/hpo/notebooks/Neptune_HPO.ipynb
[qs-notebook]: how-to-guides/quickstart/notebooks/neptune_quickstart.ipynb
[qs-colab]: https://colab.research.google.com/github/neptune-ai/scale-examples/blob/master/how-to-guides/quickstart/notebooks/neptune_quickstart.ipynb
+[debug-notebook]: how-to-guides/debug-model-training-runs/notebooks/debug_training_runs.ipynb
+[debug-colab]: https://colab.research.google.com/github/neptune-ai/scale-examples/blob/master/how-to-guides/debug-model-training-runs/notebooks/debug_training_runs.ipynb

<!-- External -->
[blog]: https://neptune.ai/blog
@@ -66,6 +69,8 @@ This repo contains tutorials and examples of how to use Neptune.
[resume-run]: https://docs.neptune.ai/resume_run
[runs-table]: https://docs.neptune.ai/runs_table
[runs-table-example]: https://scale.neptune.ai/o/examples/org/LLM-Pretraining/runs/table?viewId=9e746462-f045-4ff2-9ac4-e41fa349b04d&detailsTab=dashboard&dash=table&type=run&compare=auto-5
+[debug]: https://docs.neptune.ai/debug_runs_tutorial
+[debug-example]: https://scale.neptune.ai/o/examples/org/debug-training-metrics/runs/table?viewId=standard-view&dash=table&compareChartsFilter-compound=udzSoRe3VmlvolZ8TbuB_zvfcAcgJmla8UuNku1rGWdg

<!-- Clickable icons -->
[docs-icon]: https://neptune.ai/wp-content/uploads/2023/06/file_icon.svg "Read the documentation"
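
The README hunks above mix relative link definitions (such as `[qs-notebook]` and `[debug-notebook]`) with absolute Colab URLs. A relative target only resolves if that path exists in the repository, so a small check can be useful when notebooks are moved between folders. The following is a minimal sketch, not part of this PR: the helper names `relative_link_targets` and `missing_targets` are hypothetical, and the regular expression handles only simple one-line reference definitions.

```python
import re
from pathlib import Path

# Matches simple reference-style link definitions: `[label]: target "optional title"`.
LINK_DEF = re.compile(r"^\[([^\]]+)\]:\s*(\S+)", re.MULTILINE)


def relative_link_targets(markdown: str) -> dict[str, str]:
    """Return {label: target} for definitions whose target is not an absolute URL."""
    return {
        label: target
        for label, target in LINK_DEF.findall(markdown)
        if "://" not in target
    }


def missing_targets(markdown: str, repo_root: Path) -> list[str]:
    """Return the labels whose relative target does not exist under repo_root."""
    return [
        label
        for label, target in relative_link_targets(markdown).items()
        if not (repo_root / target).exists()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    readme = Path("README.md")
    if readme.exists():
        text = readme.read_text(encoding="utf-8")
        print(missing_targets(text, readme.parent))
```

An empty list means every relative definition points at an existing file. Absolute URLs are skipped on purpose, since checking them would require network access.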