Merge branch 'master' into feature/drop-torch-1-12
Borda authored Jan 26, 2024
2 parents 3b46cde + 3bd133b commit 7033aad
Showing 68 changed files with 2,263 additions and 293 deletions.
9 changes: 8 additions & 1 deletion .azure/gpu-tests-fabric.yml
@@ -59,6 +59,9 @@ jobs:
 "Fabric | latest":
 image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.1-cuda12.1.0"
 PACKAGE_NAME: "fabric"
+"Fabric | future":
+image: "pytorchlightning/pytorch_lightning:base-cuda-py3.11-torch2.2-cuda12.1.0"
+PACKAGE_NAME: "fabric"
 "Lightning | latest":
 image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.1-cuda12.1.0"
 PACKAGE_NAME: "lightning"
@@ -73,6 +76,10 @@ jobs:
 scope=$(python -c 'n = "$(PACKAGE_NAME)" ; print(dict(fabric="lightning_fabric").get(n, n))')
 echo "##vso[task.setvariable variable=COVERAGE_SOURCE]$scope"
 displayName: "set env. vars"
+- bash: |
+echo "##vso[task.setvariable variable=TORCH_URL]https://download.pytorch.org/whl/test/cu${CUDA_VERSION_MM}/torch_test.html"
+condition: endsWith(variables['Agent.JobName'], 'future')
+displayName: "set env. vars 4 future"
 - bash: |
 echo $(DEVICES)
@@ -99,7 +106,7 @@ jobs:
 - bash: |
 extra=$(python -c "print({'lightning': 'fabric-'}.get('$(PACKAGE_NAME)', ''))")
-pip install -e ".[${extra}dev]" pytest-timeout -U --find-links ${TORCH_URL}
+pip install -e ".[${extra}dev]" pytest-timeout -U --find-links="${TORCH_URL}"
 displayName: "Install package & dependencies"
 - bash: |
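The two inline `python -c` one-liners in this pipeline map `PACKAGE_NAME` to a coverage source and to a pip extras prefix. A minimal Python sketch of those lookups follows. The function names `coverage_source` and `extras_spec` are hypothetical; they only restate the dictionary lookups visible in the hunks above.

```python
# Hypothetical helpers restating the inline `python -c` lookups from the pipeline.


def coverage_source(package_name: str) -> str:
    """Map the package name to the import name measured by coverage."""
    # Only "fabric" is renamed; any other name is passed through unchanged.
    return {"fabric": "lightning_fabric"}.get(package_name, package_name)


def extras_spec(package_name: str) -> str:
    """Build the extras spec used in `pip install -e ".[<extras>]"`."""
    # The unified "lightning" package gets the "fabric-" prefix; others use plain "dev".
    prefix = {"lightning": "fabric-"}.get(package_name, "")
    return f".[{prefix}dev]"


if __name__ == "__main__":
    for name in ("fabric", "lightning"):
        print(name, coverage_source(name), extras_spec(name))
```

With these lookups, a `fabric` job installs `.[dev]` and measures `lightning_fabric`, while a `lightning` job installs `.[fabric-dev]` and measures `lightning`.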
10 changes: 9 additions & 1 deletion .azure/gpu-tests-pytorch.yml
@@ -51,6 +51,10 @@ jobs:
 "PyTorch | latest":
 image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.1-cuda12.1.0"
 PACKAGE_NAME: "pytorch"
+"PyTorch | future":
+# todo: failed to install `pygame` with py3.11
+image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.2-cuda12.1.0"
+PACKAGE_NAME: "pytorch"
 "Lightning | latest":
 image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.1-cuda12.1.0"
 PACKAGE_NAME: "lightning"
@@ -76,6 +80,10 @@ jobs:
 scope=$(python -c 'n = "$(PACKAGE_NAME)" ; print(dict(pytorch="pytorch_lightning").get(n, n))')
 echo "##vso[task.setvariable variable=COVERAGE_SOURCE]$scope"
 displayName: "set env. vars"
+- bash: |
+echo "##vso[task.setvariable variable=TORCH_URL]https://download.pytorch.org/whl/test/cu${CUDA_VERSION_MM}/torch_test.html"
+condition: endsWith(variables['Agent.JobName'], 'future')
+displayName: "set env. vars 4 future"
 - bash: |
 echo $(DEVICES)
@@ -109,7 +117,7 @@ jobs:
 - bash: |
 extra=$(python -c "print({'lightning': 'pytorch-'}.get('$(PACKAGE_NAME)', ''))")
-pip install -e ".[${extra}dev]" -r requirements/_integrations/strategies.txt pytest-timeout -U --find-links ${TORCH_URL}
+pip install -e ".[${extra}dev]" -r requirements/_integrations/strategies.txt pytest-timeout -U --find-links="${TORCH_URL}"
 displayName: "Install package & dependencies"
 - bash: pip uninstall -y lightning
2 changes: 1 addition & 1 deletion .github/workflows/ci-tests-data.yml
@@ -87,7 +87,7 @@ jobs:
 # ls -lh $PYPI_CACHE_DIR

 - name: Install package & dependencies
-timeout-minutes: 20
+timeout-minutes: 30
 run: |
 pip install -e ".[data-dev]" -U --prefer-binary -f ${TORCH_URL}
 pip list
10 changes: 7 additions & 3 deletions .github/workflows/ci-tests-fabric.yml
@@ -42,13 +42,17 @@ jobs:
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
-# only run PyTorch latest with Python recent
+# only run PyTorch latest
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.1" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.1" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.1" }
+# only run PyTorch future
+- { os: "macOS-12", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.2" }
+- { os: "ubuntu-22.04", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.2" }
+- { os: "windows-2022", pkg-name: "lightning", python-version: "3.11", pytorch-version: "2.2" }
 # only run PyTorch latest with Python latest, use Fabric scope to limit dependency issues
 - { os: "macOS-12", pkg-name: "fabric", python-version: "3.11", pytorch-version: "2.0" }
 - { os: "ubuntu-22.04", pkg-name: "fabric", python-version: "3.11", pytorch-version: "2.0" }
@@ -125,7 +129,7 @@ jobs:
 - name: Env. variables
 run: |
 # Switch PyTorch URL
-#python -c "print('TORCH_URL=' + str('${{env.TORCH_URL_TEST}}' if '${{ matrix.release }}' == 'pre' else '${{env.TORCH_URL_STABLE}}'))" >> $GITHUB_ENV
+python -c "print('TORCH_URL=' + str('${{env.TORCH_URL_TEST}}' if '${{ matrix.pytorch-version }}' == '2.2' else '${{env.TORCH_URL_STABLE}}'))" >> $GITHUB_ENV
 # Switch coverage scope
 python -c "print('COVERAGE_SCOPE=' + str('lightning' if '${{matrix.pkg-name}}' == 'lightning' else 'lightning_fabric'))" >> $GITHUB_ENV
 # if you install mono-package set dependency only for this subpackage
@@ -154,7 +158,7 @@ jobs:
 - name: Testing Warnings
 working-directory: tests/tests_fabric
-# needs to run outside of `pytest`
+# needs to run outside `pytest`
 run: python utilities/test_warnings.py

 - name: Testing Fabric
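The `Env. variables` step re-enables the wheel-index switch, which was previously commented out and keyed on `matrix.release`, and now keys it on the matrix PyTorch version. A rough Python sketch of that selection follows. The function name `github_env_line` and the placeholder URLs are illustrative assumptions; the real values come from `env.TORCH_URL_TEST` and `env.TORCH_URL_STABLE`, which are not shown in this diff.

```python
def github_env_line(pytorch_version: str, stable_url: str, test_url: str) -> str:
    """Return the `TORCH_URL=...` line that the step appends to $GITHUB_ENV."""
    # The workflow hard-codes "2.2" as the pre-release ("future") version.
    url = test_url if pytorch_version == "2.2" else stable_url
    return f"TORCH_URL={url}"


if __name__ == "__main__":
    # Placeholder URLs, not the workflow's actual values.
    stable = "https://example.invalid/stable.html"
    test = "https://example.invalid/test.html"
    print(github_env_line("2.1", stable, test))
    print(github_env_line("2.2", stable, test))
```

Because the comparison is a plain string match on `'2.2'`, the condition presumably has to be updated by hand whenever the "future" version changes.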
10 changes: 7 additions & 3 deletions .github/workflows/ci-tests-pytorch.yml
@@ -46,13 +46,17 @@ jobs:
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "1.13" }
-# only run PyTorch latest with Python recent
+# only run PyTorch latest
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0" }
 - { os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.1" }
 - { os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.1" }
 - { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.1" }
+# only run PyTorch future
+- { os: "macOS-12", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.2" }
+- { os: "ubuntu-22.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.2" }
+- { os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.2" }
 # only run PyTorch latest with Python latest, use PyTorch scope to limit dependency issues
 - { os: "macOS-12", pkg-name: "pytorch", python-version: "3.11", pytorch-version: "2.0" }
 - { os: "ubuntu-22.04", pkg-name: "pytorch", python-version: "3.11", pytorch-version: "2.0" }
@@ -131,7 +135,7 @@ jobs:
 - name: Env. variables
 run: |
 # Switch PyTorch URL
-#python -c "print('TORCH_URL=' + str('${{env.TORCH_URL_TEST}}' if '${{ matrix.release }}' == 'pre' else '${{env.TORCH_URL_STABLE}}'))" >> $GITHUB_ENV
+python -c "print('TORCH_URL=' + str('${{env.TORCH_URL_TEST}}' if '${{ matrix.pytorch-version }}' == '2.2' else '${{env.TORCH_URL_STABLE}}'))" >> $GITHUB_ENV
 # Switch coverage scope
 python -c "print('COVERAGE_SCOPE=' + str('lightning' if '${{matrix.pkg-name}}' == 'lightning' else 'pytorch_lightning'))" >> $GITHUB_ENV
 # if you install mono-package set dependency only for this subpackage
@@ -191,7 +195,7 @@ jobs:
 - name: Testing Warnings
 working-directory: tests/tests_pytorch
-# needs to run outside of `pytest`
+# needs to run outside `pytest`
 run: python utilities/test_warnings.py

 - name: Testing PyTorch
8 changes: 6 additions & 2 deletions .github/workflows/docker-build.yml
@@ -101,12 +101,16 @@ jobs:
 fail-fast: false
 matrix:
 include:
-# These are the base images for PL release docker images.
-# Make sure the matrix here matches the one above.
+# These are the base images for PL release docker images,
+# so include at least all the combinations in release-dockers.yml.
 - { python_version: "3.9", pytorch_version: "1.13", cuda_version: "11.8.0" }
 - { python_version: "3.9", pytorch_version: "1.13", cuda_version: "12.0.1" }
 - { python_version: "3.10", pytorch_version: "2.0", cuda_version: "11.8.0" }
 - { python_version: "3.10", pytorch_version: "2.1", cuda_version: "12.1.0" }
+- { python_version: "3.10", pytorch_version: "2.2", cuda_version: "12.1.0" }
+- { python_version: "3.11", pytorch_version: "2.1", cuda_version: "12.1.0" }
+- { python_version: "3.11", pytorch_version: "2.2", cuda_version: "12.1.0" }
+# - { python_version: "3.12", pytorch_version: "2.2", cuda_version: "12.1.0" } # todo: pending on `onnxruntime`
 steps:
 - uses: actions/checkout@v4
 - uses: docker/setup-buildx-action@v3
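The Azure jobs earlier in this commit reference images tagged like `base-cuda-py3.11-torch2.2-cuda12.1.0`, which lines up with the `python_version`, `pytorch_version` and `cuda_version` fields of this matrix. The sketch below shows that naming, assuming the tag is assembled from exactly these three fields. The actual tagging step is outside this hunk, and `image_tag` is a hypothetical helper.

```python
# Repository name as it appears in the Azure pipeline configs of this commit.
REPO = "pytorchlightning/pytorch_lightning"


def image_tag(python_version: str, pytorch_version: str, cuda_version: str) -> str:
    """Assemble a tag in the `base-cuda-py<py>-torch<pt>-cuda<cuda>` pattern."""
    return f"{REPO}:base-cuda-py{python_version}-torch{pytorch_version}-cuda{cuda_version}"


if __name__ == "__main__":
    # The two PyTorch 2.2 matrix entries listed in the hunk above.
    for entry in (("3.10", "2.2", "12.1.0"), ("3.11", "2.2", "12.1.0")):
        print(image_tag(*entry))
```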
11 changes: 5 additions & 6 deletions dockers/base-cuda/Dockerfile
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-ARG UBUNTU_VERSION=20.04
+ARG UBUNTU_VERSION=22.04
 ARG CUDA_VERSION=11.7.1


@@ -38,7 +38,7 @@ RUN \
 # https://github.com/NVIDIA/nvidia-docker/issues/1631
 # https://github.com/NVIDIA/nvidia-docker/issues/1631#issuecomment-1264715214
 apt-get update && apt-get install -y wget && \
-wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub && \
+wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub && \
 mkdir -p /etc/apt/keyrings/ && mv 3bf863cc.pub /etc/apt/keyrings/ && \
 echo "deb [signed-by=/etc/apt/keyrings/3bf863cc.pub] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" /etc/apt/sources.list.d/cuda.list && \
 apt-get update -qq --fix-missing && \
@@ -82,9 +82,7 @@ COPY requirements/_integrations/ requirements/_integrations/
 ENV PYTHONPATH="/usr/lib/python${PYTHON_VERSION}/site-packages"

 RUN \
-wget https://bootstrap.pypa.io/get-pip.py --progress=bar:force:noscroll --no-check-certificate && \
-python${PYTHON_VERSION} get-pip.py && \
-rm get-pip.py && \
+curl https://bootstrap.pypa.io/get-pip.py | python${PYTHON_VERSION} && \
 # Disable cache \
 pip config set global.cache-dir false && \
 # set particular PyTorch version \
@@ -99,7 +97,8 @@ RUN \
 -r requirements/pytorch/extra.txt \
 -r requirements/pytorch/test.txt \
 -r requirements/pytorch/strategies.txt \
---find-links "https://download.pytorch.org/whl/cu${CUDA_VERSION_MM//'.'/''}/torch_stable.html"
+--find-links="https://download.pytorch.org/whl/cu${CUDA_VERSION_MM//'.'/''}/torch_stable.html" \
+--find-links="https://download.pytorch.org/whl/test/cu${CUDA_VERSION_MM//'.'/''}/torch_test.html"

 RUN \
 # Show what we have
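The `${CUDA_VERSION_MM//'.'/''}` expansion in the `--find-links` options strips the dots from the CUDA version to form the wheel-index directory name. Below is a Python sketch of the same string handling. It assumes `CUDA_VERSION_MM` holds a major.minor value such as `12.1`, since its definition is not part of this hunk, and `find_links` is a hypothetical name.

```python
def find_links(cuda_version_mm: str) -> list[str]:
    """Return the stable and test wheel-index URLs passed via --find-links."""
    # Equivalent of bash `${CUDA_VERSION_MM//'.'/''}`: remove every dot.
    cuda_tag = "cu" + cuda_version_mm.replace(".", "")
    base = "https://download.pytorch.org/whl"
    return [
        f"{base}/{cuda_tag}/torch_stable.html",
        f"{base}/test/{cuda_tag}/torch_test.html",
    ]


if __name__ == "__main__":
    for url in find_links("12.1"):
        print(url)
```

Passing both indexes lets pip look for wheels in the stable index and, for pre-release versions, in the test index.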
1 change: 0 additions & 1 deletion docs/source-fabric/api/fabric_methods.rst
@@ -108,7 +108,6 @@ This is useful if your model experiences *exploding gradients* during training.
 fabric.clip_gradients(model, optimizer, max_norm=2.0, norm_type="inf")
 The :meth:`~lightning.fabric.fabric.Fabric.clip_gradients` method is agnostic to the precision and strategy being used.
-Note: Gradient clipping with FSDP is not yet fully supported.


 to_device
2 changes: 1 addition & 1 deletion docs/source-fabric/fundamentals/launch.rst
@@ -172,7 +172,7 @@ Choose from the following options based on your expertise level and available in
 <div class="row">

 .. displayitem::
-:header: Lightning Cloud
+:header: Run single or multi-node on Lightning Studios
 :description: The easiest way to scale models in the cloud. No infrastructure setup required.
 :col_css: col-md-4
 :button_link: ../guide/multi_node/cloud.html
34 changes: 33 additions & 1 deletion docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
@@ -183,4 +183,36 @@ Note that you can load the distributed checkpoint even if the world size has cha
 Convert a distributed checkpoint
 ********************************

-Coming soon.
+It is possible to convert a distributed checkpoint to a regular, single-file checkpoint with this utility:
+
+.. code-block:: bash
+
+    python -m lightning.fabric.utilities.consolidate_checkpoint path/to/my/checkpoint
+
+You will need to do this for example if you want to load the checkpoint into a script that doesn't use FSDP, or need to export the checkpoint to a different format for deployment, evaluation, etc.
+
+.. note::
+
+    All tensors in the checkpoint will be converted to CPU tensors, and no GPUs are required to run the conversion command.
+    This function assumes you have enough free CPU memory to hold the entire checkpoint in memory.
+
+.. collapse:: Full example
+
+    Assuming you have saved a checkpoint ``my-checkpoint.ckpt`` using the examples above, run the following command to convert it:
+
+    .. code-block:: bash
+
+        python -m lightning.fabric.utilities.consolidate_checkpoint my-checkpoint.ckpt
+
+    This saves a new file ``my-checkpoint.ckpt.consolidated`` next to the sharded checkpoint which you can load normally in PyTorch:
+
+    .. code-block:: python
+
+        import torch
+
+        checkpoint = torch.load("my-checkpoint.ckpt.consolidated")
+        print(list(checkpoint.keys()))
+        print(checkpoint["model"]["transformer.decoder.layers.31.norm1.weight"])
+
+|
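As a small sketch of the file naming described in the added documentation, the location of the consolidated file can be derived from the checkpoint path. This assumes the utility always appends `.consolidated` to the checkpoint name, as in the documented example; `consolidated_path` is a hypothetical helper and not part of Lightning.

```python
from pathlib import Path


def consolidated_path(checkpoint: str) -> Path:
    """Return the path where the consolidated file is expected to appear."""
    path = Path(checkpoint)
    # Append to the full name so "x.ckpt" becomes "x.ckpt.consolidated"
    # and the result stays in the same parent directory.
    return path.with_name(path.name + ".consolidated")


if __name__ == "__main__":
    print(consolidated_path("my-checkpoint.ckpt"))
```

A script could check that this path exists before calling `torch.load` on it.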