"ValueError: You selected an invalid strategy name" When DDPStrategy(process_group_backend="gloo") is passed #20526

Open
11philip22 opened this issue Jan 5, 2025 · 2 comments
Labels
bug (Something isn't working), repro needed (The issue is missing a reproducible example), ver: 2.4.x

Comments

@11philip22

Bug description

Running the code below on Python 3.12.8 with pytorch-lightning 2.4.0 raises a ValueError when the Trainer is constructed with a DDPStrategy instance.

What version are you seeing the problem on?

v2.4

How to reproduce the bug

ddp_gloo = DDPStrategy(process_group_backend="gloo")

trainer = Trainer(
    devices=2,
    # devices=1,
    accelerator='gpu',
    strategy=ddp_gloo,
    benchmark=True,
    logger=logger,
    callbacks=[checkpoint_callback, lr_monitor],
    check_val_every_n_epoch=1,
    max_epochs=30,
    # max_epochs=3,
)
trainer.fit(model, data_module)

Error messages and logs

Traceback (most recent call last):
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 59, in <module>
    main()
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 43, in main
    trainer = Trainer(
              ^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 395, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 130, in __init__
    self._check_config_and_set_final_flags(
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 193, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid strategy name: `strategy=<lightning.pytorch.strategies.ddp.DDPStrategy object at 0x0000023622FA2240>`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai
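
The message reports an object from `lightning.pytorch.strategies.ddp` while asking for an instance of `pytorch_lightning.strategies.Strategy`. One possible reading, which this report does not confirm, is that the two packages each define their own `Strategy` base class, so an `isinstance` check across them fails. The sketch below only simulates that situation with stand-in modules; `make_fake_package` and the empty classes are hypothetical and are not Lightning code:

```python
import types


def make_fake_package(name):
    """Build a stand-in module that defines its own Strategy and DDPStrategy."""
    module = types.ModuleType(name)

    class Strategy:
        pass

    class DDPStrategy(Strategy):
        pass

    # Make the classes report the stand-in package as their home module.
    Strategy.__module__ = name
    DDPStrategy.__module__ = name
    module.Strategy = Strategy
    module.DDPStrategy = DDPStrategy
    return module


# Two independent "packages" with identically named classes.
pl_old = make_fake_package("pytorch_lightning")
pl_new = make_fake_package("lightning.pytorch")

strategy = pl_new.DDPStrategy()

# Same class name, but only the base class from the same package matches.
print(isinstance(strategy, pl_new.Strategy))  # True
print(isinstance(strategy, pl_old.Strategy))  # False
```

If the real packages behave like these stand-ins, a strategy built from one package would be rejected by a Trainer built from the other, even though the class names look identical.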

Environment

Current environment
  • CUDA:
    - GPU:
        - Quadro P6000
        - Quadro P6000
    - available: True
    - version: 12.4
  • Lightning:
    - efficientnet-pytorch: 0.7.1
    - lightning: 2.4.0
    - lightning-utilities: 0.11.9
    - pytorch-lightning: 2.4.0
    - segmentation-models-pytorch: 0.3.5.dev0
    - torch: 2.5.1
    - torchmetrics: 1.6.0
    - torchvision: 0.20.1
  • Packages:
    - absl-py: 2.1.0
    - aiohappyeyeballs: 2.4.4
    - aiohttp: 3.11.11
    - aiosignal: 1.3.2
    - albucore: 0.0.21
    - albumentations: 1.4.23
    - annotated-types: 0.7.0
    - attrs: 24.3.0
    - autocommand: 2.2.2
    - backports.tarfile: 1.2.0
    - brotli: 1.1.0
    - certifi: 2024.12.14
    - cffi: 1.17.1
    - charset-normalizer: 3.4.0
    - colorama: 0.4.6
    - contourpy: 1.3.1
    - cycler: 0.12.1
    - efficientnet-pytorch: 0.7.1
    - eval-type-backport: 0.2.0
    - filelock: 3.16.1
    - fonttools: 4.55.3
    - frozenlist: 1.5.0
    - fsspec: 2024.10.0
    - grpcio: 1.68.1
    - h2: 4.1.0
    - hpack: 4.0.0
    - huggingface-hub: 0.27.0
    - hyperframe: 6.0.1
    - idna: 3.10
    - importlib-metadata: 8.0.0
    - inflect: 7.3.1
    - jaraco.collections: 5.1.0
    - jaraco.context: 5.3.0
    - jaraco.functools: 4.0.1
    - jaraco.text: 3.12.1
    - jinja2: 3.1.4
    - kiwisolver: 1.4.7
    - lightning: 2.4.0
    - lightning-utilities: 0.11.9
    - markdown: 3.7
    - markupsafe: 3.0.2
    - matplotlib: 3.10.0
    - more-itertools: 10.3.0
    - mpmath: 1.3.0
    - multidict: 6.1.0
    - munch: 4.0.0
    - networkx: 3.4.2
    - numpy: 2.2.0
    - opencv-python: 4.10.0.84
    - opencv-python-headless: 4.10.0.84
    - packaging: 24.2
    - pillow: 10.4.0
    - pip: 24.3.1
    - platformdirs: 4.2.2
    - pretrainedmodels: 0.7.4
    - propcache: 0.2.1
    - protobuf: 5.29.2
    - pycocotools: 2.0.8
    - pycparser: 2.22
    - pydantic: 2.10.4
    - pydantic-core: 2.27.2
    - pyparsing: 3.2.0
    - pysocks: 1.7.1
    - python-dateutil: 2.9.0.post0
    - pytorch-lightning: 2.4.0
    - pyyaml: 6.0.2
    - requests: 2.32.3
    - safetensors: 0.5.0
    - scipy: 1.14.1
    - segmentation-models-pytorch: 0.3.5.dev0
    - setuptools: 75.6.0
    - simsimd: 6.2.1
    - six: 1.17.0
    - stringzilla: 3.11.2
    - sympy: 1.13.1
    - tensorboard: 2.18.0
    - tensorboard-data-server: 0.7.2
    - timm: 1.0.12
    - tomli: 2.0.1
    - torch: 2.5.1
    - torchmetrics: 1.6.0
    - torchvision: 0.20.1
    - tqdm: 4.67.1
    - typeguard: 4.3.0
    - typing-extensions: 4.12.2
    - urllib3: 2.2.3
    - werkzeug: 3.1.3
    - wheel: 0.45.1
    - win-inet-pton: 1.1.0
    - yarl: 1.18.3
    - zipp: 3.19.2
    - zstandard: 0.23.0
  • System:
    - OS: Windows
    - architecture:
        - 64bit
        - WindowsPE
    - processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
    - python: 3.12.8
    - release: 10
    - version: 10.0.19045

More info

No response

11philip22 added the bug and needs triage labels on Jan 5, 2025
@lantiga
Collaborator

lantiga commented Jan 6, 2025

Hey @11philip22, can you show the full imports? I'd like to make sure you're not importing the Trainer and the strategy from different packages, like pytorch_lightning and lightning.
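
A quick way to check this without posting the whole script is to print which top-level package each object's class comes from. This is a minimal sketch: `top_level_package` and `same_package` are hypothetical helper names, and standard-library objects stand in for the Trainer and the strategy so it runs without Lightning installed.

```python
import collections
import fractions


def top_level_package(obj: object) -> str:
    """Return the top-level package that defines the class of `obj`."""
    return type(obj).__module__.split(".")[0]


def same_package(*objects: object) -> bool:
    """Return True if every object's class lives in the same top-level package."""
    return len({top_level_package(o) for o in objects}) <= 1


# Stand-ins: two objects from `collections`, one from `fractions`.
counter = collections.Counter()
chain = collections.ChainMap()
fraction = fractions.Fraction(1, 2)

print(top_level_package(counter))       # collections
print(same_package(counter, chain))     # True
print(same_package(counter, fraction))  # False
```

In the failing script, calling `top_level_package` on the strategy object and on the Trainer instance (or inspecting `Trainer.__module__`) would show whether one comes from `lightning` and the other from `pytorch_lightning`, which is what the module paths in the traceback suggest.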

lantiga added the waiting on author label and removed the needs triage label on Jan 6, 2025
lantiga added the repro needed label and removed the waiting on author label on Jan 13, 2025
@thomas-keller

Hi, I also ran into the same error recently while trying to run the MNIST tune example. For me, it was an issue of pytorch_lightning vs. lightning: I replaced "import pytorch_lightning as pl" with "import lightning.pytorch as pl", and it worked as expected.

Now, why I had pytorch_lightning AND lightning installed is a question only past me can answer.
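
The import swap described above can be sketched as follows, applied to the snippet from the original report. This is an illustration and not a confirmed fix for that report: `build_trainer` is a hypothetical wrapper, the logger and callbacks from the original snippet are left out, and the Lightning imports sit inside the function so the file can be loaded on a machine without Lightning or GPUs.

```python
def build_trainer():
    """Create the Trainer and the DDP strategy from one package, lightning.pytorch."""
    # Both names come from the same package, so the Trainer's strategy check
    # sees a Strategy subclass that it recognises.
    from lightning.pytorch import Trainer
    from lightning.pytorch.strategies import DDPStrategy

    ddp_gloo = DDPStrategy(process_group_backend="gloo")

    # devices and accelerator mirror the original report and need two visible GPUs.
    return Trainer(
        devices=2,
        accelerator="gpu",
        strategy=ddp_gloo,
        max_epochs=30,
    )
```

Calling `build_trainer()` followed by `trainer.fit(model, data_module)` would replace the corresponding lines in the original script. Importing both names from `pytorch_lightning` instead should be equally consistent; the point is that both come from the same package.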
