"ValueError: You selected an invalid strategy name" When DDPStrategy(process_group_backend="gloo") is passed #20526

11philip22 · 2025-01-05T00:23:00Z

Bug description

When I run the code below on Python 3.12.8 with pytorch-lightning 2.4.0, I get a ValueError as soon as the Trainer is constructed.

What version are you seeing the problem on?

v2.4

How to reproduce the bug

ddp_gloo = DDPStrategy(process_group_backend="gloo")

trainer = Trainer(
    devices=2,
    # devices=1,
    accelerator='gpu',
    strategy=ddp_gloo,
    benchmark=True,
    logger=logger,
    callbacks=[checkpoint_callback, lr_monitor],
    check_val_every_n_epoch=1,
    max_epochs=30,
    # max_epochs=3,
)
trainer.fit(model, data_module)

Error messages and logs

Traceback (most recent call last):
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 59, in <module>
    main()
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 43, in main
    trainer = Trainer(
              ^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 395, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 130, in __init__
    self._check_config_and_set_final_flags(
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 193, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid strategy name: `strategy=<lightning.pytorch.strategies.ddp.DDPStrategy object at 0x0000023622FA2240>`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai

Environment

Current environment

CUDA:
- GPU:
  - Quadro P6000
  - Quadro P6000
- available: True
- version: 12.4
Lightning:
- efficientnet-pytorch: 0.7.1
- lightning: 2.4.0
- lightning-utilities: 0.11.9
- pytorch-lightning: 2.4.0
- segmentation-models-pytorch: 0.3.5.dev0
- torch: 2.5.1
- torchmetrics: 1.6.0
- torchvision: 0.20.1
Packages:
- absl-py: 2.1.0
- aiohappyeyeballs: 2.4.4
- aiohttp: 3.11.11
- aiosignal: 1.3.2
- albucore: 0.0.21
- albumentations: 1.4.23
- annotated-types: 0.7.0
- attrs: 24.3.0
- autocommand: 2.2.2
- backports.tarfile: 1.2.0
- brotli: 1.1.0
- certifi: 2024.12.14
- cffi: 1.17.1
- charset-normalizer: 3.4.0
- colorama: 0.4.6
- contourpy: 1.3.1
- cycler: 0.12.1
- efficientnet-pytorch: 0.7.1
- eval-type-backport: 0.2.0
- filelock: 3.16.1
- fonttools: 4.55.3
- frozenlist: 1.5.0
- fsspec: 2024.10.0
- grpcio: 1.68.1
- h2: 4.1.0
- hpack: 4.0.0
- huggingface-hub: 0.27.0
- hyperframe: 6.0.1
- idna: 3.10
- importlib-metadata: 8.0.0
- inflect: 7.3.1
- jaraco.collections: 5.1.0
- jaraco.context: 5.3.0
- jaraco.functools: 4.0.1
- jaraco.text: 3.12.1
- jinja2: 3.1.4
- kiwisolver: 1.4.7
- lightning: 2.4.0
- lightning-utilities: 0.11.9
- markdown: 3.7
- markupsafe: 3.0.2
- matplotlib: 3.10.0
- more-itertools: 10.3.0
- mpmath: 1.3.0
- multidict: 6.1.0
- munch: 4.0.0
- networkx: 3.4.2
- numpy: 2.2.0
- opencv-python: 4.10.0.84
- opencv-python-headless: 4.10.0.84
- packaging: 24.2
- pillow: 10.4.0
- pip: 24.3.1
- platformdirs: 4.2.2
- pretrainedmodels: 0.7.4
- propcache: 0.2.1
- protobuf: 5.29.2
- pycocotools: 2.0.8
- pycparser: 2.22
- pydantic: 2.10.4
- pydantic-core: 2.27.2
- pyparsing: 3.2.0
- pysocks: 1.7.1
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.4.0
- pyyaml: 6.0.2
- requests: 2.32.3
- safetensors: 0.5.0
- scipy: 1.14.1
- segmentation-models-pytorch: 0.3.5.dev0
- setuptools: 75.6.0
- simsimd: 6.2.1
- six: 1.17.0
- stringzilla: 3.11.2
- sympy: 1.13.1
- tensorboard: 2.18.0
- tensorboard-data-server: 0.7.2
- timm: 1.0.12
- tomli: 2.0.1
- torch: 2.5.1
- torchmetrics: 1.6.0
- torchvision: 0.20.1
- tqdm: 4.67.1
- typeguard: 4.3.0
- typing-extensions: 4.12.2
- urllib3: 2.2.3
- werkzeug: 3.1.3
- wheel: 0.45.1
- win-inet-pton: 1.1.0
- yarl: 1.18.3
- zipp: 3.19.2
- zstandard: 0.23.0
System:
- OS: Windows
- architecture:
  - 64bit
  - WindowsPE
- processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
- python: 3.12.8
- release: 10
- version: 10.0.19045

More info

No response


lantiga · 2025-01-06T08:51:24Z

Hey @11philip22, can you show the full imports? I'd like to make sure you're not importing the Trainer and the strategy from different packages, like pytorch_lightning and lightning.
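One way to check this, even before the full imports are posted, is to ask each object which top-level package defines its class. The sketch below is a minimal, hypothetical helper: the names `defining_package` and `from_same_package` are made up for illustration and are not part of Lightning. It uses only the standard library, and the demo lines use stdlib classes as stand-ins.

```python
from argparse import ArgumentParser
from fractions import Fraction


def defining_package(obj) -> str:
    """Return the top-level package that defines obj (if it is a class) or obj's class."""
    cls = obj if isinstance(obj, type) else type(obj)
    return cls.__module__.split(".")[0]


def from_same_package(*objs) -> bool:
    """True when every argument is defined under the same top-level package."""
    return len({defining_package(o) for o in objs}) == 1


# Demo with standard-library stand-ins (hypothetical usage, not Lightning objects).
print(defining_package(Fraction(1, 2)))                   # an instance
print(defining_package(ArgumentParser))                   # a class
print(from_same_package(Fraction(1, 2), ArgumentParser))  # different packages
```

In the script from this report, the calls to compare would be `defining_package(Trainer)` and `defining_package(ddp_gloo)`. Judging by the traceback and the error message, they would likely return `pytorch_lightning` and `lightning` respectively.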

thomas-keller · 2025-02-17T14:55:32Z

Hi, I also ran into this error recently while trying to run the MNIST tune example. For me, it was a pytorch_lightning vs lightning mix-up. I replaced `import pytorch_lightning as pl` with `import lightning.pytorch as pl`, and it worked as expected.

Now, why I had pytorch_lightning AND lightning installed is a question only past me can answer.
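Why the import swap helps can be illustrated with stand-in classes. This sketch assumes the validation is, in essence, an `isinstance` check against the `Strategy` base class of whichever package provides the `Trainer`, which is what the error message suggests. The classes and `check_strategy` below are hypothetical simplifications, not Lightning's actual code.

```python
class PLStrategy:
    """Stand-in for pytorch_lightning.strategies.Strategy."""


class LStrategy:
    """Stand-in for lightning.pytorch.strategies.Strategy."""


class LDDPStrategy(LStrategy):
    """Stand-in for lightning.pytorch.strategies.DDPStrategy."""

    def __init__(self, process_group_backend=None):
        self.process_group_backend = process_group_backend


def check_strategy(strategy, base):
    """Simplified check: accept a string or an instance of `base`."""
    if not isinstance(strategy, (str, base)):
        raise ValueError(
            f"You selected an invalid strategy name: `strategy={strategy!r}`"
        )
    return strategy


ddp_gloo = LDDPStrategy(process_group_backend="gloo")

# Checker and strategy come from the same package: accepted.
check_strategy(ddp_gloo, LStrategy)

# Checker comes from the other package: rejected, although the class names match.
try:
    check_strategy(ddp_gloo, PLStrategy)
except ValueError as err:
    print(err)
```

The two base classes are unrelated Python types even though they share a name, so a strategy built from one package does not pass a check performed by the other. Importing both `Trainer` and `DDPStrategy` from the same package (either `lightning.pytorch` or `pytorch_lightning`, but not a mix) avoids the mismatch.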

11philip22 added the "bug" and "needs triage" labels on Jan 5, 2025.

github-actions bot added the "ver: 2.4.x" label on Jan 5, 2025.

lantiga added the "waiting on author" label and removed the "needs triage" label on Jan 6, 2025.

lantiga added the "repro needed" label and removed the "waiting on author" label on Jan 13, 2025.
