
Commit 9004379

awaelchli and carmocca authored
Clarify self.log(..., rank_zero_only=True|False) (#19056)
Co-authored-by: Carlos Mocholí <[email protected]>
1 parent a6da1e3 commit 9004379

File tree

4 files changed (+29 -10 lines changed)


docs/source-pytorch/accelerators/accelerator_prepare.rst (+4 -2)

@@ -121,14 +121,16 @@ It is possible to perform some computation manually and log the reduced result o
     mean = torch.mean(self.all_gather(self.outputs))
     self.outputs.clear()  # free memory

-    # When logging only on rank 0, don't forget to add
+    # When you call `self.log` only on rank 0, don't forget to add
     # `rank_zero_only=True` to avoid deadlocks on synchronization.
-    # caveat: monitoring this is unimplemented. see https://github.com/Lightning-AI/lightning/issues/15852
+    # Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/lightning/issues/15852
     if self.trainer.is_global_zero:
         self.log("my_reduced_metric", mean, rank_zero_only=True)

+
 ----

+
 **********************
 Make models pickleable
 **********************
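
For context, a minimal sketch of the kind of hook the snippet above is excerpted from. The class name, the placeholder metric, and the `self.outputs` buffer are illustrative assumptions; only the gather/reduce/rank-zero-log pattern comes from the documentation being edited.

```python
import torch
import lightning.pytorch as pl


class LitModel(pl.LightningModule):
    """Hypothetical module; only the hooks relevant to rank-zero-only logging are shown."""

    def __init__(self):
        super().__init__()
        self.outputs = []  # per-batch values collected independently on each rank

    def validation_step(self, batch, batch_idx):
        # Placeholder per-batch metric; replace with your own computation.
        value = torch.as_tensor(batch).float().mean()
        self.outputs.append(value)

    def on_validation_epoch_end(self):
        # Gather the per-rank values and reduce them manually.
        mean = torch.mean(self.all_gather(torch.stack(self.outputs)))
        self.outputs.clear()  # free memory

        # `self.log` is called from rank 0 only, so `rank_zero_only=True` is required
        # to avoid deadlocks on synchronization.
        if self.trainer.is_global_zero:
            self.log("my_reduced_metric", mean, rank_zero_only=True)
```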

docs/source-pytorch/extensions/logging.rst (+1 -1)

@@ -141,7 +141,7 @@ The :meth:`~lightning.pytorch.core.LightningModule.log` method has a few options
 * ``sync_dist_group``: The DDP group to sync across.
 * ``add_dataloader_idx``: If True, appends the index of the current dataloader to the name (when using multiple dataloaders). If False, user needs to give unique names for each dataloader to not mix the values.
 * ``batch_size``: Current batch size used for accumulating logs logged with ``on_epoch=True``. This will be directly inferred from the loaded batch, but for some data structures you might need to explicitly provide it.
-* ``rank_zero_only``: Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+* ``rank_zero_only``: Set this to ``True`` only if you call ``self.log`` explicitly only from rank 0. If ``True`` you won't be able to access or specify this metric in callbacks (e.g. early stopping).

 .. list-table:: Default behavior of logging in Callback or LightningModule
    :widths: 50 25 25
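
As a companion to the updated bullet, a short hypothetical sketch contrasting the two modes: logging from every process with `sync_dist=True` (the metric stays usable as a callback monitor) versus logging from rank 0 only with `rank_zero_only=True`. The class and metric names are placeholders, not part of this commit.

```python
import torch
import lightning.pytorch as pl


class LitModel(pl.LightningModule):
    """Hypothetical module contrasting the two logging modes."""

    def validation_step(self, batch, batch_idx):
        loss = torch.as_tensor(batch).float().mean()  # placeholder metric

        # Called on every process; `sync_dist=True` reduces the value across processes,
        # and the metric can still be monitored by callbacks.
        self.log("val_loss", loss, sync_dist=True)

        # Called from rank 0 only; no synchronization happens, and the metric cannot be
        # accessed or monitored in callbacks such as early stopping.
        if self.trainer.is_global_zero:
            self.log("val_loss_rank0", loss, rank_zero_only=True)
```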

docs/source-pytorch/visualize/logging_advanced.rst (+16 -3)

@@ -196,13 +196,26 @@ If set to True, logs will be sent to the progress bar.

 rank_zero_only
 ==============
-**Default:** True
+**Default:** False
+
+Tells Lightning if you are calling ``self.log`` from every process (default) or only from rank 0.
+This is for advanced users who want to reduce their metric manually across processes, but still want to benefit from automatic logging via ``self.log``.

-Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+- Set ``False`` (default) if you are calling ``self.log`` from every process.
+- Set ``True`` if you are calling ``self.log`` from rank 0 only. Caveat: you won't be able to use this metric as a monitor in callbacks (e.g., early stopping).

 .. code-block:: python

-    self.log(rank_zero_only=True)
+    # Default
+    self.log(..., rank_zero_only=False)
+
+    # If you call `self.log` on rank 0 only, you need to set `rank_zero_only=True`
+    if self.trainer.global_rank == 0:
+        self.log(..., rank_zero_only=True)
+
+    # DON'T do this, it will cause deadlocks!
+    self.log(..., rank_zero_only=True)
+

 ----
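
One practical consequence of the caveat above: a metric logged with `rank_zero_only=True` cannot serve as a callback `monitor`. A minimal sketch of wiring callbacks to a metric that is logged from every process instead; the metric name and callback choice are assumptions, not part of this commit.

```python
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping

# "val_loss" is assumed to be logged from every process, e.g. via
# `self.log("val_loss", loss, sync_dist=True)`. A metric logged with
# `rank_zero_only=True` could not be monitored here.
trainer = pl.Trainer(
    callbacks=[EarlyStopping(monitor="val_loss", mode="min")],
    accelerator="cpu",
    devices=2,
    strategy="ddp",
)
```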

src/lightning/pytorch/core/module.py (+8 -4)

@@ -400,8 +400,10 @@ def log(
                 but for some data structures you might need to explicitly provide it.
             metric_attribute: To restore the metric state, Lightning requires the reference of the
                 :class:`torchmetrics.Metric` in your model. This is found automatically if it is a model attribute.
-            rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-                would produce a deadlock as not all processes would perform this log call.
+            rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+                rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+                (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+                :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.

         """
         if self._fabric is not None:

@@ -563,8 +565,10 @@ def log_dict(
                 each dataloader to not mix values.
             batch_size: Current batch size. This will be directly inferred from the loaded batch,
                 but some data structures might need to explicitly provide it.
-            rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-                would produce a deadlock as not all processes would perform this log call.
+            rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+                rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+                (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+                :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.

         """
         if self._fabric is not None:
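
The same rule applies to `log_dict`, whose docstring receives the identical update above. A minimal hypothetical sketch; the metric names and values are placeholders.

```python
import torch
import lightning.pytorch as pl


class LitModel(pl.LightningModule):
    """Hypothetical module; shows `rank_zero_only` with `self.log_dict`."""

    def on_validation_epoch_end(self):
        # Placeholder values, assumed to have been reduced manually across processes.
        metrics = {"my_mean": torch.tensor(0.0), "my_max": torch.tensor(1.0)}

        # As with `self.log`: set `rank_zero_only=True` only because the call itself
        # is made from rank 0 alone.
        if self.trainer.is_global_zero:
            self.log_dict(metrics, rank_zero_only=True)
```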
