
Commit 9cea282

Fix multi-GPU training.
A previous fix to let validation run across more than one batch caused an issue with multi-GPU training. The issue seems to be in how Keras averages loss and metric values, where it expects them to be scalars rather than arrays. This fix causes scalar outputs from a model to remain scalar in multi-GPU training.
1 parent 2a7bcfc commit 9cea282
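
For illustration, here is a minimal NumPy sketch of the merge rule this commit introduces: scalar per-GPU outputs (losses and metrics) are averaged so they stay scalar, while batched outputs are still concatenated along the batch axis. The function name merge_replica_outputs and the sample values are assumptions made for this example, not part of the commit.

    import numpy as np

    def merge_replica_outputs(outputs):
        """Illustrative analogue of the commit's merge rule for per-GPU outputs.

        Scalar outputs (losses/metrics) are averaged so they remain scalar;
        batched outputs are concatenated along the batch dimension (axis 0).
        """
        if np.ndim(outputs[0]) == 0:
            # Average scalars across replicas, mirroring
            # KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)
            return sum(outputs) / len(outputs)
        # Concatenate batched tensors, mirroring KL.Concatenate(axis=0, name=name)
        return np.concatenate(outputs, axis=0)

    # Two-replica example: the loss stays scalar, predictions are re-assembled.
    losses = [np.float32(0.25), np.float32(0.35)]
    preds = [np.ones((2, 4)), np.zeros((2, 4))]
    print(merge_replica_outputs(losses))       # 0.3 (scalar)
    print(merge_replica_outputs(preds).shape)  # (4, 4)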

File tree

1 file changed: +12 -10 lines changed

mrcnn/parallel_model.py (+12 -10)
@@ -89,16 +89,18 @@ def make_parallel(self):
         with tf.device('/cpu:0'):
             merged = []
             for outputs, name in zip(outputs_all, output_names):
-                # If outputs are numbers without dimensions, add a batch dim.
-                def add_dim(tensor):
-                    """Add a dimension to tensors that don't have any."""
-                    if K.int_shape(tensor) == ():
-                        return KL.Lambda(lambda t: K.reshape(t, [1, 1]))(tensor)
-                    return tensor
-                outputs = list(map(add_dim, outputs))
-
-                # Concatenate
-                merged.append(KL.Concatenate(axis=0, name=name)(outputs))
+                # Concatenate or average outputs?
+                # Outputs usually have a batch dimension and we concatenate
+                # across it. If they don't, then the output is likely a loss
+                # or a metric value that gets averaged across the batch.
+                # Keras expects losses and metrics to be scalars.
+                if K.int_shape(outputs[0]) == ():
+                    # Average
+                    m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs)
+                else:
+                    # Concatenate
+                    m = KL.Concatenate(axis=0, name=name)(outputs)
+                merged.append(m)
             return merged
