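We can exploit it in two ways:
- Internally: interleave the device data synchronizations with the individual MPI exchanges, so that each halo buffer can be exchanged as soon as it has been synchronized.
- Externally: return control to the caller immediately, so it can do useful computation while halo_exchange is still in progress.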
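The two ideas can be sketched in Python as follows. The sketch assumes a periodic 1D domain split into subdomains stored as [ghost, interior..., ghost]. It uses no real MPI or GPU: a thread pool stands in for non-blocking exchanges, and device_to_host, halo_exchange_async and smooth_step are hypothetical names, not the actual halo_exchange API.

```python
from concurrent.futures import ThreadPoolExecutor


def device_to_host(values):
    # Hypothetical stand-in for a device->host sync of ONE halo buffer.
    # Here it just copies a host list.
    return list(values)


def _exchange_direction(subs, direction):
    # Internal overlap: sync one direction's edge buffer, then exchange it
    # right away, instead of syncing every buffer before any exchange.
    n = len(subs)
    if direction == "left":
        edges = device_to_host([s[-2] for s in subs])  # right-most interior cells
        for i, s in enumerate(subs):
            s[0] = edges[(i - 1) % n]  # left ghost <- left neighbour's edge
    else:
        edges = device_to_host([s[1] for s in subs])  # left-most interior cells
        for i, s in enumerate(subs):
            s[-1] = edges[(i + 1) % n]  # right ghost <- right neighbour's edge


def halo_exchange_async(subs, pool):
    # External overlap: start one task per direction and return the
    # futures instead of blocking until the exchange is done.
    return [pool.submit(_exchange_direction, subs, d) for d in ("left", "right")]


def smooth_step(subs, pool):
    # One step of a periodic 3-point average over all subdomains.
    new = [s[:] for s in subs]  # copy before the exchange starts writing ghosts
    futures = halo_exchange_async(subs, pool)
    # Useful work while the exchange runs: points that never read a ghost.
    for s, out in zip(subs, new):
        for j in range(2, len(s) - 2):
            out[j] = (s[j - 1] + s[j] + s[j + 1]) / 3
    # Halos are needed from here on: block and re-raise any exchange error.
    for f in futures:
        f.result()
    for s, out in zip(subs, new):
        for j in (1, len(s) - 2):
            out[j] = (s[j - 1] + s[j] + s[j + 1]) / 3
    return new


if __name__ == "__main__":
    data = [float(x) for x in range(12)]
    subs = [[0.0] + data[k:k + 4] + [0.0] for k in range(0, 12, 4)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        new = smooth_step(subs, pool)
    print([v for out in new for v in out[1:-1]])
```

The threads only write ghost cells (indices 0 and -1), and the early computation only reads interior cells, so the overlap is race-free. In a real MPI code, the futures would roughly correspond to the MPI_Request handles from MPI_Isend/MPI_Irecv, the blocking step to MPI_Waitall, and device_to_host to an asynchronous device copy issued per direction. That mapping is an assumption about how halo_exchange could be built, not a description of it.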