The Process class should be used in the 4D case to speed up the computation:
- distribute the 'ie=1:tuu' loop as asynchronous computations, each running in its own directory
- wait for all processes to end (waitfor)
- collect and merge the resulting data sets
This would then 'emulate' an MPI execution on the local host.
NOTE: a simple implementation would start as many concurrent processes as the value of 'tuu', regardless of how many CPU cores are actually available.
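
The distribute / wait / merge steps, together with a cap on concurrency, can be sketched as follows. This is only an illustration in Python using the standard library, not the Process class itself, whose interface is not described here. The names `JOB`, `run_one` and `run_all`, the `ie_<n>` directory layout and the `result.txt` file are all hypothetical, and the per-iteration job is a placeholder that just squares its index. The pool threads only supervise the child processes; the actual work runs in separate processes, each in its own directory.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for one pass of the 'ie' loop: a child process that
# writes the square of its index to 'result.txt' in its working directory.
JOB = "import sys; open('result.txt', 'w').write(str(int(sys.argv[1]) ** 2))"


def run_one(ie, root):
    """Run iteration `ie` as a child process in its own directory."""
    workdir = Path(root) / f"ie_{ie}"
    workdir.mkdir(parents=True, exist_ok=True)
    # check=True raises if the child process exits with an error.
    subprocess.run([sys.executable, "-c", JOB, str(ie)], cwd=workdir, check=True)
    return workdir


def run_all(tuu, root, max_workers=None):
    """Distribute ie=1..tuu, wait for all jobs, then merge their results."""
    # Cap concurrency at the CPU count rather than starting `tuu` jobs at once.
    workers = max_workers or max(1, min(tuu, os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() blocks until every job has ended (the 'waitfor' step).
        dirs = list(pool.map(lambda ie: run_one(ie, root), range(1, tuu + 1)))
    # Collect and merge: directories come back in 'ie' order.
    return [int((d / "result.txt").read_text()) for d in dirs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_all(4, root))
```

Passing `max_workers=tuu` reproduces the simple implementation mentioned in the note; the default instead limits the number of simultaneous jobs to the number of cores reported by `os.cpu_count()`.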