The project implements a 3D thermal porous convection solver in Julia using ImplicitGlobalGrid.jl and ParallelStencil.jl, enabling multi-xPU parallelized execution across distributed GPUs or multi-threaded CPUs. The solver models buoyancy-driven convection in a fully saturated porous medium, governed by mass conservation, Darcy's law, and the heat transport equation. The simulation was tested on Piz Daint multi-xPU nodes at the Swiss National Supercomputing Centre (CSCS) as part of the course "Solving Partial Differential Equations in Parallel on Graphics Processing Units" at ETH Zürich.
Fig. 1: Temperature field evolution in a 3D porous convection simulation with grid resolution 255 × 127 × 127.
Fig. 2: Cross-sectional 2D snapshot of the evolution of the temperature field.
The thermal porous convection system is governed by a coupled system of PDEs:

$$\nabla \cdot \boldsymbol{q_D} = 0,$$

$$\boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right),$$

$$\frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T - \nabla\cdot\left(\frac{\lambda}{\rho_0 c_p}\nabla T\right) = 0,$$

where $p$ is the fluid pressure, $T$ the temperature, $\boldsymbol{q_D}$ the Darcy flux, $k$ the permeability, $\eta$ the fluid viscosity, $\rho$ the fluid density, $\boldsymbol{g}$ the gravitational acceleration, $\phi$ the porosity, $\lambda$ the thermal conductivity, and $\rho_0 c_p$ the volumetric heat capacity.
- Setup & Instructions
- Thermal Porous Convection: The Physical Model
- Discretization and Numerical Scheme
- Parallelization and Communication Hiding
The 3D porous convection simulation employs a computational grid of global size 255 × 127 × 127 (see Fig. 1).
- We call our multi-xPU solver using the SLURM script `run_PC_3D_multixpu.sh`, which runs the Julia-based 3D porous convection simulation on 8 GPU nodes using MPI on Piz Daint:
$ sbatch run_PC_3D_multixpu.sh
This outputs the simulation frames at the specified timesteps, along with `.bin` files.
- We used the script `create_animation_2D.sh` to assemble an animation from the created frames and produce a 2D animation of the porous convection simulation:
$ chmod +x create_animation_2D.sh
$ ./create_animation_2D.sh
- We used the 3D visualization scripts (`visualize_final_3D.jl` and its counterpart for the initial state), which rely on the `.bin` files created earlier, to create 3D visualizations of the initial and final temperature distributions. After adapting our local environment to support the `GLMakie` package (see the next section), we ran both scripts in Julia to create the visualizations for the initial and final 3D simulation stages.
Fig. 3: Initial 3D temperature distribution in the thermal porous convection simulation.
Fig. 4: Final 3D temperature distribution in the thermal porous convection simulation.
- To create a 3D animation, we first generated 3D frames from the previously created `.bin` files by running the script `generate_3D_frames.jl` in Julia from the terminal. The frames were saved into `3D_frames_out/`.
Fig. 5: Sample 3D figure generated during the simulation.
Fig. 6: Corresponding 2D slice of the temperature field.
- We used the script `create_animation_3D.sh` to assemble an animation from the created frames. The script was run as follows:
$ chmod +x create_animation_3D.sh
$ ./create_animation_3D.sh
The final 3D animation of the thermal porous convection simulation is presented here:
- We also created a 2D porous convection simulation with a velocity quiver overlay. First, a non-interactive job was submitted on Piz Daint by running:
$ sbatch l7_runme2D.sh
We then used the script `create_animation_2D.sh` to assemble an animation from the created frames:
$ chmod +x create_animation_2D.sh
$ ./create_animation_2D.sh
Animation created: ./PorousConvection_2D_animation_quiver.gif
The visualization scripts rely on GLMakie. GLMakie requires OpenGL, which is unavailable in headless environments such as Piz Daint. As a workaround, we set up a virtual display for headless rendering on Piz Daint, then added the required packages and ran the scripts. Here is a quick summary of the scheme used to create the visualizations:
- Set up a virtual display for headless rendering on Piz Daint:
$ Xvfb :1 -screen 0 1024x768x24 &
$ export DISPLAY=:1
- We confirmed that Xvfb was running using:
$ ps aux | grep Xvfb
class203 12856 0.0 0.0 2389004 43516 pts/0 Sl 10:00 0:00 Xvfb :1 -screen 0 1024x768x24
- We then entered the Julia REPL and activated the current local project:
julia> using Pkg
julia> Pkg.activate(".")
- The `GLMakie` package was installed within the environment:
julia> Pkg.add("GLMakie")
- We then ran a sample visualization script, `visualize_3D.jl`, for validation:
julia> include("visualize_3D.jl")
Loading data...
Data loaded. Max value: 100.0
Saving figure as PorousConvection_3D.png
Figure saved successfully!
The file `PorousConvection_3D.png` is created in the working directory.
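As an illustration of this workflow, here is a minimal sketch of such a visualization script. It assumes the `.bin` files store the temperature field as a raw `Float64` array of known size; the file name, grid size, and plot settings below are illustrative, not taken from the repository.

```julia
using GLMakie

# Assumed (illustrative) global grid size and file name; adjust to the actual output.
nx, ny, nz = 255, 127, 127
T = Array{Float64}(undef, nx, ny, nz)
read!("T_final.bin", T)            # hypothetical file name; reads the raw binary dump

fig = Figure()
ax  = Axis3(fig[1, 1], title = "Temperature field")
volume!(ax, T, colormap = :turbo)  # volume rendering of the 3D temperature field
save("PorousConvection_3D.png", fig)
```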
Thermal porous convection involves the coupling of pressure and temperature dynamics in a porous medium fully saturated with a fluid. The physical process is governed by mass conservation, Darcy's law for fluid flow through porous media, and the heat transport equation incorporating both conduction and advection. In the given model, we assume a 3D rectangular domain of dimensions $l_x \times l_y \times l_z$.
The starting point is the mass conservation equation for an incompressible fluid in a constant-porosity ($\phi = \mathrm{const}$) medium:

$$\nabla \cdot \boldsymbol{q_D} = 0,$$

where $\boldsymbol{q_D}$ is the Darcy flux.
Darcy's law relates the Darcy flux to the pressure gradient and buoyancy forces:

$$\boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right).$$

To incorporate thermal effects, we apply the Boussinesq approximation, where the fluid density depends linearly on temperature as

$$\rho = \rho_0\left(1 - \alpha\,(T - T_0)\right),$$

with $\rho_0$ the reference density at the reference temperature $T_0$ and $\alpha$ the thermal expansion coefficient.
The pressure field $p$ then satisfies the elliptic equation obtained by substituting Darcy's law into the mass conservation equation:

$$\nabla \cdot \left[\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right)\right] = 0.$$
This elliptic PDE forms the basis of the fluid flow solver, with pressure gradients driving flow through the porous medium.
The conservation of energy in the fluid is expressed by

$$\frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T - \nabla\cdot\left(\frac{\lambda}{\rho_0 c_p}\nabla T\right) = 0,$$

where $\lambda$ is the thermal conductivity and $\rho_0 c_p$ the volumetric heat capacity of the fluid.
This PDE governs the transport of temperature in the medium, with advection driven by fluid flow and diffusion governed by thermal conductivity.
Our implementation efficiently handles pressure-driven flow, temperature advection-diffusion, and fluid-thermal interactions in a staggered finite-difference discretization.
The system of equations is solved using a pseudo-transient approach, where pseudo-time derivatives are introduced to improve numerical stability and iterative convergence. The primary equations in the pseudo-transient formulation are:
- Pseudo-transient Darcy flux equation:

$$\theta_D \frac{\partial \boldsymbol{q_D}}{\partial \tau} + \boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho_0\,\alpha\,(T - T_0)\,\boldsymbol{g}\right),$$

where $\tau$ denotes pseudo-time and $\theta_D$ is the relaxation parameter of the Darcy flux. In our implementation, this update is carried out by the `compute_Dflux!` kernel:
@parallel function compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
@inn_x(qDx) = @inn_x(qDx) - (@inn_x(qDx) + k_ηf * (@d_xa(Pf) * _dx)) * _1_θ_dτ_D
@inn_y(qDy) = @inn_y(qDy) - (@inn_y(qDy) + k_ηf * (@d_ya(Pf) * _dy)) * _1_θ_dτ_D
@inn_z(qDz) = @inn_z(qDz) - (@inn_z(qDz) + k_ηf * (@d_za(Pf) * _dz - αρg * @av_za(T))) * _1_θ_dτ_D
end
- $k/\eta$ is represented by `k_ηf`.
- $\nabla p$ is approximated by finite differences using `@d_xa(Pf)`, `@d_ya(Pf)`, and `@d_za(Pf)`.
- $\rho_0 \alpha (T - T_0) \boldsymbol{g}$ is represented by the term `αρg * @av_za(T)` in the $z$-direction, incorporating buoyancy effects.
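For readers unfamiliar with the ParallelStencil finite-difference macros, the following plain-Julia sketch illustrates what the operations used above correspond to conceptually; it is an illustration of the stencil operations, not code from the solver.

```julia
# Conceptual equivalents of the macros on an ordinary 3D array A:
#   @d_xa(A)  -> forward difference along x
#   @av_za(A) -> two-point average along z
d_xa(A, ix, iy, iz)  = A[ix + 1, iy, iz] - A[ix, iy, iz]
av_za(A, ix, iy, iz) = 0.5 * (A[ix, iy, iz] + A[ix, iy, iz + 1])
```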
- Pseudo-transient temperature flux equation:

$$\theta_T \frac{\partial \boldsymbol{q_T}}{\partial \tau} + \boldsymbol{q_T} = -\frac{\lambda}{\rho_0 c_p}\nabla T,$$

where $\boldsymbol{q_T}$ is the diffusive temperature flux and $\theta_T$ its relaxation parameter. In our implementation, the flux components are updated by the `compute_Tflux!` kernel:
@parallel_indices (ix, iy, iz) function compute_Tflux!(qTx, qTy, qTz, dTdt, T, T_old, qDx, qDy, qDz, _dt, λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz, _1_θ_dτ_T, _dx, _dy, _dz, _ϕ)
if (ix <= size(qTx, 1) && iy <= size(qTx, 2) && iz <= size(qTx, 3))
qTx[ix, iy, iz] = qTx[ix, iy, iz] - (qTx[ix, iy, iz] + λ_ρCp_dx * (T[ix+1, iy+1, iz+1] - T[ix, iy+1, iz+1])) * _1_θ_dτ_T
qTy[ix, iy, iz] = qTy[ix, iy, iz] - (qTy[ix, iy, iz] + λ_ρCp_dy * (T[ix+1, iy+1, iz+1] - T[ix+1, iy, iz+1])) * _1_θ_dτ_T
qTz[ix, iy, iz] = qTz[ix, iy, iz] - (qTz[ix, iy, iz] + λ_ρCp_dz * (T[ix+1, iy+1, iz+1] - T[ix+1, iy+1, iz])) * _1_θ_dτ_T
end
end
- $-\frac{\lambda}{\rho_0 c_p} \nabla T$ is approximated by finite differences using `λ_ρCp_dx`, `λ_ρCp_dy`, and `λ_ρCp_dz` to scale the temperature gradients in each direction.
- Mass conservation with pseudo-compressibility:

$$\beta \frac{\partial p}{\partial \tau} + \nabla \cdot \boldsymbol{q_D} = 0,$$

where $\beta$ is the pseudo-compressibility controlling the pressure relaxation. The corresponding pressure update is performed by the `update_Pf!` kernel:
@parallel function update_Pf!(Pf, qDx, qDy, qDz, _dx, _dy, _dz, _β_dτ_D)
@all(Pf) = @all(Pf) - (@d_xa(qDx) * _dx + @d_ya(qDy) * _dy + @d_za(qDz) * _dz) * _β_dτ_D
end
- $\nabla \cdot \boldsymbol{q_D}$ is approximated using finite differences via `@d_xa(qDx)`, `@d_ya(qDy)`, and `@d_za(qDz)`.
- Advection-diffusion equation for temperature:

$$\frac{\partial T}{\partial \tau} + \frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T + \nabla\cdot\boldsymbol{q_T} = 0.$$

The update of the temperature field $T$ combines the physical time derivative and the advection term (accumulated in `dTdt`) with the divergence of the diffusive flux, and is performed by the `update_T!` kernel:
@parallel function update_T!(T, qTx, qTy, qTz, dTdt, _dx, _dy, _dz, _1_dt_β_dτ_T)
@inn(T) = @inn(T) - (@all(dTdt) + @d_xa(qTx) * _dx + @d_ya(qTy) * _dy + @d_za(qTz) * _dz) * _1_dt_β_dτ_T
end
- $\frac{1}{\phi} \boldsymbol{q_D} \cdot \nabla T$ is incorporated in the `dTdt` term.
- $\nabla \cdot \boldsymbol{q_T}$ is approximated using finite differences on $q_{Tx}, q_{Ty}, q_{Tz}$.
The system is discretized using a staggered finite-difference scheme on a 3D grid, with variables located at different positions within each control volume:
- Pressure $p$ and temperature $T$ are defined at cell centers.
- Velocity (Darcy flux) components $q_{Dx}, q_{Dy}, q_{Dz}$ are located on cell faces to ensure flux continuity.
- Flux terms and derivatives are approximated using central differences for diffusion terms and upwind schemes for advective terms, ensuring numerical stability.
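To illustrate the upwind idea in one dimension, the sketch below selects the one-sided difference according to the sign of the local Darcy flux. The helper function is hypothetical and only meant to convey the scheme; in the solver, the multi-dimensional advective contribution is assembled in the `dTdt` term.

```julia
# First-order upwind approximation of qD * dT/dx at cell ix:
# use the left-sided difference when the flux points in +x, the right-sided one otherwise.
function upwind_advection(T::AbstractVector, qD::Real, ix::Int, dx::Real)
    dTdx = qD > 0 ? (T[ix] - T[ix - 1]) / dx : (T[ix + 1] - T[ix]) / dx
    return qD * dTdx
end
```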
The grid spacing is derived from the domain size and the global grid resolution,

$$\Delta x = \frac{l_x}{n_x}, \qquad \Delta y = \frac{l_y}{n_y}, \qquad \Delta z = \frac{l_z}{n_z},$$

with $n_x$, $n_y$, $n_z$ the global number of grid cells in each direction.
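A minimal sketch of how the spacing follows from the global grid, assuming the ImplicitGlobalGrid defaults; the local resolution and domain extents below are illustrative, not the run configuration:

```julia
using ImplicitGlobalGrid

nx, ny, nz = 127, 63, 63                                        # local grid size per process (illustrative)
me, dims, nprocs, coords, comm = init_global_grid(nx, ny, nz)   # sets up MPI and the process topology
lx, ly, lz = 40.0, 20.0, 20.0                                   # physical domain extents (illustrative)
dx, dy, dz = lx / nx_g(), ly / ny_g(), lz / nz_g()              # spacing from the global resolution
finalize_global_grid()
```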
The model applies Neumann boundary conditions, which specify zero normal derivatives of the dependent variables at the domain boundaries:

$$\frac{\partial p}{\partial n} = 0, \qquad \frac{\partial T}{\partial n} = 0 \quad \text{on } \partial\Omega.$$

These conditions imply no flux of mass or heat across the domain boundaries, which is typical for confined porous systems. This is implemented in the code using:
@parallel_indices (iy, iz) function bc_x!(A)
A[1, iy, iz] = A[2, iy, iz]
A[end, iy, iz] = A[end-1, iy, iz]
end
Similar updates are applied at the remaining boundaries of the domain, as sketched below for the $y$-direction.
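By analogy with the `bc_x!` kernel above, a corresponding update in the $y$-direction could look like the following sketch; the kernel name `bc_y!` and its exact form are assumptions rather than code taken from the repository, and the fragment is meant to sit alongside `bc_x!` in the solver.

```julia
@parallel_indices (ix, iz) function bc_y!(A)
    A[ix, 1,   iz] = A[ix, 2,       iz]   # zero normal derivative at the lower y-boundary
    A[ix, end, iz] = A[ix, end - 1, iz]   # zero normal derivative at the upper y-boundary
    return nothing
end
```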
The pseudo-transient method involves iterative updates of the primary variables using an implicit-explicit scheme:
- The pressure equation is solved using an implicit elliptic solver for robustness.
- The temperature equation is integrated using an explicit method for the advection term and an implicit scheme for diffusion. The time step $\Delta t$ is selected based on a CFL condition to maintain numerical stability.
Within each time step, the system iterates until convergence is achieved, with the error defined as the maximum norm of the pressure and temperature residuals:

$$\mathrm{err} = \max\left(\lVert r_{Pf} \rVert_\infty,\ \lVert r_T \rVert_\infty\right).$$

The iterations stop once this error drops below a prescribed tolerance. The residuals are computed by the `compute_r!` kernel:
@parallel function compute_r!(r_Pf, r_T, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, _dx, _dy, _dz)
@all(r_Pf) = @d_xa(qDx) * _dx + @d_ya(qDy) * _dy + @d_za(qDz) * _dz
@all(r_T) = @all(dTdt) + @d_xa(qTx) * _dx + @d_ya(qTy) * _dy + @d_za(qTz) * _dz
end
This iterative approach ensures convergence of the coupled pressure and temperature fields to the solution.
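Putting the pieces together, the pseudo-transient loop within one physical time step can be sketched as follows. The kernel names are those shown above; the arrays and scalar parameters are assumed to be the ones allocated by the solver, and `ϵtol`, `maxiter`, and `ncheck` are illustrative names for the tolerance, iteration cap, and residual-check interval. Boundary-condition and halo updates are omitted for brevity.

```julia
function pseudo_transient_step!(Pf, T, T_old, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, r_Pf, r_T, params;
                                ϵtol = 1e-6, maxiter = 10_000, ncheck = 100)
    # Unpack the physical and numerical parameters assumed to be precomputed by the solver.
    (; k_ηf, αρg, _dx, _dy, _dz, _dt, _ϕ, λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz,
       _1_θ_dτ_D, _β_dτ_D, _1_θ_dτ_T, _1_dt_β_dτ_T) = params
    iter = 1; err = 2ϵtol
    while err >= ϵtol && iter <= maxiter
        @parallel compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
        @parallel update_Pf!(Pf, qDx, qDy, qDz, _dx, _dy, _dz, _β_dτ_D)
        @parallel compute_Tflux!(qTx, qTy, qTz, dTdt, T, T_old, qDx, qDy, qDz, _dt,
                                 λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz, _1_θ_dτ_T, _dx, _dy, _dz, _ϕ)
        @parallel update_T!(T, qTx, qTy, qTz, dTdt, _dx, _dy, _dz, _1_dt_β_dτ_T)
        if iter % ncheck == 0
            @parallel compute_r!(r_Pf, r_T, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, _dx, _dy, _dz)
            err = max(maximum(abs.(r_Pf)), maximum(abs.(r_T)))   # local residual norm
        end
        iter += 1
    end
    return err
end
```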
This project employs multi-xPU parallelism by integrating distributed computing with `ImplicitGlobalGrid.jl` and GPU acceleration via `ParallelStencil.jl`. The solver dynamically selects the computational backend, initializing either GPU-based parallelism with CUDA or multi-threaded CPU execution to enable efficient vectorized 3D computations. In the script, this selection is controlled by declaring the `USE_GPU` variable.
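The backend-selection idiom looks roughly as follows. This is a sketch of the standard ParallelStencil pattern; the exact top of the repository script may differ, and loading `CUDA` explicitly is an assumption tied to recent ParallelStencil versions.

```julia
const USE_GPU = true            # set to false for multi-threaded CPU execution
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
    using CUDA
    @init_parallel_stencil(CUDA, Float64, 3)      # GPU backend
else
    @init_parallel_stencil(Threads, Float64, 3)   # multi-threaded CPU backend
end
```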
The global computational domain is partitioned into subdomains using `init_global_grid(nx, ny, nz)`. This partitioning enables distributed-memory parallelization, where each process updates its local grid portion and exchanges data with neighboring subdomains through halo regions. The numerical scheme is implemented using finite difference methods, with parallelized kernel execution leveraging `@parallel` macros. For instance, the solver updates the Darcy flux components, pressure field, and temperature field in parallel:
@parallel function compute_Dflux!(...)
@parallel function update_Pf!(...)
@parallel function update_T!(...)
@parallel function compute_r!(...)
For computations that require explicit indexing, such as the temperature flux updates, the solver utilizes `@parallel_indices`, allowing explicit control over individual grid cell operations. This ensures that all spatial components of the temperature flux updates are executed in parallel:
@parallel_indices (ix, iy, iz) function compute_Tflux!(...)
The halo exchange mechanism updates boundary values between subdomains, ensuring continuity between neighboring partitions:
update_halo!(Pf, T, qDx, qDy, qDz)
To enhance scalability and performance, the implementation employs halo exchange optimization, where communication is overlapped with computation using `@hide_communication`, reducing synchronization overhead:
@hide_communication (8, 8, 4) begin
@parallel compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
end
Global reductions are performed using `MPI.Allreduce`, facilitating efficient distributed computations without explicit data gathering. The workload is evenly distributed across processes via domain decomposition. Finally, simulation results are collected and visualized using `gather!`, where each process contributes its local computation results to reconstruct the full dataset.
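A sketch of these two steps, assuming the default ImplicitGlobalGrid overlap of two grid points and illustrative variable names (`err_loc`, `T_inn`, `T_glob`); the arrays `r_Pf`, `r_T`, `T` and the bookkeeping variables `nx`, `ny`, `nz`, `dims` are assumed to be those defined by the solver, and the repository may organize this differently.

```julia
using MPI, ImplicitGlobalGrid

# Reduce a locally computed residual norm to a global maximum shared by all ranks.
err_loc  = max(maximum(abs.(r_Pf)), maximum(abs.(r_T)))
err_glob = MPI.Allreduce(err_loc, MPI.MAX, MPI.COMM_WORLD)

# Gather the inner (halo-free) part of the temperature field onto the root rank.
T_inn  = Array(T)[2:end-1, 2:end-1, 2:end-1]
T_glob = zeros((nx - 2) * dims[1], (ny - 2) * dims[2], (nz - 2) * dims[3])
gather!(T_inn, T_glob)
```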
For further details on the implementation, refer to the complete discussion: Parallelization.md.