The project implements a 3D thermal porous convection solver in Julia using ImplicitGlobalGrid.jl and ParallelStencil.jl, enabling multi-xPU parallelized execution across distributed GPUs or multi-threaded CPUs. The solver models buoyancy-driven convection in a fully saturated porous medium, governed by mass conservation, Darcy's law, and the heat transport equation. The simulation was tested on Piz Daint multi-xPU nodes at the Swiss National Supercomputing Centre (CSCS) as part of the course "Solving Partial Differential Equations in Parallel on Graphics Processing Units" at ETH Zürich.
Fig. 1: Temperature field evolution in a 3D porous convection simulation with grid resolution 255 × 127 × 127.
Fig. 2: Cross-sectional 2D snapshot of the evolution of the temperature field.
The thermal porous convection system is governed by a coupled system of PDEs:

$$\nabla \cdot \boldsymbol{q_D} = 0,$$

$$\boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right),$$

$$\frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T - \nabla\cdot\left(\frac{\lambda}{\rho_0 c_p}\nabla T\right) = 0,$$

where $p$ is the fluid pressure, $T$ the temperature, $\boldsymbol{q_D}$ the Darcy flux, $k$ the permeability, $\eta$ the fluid viscosity, $\rho$ the fluid density, $\boldsymbol{g}$ the gravitational acceleration, $\phi$ the porosity, $\lambda$ the thermal conductivity, and $\rho_0 c_p$ the volumetric heat capacity.
- Setup & Instructions
- Thermal Porous Convection: The Physical Model
- Discretization and Numerical Scheme
- Parallelization and Communication Hiding
The 3D porous convection simulation employs a computational grid of global size 255 × 127 × 127 (see Fig. 1).
- We call our multi-xPU solver using the SLURM script `run_PC_3D_multixpu.sh`, which runs the Julia-based 3D porous convection simulation on 8 GPU nodes using MPI on Piz Daint:
$ sbatch run_PC_3D_multixpu.sh
This outputs the simulation frames at the specified timesteps, along with `.bin` files.
- We used the script `create_animation_2D.sh` to assemble an animation from the created frames and produce a 2D animation of the porous convection simulation:
$ chmod +x create_animation_2D.sh
$ ./create_animation_2D.sh
- We used the 3D visualization scripts (`visualize_final_3D.jl` and its counterpart for the initial state), which rely on the `.bin` files created earlier, to create 3D visualizations of the initial and final temperature distributions. After adapting our local environment to support the `GLMakie` package (see the next section), we ran both scripts in Julia to create the visualizations for the initial and final 3D simulation stages.
Fig. 3: Initial 3D temperature distribution in the thermal porous convection simulation.
Fig. 4: Final 3D temperature distribution in the thermal porous convection simulation.
- To create a 3D animation, we first generated 3D frames from the previously created `.bin` files by running the script `generate_3D_frames.jl` in Julia from the terminal. The frames were saved into `3D_frames_out/`.
Fig. 5: Sample 3D figure generated during the simulation.
Fig. 6: Corresponding 2D slice of the temperature field.
- We used the script `create_animation_3D.sh` to assemble an animation from the created frames. The script was run as follows:
$ chmod +x create_animation_3D.sh
$ ./create_animation_3D.sh
The final 3D animation of the thermal porous convection simulation is presented here:
- We also created a 2D porous convection simulation with a velocity quiver overlay. First, a non-interactive job was submitted on Piz Daint by running:
$ sbatch l7_runme2D.sh
We then used the script `create_animation_2D.sh` to assemble an animation from the created frames:
$ chmod +x create_animation_2D.sh
$ ./create_animation_2D.sh
Animation created: ./PorousConvection_2D_animation_quiver.gif
The visualization scripts rely on GLMakie. GLMakie requires OpenGL, which is unavailable in headless environments such as Piz Daint. As a workaround, we set up a virtual display for headless rendering on Piz Daint, then added the required packages and ran the scripts. Here is a quick summary of the scheme used to create the visualizations:
- Set up a virtual display for headless rendering on Piz Daint:
$ Xvfb :1 -screen 0 1024x768x24 &
$ export DISPLAY=:1
- We confirmed that Xvfb was running using:
$ ps aux | grep Xvfb
class203 12856 0.0 0.0 2389004 43516 pts/0 Sl 10:00 0:00 Xvfb :1 -screen 0 1024x768x24
- We then entered the Julia REPL and activated the current local project:
julia> using Pkg
julia> Pkg.activate(".")
- The `GLMakie` package was installed within the environment:
julia> Pkg.add("GLMakie")
- We then ran a sample visualization script, `visualize_3D.jl`, for validation:
julia> include("visualize_3D.jl")
Loading data...
Data loaded. Max value: 100.0
Saving figure as PorousConvection_3D.png
Figure saved successfully!
The file `PorousConvection_3D.png` is created in the working directory.
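As an illustration of this workflow, here is a minimal sketch of such a visualization script. It assumes the `.bin` files store the temperature field as a raw `Float64` array of known size; the file name, grid size, and plot settings below are illustrative, not taken from the repository.

```julia
using GLMakie

# Assumed (illustrative) global grid size and file name; adjust to the actual output.
nx, ny, nz = 255, 127, 127
T = Array{Float64}(undef, nx, ny, nz)
read!("T_final.bin", T)            # hypothetical file name; reads the raw binary dump

fig = Figure()
ax  = Axis3(fig[1, 1], title = "Temperature field")
volume!(ax, T, colormap = :turbo)  # volume rendering of the 3D temperature field
save("PorousConvection_3D.png", fig)
```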
Thermal porous convection involves the coupling of pressure and temperature dynamics in a porous medium fully saturated with a fluid. The physical process is governed by mass conservation, Darcy's law for fluid flow through porous media, and the heat transport equation incorporating both conduction and advection. In the given model, we assume a 3D rectangular domain of dimensions $l_x \times l_y \times l_z$.
The starting point is the mass conservation equation for an incompressible fluid in a constant-porosity ($\phi = \mathrm{const}$) medium:

$$\nabla \cdot \boldsymbol{q_D} = 0,$$

where $\boldsymbol{q_D}$ is the Darcy flux.
Darcy's law relates the Darcy flux to the pressure gradient and buoyancy forces:

$$\boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right).$$

To incorporate thermal effects, we apply the Boussinesq approximation, where the fluid density depends linearly on temperature as

$$\rho = \rho_0\left(1 - \alpha\,(T - T_0)\right),$$

with $\rho_0$ the reference density at the reference temperature $T_0$ and $\alpha$ the thermal expansion coefficient.
The pressure field $p$ then satisfies the elliptic equation obtained by substituting Darcy's law into the mass conservation equation:

$$\nabla \cdot \left[\frac{k}{\eta}\left(\nabla p - \rho \boldsymbol{g}\right)\right] = 0.$$
This elliptic PDE forms the basis of the fluid flow solver, with pressure gradients driving flow through the porous medium.
The conservation of energy in the fluid is expressed by

$$\frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T - \nabla\cdot\left(\frac{\lambda}{\rho_0 c_p}\nabla T\right) = 0,$$

where $\lambda$ is the thermal conductivity and $\rho_0 c_p$ the volumetric heat capacity of the fluid.
This PDE governs the transport of temperature in the medium, with advection driven by fluid flow and diffusion governed by thermal conductivity.
Our implementation efficiently handles pressure-driven flow, temperature advection-diffusion, and fluid-thermal interactions in a staggered finite-difference discretization.
The system of equations is solved using a pseudo-transient approach, where pseudo-time derivatives are introduced to improve numerical stability and iterative convergence. The primary equations in the pseudo-transient formulation are:
- Pseudo-transient Darcy flux equation:

$$\theta_D \frac{\partial \boldsymbol{q_D}}{\partial \tau} + \boldsymbol{q_D} = -\frac{k}{\eta}\left(\nabla p - \rho_0\,\alpha\,(T - T_0)\,\boldsymbol{g}\right),$$

where $\tau$ denotes pseudo-time and $\theta_D$ is the relaxation parameter of the Darcy flux. In our implementation, this update is carried out by the `compute_Dflux!` kernel:
@parallel function compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
@inn_x(qDx) = @inn_x(qDx) - (@inn_x(qDx) + k_ηf * (@d_xa(Pf) * _dx)) * _1_θ_dτ_D
@inn_y(qDy) = @inn_y(qDy) - (@inn_y(qDy) + k_ηf * (@d_ya(Pf) * _dy)) * _1_θ_dτ_D
@inn_z(qDz) = @inn_z(qDz) - (@inn_z(qDz) + k_ηf * (@d_za(Pf) * _dz - αρg * @av_za(T))) * _1_θ_dτ_D
end
- $k/\eta$ is represented by `k_ηf`.
- $\nabla p$ is approximated by finite differences using `@d_xa(Pf)`, `@d_ya(Pf)`, and `@d_za(Pf)`.
- $\rho_0 \alpha (T - T_0) \boldsymbol{g}$ is represented by the term `αρg * @av_za(T)` in the $z$-direction, incorporating buoyancy effects.
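For readers unfamiliar with the ParallelStencil finite-difference macros, the following plain-Julia sketch illustrates what the operations used above correspond to conceptually; it is an illustration of the stencil operations, not code from the solver.

```julia
# Conceptual equivalents of the macros on an ordinary 3D array A:
#   @d_xa(A)  -> forward difference along x
#   @av_za(A) -> two-point average along z
d_xa(A, ix, iy, iz)  = A[ix + 1, iy, iz] - A[ix, iy, iz]
av_za(A, ix, iy, iz) = 0.5 * (A[ix, iy, iz] + A[ix, iy, iz + 1])
```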
- Pseudo-transient temperature flux equation:

$$\theta_T \frac{\partial \boldsymbol{q_T}}{\partial \tau} + \boldsymbol{q_T} = -\frac{\lambda}{\rho_0 c_p}\nabla T,$$

where $\boldsymbol{q_T}$ is the diffusive temperature flux and $\theta_T$ its relaxation parameter. In our implementation, the flux components are updated by the `compute_Tflux!` kernel:
@parallel_indices (ix, iy, iz) function compute_Tflux!(qTx, qTy, qTz, dTdt, T, T_old, qDx, qDy, qDz, _dt, λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz, _1_θ_dτ_T, _dx, _dy, _dz, _ϕ)
if (ix <= size(qTx, 1) && iy <= size(qTx, 2) && iz <= size(qTx, 3))
qTx[ix, iy, iz] = qTx[ix, iy, iz] - (qTx[ix, iy, iz] + λ_ρCp_dx * (T[ix+1, iy+1, iz+1] - T[ix, iy+1, iz+1])) * _1_θ_dτ_T
qTy[ix, iy, iz] = qTy[ix, iy, iz] - (qTy[ix, iy, iz] + λ_ρCp_dy * (T[ix+1, iy+1, iz+1] - T[ix+1, iy, iz+1])) * _1_θ_dτ_T
qTz[ix, iy, iz] = qTz[ix, iy, iz] - (qTz[ix, iy, iz] + λ_ρCp_dz * (T[ix+1, iy+1, iz+1] - T[ix+1, iy+1, iz])) * _1_θ_dτ_T
end
end
- $-\frac{\lambda}{\rho_0 c_p} \nabla T$ is approximated by finite differences using `λ_ρCp_dx`, `λ_ρCp_dy`, and `λ_ρCp_dz` to scale the temperature gradients in each direction.
- Mass conservation with pseudo-compressibility:

$$\beta \frac{\partial p}{\partial \tau} + \nabla \cdot \boldsymbol{q_D} = 0,$$

where $\beta$ is the pseudo-compressibility controlling the pressure relaxation. The corresponding pressure update is performed by the `update_Pf!` kernel:
@parallel function update_Pf!(Pf, qDx, qDy, qDz, _dx, _dy, _dz, _β_dτ_D)
@all(Pf) = @all(Pf) - (@d_xa(qDx) * _dx + @d_ya(qDy) * _dy + @d_za(qDz) * _dz) * _β_dτ_D
end
- $\nabla \cdot \boldsymbol{q_D}$ is approximated using finite differences via `@d_xa(qDx)`, `@d_ya(qDy)`, and `@d_za(qDz)`.
- Advection-diffusion equation for temperature:

$$\frac{\partial T}{\partial \tau} + \frac{\partial T}{\partial t} + \frac{1}{\phi}\,\boldsymbol{q_D}\cdot\nabla T + \nabla\cdot\boldsymbol{q_T} = 0.$$

The update of the temperature field $T$ combines the physical time derivative and the advection term (accumulated in `dTdt`) with the divergence of the diffusive flux, and is performed by the `update_T!` kernel:
@parallel function update_T!(T, qTx, qTy, qTz, dTdt, _dx, _dy, _dz, _1_dt_β_dτ_T)
@inn(T) = @inn(T) - (@all(dTdt) + @d_xa(qTx) * _dx + @d_ya(qTy) * _dy + @d_za(qTz) * _dz) * _1_dt_β_dτ_T
end
- $\frac{1}{\phi} \boldsymbol{q_D} \cdot \nabla T$ is incorporated in the `dTdt` term.
- $\nabla \cdot \boldsymbol{q_T}$ is approximated using finite differences on $q_{Tx}, q_{Ty}, q_{Tz}$.
The system is discretized using a staggered finite-difference scheme on a 3D grid, with variables located at different positions within each control volume:
- Pressure $p$ and temperature $T$ are defined at cell centers.
- Velocity (Darcy flux) components $q_{Dx}, q_{Dy}, q_{Dz}$ are located on cell faces to ensure flux continuity.
- Flux terms and derivatives are approximated using central differences for diffusion terms and upwind schemes for advective terms, ensuring numerical stability.
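To illustrate the upwind idea in one dimension, the sketch below selects the one-sided difference according to the sign of the local Darcy flux. The helper function is hypothetical and only meant to convey the scheme; in the solver, the multi-dimensional advective contribution is assembled in the `dTdt` term.

```julia
# First-order upwind approximation of qD * dT/dx at cell ix:
# use the left-sided difference when the flux points in +x, the right-sided one otherwise.
function upwind_advection(T::AbstractVector, qD::Real, ix::Int, dx::Real)
    dTdx = qD > 0 ? (T[ix] - T[ix - 1]) / dx : (T[ix + 1] - T[ix]) / dx
    return qD * dTdx
end
```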
The grid spacing is derived from the domain size and the global grid resolution,

$$\Delta x = \frac{l_x}{n_x}, \qquad \Delta y = \frac{l_y}{n_y}, \qquad \Delta z = \frac{l_z}{n_z},$$

with $n_x$, $n_y$, $n_z$ the global number of grid cells in each direction.
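A minimal sketch of how the spacing follows from the global grid, assuming the ImplicitGlobalGrid defaults; the local resolution and domain extents below are illustrative, not the run configuration:

```julia
using ImplicitGlobalGrid

nx, ny, nz = 127, 63, 63                                        # local grid size per process (illustrative)
me, dims, nprocs, coords, comm = init_global_grid(nx, ny, nz)   # sets up MPI and the process topology
lx, ly, lz = 40.0, 20.0, 20.0                                   # physical domain extents (illustrative)
dx, dy, dz = lx / nx_g(), ly / ny_g(), lz / nz_g()              # spacing from the global resolution
finalize_global_grid()
```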
The model applies Neumann boundary conditions, which specify zero normal derivatives of the dependent variables at the domain boundaries:

$$\frac{\partial p}{\partial n} = 0, \qquad \frac{\partial T}{\partial n} = 0 \quad \text{on } \partial\Omega.$$

These conditions imply no flux of mass or heat across the domain boundaries, which is typical for confined porous systems. This is implemented in the code using:
@parallel_indices (iy, iz) function bc_x!(A)
A[1, iy, iz] = A[2, iy, iz]
A[end, iy, iz] = A[end-1, iy, iz]
end
Similar updates are applied at the remaining boundaries of the domain, as sketched below for the $y$-direction.
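By analogy with the `bc_x!` kernel above, a corresponding update in the $y$-direction could look like the following sketch; the kernel name `bc_y!` and its exact form are assumptions rather than code taken from the repository, and the fragment is meant to sit alongside `bc_x!` in the solver.

```julia
@parallel_indices (ix, iz) function bc_y!(A)
    A[ix, 1,   iz] = A[ix, 2,       iz]   # zero normal derivative at the lower y-boundary
    A[ix, end, iz] = A[ix, end - 1, iz]   # zero normal derivative at the upper y-boundary
    return nothing
end
```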
The pseudo-transient method involves iterative updates of the primary variables using an implicit-explicit scheme:
- The pressure equation is solved using an implicit elliptic solver for robustness.
- The temperature equation is integrated using an explicit method for the advection term and an implicit scheme for diffusion. The time step $\Delta t$ is selected based on a CFL condition to maintain numerical stability.
Within each time step, the system iterates until convergence is achieved, with the error defined as the maximum norm of the pressure and temperature residuals:

$$\mathrm{err} = \max\left(\lVert r_{Pf} \rVert_\infty,\ \lVert r_T \rVert_\infty\right).$$

The iterations stop once this error drops below a prescribed tolerance. The residuals are computed by the `compute_r!` kernel:
@parallel function compute_r!(r_Pf, r_T, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, _dx, _dy, _dz)
@all(r_Pf) = @d_xa(qDx) * _dx + @d_ya(qDy) * _dy + @d_za(qDz) * _dz
@all(r_T) = @all(dTdt) + @d_xa(qTx) * _dx + @d_ya(qTy) * _dy + @d_za(qTz) * _dz
end
This iterative approach ensures convergence of the coupled pressure and temperature fields to the solution.
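Putting the pieces together, the pseudo-transient loop within one physical time step can be sketched as follows. The kernel names are those shown above; the arrays and scalar parameters are assumed to be the ones allocated by the solver, and `ϵtol`, `maxiter`, and `ncheck` are illustrative names for the tolerance, iteration cap, and residual-check interval. Boundary-condition and halo updates are omitted for brevity.

```julia
function pseudo_transient_step!(Pf, T, T_old, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, r_Pf, r_T, params;
                                ϵtol = 1e-6, maxiter = 10_000, ncheck = 100)
    # Unpack the physical and numerical parameters assumed to be precomputed by the solver.
    (; k_ηf, αρg, _dx, _dy, _dz, _dt, _ϕ, λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz,
       _1_θ_dτ_D, _β_dτ_D, _1_θ_dτ_T, _1_dt_β_dτ_T) = params
    iter = 1; err = 2ϵtol
    while err >= ϵtol && iter <= maxiter
        @parallel compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
        @parallel update_Pf!(Pf, qDx, qDy, qDz, _dx, _dy, _dz, _β_dτ_D)
        @parallel compute_Tflux!(qTx, qTy, qTz, dTdt, T, T_old, qDx, qDy, qDz, _dt,
                                 λ_ρCp_dx, λ_ρCp_dy, λ_ρCp_dz, _1_θ_dτ_T, _dx, _dy, _dz, _ϕ)
        @parallel update_T!(T, qTx, qTy, qTz, dTdt, _dx, _dy, _dz, _1_dt_β_dτ_T)
        if iter % ncheck == 0
            @parallel compute_r!(r_Pf, r_T, qDx, qDy, qDz, qTx, qTy, qTz, dTdt, _dx, _dy, _dz)
            err = max(maximum(abs.(r_Pf)), maximum(abs.(r_T)))   # local residual norm
        end
        iter += 1
    end
    return err
end
```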
This project employs multi-xPU parallelism by integrating distributed computing with `ImplicitGlobalGrid.jl` and GPU acceleration via `ParallelStencil.jl`. The solver dynamically selects the computational backend, initializing either GPU-based parallelism with CUDA or multi-threaded CPU execution to enable efficient vectorized 3D computations. In the script, this selection is controlled by declaring the `USE_GPU` variable.
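The backend-selection idiom looks roughly as follows. This is a sketch of the standard ParallelStencil pattern; the exact top of the repository script may differ, and loading `CUDA` explicitly is an assumption tied to recent ParallelStencil versions.

```julia
const USE_GPU = true            # set to false for multi-threaded CPU execution
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
    using CUDA
    @init_parallel_stencil(CUDA, Float64, 3)      # GPU backend
else
    @init_parallel_stencil(Threads, Float64, 3)   # multi-threaded CPU backend
end
```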
The global computational domain is partitioned into subdomains using `init_global_grid(nx, ny, nz)`. This partitioning enables distributed-memory parallelization, where each process updates its local grid portion and exchanges data with neighboring subdomains through halo regions. The numerical scheme is implemented using finite difference methods, with parallelized kernel execution leveraging `@parallel` macros. For instance, the solver updates the Darcy flux components, pressure field, and temperature field in parallel:
@parallel function compute_Dflux!(...)
@parallel function update_Pf!(...)
@parallel function update_T!(...)
@parallel function compute_r!(...)
For computations that require explicit indexing, such as the temperature flux updates, the solver utilizes `@parallel_indices`, allowing explicit control over individual grid cell operations. This ensures that all spatial components of the temperature flux updates are executed in parallel:
@parallel_indices (ix, iy, iz) function compute_Tflux!(...)
The halo exchange mechanism updates boundary values between subdomains, ensuring continuity between neighboring partitions:
update_halo!(Pf, T, qDx, qDy, qDz)
To enhance scalability and performance, the implementation employs halo exchange optimization, where communication is overlapped with computation using `@hide_communication`, reducing synchronization overhead:
@hide_communication (8, 8, 4) begin
@parallel compute_Dflux!(qDx, qDy, qDz, Pf, T, k_ηf, _dx, _dy, _dz, αρg, _1_θ_dτ_D)
end
Global reductions are performed using `MPI.Allreduce`, facilitating efficient distributed computations without explicit data gathering. The workload is evenly distributed across processes via domain decomposition. Finally, simulation results are collected and visualized using `gather!`, where each process contributes its local computation results to reconstruct the full dataset.
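A sketch of these two steps, assuming the default ImplicitGlobalGrid overlap of two grid points and illustrative variable names (`err_loc`, `T_inn`, `T_glob`); the arrays `r_Pf`, `r_T`, `T` and the bookkeeping variables `nx`, `ny`, `nz`, `dims` are assumed to be those defined by the solver, and the repository may organize this differently.

```julia
using MPI, ImplicitGlobalGrid

# Reduce a locally computed residual norm to a global maximum shared by all ranks.
err_loc  = max(maximum(abs.(r_Pf)), maximum(abs.(r_T)))
err_glob = MPI.Allreduce(err_loc, MPI.MAX, MPI.COMM_WORLD)

# Gather the inner (halo-free) part of the temperature field onto the root rank.
T_inn  = Array(T)[2:end-1, 2:end-1, 2:end-1]
T_glob = zeros((nx - 2) * dims[1], (ny - 2) * dims[2], (nz - 2) * dims[3])
gather!(T_inn, T_glob)
```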
For further details on the implementation, refer to the complete discussion: Parallelization.md.