Parallelization Strategy¶

The parallelization of the code relies on a 2D pencil decomposition using the open-source library 2DECOMP&FFT ¹, which has been designed to perform optimized three-dimensional distributed FFTs. This library is called on top of standard MPI and MPI-I/O libraries and includes user-friendly programming interfaces. 2DECOMP&FFT supports large-scale parallel applications on distributed memory systems and shows excellent performance on a variety of existing supercomputers ²³. The library comprises two different possible pencil orientations, x-pencils and z-pencils. DINO has been coded with the x-pencil orientation in order to minimize the computational time needed for transposing the data as required for non-periodic boundary conditions.

Domain Decomposition using a pencil orientation in x direction.

Using 2DECOMP&FFT, an excellent parallel scaling has been obtained with DINO.

Scaling¶

Code scalability and parallel efficiency are important quantities defining the efficiency of numerical codes.. DINO is tested on a variety of machines in order to check portability. Results obtained with SuperMUC at Leibniz Supercomputing Center in Munich are presented here. The strong scalability performance obtained by DINO for two different benchmarks with different grid size are shown below. These tests correspond to the DNS of a turbulent hydrogen/air flame using a detailed reaction scheme. For these tests, the number of processors was varied from \(1\,024\) to \(65\,536\).

DINO thus shows a very good parallel efficiency, ensuring efficient computations for up to \(\mathcal{O}(10^5)\) cores, which is a very satisfactory result for a low-Mach solver, since global operations associated to the Poisson equation severely constrain the parallel performance.

Solving the Poisson Equation in Parallel¶

One of the most difficult issues associated with any low-Mach or incompressible flow solver is to find an efficient way for solving the Poisson equation. Usually, it is solved by explicit iterative methods (Conjugate Gradient CG, Multi-grid, etc.) or more rarely by implicit methods (matrix inversion, spectral methods, or combinations of both). In DINO, the Poisson equation is solved by FFT, even when the boundary conditions of the domain are not periodic.

The developed approach is an extension of the one described in ²³⁴⁵, where pre- and post-processing was applied both in physical and in wave space. The current algorithm needs only pre- and post-processing in the physical space and for a different purpose. Additional differences result from the fact that the pressure is solved in DINO in a collocated manner and not on a staggered grid.

Suitable pre- and and post-processing steps are applied to the corresponding array before and after calling the parallel FFT subroutine included in 2DECOMP&FFT. All tests have demonstrated that this method delivers a very high order (equivalent spectral accuracy) and is also computationally very efficient ⁶ compared to existing alternatives. In order to explain the current algorithm it is better to start with classical discrete Fourier transform (\(\hat{F}_k\)) for a real sequence \(F_j\), \(j =0, 1,2,....,N-1\), which is defined by

\[ \hat{F}_k=\frac{1}{N}\, \sum_{j=0}^{N-1} F_j e^{-2 \pi jk \sqrt{-1}/N},\quad k =0, 1,...., N-1. \tag{1} \]

Thanks to Hermitian symmetry, \(\hat{F}_k=\hat{F}_{n-k}^*\), with \(\hat{F}_N=\hat{F}_0\). The inverse of this transform reads:

\[ F_j=\sum_{k=0}^{N-1} F_k e^{2 \pi jk \sqrt{-1}/N}. \]

This operation is directly applied only for periodic sequences (periodic boundary conditions). These two transforms can then be obtained immediately with 2DECOMP&FFT and FFTW libraries, using FFT and IFFT algorithms, respectively.

Concerning now the implementation in DINO, \(F_j\) is first transformed in case of Dirichlet-Dirichlet (DD) boundary conditions using a discrete sine transform (DST),

\[ \hat{F}_k=\frac{2}{N}\, \sum_{j=1}^{N-1} F_j\, \mathrm{sin} (\pi jk/N). \tag{2} \]

In order to obtain the Fourier transform with the standard FFT parallel routines included in 2DECOMP&FFT, the \(F_j\) array is extended in a pre-processing step to a temporary, odd symmetry sequence with length of \((2N)\), in the form (0, \(F_1\), \(F_2\),\(\dots\), \(F_{N-1}\), 0, \(-F_{N-1}\),\(\dots\), \(-F_2\), \(-F_1\)), where \(F_j = -F_{2N-j}\) for \(j =1, N-1\).

In the same manner, in case of Neumann-Neumann (NN) boundary conditions, a discrete cosine transform (DCT) is used instead:

\[ \hat{F}_k=\frac{2}{N} \left[ \frac{F_0}{2}+ \sum_{j=1}^{N-1} F_j\, \mathrm{cos} (\pi jk/N)+ \frac{(-1)^{k}F_N}{2}\right], \tag{3} \]

The standard FFT routine is applied after extending the array in a pre-processing step into a temporary, even symmetry sequence of length \((2N)\) with the form of (\(F_0\), \(F_1\), \(F_2\),\(\dots\), \(F_{N-1}\), \(F_N\), \(F_{N-1}\),\(\dots\), \(F_2\), \(F_1\)), where \(F_j = F_{2N-j}\) for \(j =1, N-1\).

A combination between both boundary conditions is also possible. In the case of a Dirichlet-Neumann (DN) combination, a quarter-wave discrete sine transform (QW-DST) is suitable:

\[ \hat{F}_k=\frac{1}{N} \left[ \frac{F_0}{2}+ \sum_{j=1}^{N-1} F_j\, \mathrm{sin} (\pi j(2k-1)/(2N))+ \frac{(-1)^{k-1}F_N}{2}\right], \tag{4} \]

Then, a classical FFT routine is again possible, after extending the original sequence to a temporary, odd symmetry sequence with length of \((4N)\) where (0, \(F_1\),\(\dots\), \(F_N\), \(F_{N-1}\),\(\dots\), \(F_1\), \(0\), \(-F_1\),\(\dots\), \(-F_N\), \(-F_{N-1}\), \(\dots\), \(-F_1\)). Similarly, a case with Neumann-Dirichlet (ND) boundary conditions is now transformed with a quarter-wave discrete cosine transform (QW-DCT),

\[ \hat{F}_k=\frac{1}{N} \left[ \frac{F_0}{2}+ \sum_{j=1}^{N-1} F_j\, \mathrm{cos} (\pi j(2k-1)/(2N))\right], \tag{5} \]

with a standard FFT routine after extending the original sequence to a temporary, even symmetry sequence with length of \((4N)\), in the form (\(F_0\), \(F_1\),\(\dots\), \(F_{N-1}\),0, \(-F_{N-1}\),\(\dots\), \(-F_0\), \(-F_1\),\(\dots\), \(-F_{N-1}\), \(0\),\(F_{N-1}\), \(\dots\), \(F_1\)).

Finally, the algorithm implemented in {\it DINO} for solving the Poisson equation \(\nabla^2{p}=F\) can be summarized as follows:

Pre-processing for sequence \(F\) (input to the algorithm), which is a real array of length \(N\), extending its length depending on the boundary conditions;
- DD: \(M=2N\), (0, \(F_1\), \(F_2\), \(\dots\), \(F_{N-1}\), 0, \(-F_{N-1}\),\(\dots\), \(-F_2\), \(-F_1\));
- NN: \(M=2N\), (\(F_0\), \(F_1\), \(F_2\),\(\dots\), \(F_{N-1}\), \(F_N\), \(F_{N-1}\),\(\dots\), \(F_2\), \(F_1\));
- DN: \(M=4N\), (0, \(F_1\),\(\dots\), \(F_N\), \(F_{N-1}\),\(\dots\), \(F_1\), \(0\), \(-F_1\),\(\dots\), \(-F_N\), \(-F_{N-1}\), \(\dots\), \(-F_1\));
- ND: \(M=4N\), (\(F_0\), \(F_1\),\(\dots\), \(F_{N-1}\),0, \(-F_{N-1}\),\(\dots\), \(-F_0\), \(-F_1\),\(\dots\), \(-F_{N-1}\), \(0\),\(F_{N-1}\), \(\dots\), \(F_1\)).
Apply standard FFT routine (Eq.~\ref{eq:fft}) over \(M\) discrete points to obtain \(\hat{F_k}\);
Solve the Poisson equation in wave space, \(\hat{p}= -\hat{F}/\boldsymbol\kappa^2\);
Apply standard IFFT routine (Eq.~\ref{eq:ifft}) to Fourier transform of the pressure (\(\hat{p}\)), obtaining the pressure in the physical space saved in temporary array \(F\) (overwritten to save memory);
Post-processing for array \(F\) by saving the correct part into an array \(P\) of length \(N\).

This algorithm has been coded for parallel simulations (parallel FFT) to speed-up the process ⁶. Considering for instance a small 3D DNS involving \corrected{\(65 \times 65 \times32\)} grid points parallelized using 16 cores with Dirichlet-Neumann, Neumann-Neumann, and periodic-periodic boundary conditions in \(x\), \(y\), and \(z\) direction, respectively, the implemented algorithm is already \(6.75\) faster than the CG solver implemented in the well-known HYPRE library ⁷. More details with verifications and validations for this implementation can be found in Abdelsamie et al. ⁶.

Li N., Laizet S. 2DECOMP&FFT - a highly scalable 2D decomposition library and FFT interface. In Cray User's Group 2010 conference. 2010. ↩
Laizet S., Lamballais E., Vassilicos J.C. A numerical strategy to combine high-order schemes, complex geometry and parallel computing for high resolution DNS of fractal generated turbulence. Comput. Fluids, 39:471–484, 2010. ↩↩
Laizet S., Li N. Incompact3d: A powerful tool to tackle turbulence problems with up to \(O(10^5)\) computational cores. Int. J. Numer. Meth. Fluids, 67:1735–1757, 2011. ↩↩
Cooley J. W., Lewis P.A.W., Welsh P.D. The fast Fourier transform algorithm: Programming considerations in the calculation of sine, cosine and Laplace transforms. J. Sound Vib., 12:315–337, 1970. ↩
Swarztrauber P.N. Symmetric FFTs. Math. Comput., 47:323–346, 1986. ↩
A. Abdelsamie, G. Fru, T. Oster, F. Dietzsch, G. Janiga, and D. Thévenin. Towards direct numerical simulations of low-mach number turbulent reacting and two-phase flows using immersed boundaries. Computers & Fluids, 131:123 – 141, 2016. ↩↩↩
\url http://computation.llnl.gov/project/linear_solvers/software.php. ↩