A performance study of Quantum ESPRESSO’s diagonalization methods on cutting edge computer technology for high-performance computing (2017 Dec 18).
Abstract
We explore the diagonalization methods used in PWscf (Plane-Wave Self-Consistent Field), a key component of the Quantum ESPRESSO (QE) open-source suite of codes for materials modelling. To improve the performance of the iterative diagonalization solvers, two solutions are proposed: the Projected Preconditioned Conjugate Gradient (PPCG) method as an alternative diagonalization solver, and the porting of existing solvers to GPU systems using CUDA Fortran. Kernel loop directives (CUF kernels) have been used extensively in the implementation of the Conjugate Gradient (CG) solver for general k-point calculations, so that a single source code serves both the CPU and GPU implementations. The results of the PPCG solver for Γ-point calculations and of the GPU version of CG have been carefully validated, and the performance of the code on several GPU systems has been compared with that on Intel multi-core (CPU-only) systems. Both choices reduce the time to solution by a considerable factor for the input cases used as standard benchmarks of the QE package.

File | Type | License | Size | Format
---|---|---|---|---
Anoop Kaithalikunnel Chandran.pdf (open access) | Thesis | Not specified | 1.68 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.