Performance

Boosting NumPy with MKL

March 3, 2016 · January 18, 2023 · Heiko Bauke

The Intel Math Kernel Library (MKL) contains a collection of highly optimized numerical functions. Among others, it provides implementations of BLAS and LAPACK functions for various linear algebra problems. A program that is dynamically linked against the standard BLAS and LAPACK libraries can easily benefit from alternative optimized implementations by replacing libblas.so and liblapack.so…
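Whether a different BLAS implementation pays off can be checked roughly from within NumPy itself. The following is a minimal sketch, not the procedure from the post: the helper name `time_matmul` and the matrix size are assumptions chosen for illustration. Note that `np.show_config()` reports the libraries NumPy was built against, which may differ from what the dynamic linker resolves at run time.

```python
import time

import numpy as np


def time_matmul(n=500, repeats=3):
    """Return the best wall-clock time in seconds for an n x n matrix product.

    Hypothetical helper for illustration; not part of NumPy or MKL.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # handled by whichever BLAS matrix-product routine NumPy uses
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    np.show_config()  # prints the BLAS/LAPACK build configuration
    print(f"best of 3: {time_matmul():.4f} s")
```

Running the script once per BLAS setup and comparing the reported times gives a first impression of the speedup; a serious benchmark would vary the matrix size and control the number of threads.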

CUDA-aware MPI

April 30, 2014 · January 18, 2023 · Heiko Bauke

CUDA and MPI provide two different APIs for parallel programming that target very different parallel architectures. While CUDA makes parallel graphics hardware usable for general-purpose computing, MPI is usually employed to write parallel programs that run on large SMP systems or on cluster computers. In order to improve a cluster’s overall computational capabilities…

GPU computing · Performance

GPU FFT performance

February 23, 2011 · August 27, 2015 · Heiko Bauke · 1 Comment

In a recent paper (see arXiv:1012.3911) we showed how to solve the time-dependent Schrödinger equation and the time-dependent Dirac equation by a Fourier split operator method on GPU hardware. For the Fourier split operator method, one has to compute a fast Fourier transform (FFT) in each time step, and the FFT dominates the overall computing…
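To illustrate why the FFT sits at the core of each time step, here is a minimal CPU sketch of a Fourier split operator step for the one-dimensional Schrödinger equation, written with NumPy. It is not the GPU implementation from the paper: the function name `split_operator_step`, the harmonic potential, the grid parameters, and the units (ħ = m = 1) are all assumptions made for this example.

```python
import numpy as np


def split_operator_step(psi, potential, k, dt):
    """Advance psi by one time step dt (assumed units: hbar = m = 1)."""
    half_potential = np.exp(-0.5j * dt * potential)
    psi = half_potential * psi                      # half step in position space
    psi_k = np.fft.fft(psi)                         # to momentum space
    psi_k = np.exp(-0.5j * dt * k**2) * psi_k       # full kinetic step, exp(-i dt k^2 / 2)
    psi = np.fft.ifft(psi_k)                        # back to position space
    return half_potential * psi                     # second half step


# Illustrative setup: displaced Gaussian in a harmonic potential.
n, length = 256, 40.0
dx = length / n
x = np.linspace(-length / 2, length / 2, n, endpoint=False)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
potential = 0.5 * x**2

psi = np.exp(-((x - 1.0) ** 2) / 2.0).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)       # normalize to 1

for _ in range(100):
    psi = split_operator_step(psi, potential, k, dt=0.01)

norm = np.sum(np.abs(psi) ** 2) * dx                # stays close to 1
```

Apart from two element-wise multiplications, each step consists of a forward and an inverse FFT, which is why the transform dominates the cost for large grids.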

Performance · Software

Parallel FFT performance

March 6, 2010 · January 18, 2023 · Heiko Bauke · 12 Comments

In a project I am currently working on (the numerical solution of the Dirac equation), the computation of the fast Fourier transform (FFT) of four interwoven two-dimensional grids is the main computational task. Interwoven grids means that the memory layout of the matrices is such that the data starts with the 1st element of the 1st…
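One way to picture interwoven grids is a three-dimensional array whose last, fastest-varying axis selects the grid. The sketch below assumes exactly that layout, which the truncated excerpt does not confirm; the grid sizes and variable names are likewise illustrative. It transforms all four grids with NumPy without first copying them apart.

```python
import numpy as np

# Assumed layout: element (iy, ix) of grid g is stored at data[iy, ix, g],
# so the four grids alternate element by element in memory.
ny, nx, ngrids = 8, 16, 4
rng = np.random.default_rng(0)
data = rng.standard_normal((ny, nx, ngrids)) + 1j * rng.standard_normal((ny, nx, ngrids))

# Each grid is a strided view into the shared buffer, not a separate array.
grid0_view = data[:, :, 0]

# Two-dimensional FFT over the spatial axes of all four grids at once.
spectra = np.fft.fft2(data, axes=(0, 1))

# Cross-check against transforming a contiguous copy of the first grid.
reference = np.fft.fft2(np.ascontiguousarray(grid0_view))
```

The strided view shows why such transforms are harder on the memory system than transforms of contiguous grids: consecutive elements of one grid are `ngrids` elements apart in memory.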

Number Crunch

A computational science blog.

Category: Performance
