Performance

Boosting NumPy with MKL

The Intel Math Kernel Library (MKL) contains a collection of highly optimized numerical functions. Among others, it provides implementations of the BLAS and LAPACK functions for various linear algebra problems. A program that is dynamically linked against the standard BLAS and LAPACK libraries can easily benefit from alternative optimized implementations by replacing libblas.so and liblapack.so… Continue reading Boosting NumPy with MKL
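As a rough illustration of the library-substitution idea (a sketch under assumptions, not code from the post): the C program below calls the reference Fortran BLAS routine dgemm_ and is linked only against the generic libblas.so; the matrix size, symbol declaration, and build command are illustrative assumptions. Because the binary depends on the BLAS interface rather than on a particular implementation, pointing libblas.so at an MKL-provided, BLAS-compatible library (for example through the system's alternatives mechanism or LD_PRELOAD) can speed up the same executable without recompiling it.

/* blas_demo.c -- build (assumed): gcc blas_demo.c -o blas_demo -lblas
 * The binary only depends on the BLAS interface; swapping the library
 * that provides libblas.so changes the implementation, not the program. */
#include <stdio.h>

/* Reference Fortran BLAS symbol for double-precision matrix multiplication
 * (classical Linux calling convention, hidden string-length args omitted). */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void)
{
    enum { N = 512 };                 /* illustrative problem size */
    static double a[N * N], b[N * N], c[N * N];
    const double alpha = 1.0, beta = 0.0;
    const int n = N;

    for (int i = 0; i < N * N; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* C = alpha * A * B + beta * C, column-major as in Fortran BLAS. */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("c[0] = %f\n", c[0]);      /* expected: 2.0 * N = 1024.0 */
    return 0;
}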

GPU computing · MPI · parallel computing · Performance

CUDA-aware MPI

CUDA and MPI provide two different APIs for parallel programming that target very different parallel architectures. While CUDA makes it possible to utilize parallel graphics hardware for general-purpose computing, MPI is usually employed to write parallel programs that run on large SMP systems or on cluster computers. In order to improve a cluster’s overall computational capabilities… Continue reading CUDA-aware MPI
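As a rough sketch of what CUDA-awareness means in practice (an assumption-laden example, not code from the post): with a CUDA-aware MPI build, such as Open MPI compiled with CUDA support, a device pointer obtained from cudaMalloc can be handed directly to MPI_Send and MPI_Recv, and the library takes care of moving the data off or onto the GPU. With a plain MPI build, the buffer would first have to be staged through host memory with cudaMemcpy.

/* cuda_aware_mpi_demo.c -- sketch, assuming a CUDA-aware MPI installation;
 * assumed build: mpicc cuda_aware_mpi_demo.c -o demo -lcudart
 * assumed run:   mpirun -np 2 ./demo */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;            /* illustrative buffer size */
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0) {
        /* Fill the device buffer (here simply with zero bytes). */
        cudaMemset(d_buf, 0, n * sizeof(double));
        /* Device pointer passed straight to MPI_Send: a CUDA-aware MPI
         * handles the GPU-side transfer internally. */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d doubles into device memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}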