3D Fourier reconstruction is used to reconstruct volumes from data obtained by cryo-electron microscopy, and it is a significant computational bottleneck in the processing pipeline. We have introduced a novel GPU-friendly algorithm that, compared to state-of-the-art implementations, improves cache locality and removes race conditions when writing into the 3D volume in parallel. The algorithm has been auto-tuned with the Kernel Tuning Toolkit (https://github.com/Fillo7/KTT).
We have integrated the algorithm into the widely used software Xmipp, version 3.19. Running on a GPU with comparable power consumption, it reaches an 11.4× speedup over the original parallel CPU implementation. Moreover, we have reached a 31.7× speedup using four GPUs, and a 2.14×–5.96× speedup over an optimized GPU implementation based on state-of-the-art algorithms. The paper is available at https://journals.sagepub.com/doi/abs/10.1177/1094342019832958?journalCode=hpcc
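To illustrate why parallel writes into a shared volume can race, and one common way to avoid it, here is a minimal 1D sketch in Python. It is not taken from the Xmipp implementation: the function names, the linear interpolation weight, and the `radius` parameter are assumptions chosen for illustration. In the scatter form, each sample adds to several voxels, so parallel workers handling different samples may update the same voxel at once. In the gather form, each voxel is written by exactly one worker, which only reads the samples.

```python
import math

def scatter(samples, size, radius=1.0):
    """Each sample adds its weighted value to nearby voxels.
    If the outer loop ran in parallel, two samples could update
    the same voxel concurrently (a write race without atomics)."""
    volume = [0.0] * size
    for pos, val in samples:
        lo = max(0, math.ceil(pos - radius))
        hi = min(size - 1, math.floor(pos + radius))
        for i in range(lo, hi + 1):
            volume[i] += val * (1.0 - abs(i - pos) / radius)
    return volume

def gather(samples, size, radius=1.0):
    """Each voxel sums the contributions of nearby samples.
    If the outer loop ran in parallel, every voxel would have a
    single writer, so no two workers write to the same location."""
    volume = [0.0] * size
    for i in range(size):
        acc = 0.0
        for pos, val in samples:
            d = abs(i - pos)
            if d <= radius:
                acc += val * (1.0 - d / radius)
        volume[i] = acc
    return volume

# Hypothetical input: (position, value) pairs at fractional positions.
samples = [(1.25, 4.0), (2.5, 2.0), (2.75, 8.0)]
scattered = scatter(samples, size=5)
gathered = gather(samples, size=5)
```

Both functions produce the same volume; they differ only in which loop is the outer one, and therefore in which memory locations parallel workers would share.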
The fast Fourier transform (FFT) is often one of the most computationally demanding kernels in scientific applications. Although a lot of attention has been invested into tuning its performance on various hardware devices, FFT libraries usually have many possible settings, and it is not always easy to deduce which settings should be used for optimal performance. In practice, we can often slightly modify the FFT settings; for example, we can pad or crop the input data.
We have developed a new tool, cuFFTAdvisor (https://github.com/DStrelak/cuFFTAdvisor), which proposes configurations of the cuFFT library (a popular library for computing FFT on GPUs) and uses auto-tuning to find the best one for given constraints on input size and plan settings. We experimentally show that our tool can propose different settings of the transformation, resulting in an average 6× speedup using fast heuristics and a 6.9× speedup using auto-tuning. The paper is available here: https://dl.acm.org/citation.cfm?id=3295817
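As a rough illustration of the padding and cropping idea, the sketch below searches for the nearest sizes whose only prime factors are 2, 3, 5 and 7, which are the sizes the cuFFT documentation describes as best optimized. This is not cuFFTAdvisor's algorithm or interface: the function names are hypothetical, and real performance also depends on batch size, precision, dimensionality, and the GPU, which is why the tool combines heuristics with auto-tuning.

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """Return True if n >= 1 has no prime factor outside `primes`."""
    if n < 1:
        raise ValueError("size must be a positive integer")
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def nearest_smooth_sizes(n):
    """Return (cropped, padded): the largest smooth size <= n and
    the smallest smooth size >= n. Size 1 is smooth, so the
    downward search always terminates."""
    if n < 1:
        raise ValueError("size must be a positive integer")
    padded = n
    while not is_smooth(padded):
        padded += 1
    cropped = n
    while not is_smooth(cropped):
        cropped -= 1
    return cropped, padded

# 211 is prime; 210 = 2*3*5*7 and 216 = 2^3 * 3^3 are both smooth.
candidates = nearest_smooth_sizes(211)
```

Whether cropping to the smaller size or padding to the larger one is acceptable depends on the application, so a candidate list like this would still need to be filtered by the caller's constraints and then benchmarked.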