Next Generation FFT Algorithms: Parallel Implementations and Applications at SIAM PP26
The 2026 SIAM Conference on Parallel Processing for Scientific Computing (PP26) took place March 3–6, 2026, in Berlin, with registration and sessions held at the Zuse Institute Berlin and the Free University of Berlin. The 21st edition of the conference brought together researchers working on high‑performance computing (HPC), scalable algorithms, and parallel scientific software. The program emphasized the intersection of applied mathematics and computational science, making it a natural venue for a minisymposium on the next generation of Fast Fourier Transform (FFT) algorithms.
Daisuke Takahashi (University of Tsukuba) and I (King Abdullah University of Science and Technology) organized a minisymposium titled “Next Generation FFT Algorithms in Theory and Practice: Parallel Implementations and Applications.” The minisymposium explored how current parallel software implementations of the FFT do not always exploit modern hardware efficiently, even though FFTs underpin countless applications. Comprising four invited presentations, the session bridged the gap between algorithm development and practical implementation by fostering dialogue between researchers working on alternative FFT algorithms and those focused on implementations for parallel hardware. Number‑theoretic transforms (NTTs), discrete transforms closely related to FFTs, were also included in the scope of the session. The talks illustrated how algorithmic innovations, automatic tuning, and hybrid quantum-classical approaches could leverage modern HPC platforms.
Readers may recall a SIAM News Online article following the 2018 SIAM Conference on Parallel Processing for Scientific Computing that provided an earlier overview of the implementations and applications of state-of-the-art FFT algorithms [2]. This article expands on those reflections by focusing on the next generation of FFT and NTT algorithms and how they were presented at PP26. The following summarizes each presentation.
In the opening talk, Takahashi discussed a graphics processing unit (GPU)‑accelerated implementation of parallel NTTs that incorporates an automatic tuning facility to select optimal parameters when running NTTs on GPU clusters (e.g., the amount of overlap between computation and communication or the choice of radices). Because parallel NTTs require intensive all‑to‑all communication, the tuning facility is crucial for balancing computation and communication. Takahashi’s talk presented performance results that demonstrated the benefits of automatic tuning and discussed how these techniques could be generalized to other FFT‑like algorithms.
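For readers less familiar with NTTs, the following is a minimal single-process sketch of a radix-2 transform in plain Python. It is not Takahashi's GPU implementation and contains no parallelism or automatic tuning; it only illustrates the butterfly structure that FFTs and NTTs share, with exact modular arithmetic in place of complex floating-point arithmetic. The modulus 998244353 (equal to 119 · 2^23 + 1, with primitive root 3) and the function names `ntt` and `cyclic_convolution` are illustrative choices, not details from the talk.

```python
MOD = 998244353       # prime of the form c * 2^k + 1 (here 119 * 2^23 + 1)
PRIMITIVE_ROOT = 3    # a primitive root modulo MOD


def ntt(values, invert=False):
    """Return the radix-2 NTT (or inverse NTT) of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [v % MOD for v in values]

    # Bit-reversal permutation so that the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    length = 2
    while length <= n:
        # w_len is a primitive length-th root of unity modulo MOD.
        w_len = pow(PRIMITIVE_ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2

    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def cyclic_convolution(x, y):
    """Exact cyclic convolution modulo MOD via the convolution theorem."""
    fx, fy = ntt(x), ntt(y)
    return ntt([p * q % MOD for p, q in zip(fx, fy)], invert=True)
```

As a quick check, `cyclic_convolution([1, 2, 3, 4], [5, 6, 7, 8])` returns `[66, 68, 66, 60]`, which matches the directly computed cyclic convolution. In a distributed setting, the data exchange between butterfly stages is where the all-to-all communication mentioned above arises, and the radix (fixed at 2 here) is one of the parameters a tuner could vary.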
I continued the minisymposium with a second talk that introduced a hybrid quantum-classical method for solving nonlinear differential equations. The approach discretized the differential equation using Chebyshev spectral collocation, converted the resulting algebraic system into a quadratic unconstrained binary optimization (QUBO) problem, and solved it on a quantum annealer. By coupling quantum speedups for the core optimization with classical control over nonlinearities, the method exhibited promising scalability and accuracy. When I tested this approach experimentally, the hybrid method more closely approximated the true solution than a classical solver (see Figure 1) [1], suggesting that annealer‑based QUBO formulations provide more stable convergence for certain classes of nonlinear ordinary differential equations. The remainder of the talk consisted of detailed test problems and a discussion of the advantages of the hybrid strategy over purely classical methods. Such insights help explain why hybrid quantum-classical algorithms are garnering interest within the FFT community.
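To make the step from an algebraic system to a QUBO more concrete, here is a small sketch of one common construction, which is not necessarily the formulation used in [1]. It assumes a linear system A x = b (a nonlinear collocation system would first have to be linearized or handled iteratively, which is not shown), encodes each unknown in fixed point as x_j = lo + step · Σ_k 2^k q_{j,k} with binary q, and expands the squared residual into a quadratic form in q. The helper names, the encoding parameters `lo`, `step`, and `bits`, and the brute-force search that stands in for the annealer are all hypothetical.

```python
from itertools import product


def least_squares_qubo(A, b):
    """Return upper-triangular Q and constant c with ||A q - b||^2 = q^T Q q + c
    for every binary vector q."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            gram = sum(A[r][i] * A[r][j] for r in range(m))
            if i == j:
                # q_i^2 == q_i for binary q, so the linear term folds into the diagonal.
                Q[i][i] = gram - 2.0 * sum(A[r][i] * b[r] for r in range(m))
            else:
                Q[i][j] = 2.0 * gram
    const = sum(v * v for v in b)
    return Q, const


def qubo_energy(Q, q):
    """Evaluate q^T Q q for an upper-triangular Q."""
    n = len(q)
    return sum(Q[i][j] * q[i] * q[j] for i in range(n) for j in range(i, n))


def brute_force_minimum(Q):
    """Exhaustive search over all bit strings; a stand-in for the annealer
    that is only feasible for tiny problems."""
    return min(product((0, 1), repeat=len(Q)), key=lambda q: qubo_energy(Q, q))


def encode_fixed_point(A, b, lo, step, bits):
    """Substitute x_j = lo + step * sum_k 2^k q_{j,k} into A x = b."""
    m, n = len(A), len(A[0])
    A_bin = [[A[r][j] * step * (1 << k) for j in range(n) for k in range(bits)]
             for r in range(m)]
    b_bin = [b[r] - lo * sum(A[r]) for r in range(m)]
    return A_bin, b_bin


def decode_fixed_point(q, lo, step, bits):
    """Map a bit string back to the real-valued unknowns."""
    n = len(q) // bits
    return [lo + step * sum((1 << k) * q[j * bits + k] for k in range(bits))
            for j in range(n)]
```

For example, with `A = [[2.0, 1.0], [1.0, 3.0]]`, `b = [4.0, 7.0]`, `lo = 0.0`, `step = 0.5`, and `bits = 3`, the minimizing bit string decodes to `[1.0, 2.0]`, the exact solution of the system. The number of binary variables grows with both the number of unknowns and the bits per unknown, which is one reason the encoding resolution matters in practice.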
The third talk, presented by Michael Sorochan Armstrong and José Camacho (Universidad de Granada), focused on parallelizing direct, interpolative non‑uniform FFTs (NFFTs) to impute missing values in periodic signals. Interpolative inverse FFTs can reconstruct gaps in time‑series data, but when sampling is irregular and damping coefficients vary, the transformation matrix lacks the Vandermonde properties that facilitate efficient inversion. The authors formulated the problem as a least‑squares optimization with weighted norms and derived a direct solution via a lower-upper (LU) decomposition, exploiting GPU‑based tensor operations in PyTorch to accelerate the computationally intensive matrix multiplications. Their results demonstrated substantial reductions in execution time compared to central processing unit (CPU)‑based implementations, indicating that GPUs could dramatically speed up NFFTs used for imputation.
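The idea of imputing a missing value through a weighted least-squares fit solved by LU decomposition can be sketched in a few lines of plain Python. This is not the authors' PyTorch code: it runs on the CPU, omits damping entirely, assumes a band-limited periodic signal with period 1, and uses hypothetical helper names. It builds a non-uniform Fourier matrix F at the irregular sample times, forms the weighted normal equations (Fᴴ W F) c = Fᴴ W y, and solves them by LU factorization with partial pivoting.

```python
import cmath


def fourier_matrix(times, max_freq):
    """Rows are samples; columns are exp(2*pi*i*k*t) for k = -max_freq..max_freq."""
    return [[cmath.exp(2j * cmath.pi * k * t)
             for k in range(-max_freq, max_freq + 1)] for t in times]


def lu_solve(M, rhs):
    """Solve M x = rhs by LU factorization with partial pivoting (on copies)."""
    n = len(M)
    lu = [row[:] for row in M]
    x = rhs[:]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if abs(lu[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        lu[col], lu[pivot] = lu[pivot], lu[col]
        x[col], x[pivot] = x[pivot], x[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]           # multiplier, an entry of L
            for c in range(col + 1, n):
                lu[r][c] -= lu[r][col] * lu[col][c]
            x[r] -= lu[r][col] * x[col]          # forward substitution folded in
    for r in range(n - 1, -1, -1):               # back substitution with U
        x[r] = (x[r] - sum(lu[r][c] * x[c] for c in range(r + 1, n))) / lu[r][r]
    return x


def weighted_fit(times, values, weights, max_freq):
    """Solve the weighted normal equations (F^H W F) c = F^H W y."""
    F = fourier_matrix(times, max_freq)
    n = 2 * max_freq + 1
    normal = [[sum(w * row[i].conjugate() * row[j] for w, row in zip(weights, F))
               for j in range(n)] for i in range(n)]
    rhs = [sum(w * row[i].conjugate() * y for w, row, y in zip(weights, F, values))
           for i in range(n)]
    return lu_solve(normal, rhs)


def evaluate(coeffs, t, max_freq):
    """Evaluate the fitted trigonometric polynomial at time t."""
    return sum(c * cmath.exp(2j * cmath.pi * k * t)
               for c, k in zip(coeffs, range(-max_freq, max_freq + 1))).real
```

Given irregular samples of a signal such as 1 + 2 cos(2πt) + 0.5 sin(4πt), calling `weighted_fit` with `max_freq=2` and then `evaluate` at an unsampled time recovers the missing value up to rounding error, because the signal lies in the span of the chosen frequencies. Forming the normal matrix is the dense matrix-multiplication step that, in a tensor framework, would be a natural candidate for GPU acceleration.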
In the final presentation, Flavia Gehrig and Matti Schneider (University of Duisburg-Essen) proposed a novel computational homogenization approach that combined FFT‑based solvers with extended finite element methods (X‑FEM). Traditional FFT‑based homogenization methods operate on regular grids and thus avoid mesh generation, but they typically converge only linearly with respect to mesh spacing, which limits accuracy. To overcome this limitation, Gehrig and Schneider integrated an X‑FEM discretization with a mesh‑independent FFT‑based solver, achieving quadratic convergence of effective material properties. The approach retained the computational efficiency of FFT methods while improving accuracy, offering a promising direction for simulations of heterogeneous materials in engineering.
The Next Generation FFT Algorithms minisymposium at PP26 showcased how innovations in algorithms, hardware‑aware tuning, and hybrid quantum techniques are shaping the future of FFTs. By addressing challenges ranging from automatic tuning of NTTs on GPU clusters to quantum annealing solutions for nonlinear differential equations, the session exemplified the conference’s emphasis on high‑performance and scalable computing. The inclusion of non‑uniform transforms for data imputation and improved homogenization methods highlighted the breadth of FFT applications in signal processing and materials science. Attendees engaged in lively discussions on algorithm design, hardware optimization, and emerging quantum technologies, reflecting the dynamic and interdisciplinary spirit of the conference as a whole. The session also built on earlier efforts to promote reproducibility in benchmarking parallel FFT‑based applications [3], underlining the importance of repeatable experiments and transparent reporting in computational science.
References
[1] Aseeri, S.A. (2025). A Hybrid Quantum–Classical Spectral Solver for Nonlinear Differential Equations. Algorithms, 18(11), 678.
[2] Aseeri, S.A. (2018). State‑of‑the‑art FFT: Algorithms, Implementations, and Applications. SIAM News Online. Retrieved from https://www.siam.org/publications/siam-news/articles/state-of-the-art-fft-algorithms-implementations-and-applications/.
[3] Aseeri, S.A., Muite, B.K., & Takahashi, D. (2019). Reproducibility in Benchmarking Parallel Fast Fourier Transform Based Applications. In Companion of the Tenth ACM/SPEC International Conference on Performance Engineering (ICPE ’19). Mumbai, India.
About the Author
Samar Aseeri
Computational scientist, King Abdullah University of Science and Technology
Samar Aseeri is a computational scientist at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. She earned her undergraduate, graduate, and doctoral degrees in applied mathematics from Umm Al-Qura University in Saudi Arabia, and completed supercomputing training from IBM in New York. Aseeri is currently leading two initiatives to establish high-performance computing communities: Benchmarking in the Data Center and FFT in the Exascale Era.
