
PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials

Published online by Cambridge University Press:  14 September 2015

Dhairya Malhotra*
Affiliation:
The University of Texas at Austin, Austin, TX 78712
George Biros
Affiliation:
The University of Texas at Austin, Austin, TX 78712
*Corresponding author. Email addresses: [email protected] (D. Malhotra), [email protected] (G. Biros)

Abstract

We describe our implementation of a parallel fast multipole method (FMM) for evaluating potentials for discrete and continuous source distributions. The first requires summation over the source points, and the second requires integration over a continuous source density. Both problems have O(N^2) complexity when computed directly but can be accelerated to O(N) time using FMM. In our PVFMM software library, we use the kernel-independent FMM, which allows us to compute potentials for a wide range of elliptic kernels. Our method is high order, adaptive and scalable. In this paper, we discuss several algorithmic improvements and performance optimizations, including cache locality, vectorization, shared-memory parallelism and the use of coprocessors. Our distributed-memory implementation uses a space-filling curve for partitioning data and a hypercube communication scheme. We present convergence results for Laplace, Stokes and Helmholtz (low wavenumber) kernels for both particle and volume FMM. We measure the efficiency of our method in terms of CPU cycles per unknown for different accuracies and different kernels. We also demonstrate the scalability of our implementation up to several thousand processor cores on the Stampede platform at the Texas Advanced Computing Center.
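For orientation, the direct O(N^2) particle summation that the abstract names as the baseline can be sketched as follows. This is a minimal illustration and not code from PVFMM: the function name `direct_laplace_potential` is hypothetical, and the free-space 3D Laplace kernel 1/(4*pi*r) is assumed as one example of the elliptic kernels mentioned.

```python
import math


def direct_laplace_potential(points, charges):
    """Direct evaluation of u_i = sum_{j != i} q_j / (4*pi*|x_i - x_j|).

    Hypothetical helper for illustration only (not part of PVFMM).
    Assumes the free-space 3D Laplace kernel and distinct points.
    """
    n = len(points)
    potentials = [0.0] * n
    for i in range(n):
        xi, yi, zi = points[i]
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            xj, yj, zj = points[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            total += charges[j] / (4.0 * math.pi * r)
        potentials[i] = total
    return potentials
```

The nested loop visits every source for every target, which is where the quadratic cost comes from. An FMM replaces the far-field part of this double loop with hierarchical approximations to reach linear cost.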

Type
Computational Software
Copyright
Copyright © Global-Science Press 2015 

