Optimizing BIT1, a Particle-in-Cell Monte Carlo Code, with OpenMP/OpenACC and GPU Acceleration
Authors: Jeremy J. Williams, Felix Liu, David Tskhakaya, Stefan Costea, Ales Podolnik, Stefano Markidis
Abstract: On the path toward developing the first fusion energy devices, plasma simulations have become indispensable tools for supporting the design and development of fusion machines. Among these critical simulation tools, BIT1 is an advanced Particle-in-Cell code with Monte Carlo collisions, specifically designed for modeling plasma-material interaction and, in particular, analyzing the power load distribution on tokamak divertors. The current implementation of BIT1 relies exclusively on MPI for parallel communication and lacks support for GPUs. In this work, we address these limitations by designing and implementing a hybrid, shared-memory version of BIT1 capable of utilizing GPUs. For shared-memory parallelization, we rely on OpenMP and OpenACC, using a task-based approach to mitigate load-imbalance issues in the particle mover. On an HPE Cray EX computing node, we observe an initial performance improvement of approximately 42%, with scalable performance showing an enhancement of about 38% when using 8 MPI ranks. Still relying on OpenMP and OpenACC, we introduce the first version of BIT1 capable of using GPUs. We investigate two different data movement strategies: unified memory and explicit data movement. Overall, we report BIT1 data transfer findings during each PIC cycle. Among BIT1 GPU implementations, we demonstrate performance improvement through concurrent GPU utilization, especially when MPI ranks are assigned to dedicated GPUs. Finally, we analyze the performance of the first BIT1 GPU porting with the NVIDIA Nsight tools to further our understanding of BIT1 computational efficiency for large-scale plasma simulations, capable of exploiting current supercomputer infrastructures.
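To make the task-based idea concrete, the following is a minimal sketch in C with OpenMP, not BIT1's actual code. It assumes a hypothetical structure-of-arrays `Species` type and a simple free-streaming position update; the names `Species`, `push_chunk`, `move_particles`, and the `chunk` parameter are illustrative assumptions. Each species is cut into fixed-size chunks and every chunk becomes one task, so species with very different particle counts do not leave threads idle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays particle storage.
   Names are illustrative and are not taken from BIT1. */
typedef struct {
    double *x;  /* particle positions  */
    double *v;  /* particle velocities */
    size_t  n;  /* number of particles */
} Species;

/* Advance the positions of particles in [begin, end) by one step dt.
   Assumption: a plain free-streaming update stands in for the real mover. */
static void push_chunk(Species *s, size_t begin, size_t end, double dt)
{
    for (size_t i = begin; i < end; ++i)
        s->x[i] += s->v[i] * dt;
}

/* Task-based mover: one thread creates a task per chunk of particles,
   and all threads in the team execute the tasks as they become free.
   Without OpenMP enabled, the pragmas are ignored and the loop runs serially. */
void move_particles(Species *species, size_t nspecies, size_t chunk, double dt)
{
    assert(species != NULL || nspecies == 0);
    if (chunk == 0)
        chunk = 1;  /* guard against an endless loop below */

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t k = 0; k < nspecies; ++k) {
            Species *s = &species[k];
            for (size_t b = 0; b < s->n; b += chunk) {
                size_t e = (b + chunk < s->n) ? b + chunk : s->n;
                #pragma omp task firstprivate(s, b, e, dt)
                push_chunk(s, b, e, dt);
            }
        }
    }  /* the barrier closing the region guarantees all tasks have finished */
}
```

Calling `move_particles(sp, nspecies, chunk, dt)` once per PIC cycle would advance every species. The chunk size trades scheduling overhead against balance: smaller chunks even out the load at the cost of more tasks.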