Optimizing BIT1, a Particle-in-Cell Monte Carlo Code, with OpenMP/OpenACC and GPU Acceleration
Authors: Jeremy J. Williams, Felix Liu, David Tskhakaya, Stefan Costea, Ales Podolnik, Stefano Markidis
Summary: On the path toward developing the first fusion energy devices, plasma simulations have become indispensable tools for supporting the design and development of fusion machines. Among these critical simulation tools, BIT1 is an advanced Particle-in-Cell code with Monte Carlo collisions, specifically designed for modeling plasma-material interaction and, in particular, analyzing the power load distribution on tokamak divertors. The current implementation of BIT1 relies exclusively on MPI for parallel communication and lacks support for GPUs. In this work, we address these limitations by designing and implementing a hybrid, shared-memory version of BIT1 capable of utilizing GPUs. For shared-memory parallelization, we rely on OpenMP and OpenACC, using a task-based approach to mitigate load-imbalance issues in the particle mover. On an HPE Cray EX computing node, we observe an initial performance improvement of approximately 42%, with scalable performance showing an enhancement of about 38% when using 8 MPI ranks. Still relying on OpenMP and OpenACC, we introduce the first version of BIT1 capable of using GPUs. We investigate two different data movement strategies: unified memory and explicit data movement. Overall, we report BIT1 data transfer findings during each PIC cycle. Among BIT1 GPU implementations, we demonstrate performance improvement through concurrent GPU utilization, especially when MPI ranks are assigned to dedicated GPUs. Finally, we analyze the performance of the first BIT1 GPU porting with the NVIDIA Nsight tools to further our understanding of BIT1 computational efficiency for large-scale plasma simulations, capable of exploiting current supercomputer infrastructure.