Enhancing the performance of checkerboard Thomas method by using shared memory for solving the heat transfer problems on GPU

Authors

Ph.D. Student, Mechanical Engineering, University of Birjand, Birjand, Iran.

Abstract

General-purpose computing on graphics processing units (GPGPU) allows the user to utilize the GPU for general computing purposes. Using these processors can yield a significant speedup in numerical calculations for solving CFD problems. Several studies have investigated the advantages of GPGPU in numerical calculations, including the solution of tridiagonal systems of equations. In 2016, the checkerboard method was introduced for solving tridiagonal systems of equations in ADI solvers. In this method, each system of equations is divided into several smaller independent systems, each of which is then solved by the Thomas algorithm in a checkerboard pattern. In addition to involving many threads, this method makes it possible to store the data of each system in shared memory. In the present research, based on considerations regarding the use of shared memory, a strategy for using this memory in the checkerboard Thomas method is proposed. The results show that using shared memory yields a speedup of 1.2x to 1.6x compared with using global memory. It was also found that bank conflicts reduce the speed of the checkerboard Thomas method by 10.9% to 18.8%.
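
For orientation, the building block named in the abstract, the Thomas algorithm, can be sketched as follows. This is a minimal serial Python sketch of the standard textbook algorithm only; it is not the checkerboard variant and not GPU code. The function name `thomas_solve` and its argument layout are assumptions made for illustration, and the matrix is assumed to be diagonally dominant because no pivoting is performed.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the standard serial Thomas algorithm.

    Assumed layout (illustrative, not taken from the paper):
      lower[i] multiplies x[i-1] (lower[0] is unused),
      diag[i]  multiplies x[i],
      upper[i] multiplies x[i+1] (upper[n-1] is unused).
    No pivoting is done, so a diagonally dominant matrix is assumed.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward elimination: remove the sub-diagonal row by row.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Small diagonally dominant system, similar in shape to one implicit
    # 1D heat-conduction step; the right-hand side is built from a known x.
    lower = [0.0, -1.0, -1.0, -1.0]
    diag = [4.0, 4.0, 4.0, 4.0]
    upper = [-1.0, -1.0, -1.0, 0.0]
    rhs = [2.0, 4.0, 6.0, 13.0]
    print(thomas_solve(lower, diag, upper, rhs))
```

Each row depends on the one before it in both sweeps, so a single system offers little parallelism on its own. That is the motivation for splitting each system into smaller independent ones, as the abstract describes.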


