In this post, I will analyze the CUDA implementation of the N-Body simulation. The implementation that I will be using as a reference for this article is provided with the CUDA GPU Computing SDK 10.2. The source code for this implementation is available in the “%NVCUDASAMPLES_ROOT%\5_Simulations\nbody” in the GPU Computing SDK 10.2 samples base folder.
I assume the reader has a good understanding of the CUDA programming API.
In this article I will provide a brief introduction to OpenCL. OpenCL is a open standard for general purpose parallel programming across CPUs, GPUs, and other programmable parallel devices. I assume that the reader is familiar with the C/C++ programming languages. I will use Microsoft Visual Studio 2008 to show how you can setup a project that is compiled with the OpenCL API.
In a previous article, I gave an introduction to programming with CUDA. Now I’d like to go into a little bit more depth about the CUDA thread execution model and the architecture of a CUDA enabled GPU. I assume that the reader has basic knowledge about CUDA and already knows how to setup a project that uses the CUDA runtime API. If you don’t know how to setup a project with CUDA, you can refer to my previous article: Introduction to CUDA.