In this post, I will analyze the CUDA implementation of the N-Body simulation. The reference implementation I will use for this article ships with the CUDA GPU Computing SDK 4.0. Its source code is available in the “C\src\nbody” folder under the GPU Computing SDK 4.0 base folder.
I assume the reader has a good understanding of the CUDA programming API.
In this article I will provide a brief introduction to OpenCL. OpenCL is an open standard for general-purpose parallel programming across CPUs, GPUs, and other programmable parallel devices. I assume that the reader is familiar with the C/C++ programming languages. I will use Microsoft Visual Studio 2008 to show how you can set up a project that is compiled with the OpenCL API.
In a previous article, I gave an introduction to programming with CUDA. Now I’d like to go into a little more depth about the CUDA thread execution model and the architecture of a CUDA-enabled GPU. I assume that the reader has basic knowledge of CUDA and already knows how to set up a project that uses the CUDA runtime API. If you don’t know how to set up a project with CUDA, you can refer to my previous article: Introduction to CUDA.