GPU Skinning of MD5 Models in OpenGL and Cg

Bob with Lamp (GPU Skinning)

This tutorial builds upon the previous article titled [Loading and Animating MD5 Models with OpenGL]. It is highly recommended that you read the previous article before following this one. In this tutorial, I will extend the MD5 model rendering to provide support for GPU skinning. I will also provide an example shader that will perform the vertex skinning in the vertex shader and do per-fragment lighting on the model using a single point light. For a complete discussion on lighting in CgFX, you can refer to my previous article titled [Transformation and Lighting in Cg].


In skeletal animation, vertex skinning is the process of transforming the position and normal of each vertex of a mesh based on the matrices of the animated bones that the vertex is weighted to. Before the advent of the programmable shader pipeline in graphics hardware, it was necessary to compute the position and normal of every vertex of a mesh on the CPU and upload the vertex information to the GPU before the animated model could be rendered correctly. Using a programmable vertex shader, we can upload the vertex information of the mesh to GPU memory once. For subsequent renders, we pass only the transformed bones of the animated skeleton to the GPU and let the vertex program compute the animated vertex positions and normals. The benefit is that instead of sending thousands of vertices to the GPU every frame, only a small fraction of that data needs to be sent to animate the entire model.

In this example, I will use vertex buffer objects (VBOs) to store the model’s vertex information directly in GPU memory and render the animated model using a custom vertex shader and fragment shader written in the Cg shader language.


The demo shown in this article uses several third-party libraries to simplify the development process.

  • The Cg Toolkit (Version 3): The Cg Toolkit provides the tools and API needed to integrate the Cg shader programs in your application.
  • Boost (1.46.1): Boost has some very useful libraries that I use throughout my demo applications. In this demo, I use the Signals, Filesystem, Function, and Bind Boost libraries to provide generic, platform-independent functionality that simplifies some of the features used in this demo.
  • Simple DirectMedia Layer (1.2.14): Simple DirectMedia Layer (SDL) is a cross-platform multimedia library that I use to create the main application window, initialize OpenGL, and handle keyboard, mouse and joystick input.
  • OpenGL Mathematics (GLM): An OpenGL-centric mathematics library for 3D graphics applications.
  • Simple OpenGL Image Library (SOIL): SOIL is a tiny C library used primarily for uploading textures into OpenGL.

Any dependencies used by the demo are included in the source distribution available at the bottom of the article so you can hopefully just unzip, compile and run the included samples.

Skeletal Animation

When an animator creates a skeletal animated character in a modeling package (like 3D Studio Max, Maya, or Blender) the animator must perform a process of weighting each vertex of the mesh to a number of bones that represents the skeleton. This process of weighting vertices to bones is called “rigging”.

Once the model is correctly rigged, the animator will export the model together with the animations that are associated with that model. In some cases the same animation can be applied to multiple models. In order to correctly animate the model with a particular animation, we must be able to transform the animated skeletal structure into a form that makes sense to the model we are trying to animate. To do that, we need a reference pose that represents the mesh in its “identity” pose (the pose of the model if no animation is applied). This “identity” pose is called the “bind” pose.

The bind pose is very important for vertex skinning because we will use the bind pose to transform the animated bone matrices back into a form that makes sense for our model.

Once we have the animated bone matrices, we can apply them to the vertex positions and normals, scaling each bone’s contribution by the weight that bone has been assigned for the vertex. The result is an animated character model with correct vertex positions and normals.
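Written as a formula, the idea looks roughly like the following. The notation is my own shorthand and does not come from the MD5 format: v is the bind-pose vertex position in homogeneous coordinates, j_i is the index of the i-th bone the vertex is weighted to, w_i is the corresponding weight, B is a joint's bind-pose matrix and A is the same joint's animated matrix. The upper limit of four matches the four bones per vertex used in this article.

```latex
\mathbf{v}' \;=\; \sum_{i=1}^{4} w_i \, \mathbf{A}_{j_i} \, \mathbf{B}_{j_i}^{-1} \, \mathbf{v},
\qquad
\sum_{i=1}^{4} w_i = 1
```

The normal is transformed by the same blended matrix, but with its homogeneous coordinate set to zero so that the translation part is ignored.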

Let’s see how we can do this in practice.

The MD5Model Class

At this point you should have thoroughly read the previous article on loading and animating MD5 models, because from here on I will only discuss the differences between that implementation and one that performs the vertex transformations on the GPU.

The Header File

In order to optimize the mesh rendering, it makes sense to store the vertex information in vertex buffer objects (VBOs) and upload the vertex information in the model’s bind pose to the GPU when the model is first loaded. In order to support the VBOs, we need to store VBO IDs for each sub-mesh of the model.

We also need to store two additional streams that will be used to transform the vertex positions on the GPU.

The bone weights will be stored in the m_BoneWeights buffer. Each vertex will store up to four weights for a maximum of four bones that can be weighted to each vertex. Generally four bones is enough to animate the vertices of the mesh and for this demo, the MD5 model we are loading also does not use more than four bones per vertex. The bone weights for each vertex will be packed into a 4-component floating-point vector.

The bone indices will be stored in the m_BoneIndex buffer. Each vertex will store up to four indices for a maximum of four bones per vertex that can be applied to the animated vertex position.

I’ve highlighted additional declarations for the Mesh object.

Since we will be using a CgFX shader to transform our vertex positions, we need to associate an effect with the model, so I added a parameterized constructor that takes a reference to an Effect.

In addition to the constructor, I’ve also added a function to compute the bind pose and the inverse bind pose matrices for every joint in the model from the model’s initial joints.

I also added a few member variables to store the bind pose and inverse bind pose matrices for each joint of the model, plus an array of matrices that will store the animated bone matrices pre-multiplied by the inverse bind pose.

And of course, we need to store the reference to the effect that will be used to render the model.

This model class supports both CPU and GPU vertex skinning so we define a member variable that lets us switch between the two skinning modes.

The MD5Model Class File

When the model is destroyed, we also have to delete all of the vertex buffer objects that were created for the meshes. For simplicity, we’ll create a few helper functions that we can use to create and destroy vertex buffer objects.

And in the model’s destructor, we have to delete the vertex buffer object for all the submeshes of the model.

The MD5Model::LoadModel method has also been slightly modified to build the bind pose and the inverse bind pose matrices for each joint of the model. Since the joint’s bind pose is defined in the “joints” section of the MD5 model file, we can build the bind pose matrices after the joints have been read in.

I’ve highlighted the additional line.

Also, after each mesh has been imported in the MD5Model::LoadModel method, we will call a method to create and populate the vertex buffer objects of each mesh.

Again, I have highlighted the additional line of code.

The MD5Model::BuildBindPose Method

In the MD5Model::BuildBindPose method we will use the model’s “joints” definition to build a bind-pose matrix and an inverse bind-pose matrix for each joint of the model.

To build the bind-pose matrix array of the MD5 model, we simply build a translation matrix and a rotation matrix from the joint’s position and orientation parameters and create the combined matrix of the joint by multiplying these two matrices. The inverse bind-pose matrix is simply the inverse of the bind-pose matrix.

Since the joint’s orientation is stored as a quaternion, we need to convert it to a 4×4 rotation matrix before we can create the compound homogeneous transformation matrix. Luckily, the GLM math library provides a function for doing this conversion.

Then we store these matrices for each joint in the model in the bind-pose and the inverse bind-pose vector containers.

It is not actually necessary to store the bind-pose matrix after we have calculated the inverse bind pose of the joints. Only the inverse bind-pose matrix is needed when updating the animation of the model.
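To make the steps above concrete, here is a minimal sketch of building the two matrix arrays. It is not the MD5Model::BuildBindPose code from the demo: the demo relies on GLM for the quaternion conversion and the inverse, whereas this sketch writes the math out by hand so that it has no dependencies. The Vec3, Quat, Mat4 and Joint types and the helper names are hypothetical stand-ins, and the inverse shown here assumes the joint matrix is a pure rotation plus translation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };   // assumed to be unit length
struct Mat4 { float m[4][4]; };      // row-major storage, column vectors: v' = M * v

// Hypothetical stand-in for a joint read from the "joints" section.
struct Joint { Vec3 pos; Quat orient; };

// Plain 4x4 matrix product a * b.
Mat4 Mul(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Translation * rotation for one joint, written out as a single matrix.
Mat4 JointToMatrix(const Joint& joint)
{
    const Quat& q = joint.orient;
    assert(std::fabs(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z - 1.0f) < 1e-3f);
    Mat4 r = {{
        { 1.0f - 2.0f*(q.y*q.y + q.z*q.z), 2.0f*(q.x*q.y - q.w*q.z),        2.0f*(q.x*q.z + q.w*q.y),        joint.pos.x },
        { 2.0f*(q.x*q.y + q.w*q.z),        1.0f - 2.0f*(q.x*q.x + q.z*q.z), 2.0f*(q.y*q.z - q.w*q.x),        joint.pos.y },
        { 2.0f*(q.x*q.z - q.w*q.y),        2.0f*(q.y*q.z + q.w*q.x),        1.0f - 2.0f*(q.x*q.x + q.y*q.y), joint.pos.z },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    }};
    return r;
}

// Inverse of a rotation + translation matrix: transpose(R) and -transpose(R) * t.
Mat4 InverseRigid(const Mat4& a)
{
    Mat4 r = {};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = a.m[j][i];
        r.m[i][3] = -(a.m[0][i]*a.m[0][3] + a.m[1][i]*a.m[1][3] + a.m[2][i]*a.m[2][3]);
    }
    r.m[3][3] = 1.0f;
    return r;
}

// Fills one bind-pose and one inverse bind-pose matrix per joint.
void BuildBindPose(const std::vector<Joint>& joints,
                   std::vector<Mat4>& bindPose,
                   std::vector<Mat4>& inverseBindPose)
{
    bindPose.clear();
    inverseBindPose.clear();
    for (const Joint& joint : joints) {
        const Mat4 m = JointToMatrix(joint);
        bindPose.push_back(m);
        inverseBindPose.push_back(InverseRigid(m));
    }
}
```

A general 4×4 inverse (such as the one a math library provides) works just as well. The rigid-body shortcut is only used here to keep the sketch short.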

The MD5Model::PrepareMesh Method

There are also a few changes that need to be made to the MD5Model::PrepareMesh method to populate the additional buffers I mentioned earlier (the bone index buffer and the bone weight buffer).

The bone index and bone weight information is extracted from the weight information that is defined for each vertex.

As a precaution, I’ve added an assert to make sure that no vertex has been weighted to more than four bones.
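The packing step can be sketched as follows. This is an illustration only: the Weight and PackedInfluences types and the PackInfluences function are hypothetical names, and storing the indices as floats is an assumption made so that both streams can be handed to the shader as 4-component float vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one weight entry of a vertex:
// which joint it refers to and how strongly that joint influences the vertex.
struct Weight { int jointID; float bias; };

// Per-vertex data for the two extra streams. Unused slots stay at zero,
// so they contribute nothing when the bone matrices are blended.
struct PackedInfluences {
    std::array<float, 4> boneWeights{};
    std::array<float, 4> boneIndices{};
};

PackedInfluences PackInfluences(const std::vector<Weight>& weights)
{
    // Same precaution as in the article: at most four bones per vertex.
    assert(weights.size() <= 4 && "vertex is weighted to more than four bones");

    PackedInfluences out;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        out.boneWeights[i] = weights[i].bias;
        out.boneIndices[i] = static_cast<float>(weights[i].jointID);
    }
    return out;
}
```

Padding with zero weights means a vertex that uses fewer than four bones still produces a valid 4-component entry, and index 0 in the unused slots is harmless because its weight is zero.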

The MD5Model::CreateVertexBuffers Method

Since we will only be manipulating the mesh vertices on the GPU, we can upload the vertex data to vertex buffer objects (VBOs). If you’ve followed my article on terrains [Multi-textured Terrain in OpenGL] then you should be familiar with using vertex buffers.

The MD5Model::Update Method

The MD5Model::Update method also needs to be modified to account for the fact that the animated joints from the animation class are now stored as matrices, so they can be easily sent to the GPU shader program.

The important thing to note is that the vertices stored in the vertex buffers are in the model’s bind pose. Before the animated joints can be applied to the vertices of the mesh, the bind-pose positions and rotations have to be undone to get the vertices into the correct space. This is done by multiplying each animated joint matrix by the inverse bind-pose matrix of that joint.

First, the animated joints are retrieved from the animation class. Then we loop through all the joints and multiply each one by the inverse bind-pose matrix of that joint.
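As a small, dependency-free sketch of that loop: the multiplication order (animated matrix on the left, inverse bind pose on the right) follows the snippet from MD5Model.cpp quoted in the discussion below this article, while the Mat4 type and the Mul, Translation and BuildSkinningMatrices helpers are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Mat4 { float m[4][4]; };   // row-major storage, column vectors: v' = M * v

// Plain 4x4 matrix product a * b.
Mat4 Mul(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Helper used only to build simple example matrices.
Mat4 Translation(float x, float y, float z)
{
    Mat4 r = {{ {1, 0, 0, x}, {0, 1, 0, y}, {0, 0, 1, z}, {0, 0, 0, 1} }};
    return r;
}

// One final bone matrix per joint: animated * inverseBindPose.
// The inverse bind pose takes a bind-pose vertex into the joint's local space,
// and the animated matrix then places it where the joint currently is.
std::vector<Mat4> BuildSkinningMatrices(const std::vector<Mat4>& animatedSkeleton,
                                        const std::vector<Mat4>& inverseBindPose)
{
    assert(animatedSkeleton.size() == inverseBindPose.size());
    std::vector<Mat4> bones(animatedSkeleton.size());
    for (std::size_t i = 0; i < animatedSkeleton.size(); ++i)
        bones[i] = Mul(animatedSkeleton[i], inverseBindPose[i]);
    return bones;
}
```

A quick sanity check for any implementation of this step: when the animated skeleton is exactly the bind pose, every resulting bone matrix should be the identity, so the mesh renders unchanged.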

The MD5Model::PrepareMesh method will transform the vertex positions and normals on the CPU using the animated skeleton. This step is not actually necessary when we are doing the vertex skinning on the GPU, but if I want to render the transformed normals of the model (for debugging) then I still need to do it on the CPU, since I can’t read back the transformed normals from the vertex program on the GPU.

The MD5Model::RenderMesh Method

The MD5Model::RenderMesh method will render the model’s sub-mesh using the OpenGL API. Since we now support GPU skinning, we will render the mesh using either the MD5Model::RenderCPU or the MD5Model::RenderGPU method.

If we are doing vertex skinning on the CPU, we’ll use the RenderCPU method and if we are doing the vertex skinning on the GPU, we’ll use the RenderGPU method.

The MD5Model::RenderCPU method is pretty much identical to the MD5Model::RenderMesh method from the [Loading and Animating MD5 Models with OpenGL] article. I’ve added support for materials in this version, but that’s about it.

Let’s take a look at the RenderGPU method.

The MD5Model::RenderGPU Method

For the MD5Model::RenderGPU method we will use the effect shader framework that is introduced in the article [Introduction to Cg Runtime with OpenGL]. We will also use the vertex buffers that were initialized previously.

The first thing we will do is set up the effect parameters that are used for the shader.

The baseSampler effect parameter takes the texture object ID that defines the texture that is used to map onto the mesh.

The boneMatrix parameter accepts the array of matrices that defines the animated joints of the model that we got from the MD5Animation class in the MD5Model::Update method.

In order to use the effect to render the model, we need to get a reference to the pass that defines the vertex and fragment programs. The pass is accessible via the technique.

Before we can use the parameters in the shader, they have to be committed to the GPU program. This is done by using the EffectManager::UpdateSharedParameters method and the Effect::UpdateParameters method.

The EffectManager::UpdateSharedParameters method will commit the parameters that are shared by all effects that are loaded by the effect manager class. To find out which shared parameters are supported by the EffectManager, please refer to the EffectManager::CreateSharedParameters method in the example source code provided at the end of this article.

The Effect::UpdateParameters method will update all parameters that are unique to the shader effect.

Next we want to bind all of the vertex stream data that will be used to render the model.

And draw the mesh geometry using the shader effect.

The Pass::BeginPass method will bind the vertex and fragment shader programs to the rendering pipeline, and it will also make sure the texture is bound to the correct texture stage.

The Pass::EndPass method will disconnect the vertex and fragment programs from the rendering pipeline so we can once again render geometry using the fixed-function pipeline in OpenGL.

And we also have to make sure we restore other OpenGL states and disconnect the textures and vertex buffer objects.

In addition to the changes to the model class, there are a few changes made to the animation class.

The MD5Animation Class

The MD5Animation class is almost identical to the original implementation as described in the previous article [Loading and Animating MD5 Models with OpenGL]. The only change I made is an additional member that stores each joint of the animated skeleton as a matrix. When the animation frames are interpolated, I compute a resulting matrix for each joint of the animated skeleton.

The Header File

The only addition I made to the MD5Animation.h file is the additional matrix list that stores a 4×4 transformation matrix for each joint of the animated skeleton.

I’ve highlighted the additional lines in the code sample shown.

I also added an access method for the matrices of the animated skeleton.

The MD5Animation::GetSkeletonMatrixList method is used in the MD5Model::Update method shown earlier to get the animated skeleton joints. These matrices are transformed by the inverse of the bind pose to get them in the final space to be applied to the vertices of the mesh for rendering.

The Class File

When the skeletons of the animation are interpolated to compute the final transformation for each joint, I also store the final matrix transformation so the resulting joint matrices can be applied to the vertex shader.
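The sketch below illustrates the idea of producing one matrix per joint while interpolating between two animation frames. It is not the MD5Animation code: all type and function names are hypothetical, and for brevity the orientation is blended with a normalised linear interpolation (nlerp), which is a swapped-in simplification rather than whatever quaternion blend the demo uses through GLM.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };   // assumed to be unit length
struct Mat4 { float m[4][4]; };      // row-major storage, column vectors: v' = M * v

// Linear interpolation of the joint position.
Vec3 Lerp(const Vec3& a, const Vec3& b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// Normalised linear interpolation of the orientation, taking the shorter arc.
Quat Nlerp(const Quat& a, Quat b, float t)
{
    const float dot = a.w*b.w + a.x*b.x + a.y*b.y + a.z*b.z;
    if (dot < 0.0f)
        b = Quat{ -b.w, -b.x, -b.y, -b.z };   // q and -q are the same rotation
    const Quat q = { a.w + (b.w - a.w) * t, a.x + (b.x - a.x) * t,
                     a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
    const float len = std::sqrt(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z);
    assert(len > 0.0f);
    return { q.w / len, q.x / len, q.y / len, q.z / len };
}

// Translation * rotation as a single matrix.
Mat4 ToMatrix(const Vec3& p, const Quat& q)
{
    Mat4 r = {{
        { 1.0f - 2.0f*(q.y*q.y + q.z*q.z), 2.0f*(q.x*q.y - q.w*q.z),        2.0f*(q.x*q.z + q.w*q.y),        p.x },
        { 2.0f*(q.x*q.y + q.w*q.z),        1.0f - 2.0f*(q.x*q.x + q.z*q.z), 2.0f*(q.y*q.z - q.w*q.x),        p.y },
        { 2.0f*(q.x*q.z - q.w*q.y),        2.0f*(q.y*q.z + q.w*q.x),        1.0f - 2.0f*(q.x*q.x + q.y*q.y), p.z },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    }};
    return r;
}

// Blend one joint between two frames and return its final matrix.
Mat4 InterpolateJointMatrix(const Vec3& pos0, const Quat& orient0,
                            const Vec3& pos1, const Quat& orient1, float t)
{
    return ToMatrix(Lerp(pos0, pos1, t), Nlerp(orient0, orient1, t));
}
```

Storing the matrix at interpolation time means the model class can fetch a ready-made list of joint matrices without converting positions and quaternions again.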

Those are the only changes I made to the MD5Animation class file from the previous implementation. The only thing left to show is the effect that is used to render the model.

The Shader Effect File

The shader effect used for this demo combines both the vertex program which will apply the bone matrices to compute the final vertex position and normal of the animated skeleton, and a fragment program which will light the model using a single point light.

I’ve already shown you how to create the stream buffers for the bone indices and bone weights for each vertex in the MD5Model::PrepareMesh method. And I’ve shown how you can connect the shader parameters and bind the streams using vertex buffer objects in the MD5Model::RenderGPU method. In this section I will only show the implementation of the shader program.

Global Variable Definition

The first thing we do in the shader program is define the global variables and structures that are used in the shader program.

First we define a constant, MAX_BONES, to indicate the maximum number of bones that our animated skeleton can have. In this demo it is set to 58, and the model we are using only contains 33 joints, so this is fine. If your models contain more than 58 joints, you will need to increase the MAX_BONES limit, but you should be aware that each profile limits the number of GPU storage locations available for variables.

These global variables are the ones that are being set in the MD5Model::RenderGPU method shown earlier.

At the time of this writing, an explanation of the different profiles could be found on the NVIDIA website.
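To get a feel for how quickly a bone array uses up a profile's storage, here is a small back-of-the-envelope helper. The function name and parameters are hypothetical, and the register counts in the usage note are the figures quoted in the discussion under this article, not values queried from any particular profile.

```cpp
#include <cassert>

// Number of bone matrices that fit into a given number of 4-component
// constant registers. rowsPerBone is 4 for a full 4x4 matrix and 3 for a
// 3x4 matrix; reservedRegisters covers any other uniforms the program needs.
int MaxBones(int constantRegisters, int rowsPerBone, int reservedRegisters = 0)
{
    assert(rowsPerBone > 0);
    assert(reservedRegisters >= 0 && reservedRegisters <= constantRegisters);
    return (constantRegisters - reservedRegisters) / rowsPerBone;
}
```

With 96 registers and 4×4 matrices, MaxBones(96, 4) gives 24 bones, which is less than the 33 joints of the demo model. With 256 registers, MaxBones(256, 4) gives 64, which leaves room for 58 bones only if the remaining uniforms fit into the registers that are left over.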

The Vertex Program

The vertex program will take the incoming vertex positions and vertex normals in object space and transform them into the animated positions and normals.

We also need to compute the clip-space position of the vertex and pass it as an out parameter from the function.

The incoming vertex position is bound to the POSITION semantic and the OpenGL API uses the glVertexPointer method to define how that data is passed to the GPU.

The vertex normal is bound to the NORMAL semantic and the incoming data is passed using the glNormalPointer method in OpenGL.

The three TEXCOORDn semantics are bound to the input streams defined in the application using the glTexCoordPointer function. The correct semantic is determined by the current active texture stage defined by the glClientActiveTexture function in OpenGL.

The two uniform parameters are passed as arguments to the vertex program when the program is compiled (this is done in the definition for the pass shown later).

In the body of the vertex program, the summed matrix transform for the vertex is first computed by multiplying each animated bone matrix by its bone weight and adding the results together. The finalWeight is computed manually to ensure that the sum of the weights adds to one.

Next, the animated position and normal of the vertex are computed by multiplying the incoming vertex position and normal by the summed matrix.

The texture coordinate is simply passed through to the fragment program.

And finally, the clip-space position of the vertex is computed from the WORLDVIEWPROJECTION matrix and the object space vertex position.
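The blending that the vertex program performs can be emulated on the CPU, which is handy for checking the shader's output against known values. The following is a sketch in C++ rather than the Cg source: every name in it is hypothetical, and it assumes that the final weight is derived as one minus the sum of the first three weights, which is one way to guarantee that the weights add to one.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };   // row-major storage, column vectors: v' = M * v

struct SkinnedVertex { Vec3 position; Vec3 normal; };

// Multiplies (v, w) by the matrix and returns the xyz part.
// w = 1 for positions, w = 0 for directions such as normals.
Vec3 Transform(const Mat4& m, const Vec3& v, float w)
{
    return { m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3]*w,
             m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3]*w,
             m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3]*w };
}

SkinnedVertex SkinVertex(const Vec3& position, const Vec3& normal,
                         const std::array<float, 4>& weights,
                         const std::array<int, 4>& indices,
                         const std::vector<Mat4>& boneMatrix)
{
    // Assumption: the stored fourth weight is ignored and recomputed so that
    // the four weights sum to exactly one.
    const float w[4] = { weights[0], weights[1], weights[2],
                         1.0f - (weights[0] + weights[1] + weights[2]) };

    // Weighted sum of the (up to) four bone matrices.
    Mat4 blend = {};
    for (int b = 0; b < 4; ++b) {
        assert(indices[b] >= 0 &&
               static_cast<std::size_t>(indices[b]) < boneMatrix.size());
        const Mat4& bone = boneMatrix[static_cast<std::size_t>(indices[b])];
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                blend.m[i][j] += bone.m[i][j] * w[b];
    }

    // The normal is not renormalised here; that can be left to the fragment stage.
    return { Transform(blend, position, 1.0f), Transform(blend, normal, 0.0f) };
}
```

For example, a vertex at (1, 0, 0) that is weighted half to an identity bone and half to a bone translated by two units along x ends up at (2, 0, 0), while its normal is unaffected by the translation.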

The Fragment Program

The fragment program accepts the output parameters from the vertex program and outputs a single color value that is bound to the COLOR semantic.

The fragment program shown here is identical to the fragment program for the Blinn-Phong lighting model that was shown in the article titled [Transformation and Lighting in Cg]. The only addition here is the texture sampler that is used to define the base color of the fragment.

For a thorough explanation of the lighting model shown here, take a look at my previous article [Transformation and Lighting in Cg].

Technique and Passes

We only define a single technique for this effect and only a single pass for that technique.

The special profile latest is used to indicate that this pass should use the latest vertex and fragment profiles that are supported on the current platform.

If everything goes right, then the final result should be something similar to what is shown below.


In addition to the references that were credited in the original article titled [Loading and Animating MD5 Models with OpenGL], I also used the following book as a reference.

The Cg Tutorial

The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics (2003). Randima Fernando and Mark J. Kilgard. Addison Wesley.

Download the Source

You can download the source code for this demo from the link below.


15 thoughts on “GPU Skinning of MD5 Models in OpenGL and Cg”

    • You didn’t say that we need glut…but in the folder there is glut…:-? do you use it in this project?or you use just sdl?

  1. When i run the source code i get two warnings while compiling the shader:
    Kd conflicts with semantics DIFFUSE
    Ks conflicts with semantics SPECULAR
    And a load of errors on cgfx line 54:
    cannot locate suitable resource to bind parameter “” (times 12 or so)

    so when i run the example and press G the model stays in bind-pose.
    (normals still animate of course, since they are done on the CPU)

    (Ati Radeon HD 4870)

    • Did you install the Cg toolkit from the nVidia site? Are you running directly from Visual Studio?
      Can you send me the exact output from the console when you run the program.

      I tested this demo on 2 PC’s:
      – Laptop with nVidia GeForce GT 330M
      – Desktop with nVidia GeForce 7600 GS (I think)
      But I didn’t have any problems running the demo on either PC.

  2. I have the same problem as Daniel, I have debugged a bit and it goes wrong in the file

    Technique.cpp at
    line m_bIsValid = ( m_cgTechnique != NULL ) && ( cgValidateTechnique(m_cgTechnique) == CG_TRUE );

    It prints out the following error:

    Resources/shaders/C6W5_skin4m.cgfx(54) : error C5041: cannot locate suitable resource for bind paramter “”


    Resources/shaders/C6W5_skin4m.cgfx(55) : error C5041: cannot locate suitable resource for bind paramter “modelViewProj”

    I suspect it is something with the bone array, since when i put the modelViewProj parameter above the bone parameter the modelViewProj error disappears

    Also running on ATI HD 4800 Series.

    I guess i'll start looking on the internet for solutions, or maybe the Cg examples.

  3. Alright. I tried something in the shader, i reduced the amount of bones to about 20, and then it runs. But of course 20 bones is not enough.

    When running with 30 bones i get the error:

    Error c6007 constant register limit exceeded; more then 96 constant registers needed to compile program

  4. I guess i can work with just 20 bones on my computer and when i hand in my homework i will increase it, or does someone have a model with 20 bones and multiple animations? Because i want to do the animation blending

    • You didn’t modify the technique at all? I know older vertex programs support a maximum of 96 4-component floating point constants. That’s only 24 4×4 matrices. The vertex program in the skinning example uses 58 4×4 matrices.

      The example from the Cg tutorial book defines an array of 72 4-component floating point constants and creates a 3×4 matrix for each bone. That limits the skeleton to a maximum of 24 bones. The animated character I was using has 33 joints so that’s a minimum of 33 matrices (or a minimum of 99 4-component floats) so this is already over the vs_1_1 constant register limitation.

      I assume that most people have a graphics adapter with Shader Model 3 (equivalent to DirectX 9) or better, for which the number of constant registers is something like 256 4-component floating point constants.

      I’ve reduced the number of bones in the shader to 32 (#define MAX_BONES 32) and the animation still works fine. Can you try this in your own environment?

      Doing animation blending will not change the number of joints in the model’s skeleton. The animated skeletons need to be blended on the CPU first into whatever pose you want, and then you always pass the same number of bones to the GPU. So it’s the number of bones or joints in your model that determines what the “MAX_BONES” value should be in the shader.

  5. Just wanted to post a general note if you are having problems running the example.

    It seems that CgFX has trouble choosing the latest profile when the “latest” special profile is specified to compile the vertex and fragment programs. If you have trouble running the demo, try changing the vertex profile to “gp4vp” and the fragment profile to “gp4fp” in the “C6E5_skin4m.cgfx” file to see if that fixes the issue.

  6. I did some more research and found out that ATI only supports cg profiles: arbfp1 and arbvp1. You can however compile to glsl using glslv and glslf. 🙂 now it should run on ATI cards 🙂

  7. First of all, great tutorial, I successfully animated my model in my application with it.

    But, I have a problem. From what I understand, this piece of code from MD5Model.cpp :

    // Multiply the animated skeleton joints by the inverse of the bind pose.
    for ( int i = 0; i < m_iNumJoints; ++i )
    m_AnimatedBones[i] = animatedSkeleton[i] * m_InverseBindPose[i];

    is necessary because, in the VBO, the vertices are already positioned in Bind Pose. So we have to multiply by the inverse matrix to "remove" the bind pose from the animation matrix and only have the move from bind pose to the correct position.

    So, my question is, could it be possible to modify the animation data so that we don't have to multiply by the inverse matrix?

    I'm asking because I successfully implemented the algorithm, but as soon as I have more than one character on screen, the framerate drops and according to a profiler, it is because of this matrix multiplication. I tried to find a way to remove it and failed, so I wondered if it was even possible. What do you think?

    • Vincent,

      You could pre-compute (and cache) the bone matrices for each animation frame taking the inverse bind pose into consideration as a pre-process step. This would mean that you would need to store another set of matrices (NumBones * NumAnimationFrames) for each unique model that uses the same animation but has a different rig. So as usual, this optimization is a trade-off of processing power for memory.

      The reason for the multiplication by the inverse bind pose is so that the same animation data can be applied to many different models that have the same skeletal hierarchy but a different shape. So a male model and a female model (whose rigs may be different sizes) can share the same animation data. Their bind pose will make sure the final bone positions are correct.

      If you don’t have different models that share the same animation data, then pre-computing the animation frame with the inverse bind pose might be a good solution for you.
