Friday, August 12, 2011

GPU-based Image Processing on Tegra 2 and Microsoft Kinect

In this tutorial, we will focus on real-time image processing of the depth and RGB data we collect from the Microsoft Kinect (see the last tutorial if you want to configure your Tegra 2 to interface with the Kinect). In particular, we will show how to write custom fragment shader programs that perform the image transformation (i.e., mapping the color image back to the depth map data) using the GPU on the Tegra 2.

When we mention GPGPU (General-Purpose computation on Graphics Processing Units) nowadays, one may immediately think of CUDA C/C++, ATI Stream, or OpenCL. In fact, prior to these tools (about 7 years ago), people had already invented ways to utilize GPUs for computationally demanding tasks.

To harness the GPU processing power on the Tegra 2, we will first introduce vertex shader and fragment shader programs in OpenGL ES2.

OpenGL ES2 and Shader Programs: 
The Nvidia Development Kit (NDK) provides an easy-to-use shader wrapper (nv_shader.cpp) to interface with the graphics hardware. The idea behind GPGPU is simple: we create a texture (a piece of memory on the GPU, either shared with the CPU or not), then write a custom vertex shader and a fragment shader to perform the image transformation on that texture. Similar to CUDA kernel calls, the fragment programs execute in parallel and give us a reasonable speedup if we use them properly. Of course, all of this comes with limitations that we will explore next.

RGB+D? (Simple calibration on GPU)
The Kinect RGB camera and depth camera are physically located at two different places. In addition, the depth camera and the RGB camera have different focal lengths, principal points, and so on. Explaining the underlying camera calibration methods and camera models is beyond the scope of this tutorial. Feel free to check out the work from others below.

Important: You may need to run this and extract the intrinsic and extrinsic parameters of your Kinect. Just update the parameters in our fragment program accordingly.

Fragment Shader for Mapping RGB values to depth map
A fragment program is basically a small program that runs on the graphics card on a per-pixel basis. Here is the fragment program that we have created to map the RGB (640x480) color image back to the depth map (640x480). Notice that we have stored the depth value in the alpha channel of the RGBA texture. The details of the calibration are shown below. The source code should be straightforward to anyone who has done GPU programming.

precision mediump float;
varying vec2 v_texCoord;  
uniform sampler2D s_texture;

//my own calibrated data.. replace the following with your own.
//xxx_rgb - parameters for the color camera
//xxx_d - parameters for the depth camera
const float fx_rgb = 5.0995627581871054e+02;
const float fy_rgb = 5.1009672745532589e+02;
const float cx_rgb = 3.2034728823678967e+02;
const float cy_rgb = 2.5959956434787728e+02;

const float fx_d = 5.5879981950414015e+02;
const float fy_d = 5.5874227168094478e+02;
const float cx_d = 3.1844162327317980e+02;
const float cy_d = 2.4574257294583529e+02;

//Size of the image, use to transform from texture coord to image coord..
vec2 img_coord = vec2(640.0,480.0);

//Rotation and translation matrix
vec3 T = vec3(2.7127130138943419e-02, -1.0041314603411674e-03,-5.6746227781378283e-03);
vec3 R1 = vec3(9.9996078957902945e-01, -8.5633968850082568e-03,-2.2555571980713987e-03);
vec3 R2 = vec3(8.5885385454046812e-03, 9.9989832404109968e-01, 1.1383258999693677e-02);
vec3 R3 = vec3(2.1578484974712269e-03, -1.1402184597253283e-02, 9.9993266467111286e-01);

void main()
{
    //get the depth data from the texture.
    float depth = texture2D(s_texture, v_texCoord).w;
    if(depth == 0.0 || depth == 1.0){
        //depth is at either end of its range: output black
        gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);
    }else{
        //transform to image coordinate first, texture coord is from 0 to 1
        float x_d = (v_texCoord.x)*img_coord.x;
        float y_d = (v_texCoord.y)*img_coord.y;
        vec3 P3D;
        vec3 P3D_1;
        vec2 P2D_rgb;
        float real_depth = (2.0 * depth)+0.35;
        //this should be in metric 3D space (world coordinate)
        P3D.x = real_depth * (x_d - cx_d) / fx_d;
        P3D.y = -real_depth * (y_d - cy_d) / fy_d; //negative because +y is up.
        P3D.z = real_depth;

        //transform this, then project to the color camera using the extrinsic parameters.
        P3D_1 = vec3(dot(R1, P3D)-T.x, dot(R2, P3D)-T.y, dot(R3, P3D)-T.z);
        //now we map this back to the image using the intrinsic parameters of the color camera
        float P3D_1_1 = 1.0 / P3D_1.z;
        P2D_rgb.x = (P3D_1.x * fx_rgb * P3D_1_1) + cx_rgb;
        P2D_rgb.y = -(P3D_1.y * fy_rgb * P3D_1_1) + cy_rgb; //negative because +y is down in the image
        //transform back to texture coordinate
        P2D_rgb = P2D_rgb/img_coord;
        //extract the RGB value (linearly interpolated) and store the final result
        //(1.0-depth inverts the value so that objects closer to the camera are brighter)
        gl_FragColor = vec4(1.0, 1.0, 1.0, 0.0)*texture2D(s_texture, P2D_rgb)+vec4(0.0, 0.0, 0.0, 1.0-depth);
    }
}
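
For checking a new set of calibration values without a device, the same mapping can be mirrored on the CPU. The following is a minimal C++ sketch, not part of our source tree: the names Vec3, Calibration, Pixel, and depth_to_rgb_pixel are hypothetical, and it assumes the same depth encoding as the shader (real_depth = 2.0 * depth + 0.35).

```cpp
#include <cassert>

// Hypothetical CPU mirror of the fragment program's math (illustration only).
struct Vec3 { double x, y, z; };
struct Pixel { double x, y; };

struct Calibration {
    double fx_rgb, fy_rgb, cx_rgb, cy_rgb;  // color camera intrinsics
    double fx_d, fy_d, cx_d, cy_d;          // depth camera intrinsics
    Vec3 R1, R2, R3;                        // rows of the rotation matrix
    Vec3 T;                                 // translation
};

static double dot3(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// x_d, y_d: pixel position in the depth image.
// depth01:  normalized depth as stored in the alpha channel (0..1).
// Returns the pixel position in the color image.
Pixel depth_to_rgb_pixel(const Calibration& c, double x_d, double y_d, double depth01) {
    assert(depth01 >= 0.0 && depth01 <= 1.0);
    // Assumed encoding, copied from the shader.
    const double real_depth = 2.0 * depth01 + 0.35;

    // Back-project the depth pixel into 3D (+y up, as in the shader).
    const Vec3 p = { real_depth * (x_d - c.cx_d) / c.fx_d,
                    -real_depth * (y_d - c.cy_d) / c.fy_d,
                     real_depth };

    // Apply the extrinsic parameters to move into the color camera frame.
    const Vec3 q = { dot3(c.R1, p) - c.T.x,
                     dot3(c.R2, p) - c.T.y,
                     dot3(c.R3, p) - c.T.z };

    // Project with the color camera intrinsics (+y down in the image).
    const double inv_z = 1.0 / q.z;
    const Pixel out = {  q.x * c.fx_rgb * inv_z + c.cx_rgb,
                        -q.y * c.fy_rgb * inv_z + c.cy_rgb };
    return out;
}
```

With an identity rotation, zero translation, and identical intrinsics for both cameras, every depth pixel should map onto itself, which makes a quick sanity check before plugging in real parameters.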

I/Render Loop:(  842): Display loop 0.029735 (s)
I/Render Loop:(  842): Display loop 0.031630 (s)
I/Render Loop:(  842): Display loop 0.032471 (s)
~ about 30 ms (0.03 s) per frame!
It takes about 30 ms per frame (roughly 33 frames per second) to perform this computation on the GPU. This includes the time required to update the texture, to perform the RGB-to-depth mapping (which includes pixel interpolation), and to display the result on screen. Pretty impressive! What's next? We are now ready to render these in 3D, and perhaps allow the user to change the perspective using the touchscreen.
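
The texture update is where color and depth get combined, since the shader reads both from a single RGBA texture. A minimal C++ sketch of that packing step follows. It is an illustration under assumptions, not the code from our source tree: pack_rgbd is a hypothetical name, the depth input is assumed to be in meters, and the normalization is simply the inverse of the shader's real_depth line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleave an RGB frame and a per-pixel depth map
// into one RGBA8 buffer, with the depth stored in the alpha channel.
std::vector<std::uint8_t> pack_rgbd(const std::vector<std::uint8_t>& rgb,
                                    const std::vector<float>& depth_m) {
    const std::size_t n = depth_m.size();
    assert(rgb.size() == n * 3);  // three color bytes per depth sample

    std::vector<std::uint8_t> rgba(n * 4);
    for (std::size_t i = 0; i < n; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];

        // Assumed inverse of "real_depth = 2.0 * depth + 0.35", clamped to [0, 1].
        float d = (depth_m[i] - 0.35f) / 2.0f;
        d = std::min(1.0f, std::max(0.0f, d));

        // Round to the nearest 8-bit value.
        rgba[4 * i + 3] = static_cast<std::uint8_t>(d * 255.0f + 0.5f);
    }
    return rgba;
}
```

The resulting buffer could then be uploaded as a 640x480 RGBA texture. Note that alpha values of exactly 0 and 255 read back as 0.0 and 1.0 in the shader, which the fragment program paints black.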

Source Code:
svn co multitouch

Demo Video:
Non-calibrated RGBD (using fragment program to adjust the brightness of the pixel). Notice the misalignment.

With the proper calibration, now the RGB (color) image will map to the depth image.

Special Thanks:
James Fung, Nvidia Technology Development for supplying the Ventana Development Kit.

Related Articles:
OpenGL ES2 Tutorial
Camera Matrix Tutorial
To be continued....

Wednesday, July 27, 2011

HOWTO: Using Microsoft Kinect on Tegra Ventana (Android 3.0)

In this tutorial, we will show you how to write a native Android application (NDK) that uses the Microsoft Kinect on the Tegra Ventana (Android 3.0) development kit. Although we have not verified that our setup runs on other platforms, the process we describe below should be easy to port to other Linux-based devices. To achieve real-time performance, we have used OpenGL ES2 to render the depth data as a 2D texture (see demo video), thus reducing the overhead of transferring and re-rendering the frames through the Java layer. Since the data capturing and rendering engines are fully multi-threaded, our approach can utilize the multiple cores on the Tegra 2 platform.

Environment Setup (Ubuntu/Debian):
To get started, first you need to install Nvidia Tegra Android Development Kit on your machine. In our setup, we have used the Ubuntu-Linux (64-bit) version.

The setup should be straightforward.
chmod +x
My default installation path is ~/NVPACK. Please make sure you flash the Ventana board with Android 3.0 (if you haven't done so before) at the end of the installation process. Also, please back up your data before doing so.

Once the setup is complete, you should have the Android NDK, Eclipse, the TDK sample code, and so on installed.

raymondlo84@ealab_corei7:~/NVPACK$ ls -C
android-ndk-r5c    android-sdk-linux_x86  eclipse             oprofile    TDK_Samples
Android_OS_Images  apache-ant-1.8.2       nvsample_workspace  readme.txt  uninstall

Now, to verify that we have set up everything correctly, we will run Eclipse (the one in the NVPACK directory), compile the source code (use Build Projects or Ctrl+B), and run the multitouch sample on the device (right click on the project name, Run, and Run As Application). Now you can touch the screen and see your fingers (up to 10) being tracked in real time. Pretty amazing.

Important: Please make sure you uninstall the multitouch application after testing. Otherwise, it will conflict with our current application.

Hardware setup:
At this step, we should be able to run the sample code on the Ventana and confirm that the development environment is set up properly. If not, please check that the Ventana is connected to the PC properly.

The easiest way to check is to run
$adb shell
This should bring you to the shell on the Ventana.

Next, we plug the Microsoft Kinect into the Ventana's USB port. To check if the camera is detected properly, we can check dmesg with the following commands ($ is the bash shell on the PC, # is the shell on the Ventana).

$adb shell
#dmesg
Then, we should see that the device is detected and mounted. Notice that the Kinect is actually recognized as multiple devices: Xbox NUI Camera and Xbox NUI Audio. It looks like the Microsoft Kinect has a USB hub internally.

<6>[70990.661082] usb 2-1.1: new high speed USB device using tegra-ehci and address 9
<6>[70990.696832] usb 2-1.1: New USB device found, idVendor=045e, idProduct=02ad
<6>[70990.703809] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=4
<6>[70990.711490] usb 2-1.1: Product: Xbox NUI Audio
<6>[70990.716013] usb 2-1.1: Manufacturer: Microsoft
<6>[70990.720483] usb 2-1.1: SerialNumber: A44887C10800045A
<6>[70992.181268] usb 2-1.3: new high speed USB device using tegra-ehci and address 10
<6>[70992.224649] usb 2-1.3: New USB device found, idVendor=045e, idProduct=02ae
<6>[70992.240886] usb 2-1.3: New USB device strings: Mfr=2, Product=1, SerialNumber=3
<6>[70992.256521] usb 2-1.3: Product: Xbox NUI Camera
<6>[70992.274979] usb 2-1.3: Manufacturer: Microsoft
<6>[70992.280051] usb 2-1.3: SerialNumber: A00367A07065045A
#ls /dev/bus/usb/002/*
To work around the permission problem temporarily, we can run the following commands. (IMPORTANT: We have to run this command every time we restart the machine, unplug the Kinect, or the device goes to sleep! Oh well!)

$adb shell
#chmod -R 777 /dev/bus/usb/002/
Once we have confirmed that the Kinect is detected successfully, we replace the multitouch source code in the ~/NVPACK/TDK_Samples/Android_NVIDIA_samples_2_20110315/apps directory with the one we have provided below.

Now, we go back to Eclipse and refresh the project (click on the project folder in Eclipse and then press F5). Again, run #chmod -R 777 /dev/bus/usb/002/ if you haven't done so, or you will see a blank screen. Rebuild and run!

If everything goes well, we should have the application running as in the following video. To change the tilt angle on the Kinect, we can simply use the touchscreen as shown below.

Code Structure and Optimization:
In this section, we will explain the structure of the source code, along with the optimization steps and customizations we have made to make the code run as efficiently as possible.

Figure 1. The source code structure of our demo application. 

Instead of recompiling the libfreenect and libusb libraries from external sources (of course, we could do that with a statically linked library approach or similar), in this tutorial we provide a complete source tree, which includes the libfreenect and libusb libraries. (Note: Feel free to contact us if we should not include these in our package.)

As we can see from Figure 1, the structure of the source code is fairly simple.
  • multi.cpp - the main code, which handles OpenGL rendering, key/touchscreen events, and other logic (adapted from the TDK sample code).
  • kinect.cpp - a wrapper for the Kinect driver, which converts the depth map to RGB and handles other callback functions from libfreenect (adapted from the libfreenect sample code).
  • libusb/* - the libusb source code for the USB interface.
  • libfreenect/* - the libfreenect source code which interfaces with the Microsoft Kinect.
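
To give a feel for the depth-to-RGB conversion mentioned for kinect.cpp, here is a minimal C++ sketch. It is not the code from kinect.cpp: depth_to_gray_rgb is a hypothetical name, and it assumes the 11-bit raw depth format, in which values run from 0 to 2047 and 2047 marks a pixel with no reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the depth-to-RGB step (illustration only).
// raw: n raw depth samples, assumed to be 11-bit values (0..2047).
// rgb: output buffer with room for 3 * n bytes.
void depth_to_gray_rgb(const std::uint16_t* raw, std::size_t n, std::uint8_t* rgb) {
    assert(raw != nullptr && rgb != nullptr);
    const std::uint16_t kNoReading = 2047;  // assumed "no data" marker

    for (std::size_t i = 0; i < n; ++i) {
        std::uint8_t v = 0;  // pixels without a reading stay black
        if (raw[i] < kNoReading) {
            // Scale 0..2046 to 0..255 and invert, so nearer objects are brighter.
            v = static_cast<std::uint8_t>(255 - (raw[i] * 255) / 2046);
        }
        rgb[3 * i + 0] = v;
        rgb[3 * i + 1] = v;
        rgb[3 * i + 2] = v;
    }
}
```

A function of this shape would typically be called from the depth callback, writing into the buffer that the render thread later uploads as a texture.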


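The runtime of the rendering loop is ~16 ms, which translates to ~60 fps. The key bottleneck of the algorithm is the texture loading step, which takes ~14 ms to perform.
    struct timeval start, end;
    double t1,t2;
    static double elapsed_sec=0;
    static int count=0;
    gettimeofday(&start, NULL);
    // ... the timed section (texture update and draw calls) goes here ...
    gettimeofday(&end, NULL);
    // convert both timestamps to seconds
    t1 = start.tv_sec + start.tv_usec/1000000.0;
    t2 = end.tv_sec + end.tv_usec/1000000.0;
    elapsed_sec += (t2-t1);
    count++;
    if(count == 100){
        // log the average over the last 100 frames, then reset the counters
        char buf[512];
        sprintf(buf, "Display loop %f (s)\n", (elapsed_sec)/100.0);
        __android_log_write(ANDROID_LOG_INFO, "Render Loop:", buf);
        elapsed_sec = 0;
        count = 0;
    }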
 Total run time (averaged 100 trials) :

I/Render Loop:( 4026): Display loop 0.016168 (s)

Tegra Android Development Pack

Tested Platform:
Tegra Ventana Development Kit (from Nvidia)
Ubuntu 10.04.2 (64 bits)

Source Code:


svn co multitouch

for the latest source.

Other Demo video:

Blind navigation with a wearable range camera and vibrotactile helmet:

This work has been accepted and will be published in the proceedings of ACM Multimedia 2011 (ACMM2011).

See: and for a list of our publications.

Known Issues:
1. The application will crash when we change the orientation of the device.
2. The application does not wake up properly after it has been sent to the background.

Special Thanks:
James Fung, Nvidia Technology Development for supplying the Ventana Development Kit.

... to be continued.