ROS + RaspberryPi Camera Module #4: Running ROS master on Jetson TX1 and OpenCV with CUDA enabled

28 Oct 2017 » ROS, tx1, opencv

source code: ros_face_detect

OpenCV with CUDA enabled

The current system setup uses a Raspberry Pi 3 (Raspi) running Ubuntu 16.04.1 as the operating system and ROS Kinetic as the middleware. The Raspi publishes raw images taken with a Raspberry Pi Camera Module V2.1 on the /webcam/image_raw/compressed topic, which is provided by the ROS package video_stream_opencv. Nodes running on a Jetson TX1 (Ubuntu 16.04.1, ROS Kinetic) subscribe to this topic.

Although the Raspi can publish raw RGB images at close to 30 fps, performance deteriorates because the face detection algorithm relies purely on the CPU and places heavy computational demands on the hardware. After detection, the publishing rate was only 3-4 Hz.

To increase performance, the program had to be rewritten so that the computer vision algorithms, which rely heavily on matrix operations, could be GPU accelerated. The code changes were minimal, since OpenCV 3 offers a user-friendly API that makes refactoring easy. Setting up the environment, on the other hand, unfortunately took more time.

Setting up the environment

Before starting this exercise, I had already installed OpenCV 3 alongside ROS Kinetic, which caused some problems: the version included in the ROS OpenCV package did not have CUDA enabled. Resolving this required building OpenCV from source. Following the instructions outlined at OpenCV with CUDA with Tegra, I successfully built and installed OpenCV with CUDA enabled under /usr/local.

Despite the successful install, ROS still had trouble dealing with two versions of OpenCV. Fixing this required redirecting CMake paths and rebuilding the ROS packages that depend on OpenCV. The problem and its solution are discussed here.

A nice API

The code changes were minimal, and most of them are shown below. The refactor mostly consisted of making sure that I passed the correct matrix type (cv::cuda::GpuMat rather than cv::Mat) to the methods defined in cv::cuda.

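// Requires <iostream>, <opencv2/cudaimgproc.hpp> and <opencv2/cudaobjdetect.hpp>.
// fc_ is a cv::Ptr<cv::cuda::CascadeClassifier> and faces_ a std::vector<cv::Rect>;
// img_cur (name assumed) is the current BGR frame as a cv::Mat.
cv::Mat img_gray;
cv::cuda::GpuMat img_gray_gpu;
cv::cuda::GpuMat img_cur_gpu;

// Convert Mat to GpuMat (copies the frame from host to device memory)
img_cur_gpu.upload(img_cur);

// Grayscale conversion and histogram equalization both run on the GPU.
cv::cuda::cvtColor(img_cur_gpu, img_gray_gpu, cv::COLOR_BGR2GRAY);
cv::cuda::equalizeHist(img_gray_gpu, img_gray_gpu);

cv::cuda::GpuMat objbuf;
// Find faces in image that are greater than min size (10,10), then download
// the detections from the GPU into a std::vector<cv::Rect>.
fc_->detectMultiScale(img_gray_gpu, objbuf);
fc_->convert(objbuf, faces_);
std::cout << "Faces detected...: " << faces_.size() << std::endl;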


As a result, the publishing rate improved roughly 3x, from 3-4 Hz to 9-10 Hz. Since “real-time” is considered to be somewhere between 10 and 12 Hz, this is acceptable. Next, I plan to implement a deep-learning-driven face detection algorithm to see how much faster GPU-accelerated inference can be.
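Using the midpoints of the two measured ranges (an illustrative simplification, not separate measurements), the speedup and the time available per frame work out as:

```latex
\text{speedup} \approx \frac{9.5\,\text{Hz}}{3.5\,\text{Hz}} \approx 2.7,
\qquad
T = \frac{1}{f}: \quad
\frac{1}{3.5\,\text{Hz}} \approx 286\,\text{ms}
\;\longrightarrow\;
\frac{1}{9.5\,\text{Hz}} \approx 105\,\text{ms}
```

Over the full ranges the ratio lies between 9/4 = 2.25 and 10/3 ≈ 3.3, which is consistent with "roughly 3x". At 10 Hz, the whole pipeline (transport, upload, detection) has a budget of 100 ms per frame.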


