ROS + RaspberryPi Camera Module #4: Running ROS master on Jetson TX1 and OpenCV with CUDA enabled
source code: ros_face_detect
OpenCV with CUDA enabled
The current system setup uses a Raspberry Pi 3 (Raspi) running Ubuntu 16.04.1 as the
operating system, with ROS Kinetic as the middleware. The Raspi publishes raw images taken
from a Raspberry Pi Camera Module V2.1 over the /webcam/image_raw/compress
topic, provided by the ROS package video_stream_opencv. Nodes running on a
Jetson TX1 (Ubuntu 16.04.1, ROS Kinetic) subscribe to that topic.
Although the Raspi can publish raw RGB images at close to 30 fps, performance deteriorates because the face detection algorithm runs purely on the CPU and places heavy computational demands on the hardware. The publishing rate after detection was 3-4 Hz.
To improve performance, the program had to be rewritten so that the computer vision algorithms, which rely heavily on matrix multiplication, could be GPU accelerated. The code changes were minimal, as OpenCV 3 offers a user-friendly API that allows for easy refactoring. Setting up the environment, unfortunately, took more time.
Setting up the environment
Prior to starting this exercise, I had installed OpenCV 3 alongside
ROS Kinetic, which caused some problems: the version included in the ROS
opencv package did not have CUDA enabled. To resolve this issue, a build from
source was required. Under the directory path /usr/local, following the
instructions outlined at OpenCV with CUDA with Tegra, I was able to
successfully build and install OpenCV with CUDA enabled.
Despite the successful install, ROS still had trouble dealing with two versions of OpenCV. This required redirecting some CMake paths and rebuilding the ROS packages that depend on OpenCV. The problem and its solution are discussed in the ROS Answers thread listed under References.
A nice API
The code changes were minimal, and most of them are shown
below. The refactor mainly came down to passing the correct matrix
type to the methods defined in cv::cuda.
cv::cuda::GpuMat img_cur_gpu;
cv::cuda::GpuMat img_gray_gpu;
// Upload the current frame (cv::Mat) to the GPU as a GpuMat.
img_cur_gpu.upload(cur_img_);
// Convert to grayscale and equalize the histogram on the GPU.
// cvtColor allocates img_gray_gpu, so it needs no upload of its own.
cv::cuda::cvtColor(img_cur_gpu, img_gray_gpu, cv::COLOR_BGR2GRAY);
cv::cuda::equalizeHist(img_gray_gpu, img_gray_gpu);
cv::cuda::GpuMat objbuf;
// Find faces in the image that are greater than min size (10,10). The
// detections land in a GPU buffer first and are then converted into the
// std::vector<cv::Rect> faces_.
fc_->detectMultiScale(img_gray_gpu, objbuf);
fc_->convert(objbuf, faces_);
std::cout << "Faces detected...: " << faces_.size() << std::endl;
Results
As a result, the publishing rate improved roughly 3x, from 3-4 Hz to 9-10 Hz. Since “real-time” is considered to be somewhere between 10-12 Hz, this is acceptable. I plan to implement a deep-learning-driven algorithm next to see how much faster GPU-accelerated inference can be for face detection.
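To put those rates in terms of per-frame time, the arithmetic can be sketched as below. The function names are made up for illustration, and using the midpoints of the measured ranges (3.5 Hz and 9.5 Hz) is my own simplification, not a separate measurement.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given publishing rate.
double framePeriodMs(double rate_hz) {
  assert(rate_hz > 0.0);
  return 1000.0 / rate_hz;
}

// Speed-up factor when the publishing rate goes from before_hz to after_hz.
double speedup(double before_hz, double after_hz) {
  assert(before_hz > 0.0);
  return after_hz / before_hz;
}
```

With the midpoints, speedup(3.5, 9.5) is about 2.7, and the per-frame time drops from roughly 286 ms to roughly 105 ms. Reaching 10 Hz corresponds to a budget of 100 ms per frame.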
### References
- https://docs.opencv.org/master/d6/d15/tutorial_building_tegra_cuda.html
- https://answers.ros.org/question/242376/having-trouble-using-cuda-enabled-opencv-with-kinetic/