<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>personal notes (raw &amp; unedited)</title>
    <description></description>
    <link>https://surfertas.github.io/</link>
    <atom:link href="https://surfertas.github.io/sitemap.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Tue, 15 Aug 2023 14:42:01 +0000</pubDate>
    <lastBuildDate>Tue, 15 Aug 2023 14:42:01 +0000</lastBuildDate>
    <generator>Jekyll v3.9.3</generator>
    
      <item>
        <title>Notes on working with the D435 RealSense, OpenVINO, ROS 2</title>
        <description>&lt;p&gt;WIP&lt;/p&gt;

&lt;h3 id=&quot;notes-on-working-with-d435-realsense-openvino-ros-2&quot;&gt;Notes on working with the D435 RealSense, OpenVINO, ROS 2.&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Camera Model:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;D435&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;librealsense Version:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;v2.16.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Firmware Versions:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;05.12.06.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;OS &amp;amp; Version:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Ubuntu 18.04.4 LTS&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Kernel Version:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;4.15.0-112-generic&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;ROS Distro: &lt;a href=&quot;https://www.ros.org/reps/rep-2000.html#dashing-diademata-may-2019-may-2021&quot;&gt;Dashing&lt;/a&gt;
Note: lsb_release reports Ubuntu 18.04 LTS, but Ubuntu MATE was installed on the NUC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Publish 2D &amp;amp; 3D images from a D435 to a ROS topic consumed by an OpenVINO machine vision pipeline, with the intention of using it on a Turtlebot2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status/Results&lt;/strong&gt;: 2D images are working; 3D is not yet. [3/Aug/2020]&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nuc@nuc:~$ ros2 run realsense_ros2_camera realsense_ros2_camera
remote_pc@remote_pc:~$ ros2 launch pipeline_object_topic.launch.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Remember to update the &lt;a href=&quot;https://gist.github.com/surfertas/27091ee64c08240eace51fbd4007659e&quot;&gt;yaml file&lt;/a&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/params&lt;/code&gt; that the launch file loads with the path to the model and its associated labels.&lt;/p&gt;
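&lt;p&gt;From memory, the model-related entries in that yaml look something like the sketch below; the field names may differ between toolkit versions and the paths are placeholders, so treat the linked gist as authoritative.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;infers:
  - name: ObjectDetection
    model: /path/to/model/mobilenet-ssd.xml      # placeholder path to the IR model
    label: /path/to/model/mobilenet-ssd.labels   # placeholder path to the label file
    engine: CPU
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;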

&lt;p&gt;&lt;em&gt;FPS&lt;/em&gt;: The frame rate is very unstable, ranging from 4-15 fps for 2D object detection with the D435, especially when compared with the IpCamera input, where the system was much more stable.&lt;/p&gt;

&lt;p&gt;Getting the Intel RealSense D435 camera to work has been more involved than I expected, and a review of issues at the official repository suggests that integrating the device has not been straightforward for others either.&lt;/p&gt;

&lt;p&gt;Working toward a solution involved the following steps:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Downgrade the kernel to 4.15* after trying 5.3* unsuccessfully.&lt;/li&gt;
  &lt;li&gt;Update the &lt;a href=&quot;https://dev.intelrealsense.com/docs/firmware-update-tool&quot;&gt;firmware&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Run the &lt;a href=&quot;https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md&quot;&gt;patches&lt;/a&gt; and install dependencies.&lt;/li&gt;
  &lt;li&gt;Move everything over to Dashing and install the ROS 2 packages related to &lt;a href=&quot;https://github.com/intel/ros2_intel_realsense&quot;&gt;realsense&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Once the device side was resolved to an acceptable point, remap the camera topic to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INPUT_TOPIC&lt;/code&gt; &lt;a href=&quot;https://github.com/intel/ros2_openvino_toolkit/blob/master/dynamic_vino_lib/src/inputs/image_topic.cpp&quot;&gt;defined&lt;/a&gt; in the source.&lt;/li&gt;
&lt;/ol&gt;
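&lt;p&gt;The remapping in step 5 can be expressed in the launch file. A minimal sketch, assuming the Dashing-era launch_ros API and placeholder topic names (check the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INPUT_TOPIC&lt;/code&gt; definition linked above for the actual names):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import launch
import launch_ros.actions


def generate_launch_description():
    # Remap the camera's color topic to the topic the OpenVINO pipeline subscribes to.
    return launch.LaunchDescription([
        launch_ros.actions.Node(
            package='realsense_ros2_camera',
            node_executable='realsense_ros2_camera',
            remappings=[('/camera/color/image_raw', '/openvino_toolkit/image_raw')],
        ),
    ])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;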

&lt;h4 id=&quot;follow-up-items&quot;&gt;Follow up items&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Calibrate IMU: https://github.com/IntelRealSense/librealsense/issues/5166&lt;/li&gt;
  &lt;li&gt;Try with RealSenseCamera input in place of RealSenseCameraTopic by running OpenVino pipeline on the NUC.&lt;/li&gt;
  &lt;li&gt;Debug the warnings below, seen when running realsense-viewer with 3D.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 03/08 15:01:39,022 WARNING [139875290420992] (sensor.cpp:338) Unregistered Media formats : [ UYVY ]; Supported: [ ]
 03/08 15:02:04,803 WARNING [139875103786752] (backend-v4l2.cpp:1013) Frames didn't arrived within 5 seconds
 03/08 15:02:09,809 WARNING [139875103786752] (backend-v4l2.cpp:1013) Frames didn't arrived within 5 seconds
 03/08 15:02:14,814 WARNING [139875103786752] (backend-v4l2.cpp:1013) Frames didn't arrived within 5 seconds
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;command-line-reminder&quot;&gt;Command line reminder:&lt;/h4&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;lsusb -v
usb-devices
uname -r
lsb_release -a
dmesg
rs-fw-update -l
v4l2-ctl --list-devices
realsense-viewer
ros2 topic list -t
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://kowalczyk.me/change-default-kernel-in-ubuntu-18-04/&lt;/li&gt;
  &lt;li&gt;https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md&lt;/li&gt;
  &lt;li&gt;https://github.com/IntelRealSense/librealsense/issues/1225&lt;/li&gt;
  &lt;li&gt;https://github.com/IntelRealSense/librealsense/issues/5598&lt;/li&gt;
  &lt;li&gt;https://github.com/intel/ros2_intel_realsense&lt;/li&gt;
  &lt;li&gt;https://dev.intelrealsense.com/docs/firmware-update-tool&lt;/li&gt;
  &lt;li&gt;https://qiita.com/t_kumazawa/items/eb3a60f0ca1fbca70bee&lt;/li&gt;
  &lt;li&gt;https://naonaorange.hatenablog.com/entry/2018/11/04/174745&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 03 Aug 2020 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros2/2020/08/03/realsense-openvino.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros2/2020/08/03/realsense-openvino.html</guid>
        
        
        <category>ros2</category>
        
      </item>
    
      <item>
        <title>Using RPLidar A2 with Turtlebot 2 running ROS Melodic with a Kobuki base</title>
        <description>&lt;p&gt;WORK IN PROGRESS
Last Edited: 18/7/2020&lt;/p&gt;

&lt;p&gt;The objective of this note is to document my steps in getting a Turtlebot2 with a Kobuki base localizing on a static map, so that I can reference them at a future date. The static map was created using the &lt;a href=&quot;https://github.com/SteveMacenski/slam_toolbox&quot;&gt;slam_toolbox&lt;/a&gt; package and laser scans from an RPLidar A2 mounted on top of the Turtlebot2. I use the AMCL package for localization, as I haven’t been successful in getting localization mode to work with the slam_toolbox package at the time of this writing (11/7/2020). I’ve placed all necessary launch files and robot descriptions in the following &lt;a href=&quot;https://github.com/surfertas/turtlebot2_lidar&quot;&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;robot-details&quot;&gt;Robot details&lt;/h4&gt;

&lt;p&gt;Constructing a Turtlebot 2 that can run untethered is not cheap, as you can see from the hardware list.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Cost&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Robot&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;https://store.clearpathrobotics.com/collections/robotics-kit/products/turtlebot-2-essentials&quot;&gt;Turtlebot2 w/ Kobuki base&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$1049.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Lidar&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;https://www.seeedstudio.com/RPLIDAR-A2M8-The-Thinest-LIDAR.html&quot;&gt;RPLidar A2M8&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$449.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Computer&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;https://www.amazon.com/NUC8i7BEH-i7-8559U-Bluetooth-Thunderbolt-Support/dp/B07QH8CZWK/ref=sr_1_1_sspa?dchild=1&amp;amp;keywords=NUC8i7BEH&amp;amp;qid=1594451930&amp;amp;sr=8-1-spons&amp;amp;psc=1&amp;amp;spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExWTBKTzVDUERBT05GJmVuY3J5cHRlZElkPUEwMjUxNzkyMlRVU0FQMjVBRVpDQSZlbmNyeXB0ZWRBZElkPUEwNDk4MzkwR01QTVFYQ0g4OEtFJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==&quot;&gt;NUC8i7BEH&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$729.99&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Power&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;https://www.amazon.com/Portable-Charger-RAVPower-27000mAh-Indicator/dp/B07PBSH28H/ref=pb_allspark_session_sims_desktop_23_34?_encoding=UTF8&amp;amp;pd_rd_i=B07PBSH28H&amp;amp;pd_rd_r=8bf262c2-0c2c-4678-b9d9-b6a7800a7668&amp;amp;pd_rd_w=3m5Ag&amp;amp;pd_rd_wg=V3A3u&amp;amp;pf_rd_p=6dab4af8-14d2-4d59-b0a2-dd973ff1f166&amp;amp;pf_rd_r=KMXPAZY4T28BYBEZKSTJ&amp;amp;psc=1&amp;amp;refRID=KMXPAZY4T28BYBEZKSTJ&quot;&gt;Portable Charger RAVPower 27000mAh 85W(100W Max)&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$129.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Controller&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;https://www.amazon.com/Controller-Wireless-Playstation-Shock-Black/dp/B087RK4VDX/ref=sr_1_1_sspa?dchild=1&amp;amp;keywords=PS3+Controller+sony&amp;amp;qid=1594452283&amp;amp;sr=8-1-spons&amp;amp;psc=1&amp;amp;spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzR01BWVVWRTZPS1dOJmVuY3J5cHRlZElkPUEwMDEwNjg2QzhYOVFNNEdSV0YyJmVuY3J5cHRlZEFkSWQ9QTAyODA3ODMxN1lYUEYwMzAzU0ZXJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==&quot;&gt;PS3 Controller&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$30.99&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;$2387.98&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 id=&quot;setup&quot;&gt;Setup&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;Install the necessary packages into your ROS workspace, as the turtlebot packages supporting Turtlebot2 have not been released for Melodic and remain a work in progress (22/6/2020). [1]&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo apt install ros-melodic-laptop-battery-monitor
sudo apt install ros-melodic-openni2-camera
sudo apt install ros-melodic-openni2-launch
sudo apt install ros-melodic-gmapping
sudo apt install ros-melodic-yocs-msgs
sudo apt install ros-melodic-yujin-ocs

mkdir -p ros_ws/src
cd ros_ws/src
git clone https://github.com/turtlebot/turtlebot.git

cd turtlebot
git clone https://github.com/turtlebot/turtlebot_interactions.git
git clone https://github.com/turtlebot/turtlebot_apps
git clone https://github.com/turtlebot/turtlebot_msgs

cd ~/ros_ws/src
git clone --single-branch --branch melodic https://github.com/yujinrobot/kobuki.git

### To work with this guide clone the turtlebot2_lidar repo.
git clone https://github.com/surfertas/turtlebot2_lidar.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;2&quot;&gt;
  &lt;li&gt;If you take a look at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;turtlebot2_lidar&lt;/code&gt; repo, you can see that I have created a URDF representation of the lidar device and placed the file in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;urdf/sensors&lt;/code&gt; directory; in this case the file name is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rplidar_a2.urdf.xacro&lt;/code&gt;. The specifications of the RPLidar A2 were taken from the [device data sheet]. Add the device to the robot description. You can use Onshape to design a mount, or feel free to use &lt;a href=&quot;https://cad.onshape.com/documents/004a3b4a35209210963e3e80/w/4836d6df38345f79b70ff7f4/e/202697d7ac59d46579c570f3&quot;&gt;mine&lt;/a&gt; for the RPLidar A2. The Turtlebot data sheet comes in handy as well. [2]&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;steps-to-create-a-map&quot;&gt;Steps to create a map&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Place your turtlebot on the power dock. It is recommended that the dock is fixed to a specific location free from obstruction. Boot up the NUC powered by the portable charger.&lt;/li&gt;
  &lt;li&gt;You will need to ssh in from 4 terminal panels: 1) bringup, 2) slam_toolbox, 3) rviz, 4) teleop. For panel 3), add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-X&lt;/code&gt; argument to ssh to enable X forwarding, which is needed to run RVIZ from your remote pc. [3]&lt;/li&gt;
  &lt;li&gt;Source the necessary environment from your working directory. Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source /opt/ros/melodic/setup.bash &amp;amp;&amp;amp; source devel/setup.bash&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Next, on panel 1), check which port the lidar is connected to by executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls -l /dev | grep ttyUSB&lt;/code&gt;. In my case the lidar had to be connected to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ttyUSB0&lt;/code&gt;, as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ttyUSB1&lt;/code&gt; was being used by the connection to the Turtlebot. This parameter is set as an environment variable when you run the environment &lt;a href=&quot;https://github.com/surfertas/turtlebot2_lidar/blob/master/turtlebot_env.sh&quot;&gt;script&lt;/a&gt;. Once the NUC is powered on, unplug any USB cables and make sure the lidar is connected first, so that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ttyUSB0&lt;/code&gt; is assigned to the lidar.&lt;/li&gt;
  &lt;li&gt;Change permissions of the port by executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo chmod 666 /dev/ttyUSB0&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch turtlebot2_lidar bringup_minimal.launch&lt;/code&gt; in panel 1). At this point you should see the lidar start to spin.&lt;/li&gt;
  &lt;li&gt;On panel 2) run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch turtlebot2_lidar slam_toolbox_lidar.launch&lt;/code&gt;. I had to add a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;static_transform_publisher&lt;/code&gt; to the launch file to publish a transform from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser_link&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;laser&lt;/code&gt;. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser_link&lt;/code&gt; was defined in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;urdf/sensors/rplidar_a2.urdf.xacro&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;On panel 3) start up RVIZ by running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch turtlebot_rviz_launchers view_navigation.launch&lt;/code&gt; which will kick off RVIZ on your remote laptop. You should see a top down visual of the Turtlebot with scans picking up some nearby obstacles. Get the slam_toolbox panel open in rviz by selecting from the top left menu: Panels-&amp;gt;Add New Panel-&amp;gt; slam_toolbox-&amp;gt;SlamToolboxPlugin.&lt;/li&gt;
  &lt;li&gt;Finally on panel 4) run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch turtlebot_teleop ps3_teleop.launch&lt;/code&gt;. Make sure that the ps3 controller has been synced with the NUC. Steps to sync can be found &lt;a href=&quot;http://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&quot;&gt;here&lt;/a&gt; if you are having trouble. [4]&lt;/li&gt;
  &lt;li&gt;You are ready to create a map! Teleop around and continue until you are satisfied with the generated map.&lt;/li&gt;
  &lt;li&gt;To save the map, enter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;path_to_directory&amp;gt;/&amp;lt;name&amp;gt;&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;save map&lt;/code&gt; field and also in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;serialization&lt;/code&gt; field (to be used with slam_toolbox localization mode). Note that the name is just a prefix without the file extension (e.g. if you want files like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map.pgm&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map.yaml&lt;/code&gt;, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;path_to_directory&amp;gt;/map&lt;/code&gt;). If no path is specified, the files will be placed in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.ros&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Now you should have the necessary files needed for localization.&lt;/li&gt;
&lt;/ol&gt;
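&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;static_transform_publisher&lt;/code&gt; mentioned in step 7 looks roughly like the sketch below in the launch file. The identity transform (all zeros) is an assumption; tune the offsets to match your mount.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- args: x y z yaw pitch roll parent_frame child_frame period_ms --&amp;gt;
&amp;lt;node pkg=&quot;tf&quot; type=&quot;static_transform_publisher&quot; name=&quot;base_laser_to_laser&quot;
      args=&quot;0 0 0 0 0 0 base_laser_link laser 100&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;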

&lt;h4 id=&quot;steps-to-localize-using-the-new-created-map-and-the-amcl-package&quot;&gt;Steps to localize using the newly created map and the AMCL package&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Open a terminal panel and ssh into the NUC. Go to the root of the workspace directory and source the ROS environment.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source /opt/ros/melodic/setup.bash &amp;amp;&amp;amp; source devel/setup.bash&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Set the environment variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TURTLEBOT_MAP_FILE&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;turtlebot_env.sh&lt;/code&gt; with the path to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.yaml&lt;/code&gt; file that was created in the previous map creation process. Execute the bash file to update necessary environment variables.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;. turtlebot_env.sh&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Launch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bringup_minimal.launch&lt;/code&gt; to boot up the physical turtlebot. Make sure that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TURTLEBOT_SERIAL_PORT&lt;/code&gt; is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/ttyUSB1&lt;/code&gt; or whatever port the turtlebot is connected to.&lt;/li&gt;
  &lt;li&gt;Launch RVIZ. Follow the aforementioned steps from launching RVIZ when creating a map.&lt;/li&gt;
  &lt;li&gt;Before launching the AMCL launch file, set the argument defaults for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initial_pose_x&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initial_pose_y&lt;/code&gt;,&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initial_pose_a&lt;/code&gt; with the position data obtained from the starting position in the mapping exercise (used the dock in my case).&lt;/li&gt;
  &lt;li&gt;Launch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rplidar_a2_amcl.launch&lt;/code&gt;. A couple of things to point out here. First, I am using the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amcl.launch.xml&lt;/code&gt; file, which hasn’t been tuned for the RPLidar A2 specifically. Second, I’ve added the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;static_transform_publisher&lt;/code&gt; to define the transform between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser_link&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;laser&lt;/code&gt; here as well, in addition to defining a parameter named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;frame_id&lt;/code&gt; with its value set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;, which is passed to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map_server&lt;/code&gt;. Without these 2 steps I was getting transform errors. Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosrun rqt_tf_tree rqt_tf_tree&lt;/code&gt; to check that the correct transforms are being published; the expected transforms are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map-&amp;gt;odom-&amp;gt;base_footprint-&amp;gt;base_link&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
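&lt;p&gt;As a sketch, the two additions described in step 6 look roughly like this in the launch file. The include path is a placeholder; point it at wherever your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amcl.launch.xml&lt;/code&gt; lives, and replace the pose values with your dock position.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- map_server needs frame_id=map to avoid transform errors --&amp;gt;
&amp;lt;node pkg=&quot;map_server&quot; type=&quot;map_server&quot; name=&quot;map_server&quot; args=&quot;$(env TURTLEBOT_MAP_FILE)&quot;&amp;gt;
  &amp;lt;param name=&quot;frame_id&quot; value=&quot;map&quot; /&amp;gt;
&amp;lt;/node&amp;gt;

&amp;lt;!-- initial pose defaults taken from the starting position used during mapping --&amp;gt;
&amp;lt;include file=&quot;path/to/amcl.launch.xml&quot;&amp;gt;
  &amp;lt;arg name=&quot;initial_pose_x&quot; value=&quot;0.0&quot; /&amp;gt;
  &amp;lt;arg name=&quot;initial_pose_y&quot; value=&quot;0.0&quot; /&amp;gt;
  &amp;lt;arg name=&quot;initial_pose_a&quot; value=&quot;0.0&quot; /&amp;gt;
&amp;lt;/include&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;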

&lt;p&gt;&lt;img src=&quot;/static/img/posts/tfframe.png&quot; alt=&quot;tftree&quot; /&gt;&lt;/p&gt;

&lt;ol start=&quot;7&quot;&gt;
  &lt;li&gt;If the laser scan isn’t really lining up with the map as you would expect, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2D Pose Estimate&lt;/code&gt; to publish a pose estimation. Repeat until the scan matches as expected.&lt;/li&gt;
  &lt;li&gt;Once the scans match up, you are ready to navigate. In RVIZ, use 2D Nav Goal to set a target goal and watch your turtlebot go.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/rvizamcl.png&quot; alt=&quot;rviz&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;localization-with-slam_toolbox-incomplete&quot;&gt;Localization with slam_toolbox (INCOMPLETE)&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Repeat steps 1.-4. of localization with AMCL package.&lt;/li&gt;
  &lt;li&gt;We need to modify the params yaml file that is passed to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;slam_toolbox&lt;/code&gt; package. The original config dir is located at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;slam_toolbox/config&lt;/code&gt;. I moved the config directory under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;turtlebot2_lidar&lt;/code&gt; for easy modification. The file that needs to be modified is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mapper_params_online_sync.yaml&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Change the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mode&lt;/code&gt; parameter to localization from mapping. Further you can set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map_file_name&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map_start_pose&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map_start_at_dock&lt;/code&gt;. Note for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map_file_name&lt;/code&gt; don’t include the file type. (e.g. /map.posegraph would just be /map)&lt;/li&gt;
&lt;/ol&gt;
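&lt;p&gt;The relevant entries in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mapper_params_online_sync.yaml&lt;/code&gt; end up roughly as below; the path and pose values are placeholders.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mode: localization             # was: mapping
map_file_name: /path/to/map    # no file extension: /map, not /map.posegraph
map_start_pose: [0.0, 0.0, 0.0]
map_start_at_dock: true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;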

&lt;p&gt;TODO: debug
Failed to compute laser pose, aborting continue mapping (“laser” passed to lookupTransform argument source_frame does not exist.)
Tried publishing the transform via a different launch file, but it still doesn’t work.
Removing the / from /laser in the static_transform_publisher made it so laser was recognized, but now I’m getting a different issue:
Failed to compute laser pose, aborting continue mapping (Lookup would require extrapolation at time 1593875399.670327337, but only time 1593875399.833934289 is in the buffer, when looking up transform from frame [laser] to frame [base_footprint])&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://github.com/turtlebot/turtlebot/issues/272&lt;/li&gt;
  &lt;li&gt;https://clearpathrobotics.com/turtlebot-2-open-source-robot/&lt;/li&gt;
  &lt;li&gt;https://superuser.com/questions/310197/how-do-i-fix-a-cannot-open-display-error-when-opening-an-x-program-after-sshi&lt;/li&gt;
  &lt;li&gt;http://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Sat, 11 Jul 2020 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2020/07/11/turtlebot2-lidar.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2020/07/11/turtlebot2-lidar.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Turtlepi: from ROS indigo to ROS melodic</title>
        <description>&lt;p&gt;Just went through the exercise of updating a small ROS project I worked on about 3 years ago from ROS indigo to the latest ROS melodic.&lt;/p&gt;

&lt;p&gt;Original 2017 post: &lt;a href=&quot;http://surfertas.github.io/ros/2017/05/23/autotarget.html&quot;&gt;Turtlepi #7: Automatic Target Generation for the Turtlebot
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The port was rather straightforward. The main issue was that Turtlebot 2 has yet to be included in the official melodic release. [1,2] Using the scripts put together from community discussions [3], with minor modifications for my use case, helped a lot.&lt;/p&gt;

&lt;p&gt;Working through old code is always enjoyable, and humbling. I was under the belief that I had shared a repository that was complete and working; working through the 3-year-old repository indicated otherwise.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;https://github.com/turtlebot/turtlebot/issues/272&lt;/li&gt;
  &lt;li&gt;https://answers.ros.org/question/294600/ros-melodic-does-it-support-turtlebot2/?answer=294603#post-id-294603&lt;/li&gt;
  &lt;li&gt;https://github.com/gaunthan/Turtlebot2-On-Melodic&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Wed, 11 Mar 2020 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2020/03/11/melodic-upgrade.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2020/03/11/melodic-upgrade.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Korabo: Getting users is tough</title>
        <description>&lt;h3 id=&quot;intro&quot;&gt;Intro&lt;/h3&gt;
&lt;p&gt;I spent the last couple of months developing and deploying a progressive webapp, partially because I thought I had a groundbreaking idea, and partially to sharpen my web development skills. The fully functioning web app can be found at https://www.korabo.io. Note that the application is intended for users with a US bank account.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Building and deploying a functioning MVP/prototype is relatively easy when compared to attracting users.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The extreme difficulty I am having in getting users to even sign up raises flags about the viability of this project.&lt;/p&gt;

&lt;h3 id=&quot;context&quot;&gt;Context&lt;/h3&gt;
&lt;h4 id=&quot;problem&quot;&gt;Problem:&lt;/h4&gt;
&lt;p&gt;The problem being addressed is the hassles of splitting proceeds after the fact with collaborators. This idea came about as I watched my wife try to split proceeds from a yoga workshop she hosted with a few other instructors.&lt;/p&gt;

&lt;h4 id=&quot;solution&quot;&gt;Solution:&lt;/h4&gt;
&lt;p&gt;The solution consists of setting the percentage share allocation prior to the service being delivered, and splitting proceeds on a per-payment basis. The application allows the user to specify percentage splits and create a customized checkout (e.g. video message) to share with customers via different types of links (url, QRcode), which directs customers to a credit card checkout powered by Stripe.&lt;/p&gt;

&lt;p&gt;An example of a link to a checkout is a customized shield, such as the one shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.korabo.io/checkout-page/5e2815526ca5270022f1a858/5e23f0b31b70f70022ff5933&quot;&gt;
   &lt;img src=&quot;https://img.shields.io/badge/korabo-donate-blue&quot; alt=&quot;donate&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;tools--services&quot;&gt;Tools &amp;amp; Services&lt;/h4&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Running Costs&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Front-end&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Material-UI, React&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Back-end&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Node, Express, MongoDB&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Payments&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Stripe&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;*free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Deployment&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Heroku&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$7/month&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Versioning&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;GitHub&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;CI&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;CircleCI&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Testing&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Jest&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;free&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Legal&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;iubenda&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$25/month&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Admin&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;gsuite, trello&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$6/month&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;:&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;$38/month&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Stripe charges 2.9% + $0.30 per payment (e.g. $3.20 on a $100 domestic charge), an additional 1% for international cards, plus the costs associated with maintaining and handling payouts for connected accounts.&lt;/p&gt;

&lt;h3 id=&quot;steps-taken-to-attract-users&quot;&gt;Steps Taken to Attract Users&lt;/h3&gt;
&lt;p&gt;I officially pushed master to production on Jan 18th, so it’s been less than a month since launch at the time of this writing. The net result of my efforts thus far is a total of 7 sign ups including myself, of which 3 are complete strangers. Of those 3 strangers, 1 fully completed the on-boarding process, which includes a Stripe account set up. See the Google Analytics results since releasing.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/ga-korabo.png&quot; alt=&quot;GA&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Jan 18th&lt;/em&gt;&lt;/strong&gt;: Reached out to 2 family/friends. 2/2 signed up, but both stopped at the Stripe account on-boarding step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Jan 21st&lt;/em&gt;&lt;/strong&gt;: Posted on some GT MSCS slack channels. The post was mostly ignored, but I did receive one inquiry about what makes this app different from Venmo or Cushion. The action resulted in 0 sign ups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Jan 28th&lt;/em&gt;&lt;/strong&gt;: Responded to Ask HN: “What interesting problems are you working on?”, which resulted in a few page views and 2 sign ups. My reply got 3 up-votes, which was exciting for someone relatively new to the community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Jan 29th&lt;/em&gt;&lt;/strong&gt;: Reached out to 2 “friends” and prior work colleagues which resulted in 0/2 sign ups. The exercise was disappointing for different reasons, thus the “” around friends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feb 4th&lt;/em&gt;&lt;/strong&gt;: Posted an inquiry on reddit for design jobs, as I needed proper logo design work done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feb 5th&lt;/em&gt;&lt;/strong&gt;: Coincidentally, replied to a separate Ask HN thread where a member was soliciting logo design work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feb 6th&lt;/em&gt;&lt;/strong&gt;: Posted a job on UpWork for logo design work. This resulted in 48 views and 2 proposals. I wasn’t anticipating any sign ups, and indeed the post resulted in 0 sign ups and 0 completed jobs, as the professionals behind the proposals weren’t a fit for the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feb 7th&lt;/em&gt;&lt;/strong&gt;: Used the Show HN functionality. Probably the most disappointing result against my personally set expectations (clearly misplaced and high), as I was expecting some sort of response in the form of comments or votes! Phrases like &lt;em&gt;reality is harsh&lt;/em&gt; and &lt;em&gt;the truth hurts&lt;/em&gt; really hit home here. That said, the exercise did result in an uptick in page views, albeit off a very low base. Final result: 0 sign ups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feb 9th&lt;/em&gt;&lt;/strong&gt;: Created a Twitter account and started to follow members that used relevant hashtags, as well as members in what I viewed as relevant professions. 0 sign ups.&lt;/p&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;

&lt;p&gt;In reality it’s likely too soon to draw any conclusions, as it’s been less than a month, but the preliminary results feel disappointing to say the least.&lt;/p&gt;
</description>
        <pubDate>Fri, 14 Feb 2020 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/korabo/2020/02/14/korabo-users.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/korabo/2020/02/14/korabo-users.html</guid>
        
        
        <category>korabo</category>
        
      </item>
    
      <item>
        <title>Autonomous Intelligent Systems #3: Robot Mapping</title>
        <description>&lt;p&gt;Continuing the self-study exercise of working through the &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ss18/robotics/&quot;&gt;Robot Mapping&lt;/a&gt; course offered by the Autonomous Intelligent Systems lab at the University of Freiburg.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/06/01/ais.html&quot;&gt;Autonomous Intelligent Systems #1: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/10/07/ais.html&quot;&gt;Autonomous Intelligent Systems #2: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/ekf/2019/11/10/ais-ekf-slam.html&quot;&gt;Autonomous Intelligent Systems #3: Robot Mapping&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Similar to past related exercises, I am completing the programming tasks in Python as opposed to MATLAB.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/surfertas/ais_robot_mapping/tree/master/sheet2&quot;&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before attempting the problem set (sheets), complete the slides + recording on the following topic.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;EKF SLAM [&lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws17/mapping/pdf/slam05-ekf-slam.pdf&quot;&gt;slides&lt;/a&gt;][&lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws17/mapping/videos/05-EKF.mp4&quot;&gt;recording&lt;/a&gt;]&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws18/mapping/exercise/robotMappingSheet04.pdf&quot;&gt;Sheet 4&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;exercise-1-implement-the-prediction-step-of-the-ekf-slam-algorithm&quot;&gt;Exercise 1: Implement the prediction step of the EKF SLAM algorithm.&lt;/h4&gt;
&lt;p&gt;This was relatively straightforward, as the task just requires updating the respective indices of the state with the motion model and the covariance with the Jacobian of the motion model. This can be found in &lt;a href=&quot;https://github.com/surfertas/ais_robot_mapping/blob/master/sheet2/ekf_slam/prediction_step.py&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;prediction_step.py&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
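
&lt;p&gt;As a condensed illustration of that step (my own sketch in Python, not the exact code from the repo; the motion-noise values are placeholders), using the odometry motion model:&lt;/p&gt;

```python
import numpy as np

def prediction_step(mu, sigma, odom):
    """EKF SLAM prediction: only the robot pose (the first 3 state
    entries) is moved by the odometry motion model; landmarks are
    untouched."""
    r1, t, r2 = odom["r1"], odom["t"], odom["r2"]
    theta = mu[2]

    # Apply the motion model to the pose part of the state.
    mu = mu.copy()
    mu[0] += t * np.cos(theta + r1)
    mu[1] += t * np.sin(theta + r1)
    mu[2] += r1 + r2

    n = len(mu)
    # Jacobian of the motion model w.r.t. the pose, embedded in an identity
    # so that the landmark blocks pass through unchanged.
    G = np.eye(n)
    G[0, 2] = -t * np.sin(theta + r1)
    G[1, 2] = t * np.cos(theta + r1)

    # Motion noise affects only the pose block (placeholder values).
    R = np.zeros((n, n))
    R[:3, :3] = np.diag([0.1, 0.1, 0.01])

    sigma = G @ sigma @ G.T + R
    return mu, sigma
```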

&lt;h4 id=&quot;exercise-2-implement-the-correction-step&quot;&gt;Exercise 2: Implement the correction step.&lt;/h4&gt;

&lt;p&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main.py&lt;/code&gt; which should generate an image for each time step. The figures
will be saved to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/plots&lt;/code&gt; directory. Next, from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/plots&lt;/code&gt; directory run
the following command to generate a video from the images.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ffmpeg -r 10 -start_number 0 -i 'odom_%d.png' -b 500000  odom.mp4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The output should represent a stream of a robot in motion with the visualization of the
landmarks that the robot is sensing at each time step.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/pe9tHavg-w4&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;A couple things to note:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
&lt;p&gt;It is possible that the Jacobian for the observation shown on &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws17/mapping/pdf/slam05-ekf-slam.pdf&quot;&gt;pg. 39&lt;/a&gt; is incorrect. Specifically, the partial derivative of \(\text{atan2}(\Delta y, \Delta x) - \mu_{t,\theta}\) with respect to the heading is \(-1\) based on my derivation, as opposed to the \(-q\) shown on pg. 39. I am in the process of double checking with the author.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
&lt;p&gt;The implementation suggested by the assignment was to compute the Kalman gain after the stacked Jacobian of the observation was fully constructed for all observations at that time step. I found that this led to unstable results; computing the Kalman gain after each individual observation instead was more stable.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
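
&lt;p&gt;The per-observation variant described in the second point can be sketched as follows (illustrative only; the packaging of observations as (z, h, H) tuples and the measurement noise Q are my own assumptions, not the repo’s actual interface):&lt;/p&gt;

```python
import numpy as np

def correction_step_sequential(mu, sigma, observations, Q):
    """Apply the standard EKF correction once per observation instead of
    stacking all Jacobians; each update conditions the state before the
    next observation is processed."""
    for z, h, H in observations:  # measurement, predicted measurement, Jacobian
        S = H @ sigma @ H.T + Q             # innovation covariance
        K = sigma @ H.T @ np.linalg.inv(S)  # Kalman gain for this observation
        nu = z - h                          # innovation
        # Normalize the bearing component (assumed to be index 1) to [-pi, pi).
        nu[1] = (nu[1] + np.pi) % (2 * np.pi) - np.pi
        mu = mu + K @ nu
        sigma = (np.eye(len(mu)) - K @ H) @ sigma
    return mu, sigma
```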

&lt;p&gt;TODO: Implement the visualization of the probability ellipses.&lt;/p&gt;
</description>
        <pubDate>Sun, 10 Nov 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ais/ekf/2019/11/10/ais-ekf-slam.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ais/ekf/2019/11/10/ais-ekf-slam.html</guid>
        
        
        <category>ais</category>
        
        <category>ekf</category>
        
      </item>
    
      <item>
        <title>ROScon2019: Follow up ToDo list</title>
<description>&lt;p&gt;I was fortunate enough to attend &lt;a href=&quot;https://roscon.ros.org/2019/&quot;&gt;ROSCon2019&lt;/a&gt; held in Macau, China this year. I learned more about ROS and the community than I have in the past X years trying to learn ROS independently out of personal interest. It was amazing how many companies were working on cutting-edge technology/applications. Two questions really stuck with me. The first was about whether robotics is driven by roboticists or software engineers. The second was related to hiring: questions were posed to the audience asking who was hiring and who was looking, and the number of hands raised for hiring significantly outnumbered the hands for looking. That aside, it was great to put faces to names.&lt;/p&gt;

&lt;h4 id=&quot;raw--unedited-todo-list&quot;&gt;Raw &amp;amp; unedited todo list&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Walk through &lt;a href=&quot;https://github.com/ros2-realtime-demo/pendulum&quot;&gt;real time demo&lt;/a&gt; and papers shared by Victor at Alias Robotics in his &lt;a href=&quot;https://bit.ly/2pLNI4I&quot;&gt;presentation&lt;/a&gt;.
    &lt;ul&gt;
      &lt;li&gt;Towards a distributed and real-time framework for robots: Evaluation of ROS 2.0 communications for real-time robotic applications.&lt;/li&gt;
      &lt;li&gt;Real-time Linux communications: an evaluation of the Linux communication stack for real-time robotic applications.&lt;/li&gt;
      &lt;li&gt;Time-Sensitive Networking for robotics.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;DRAFT: Review https://micro-ros.github.io/ and demo&lt;/li&gt;
  &lt;li&gt;Review &lt;a href=&quot;https://github.com/boschresearch/ros2_response_time_analysis&quot;&gt;Response Time Analysis of ROS2 Processing Chains&lt;/a&gt; introduced by Igor at Bosch&lt;/li&gt;
  &lt;li&gt;Rewatch the keynote by Ian Sherman at Formant, which among other topics discussed bringing solutions found in backend microservices into the robotics space.&lt;/li&gt;
  &lt;li&gt;Reproduce the results obtained by iRobot engineers in running ROS2 on raspi. [&lt;a href=&quot;https://github.com/Irobot-ros/ros2-performance&quot;&gt;github&lt;/a&gt;]&lt;/li&gt;
  &lt;li&gt;DRAFT: Implement some of the suggestions made at “188 ROS bugs later: Where do we go from here?” by Christopher Timperley of CMU and Andrzej Wasowski of IT University of Copenhagen to write better ROS code.
    &lt;ul&gt;
      &lt;li&gt;CI -build passing&lt;/li&gt;
      &lt;li&gt;use smoke test to find missing run time&lt;/li&gt;
      &lt;li&gt;linters ament_lint&lt;/li&gt;
      &lt;li&gt;cppcheck, mypy&lt;/li&gt;
      &lt;li&gt;use sanitizers&lt;/li&gt;
      &lt;li&gt;mithra: oracle learning for simulation-based testing&lt;/li&gt;
      &lt;li&gt;HAROS&lt;/li&gt;
      &lt;li&gt;phriky-units&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Implement the one thing Nicolo from Cruise Automation wanted the audience to remember: the use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros::TransportHints().tcpNoDelay()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosbag record --tcpnodelay&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Look into &lt;a href=&quot;https://husarnet.com/&quot;&gt;Husarnet&lt;/a&gt;, a P2P network layer for robots and IoT with first-class ROS support.&lt;/li&gt;
  &lt;li&gt;Since I missed day 2, watch day 2 live streams!&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Wed, 06 Nov 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros2/roscon2019/2019/11/06/roscon2019.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros2/roscon2019/2019/11/06/roscon2019.html</guid>
        
        
        <category>ros2</category>
        
        <category>ROScon2019</category>
        
      </item>
    
      <item>
        <title>ROS2: Cross compile package for Raspberry Pi 3 B+</title>
        <description>&lt;p&gt;Notes related to cross compiling ros2 packages for a Raspberry Pi 3 B+ running Ubuntu Mate 18.04.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; I am trying to work on a DIY “see through wall” application where I mount an IP camera (e.g. Amcrest) in the family room, and view the image stream from the room next door, ideally keeping the perspective consistent across rooms.&lt;/p&gt;

&lt;p&gt;The set up I have in mind consists of an IP camera powered by POE. The &lt;a href=&quot;https://github.com/surfertas/ros2_ipcamera&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros2_ipcamera&lt;/code&gt;&lt;/a&gt; node would be running on a raspi in a different room, tasked to retrieve the images from the rtsp uri and publish them for viewing on a monitor or webapp.&lt;/p&gt;

&lt;p&gt;The following notes solve the task of getting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros2_ipcamera&lt;/code&gt; node running on a raspi.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/alsora&quot;&gt;Alberto&lt;/a&gt;, the engineer that put the work into creating the cross-compile &lt;a href=&quot;https://github.com/alsora/cross_compile/tree/alsora/refactor_cross_compile&quot;&gt;GitHub repo&lt;/a&gt; (currently a PR to replace the original as of Oct. 14, 2019), was extremely helpful as I worked through my gaps in understanding of the cross-compile process. Thanks!!!&lt;/p&gt;

&lt;h4 id=&quot;preliminary-steps&quot;&gt;Preliminary Steps&lt;/h4&gt;
&lt;p&gt;Preliminary steps to be taken on the target system, in this case the raspi.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Flash Ubuntu Mate 18.04 onto a mini SD.&lt;/li&gt;
  &lt;li&gt;Install &lt;a href=&quot;https://index.ros.org/doc/ros2/Installation/Dashing/Linux-Install-Debians/&quot;&gt;ros2-dashing-desktop&lt;/a&gt;. &lt;em&gt;Note that base won’t work in this particular case as we need some dependencies related to OpenCV.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;Install &lt;a href=&quot;https://index.ros.org/doc/ros2/Tutorials/Colcon-Tutorial/&quot;&gt;colcon&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source /opt/ros/dashing/setup.bash&lt;/code&gt;. We can quickly test by checking to see if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros2 topic list&lt;/code&gt; works.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;cross-compile&quot;&gt;Cross Compile&lt;/h4&gt;
&lt;p&gt;Follow steps outlined &lt;a href=&quot;https://github.com/alsora/cross_compile/tree/alsora/refactor_cross_compile&quot;&gt;here&lt;/a&gt; to cross compile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Since I was going to compile &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros2_ipcamera&lt;/code&gt; directly on the raspi, I needed to transfer the entire &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;install&lt;/code&gt; directory as opposed to just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;install/lib&lt;/code&gt; directory per the example given by Alberto.&lt;/p&gt;

&lt;p&gt;Further, since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vision_opencv&lt;/code&gt; depends on OpenCV, I needed to add a line to the original &lt;a href=&quot;https://github.com/alsora/cross_compile/blob/alsora/refactor_cross_compile/sysroots/Dockerfile_ubuntu-arm64&quot;&gt;dockerfile&lt;/a&gt;. The modified dockerfile can be found &lt;a href=&quot;https://gist.github.com/surfertas/e4299de2241d5ff3f19159914fee482a&quot;&gt;here&lt;/a&gt;. &lt;strong&gt;If a reader is aware of an easier way to install OpenCV, please leave me a note below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For this particular project, the steps modified for my specific use case were as follows.&lt;/p&gt;

&lt;p&gt;On host system:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ git clone -b alsora/refactor_cross_compile https://github.com/alsora/cross_compile
$ mkdir -p ~/ros2_ws/src
$ cd ~/ros2_ws
$ wget https://raw.githubusercontent.com/ros2/ros2/dashing/ros2.repos
$ vcs import src &amp;lt; ros2.repos
$ git clone -b ros2 https://github.com/ros-perception/image_common src/ros-perception/image_common
$ git clone -b ros2 https://github.com/ros-perception/vision_opencv src/ros-perception/vision_opencv

$ cd ~/cross_compile

# Build all the docker images contained in the docker_environments directory.
$ bash build.sh

# Specify environment variable (e.g. ubuntu-arm64, raspbian).
$ source env.sh ubuntu-arm64

$ bash get_sysroot.sh

# Script to add COLCON_IGNORE to directories to be ignored and can modify as necessary.
$ bash ignore_pkgs.sh ~/ros2_cc_ws dashing

$ bash cc_workspace.sh ~/ros2_cc_ws

# Transfer install directory to target system.
$ rsync -avz --progress install/* user@address:~/ros2_install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On target system:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ source ~/ros2_install/install/setup.bash
$ mkdir -p ~/ros2_ws/src
$ cd ~/ros2_ws/src
$ git clone https://github.com/surfertas/ros2_ipcamera.git
$ cd ..
$ colcon build --symlink-install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h4&gt;
&lt;p&gt;Overall this was a great exercise to learn more about ros2, cross compiling, IP cameras, POE, and general software development. That said, I am pretty sure this is not the best set up for the “see through wall” application, as a much simpler solution can be considered. Further, despite being able to compile the package on the raspi, the performance of the node is poor, which is more the result of my own sub-optimal decisions.&lt;/p&gt;
</description>
        <pubDate>Mon, 14 Oct 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros2/cross-compile/2019/10/14/crosscompile.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros2/cross-compile/2019/10/14/crosscompile.html</guid>
        
        
        <category>ros2</category>
        
        <category>cross-compile</category>
        
      </item>
    
      <item>
        <title>Autonomous Intelligent Systems #2: Robot Mapping</title>
        <description>&lt;p&gt;Continuing the documentation of a self-study exercise of working through the &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws18/mapping/&quot;&gt;Robot
Mapping&lt;/a&gt; course offered by Autonomous Intelligent Systems lab at the University of Freiburg.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/06/01/ais.html&quot;&gt;Autonomous Intelligent Systems #1: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/10/07/ais.html&quot;&gt;Autonomous Intelligent Systems #2: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/ekf/2019/11/10/ais-ekf-slam.html&quot;&gt;Autonomous Intelligent Systems #3: Robot Mapping&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: sheet 2 was skipped as it was just an example application of Bayes’ theorem.&lt;/p&gt;

&lt;p&gt;Before attempting the problem set (sheets) complete the &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws17/mapping/pdf/slam04-ekf.pdf&quot;&gt;slides&lt;/a&gt; + &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws14/mapping/videos/04-EKF.mp4&quot;&gt;recording&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws18/mapping/exercise/robotMappingSheet03.pdf&quot;&gt;Sheet 3&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;exercise-1a--describe-briefly-the-two-main-steps-of-the-bayes-filter-in-your-own-words&quot;&gt;&lt;strong&gt;Exercise 1.a&lt;/strong&gt;:  Describe briefly the two main steps of the Bayes filter in your own words.&lt;/h4&gt;

&lt;p&gt;The two main steps consist of a predict step and an update step. The &lt;strong&gt;predict step&lt;/strong&gt; propagates a belief of the state and error covariance through a system model to obtain an &lt;em&gt;a priori&lt;/em&gt; estimate for the next time step. [1] The &lt;strong&gt;update step&lt;/strong&gt; takes the &lt;em&gt;a priori&lt;/em&gt; estimate and corrects it with a measurement, incorporating the new information to obtain the &lt;em&gt;a posteriori&lt;/em&gt; estimate.&lt;/p&gt;
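
&lt;p&gt;Written out in the standard Bayes filter notation, with \(\eta\) a normalizing constant, the two steps are:&lt;/p&gt;

&lt;p&gt;\[ \overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1}) \, bel(x_{t-1}) \, dx_{t-1}, \qquad bel(x_t) = \eta \, p(z_t \mid x_t) \, \overline{bel}(x_t) \]&lt;/p&gt;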

&lt;h4 id=&quot;exercise-1b-describe-briefly-the-meaning-of-the-following-probability-density-functions&quot;&gt;&lt;strong&gt;Exercise 1.b&lt;/strong&gt;: Describe briefly the meaning of the following probability density functions.&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;\(p(x_t | u_t, x_{t-1})\) → distribution of the state at time \(t\), given a control action applied from the state at time \(t-1\).&lt;/li&gt;
  &lt;li&gt;\(p(z_t | x_t)\) →  distribution of a given measurement at time \(t\), given the state at time \(t\).&lt;/li&gt;
  &lt;li&gt;\(bel(x_t)\) →  the belief of the state at time \(t\): the &lt;em&gt;a posteriori&lt;/em&gt; estimate of the state distribution after incorporating the measurement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;exercise-1c-specify-the-distributions-that-corresponds-to-the-above-mentioned-3-terms-in-the-ekf&quot;&gt;&lt;strong&gt;Exercise 1.c&lt;/strong&gt;: Specify the distributions that corresponds to the above mentioned 3 terms in the EKF.&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;\(p(x_t | u_t, x_{t-1})\) →  The mean is \(g(u_t, \mu_{t-1})\), while the covariance is the matrix &lt;strong&gt;Q&lt;/strong&gt; representing the process noise.&lt;/li&gt;
  &lt;li&gt;\(p(z_t | x_t)\) → The mean is \(h(\mu_t)\), while the covariance is the matrix &lt;strong&gt;R&lt;/strong&gt; representing the measurement noise.&lt;/li&gt;
  &lt;li&gt;\(bel(x_t)\) → a Gaussian with mean \(\mu_t\), the state estimate, and covariance \(\Sigma_t\).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;exercise-1d-explain-in-a-few-sentences-all-of-the-components-of-the-ekf-algorithm&quot;&gt;&lt;strong&gt;Exercise 1.d&lt;/strong&gt;: Explain in a few sentences all of the components of the EKF algorithm.&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;\(\mu_t\) → state estimate at time \(t\), (n,1)&lt;/li&gt;
  &lt;li&gt;\(\Sigma_t\) → covariance matrix at time \(t\), (n,n)&lt;/li&gt;
  &lt;li&gt;\(\bar{\mu_t}\) → state estimate propagated through time using the process model, (n,1)&lt;/li&gt;
  &lt;li&gt;\(\bar{\Sigma_t}\) → covariance matrix at time \(t\) updated with the Jacobian, &lt;strong&gt;G&lt;/strong&gt;, and process noise, &lt;strong&gt;Q&lt;/strong&gt;, (n,n)&lt;/li&gt;
  &lt;li&gt;\(g\) → function to update the state based on some motion/process/system model, (n,1)&lt;/li&gt;
  &lt;li&gt;\(G_t^x \) → Jacobian of the motion function, (n,n)&lt;/li&gt;
  &lt;li&gt;\(R_t\)  → measurement noise, (m,m)&lt;/li&gt;
  &lt;li&gt;\(h\) → measurement function, (m,1)&lt;/li&gt;
  &lt;li&gt;\(H_t^x\) → Jacobian of \(h\), (m,n)&lt;/li&gt;
  &lt;li&gt;\(Q_t\) → process noise, (n,n)&lt;/li&gt;
  &lt;li&gt;\(K_t\) → Kalman gain, (n,m)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;exercise-2a-derive-the-jacobian-matrix-g_tx--of-the-noise-free-motion-function-g-with-respect-to-the-pose-of-the-robot-use-the-odometry-motion-model-as-in-exercise-sheet-1&quot;&gt;&lt;strong&gt;Exercise 2.a&lt;/strong&gt;: Derive the Jacobian matrix \(G_t^x \) of the noise-free motion function \(g\) with respect to the pose of the robot. Use the odometry motion model as in exercise sheet 1.&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Apologies for the sloppy handwriting. Will port to LaTeX at some point.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/J.png&quot; alt=&quot;J&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;exercise-2b-derive-the-jacobian-matrix-h_ti-of-the-noise-free-sensor-function-h-corresponding-to-the-ith-landmark&quot;&gt;&lt;strong&gt;Exercise 2.b&lt;/strong&gt;: Derive the Jacobian matrix \(H_t^i\) of the noise-free sensor function \(h\) corresponding to the \(i^{th}\) landmark.&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/H.png&quot; alt=&quot;H&quot; /&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 07 Oct 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ais/2019/10/07/ais.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ais/2019/10/07/ais.html</guid>
        
        
        <category>ais</category>
        
      </item>
    
      <item>
        <title>ROS2: Quality of Service (QoS)</title>
        <description>&lt;h4 id=&quot;quality-of-service-qos&quot;&gt;Quality of Service (QoS)&lt;/h4&gt;

&lt;p&gt;12/8/2020: Edited to note a key concept from the &lt;a href=&quot;https://index.ros.org/doc/ros2/Concepts/About-Quality-of-Service-Settings/&quot;&gt;docs&lt;/a&gt; with respect to QoS.&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;A connection between a publisher and a subscription is only made if the pair has compatible QoS profiles.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ROS2 allows for granular management of the quality of service (QoS) by exposing the QoS profile. The profile is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;struct&lt;/code&gt; defined as follows in &lt;a href=&quot;https://github.com/ros2/rmw/blob/master/rmw/include/rmw/types.h&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;types.h&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/// ROS MiddleWare quality of service profile. *
typedef struct RMW_PUBLIC_TYPE rmw_qos_profile_t
{
  enum rmw_qos_history_policy_t history;
  size_t depth;
  enum rmw_qos_reliability_policy_t reliability;
  enum rmw_qos_durability_policy_t durability;
  struct rmw_time_t deadline;
  struct rmw_time_t lifespan;
  enum rmw_qos_liveliness_policy_t liveliness;
  struct rmw_time_t liveliness_lease_duration;
  bool avoid_ros_namespace_conventions;
} rmw_qos_profile_t;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Modified for readability&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the QoS settings are not modified, the default settings are applied. For example, the default QoS settings for publishers and subscribers are specified by the profile &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmw_qos_profile_default&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;static const rmw_qos_profile_t rmw_qos_profile_default =
{
  RMW_QOS_POLICY_HISTORY_KEEP_LAST,
  10,
  RMW_QOS_POLICY_RELIABILITY_RELIABLE,
  RMW_QOS_POLICY_DURABILITY_VOLATILE,
  RMW_QOS_DEADLINE_DEFAULT,
  RMW_QOS_LIFESPAN_DEFAULT,
  RMW_QOS_POLICY_LIVELINESS_SYSTEM_DEFAULT,
  RMW_QOS_LIVELINESS_LEASE_DURATION_DEFAULT,
  false
};
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This profile isn’t particularly fitting for sensor data, where in most use cases the priority is placed on receiving data in a timely fashion as opposed to receiving all of the data; some reliability can be sacrificed. Predefined profiles are available for such cases, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmw_qos_profile_sensor_data&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example use cases:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;//example1.hpp
  rmw_qos_profile_t qos_profile = rmw_qos_profile_sensor_data;

//example1.cpp
auto qos = rclcpp::QoS(
    rclcpp::QoSInitialization(
      qos_profile.history,
      qos_profile.depth
    ),
    qos_profile);

pub_ = create_publisher&amp;lt;sensor_msgs::msg::Image&amp;gt;(topic_, qos);


//example2.hpp
  rclcpp::QoS qos_;

//example2.cpp

RtspStreamer::RtspStreamer(const rclcpp::NodeOptions &amp;amp; options)
: Node(&quot;rtsp_streamer&quot;, options),
  qos_(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_sensor_data))
{
  ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://docs.ros2.org/latest/api/rclcpp/qos_8hpp_source.html&quot;&gt;QoS signature definition&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A more complete example can be found in the &lt;a href=&quot;https://github.com/ros2/demos/blob/master/image_tools/src/cam2image.cpp&quot;&gt;cam2image&lt;/a&gt; demo.&lt;/p&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://index.ros.org/doc/ros2/Concepts/About-Quality-of-Service-Settings/&lt;/li&gt;
  &lt;li&gt;http://design.ros2.org/articles/qos.html&lt;/li&gt;
  &lt;li&gt;https://github.com/ros2/demos/blob/master/image_tools/src/cam2image.cpp&lt;/li&gt;
  &lt;li&gt;https://github.com/ros2/rmw/blob/master/rmw/include/rmw/qos_profiles.h&lt;/li&gt;
  &lt;li&gt;https://github.com/ros2/rmw/blob/master/rmw/include/rmw/types.h&lt;/li&gt;
  &lt;li&gt;https://answers.ros.org/question/298594/how-does-ros2-select-the-default-qos-profile/&lt;/li&gt;
  &lt;li&gt;https://docs.ros2.org/latest/api/rclcpp/qos_8hpp_source.html&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Sat, 17 Aug 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros2/2019/08/17/ros2-qos.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros2/2019/08/17/ros2-qos.html</guid>
        
        
        <category>ros2</category>
        
      </item>
    
      <item>
        <title>Autonomous Intelligent Systems #1: Robot Mapping</title>
        <description>&lt;p&gt;Documenting a self-study exercise of working through the &lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ss18/robotics/&quot;&gt;Robot
Mapping&lt;/a&gt; course offered
by the Autonomous Intelligent Systems lab at the University of Freiburg.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/06/01/ais.html&quot;&gt;Autonomous Intelligent Systems #1: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/2019/10/07/ais.html&quot;&gt;Autonomous Intelligent Systems #2: Robot Mapping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/ais/ekf/2019/11/10/ais-ekf-slam.html&quot;&gt;Autonomous Intelligent Systems #3: Robot Mapping&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The course uses MATLAB, but as an extra personal challenge I am porting the code
to Python 3 as I proceed.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/surfertas/ais_robot_mapping/tree/master/sheet1&quot;&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before attempting the problem set (sheets), complete the slides + recording on the
following topics.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Course Introduction&lt;/li&gt;
  &lt;li&gt;Introduction to Robot Mapping&lt;/li&gt;
  &lt;li&gt;Homogeneous Coordinates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href=&quot;http://ais.informatik.uni-freiburg.de/teaching/ws18/mapping/exercise/robotMappingSheet01.pdf&quot;&gt;Sheet 1&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;exercise-1-skipping-as-just-an-intro-to-octave&quot;&gt;Exercise 1: Skipping as just an intro to Octave.&lt;/h4&gt;
&lt;h4 id=&quot;exercise-2-implement-an-odometry-model&quot;&gt;Exercise 2: Implement an odometry model.&lt;/h4&gt;
&lt;p&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main.py&lt;/code&gt; which should generate an image for each time step. The figures
will be saved to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/plots&lt;/code&gt; directory. Next, from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/plots&lt;/code&gt; directory run
the following command to generate a video from the images.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ avconv -r 10 -start_number 0 -i 'odom_%d.png' -b 500000  odom.mp4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The output should show the robot in motion, with a visualization of the
landmarks the robot senses at each time step.&lt;/p&gt;
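
&lt;p&gt;The update behind each frame can be sketched as follows (a minimal version of the standard odometry motion model; the function name and sample values are mine, and the sheet's data parsing is omitted):&lt;/p&gt;

```python
import math

def odometry_step(pose, rot1, trans, rot2):
    """Standard odometry motion model: rotate by rot1, translate, rotate by rot2."""
    x, y, theta = pose
    x += trans * math.cos(theta + rot1)
    y += trans * math.sin(theta + rot1)
    theta += rot1 + rot2
    return (x, y, theta)

# Starting at the origin facing +x, turn 90 degrees and drive one unit.
pose = odometry_step((0.0, 0.0, 0.0), rot1=math.pi / 2, trans=1.0, rot2=0.0)
```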

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/b_0L9LwvAww&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h4 id=&quot;exercise-3&quot;&gt;Exercise 3:&lt;/h4&gt;
&lt;h5 id=&quot;3a&quot;&gt;3.a&lt;/h5&gt;
&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2t()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t2v()&lt;/code&gt; are defined &lt;a href=&quot;https://github.com/surfertas/ais_robot_mapping/blob/master/sheet1/robot_mapping/homogenous_transformation.py&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Chaining transformations is the result of applying the dot operator to each transformation matrix in order. An example of a composition can be found &lt;a href=&quot;https://github.com/surfertas/ais_robot_mapping/blob/master/sheet1/tests/test_homogenous_transformation.py&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h5 id=&quot;3b-given-two-robot-poses-x_1-and-x_2-how-do-you-get-the-relative-transformation-from-x_1to-x_2&quot;&gt;3.b Given two robot poses \(x_1\) and \(x_2\), how do you get the relative transformation from \(x_1\) to \(x_2\)?&lt;/h5&gt;

&lt;h5 id=&quot;3c-given-a-robot-pose-and-observation-z-of-a-landmark-relative-to-x_t-compute-the-location-of-the-landmark&quot;&gt;3.c Given a robot pose and observation z of a landmark relative to \(x_t\) compute the location of the landmark.&lt;/h5&gt;

&lt;p&gt;We can complete the exercise by converting the robot pose to a homogeneous
transformation using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2t()&lt;/code&gt; function and taking the dot product with the
homogeneous representation of observation \(z\). Note that if we were given the
location of the landmark in the world frame, we would need the inverse of the
transformed pose \(x_t\), to map the landmark location to a coordinate in the pose
frame.&lt;/p&gt;
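
&lt;p&gt;A minimal Python 3 sketch covering 3.a&amp;#8211;3.c (this mirrors my port, not the course code, and assumes the usual \((x, y, \theta)\) pose convention; the sample poses are made up):&lt;/p&gt;

```python
import numpy as np

def v2t(pose):
    """Pose vector (x, y, theta) -> 3x3 homogeneous transformation."""
    x, y, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def t2v(T):
    """3x3 homogeneous transformation -> pose vector (x, y, theta)."""
    return np.array([T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])])

x1 = np.array([1.0, 2.0, np.pi / 2])
x2 = np.array([2.0, 3.0, np.pi])

# 3.b: the relative transformation from x1 to x2 composes the inverse of
# the first pose with the second.
T_12 = np.linalg.inv(v2t(x1)) @ v2t(x2)

# 3.c: a landmark observed at z (homogeneous, in the frame of pose x2)
# is mapped into the world frame by the pose's transformation.
z = np.array([0.5, 0.0, 1.0])
landmark_world = v2t(x2) @ z
```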
</description>
        <pubDate>Sat, 01 Jun 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ais/2019/06/01/ais.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ais/2019/06/01/ais.html</guid>
        
        
        <category>ais</category>
        
      </item>
    
      <item>
        <title>Before I forget #1: FFT algorithm</title>
        <description>&lt;h3 id=&quot;fft-algorithm&quot;&gt;FFT Algorithm&lt;/h3&gt;

&lt;h4 id=&quot;intro&quot;&gt;Intro&lt;/h4&gt;
&lt;p&gt;A polynomial in coefficient form (e.g. \(1+x-x^2+x^5\)) can be converted to
point-value form in \(O(n \log n)\) time.&lt;/p&gt;

&lt;p&gt;Multiplying two polynomials directly in coefficient form would require \(O(n^2)\)
time. Instead, we can convert each to point-value form through
&lt;strong&gt;evaluation&lt;/strong&gt; in \(O(n \log n)\) using what's known as the FFT algorithm.
Multiplication in point-value form requires only \(O(n)\) time. Once the multiplication
is complete we can &lt;strong&gt;interpolate&lt;/strong&gt; to retrieve the coefficient
representation of the product.&lt;/p&gt;

&lt;h4 id=&quot;a-concrete-example-is-as-follows&quot;&gt;A concrete example is as follows.&lt;/h4&gt;

&lt;p&gt;Let:
\[A(x) = 1 + x + 2x^2\]
\[B(x) = 2 + 3x\]&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
&lt;p&gt;Find the degree of each of \(A(x)\) and \(B(x)\). Let \(n=4\), the smallest
power of 2 greater than \(deg(A) + deg(B)\) (we need at least \(deg(A)+deg(B)+1\) evaluation points).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Use the \(n\)-th roots of unity. In this case we need the \(4\)-th roots of unity
which are \(w_4 = &amp;lt;1,i,-1,-i&amp;gt;\)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Pass the coefficients of the polynomials of \(A(x)\), \(a=[1,1,2,0]\) and the 4-th
roots of unity, \(w_4\) to the FFT algorithm. Point-value form = \(FFT(a,
w_4)\). Repeat
for \(B(x)\).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Once we have the results, use element-wise multiplication to obtain
\(c\), the point-value representation of \(C(x)\). To interpolate, take the inverse of
\(w_4\), \(w_4^{-1} = &amp;lt;1,-i,-1,i&amp;gt;\), together with \(c=[20, -5-i, -2, -5+i]\) and compute \(\frac{1}{n} FFT(c, w_4^{-1})\)
to obtain the coefficients \([2,5,7,6]\), which map to the polynomial \(2+5x+7x^2+6x^3\).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
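
&lt;p&gt;The four steps above can be sanity-checked with NumPy. Note that np.fft uses the conjugate-root convention, so the intermediate point values are ordered differently from the hand computation below, but the recovered coefficients agree:&lt;/p&gt;

```python
import numpy as np

n = 4  # smallest power of 2 covering deg(A) + deg(B) + 1 points
a = np.array([1, 1, 2, 0], dtype=complex)  # A(x) = 1 + x + 2x^2
b = np.array([2, 3, 0, 0], dtype=complex)  # B(x) = 2 + 3x

# Evaluation: point-value form at the n-th roots of unity.
A_pts = np.fft.fft(a)
B_pts = np.fft.fft(b)

# Element-wise multiplication, then interpolation via the inverse FFT.
c = np.fft.ifft(A_pts * B_pts).real.round().astype(int)
print(c)  # [2 5 7 6], i.e. C(x) = 2 + 5x + 7x^2 + 6x^3
```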

&lt;h4 id=&quot;fft-a-divide--conquer-dc-approach&quot;&gt;FFT: a Divide &amp;amp; Conquer (D&amp;amp;C) approach&lt;/h4&gt;

&lt;p&gt;A recursive algorithm:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
def FFT(a, w):
    if len(w) == 1:
        return a

    a_even = a[0::2]   # even-indexed coefficients
    a_odd  = a[1::2]   # odd-indexed coefficients

    S = FFT(a_even, w^2)   # w^2 = squares of w, the (n/2)-th roots of unity
    O = FFT(a_odd, w^2)

    for i in 0 to n/2 - 1:
        r[i]       = S[i] + w[i] * O[i]
        r[i + n/2] = S[i] - w[i] * O[i]
    return [r[0], ..., r[n-1]]

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A concrete example using the polynomials A(x) and B(x) introduced previously.
Sorry for the formatting but it will make sense if you stare at it long enough.&lt;/p&gt;

&lt;h5 id=&quot;level-1&quot;&gt;&lt;strong&gt;LEVEL 1&lt;/strong&gt;&lt;/h5&gt;

&lt;p&gt;FFT(\(a=[1,1,2,0]\), \(w_4=[1,i,-1,-i]\)):&lt;/p&gt;

&lt;p&gt;\(|w_4|\) \(\neq\) \(1\) so continue.&lt;/p&gt;

&lt;p&gt;a_even \(= [1,2]\)&lt;/p&gt;

&lt;p&gt;a_odd \(= [1,0]\)&lt;/p&gt;

&lt;p&gt;\(S\) = \(\color{blue}{FFTEVEN}\)(a_even\(=[1,2]\), \(w_4^2=w_2=[1,-1]\)) \(= [3,-1]\)&lt;/p&gt;

&lt;p&gt;\(O\) = \(\color{purple}{FFTODD}\)(a_odd\(=[1,0], w_4^2=w_2=[1,-1]\)) \(= [1, 1]\)&lt;/p&gt;

&lt;p&gt;return \([4, -1+i, 2, -1-i]\)&lt;/p&gt;

&lt;h5 id=&quot;level-2&quot;&gt;&lt;strong&gt;LEVEL 2&lt;/strong&gt;&lt;/h5&gt;

&lt;p&gt;\(\color{blue}{FFTEVEN}\)(a_even\(=[1,2], w_4^2=w_2=[1,-1]\))&lt;/p&gt;

&lt;p&gt;\(|w_2|\) \(\neq\) \(1\) so continue.&lt;/p&gt;

&lt;p&gt;a_even \(= [1]\)&lt;/p&gt;

&lt;p&gt;a_odd \(= [2]\)&lt;/p&gt;

&lt;p&gt;\(S\) = \(\color{teal}{FFTEVENEVEN}\)(a_even\(=[1]\), \(w_2^2=w_1=[1]\)) \(= 1\)&lt;/p&gt;

&lt;p&gt;\(O\) = \(\color{fuchsia}{FFTEVENODD}\)(a_odd\(=[2]\), \(w_2^2=w_1=[1]\)) \(= 2\)&lt;/p&gt;

&lt;p&gt;return \([1 + 1 \cdot 2 = 3,\; 1 + (-1) \cdot 2 = -1]\)&lt;/p&gt;

&lt;p&gt;\(\color{purple}{FFTODD}\)(a_odd\(=[1,0]\), \(w_4^2=w_2=[1,-1]\))&lt;/p&gt;

&lt;p&gt;\(|w_2|\) \(\neq\) \(1\) so continue.&lt;/p&gt;

&lt;p&gt;a_even \(= [1]\)&lt;/p&gt;

&lt;p&gt;a_odd \(= [0]\)&lt;/p&gt;

&lt;p&gt;\(S\) = \(\color{red}{FFTODDEVEN}\)(a_even\(=[1]\), \(w_2^2=w_1=[1]\)) \(= 1\)&lt;/p&gt;

&lt;p&gt;\(O\) = \(\color{green}{FFTODDODD}\)(a_odd\(=[0]\), \(w_2^2=w_1=[1]\)) \(= 0\)&lt;/p&gt;

&lt;p&gt;return \([1 + 1 \cdot 0 = 1,\; 1 + (-1) \cdot 0 = 1]\)&lt;/p&gt;

&lt;h5 id=&quot;level-3&quot;&gt;&lt;strong&gt;LEVEL 3&lt;/strong&gt;&lt;/h5&gt;

&lt;p&gt;\(\color{teal}{FFTEVENEVEN}\)(a_even=\([1]\), \(w_2^2=w_1=[1]\))&lt;/p&gt;

&lt;p&gt;\(|w_1|\) \(= 1\) return \(1\)&lt;/p&gt;

&lt;p&gt;\(\color{fuchsia}{FFTEVENODD}\)(a_odd=\([2]\), \(w_2^2=w_1=[1]\))&lt;/p&gt;

&lt;p&gt;\(|w_1|\) \(= 1\) return \(2\)&lt;/p&gt;

&lt;p&gt;\(\color{red}{FFTODDEVEN}\)(a_even\(=[1]\), \(w_2^2=w_1=[1]\))&lt;/p&gt;

&lt;p&gt;\(|w_1|\) \(= 1\) return \(1\)&lt;/p&gt;

&lt;p&gt;\(\color{green}{FFTODDODD}\)(a_odd\(=[0]\), \(w_2^2=w_1=[1]\))&lt;/p&gt;

&lt;p&gt;\(|w_1|\) \(= 1\) return \(0\)&lt;/p&gt;

&lt;p&gt;Repeating for b, the results of FFT(\(b, w_4\)) \(= [5, 2+3i, -1, 2-3i]\)&lt;/p&gt;

&lt;p&gt;\(c = a * b\), element-wise multiplication results in \([20, -5-i,
-2,-5+i]\).&lt;/p&gt;

&lt;p&gt;To move from point-value representation back to coefficients, we can interpolate
using the same FFT D&amp;amp;C algorithm, passing \(c\) and \(w_4^{-1}\) as inputs, resulting
in the coefficients \([2,5,7,6]\) and the final solution \(2+5x+7x^2+6x^3\).&lt;/p&gt;
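
&lt;p&gt;The whole trace above can be reproduced with a direct Python transcription of the D&amp;amp;C algorithm (a sketch; the variable names are mine):&lt;/p&gt;

```python
import cmath

def fft(a, w):
    """Evaluate the polynomial with coefficients a at the roots of unity in w."""
    if len(w) == 1:
        return list(a)
    # Squaring the n-th roots of unity gives the (n/2)-th roots: w[0::2].
    S = fft(a[0::2], w[0::2])
    O = fft(a[1::2], w[0::2])
    n = len(a)
    r = [0j] * n
    for i in range(n // 2):
        r[i] = S[i] + w[i] * O[i]
        r[i + n // 2] = S[i] - w[i] * O[i]
    return r

n = 4
w4 = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]  # [1, i, -1, -i]
w4_inv = [z.conjugate() for z in w4]                       # [1, -i, -1, i]

a = [1, 1, 2, 0]  # A(x) = 1 + x + 2x^2
b = [2, 3, 0, 0]  # B(x) = 2 + 3x
c = [x * y for x, y in zip(fft(a, w4), fft(b, w4))]    # point-wise product
coeffs = [round((v / n).real) for v in fft(c, w4_inv)]  # interpolation
print(coeffs)  # [2, 5, 7, 6]
```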

</description>
        <pubDate>Mon, 13 May 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/fft/2019/05/13/2019-fft.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/fft/2019/05/13/2019-fft.html</guid>
        
        
        <category>fft</category>
        
      </item>
    
      <item>
        <title>A minimal example using cmake to create a c++ shared library.</title>
        <description>&lt;p&gt;1# CMakeLists.txt for &lt;a href=&quot;https://github.com/surfertas/fun_with_cpp/tree/master/project_euler/PrimeUtil&quot;&gt;PrimUtil&lt;/a&gt;
a library for utilities related to working with prime numbers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cmake_minimum_required(VERSION 2.8.9)
project (PrimeUtil)
set(CMAKE_CXX_STANDARD 11)

include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
add_library(PrimeUtil SHARED src/primeutil.cpp)

install(TARGETS PrimeUtil DESTINATION /usr/lib)
install(FILES include/primeutil.h DESTINATION include)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2# Run the generator and build from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd build
$ cmake ..
-- The C compiler identification is GNU 5.5.0
-- The CXX compiler identification is GNU 5.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to:
/home/tasuku/workspace/cpp/project_euler/PrimeUtil/build
$ make
Scanning dependencies of target PrimeUtil
[ 50%] Building CXX object CMakeFiles/PrimeUtil.dir/src/primeutil.cpp.o
[100%] Linking CXX shared library libPrimeUtil.so
[100%] Built target PrimeUtil
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;3# Make the library available system-wide. This installs the shared library to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/lib&lt;/code&gt; and the header to the include directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo make install
[100%] Built target PrimeUtil
Install the project...
-- Install configuration: &quot;&quot;
-- Installing: /usr/lib/libPrimeUtil.so
-- Up-to-date: /usr/local/include/primeutil.h
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;4# CMakeLists.txt for example
&lt;a href=&quot;https://github.com/surfertas/fun_with_cpp/tree/master/project_euler/3-largest_prime_factor&quot;&gt;program&lt;/a&gt;
(solution to a Project Euler question) that links to PrimeUtil library.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cmake_minimum_required(VERSION 2.8.9)
project (largest_prime_factor)
set(CMAKE_CXX_STANDARD 11)

find_library(PRIMEUTIL_LIB libPrimeUtil.so)
message(STATUS ${PRIMEUTIL_LIB})

add_executable(largest_prime_factor src/main.cpp)
target_link_libraries(largest_prime_factor PRIVATE ${PRIMEUTIL_LIB})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;5# Run the generator and build. A binary called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;largest_prime_factor&lt;/code&gt; should be produced in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/build&lt;/code&gt; directory and be executable.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ./largest_prime_factor 5678899999
Largest prime factor is: 6217
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://cmake.org/cmake/help/v3.0/manual/cmake-language.7.html#line-comment&lt;/li&gt;
  &lt;li&gt;http://www.dillonbhuff.com/?p=15&lt;/li&gt;
  &lt;li&gt;http://derekmolloy.ie/hello-world-introductions-to-cmake/&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 01 May 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/cmake/cpp/projecteuler/2019/05/01/cmake.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/cmake/cpp/projecteuler/2019/05/01/cmake.html</guid>
        
        
        <category>cmake</category>
        
        <category>cpp</category>
        
        <category>projecteuler</category>
        
      </item>
    
      <item>
        <title>Autonomous Mobile Robot #4: Using GCP Storage</title>
        <description>&lt;p&gt;Series:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html&quot;&gt;Autonomous Mobile Robot #1: Data collection to a trained model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html&quot;&gt;Autonomous Mobile Robot #2: Inference as a ROS service&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&quot;&gt;Autonomous Mobile Robot #3: Pairing with a PS3 Controller for teleop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html&quot;&gt;Autonomous Mobile Robot #4: Using GCP Storage&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I have fallen back to PyTorch (once again). The training environment is still integrated with Google Cloud Platform (GCP).&lt;/p&gt;

&lt;p&gt;The current process for getting the collected samples to the training
environment is notably manual, as the walkthrough in the coming
paragraphs shows. I plan to automate the process as much as possible in the iterations to follow.&lt;/p&gt;

&lt;h4 id=&quot;a-rough-outline-of-the-system-in-place&quot;&gt;A rough outline of the system in place&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The raw images from a monocular camera and associated controls are stored on an
external SSD connected to the Raspi via a USB cable. At a user specified
frequency a pickle file that contains samples (path to image, controls) is saved
to the SSD.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Once the data collection process is complete, the data needs to be transferred to GCP storage.
Note that a GCP bucket is required. The following command will transfer the files from the SSD to the specified cloud storage.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ gsutil -m cp -r  [image_dir] gs://[bucket_name]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It is worth highlighting that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt; option speeds up the transfer considerably; gsutil help describes the flag as follows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to
run in parallel. This can significantly improve performance if you are
performing operations on a large number of files over a reasonably fast network
connection.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Next we need to move the training data from GCP storage to the VM instance
used for GPU training. We can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gsutil -m cp -r gs://[bucket_name] [data_dir_on_vm_instance]&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Once the data has been transferred, we can move to the training &lt;a href=&quot;https://github.com/surfertas/deep_learning/tree/master/projects/pilotnet_controller&quot;&gt;environment&lt;/a&gt; and generate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gcs.csv&lt;/code&gt; file. Go to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/data/gcs&lt;/code&gt; and run the following command.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$  ls [data_dir_on_vm_instance] &amp;gt;&amp;gt; gcs.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once complete, we can run the script &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_csv_with_url.py&lt;/code&gt; and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;path_to_data.csv&lt;/code&gt; will be generated.&lt;/p&gt;

&lt;h4 id=&quot;start-the-training&quot;&gt;Start the training&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Set the training configuration in the &lt;a href=&quot;https://github.com/surfertas/deep_learning/tree/master/projects/pilotnet_controller/config&quot;&gt;config/default.py&lt;/a&gt; file. Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python train.py&lt;/code&gt; and the training process should kick off.&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sun, 31 Mar 2019 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html</guid>
        
        
        <category>amr</category>
        
        <category>deeplearning</category>
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Autonomous Mobile Robot #3: Pairing with a PS3 Controller for teleop</title>
        <description>&lt;p&gt;Series:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html&quot;&gt;Autonomous Mobile Robot #1: Data collection to a trained model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html&quot;&gt;Autonomous Mobile Robot #2: Inference as a ROS service&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&quot;&gt;Autonomous Mobile Robot #3: Pairing with a PS3 Controller for teleop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html&quot;&gt;Autonomous Mobile Robot #4: Using GCP Storage&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I ran into issues trying to pair a PS3 DualShock controller with a Raspberry Pi
to be used for training an autonomous mobile robot (e.g. Donkey) running on ROS.
Note that I am using Ubuntu MATE as the OS, as opposed to Raspbian, the de facto
OS for the Raspberry Pi.&lt;/p&gt;

&lt;p&gt;First try the ROS recommended &lt;a href=&quot;http://wiki.ros.org/ps3joy/Tutorials/PairingJoystickAndBluetoothDongle&quot;&gt;method&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If this does not work try the following:&lt;/p&gt;

&lt;p&gt;Start with the PS3 controller disconnected. If you see the PS3 controller listed
as a device, remove it first using the remove command; otherwise you can
skip that step.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ surferta@surfertas: sudo bash
$ root@surfertas:~# bluetoothctl
$ [NEW] Controller B8:27:EB:05:90:2E surfertas [default]
$ [NEW] Device 28:A1:83:4B:3A:30 PLAYSTATION(R)3 Controller
$ [PLAYSTATION(R)3 Controller]# remove 28:A1:83:4B:3A:30
$ [bluetooth]# scan on
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Connect PS3 controller to the Raspberry Pi.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ [PLAYSTATION(R)3 Controller]# power on
$ [bluetooth]# devices
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Disconnect the PS3 controller from the Raspberry Pi.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ [bluetooth]# agent on
$ [bluetooth]# trust 28:A1:83:4B:3A:30
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://wiki.debian.org/BluetoothUser&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sat, 22 Dec 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html</guid>
        
        
        <category>amr</category>
        
        <category>deeplearning</category>
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Medical imaging: playing with the ChestXray-14 dataset</title>
        <description>&lt;p&gt;I recently had the chance to work with the ChestX-ray14 image data-set [1],
consisting of 112,200 frontal X-ray images from 30,805 unique patients and 14
different thoracic disease labels. The dataset is imbalanced (e.g. 60,361
examples associated with “No Findings”). Imbalances appear to be common in the medical imaging domain and has driven
research to address the issue via augmentation techniques using GANs most
recently. [2]&lt;/p&gt;

&lt;p&gt;The classification task is multi-label with each X-ray image labeled with 0 or
more diseases, as opposed to a multi-class task where labels are mutually
exclusive. One can learn more about multi-label classification
in this &lt;a href=&quot;http://lpis.csd.auth.gr/publications/tsoumakas-ijdwm.pdf&quot;&gt;tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/surfertas/deep_learning/tree/master/projects/chestxray&quot;&gt;source code&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;objective&quot;&gt;Objective&lt;/h4&gt;
&lt;p&gt;The objective of the exercise was to train a number of multi-label classifiers on the
entire ChestX-ray14 dataset and compare to results presented in Wang et al. 2017.&lt;/p&gt;

&lt;h4 id=&quot;data-analysis&quot;&gt;Data Analysis&lt;/h4&gt;
&lt;p&gt;For EDA on the ChestX-ray14 dataset check out good work done
in a &lt;a href=&quot;https://www.kaggle.com/sbernadac/lung-deseases-data-analysis&quot;&gt;Kaggle kernel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this exercise I used GCP (Google Cloud Platform) for storage and training. Data
and metadata can be found &lt;a href=&quot;https://nihcc.app.box.com/v/ChestXray-NIHCC&quot;&gt;here&lt;/a&gt;.
In addition to storage, appropriate compute was necessary. A &lt;a href=&quot;https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning&quot;&gt;GCP Deep Learning
VM&lt;/a&gt;
was used for pre-processing and training. \(\textbf{Note}\): don't forget to check
“Install NVIDIA GPU driver automatically on first startup?” and also select the
appropriate image. (When using TensorFlow you may run into CUDA version issues;
I ended up using an image with CUDA 9.0.) Training was done using 1 NVIDIA Tesla
P100 and 16 CPUs with 104GB of memory collectively.&lt;/p&gt;

&lt;h4 id=&quot;preprocess&quot;&gt;Preprocess&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Converted string labels, e.g. “Effusion|Emphysema|Infiltration|Pneumothorax”, to multi-hot encodings.&lt;/li&gt;
  &lt;li&gt;Converted raw images, and associated labels into TFRecords.&lt;/li&gt;
  &lt;li&gt;Standardized the images, subtracting mean and dividing by the standard deviation on a per image basis.&lt;/li&gt;
  &lt;li&gt;Resized images to dimensions 224x224x3.&lt;/li&gt;
&lt;/ol&gt;
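
&lt;p&gt;Step 1 can be sketched as follows (a minimal example on an illustrative subset of the labels; the real dataset has 14 diseases and pipe-delimited label strings):&lt;/p&gt;

```python
import numpy as np

# Illustrative subset of the 14 disease labels.
DISEASES = ["Effusion", "Emphysema", "Infiltration", "Pneumothorax"]

def multi_hot(label_string):
    """'Effusion|Pneumothorax' -> [1, 0, 0, 1]; 'No Finding' -> all zeros."""
    present = set(label_string.split("|"))
    return np.array([1.0 if d in present else 0.0 for d in DISEASES],
                    dtype=np.float32)

encoding = multi_hot("Effusion|Pneumothorax")
```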

&lt;h4 id=&quot;architectures&quot;&gt;Architectures&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;A base case simple CNN.&lt;/li&gt;
  &lt;li&gt;A pre-trained ResNet-v2-50 used as fixed feature extractor, with outputs fed
into 2 fully connected layers. (Backprop only through the FCs)&lt;/li&gt;
  &lt;li&gt;An ensemble of feature extractors with outputs put through a transition layer
before applying the add operator. Resulting vectors are passed through 2 fully
connected layers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Feature extraction was done using pre-trained models found at &lt;a href=&quot;https://www.tensorflow.org/hub/&quot;&gt;tensorflow
hub&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;evaluation&quot;&gt;Evaluation&lt;/h4&gt;
&lt;p&gt;For evaluation, the AUC ROC metric was used, as in Wang et al. Google's
machine-learning crash course does a good job of explaining &lt;a href=&quot;https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc&quot;&gt;ROC + AUC
ROC&lt;/a&gt;.
Further, though not applied in this exercise, accuracy measures for
multi-label classification require a different set of metrics. [4,5,6]&lt;/p&gt;
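
&lt;p&gt;For reference, per-label AUC ROC can be computed with the rank (Mann-Whitney) formulation, one column per disease (a toy sketch with made-up scores; ties are not handled):&lt;/p&gt;

```python
import numpy as np

def roc_auc(y_true, y_score):
    """ROC AUC via the rank (Mann-Whitney U) formulation; assumes no tied scores."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy multi-label example: rows are images, columns are diseases.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.3, 0.8], [0.8, 0.3], [0.1, 0.4]])
aucs = [roc_auc(y_true[:, k], y_score[:, k]) for k in range(y_true.shape[1])]
print(aucs)  # [1.0, 0.75]
```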

&lt;h4 id=&quot;results&quot;&gt;Results&lt;/h4&gt;

\[\begin{array}{lrrr}
    \hline
            \textbf{Disease} &amp;amp;  \textbf{ResNetv2-50 FE AUC} &amp;amp; \textbf{Ensemble AUC} &amp;amp;
\textbf{Wang et. al} \\
    \hline
       Cardiomegaly &amp;amp; 0.6770 &amp;amp; 0.7980 &amp;amp; \textbf{0.8100} \\
          Emphysema &amp;amp; 0.7300  &amp;amp; 0.7950 &amp;amp; \textbf{0.8330} \\
           Effusion &amp;amp; 0.5710 &amp;amp; 0.6550 &amp;amp; \textbf{0.7585} \\
             Hernia &amp;amp; 0.6590 &amp;amp; 0.6930 &amp;amp; \textbf{0.8717} \\
             Nodule &amp;amp; 0.7210 &amp;amp; \textbf{0.7510} &amp;amp; 0.6687 \\
       Pneumothorax &amp;amp; 0.5210 &amp;amp; 0.6960 &amp;amp; \textbf{0.7993} \\
        Atelectasis &amp;amp; 0.6090 &amp;amp; \textbf{0.7920} &amp;amp; 0.7003 \\
 Pleural Thickening &amp;amp; 0.6440 &amp;amp; 0.6660 &amp;amp; \textbf{0.6835} \\
               Mass &amp;amp; 0.7720 &amp;amp; \textbf{0.8420} &amp;amp; 0.6933 \\
              Edema &amp;amp; 0.6390 &amp;amp; 0.6820 &amp;amp; \textbf{0.8052} \\
      Consolidation &amp;amp; 0.7630 &amp;amp; \textbf{0.8210} &amp;amp; 0.7032 \\
       Infiltration &amp;amp; 0.6130 &amp;amp; \textbf{0.7060} &amp;amp; 0.6614 \\
           Fibrosis &amp;amp; 0.6950 &amp;amp; 0.7480 &amp;amp; \textbf{0.7859} \\
          Pneumonia &amp;amp; 0.6640 &amp;amp; \textbf{0.7200} &amp;amp; 0.6580 \\
    \hline
\end{array}\]

&lt;ol&gt;
  &lt;li&gt;ResNet-v2-50 as a feature extractor takes about 50 minutes.&lt;/li&gt;
  &lt;li&gt;Ensemble of feature extractors takes about 90 minutes for 10 epochs. Early
stopping after 5 epochs was used for the reported results.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;further-studies&quot;&gt;Further Studies&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;Address class imbalance using
&lt;a href=&quot;https://reference.wolfram.com/language/tutorial/NeuralNetworksExampleWeighting.html&quot;&gt;example-weighted&lt;/a&gt;
neural network training.&lt;/li&gt;
  &lt;li&gt;Use data augmentation to increase sample size as well as address class
imbalances.&lt;/li&gt;
  &lt;li&gt;Integrate more features (e.g. age, gender, etc.) as an embedding and
concatenate with the encoded images after feature extraction.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h4&gt;
&lt;p&gt;The training environment was based on code examples found at
&lt;a href=&quot;https://cs230-stanford.github.io/&quot;&gt;cs230-stanford&lt;/a&gt;. This is one of the better
starting points I have come across, and it walks through best
practices for data pipelines and reproducibility. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_dataset.py&lt;/code&gt;
was heavily modified in my use case for GCP storage and TFRecords, as
were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input_fn.py&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;train.py&lt;/code&gt; to work with the multi-label task.&lt;/p&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://nihcc.app.box.com/v/ChestXray-NIHCC/file/256057377774&lt;/li&gt;
  &lt;li&gt;https://github.com/xinario/awesome-gan-for-medical-imaging&lt;/li&gt;
  &lt;li&gt;http://lpis.csd.auth.gr/publications/tsoumakas-ijdwm.pdf&lt;/li&gt;
  &lt;li&gt;https://stats.stackexchange.com/questions/12702/what-are-the-measure-for-accuracy-of-multilabel-data&lt;/li&gt;
  &lt;li&gt;https://stackoverflow.com/questions/37746670/tensorflow-multi-label-accuracy-calculation&lt;/li&gt;
  &lt;li&gt;https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Wed, 12 Dec 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2018/12/12/chestxray.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2018/12/12/chestxray.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Refresher: a few resources covering RNNs, trainable parameters + flops</title>
        <description>&lt;p&gt;A few topics/resources that I needed recently as a refresher. Need to summarize at a later date…&lt;/p&gt;

&lt;h4 id=&quot;rnns&quot;&gt;RNNs&lt;/h4&gt;
&lt;h5 id=&quot;improving-learning&quot;&gt;Improving learning&lt;/h5&gt;
&lt;ol&gt;
  &lt;li&gt;https://pytorch.org/docs/stable/_modules/torch/nn/modules/normalization.html&lt;/li&gt;
  &lt;li&gt;http://ceur-ws.org/Vol-2142/paper4.pdf&lt;/li&gt;
  &lt;li&gt;https://github.com/DingKe/pytorch_workplace/blob/master/rnn/modules.py#L122&lt;/li&gt;
  &lt;li&gt;https://discuss.pytorch.org/t/proper-way-to-do-gradient-clipping/191/14&lt;/li&gt;
  &lt;li&gt;https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/language_model/main.py&lt;/li&gt;
  &lt;li&gt;https://forums.fast.ai/t/30-best-practices/12344&lt;/li&gt;
&lt;/ol&gt;

&lt;h5 id=&quot;variable-rnn&quot;&gt;Variable RNN&lt;/h5&gt;
&lt;ol&gt;
  &lt;li&gt;https://pytorch.org/docs/stable/nn.html#torch.nn.utils.rnn.pack_padded_sequence&lt;/li&gt;
  &lt;li&gt;https://towardsdatascience.com/taming-lstms-variable-sized-mini-batches-and-why-pytorch-is-good-for-your-health-61d35642972e&lt;/li&gt;
  &lt;li&gt;https://discuss.pytorch.org/t/understanding-pack-padded-sequence-and-pad-packed-sequence/4099/6&lt;/li&gt;
  &lt;li&gt;https://gist.github.com/Tushar-N/dfca335e370a2bc3bc79876e6270099e&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;calculating-trainable-parameters-and-flops&quot;&gt;Calculating trainable parameters and flops&lt;/h4&gt;
&lt;h5 id=&quot;flops&quot;&gt;Flops&lt;/h5&gt;
&lt;ol&gt;
  &lt;li&gt;http://machinethink.net/blog/how-fast-is-my-model/&lt;/li&gt;
  &lt;li&gt;https://stats.stackexchange.com/questions/328926/how-many-parameters-are-in-a-gated-recurrent-unit-gru-recurrent-neural-network&lt;/li&gt;
  &lt;li&gt;https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/&lt;/li&gt;
  &lt;li&gt;https://piazza.com/class/jjjilbkqk8m1r4?cid=1063&lt;/li&gt;
  &lt;li&gt;https://stats.stackexchange.com/questions/291843/how-to-understand-calculate-flops-of-the-neural-network-model&lt;/li&gt;
&lt;/ol&gt;

&lt;h5 id=&quot;trainable-parameters&quot;&gt;Trainable parameters&lt;/h5&gt;
&lt;ol&gt;
  &lt;li&gt;https://stackoverflow.com/questions/42786717/how-to-calculate-the-number-of-parameters-for-convolutional-neural-network&lt;/li&gt;
  &lt;li&gt;https://www.learnopencv.com/number-of-parameters-and-tensor-sizes-in-convolutional-neural-network/&lt;/li&gt;
  &lt;li&gt;https://stats.stackexchange.com/questions/328926/how-many-parameters-are-in-a-gated-recurrent-unit-gru-recurrent-neural-network&lt;/li&gt;
&lt;/ol&gt;
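As a quick refresher alongside the links above, the standard back-of-the-envelope formulas for a single convolutional layer can be sketched in a few lines (the 3x3, 64-to-128-channel layer and 56x56 output size below are purely illustrative choices, not from any of the references):

```python
# Back-of-the-envelope formulas for a single conv layer (illustrative sketch).
def conv_params(k, c_in, c_out, bias=True):
    # Each output channel has a k x k x c_in kernel plus an optional bias.
    return (k * k * c_in + (1 if bias else 0)) * c_out

def conv_macs(k, c_in, c_out, h_out, w_out):
    # One k*k*c_in dot product per output element; FLOPs is roughly 2 * MACs.
    return k * k * c_in * c_out * h_out * w_out

print(conv_params(3, 64, 128))        # (3*3*64 + 1) * 128 = 73856
print(conv_macs(3, 64, 128, 56, 56))  # 231211008 multiply-accumulates
```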

&lt;h4 id=&quot;random&quot;&gt;Random&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://documents.epfl.ch/users/f/fl/fleuret/www/dlc/dlc-handout-6-going-deeper.pdf&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 07 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2018/11/07/notes.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2018/11/07/notes.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Protii #3: Integration of Yolov2 Object Detection</title>
        <description>&lt;p&gt;Protii #3: Integration of Yolov2 Object Detection&lt;/p&gt;

&lt;p&gt;One of the features that I am implementing is object detection and tracking.
Ultimately I would like to convert the algorithm from object detection to people
detection, classification, and tracking.&lt;/p&gt;

&lt;p&gt;As I had already integrated the Yolov2 in a previous exercise, the source code
was refactored for this particular project.&lt;/p&gt;

&lt;h4 id=&quot;move-to-ros-service&quot;&gt;Move to ROS Service&lt;/h4&gt;
&lt;p&gt;At the moment the detector subscribes to the stream topic, continuously
runs the detection algorithm, and publishes the results. Since inference is
relatively expensive, changing the system to use a ROS service or action likely
makes more sense: run the detection algorithm only when a trigger fires. The
trigger can be an algorithm requiring less compute, or simply time based.&lt;/p&gt;

&lt;p&gt;Inference is done on the TX2 and I get about 4.6~4.8 FPS, with raw images coming
in at about 30 FPS. These stats degrade when measured over the network, and
this degradation can be seen in the video below.&lt;/p&gt;

&lt;p&gt;Next steps would be to retrain (transfer learning?) and tune the model to
detect humans only, and to convert the ROS package to implement detection as a
service. It also probably makes sense to update the package to use Yolov3.&lt;/p&gt;

&lt;h4 id=&quot;results&quot;&gt;Results&lt;/h4&gt;
&lt;p&gt;The video shows the display that is connected to the Jetson TX2 via an HDMI
cable. I accessed the protii webapp via Chrome, viewed it through the
inspector, and used simplescreenrecorder [1] for desktop recording.&lt;/p&gt;

&lt;iframe width=&quot;320&quot; height=&quot;240&quot; src=&quot;https://www.youtube.com/embed/LDkSdR85OtA?rel=0&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; encrypted-media&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;https://www.ubuntupit.com/15-best-linux-screen-recorder-and-how-to-install-those-on-ubuntu/&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sun, 30 Sep 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/protii/2018/09/30/protii-0002.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/protii/2018/09/30/protii-0002.html</guid>
        
        
        <category>protii</category>
        
      </item>
    
      <item>
        <title>Protii #2: RTSP stream to ROS message</title>
<description>&lt;p&gt;Getting a stream of images from a camera to a device for processing is easier
said than done, and with the plethora of options available, selecting the right
pipeline is a separate challenge in itself.&lt;/p&gt;

&lt;p&gt;For this particular project, after spending hours with the raspi camera module,
I settled on using an IP camera to stream images directly to a Jetson TX series
SOC, in this case a TX2.&lt;/p&gt;

&lt;p&gt;After some web surfing, and looking at options, I settled on the Amcrest series.&lt;/p&gt;

&lt;p&gt;Specifically the IP8M-2493EW, technically produced for outdoor use according to
the description, but I figured indoor use was equally possible.&lt;/p&gt;

&lt;h4 id=&quot;image-capture-setup-and-pipeline&quot;&gt;Image Capture Setup and Pipeline&lt;/h4&gt;
&lt;p&gt;The setup consists of streaming media from the IP camera using the Real
Time Streaming Protocol (RTSP), a ROS node running on the TX2 that subscribes to
the URI and converts the stream to a ROS image type, and publishing the ROS
consumable image over a virtual private network to a subscriber sitting on the
server. (I plan to write a separate note about how I got the images from the
server running on a Digital Ocean droplet to the webapp using socket.io.)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/protiiipstreampipeline.png&quot; alt=&quot;Pipeline&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In order to accomplish this I needed to find the URI associated with the stream,
which sounds trivial but is difficult when documentation is sparse, and to code up
a ROS package to consume the stream and publish results, with image processing
done using OpenCV 3.3.1.&lt;/p&gt;

&lt;h4 id=&quot;rtsp-stream-uri&quot;&gt;RTSP stream URI&lt;/h4&gt;

&lt;p&gt;The URI needs to be in the form
‘rtsp://(username):(password)@(ip):(port)/cam/realmonitor?channel=1&amp;amp;subtype=0’
where the variables enclosed in () need to be populated.&lt;/p&gt;
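As a sketch, the URI can be assembled programmatically. Port 554 is the usual RTSP default, and the credentials and IP address below are placeholders, not values from the actual setup:

```python
from urllib.parse import urlencode

# Assemble the Amcrest-style RTSP URI described above.
# Port 554 is the common RTSP default; all values here are placeholders.
def rtsp_uri(username, password, ip, port=554, channel=1, subtype=0):
    query = urlencode({'channel': channel, 'subtype': subtype})
    return 'rtsp://%s:%s@%s:%d/cam/realmonitor?%s' % (
        username, password, ip, port, query)

print(rtsp_uri('user', 'pass', '192.168.1.10'))
```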

&lt;h4 id=&quot;conversion-to-ros-message&quot;&gt;Conversion to ROS message&lt;/h4&gt;
&lt;p&gt;This URI is then used as the source for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::VideoCapture&lt;/code&gt; object. The
frames are read, resized, and converted to a ROS image of type
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sensor_msgs::ImagePtr&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Find the full code &lt;a href=&quot;https://github.com/surfertas/amr_core/blob/master/amr_worker/rtsp_streamer/src/streamer.cpp&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;summary&quot;&gt;Summary&lt;/h4&gt;
&lt;p&gt;The ROS package is still incomplete and just the bare minimum, but sufficient for
this prototype project. The stream is relatively stable, and I have found no
issues with the pipeline running on the TX2 for multiple days. Next steps for
this particular component are to optimize for fps and consider security measures.&lt;/p&gt;

</description>
        <pubDate>Sun, 09 Sep 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/protii/2018/09/09/protii-0001.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/protii/2018/09/09/protii-0001.html</guid>
        
        
        <category>protii</category>
        
      </item>
    
      <item>
        <title>Protii #1: Documenting my efforts in developing a web based personal surveillance system</title>
<description>&lt;p&gt;This series will document my attempt to build a web based
personal surveillance system that integrates technologies from different
domains, including web technologies, robotics, and machine learning.&lt;/p&gt;

&lt;p&gt;It's an ongoing and incomplete project that has been pushed to production at
&lt;a href=&quot;https://protii.com/home&quot;&gt;www.protii.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The experience thus far has been rewarding, and an opportunity to reflect on the
actual work that goes into developing a system that is functional and useful.&lt;/p&gt;

</description>
        <pubDate>Sun, 26 Aug 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/protii/2018/08/26/protii-0000.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/protii/2018/08/26/protii-0000.html</guid>
        
        
        <category>protii</category>
        
      </item>
    
      <item>
        <title>Notes on Bayesian Learning #1: Bayes Nets and Graphical Models</title>
        <description>&lt;ol&gt;
  &lt;li&gt;Bayes Nets and Graphical Models
    &lt;ol&gt;
      &lt;li&gt;What is bayes theorem?&lt;/li&gt;
      &lt;li&gt;Joint distributions are hard to deal with…&lt;/li&gt;
      &lt;li&gt;Bayes nets and graphical models.&lt;/li&gt;
      &lt;li&gt;Markov Blankets&lt;/li&gt;
      &lt;li&gt;D-separation&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Mixture Models
    &lt;ol&gt;
      &lt;li&gt;K-means&lt;/li&gt;
      &lt;li&gt;GMM&lt;/li&gt;
      &lt;li&gt;EM algorithm&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Sampling
    &lt;ol&gt;
      &lt;li&gt;Gibbs sampling&lt;/li&gt;
      &lt;li&gt;Metropolis-Hastings (MH) sampling&lt;/li&gt;
      &lt;li&gt;Hamiltonian Monte Carlo&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Markov Chains
    &lt;ol&gt;
      &lt;li&gt;HMM&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;what-is-bayes-theorem&quot;&gt;What is Bayes' theorem?&lt;/h3&gt;

&lt;p&gt;To understand Bayes' theorem, let's start with the joint distribution between
random variables \(Y\), which we will designate as the cause, and \(X\), the
effect. The variables are binary, taking on values of either \(0\) or \(1\):
\[P(X,Y)\]&lt;/p&gt;

&lt;p&gt;We can factor, or break apart, the joint in two different ways via the chain
rule. \[P(X,Y) = P(X|Y) P(Y) = P(Y|X)P(X)\]&lt;/p&gt;

&lt;p&gt;Thus in order to determine \(P(Y|X)\), the probability of the cause given
the effect, we can reorder the above to obtain \[\frac{P(X|Y)P(Y)}{P(X)} =
P(Y|X)\]&lt;/p&gt;

&lt;p&gt;The above is often shown as \[\frac{(Likelihood)*(Prior)}{(Normalization)} =
(Posterior)\]&lt;/p&gt;

&lt;p&gt;Essentially, we are updating our prior belief of \(Y\), represented as
\(P(Y)\), by the likelihood of the effect given the cause, divided by a
normalization constant, to arrive at a new belief of \(Y\) given \(X\).&lt;/p&gt;
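A minimal numeric sketch of this update for binary variables (the probabilities below are made-up illustrative values, not from the text):

```python
# Bayes' theorem for binary Y (cause) and X (effect); numbers are illustrative.
p_y = 0.3                        # prior P(Y=1)
p_x_given_y = {1: 0.9, 0: 0.2}   # likelihood P(X=1 | Y=y)

# Normalization: P(X=1) = sum over y of P(X=1|Y=y) P(Y=y)
p_x = p_x_given_y[1] * p_y + p_x_given_y[0] * (1 - p_y)

# Posterior = likelihood * prior / normalization
posterior = p_x_given_y[1] * p_y / p_x
print(round(posterior, 4))  # P(Y=1 | X=1)
```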

&lt;h3 id=&quot;joint-distributions-are-hard-to-deal-with&quot;&gt;Joint distributions are hard to deal with…&lt;/h3&gt;

&lt;p&gt;Often the joint distribution is not practical to compute, in other
words intractable. Let's consider a set of 2
random binary variables, \((X,Y)\), again. We know that there are \(2^2\)
possible outcomes, \((0,0), (1,0), (0,1), (1,1)\), allowing for effortless
computation of the joint probability \(P(X=1, Y=0)\), which is \(0.25\) if all
outcomes are equally likely. This was easy because we
knew how many outcomes aligned with \(X=1\) and \(Y=0\) as well as the set of
all possible outcomes.&lt;/p&gt;

&lt;p&gt;When is this process difficult? Now consider a set of 20 random binary variables. The
joint distribution requires \(2^{20}-1\), or \(1,048,575\), independent
probabilities (the last is determined by the rest summing to one). Now consider
random variables taking on an arbitrary number of values. You can quickly see
how the “all possible outcomes” statement becomes difficult.&lt;/p&gt;
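The blow-up is easy to see directly with a quick sketch:

```python
from itertools import product

# For 2 binary variables the outcomes are easy to enumerate...
outcomes_2 = list(product([0, 1], repeat=2))
print(outcomes_2)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# ...but the table doubles in size with every variable added.
for n in (2, 10, 20):
    # independent probabilities needed for n binary variables
    print(n, 2 ** n - 1)
```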

&lt;p&gt;The joint distribution often represents an obstacle, which has led to the
development of abstractions, such as graphical models and sampling methods,
that help address its intractability.&lt;/p&gt;

&lt;h3 id=&quot;bayes-nets-and-graphical-models&quot;&gt;Bayes nets and graphical models&lt;/h3&gt;

&lt;p&gt;Graphical models are ways to address the computational load of some joint
distributions by introducing domain expertise in the form of cause and effect
relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“A Bayesian network, Bayes network, belief network, Bayes(ian) model or
probabilistic directed acyclic graphical model is a probabilistic graphical
model (a type of statistical model) that represents a set of variables and their
conditional dependencies via a directed acyclic graph (DAG).”&lt;/strong&gt;- Wiki[1]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“The first is to see the network as a representation of the joint probability
distribution.  The second is to view
it as an encoding of a collection of conditional independence statements.”&lt;/strong&gt; -
AIMA [2]&lt;/p&gt;

&lt;p&gt;Said simply, it's a graphical model defining conditional relationships between
random variables, with directed (not bi-directional, as the graph is acyclic)
arrows usually flowing from causes to effects.&lt;/p&gt;

&lt;p&gt;Sticking with the example from Wikipedia, let's work with a small graphical model
represented by a set of three random boolean variables, \(s\), \(rain\), and
\(wet\), where \(s\) stands for sprinkler.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/smallgraphmodel.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;in-words&quot;&gt;In words&lt;/h4&gt;
&lt;p&gt;The probabilities that this graphical model represents are as follows:
\(P(s|rain)\), the probability of a given state of the sprinkler given whether
it rains or not; \(P(rain)\), the probability of whether it rains or not; and
\(P(wet|s, rain)\), the probability that it's wet, given the state of the
sprinkler and whether it rains or not.&lt;/p&gt;

&lt;p&gt;The joint distribution over \(s, rain, wet\) can now be represented as
\(P(wet|s,rain)P(s|rain)P(rain)\).&lt;/p&gt;

&lt;h4 id=&quot;conditional-probability-tablecpt&quot;&gt;Conditional Probability Table (CPT)&lt;/h4&gt;
&lt;p&gt;We can derive a CPT, conditional probability table, from the graphical model.
This is where the domain knowledge comes into play as the CPT is typically
provided.&lt;/p&gt;

&lt;p&gt;Let's assume the below CPT (I set the probabilities using some common sense):&lt;/p&gt;

\[\begin{array}{rr}
P(s|r) \\ \hline
r &amp;amp; s &amp;amp;0.1  \\ \hline
r &amp;amp; \neg s &amp;amp;0.9  \\ \hline
\neg r &amp;amp;s &amp;amp;0.7  \\ \hline
\neg r &amp;amp;\neg s &amp;amp;0.3\\ \hline
\end{array}


\begin{array}{rr}
P(r) \\ \hline
r  &amp;amp;0.3 \\ \hline
\neg r  &amp;amp;0.7  \\ \hline
\end{array}

\begin{array}{rr}
P(w|s,r) \\ \hline
r &amp;amp; s &amp;amp; w &amp;amp; 0.99 \\ \hline
r &amp;amp; s &amp;amp; \neg w &amp;amp;0.01  \\ \hline
r &amp;amp; \neg s &amp;amp;  w &amp;amp;0.98  \\ \hline
r &amp;amp; \neg s &amp;amp; \neg w &amp;amp;0.02 \\ \hline
\neg r &amp;amp; s &amp;amp; w &amp;amp; 0.8 \\ \hline
\neg r &amp;amp; s &amp;amp; \neg w &amp;amp;0.2  \\ \hline
\neg r &amp;amp; \neg s &amp;amp;  w &amp;amp;0.05  \\ \hline
\neg r &amp;amp; \neg s &amp;amp; \neg w &amp;amp;0.95  \\ \hline
\end{array}\]

&lt;h4 id=&quot;example-probability-that-its-not-raining-given-the-evidence-that-the-sprinkler-is-on-and-its-not-wet&quot;&gt;Example: Probability that it's not raining given the evidence that the sprinkler is on and it's not wet.&lt;/h4&gt;

&lt;p&gt;\( P(\neg r | s, \neg w) = \frac{P(\neg r, s, \neg w)}{P(s,\neg w)}\) [ Apply chain rule and reorder. ]&lt;/p&gt;

&lt;p&gt;\(= \frac{P(\neg r, s, \neg w)}{\sum_r P(r=r,s,\neg w)}\) [ Marginalize the denominator for all values of \(r\). ]&lt;/p&gt;

&lt;p&gt;\(= \frac{P(\neg r)P(s | \neg r)P(\neg w | s,\neg r)}{\sum_r P(r=r)P(s | r=r)P(\neg w |s,r=r)}\) [ Factor using the CPT. ]&lt;/p&gt;

&lt;p&gt;\(= \frac{P(\neg r)P(s\ |\neg r)P(\neg w |s,\neg r)}{\sum_r P(r=r)P(s |r=r)P(\neg w |s,r=r)}\) [ Move terms not dependent on \(r\) out of the summation, in this
case no simplification! ]&lt;/p&gt;

&lt;p&gt;\(= \frac{P(\neg r)P(s |\neg r)P(\neg w |s,\neg r)}{P(r)P(s |r)P(\neg w |s,r) + P(\neg r)P(s |\neg r)P(\neg w|s,\neg r)}\) [ Expand. ]&lt;/p&gt;

&lt;p&gt;\(= \frac{(0.7)(0.7)(0.2)}{(0.3)(0.1)(0.01) +  (0.7)(0.7)(0.2)}\) [ Plug in values based on CPT, the domain knowledge. ]&lt;/p&gt;

&lt;p&gt;\(= 0.996948\)&lt;/p&gt;

&lt;p&gt;This value seems reasonable, as we would expect that it was not raining if the
sprinkler was on and the grass was not wet.&lt;/p&gt;
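The hand calculation above can be checked numerically from the CPTs. A small sketch, with booleans standing in for rain, sprinkler, and wet:

```python
# CPTs transcribed from the tables above, keyed by boolean variable values.
p_r = {True: 0.3, False: 0.7}                               # P(r)
p_s_given_r = {(True, True): 0.1, (False, True): 0.9,       # P(s | r),
               (True, False): 0.7, (False, False): 0.3}     # keyed (s, r)
p_w_given_sr = {                                            # P(w | s, r),
    (True, True, True): 0.99, (False, True, True): 0.01,    # keyed (w, s, r)
    (True, False, True): 0.98, (False, False, True): 0.02,
    (True, True, False): 0.8, (False, True, False): 0.2,
    (True, False, False): 0.05, (False, False, False): 0.95,
}

def joint(w, s, r):
    # Factorization from the graphical model: P(w,s,r) = P(w|s,r) P(s|r) P(r)
    return p_w_given_sr[(w, s, r)] * p_s_given_r[(s, r)] * p_r[r]

# P(not r | s, not w) = P(not r, s, not w) / sum over r of P(r, s, not w)
num = joint(False, True, False)
den = joint(False, True, True) + joint(False, True, False)
print(round(num / den, 6))  # matches the 0.996948 derived above
```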

&lt;p&gt;Resources:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.ics.uci.edu/~rickl/courses/cs-271/2011-fq-cs271/2011-fq-cs271-lecture-slides/2011fq271-17-BayesianNetworks.pdf&quot;&gt;http://www.ics.uci.edu/~rickl/courses/cs-271/2011-fq-cs271/2011-fq-cs271-lecture-slides/2011fq271-17-BayesianNetworks.pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;markov-blankets&quot;&gt;Markov blankets&lt;/h3&gt;

&lt;p&gt;The Markov blanket for a node in a Bayesian network is the set of nodes consisting
of the node's parents, children, and children's other parents.&lt;/p&gt;

&lt;p&gt;The Markov Blanket is a powerful tool in reducing the computational load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“The Markov blanket of a node contains all the variables that shield the node
from the rest of the network. This means that the Markov blanket of a node is
the only knowledge needed to predict the behavior of that node.”&lt;/strong&gt; - Wiki&lt;/p&gt;

&lt;p&gt;Mathematically, \[P(A|MB(A),B) = P(A|MB(A))\]
where \(MB(A)\) is the Markov blanket of \(A\).&lt;/p&gt;

&lt;p&gt;Resources:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Markov_blanket&quot;&gt;https://en.wikipedia.org/wiki/Markov_blanket&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;d-separation&quot;&gt;D-separation&lt;/h3&gt;

&lt;p&gt;D-separation is a tool to determine whether conditional independence exists
between variables. Resonating with the theme of simplifying the problem at hand,
conditional independence, and thus D-separation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“play an important role in using probabilistic models for pattern recognition
by simplifying both the structure of a model and the computations needed to
perform inference and learning under that model.”&lt;/strong&gt; - Bishop [1]&lt;/p&gt;

&lt;p&gt;Rules as outlined below: [2]&lt;/p&gt;

&lt;p&gt;Rule 1: x and y are d-connected if there is an unblocked path between them.&lt;/p&gt;

&lt;p&gt;Rule 2: x and y are d-connected, conditioned on a set Z of nodes, if there is a
collider-free path between x and y that traverses no member of Z. If no such
path exists, we say that x and y are d-separated by Z. We also say then that
every path between x and y is “blocked” by Z.&lt;/p&gt;

&lt;p&gt;Rule 3: If a collider is a member of the conditioning set Z, or has a descendant
in Z, then it no longer blocks any path that traces this collider.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In summary, the introduction of knowledge &lt;strong&gt;kills dependence&lt;/strong&gt; in the case of
Rule 1 and Rule 2, while knowledge results in dependence in the case of Rule 3.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;e.g. \(P(C | A,B) = P(C | B)\) is really asking if \(C\) is orthogonal to
\(A\) given \(B\). If all paths are destroyed by this knowledge then \(C\)
and \(A\) are independent and the equivalence holds. If a path is created, for
example, if \(B\) is the point of collision on a path between \(A\) and
\(C\), then \(A\) and \(C\) are not independent and the equivalence does
not hold.&lt;/p&gt;

&lt;p&gt;Resources:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Bishop, Christopher, Pattern Recognition and Machine Learning&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html&quot;&gt;http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.andrew.cmu.edu/user/scheines/tutor/d-sep.html&quot;&gt;https://www.andrew.cmu.edu/user/scheines/tutor/d-sep.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 11 Apr 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/machinelearning/2018/04/11/bayes-1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/machinelearning/2018/04/11/bayes-1.html</guid>
        
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>IMDB-WIKI: notes on refactoring data preprocess pipeline</title>
        <description>&lt;p&gt;This note is an update to &lt;a href=&quot;http://surfertas.github.io/deeplearning/2017/04/18/imdbwiki.html&quot;&gt;IMDB-WIKI: trying a small model for age
classification&lt;/a&gt;
where I attempted to simplify the objective of age classification by reducing
the number of classes and applying a learning model with relatively smaller
capacity. That said, rather than modelling, the main focus of the exercise was
to handle data in relatively raw and unedited form to extract and load into a
format readily consumable by a deep learning model.&lt;/p&gt;

&lt;p&gt;The first iteration was sufficient for personal use, but the sloppiness of the
project quickly surfaced as others attempted to use the scripts that I had put
together.&lt;/p&gt;

&lt;p&gt;I had the opportunity to work on the repo again, and refactored the scripts to
allow for easier use by others, though some work still is required. See the
below points, related to the rework.&lt;/p&gt;

&lt;h3 id=&quot;a-few-takeaways&quot;&gt;A few takeaways&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;The new implementation uses PyTorch and the Dataset API to extract,
transform, and load the data after preprocessing. In contrast to the original
Chainer implementation, I needed paths to the images, so I added an option to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;imdb_preprocess.py&lt;/code&gt; to return input features as paths to images, by setting
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--get-paths&lt;/code&gt; flag to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;True&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The model for this exercise was a pretrained VGG16 model with a redefined
classifier block. [1] I had trouble extracting the input size for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nn.Sequential()&lt;/code&gt; so had to reference the documentation.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;classifier&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Sequential&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReLU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReLU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
  &lt;li&gt;
    &lt;p&gt;The faces-only IMDB data set contains images of all sizes and dimensions,
with float values normalized between 0 and 1. As VGG16 takes in 3 channels, I
cropped and reduced the images to gray scale, then took an additional step to
rescale to 0-255 and apply &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;np.uint8()&lt;/code&gt;. [Questionable if this is the best way,
and I would like to hear other suggestions, if any.] The final step was to convert
to 3 channels, which is just copying the single-channel image across the 3 channels. The
source can be found
&lt;a href=&quot;https://github.com/surfertas/deep_learning/blob/master/projects/imdbwiki-challenge/pytorch/mydataloader.py&quot;&gt;here&lt;/a&gt;.
The conversion to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint8&lt;/code&gt; is required as the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torchvision.transforms.ToPILImage()&lt;/code&gt; method used in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transformer.py&lt;/code&gt; does not
take floats at the time of this writing. [2]&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;On a second pass of the data, I noticed ages well beyond the valid upper
range, and included a range check in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;imdb_preprocess.py&lt;/code&gt;. I would imagine a
closer investigation would surface further possible improvements.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
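A minimal sketch of the conversion step described in point 3. The function name and the 224x224 crop size are illustrative, not from the repo:

```python
import numpy as np

# Rescale a single-channel float image in [0, 1] to 0-255 uint8, then
# replicate it across 3 channels to match VGG16's expected input.
def to_three_channel_uint8(gray_float):
    img = np.uint8(gray_float * 255)           # rescale and cast
    return np.stack([img, img, img], axis=-1)  # copy across 3 channels

sample = np.random.rand(224, 224)   # stand-in for a face crop in [0, 1]
out = to_three_channel_uint8(sample)
print(out.shape, out.dtype)  # (224, 224, 3) uint8
```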

&lt;h3 id=&quot;next-steps-at-some-point&quot;&gt;Next steps at some point&lt;/h3&gt;
&lt;p&gt;As this 2nd iteration was just refactoring the data extraction and loading
process, I have not spent much time on the modeling side and have included a
pre-trained VGG16 implementation as a starting point.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;http://pytorch.org/docs/0.2.0/_modules/torchvision/models/vgg.html&lt;/li&gt;
  &lt;li&gt;https://github.com/pytorch/vision/issues/4&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sat, 07 Apr 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/imdb/deeplearning/machinelearning/2018/04/07/imdb-update.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/imdb/deeplearning/machinelearning/2018/04/07/imdb-update.html</guid>
        
        
        <category>imdb</category>
        
        <category>deeplearning</category>
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Autonomous Mobile Robot #2: Inference as a ROS service</title>
        <description>&lt;p&gt;Series:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html&quot;&gt;Autonomous Mobile Robot #1: Data collection to a trained model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html&quot;&gt;Autonomous Mobile Robot #2: Inference as a ROS service&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&quot;&gt;Autonomous Mobile Robot #3: Pairing with a PS3 Controller for teleop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html&quot;&gt;Autonomous Mobile Robot #4: Using GCP Storage&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;inference-as-a-service&quot;&gt;Inference as a service&lt;/h3&gt;

&lt;p&gt;We ended the previous post with a trained model. The trained model now can be
integrated into a ROS system, and allow for inference given an input feature,
image in this case.&lt;/p&gt;

&lt;p&gt;Of the possible options to consider (topics, services, and actions),
using services seems to provide the functionality that best fits this particular
use case.&lt;/p&gt;

&lt;h4 id=&quot;ros-service&quot;&gt;ROS service&lt;/h4&gt;
&lt;p&gt;The ROS Wiki provides an adequate description of a ROS service. Note that calls
to services are blocking; in other words, the calling process is blocked until the
requested service completes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Request / reply is done via a Service, which is defined by a pair of messages:
one for the request and one for the reply. A providing ROS node offers a service
under a string name, and a client calls the service by sending the request
message and awaiting the reply. Client libraries usually present this
interaction to the programmer as if it were a remote procedure call.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For detailed information and tutorials reference the wiki.[1,2]&lt;/p&gt;

&lt;h4 id=&quot;implementing-the-service-provider&quot;&gt;Implementing the service provider&lt;/h4&gt;

&lt;p&gt;The service we would like to provide: given an image, predict
the associated throttle and steering commands. Here we consider the server-side
implementation, waiting for a client to call the specified service. Since
prediction is based on a deep net, we assume that the service will be running on
hardware with GPU access, so CUDA should be available.&lt;/p&gt;

&lt;p&gt;In order to set up the service, we can follow the below steps.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Specify the service request message type and response message type as a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.srv&lt;/code&gt; file, and place in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/srv&lt;/code&gt; directory. &lt;a href=&quot;https://github.com/surfertas/amr_core/tree/master/amr_master/amr_nn_controller_service/srv&quot;&gt;PredictCommand.srv&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Update the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CMakeLists.txt&lt;/code&gt; by adding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredictCommand.srv&lt;/code&gt; to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_service_files()&lt;/code&gt;. &lt;a href=&quot;https://github.com/surfertas/amr_core/blob/master/amr_master/amr_nn_controller_service/CMakeLists.txt&quot;&gt;CMakeLists.txt&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Implement the service interface as found in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nn_controller_service.py&lt;/code&gt; which
consists of initiating the service, and defining a handler as below: &lt;a href=&quot;https://github.com/surfertas/amr_core/blob/master/amr_master/amr_nn_controller_service/scripts/nn_controller_service.py&quot;&gt;nn_controller_service.py&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
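&lt;p&gt;The actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredictCommand.srv&lt;/code&gt; lives in the linked repo. Based on the handler below, which reads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;req.image&lt;/code&gt; and returns a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Command2D&lt;/code&gt;, it has roughly this shape (package qualifiers are omitted here; see the linked file for the exact definition). The request sits above the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;---&lt;/code&gt; separator and the response below it:&lt;/p&gt;

```
# Request: the compressed camera frame to run inference on.
sensor_msgs/CompressedImage image
---
# Response: the predicted throttle/steer commands.
Command2D commands
```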

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_init_nn_controller_service&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Initialize nn controller service. &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_service_nn_controller&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Service&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;'amr_nn_controller_service'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;PredictCommand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_nn_controller_handler&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loginfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;NN controller service initialized&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_nn_controller_handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;req&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Handler for nn controller service.
    Args:
        req - request
    Returns:
        res - ROS response, commands
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;cv_img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imdecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fromstring&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;req&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;except&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CvBridgeError&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;logerr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Bail out so cv_img is never used uninitialized below.&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;run_inference&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Command2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;header&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;std_msgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stamp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lazy_publishing&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# throttle
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# steer
&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd_msg&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
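&lt;p&gt;One caveat in the handler above: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;np.fromstring&lt;/code&gt; is deprecated in recent NumPy releases, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;np.frombuffer&lt;/code&gt; is the drop-in replacement for decoding a raw byte payload. A minimal sketch, where the byte string stands in for the JPEG payload carried by the request:&lt;/p&gt;

```python
import numpy as np

# Stand-in for the compressed image bytes carried in req.image.data.
payload = bytes(bytearray([0, 128, 255, 64]))

# Deprecated form used in the handler above:
#   buf = np.fromstring(payload, np.uint8)
# Non-deprecated equivalent:
buf = np.frombuffer(payload, dtype=np.uint8)
# buf is now a uint8 array ready to pass to cv2.imdecode(buf, 1).
```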

&lt;p&gt;Note the handler calls the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_inference()&lt;/code&gt; method. This can be redefined to
match the training environment used; for example, the process differs depending
on whether training was done with TensorFlow or PyTorch. For this first pass we
used PyTorch for training, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_inference()&lt;/code&gt; was defined as below.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run_inference&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use_cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Runs inference given a PyTorch model.
    Args:
        model - pytorch model
        img - numpy darray
        use_cuda - 1 is True, 0 is False
    Returns:
        throttle - throttle command
        steer - steer command
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;imagenet_transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'eval_transforms'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_numpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transpose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unsqueeze&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use_cuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  
    &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;autograd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Variable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Cuda tensor to numpy doesnt support GPU, use .cpu() to move to host mem. 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;throttle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;steer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;throttle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;steer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Return the commands promised in the docstring.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;throttle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;steer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
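&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;imagenet_transforms()&lt;/code&gt; helper referenced above is defined in the repo and not shown here. Purely as an illustration, an eval-time transform for an ImageNet-pretrained model typically scales pixels to [0, 1] and normalizes with the standard ImageNet channel statistics; a NumPy sketch of that normalization follows (the function name is ours, not from the repo):&lt;/p&gt;

```python
import numpy as np

# Illustrative stand-in for the eval transforms used above; the real
# imagenet_transforms() helper lives in the project repo. These are the
# standard ImageNet channel statistics:
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def eval_transform(chw_img):
    """Normalize a CHW uint8 image for an ImageNet-pretrained model."""
    scaled = chw_img.astype(np.float32) / 255.0
    # Broadcast the per-channel statistics over the H and W axes.
    mean = IMAGENET_MEAN.reshape(3, 1, 1)
    std = IMAGENET_STD.reshape(3, 1, 1)
    return (scaled - mean) / std
```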

&lt;h4 id=&quot;now-the-service-is-ready-to-be-called&quot;&gt;&lt;strong&gt;Now the service is ready to be called.&lt;/strong&gt;&lt;/h4&gt;

&lt;h3 id=&quot;implementing-the-client-side&quot;&gt;Implementing the client side&lt;/h3&gt;

&lt;p&gt;In our project, we run the client on a Raspi, a relatively
resource-constrained SoC. The package running on the Raspi is responsible for
taking a monocular image and publishing it. We have placed the client in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_nn_controller&lt;/code&gt;
&lt;a href=&quot;https://github.com/surfertas/amr_core/tree/master/amr_worker/amr_nn_controller&quot;&gt;package&lt;/a&gt;.
We also need to define the service in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/srv&lt;/code&gt; directory and update the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CMakeLists.txt&lt;/code&gt;.&lt;/p&gt;
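&lt;p&gt;For reference, the service-related portion of a catkin &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CMakeLists.txt&lt;/code&gt; typically contains entries like the sketch below; the dependency names here are illustrative, so see the linked files in the repo for the actual lists.&lt;/p&gt;

```cmake
# Sketch of the service-related entries (dependency names illustrative).
find_package(catkin REQUIRED COMPONENTS
  rospy
  std_msgs
  sensor_msgs
  message_generation
)

add_service_files(
  FILES
  PredictCommand.srv
)

generate_messages(
  DEPENDENCIES
  std_msgs
  sensor_msgs
)
```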

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;NNControllerClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    Given an image input, a dl model infers commands.
    Controller subscribes to raw image topic, and calls the
    'amr_nn_controller_service'.
    &quot;&quot;&quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_topic&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_topic&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_topic&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd_topic&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wait_for_service&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'amr_nn_controller_service'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loginfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Service 'amr_nn_controller_service' is available...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_serve_get_prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ServiceProxy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;'amr_nn_controller_service'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;PredictCommand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;persistent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Initialize subscriber and publisher
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_image_sub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Subscriber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CompressedImage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_sub_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_pub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Publisher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Command2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queue_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_sub_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Handler for image subscriber.
        Args:
            img_msg - ROS image
        &quot;&quot;&quot;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;resp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_serve_get_prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;except&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ServiceException&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;logerr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Service call failed: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# Skip publishing; resp is undefined when the call failed.&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_pub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;publish&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;commands&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We initialize a subscriber to the topic publishing the images. Every time an
image is published, the callback makes a request to the service to get control
commands. Once the service returns a valid response, the publisher publishes
the commands to the command topic, making them available to any interested
node.&lt;/p&gt;

&lt;h3 id=&quot;instructions&quot;&gt;Instructions&lt;/h3&gt;

&lt;p&gt;Instructions for running the demo.&lt;/p&gt;

&lt;p&gt;On the hardware with a GPU available (Jetson TX1, TX2, a cloud instance, etc.):&lt;/p&gt;

&lt;p&gt;Make sure you have placed the trained model in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;models&lt;/code&gt; directory of the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_nn_controller_service&lt;/code&gt; package.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd amr_core/amr_master/amr_nn_controller_service/launch/
$ roslaunch amr_nn_controller_service.launch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the raspi:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd amr_core/amr_worker/amr_bringup/launch/
$ roslaunch amr_nn_bringup.launch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://wiki.ros.org/Services&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/rospy/Overview/Services&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sun, 01 Apr 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html</guid>
        
        
        <category>amr</category>
        
        <category>deeplearning</category>
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Autonomous Mobile Robot #1: Data collection to a trained model</title>
        <description>&lt;p&gt;Series:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html&quot;&gt;Autonomous Mobile Robot #1: Data collection to a trained model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/04/01/amr-2.html&quot;&gt;Autonomous Mobile Robot #2: Inference as a ROS service&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/amr/deeplearning/machinelearning/2018/12/22/amr-3.html&quot;&gt;Autonomous Mobile Robot #3: Pairing with a PS3 Controller for teleop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/amr/deeplearning/machinelearning/2019/03/31/amr-4.html&quot;&gt;Autonomous Mobile Robot #4: Integrating with GCP&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;note&quot;&gt;Note&lt;/h3&gt;

&lt;p&gt;These notes document the progress of the AMR project found in
this &lt;a href=&quot;https://github.com/surfertas/amr_core&quot;&gt;repo&lt;/a&gt;, where AMR stands for
autonomous mobile robot. At the time of writing, the base robot is assumed to be
a Donkey, the RC-car-based autonomous race car popularized by &lt;a href=&quot;https://diyrobocars.com/&quot;&gt;DIY
Robocars&lt;/a&gt;, which originated in the US. &lt;strong&gt;Any advice on
system design, coding, anything is much appreciated.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 id=&quot;the-objective-is-to-create-an-autonomous-mobile-robot-that-can-function-in-a-diverse-range-of-environments&quot;&gt;&lt;strong&gt;The objective is to create an autonomous mobile robot that can function in a diverse range of environments.&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;The assumed setup is a Raspi connected to a disassembled RC car as
specified in the Donkey documents. [1] In contrast to the Donkey software, we
primarily use ROS (Kinetic) running on Ubuntu Mate 16.04.1.&lt;/p&gt;

&lt;p&gt;The following code will bring up the environment to allow for the teleoperation of the
mobile robot, and initiate the data collection and storage process.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ git clone https://github.com/surfertas/amr_core.git
$ cd amr_worker/amr_bringup/launch/
$ roslaunch amr_teleop_bringup.launch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Make sure that the external packages &lt;a href=&quot;http://wiki.ros.org/joy&quot;&gt;joy package&lt;/a&gt; and
&lt;a href=&quot;http://wiki.ros.org/video_stream_opencv&quot;&gt;videostream_opencv&lt;/a&gt; are installed and
placed in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_worker&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;Thanks to the below as some of the code in this repo has been influenced by them. (Licenses have been respected to
the best of my knowledge.)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wroscoe/donkey&quot;&gt;wroscoe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://maeveautomation.org/&quot;&gt;Maeve Automation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/marvis/pytorch-yolo2&quot;&gt;marvis&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;collecting-the-data-and-storing&quot;&gt;Collecting the data and storing&lt;/h3&gt;

&lt;p&gt;The project uses ROS middleware. To train a deep learning model, we first need
to collect data. ROS uses a pub/sub protocol, which turns out to be quite
conducive to the data collection process.&lt;/p&gt;

&lt;p&gt;Sensor data is published over ROS topics. Anyone interested in the data can
simply subscribe to the respective topic and obtain the data as it is
published. (Not ideal from a safety perspective, though this is addressed in
ROS 2.)&lt;/p&gt;
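&lt;p&gt;Purely as a conceptual sketch (plain Python, not ROS), the decoupling that makes pub/sub convenient for logging looks like this: a data-collection node can register for a topic without the sensor driver knowing or caring who is listening.&lt;/p&gt;

```python
from collections import defaultdict

# Toy message bus illustrating pub/sub decoupling: publishers do not
# know who, if anyone, is listening. A sketch of the idea only, not of
# how ROS itself is implemented.
class Bus:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        # A data-collection node registers interest in a topic.
        self._subs[topic].append(callback)

    def publish(self, topic, msg):
        # The sensor driver publishes; every subscriber gets the message.
        for cb in self._subs[topic]:
            cb(msg)
```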

&lt;p&gt;For this project, the data collection process is handled by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_data_processor&lt;/code&gt;
package. The script &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_data_storage_node.py&lt;/code&gt; has the specific implementation. The required
configurations need to be set in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_storage.yaml&lt;/code&gt; found in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_data_processor/config&lt;/code&gt; directory before using.&lt;/p&gt;
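&lt;p&gt;The exact keys live in the repo's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_storage.yaml&lt;/code&gt;; below is a hypothetical example only, to illustrate the kind of settings involved. Every key and value here is illustrative, not the actual schema.&lt;/p&gt;

```yaml
# Hypothetical configuration; see the repo for the real schema.
img_topic: /webcam/image_raw/compressed   # illustrative image source
cmd_topic: /amr_command                   # illustrative command topic
```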

&lt;p&gt;Since the publishing frequency of each sensor is likely to differ, we simplify
the problem by discarding non-synced data and storing only synchronized data,
using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ApproximateTimeSynchronizer()&lt;/code&gt; [1] method, which allows different
topics to be synchronized. The snippet below shows an example usage of this
method, including the initialization of the synchronizer as well as the
associated callback function. The full source can be found
&lt;a href=&quot;https://github.com/surfertas/amr_core/blob/master/amr_worker/amr_data_processor/scripts/amr_data_storage_node.py&quot;&gt;here&lt;/a&gt;.
Details of the sync algorithm can be found on the ROS wiki. [2]&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_init_subscribers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Set up subscribers and sync. &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Initialize subscribers.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;img_sub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;message_filters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Subscriber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CompressedImage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cmd_sub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;message_filters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Subscriber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_topic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Command2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;subs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_sub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd_sub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Sync subscribers
&lt;/span&gt;    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_sync&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;message_filters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ApproximateTimeSynchronizer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;subs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;queue_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;slop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.2&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_sync&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerCallback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_sync_sub_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loginfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Synced subscribers initialized...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_sync_sub_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Callback for the synchronized image and command subscribers.
    Args:
        img - image message of type CompressedImage
        cmd - command message of type Command2D
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_path_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_capacity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;cv_img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imdecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;frombuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'{}.png'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_rostime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imwrite&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_path_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cmd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_save_frequency&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_save_data_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, each time the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CompressedImage&lt;/code&gt; message published on the topic
specified in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self._img_topic&lt;/code&gt; syncs with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Command2D&lt;/code&gt; message published on the
topic specified in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self._cmd_topic&lt;/code&gt;, the callback function
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self._sync_sub_callback()&lt;/code&gt; is called with the synced messages passed as
parameters. The callback converts the ROS image to an OpenCV matrix and writes
it to the specified path.&lt;/p&gt;

&lt;p&gt;Note that we are storing the data on an external SSD, as the image files can consume a material amount of disk space quickly.&lt;/p&gt;

&lt;p&gt;We also store the &lt;em&gt;path&lt;/em&gt; to each image, which will later be used to load the
features fed into a learning algorithm, along with the commands, used as the
target or label, as an array. The data is saved to disk periodically by calling
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self._save_data_info()&lt;/code&gt;, presented below.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_save_data_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Called periodically to save the inputs (image paths) and labels
        used for training models.
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;images&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_img_path_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;control_commands&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_cmd_array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;predictions.pickle&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'wb'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pickle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;rospy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loginfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Predictions saved to {}...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;transforming-data-into-consumable-form&quot;&gt;Transforming data into consumable form&lt;/h3&gt;

&lt;p&gt;Once we are satisfied with the amount of data collected, we can turn to loading
and formatting the data to transform the raw data into something consumable by a
learning model for training.&lt;/p&gt;

&lt;p&gt;The PyTorch Dataset API is tremendously handy, as PyTorch integrates almost
seamlessly with pandas, making it a breeze to reformat the data. The
Dataset class is shown below. The repo page is found
&lt;a href=&quot;https://github.com/surfertas/amr_core/blob/master/amr_models/model_nn_controller_service/data_loader.py&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We can simply load the pickle file and convert the dict to a pandas DataFrame
object. Note that the Dataset API requires the definition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__getitem__&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__len__&lt;/code&gt;, which can be customized for the task at hand. We can easily change the
number of input features or target variables used, as well as generate sequences
if necessary.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AMRControllerDataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    Custom dataset for the AMR controller.
    The input is an image taken from a monocular camera; the controller maps
    the image to steering and throttle commands.
    &quot;&quot;&quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pickle_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_pickle_file&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pickle_file&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_root_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_transform&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transform&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_frames&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_get_frames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__len__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_frames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__getitem__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_frames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'images'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Get image name
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;img_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rsplit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Create path to image
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;img_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_root_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Get actual image    
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imread&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_transform&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;'image'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;'commands'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_frames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'throttle'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'steer'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_get_frames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pickle_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_root_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_pickle_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pickle_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'rb'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;pdict&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pickle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;img_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'images'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'images'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;controls_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'control_commands'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'throttle'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'steer'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;concat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;controls_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;thats-it&quot;&gt;&lt;strong&gt;That's it.&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;We can simply instantiate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AMRControllerDataset()&lt;/code&gt; from our training implementation
and the data will be ready to go.&lt;/p&gt;
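As a sketch of how the dataset plugs into a training loop, the snippet below feeds a Dataset into a PyTorch DataLoader. `TinyDataset` is a hypothetical stand-in for `AMRControllerDataset` (which needs the collected files on disk); it returns the same dict-of-arrays structure.

```python
# A minimal sketch of feeding a Dataset into a DataLoader. TinyDataset is a
# hypothetical stand-in for AMRControllerDataset; the shapes are illustrative.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TinyDataset(Dataset):
    def __init__(self, n=8):
        # Random stand-ins for the saved images and (throttle, steer) commands.
        self._imgs = np.random.rand(n, 3, 32, 32).astype(np.float32)
        self._cmds = np.random.rand(n, 2).astype(np.float32)

    def __len__(self):
        return len(self._imgs)

    def __getitem__(self, idx):
        # Same dict structure as AMRControllerDataset.__getitem__ above.
        return {'image': self._imgs[idx], 'commands': self._cmds[idx]}

loader = DataLoader(TinyDataset(), batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch['image'].shape, batch['commands'].shape)
# torch.Size([4, 3, 32, 32]) torch.Size([4, 2])
```

The default collate function stacks the numpy arrays into batched tensors, so the training loop only sees `batch['image']` and `batch['commands']`.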

&lt;h3 id=&quot;training-a-model-based-on-the-collected-data&quot;&gt;Training a model based on the collected data&lt;/h3&gt;

&lt;p&gt;Now that we have our raw data converted to a consumable form we can move on to
the training step.&lt;/p&gt;

&lt;p&gt;The workflow can be summarized as below and is found in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amr_models&lt;/code&gt;
&lt;a href=&quot;https://github.com/surfertas/amr_core/tree/master/amr_models/model_nn_controller_service&quot;&gt;directory&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Specify data set, done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_loader.py&lt;/code&gt;. (Walk through above)&lt;/li&gt;
  &lt;li&gt;Specify data transformations, done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transforms.py&lt;/code&gt;. (Note: if using a model pre-trained on ImageNet, take care to apply the appropriate transformations when running inference.)&lt;/li&gt;
  &lt;li&gt;Specify learning architecture, done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;model.py&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Lay out the training and validation process in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;train.py&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To split the data into training and validation sets, I found
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SubsetRandomSampler()&lt;/code&gt; [4] extremely helpful. The training loop saves a model
every time the validation metric (MSE) improves.&lt;/p&gt;
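The train/validation split with `SubsetRandomSampler` [4] can be sketched as below; the random tensors stand in for the real dataset, and the 20% validation fraction is an arbitrary choice.

```python
# A sketch of a train/validation split using SubsetRandomSampler. The
# TensorDataset of random values is a placeholder for the real dataset.
import numpy as np
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 2))

indices = np.random.permutation(len(dataset))    # shuffle once up front
split = int(0.2 * len(dataset))                  # 20% held out for validation
val_idx, train_idx = indices[:split], indices[split:]

train_loader = DataLoader(dataset, batch_size=16,
                          sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=16,
                        sampler=SubsetRandomSampler(val_idx))

print(len(train_idx), len(val_idx))  # 80 20
```

Because each loader gets its own disjoint index set, the two loaders never see each other's samples.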

&lt;p&gt;Once we are satisfied, training is finished and we have a saved model
that can be loaded and used for inference.&lt;/p&gt;
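A minimal sketch of the save-on-improvement checkpointing and of loading the model back for inference is shown below; the two-layer net and the file name are placeholders, not the architecture used in the repo.

```python
# Sketch of checkpointing when validation MSE improves, then reloading for
# inference. The Sequential net and 'best_model.pth' are placeholders.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
best_mse = float('inf')

# Inside the validation step of the training loop:
val_mse = 0.5  # in practice, computed over the validation loader
if val_mse < best_mse:
    best_mse = val_mse
    torch.save(net.state_dict(), 'best_model.pth')  # keep the best weights

# Later, for inference:
net.load_state_dict(torch.load('best_model.pth'))
net.eval()  # disable dropout/batch-norm training behavior
```

Saving only the `state_dict` (rather than the whole module) keeps the checkpoint portable across minor code changes.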

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;

&lt;p&gt;By simply calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch amr_teleop_bringup.launch&lt;/code&gt; we kick off a system
that records and stores synced sensor data that can later be used to train a
learning model. In the next update we will walk through how to use the trained
model.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://docs.donkeycar.com/guide/build_hardware/#parts-needed&lt;/li&gt;
  &lt;li&gt;https://docs.ros.org/api/message_filters/html/python/&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/message_filters/ApproximateTime&lt;/li&gt;
  &lt;li&gt;http://pytorch.org/docs/master/data.html&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sat, 31 Mar 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/amr/deeplearning/machinelearning/2018/03/31/amr-1.html</guid>
        
        
        <category>amr</category>
        
        <category>deeplearning</category>
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Notes on Decision Tree Learning #3: Boosted Trees</title>
        <description>&lt;h1 id=&quot;notes-on-trees&quot;&gt;Notes on Trees:&lt;/h1&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/machinelearning/2018/03/21/dt-1.html&quot;&gt;Decision Trees&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary&lt;/li&gt;
      &lt;li&gt;Entropy, Info gain, Gini Coefficient, Gini gain&lt;/li&gt;
      &lt;li&gt;High Variance, tendency to overfit&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/23/dt-2.html&quot;&gt;Random Forest&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary / Pros of RF&lt;/li&gt;
      &lt;li&gt;Random Forest construction&lt;/li&gt;
      &lt;li&gt;Error rate&lt;/li&gt;
      &lt;li&gt;Feature (variable) importance&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/29/dt-3.html&quot;&gt;Boosted Trees (discrete adaboost)&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;Combining weak classifiers to create a strong classifier&lt;/li&gt;
      &lt;li&gt;Algorithm&lt;/li&gt;
      &lt;li&gt;Outlier detection&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;boosting&quot;&gt;Boosting&lt;/h3&gt;

&lt;h4 id=&quot;31-one-line-summary&quot;&gt;3.1 One line summary&lt;/h4&gt;
&lt;p&gt;Boosting is a general algorithm that can improve the accuracy of a learning
algorithm. The following notes discuss boosting in the context of tree-based
learners. Boosting allows an ensemble of weak classifiers to produce
a strong classifier, where weak and strong refer to the accuracy of the
classifier.&lt;/p&gt;

&lt;h4 id=&quot;32-algorithm&quot;&gt;3.2 Algorithm&lt;/h4&gt;
&lt;p&gt;The general idea is to generate an ensemble of trees sequentially, where each
iteration of tree creation focuses on samples that were misclassified in the
previous round. This is done by increasing the weights of the misclassified
samples, which increases their probability of being sampled when the next tree
is generated.&lt;/p&gt;

&lt;p&gt;As a result of this sequential nature, parallelization is difficult, in contrast
to Random Forest, which is often considered easily parallelized. Further, a
benefit of boosting is that the algorithm can reduce not only variance but also
bias, whereas the improvement from Random Forest is concentrated in reducing
variance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Indeed, it has been proved that AdaBoost’s combined classifier has an error
rate that converges to the Bayes optimal provided that the algorithm is given enough
data, that it is run for enough but not too many rounds, and that the weak
hypotheses come from a class of functions that is “sufficiently rich.&lt;/em&gt;&lt;/strong&gt; [2]&lt;/p&gt;

&lt;p&gt;The algorithm as introduced in &lt;strong&gt;&lt;em&gt;Intro to Boosting&lt;/em&gt;&lt;/strong&gt; [1]:&lt;/p&gt;

&lt;p&gt;Given: \((x_1, y_1) … (x_m, y_m)\) where \(x_i \in X, y_i \in Y = \{-1,
+1\}\)&lt;/p&gt;

&lt;p&gt;Initialize: \(D_1(i) = \frac{1}{m}\) for \(i = 1, …, m\)&lt;/p&gt;

&lt;p&gt;For \(t=1..T:\)&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Train weak learner using distribution \(D_t\)&lt;/li&gt;
  &lt;li&gt;Get weak hypothesis \(h_t: X \rightarrow \{-1,+1\}\) with error: \(\epsilon_t = P_{i \sim D_t}[h_t(x_i) \neq y_i]\)&lt;/li&gt;
  &lt;li&gt;Choose \(\alpha_t = 0.5 \ln(\frac{1-\epsilon_t}{\epsilon_t})\)&lt;/li&gt;
  &lt;li&gt;Update: \(D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}\), where \(Z_t\) is the normalization factor that makes \(D_{t+1}\) a distribution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Finally, once \(T\) trees are produced: \(H(x) =
\text{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)\)&lt;/p&gt;

&lt;p&gt;Note that the influence of the weak classifier \(h_t(x)\) is given by
\(\alpha_t\), which is a function of the error, or frequency of misclassification,
of the given tree. The lower the \(\epsilon_t\), the higher the influence.&lt;/p&gt;
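
&lt;p&gt;The loop above can be sketched in numpy, using decision stumps as the weak
learners. This is an illustrative sketch, not code from the referenced papers:
the stump parameterization, the threshold grid, and all function names are
assumptions, and the division by \(\epsilon_t\) assumes no candidate stump
perfectly classifies the weighted sample.&lt;/p&gt;

```python
import numpy as np

def stump_predict(x, thresh, polarity):
    # A decision stump: polarity * sign(x - thresh), with ties sent to +1
    # so that predictions are always in {-1, +1}.
    return polarity * np.sign(np.sign(x - thresh) + 0.5)

def adaboost_fit(x, y, n_rounds=10):
    n = len(x)
    w = np.full(n, 1.0 / n)  # D_1(i) = 1/m
    # Candidate thresholds: midpoints between the sorted points, plus one
    # below the data so that a constant classifier is available.
    threshs = np.concatenate(([x.min() - 1.0], (x[:-1] + x[1:]) / 2.0))
    stumps = [(t, p) for t in threshs for p in (-1.0, 1.0)]
    model = []
    for _ in range(n_rounds):
        # Weighted error of every candidate stump; keep the best one.
        errs = np.array([w @ (stump_predict(x, t, p) != y) for t, p in stumps])
        best = int(np.argmin(errs))
        t, p = stumps[best]
        eps = errs[best]
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # Raise weight on misclassified samples, lower it on correct ones,
        # then renormalize (the Z_t step).
        w = w * np.exp(-alpha * y * stump_predict(x, t, p))
        w = w / w.sum()
        model.append((alpha, t, p))
    return model

def adaboost_predict(model, x):
    # H(x) = sign of the alpha-weighted vote of the weak hypotheses.
    return np.sign(sum(a * stump_predict(x, t, p) for a, t, p in model))
```

&lt;p&gt;On a small 1-d sample that no single stump can fit, the ensemble combines a
handful of stumps into a much stronger classifier.&lt;/p&gt;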

&lt;h4 id=&quot;33-outlier-detection&quot;&gt;3.3 Outlier detection&lt;/h4&gt;

&lt;p&gt;As the algorithm iteratively assigns higher weights to misclassified samples,
it can also be used to detect outliers. &lt;strong&gt;&lt;em&gt;AdaBoost focuses its weight
on the hardest examples, the examples with highest weights often turn out to be
outliers.&lt;/em&gt;&lt;/strong&gt; [1]&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf&lt;/li&gt;
  &lt;li&gt;http://rob.schapire.net/papers/explaining-adaboost.pdf&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 29 Mar 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/machinelearning/2018/03/29/dt-3.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/machinelearning/2018/03/29/dt-3.html</guid>
        
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Notes on Decision Tree Learning #2: Random Forest</title>
        <description>&lt;h4 id=&quot;notes-on-trees&quot;&gt;Notes on Trees:&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/machinelearning/2018/03/21/dt-1.html&quot;&gt;Decision Trees&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary&lt;/li&gt;
      &lt;li&gt;Entropy, Info gain, Gini Coefficient, Gini gain&lt;/li&gt;
      &lt;li&gt;High Variance, tendency to overfit&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/23/dt-2.html&quot;&gt;Random Forest&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary / Pros of RF&lt;/li&gt;
      &lt;li&gt;Random Forest construction&lt;/li&gt;
      &lt;li&gt;Error rate&lt;/li&gt;
      &lt;li&gt;Feature (variable) importance&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/29/dt-3.html&quot;&gt;Boosted Trees (discrete adaboost)&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;Combining weak classifiers to create a strong classifier&lt;/li&gt;
      &lt;li&gt;Algorithm&lt;/li&gt;
      &lt;li&gt;Outlier detection&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;random-forest&quot;&gt;Random Forest&lt;/h3&gt;

&lt;h4 id=&quot;21-one-line-summary&quot;&gt;2.1 One line summary&lt;/h4&gt;
&lt;p&gt;As the name suggests, a Random Forest is a collection of decision trees, which
represents the forest, with each tree constructed using a randomized procedure.&lt;/p&gt;

&lt;p&gt;The result of ensembling the randomly generated tree classifiers is a reduction
of variance, and an improvement in generalization relative to a single decision
tree.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros of Random Forest&lt;/strong&gt; [1]:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;It is unexcelled in accuracy among current algorithms.&lt;/li&gt;
  &lt;li&gt;It runs efficiently on large data bases.&lt;/li&gt;
  &lt;li&gt;It can handle thousands of input variables without variable deletion.&lt;/li&gt;
  &lt;li&gt;It gives estimates of what variables are important in the classification.&lt;/li&gt;
  &lt;li&gt;It generates an internal unbiased estimate of the generalization error as the
forest building progresses.&lt;/li&gt;
  &lt;li&gt;It has an effective method for estimating missing data and maintains accuracy
when a large proportion of the data are missing.&lt;/li&gt;
  &lt;li&gt;It has methods for balancing error in class population unbalanced data sets.&lt;/li&gt;
  &lt;li&gt;Generated forests can be saved for future use on other data.&lt;/li&gt;
  &lt;li&gt;Prototypes are computed that give information about the relation between the
variables and the classification.&lt;/li&gt;
  &lt;li&gt;It computes proximities between pairs of cases that can be used in
clustering, locating outliers, or (by scaling) give interesting views of the
data.&lt;/li&gt;
  &lt;li&gt;The capabilities of the above can be extended to unlabeled data, leading to
unsupervised clustering, data views and outlier detection.&lt;/li&gt;
  &lt;li&gt;It offers an experimental method for detecting variable interactions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;22-random-forest-construction&quot;&gt;2.2 Random Forest construction&lt;/h4&gt;
&lt;p&gt;Randomness is present at a number of points in the construction of a forest. The
sample set, feature set, and depth of tree can all be set using a stochastic
process. If a random process is used to select which feature to split on,
as opposed to using gini gain or information gain, we obtain the
&lt;strong&gt;&lt;em&gt;ExtraTrees&lt;/em&gt;&lt;/strong&gt; learner, or extremely randomized trees.&lt;/p&gt;

&lt;p&gt;When constructing the learner, &lt;strong&gt;&lt;em&gt;bootstrap aggregating&lt;/em&gt;&lt;/strong&gt;, also known as
bagging, is applied to the tree learners.
Given a training set \(X = x_1, …, x_n\) with responses \(Y = y_1, …, y_n\),
bagging repeatedly (\(B\) times) selects a random sample with replacement of the
training set and fits trees to these samples. [2]&lt;/p&gt;
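
&lt;p&gt;A minimal sketch of the resampling step described above, assuming numpy’s
Generator API (the function name is made up for illustration):&lt;/p&gt;

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    # Draw n row indices uniformly with replacement; on average roughly
    # 63.2% of the distinct rows appear in each bootstrap sample.
    n = len(X)
    idx = rng.integers(0, n, size=n)
    return X[idx], y[idx]

# Each of the B trees would be fit on its own bootstrap_sample(X, y, rng).
```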

&lt;p&gt;For the classification task, a majority vote is taken over the set of tree
learners.&lt;/p&gt;

&lt;p&gt;Using numpy we can implement the majority vote function as so:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;c1&quot;&gt;# Class method
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;majority_vote&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Method to compute majority vote over a given set of tree classifiers.
    Args:
        samples - a set of samples for classification.
    Returns:
        final_predictions - prediction based on majority vote.
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Get classification based on each tree.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;predictions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;classify&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tree&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trees&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# For each sample, use majority vote for classification.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;final_predictions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bincount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predictions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;final_predictions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;23-error-rate&quot;&gt;2.3 Error rate&lt;/h4&gt;

&lt;p&gt;The error rate of the forest is dependent on how accurate the individual
classifiers are and the dependence between them. [3] Put another way, the
strength of each classifier and the correlation between said classifiers
contributes to the error rate.&lt;/p&gt;

&lt;p&gt;Breiman introduced an upper bound using the strength \(s\) of the individual
classifiers and the mean correlation \(\bar{\rho}\) between them as inputs:
\[\text{generalization error} \leq \frac{\bar{\rho}(1-s^2)}{s^2}\]&lt;/p&gt;

&lt;p&gt;An in-depth analysis can be found in &lt;strong&gt;&lt;em&gt;A Study of Strength and Correlation in
Random Forests&lt;/em&gt;&lt;/strong&gt;. [4]&lt;/p&gt;
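
&lt;p&gt;As a quick numerical illustration of the bound (the function name and the
sample values for strength and mean correlation are made up):&lt;/p&gt;

```python
def rf_error_upper_bound(mean_corr, strength):
    # Breiman's bound: generalization error is at most
    # mean_corr * (1 - strength**2) / strength**2.
    return mean_corr * (1.0 - strength ** 2) / strength ** 2

# At fixed strength, lowering the correlation between trees tightens the
# bound, which is why the randomness in construction matters.
```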

&lt;h4 id=&quot;24-feature-variable-importance&quot;&gt;2.4 Feature (variable) importance&lt;/h4&gt;

&lt;p&gt;Random forests can be used to obtain estimates of feature importance. sklearn
offers an easy way to obtain the importance of features. See [5] for a
practical example. Breiman’s original paper provides a more detailed analysis of
the topic.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sklearn.ensemble&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;clf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomForestClassifier&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;clf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;importances&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feature_importances_&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sorted_idx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;importances&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm&lt;/li&gt;
  &lt;li&gt;https://en.wikipedia.org/wiki/Random_forest&lt;/li&gt;
  &lt;li&gt;https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf&lt;/li&gt;
  &lt;li&gt;https://hal.archives-ouvertes.fr/hal-00598466/document&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Fri, 23 Mar 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/machinelearning/2018/03/23/dt-2.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/machinelearning/2018/03/23/dt-2.html</guid>
        
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Notes on Decision Tree Learning #1: Decision Trees</title>
        <description>&lt;h4 id=&quot;notes-on-trees&quot;&gt;Notes on Trees:&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://surfertas.github.io/machinelearning/2018/03/21/dt-1.html&quot;&gt;Decision
Trees&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary&lt;/li&gt;
      &lt;li&gt;Entropy, Info gain, Gini Coefficient, Gini gain&lt;/li&gt;
      &lt;li&gt;High Variance, tendency to overfit&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/23/dt-2.html&quot;&gt;Random Forest&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;One line summary / Pros of RF&lt;/li&gt;
      &lt;li&gt;Random Forest construction&lt;/li&gt;
      &lt;li&gt;Error rate&lt;/li&gt;
      &lt;li&gt;Feature (variable) importance&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/machinelearning/2018/03/29/dt-3.html&quot;&gt;Boosted Trees (discrete adaboost)&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;Combining weak classifiers to create a strong classifier&lt;/li&gt;
      &lt;li&gt;Algorithm&lt;/li&gt;
      &lt;li&gt;Outlier detection&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;decision-trees&quot;&gt;Decision Trees&lt;/h3&gt;
&lt;h4 id=&quot;11-one-line-summary&quot;&gt;1.1 One line summary&lt;/h4&gt;
&lt;p&gt;A decision tree is a learning algorithm that uses features to split observations
along branches, reaching a conclusion in the leaf nodes (nodes with no
children), where a conclusion is a classification of the input sample.&lt;/p&gt;

&lt;h4 id=&quot;12-entropy-information-gain-gini-coefficient-gini-gain&quot;&gt;1.2 Entropy, Information gain, Gini Coefficient, Gini gain&lt;/h4&gt;
&lt;p&gt;Given a sample with many features, the task is to find the feature, or
combination of features, that results in a tree that classifies the
observations efficiently.&lt;/p&gt;

&lt;p&gt;We can use the &lt;em&gt;information gain&lt;/em&gt; measure used by the ID3, C4.5, and C5.0 algorithms, or
&lt;em&gt;gini impurity&lt;/em&gt;, to find available features that will result in the most
effective split at each given node. The general idea is that we want to find
features that increase the &lt;strong&gt;information&lt;/strong&gt; gained per node. We want to move from
a state of confusion to clarity, from high entropy to low entropy, or, put one final
way, from a heterogeneous state to a homogeneous state.&lt;/p&gt;

&lt;h5 id=&quot;information-gain&quot;&gt;Information Gain&lt;/h5&gt;
&lt;p&gt;Calculating the information gain algorithmically is a nice way to introduce
the concept. In order to understand information gain, the concept of entropy
needs to be introduced first.&lt;/p&gt;

&lt;p&gt;\[entropy(p_1…p_n) = -\sum_{i=1}^n p_i\log_2(p_i)\]&lt;/p&gt;

&lt;p&gt;For example, considering a binary classification task, if the number of \(0\)s
and the number of \(1\)s are equal, the entropy is maximized, whereas if there are
only \(0\)s or only \(1\)s, the entropy is minimized. We can quickly confirm this.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;entropy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count_nonzero&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Need to handle case log2(0) = -inf when either class is absent
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; 
        &lt;span class=&quot;n&quot;&gt;entropy&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;    
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entropy&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Entropy of homogenous classification: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;entropy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Entropy of even split classification: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;entropy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Entropy of homogenous classification: 0
Entropy of even split classification: 1.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Entropy is also interpreted as the number of bits needed to explain the
classification. (e.g. when the classification is evenly split, we need 1 bit to
explain 2 classes.) [1]&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Calculate entropy for the parent node. The higher the entropy, the more impure the node.&lt;/li&gt;
  &lt;li&gt;For each feature:
    &lt;ul&gt;
      &lt;li&gt;calculate the entropy for children nodes after splitting on the selected
feature.&lt;/li&gt;
      &lt;li&gt;calculate the weighted sum of entropies across children nodes.&lt;/li&gt;
      &lt;li&gt;calculate information gain as \(entropy(parent) - \sum_i w_i \,
entropy(child_i)\), where \(w_i\) is the fraction of samples in child \(i\)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Select the feature with the highest information gain&lt;/li&gt;
  &lt;li&gt;Repeat 1-3 until all nodes are leaves (1 class), or a max depth limit is
reached.&lt;/li&gt;
&lt;/ol&gt;
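
&lt;p&gt;The steps above can be sketched as follows. This is an illustrative helper
assuming binary labels, not code from the referenced sources:&lt;/p&gt;

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a binary label array, in bits.
    p = np.count_nonzero(labels) / len(labels)
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(parent, children):
    # Parent entropy minus the size-weighted sum of child entropies.
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted
```

&lt;p&gt;A perfect split of an evenly mixed parent yields a gain of \(1.0\) bit, while a
split that leaves both children evenly mixed yields a gain of \(0\).&lt;/p&gt;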

&lt;h5 id=&quot;gini-gain&quot;&gt;Gini Gain&lt;/h5&gt;

&lt;p&gt;The gini gain is a similar concept to information gain, in that it provides an
evaluation metric to select features that create the best split. Gini is used in
the CART (Classification and Regression Trees) algorithm.&lt;/p&gt;

&lt;p&gt;Impurity, or the Gini coefficient, is computed using the equation below:
\[impurity(p_1…p_n) = 1 - \sum_{i=1}^n p_i^2\]&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;impurity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count_nonzero&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;impurity&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_ones&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;    
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;impurity&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Impurity of homogenous classification: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;impurity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Impurity of even split classification: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;impurity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Impurity of homogenous classification: 0.0
Impurity of even split classification: 0.5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can quickly observe that the metric outputs similar results to the entropy
measure. Gini gain can be computed in the same way as was done for information
gain.&lt;/p&gt;
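
&lt;p&gt;To make that claim concrete, gini gain follows the same
parent-minus-weighted-children pattern (illustrative helpers assuming binary
labels):&lt;/p&gt;

```python
def gini(labels):
    # Gini impurity for binary labels in {0, 1}.
    n = len(labels)
    p = sum(1 for v in labels if v == 1) / n
    return 1.0 - (p ** 2 + (1 - p) ** 2)

def gini_gain(parent, children):
    # Parent impurity minus the size-weighted sum of child impurities.
    n = len(parent)
    weighted = sum(len(c) / n * gini(c) for c in children)
    return gini(parent) - weighted
```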

&lt;h4 id=&quot;13-high-variance-tendency-to-overfit&quot;&gt;1.3 High variance, tendency to overfit&lt;/h4&gt;

&lt;p&gt;Decision trees are often associated with high variance and a tendency to
overfit to the training data. Given a deep enough tree, a decision tree can
perfectly fit the training data, but fail on validation and testing, failing to
generalize. The tree-based algorithms introduced later address the high-variance
attribute of decision trees.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://www.cs.bc.edu/~alvarez/ML/id3&lt;/li&gt;
  &lt;li&gt;http://web.cs.ucdavis.edu/~vemuri/classes/ecs271/Decision%20Trees-Construction.htm&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 21 Mar 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/machinelearning/2018/03/21/dt-1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/machinelearning/2018/03/21/dt-1.html</guid>
        
        
        <category>machinelearning</category>
        
      </item>
    
      <item>
        <title>Monty Hall: Optimal strategy for a n-door game</title>
<description>&lt;p&gt;The Monty Hall problem is an interesting problem that has stumped experienced
mathematicians, despite its seemingly simple problem statement. The problem is
considered a paradox of the veridical type, because the solution is so
counterintuitive that some believe the conclusion is absurd. [1]&lt;/p&gt;

&lt;p&gt;The purpose of this note is to consider a derivative of the original problem
statement to build further intuition. I was recently introduced to a similar
iteration that helped to cement the intuition behind the logic to solve the
problem.&lt;/p&gt;

&lt;h3 id=&quot;monty-hall&quot;&gt;Monty Hall&lt;/h3&gt;
&lt;p&gt;The original problem statement had the following constraints: [1]&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The host must always open a door that was not picked by the contestant
(Mueser and Granberg 1999).&lt;/li&gt;
  &lt;li&gt;The host must always open a door to reveal a goat and never the car.&lt;/li&gt;
  &lt;li&gt;The host must always offer the chance to switch between the originally chosen
door and the remaining closed door.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The question is: should the player switch, given that the host has revealed a
goat? The answer is yes, the optimal strategy is to switch, as the remaining
door has a probability of \(\frac{2}{3}\) of containing the car. Given that
every door has a probability of \(\frac{1}{3}\) of containing the car before
any action by the player or the host, the probability of \(\frac{2}{3}\) is
somewhat counterintuitive.&lt;/p&gt;
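
&lt;p&gt;The \(\frac{2}{3}\) result is easy to check empirically. Below is a minimal
simulation sketch in plain Python; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;play()&lt;/code&gt; helper is
introduced here purely for illustration.&lt;/p&gt;

```python
import random

def play(switch, trials=100_000):
    """Simulate the 3-door Monty Hall game; return the empirical win rate."""
    wins = 0
    for _ in range(trials):
        doors = [0, 0, 1]                # exactly one door hides the car
        random.shuffle(doors)
        pick = random.randrange(3)
        # The host opens a goat door that the player did not pick.
        host = next(i for i in range(3) if i != pick and doors[i] == 0)
        if switch:
            pick = next(i for i in range(3) if i not in (pick, host))
        wins += doors[pick]
    return wins / trials
```

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;play(True)&lt;/code&gt; should return a win rate near
\(\frac{2}{3}\), while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;play(False)&lt;/code&gt; stays near \(\frac{1}{3}\).&lt;/p&gt;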

&lt;h3 id=&quot;monty-hall-with-a-twist&quot;&gt;Monty Hall with a Twist&lt;/h3&gt;
&lt;p&gt;Adding a bit of complexity to the original problem actually helped my
understanding, and I hope it will have a similar impact on the reader.&lt;/p&gt;

&lt;p&gt;In addition to the original constraints, assume the following.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The player must select the number of doors.&lt;/li&gt;
  &lt;li&gt;The number of doors can range from \(3\) to \(15\) inclusive.&lt;/li&gt;
  &lt;li&gt;At each turn, the player has the option to switch.&lt;/li&gt;
  &lt;li&gt;If there are \(2\) doors remaining, the player is forced to make a final
decision on switching and the game ends.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now rephrase the problem statement as: &lt;em&gt;what is the optimal strategy for this
game?&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;solution&quot;&gt;Solution&lt;/h3&gt;

&lt;h4 id=&quot;lets-try-by-select-3-doors-which-gives-us-the-original-monty-hall&quot;&gt;Let’s try selecting \(3\) doors, which gives us the original Monty Hall problem.&lt;/h4&gt;

&lt;p&gt;The probability that any given door has the car is \(\frac{1}{3}\). After the
player’s selection, and the host’s action to reveal a goat, the probability that the other
door contains the car is \(\frac{2}{3}\). &lt;strong&gt;Thinking about probability as an
object with mass helps with building intuition for this problem.&lt;/strong&gt; Since
\(\frac{1}{3}\) of the mass is contained in the player-selected door, the
remaining \(\frac{2}{3}\) has to be contained by the remaining doors. When
there are two such doors remaining, it is evenly split between them. When the number of doors is
reduced by the host’s action, the probability mass of \(\frac{2}{3}\) is contained
entirely by the one remaining door. Thus, when given the choice between \(\frac{2}{3}\)
and \(\frac{1}{3}\) of probability mass, it is logical to select the door with
more mass. It’s OK to be greedy here. Thus we reach the solution that the player
should switch.&lt;/p&gt;

&lt;h4 id=&quot;lets-try-selecting-7-doors&quot;&gt;Let’s try selecting \(7\) doors.&lt;/h4&gt;

&lt;p&gt;The player selects a door, and the host reveals a goat. The probability that any
given door has the car is \(\frac{1}{7}\), so the probability mass
contained in the player-selected door is \(\frac{1}{7}\), while the remaining
\(5\) doors (remember, the host has removed a door) equally share the remaining
probability mass of \(\frac{6}{7}\). Each of those doors therefore has a probability of
\(\frac{6}{35}\), or \(0.1714\), vs. the \(0.1428\) of probability mass
currently held by the player. Should the player switch? If this were the end of the game, then yes,
but the player still doesn’t know what effect further door reveals will have
on the probability mass contained in each remaining door.&lt;/p&gt;

&lt;p&gt;The host reveals another goat, leaving \(4\) doors plus the \(1\) door the
player originally selected. Keep in mind that the probability mass of
\(\frac{1}{7}\) that the player currently holds remains constant, so the
remaining \(\frac{6}{7}\) of probability mass must be shared by the
remaining \(4\) doors. This equates to each door having \(
\frac{\frac{6}{7}}{4}\), or \(\frac{6}{28}\), which is \(0.2143\) in decimal. Now
we see that by waiting and staying put, the probability mass available by
switching has increased from \(0.1714\) to \(0.2143\). This is a nice
discovery.&lt;/p&gt;

&lt;p&gt;Let’s wait until there are \(2\) doors remaining in addition to the door
originally selected by the player. Again, the same logic applies. The remaining
\(\frac{6}{7}\) of probability mass must be split between the \(2\)
doors, leaving \(\frac{6}{14}\), or \(0.4286\), of mass per door.&lt;/p&gt;
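
&lt;p&gt;The per-door masses quoted above can be reproduced exactly with Python’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fractions&lt;/code&gt; module. This is a minimal sketch, with the loop
values hard-coded to the \(7\)-door walkthrough:&lt;/p&gt;

```python
from fractions import Fraction

n = 7
player = Fraction(1, n)                  # mass held by the player's door
for others in (5, 4, 2):                 # unchosen doors left after each reveal
    per_door = (1 - player) / others
    # Note: Fraction reduces to lowest terms, so 6/28 prints as 3/14
    # and 6/14 prints as 3/7.
    print(others, per_door, float(per_door))
```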

&lt;p&gt;This pattern is consistent; in other words, the per-door
probability mass contained by the other doors increases as more doors are
revealed.&lt;/p&gt;

&lt;p&gt;Given this, we can easily show by computation that the optimal strategy,
given an upper bound of \(n\) doors, is to select the \(n\)-door game and wait
until a total of \(2\) doors remain before switching, as the one
remaining door will then hold \(\frac{n-1}{n}\) of the probability mass.&lt;/p&gt;
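
&lt;p&gt;This claim is simple to verify computationally. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;switch_mass&lt;/code&gt; helper below is introduced here for illustration:&lt;/p&gt;

```python
from fractions import Fraction

def switch_mass(n, others):
    # Mass per unchosen door when `others` unchosen doors remain
    # in the n-door game.
    return (1 - Fraction(1, n)) / others

# Waiting until a single unchosen door remains maximizes the mass gained
# by switching, and that final mass (n-1)/n grows with the number of doors.
final = {n: switch_mass(n, 1) for n in range(3, 16)}
```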

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;
&lt;p&gt;Thinking about probabilities as objects with mass was really the key for me in
building intuition. I hope this walkthrough helps to build similar intuition for
others. Having completed this note, the next question is: what happens as \(n\)
goes to infinity?&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://en.wikipedia.org/wiki/Monty_Hall_problem&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Mon, 12 Mar 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/probability/2018/03/12/montyhall-twist.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/probability/2018/03/12/montyhall-twist.html</guid>
        
        
        <category>probability</category>
        
      </item>
    
      <item>
        <title>Data Augmentation: a minimal example using TensorFlow Dataset API</title>
<description>&lt;p&gt;In working with &lt;a href=&quot;https://github.com/udacity/self-driving-car/tree/master/datasets/CH2&quot;&gt;Udacity’s Drive
data&lt;/a&gt;, I
wanted to augment the available data to increase the size of the data set in
hopes of improving the results of training PilotNet, an end-to-end deep learning
model developed by Nvidia. [1]&lt;/p&gt;

&lt;p&gt;I decided to use TensorFlow’s Dataset API to create the data pipeline; it is a great
API that abstracts a lot while still allowing the developer the flexibility to
customize the pipeline for a given task. That said, I had a hard time finding
best practices on data augmentation and the associated pipeline using the
Dataset API. After some investigation of the TensorFlow documentation, I found
the definition of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concatenate()&lt;/code&gt; method. [2] Unfortunately, there were no
examples of how to construct a pipeline for augmentation, thus I will use this
post to introduce a minimal example. Please refer to a full working data
pipeline applied to the Udacity dataset
&lt;a href=&quot;https://github.com/surfertas/deep_learning/blob/master/projects/self_driving_car/1-pilot_net/data_pipeline.py&quot;&gt;here&lt;/a&gt;.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DataHandler&lt;/code&gt; class defined in the source code was quickly put together,
thus any advice on how to improve the pipeline or best-practice tips would be
appreciated.&lt;/p&gt;

&lt;h3 id=&quot;minimal-example-using-concatenate-to-augment-original-data&quot;&gt;Minimal example: Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concatenate()&lt;/code&gt; to augment original data&lt;/h3&gt;

&lt;p&gt;Please see this &lt;a href=&quot;https://github.com/surfertas/deep_learning/blob/master/projects/self_driving_car/1-pilot_net/3-1-feature_augmentation_pipeline.ipynb&quot;&gt;jupyter
notebook&lt;/a&gt;
for the minimal example.&lt;/p&gt;

&lt;p&gt;The notebook walks through the use of the TensorFlow API to load images based on
information found in a CSV file, and the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concatenate()&lt;/code&gt; method to
create an augmented dataset.&lt;/p&gt;
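
&lt;p&gt;The core pattern looks like the following minimal sketch, written against the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tf.data&lt;/code&gt; API. The tensors and the flip-and-negate augmentation are
illustrative stand-ins for the notebook’s actual inputs, not the notebook’s code:&lt;/p&gt;

```python
import tensorflow as tf

# Illustrative stand-ins for images and steering-angle labels.
images = tf.random.uniform((4, 8, 8, 3))
labels = tf.constant([0.1, -0.2, 0.0, 0.3])

base = tf.data.Dataset.from_tensor_slices((images, labels))

def augment(img, label):
    # Flip the image horizontally; for steering angles the sign of the
    # label flips as well.
    return tf.image.flip_left_right(img), -label

# concatenate() appends the augmented copies, doubling the dataset size.
combined = base.concatenate(base.map(augment))
```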

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://arxiv.org/abs/1604.07316&lt;/li&gt;
  &lt;li&gt;https://www.tensorflow.org/api_docs/python/tf/data/Dataset#concatenate&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 24 Jan 2018 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/tensorflow/dl/2018/01/24/tf-data-augmentation.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/tensorflow/dl/2018/01/24/tf-data-augmentation.html</guid>
        
        
        <category>tensorflow</category>
        
        <category>dl</category>
        
      </item>
    
      <item>
        <title>A Review of XGBoost (eXtreme Gradient Boosting)</title>
<description>&lt;p&gt;Gradient boosting is an ensemble technique, where prediction is done by an
ensemble of simple estimators. In principle, gradient boosting can be done over
various estimators, but in practice GBDT, gradient boosting over
decision trees, is used. Instead of a heterogeneous grouping of estimators, the
grouping is homogeneous and consists of variations of decision trees with
different parameter settings. Each tree is built greedily, so the algorithm is
fast, but at a cost: the greedy strategy often results in sub-optimal
solutions. [1]&lt;/p&gt;

&lt;p&gt;The decision tree ensemble is a set of classification and regression trees
(CART). In contrast to standard decision trees, where the leaves contain
decisions, the leaves found in CART are associated with real-valued scores, which
provides a richer interpretation that goes beyond classification. [3] The ensemble
can be mathematically represented as follows:&lt;/p&gt;

&lt;p&gt;\[D(x) = \sum_{k=1}^K tree_k(x)\]&lt;/p&gt;

&lt;p&gt;The algorithm is trained by iteratively adding a decision tree trained to reduce
the residual between the target function and the current ensemble of trees,
\(D(x)\). At each iteration the hope is that the residual, the remaining
error, gets successively smaller. Though not realistic for a complex function,
if we were able to train a tree against \(R(x)\) such that the residual became truly zero,
then the trained ensemble would have fit the distribution completely and
perfectly.&lt;/p&gt;

&lt;p&gt;\[R(x) = f(x) - D(x)\]
\[\text{where } D(x) = tree_1(x) + tree_2(x) + \dots\]&lt;/p&gt;
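
&lt;p&gt;A toy version of this residual-fitting loop, using one-split “stumps” in place
of full CART trees, can make the idea concrete. This is a sketch in plain numpy,
not XGBoost’s actual algorithm; all names are illustrative:&lt;/p&gt;

```python
import numpy as np

x = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * x)                # target function f(x)

def fit_stump(x, resid):
    # One-split regression stump: the simplest stand-in for a CART tree.
    candidates = []
    for t in x[1:-1]:
        left = np.less(x, t)
        lv, rv = resid[left].mean(), resid[~left].mean()
        err = np.sum((resid - np.where(left, lv, rv)) ** 2)
        candidates.append((err, t, lv, rv))
    _, t, lv, rv = min(candidates)       # split minimizing squared error
    return lambda q: np.where(np.less(q, t), lv, rv)

D = np.zeros_like(f)                     # ensemble D(x), initially empty
mae = []
for _ in range(8):
    tree = fit_stump(x, f - D)           # fit the residual R(x) = f(x) - D(x)
    D += tree(x)                         # add the new tree to the ensemble
    mae.append(np.abs(f - D).mean())     # remaining error shrinks over rounds
```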

&lt;h3 id=&quot;objective-function-and-optimization&quot;&gt;Objective function and Optimization&lt;/h3&gt;
&lt;p&gt;Gradient boosting relies on regression trees, where the optimization step works
to reduce RMSE, while for binary classification the standard log loss,
\(-\frac{1}{N}\sum_{i=1}^N[y_i \log(p_i) + (1-y_i) \log(1-p_i)]\), is used. For a
multi-class classification problem the cross-entropy loss is the input to the
objective function to be optimized.&lt;/p&gt;
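
&lt;p&gt;For concreteness, the binary log loss can be computed directly. A minimal numpy
sketch; the clipping constant is a standard numerical-stability guard, not part
of the formula:&lt;/p&gt;

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    # Clip predicted probabilities away from 0 and 1 to avoid log(0).
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.1, 0.8, 0.6])
loss = log_loss(y, p)        # small loss: confident, mostly correct predictions
```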

&lt;p&gt;Combining the loss function with a regularization term yields the objective function.
The regularization term controls the complexity and reduces the risk of
over-fitting. [2] XGBoost uses gradient descent for optimization, improving the
predictive accuracy at each optimization step by following the negative of the
gradient, as we are trying to find the “sink” in an n-dimensional surface. In
XGBoost, the regularization term is defined as:&lt;/p&gt;

&lt;p&gt;\[\Omega(f) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^Tw_j^{2}\]&lt;/p&gt;

&lt;h3 id=&quot;parameters&quot;&gt;Parameters&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;The learning rate, \(\eta\), is a factor that is applied to each individual tree
prior to adding it to the ensemble. In practice, a small learning rate is
preferred, as large learning rates result in larger steps, which increases the
frequency of discontinuities. A suggested learning-rate range for grid search is
\(0.01 &amp;lt; \eta &amp;lt; 0.1\).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Number of trees, and depth of each tree. Typically a larger number of trees
is preferred, with a depth that is not excessive. As seen in the visualization
[1], an ensemble of deep trees can result in relatively lower
residuals, but at the cost of increased noise/discontinuities in the surface of the
function approximation.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;xgboost-parameters-going-deeper&quot;&gt;XGBoost Parameters (Going Deeper):&lt;/h3&gt;

&lt;p&gt;The authors of XGBoost have divided the parameters into three categories: general
parameters, booster parameters, and learning task parameters. I have highlighted
the majority of parameters to be considered when tuning. For the full
list, refer to the documentation. [5]&lt;/p&gt;

&lt;h4 id=&quot;general-parameters-parameters-to-defined-the-overall-functionality&quot;&gt;General Parameters: parameters that define the overall functionality.&lt;/h4&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{booster}\) (default = gbtree): can select the type of model
(gbtree or gblinear) to run at each iteration.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{silent}\) (default = 0): If set to 1, silent mode is enabled and the
modeler will not receive any feedback after each iteration.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{nthread}\) (default = max # of threads): used to set the number
of cores to use for processing.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;booster-parameters-two-types-of-boosters-tree-and-linear-but-as-tree-typically-outperforms-linear-the-tree-boosters-is-only-considered-in-most-literature&quot;&gt;Booster Parameters: There are two types of boosters, tree and linear, but as the tree booster typically outperforms the linear one, most literature considers only the tree booster.&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{eta}\) (default = 0.3): Learning rate used to shrink weights at
each step. Typical final values fall between 0.01 ~ 0.2. [4] Note this
differs from the recommendation of [1], which suggests the learning rate is best
set between 0.01 ~ 0.1.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{min_child_weight}\) (default = 1): Used to control overfitting;
defines the minimum sum of weights of all observations required in a child.
A larger number restricts the model’s ability to learn the finer details of the training set.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{max_depth}\) (default = 6): Typical values 3-10, defines the
maximum depth of a tree.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;\(\textbf{max_leaf_nodes}\): the number of terminal nodes (leaves) in a
tree. Has a mathematical relationship with the depth of the tree.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;\(\textbf{gamma}\) (default = 0): specifies the minimum loss reduction
required to make a split.&lt;/li&gt;
  &lt;li&gt;\(\textbf{subsample}\) (default = 1): Defines the fraction of observations
to be sampled randomly for each tree. Typical values range
between 0.5 and 1.&lt;/li&gt;
  &lt;li&gt;\(\textbf{colsample_bytree}\) (default = 1):  Fraction of columns to be used
when random sampling for tree build out.&lt;/li&gt;
  &lt;li&gt;\(\textbf{lambda}\) (default = 1):  L2 regularization term.&lt;/li&gt;
  &lt;li&gt;\(\textbf{alpha}\) (default = 0): L1 regularization term.&lt;/li&gt;
  &lt;li&gt;\(\textbf{scale_pos_weight}\) (default = 1): A value greater than 1 should
be used in cases of high class imbalance.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;learning-task-parameters-used-to-define-the-optimization-objective&quot;&gt;Learning Task Parameters: used to define the optimization objective.&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;\(\textbf{objective}\) (default = reg:linear): defines the loss function to
be minimized. Options include binary:logistic, multi:softmax, multi:softprob.&lt;/li&gt;
  &lt;li&gt;\(\textbf{eval_metric}\): Metric used for validation. Options include
rmse (default for regression), mae, logloss, error (default for classification),
merror, mlogloss, and auc.&lt;/li&gt;
&lt;/ol&gt;
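
&lt;p&gt;Pulling the above together, a typical starting configuration can be expressed
as the plain dict accepted by XGBoost’s training API. Every value below is an
illustrative assumption chosen within the ranges discussed above, not a tuned
result:&lt;/p&gt;

```python
# Illustrative starting point for a grid search; values are assumptions
# within the suggested ranges, not tuned results.
params = {
    "booster": "gbtree",
    "eta": 0.05,                  # learning rate
    "max_depth": 6,
    "min_child_weight": 1,
    "gamma": 0,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1,                  # L2 regularization
    "alpha": 0,                   # L1 regularization
    "scale_pos_weight": 1,
    "objective": "binary:logistic",
    "eval_metric": "logloss",
}
```

&lt;p&gt;This dict would be passed as the first argument to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xgb.train()&lt;/code&gt;.&lt;/p&gt;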

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html&lt;/li&gt;
  &lt;li&gt;https://www.slideshare.net/ShangxuanZhang/kaggle-winning-solution-xgboost-algorithm-let-us-learn-from-its-author&lt;/li&gt;
  &lt;li&gt;http://xgboost.readthedocs.io/en/latest/model.html&lt;/li&gt;
  &lt;li&gt;https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/&lt;/li&gt;
  &lt;li&gt;http://xgboost.readthedocs.io/en/latest/parameter.html#general-parameters&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 14 Dec 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/xgboost/ml/2017/12/14/XGB.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/xgboost/ml/2017/12/14/XGB.html</guid>
        
        
        <category>xgboost</category>
        
        <category>ml</category>
        
      </item>
    
      <item>
        <title>ROS + RaspberryPi Camera Module #5: Yolo2 object detection on Raspberry Pi 3, with a bit of help from mother JetsonTX1</title>
<description>&lt;p&gt;In a recent post, we discussed optimizing a face detection system based on
classical computer vision techniques to run on a GPU, using OpenCV with
CUDA enabled. As a result of the optimization, the performance improved from
3-4hz to ~10hz. As a recap of the system setup used in that exercise: the
Raspberry Pi camera took raw images and published them to an image ROS topic. The
Jetson TX1 subscribed to the raw image topic and handled the heavy lifting,
which included preprocessing the raw image and detecting any faces. The two machines
were connected over WiFi.&lt;/p&gt;

&lt;p&gt;We use a similar setup in this exercise, allowing the Jetson TX1 to take care
of object detection using the Yolo2 algorithm while the Raspberry Pi is solely responsible
for streaming compressed raw RGB images. Once again I am amazed how ROS can help
to integrate different languages and frameworks (C++11, Python, OpenCV, PyTorch)
seamlessly.&lt;/p&gt;

&lt;p&gt;source: &lt;a href=&quot;https://github.com/surfertas/ros_custom_packages/tree/master/ros_object_detection&quot;&gt;ros_object_detection&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;yolo-you-only-look-once&quot;&gt;YOLO: You Only Look Once&lt;/h3&gt;
&lt;p&gt;This &lt;a href=&quot;https://www.youtube.com/watch?v=GBu2jofRJtk&quot;&gt;youtube recording&lt;/a&gt; of a
presentation given by the creators of YOLO, titled YOLO 9000: Better, Faster, Stronger,
suffices as an introduction to the algorithm. Further, the techniques used to
transfer weights learned on ImageNet for classification to use in object
detection are covered as well. The combination of the ImageNet and COCO data sets using a word tree
[28:00], and the discussion of back-propagating different errors based on which
data set the input was derived from, were informative.&lt;/p&gt;

&lt;h3 id=&quot;system-setup&quot;&gt;System Setup&lt;/h3&gt;
&lt;p&gt;The launch file loads the configuration details found in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;yolo2.yaml&lt;/code&gt; and stores
the path details on the parameter server. Next, the service
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_inference_yolo2&lt;/code&gt;, which is specified in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_inference_yolo2.py&lt;/code&gt; in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/scripts&lt;/code&gt; directory, is initialized and launched. Finally, the object detection node is
initiated. The node waits until the service is available before making a
request.&lt;/p&gt;
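
&lt;p&gt;A launch file following this description might look like the sketch below. The
package name, node names, and file paths are illustrative assumptions, not
copied from the repository:&lt;/p&gt;

```xml
&lt;launch&gt;
  &lt;!-- Load model/config paths from yolo2.yaml onto the parameter server --&gt;
  &lt;rosparam command=&quot;load&quot; file=&quot;$(find ros_object_detection)/config/yolo2.yaml&quot; /&gt;

  &lt;!-- Service node wrapping the PyTorch Yolo2 inference --&gt;
  &lt;node pkg=&quot;ros_object_detection&quot; type=&quot;run_inference_yolo2.py&quot;
        name=&quot;run_inference_yolo2&quot; output=&quot;screen&quot; /&gt;

  &lt;!-- Detection node; waits for the service before making requests --&gt;
  &lt;node pkg=&quot;ros_object_detection&quot; type=&quot;object_detection_node&quot;
        name=&quot;object_detection&quot; output=&quot;screen&quot; /&gt;
&lt;/launch&gt;
```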

&lt;p&gt;The system is quite simple, as the hard work of object detection is abstracted
behind the service call.&lt;/p&gt;

&lt;h3 id=&quot;performance&quot;&gt;Performance&lt;/h3&gt;
&lt;p&gt;Face detection using classical computer vision techniques with CUDA enabled resulted in 10fps
on the Jetson TX1. Object detection using Yolo2 obviously is a much more
difficult task as this implementation will be detecting 80 different classes.
The performance measured in publishing rates was between 3.3-3.8hz while
over-clocking increased the performance to 5hz.  Considering the number of
classes being covered this is amazing, and understandably why there is much hope
for deep learning technologies.  Here is a snap shot of the results.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/yolo2.png&quot; alt=&quot;Yolo&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;
&lt;p&gt;This setup, though not completely on the edge, uses only a local network and does
not require cloud access. The concept of an agent worker with constrained
resources depending on a mother machine is worth exploring
further.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://www.youtube.com/watch?v=GBu2jofRJtk&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Mon, 13 Nov 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/tx1/opencv/pytorch/2017/11/13/pi-object-detection.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/tx1/opencv/pytorch/2017/11/13/pi-object-detection.html</guid>
        
        
        <category>ROS</category>
        
        <category>tx1</category>
        
        <category>opencv</category>
        
        <category>pytorch</category>
        
      </item>
    
      <item>
        <title>ROS + RaspberryPi Camera Module #4: Running ROS master on Jetson TX1 and OpenCV with CUDA enabled</title>
        <description>

&lt;p&gt;source code: &lt;a href=&quot;https://github.com/surfertas/ros_face_detect/blob/master/src/face_detect_cuda.cpp&quot;&gt;ros_face_detect&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;opencv-with-cuda-enabled&quot;&gt;OpenCV with CUDA enabled&lt;/h3&gt;

&lt;p&gt;The current system setup uses a Raspberry Pi 3 (Raspi) with Ubuntu 16.04.1 as the
operating system and ROS Kinetic as the middleware. The Raspi publishes raw images taken
from a Raspberry Pi Camera Module V2.1 over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/webcam/image_raw/compressed&lt;/code&gt;
topic provided by the ROS package &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;video_stream_opencv&lt;/code&gt;, which is subscribed to by
nodes running on a Jetson TX1 (Ubuntu 16.04.1, ROS Kinetic).&lt;/p&gt;

&lt;p&gt;Despite the Raspi being able to publish raw RGB images at close to 30fps, the
performance deteriorates as a result of the computational demands placed on the
hardware by a face detection algorithm relying purely on the CPU. The
publishing rate after detection was between 3-4hz.&lt;/p&gt;

&lt;p&gt;In order to increase performance, a rewrite of the program was required to allow GPU
acceleration of the computer vision algorithms that rely heavily on matrix
multiplication. The code changes were minimal, as OpenCV3 offers a
user-friendly API that allows for easy refactoring; the
environment setup, on the other hand, unfortunately took more time.&lt;/p&gt;

&lt;h3 id=&quot;setting-up-the-environment&quot;&gt;Setting up the environment&lt;/h3&gt;
&lt;p&gt;Prior to starting this exercise, I had installed OpenCV 3 alongside
ROS Kinetic, which caused some problems. The
problem was that the version included in the ROS OpenCV package did not have CUDA
enabled. To resolve this issue, a build from source was required. Under the
directory path &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/local&lt;/code&gt;, following the
instructions outlined at &lt;a href=&quot;https://docs.opencv.org/master/d6/d15/tutorial_building_tegra_cuda.html&quot;&gt;OpenCV with CUDA for
Tegra&lt;/a&gt;,
I was able to successfully build and install OpenCV with CUDA enabled.&lt;/p&gt;

&lt;p&gt;Despite a successful install, ROS was still having trouble dealing with two
versions of
OpenCV, which required some redirection of CMAKE paths and rebuilds of OpenCV-dependent
ROS packages. The problem and solution are discussed
&lt;a href=&quot;https://answers.ros.org/question/242376/having-trouble-using-cuda-enabled-opencv-with-kinetic/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;a-nice-api&quot;&gt;A nice API&lt;/h3&gt;
&lt;p&gt;The code changes were actually minimal, and the majority are displayed
below. The refactor was mostly a matter of making sure that I was passing the correct matrix
type to the methods defined in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::cuda&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_gray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GpuMat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GpuMat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_cur_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Convert Mat to GpuMat&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;upload&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_gray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;img_cur_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;upload&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur_img_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cvtColor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_cur_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CV_BGR2GRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equalizeHist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GpuMat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objbuf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Find faces in image that are greater than min size (10,10) and store in&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// vector&amp;lt;cv::Rect&amp;gt;.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fc_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;detectMultiScale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_gray_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objbuf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fc_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;objbuf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;faces_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cout&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Faces detected...: &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;faces_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;
&lt;p&gt;As a result, the publishing rates improved roughly 3x, from 3-4hz to 9-10hz.
Considering that “real-time” is considered to be somewhere between 10-12hz, this is
acceptable. I plan to implement a deep-learning-driven algorithm next to
see how much faster GPU-accelerated inference can be in the task of face detection.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://docs.opencv.org/master/d6/d15/tutorial_building_tegra_cuda.html&lt;/li&gt;
  &lt;li&gt;https://answers.ros.org/question/242376/having-trouble-using-cuda-enabled-opencv-with-kinetic/&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sat, 28 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/tx1/opencv/2017/10/28/pi-tx1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/tx1/opencv/2017/10/28/pi-tx1.html</guid>
        
        
        <category>ROS</category>
        
        <category>tx1</category>
        
        <category>opencv</category>
        
      </item>
    
      <item>
        <title>ROS + RaspberryPi Camera Module #3: An alternative package for publishing images from Raspi</title>
        <description>&lt;h3 id=&quot;streaming-images-from-raspi-over-network-video_stream_opencv&quot;&gt;Streaming images from Raspi over Network: video_stream_opencv&lt;/h3&gt;
&lt;p&gt;For this particular project, I originally started off using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspicam_node&lt;/code&gt; developed by Ubiquity
Robotics, which handles JPEG compression on the RaspberryPi by leveraging the
local VideoCore hardware.
The publish rate tested on the Raspi was ~30hz, but over the network the
performance degrades and is unable to maintain a reasonable fps unless the dimensions are set to around 640x480. There is
another ROS package, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;video_stream_opencv&lt;/code&gt; [1], that works on the Raspi pretty
much out of the box and also provides ~25hz+ over the network with dimensions set at
640x480. The good thing about this package is that we can simply install it
using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo apt-get install ros-kinetic-video-stream-opencv&lt;/code&gt; and
then finish the installation by building the package.&lt;/p&gt;

&lt;p&gt;After installing and building the package, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;webcam.launch&lt;/code&gt; file can be
launched, which will publish the following topics.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/webcam/camera_info
/webcam/image_raw
/webcam/image_raw/compressed
/webcam/image_raw/compressed/parameter_descriptions
/webcam/image_raw/compressed/parameter_updates
/webcam/image_raw/compressedDepth
/webcam/image_raw/compressedDepth/parameter_descriptions
/webcam/image_raw/compressedDepth/parameter_updates
/webcam/image_raw/theora
/webcam/image_raw/theora/parameter_descriptions
/webcam/image_raw/theora/parameter_updates
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checking the publishing rate over the network from the master computer yields
satisfactory results.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ rostopic hz /webcam/image_raw/compressed
subscribed to [/webcam/image_raw/compressed]
average rate: 31.105
    min: 0.020s max: 0.040s std dev: 0.00453s window: 27
average rate: 29.865
    min: 0.004s max: 0.153s std dev: 0.01801s window: 55
average rate: 29.978
    min: 0.000s max: 0.167s std dev: 0.02493s window: 86
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
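&lt;p&gt;As a side note, the average rate reported by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rostopic hz&lt;/code&gt; is essentially the inverse of the mean inter-arrival time over a window of received messages. A minimal sketch of that calculation (my own illustration, not the actual rostopic source):&lt;/p&gt;

```python
# Sketch of the rate computation behind "average rate" above: collect the
# receipt timestamps of the last N messages and invert the mean gap.
def average_rate(timestamps):
    """Return messages per second given receipt times in seconds."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return len(gaps) / sum(gaps)

# Four messages arriving ~33 ms apart correspond to roughly 30 Hz.
rate = average_rate([0.000, 0.033, 0.066, 0.100])
```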

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://wiki.ros.org/video_stream_opencv&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Fri, 08 Sep 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/raspberrypi/2017/09/08/detect-faces-3.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/raspberrypi/2017/09/08/detect-faces-3.html</guid>
        
        
        <category>ros</category>
        
        <category>raspberrypi</category>
        
      </item>
    
      <item>
        <title>ROS + RaspberryPi Camera Module #2: Setting up a Network and Detecting Faces</title>
        <description>&lt;h3 id=&quot;networking-the-raspi-with-the-master&quot;&gt;Networking the Raspi with the Master&lt;/h3&gt;
&lt;p&gt;Once we get the Raspi camera to publish images, we need to set up our system so
that multiple machines can communicate with each other:
http://wiki.ros.org/ROS/Tutorials/MultipleMachines&lt;/p&gt;

&lt;p&gt;Having enabled the raspi camera to publish images, I have moved to set up a
master/child system, where the Raspberry Pi will be publishing images to be processed
by the master sitting on the host machine. The concept is not unique, and
allocating the computationally intensive processes to a master with greater
capacity is quite common.&lt;/p&gt;

&lt;h3 id=&quot;ros-message-to-cvimage&quot;&gt;ROS Message to CVImage&lt;/h3&gt;
&lt;p&gt;The master has been tasked with detecting faces, the first step in a multi-step
project I am currently working on. In order to process the image, we require a
conversion from the ROS image type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sensor_msgs::Image&lt;/code&gt; to the OpenCV matrix
type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::Mat&lt;/code&gt;, which enables us to use the standard OpenCV libraries. Since
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspicam_node&lt;/code&gt; only publishes compressed images, we need to make sure to
provide the proper “hint” via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;const TransportHints
&amp;amp;transport_hints = TransportHints()&lt;/code&gt; parameter. For this case, we can create an
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;image_transport&lt;/code&gt; subscriber like so:&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;cam_img_sub_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;it_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscribe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;/raspicam_node/image&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FaceDetect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convertImageCB&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image_transport&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TransportHints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;compressed&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once we have the subscriber in place, we can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv_bridge&lt;/code&gt; to convert from
ROS messages to OpenCV matrices [1].&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;cv_bridge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CvImagePtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cv_ptr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_bridge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toCvCopy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensor_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_encodings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BGR8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cv_bridge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cerr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;cv_bridge exception: &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;what&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;cur_img_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;detectFace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;             &lt;span class=&quot;c1&quot;&gt;// A method that I defined to detect faces.        &lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We store the image in a private variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cur_img_&lt;/code&gt; of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::Mat&lt;/code&gt;. Note that
since we are not carrying over all the class attributes of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv_ptr&lt;/code&gt; associated
with the class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CvImage&lt;/code&gt; [2], we need to take extra care when converting back to
ROS messages, which I will touch upon later.&lt;/p&gt;

&lt;h3 id=&quot;detecting-faces&quot;&gt;Detecting Faces&lt;/h3&gt;
&lt;p&gt;To detect faces we can use the object classifier &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::CascadeClassifier&lt;/code&gt; found
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;opencv2/objdetect/objdetect.hpp&lt;/code&gt;. We need to make sure that we specify the
path to the cascade of interest and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load()&lt;/code&gt; the cascade. If you get an error
along the lines of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Error: Assertion failed (!empty()) in detectMultiScale&lt;/code&gt;,
check that the file path was specified correctly. Faces are not the only trained models
available: cascades for body segments, facial parts (eyes), and expressions (smiles) can be
found at github.com/opencv [3].&lt;/p&gt;

&lt;p&gt;The process of detecting faces has become common knowledge, so a quick
Google search will provide sufficient background information. I particularly
liked this post, where facial recognition is covered as well [4].&lt;/p&gt;

&lt;h3 id=&quot;convert-back-to-ros-message-type&quot;&gt;Convert back to ROS message type&lt;/h3&gt;
&lt;p&gt;Once we have our vector of faces, we can place bounding boxes (rectangles) around
the detected faces and publish the processed image. In order to publish over a
ROS topic, we need to convert from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv::Mat&lt;/code&gt; type back to a ROS message
type. As we extracted the raw image earlier, we need to populate the
public attributes of the class CvImage [2], encoding and header. We can use the
following code to convert:&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// img_with_bounding_ is a private variable that I created. You should replace&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// with the image you would like to convert.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sensor_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ImagePtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv_bridge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CvImage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;bgr8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;img_with_bounding_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toImageMsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;detected_faces_pub_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;publish&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once we have made the conversion, we can simply publish the message using the
publisher we initialized earlier.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://docs.ros.org/kinetic/api/cv_bridge/html/c++/namespacecv__bridge.html&lt;/li&gt;
  &lt;li&gt;http://docs.ros.org/kinetic/api/cv_bridge/html/c++/classcv__bridge_1_1CvImage.html&lt;/li&gt;
  &lt;li&gt;https://github.com/opencv/opencv/tree/master/data/haarcascades&lt;/li&gt;
  &lt;li&gt;https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Mon, 04 Sep 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/raspberrypi/2017/09/04/detect-faces.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/raspberrypi/2017/09/04/detect-faces.html</guid>
        
        
        <category>ros</category>
        
        <category>raspberrypi</category>
        
      </item>
    
      <item>
        <title>ROS + RaspberryPi Camera Module #1: Publishing image from raspi on different host machine.</title>
        <description>&lt;p&gt;TL;DR: Make sure your environment is setup correctly and remember (stating the
obvious) that you will be working on a ARM architecture as opposed to a x86 most
likely found on your laptop.&lt;/p&gt;

&lt;h3 id=&quot;setup&quot;&gt;Setup&lt;/h3&gt;
&lt;p&gt;Environment: Raspi3 running &lt;a href=&quot;https://ubuntu-mate.org/raspberry-pi/&quot;&gt;UbuntuMATE
16.04.2&lt;/a&gt; using the Raspi Camera v2.1. ROS
Kinetic is used and one can install ROS Kinetic for Ubuntu running on ARM
architecture using instructions found
&lt;a href=&quot;http://wiki.ros.org/kinetic/Installation/Ubuntu&quot;&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These notes assume you have enabled the camera through raspi-config. Make sure
raspistill and raspivid are functioning as expected. (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspistill -o
test.png&lt;/code&gt; should result in an image named test.png placed in the current
directory.)&lt;/p&gt;

&lt;h3 id=&quot;using-the-raspicam_node&quot;&gt;Using the raspicam_node&lt;/h3&gt;
&lt;p&gt;One can take the time to program their own Raspi camera node, but for this
exercise we will be using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspicam_node&lt;/code&gt; created by UbiquityRobotics. The ROS
node can be installed following the instructions found
&lt;a href=&quot;https://github.com/UbiquityRobotics/raspicam_node&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;network-configuration&quot;&gt;Network configuration&lt;/h3&gt;
&lt;p&gt;Some steps need to be taken to address the network configuration, in order for
the Raspi to communicate with a master on a different machine:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;http://wiki.ros.org/ROS/Tutorials/MultipleMachines&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/ROS/NetworkSetup&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In summary, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROS_MASTER_URI&lt;/code&gt; that is set on the machine considered the master
(a laptop in my case) needs to be set exactly the same on the child
(a Raspberry Pi in my case). In my particular case, the hostnames were not
resolving, and I was getting the error message &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Couldn't find an AF_INET address
for&lt;/code&gt; on my master machine. To resolve this issue, I explicitly specified the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROS_IP&lt;/code&gt; of the child. (Use ifconfig to find the IP address of the child.)&lt;/p&gt;
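&lt;p&gt;Concretely, the environment on the two machines might look like the following (the IP addresses are placeholders; substitute the ones reported by ifconfig):&lt;/p&gt;

```shell
# On the master (laptop) -- example addresses only:
export ROS_MASTER_URI=http://192.168.1.10:11311
export ROS_IP=192.168.1.10

# On the child (Raspberry Pi) -- same master URI, plus the child's own IP:
export ROS_MASTER_URI=http://192.168.1.10:11311
export ROS_IP=192.168.1.20
```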

&lt;p&gt;Once the network is set up, roscore needs to be initialized on the master
(laptop) before launching the launch file related to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspicam_node&lt;/code&gt;; otherwise,
you will get an error message indicating that the master cannot be reached.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raspicam_node&lt;/code&gt; outputs the following topics:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/raspicam_node/camera_info
/raspicam_node/image/compressed
/raspicam_node/parameter_descriptions
/raspicam_node/parameter_updates
/rosout
/rosout_agg
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/raspicam_node/image/compressed&lt;/code&gt; is available; thus,
without the appropriate conversion, the image will not be viewable using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosrun
image_view image_view image:=/topic_name&lt;/code&gt;. We will need to publish a
conversion from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/image/compressed&lt;/code&gt; topic to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/image&lt;/code&gt; by using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosrun image_transport republish compressed in:=/raspicam_node/image raw out:=/raspicam_node/image&lt;/code&gt;. Another way to access the image is by using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rqt_image_view&lt;/code&gt;, which starts a GUI and allows for visualization of the
capture.&lt;/p&gt;

&lt;h3 id=&quot;next-steps&quot;&gt;Next steps&lt;/h3&gt;
&lt;p&gt;Now that we have the camera working on a child machine separate from the master,
we can work on more interesting robotics projects related to collaboration
across multiple machines.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://ubuntu-mate.org/raspberry-pi/&quot;&gt;https://ubuntu-mate.org/raspberry-pi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://wiki.ros.org/kinetic/Installation/Ubuntu&quot;&gt;http://wiki.ros.org/kinetic/Installation/Ubuntu&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/UbiquityRobotics/raspicam_node&quot;&gt;https://github.com/UbiquityRobotics/raspicam_node&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://wiki.ros.org/ROS/Tutorials/MultipleMachines&quot;&gt;http://wiki.ros.org/ROS/Tutorials/MultipleMachines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://wiki.ros.org/ROS/NetworkSetup&quot;&gt;http://wiki.ros.org/ROS/NetworkSetup&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 28 Aug 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/raspberrypi/2017/08/28/raspicamera.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/raspberrypi/2017/08/28/raspicamera.html</guid>
        
        
        <category>ros</category>
        
        <category>raspberrypi</category>
        
      </item>
    
      <item>
        <title>N-gram: Some more PyTorch tutorials</title>
        <description>&lt;p&gt;Working through tutorials to familiarize myself with PyTorch…&lt;/p&gt;

&lt;h3 id=&quot;key-points-from-the-tutorial&quot;&gt;Key points from the tutorial&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Word embeddings are dense vectors of real numbers where each word in a
vocabulary is represented by a vector.&lt;/li&gt;
  &lt;li&gt;One-hot encoding is used to attribute a unique identifier to a word. The
encoding converts a word to a vector of |V| elements [0,0,…1,…0,0]. Word w
is identified by the position of the 1 within the vector.&lt;/li&gt;
  &lt;li&gt;The issue with one-hot encoding is that each word is treated as an
independent entity and no relationships between words are represented.&lt;/li&gt;
  &lt;li&gt;To improve on this, we consider semantic relationships or attributes. If each
attribute is a dimension, then we can find the similarity using some linear
algebra. The similarity is defined by the angle between the vectors of
attributes. The tutorial’s explanation was sufficient, so I will cut and paste it here.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/similarity.png&quot; alt=&quot;similarity&quot; /&gt;&lt;/p&gt;
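&lt;p&gt;The similarity in the figure is the cosine of the angle between two attribute vectors, which can be computed directly (the vectors here are made-up examples):&lt;/p&gt;

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two hypothetical attribute vectors that mostly point the same way.
sim = cosine_similarity([2.3, 9.4, 1.5], [2.5, 9.1, 1.2])
```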

&lt;ol start=&quot;5&quot;&gt;
  &lt;li&gt;Since manually engineering the possible semantic attributes is tedious, we
allow for latent semantic attributes, where the neural network learns the
semantic attributes itself. This trades transparency for practicality,
as we will not be able to see what actual attributes the model has learnt.&lt;/li&gt;
  &lt;li&gt;Embeddings are stored as a |V| x D matrix: the number of words in the vocabulary by the
dimension of the embedding.&lt;/li&gt;
  &lt;li&gt;Use torch.nn.Embedding(vocab_size, embedding_dim). We need to index in with
torch.LongTensor, as indices are integers and not floats.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;torch.nn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;torch.nn.functional&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;NGramLanguageModeler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vocab_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;embedding_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NGramLanguageModeler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embeddings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Embedding&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vocab_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;embedding_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linear1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;embedding_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linear2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vocab_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;forward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;embeds&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embeddings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linear1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embeds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linear2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;log_probs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;log_probs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;n-gram-language-modeling&quot;&gt;N-Gram Language Modeling&lt;/h3&gt;
&lt;p&gt;The example the tutorial walks through is the N-gram language model, where the
objective is to compute \[P(w_i | w_{i-1}, w_{i-2},…,w_{i-n+1})\]. In
the example a context size of 2 is used, so the previous 2
words are associated with the following word as the target. The model trains on a
test sentence (the author uses Shakespeare’s Sonnet 2), where the vocabulary is
defined as the set of words making up that sentence. Running the
model, we can see that the loss decreases over each epoch, but a test of the model
is not provided.&lt;/p&gt;
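&lt;p&gt;To make the training setup concrete, the (context, target) pairs can be built like so (a sketch of the preprocessing step, using the opening line of Sonnet 2):&lt;/p&gt;

```python
# Pair each word with the CONTEXT_SIZE words that precede it, mirroring
# the tutorial's preprocessing for the context-size-2 case described above.
CONTEXT_SIZE = 2

def make_ngrams(words, context_size):
    pairs = []
    for i in range(context_size, len(words)):
        pairs.append((words[i - context_size:i], words[i]))
    return pairs

sentence = "When forty winters shall besiege thy brow".split()
trigrams = make_ngrams(sentence, CONTEXT_SIZE)
# First pair: (['When', 'forty'], 'winters')
```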

&lt;h3 id=&quot;checking-the-predictive-power&quot;&gt;Checking the predictive power&lt;/h3&gt;

&lt;p&gt;Again, just walking through a tutorial verbatim can be boring. Playing
around a bit is more rewarding, and I find that it helps with memory retention
as well.
Instead of basing the vocab on one sonnet, I decided to test with multiple
sonnets [2], and see if training on the first 5 and testing on the 6th would result in
something interesting. The vocabulary set consists of the 398 unique words among
the 647 total words used in sonnets 1-6.&lt;/p&gt;

&lt;p&gt;Training the model, not surprisingly, shows the loss decrease on each epoch,
consistent with the tutorial, but
evaluating the model after training for 100 epochs results in 0% accuracy. Though
disappointing, this probably isn’t surprising considering the limited data and the capacity of the
model…&lt;/p&gt;
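&lt;p&gt;The accuracy figure above is just the fraction of held-out positions where the model's argmax prediction matches the true next word; a sketch (the prediction lists are hypothetical):&lt;/p&gt;

```python
# Accuracy over held-out (context, target) pairs: the fraction of positions
# where the predicted next word equals the actual next word.
def accuracy(predictions, targets):
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)

preds = ["thy", "brow", "field"]          # hypothetical model outputs
truth = ["winters", "shall", "besiege"]   # actual next words
score = accuracy(preds, truth)            # 0.0: no position matches
```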

&lt;p&gt;Note to self: Remember that a PyTorch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Variable&lt;/code&gt; can’t be converted to numpy directly,
because Variables are wrappers around tensors that save the operation history. We can
retrieve the tensor held by an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;autograd.Variable&lt;/code&gt; using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.data&lt;/code&gt; and then call
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.numpy()&lt;/code&gt; to convert a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Variable&lt;/code&gt; to a numpy array. [3]&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html#getting-dense-word-embeddings&lt;/li&gt;
  &lt;li&gt;http://nfs.sparknotes.com/sonnets/sonnet_6.html&lt;/li&gt;
  &lt;li&gt;https://discuss.pytorch.org/t/how-to-transform-variable-into-numpy/104/2&lt;/li&gt;
  &lt;li&gt;http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sun, 20 Aug 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/pytorch/2017/08/20/n-gram.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/pytorch/2017/08/20/n-gram.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>pytorch</category>
        
      </item>
    
      <item>
        <title>Raspimouse: Simple teleop program written in Rust</title>
        <description>&lt;p&gt;tl;dr: Great event, and hopefully we see more events like this.&lt;/p&gt;

&lt;h3 id=&quot;intro&quot;&gt;Intro&lt;/h3&gt;
&lt;p&gt;This is a quick write-up of my work at the ROS Japan UG #12 Raspberry Pi
Mouse hackathon, hosted by &lt;a href=&quot;http://www.rt-net.jp/company-page/?lang=en&quot;&gt;RT
Corporation&lt;/a&gt; and &lt;a href=&quot;http://www.groove-x.com/&quot;&gt;Groove
X&lt;/a&gt;. RT Corporation kindly
provided the Raspimouse (20+ units), a differential-drive robot powered by the
Raspberry Pi, allowing participants to hack on a live robot. Groove X is
a robotics company started by the “dad” of the Pepper robot.&lt;/p&gt;

&lt;p&gt;github repo: &lt;a href=&quot;https://github.com/surfertas/rustymouse&quot;&gt;https://github.com/surfertas/rustymouse&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I had two goals today, the first being to rewrite the teleop code in
Rust. I was under the impression this would be trivial, but I hit some hiccups
that took time to resolve. The second was to move from simulation to
the real world, though it appears I will not have sufficient time to complete
part two on this occasion.&lt;/p&gt;

&lt;h3 id=&quot;teleop-the-raspimouse-in-rust&quot;&gt;Teleop the Raspimouse in Rust&lt;/h3&gt;
&lt;p&gt;I started with the crate that Takashi Ogura [1] created, in which he implements the “hello world”
of ROS (subscriber and publisher nodes), based on rosrust, which is being developed by
Adnan [2].&lt;/p&gt;

&lt;p&gt;Main steps taken:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs/Twist&lt;/code&gt; to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build.rs&lt;/code&gt; file, where the messages are defined:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosmsg_main!(&quot;std_msgs/String&quot;, &quot;geometry_msgs/Twist&quot;)&lt;/code&gt;. Currently message
generation is done through a build script, though the use of procedural
macros would be preferred. If anyone has experience with procedural macros,
please reach out to Adnan!&lt;/li&gt;
  &lt;li&gt;Add the name of the binary to be created, in this case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vel_publisher&lt;/code&gt; and
path &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/vel_publisher.rs&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Cargo.toml&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Finally, it was really just programming, where simple IO and pattern
matching were sufficient. (In reality, I should strip the newline character, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\n&lt;/code&gt;,
before pattern matching to make the code more elegant…)&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#[macro_use]
extern crate rosrust;
extern crate env_logger;

use rosrust::Ros;
use std::{thread, time};
use std::io;

rosmsg_include!();

fn main() {
    env_logger::init().unwrap();
    let mut ros = Ros::new(&quot;vel_publisher&quot;).unwrap();
    let mut publisher = ros.publish(&quot;cmd_vel&quot;).unwrap();
    println!(&quot;w: forward, s: backward, a: left, d: right &amp;gt;&quot;);

    loop {
        // Start from an empty (default) Twist message each iteration
        let mut vel_cmd = msg::geometry_msgs::Twist::default();
        thread::sleep(time::Duration::from_millis(100));

        let mut command = String::new();
        io::stdin().read_line(&amp;amp;mut command).unwrap();

        match command.as_str() {
            &quot;w\n&quot; =&amp;gt; vel_cmd.linear.x = 0.35,
            &quot;s\n&quot; =&amp;gt; vel_cmd.linear.x = -0.35,
            &quot;a\n&quot; =&amp;gt; vel_cmd.angular.z = 3.21,
            &quot;d\n&quot; =&amp;gt; vel_cmd.angular.z = -3.21,
            &quot;q\n&quot; =&amp;gt; break,
            _ =&amp;gt; println!(&quot;Command not recognized&quot;),
        }

        publisher.send(vel_cmd).unwrap();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/raspimouse.png&quot; alt=&quot;raspi_mouse&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;issues&quot;&gt;Issues&lt;/h3&gt;
&lt;p&gt;One issue I had was that I needed to publish to the topic
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/raspimouse/diff_drive_controller/cmd_vel&lt;/code&gt;, which would allow me to teleop the
mouse, but rosrust for some reason was not accepting this topic name. Reviewing the source code [3], I was not able to find an
immediate solution, so I wrote the Python script below to remap
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cmd_vel&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/raspimouse/diff_drive_controller/cmd_vel&lt;/code&gt;.
This allowed me to successfully teleop the mouse in simulation.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

# Create the publisher once at module level; constructing a new
# Publisher inside the callback re-registers on every message and
# can silently drop the first commands.
pub = rospy.Publisher('/raspimouse/diff_drive_controller/cmd_vel', Twist,
                      queue_size=10)

def callback(msg):
    pub.publish(msg)

def topic_mapper():
    rospy.init_node('mapper', anonymous=True)
    rospy.Subscriber('/cmd_vel', Twist, callback)
    rospy.spin()

if __name__ == '__main__':
    topic_mapper()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/OTL/rosrust_tutorial&quot;&gt;https://github.com/OTL/rosrust_tutorial&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/adnanademovic/rosrust&quot;&gt;https://github.com/adnanademovic/rosrust&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/adnanademovic/rosrust/blob/master/src/api/naming/mod.rs&quot;&gt;https://github.com/adnanademovic/rosrust/blob/master/src/api/naming/mod.rs&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sat, 19 Aug 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/rust/2017/08/19/teleop-raspimouse-rust.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/rust/2017/08/19/teleop-raspimouse-rust.html</guid>
        
        
        <category>ROS</category>
        
        <category>rust</category>
        
      </item>
    
      <item>
        <title>Bag of Words: Working through some pytorch tutorials</title>
        <description>&lt;h3 id=&quot;some-nlp-with-pytorch&quot;&gt;Some NLP with Pytorch&lt;/h3&gt;
&lt;p&gt;The pytorch tutorial on NLP, which really introduces the features of pytorch, is a
great crash-course introduction to NLP. This exercise, for me, is more about
getting comfortable with a new framework than anything else (have to jump on the
Pytorch bandwagon with the release of v0.2).&lt;/p&gt;

&lt;p&gt;The first section uses bag of words; check out the wiki [2] for a decent
and rather comprehensive introduction. In short, in its naive form, the
frequency of each word is used as a feature for training a classifier.&lt;/p&gt;

&lt;p&gt;The first step is creating a vocabulary, and is done by taking the union of the
data sets under consideration. In the pytorch tutorial &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test_data&lt;/code&gt;
are combined and used to create an index of words, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;word_to_ix&lt;/code&gt;. The simple
script provided does the job.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;word_to_ix = {}
for sent, _ in data + test_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The wiki example is rather clear, so I will reproduce it below:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;John likes to watch movies. Mary likes movies too.&lt;/li&gt;
  &lt;li&gt;John also likes to watch football games.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These two texts define the sample space, and thus the vocabulary list
constructed is as follows.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[
    &quot;John&quot;,
    &quot;likes&quot;,
    &quot;to&quot;,
    &quot;watch&quot;,
    &quot;movies&quot;,
    &quot;Mary&quot;,
    &quot;too&quot;,
    &quot;also&quot;,
    &quot;football&quot;,
    &quot;games&quot;
]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can get the bag of words representation by backing out the frequency of each
word for each text sample, thus we get:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 2, 1, 1, 2, 1, 1, 0, 0, 0]&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 1, 1, 1, 0, 0, 0, 1, 1, 1]&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
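&lt;p&gt;These counts can be reproduced with a few lines of plain Python (a quick sketch of the same idea, not code from the tutorial):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Tokenized wiki sentences (punctuation dropped for simplicity)
sent1 = ['John', 'likes', 'to', 'watch', 'movies', 'Mary', 'likes', 'movies', 'too']
sent2 = ['John', 'also', 'likes', 'to', 'watch', 'football', 'games']

# Build the vocabulary index in first-seen order
word_to_ix = {}
for sent in [sent1, sent2]:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

def bow_vector(sent):
    # Count the frequency of each vocabulary word in the sentence
    vec = [0] * len(word_to_ix)
    for word in sent:
        vec[word_to_ix[word]] += 1
    return vec

print(bow_vector(sent1))  # [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
print(bow_vector(sent2))  # [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;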

&lt;h3 id=&quot;training-with-bow&quot;&gt;Training with BOW&lt;/h3&gt;
&lt;p&gt;I have left the original comments as they are helpful for a newcomer. The texts
are converted to feature vectors of the appropriate type, as inputs to
autograd.Variable() need to be torch tensors. The labels, in string form, are
likewise converted to a torch.LongTensor, an integer tensor. The model is
trained for 100 epochs, and the results show that it has learned
to classify between English and Spanish.&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;        &lt;span class=&quot;c1&quot;&gt;# Step 1. Remember that Pytorch accumulates gradients.
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# We need to clear them out before each instance
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zero_grad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Step 2. Make our BOW vector and also we must wrap the target in a
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# Variable as an integer. For example, if the target is SPANISH, then
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# we wrap the integer 0. The loss function then knows that the 0th
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# element of the log probabilities is the log probability
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# corresponding to SPANISH
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;bow_vec&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autograd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Variable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_bow_vector&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word_to_ix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autograd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Variable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;label&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;label_to_ix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Step 3. Run our forward pass.
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;log_probs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bow_vec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Step 4. Compute the loss, gradients, and update the parameters by
&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;# calling optimizer.step()
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loss_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_probs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;backward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;step&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The tutorial tests on the word “creo”. Before training, the log probabilities are
spanish: -0.1599 and english: -0.1411, while after 100 epochs of training on
such a small data set, the results are spanish: 0.3315 and english:
-0.6325.&lt;/p&gt;

&lt;h3 id=&quot;recap&quot;&gt;Recap&lt;/h3&gt;
&lt;p&gt;It’s always fun trying to change the tutorial in some form to see how different
tweaks impact the results. Since we started with such a small data set, a quick
and easy adjustment is to add more examples (in particular, Spanish text using
“creo”). Increasing the data set by a few samples quickly improves the
results, with the log probabilities moving to spanish: 0.5855 and english: -0.7205.
In this case, it’s probably best to apply the softmax function to represent the
log probabilities as a probability distribution between 0 and 1*. The
probability of “creo” being Spanish moves from 72.39% to 78.68%. Not a bad
improvement for a small boost to the sample set, but more impressive is that we
were able to get such results with such a small data set to start with. (*To
convert log probs to a probability distribution between 0 and 1, simply
exponentiate the log prob values using base e, and normalize by the sum for
each case.)&lt;/p&gt;
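&lt;p&gt;That exponentiate-and-normalize step is just the softmax; a sketch using the scores quoted in this post (taking the english score after adding samples as -0.7205):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import math

def softmax(scores):
    # Exponentiate each score (base e) and normalize by the sum
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# [spanish, english] scores before and after adding samples
p_before = softmax([0.3315, -0.6325])[0]   # P(spanish) ~ 0.7239
p_after = softmax([0.5855, -0.7205])[0]    # P(spanish) ~ 0.7868
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;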

&lt;h3 id=&quot;reference&quot;&gt;Reference&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html&lt;/li&gt;
  &lt;li&gt;https://en.wikipedia.org/wiki/Bag-of-words_model&lt;/li&gt;
  &lt;li&gt;https://en.wikipedia.org/wiki/Softmax_function&lt;/li&gt;
  &lt;li&gt;http://www.spanishdict.com/translate/yo%20creo&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Fri, 18 Aug 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/pytorch/2017/08/18/bag-of-words.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/pytorch/2017/08/18/bag-of-words.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>pytorch</category>
        
      </item>
    
      <item>
        <title>Transfer Learning: Working through the pytorch tutorial</title>
        <description>&lt;h3 id=&quot;quick-post-on-transfer-learning&quot;&gt;Quick post on Transfer Learning&lt;/h3&gt;

&lt;p&gt;A common situation we encounter is a lack of data: we do not have
enough examples to properly train a high-capacity architecture. Thus,
often, a pretrained model is used either as the initialization
(fine-tuning) or as a fixed feature extractor, where all layers excluding the
final FC layer are frozen.&lt;/p&gt;

&lt;p&gt;The pytorch tutorial [1] provides a couple of examples, one related to finetuning a
resnet18 model pre-trained on the ImageNet 1000-class dataset. When finetuning, we use the
pre-trained model as the initialization of our new architecture. We
redefine the final fully connected layer to take the same number of in
features, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;model_ft.fc.in_features&lt;/code&gt;, while resetting the number of out features to
accommodate the number of labels, which in this case is two as we are
classifying between ants and bees, and then train as usual on our smaller data set.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;models&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resnet18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pretrained&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;num_ftrs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_features&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_ftrs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# nn.Linear(in_features, out_features)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;criterion&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Observe that all parameters are being optimized
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimizer_ft&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SGD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_ft&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.001&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;momentum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h3 id=&quot;resnets&quot;&gt;Resnets&lt;/h3&gt;
&lt;p&gt;Deep residual learning is presented in “Deep Residual Learning for Image
Recognition” [2], and builds on the evidence that the depth of a neural network plays a
significant role in the performance of a given model. A key excerpt from the
paper is quoted below:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Let us consider H(x) as an underlying mapping to be fit by a few stacked layers
(not necessarily the entire net), with x denoting the inputs to the first of
these layers. If one hypothesizes that multiple nonlinear layers can
asymptotically approximate complicated functions2 , then it is equivalent to
hypothesize that they can asymptotically approximate the residual functions,
i.e., H(x) − x (assuming that the input and output are of the same
dimensions). So rather than expect stacked layers to approximate H(x), we
explicitly let these layers approximate a residual function F(x) := H(x) −
x. The original function thus becomes F(x)+x. Although both forms should be
able to asymptotically approximate the desired functions (as hypothesized),
the ease of learning might be different.&lt;/p&gt;
&lt;/blockquote&gt;
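&lt;p&gt;In code, a residual block is just this reformulation: the stacked layers compute F(x) and the skip connection adds x back, so recovering the identity mapping only requires driving F toward zero (a toy sketch of the idea, not the tutorial’s code):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def residual_block(x, f):
    # f plays the role of the stacked layers computing F(x);
    # the skip connection adds the input back in: H(x) = F(x) + x
    return f(x) + x

# If the layers learn F(x) = 0, the block is the identity mapping
out = residual_block(3.0, lambda x: 0.0)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;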

&lt;p&gt;The results are pretty impressive, considering the ants-and-bees data set only
consists of 120 training images and 75 validation images. As suggested in
the paper, this result likely reflects the excellent generalization performance
of deep representations on recognition tasks.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Epoch 22/24
----------
train Loss: 0.0800 Acc: 0.8525
val Loss: 0.0429 Acc: 0.9542

Epoch 23/24
----------
train Loss: 0.0714 Acc: 0.8730
val Loss: 0.0425 Acc: 0.9477

Epoch 24/24
----------
train Loss: 0.0620 Acc: 0.9016
val Loss: 0.0434 Acc: 0.9412

Training complete in 51m 51s
Best val Acc: 0.954248
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h3 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h3&gt;
&lt;p&gt;As for next steps, it would be interesting to see how fine-tuning the resnet model
performs on a data set consisting of a larger number of labels.&lt;/p&gt;

&lt;h3 id=&quot;reference&quot;&gt;Reference&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html&lt;/li&gt;
  &lt;li&gt;https://arxiv.org/pdf/1512.03385.pdf&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 17 Aug 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/pytorch/2017/08/17/transfer-learning.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/pytorch/2017/08/17/transfer-learning.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>pytorch</category>
        
      </item>
    
      <item>
        <title>Rust-Docker: Speeding up compile times when using docker + rust</title>
<description>&lt;p&gt;Recently, I have been trying to have a go at ROS using the Rust language. As
launch files aren’t available, a clear alternative is using docker containers
with docker compose.&lt;/p&gt;

&lt;p&gt;In order to accomplish the above, I first need to be able to create containers
that can compile the Rust language. Doing a bit of searching on the internet, I
came across &lt;a href=&quot;https://github.com/emk/rust-musl-builder&quot;&gt;Docker container for easily building static Rust
binaries&lt;/a&gt;, which seemed perfect for
this particular exercise.&lt;/p&gt;

&lt;p&gt;Simply running the following allows for the compilation of a static Rust binary with no
external dependencies:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ alias rust-musl-builder='docker run --rm -it -v &quot;$(pwd)&quot;:/home/rust/src ekidd/rust-musl-builder'
$ rust-musl-builder cargo build --release
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One suggestion that was made to me to improve the build times, was to cache the
dependencies necessary by creating a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.cargo/registry&lt;/code&gt; in the rust project.&lt;/p&gt;

&lt;p&gt;Going to the root of the rust project directory:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ mkdir .cargo/registry -p
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and add the mounting command as follows when creating the alias:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ alias rust-musl-builder='docker run --rm -it -v &quot;$(pwd)&quot;:/home/rust/src -v &quot;$(pwd)/.cargo/registry&quot;:/home/rust/.cargo/registry ekidd/rust-musl-builder'
$ rust-musl-builder cargo build --release
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first build will take the usual expected time, while subsequent builds will be
faster, as the dependencies will have been cached.&lt;/p&gt;

&lt;p&gt;Thanks @yasuyuky!&lt;/p&gt;

</description>
        <pubDate>Thu, 22 Jun 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rust/docker/2017/06/22/rust-docker.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rust/docker/2017/06/22/rust-docker.html</guid>
        
        
        <category>rust</category>
        
        <category>docker</category>
        
      </item>
    
      <item>
        <title>Turtlepi #7: Automatic Target Generation for the Turtlebot</title>
        <description>&lt;p&gt;Source: &lt;a href=&quot;https://github.com/surfertas/turtlepi&quot;&gt;https://github.com/surfertas/turtlepi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a quick note to update the work done thus far while here at &lt;a href=&quot;https://idein.jp/&quot;&gt;Idein
Inc&lt;/a&gt;. One of the short-term objectives was to create an
environment that allows for the collection of data from a robot navigating a
simulation environment based on a given policy. The expert policy was provided
by the ROS navigation stack, specifically the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base&lt;/code&gt; node. Since the robot will have access to a static map, we will be using
AMCL (adaptive Monte Carlo localization) as opposed to SLAM. [1] See
the figure below for the navigation stack setup. Given this setup, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base&lt;/code&gt;
publishes control commands to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cmd_vel&lt;/code&gt; topic as messages of type
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs/Twist&lt;/code&gt;, allowing the robot to follow a path computed by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;global_planner&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;local_planner&lt;/code&gt; to an independently specified goal.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/rosnavstack.png&quot; alt=&quot;nav_stack&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A typical way of generating a target for navigation is by using RViz to specify a goal
explicitly using the 2D Nav Goal feature. This is fine in the case
where only a few sample runs are needed, but in our case we would like to
automate the target generation process in order to efficiently generate a large
amount of data in simulation.&lt;/p&gt;

&lt;h3 id=&quot;automated-target-generation-system&quot;&gt;Automated Target Generation System&lt;/h3&gt;
&lt;p&gt;The program logic associated with the automated target generation service
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_nav_target&lt;/code&gt; is quite simple, though the integration requires work.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;We start by obtaining the metadata related to the map. In this case we are
using a static map, and assume that the map environment does not change during
an episode, where an episode is defined as a run from the current location to the
target. The ROS framework already provides a service &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;static_map&lt;/code&gt; that is of
type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nav_msgs::GetMap&lt;/code&gt; and returns a response of type
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nav_msgs/OccupancyGrid&lt;/code&gt;. We can observe the raw message
definition below. In particular, we want to copy over the MapMetaData
info and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int8[]&lt;/code&gt; data for later use. Luckily &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int8[]&lt;/code&gt; is of type
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std::vector&amp;lt;int8_t&amp;gt;&lt;/code&gt; in C++, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; is overloaded and copying the vector
is easy. NOTE: the data is in row-major order, so we need to be careful when
indexing into it. The map gets initialized once when the target generator service
is launched.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# This represents a 2-D grid map, in which each cell represents the probability of
# occupancy.

Header header 

#MetaData for the map
MapMetaData info

# The map data, in row-major order, starts with (0,0).  Occupancy
# probabilities are in the range [0,100].  Unknown is -1.
int8[] data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
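&lt;p&gt;Because the data is row-major, the cell at grid coordinates (mx, my) lives at index my * width + mx of the flattened array. A minimal Python sketch (the toy grid values below are made up):&lt;/p&gt;

```python
# Index into a flattened row-major occupancy grid, as in
# nav_msgs/OccupancyGrid: 0 = free, 100 = occupied, -1 = unknown.

def cell_value(data, width, mx, my):
    return data[my * width + mx]

# Toy 3x2 grid (width=3, height=2), flattened row by row.
grid = [0, 0, 100,
        -1, 0, 0]

print(cell_value(grid, 3, 2, 0))  # -> 100 (occupied)
print(cell_value(grid, 3, 0, 1))  # -> -1 (unknown)
```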
&lt;ol&gt;
  &lt;li&gt;When the service is initiated, the callback &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generateTargetService&lt;/code&gt; is called
and generates a new target. The service uses the
cost map that was saved at initialization and randomly selects a point on it.
The grid point is converted to world coordinates via the conversion
below:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TargetGenerator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mapToWorld&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;wx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map_origin_x_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map_resolution_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;wy&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map_origin_y_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map_resolution_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the world coordinates in hand, a distance check is made against a threshold,
which I have set to 8m, as we want to avoid targets that are too close to the
current location. The generated point is selected as a valid target only if the
threshold check passes and the selected point on the grid is in free space,
which corresponds to a cost map value of 0.&lt;/p&gt;
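&lt;p&gt;The whole selection loop can be sketched in Python; map_to_world mirrors the C++ mapToWorld above, the 8m threshold follows the text, and the grid, origin, and robot pose are toy values:&lt;/p&gt;

```python
import math
import random

def map_to_world(mx, my, origin_x, origin_y, resolution):
    # mirror of TargetGenerator::mapToWorld: grid cell center -> world
    wx = origin_x + (mx + 0.5) * resolution
    wy = origin_y + (my + 0.5) * resolution
    return wx, wy

def generate_target(grid, width, height, origin, resolution,
                    robot_xy, min_dist=8.0, max_tries=10000):
    # Sample random cells until one is free space (cost 0) and farther
    # than min_dist from the robot's current position.
    ox, oy = origin
    rx, ry = robot_xy
    for _ in range(max_tries):
        mx = random.randrange(width)
        my = random.randrange(height)
        if grid[my * width + mx] != 0:   # occupied or unknown
            continue
        wx, wy = map_to_world(mx, my, ox, oy, resolution)
        if math.hypot(wx - rx, wy - ry) > min_dist:
            return wx, wy
    return None                          # no valid target found

# Toy 100x100 all-free map at 0.5 m resolution (a 50 m x 50 m area).
target = generate_target([0] * (100 * 100), 100, 100,
                         (0.0, 0.0), 0.5, (0.0, 0.0))
print(target)
```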

&lt;p&gt;A couple of additional notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;I have used visualization markers to make target visualization in RViz easier.
You need to add a marker, and specify the appropriate topic to subscribe to.
In this case the topic is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/turtlepi_navigate/visualization_marker&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;The implementation is a first pass and has much room for optimization. When
considering a large number of calls to this service, it would
be inefficient to randomly select a point from the entire cost map, when the
valid points are confined to the free space. I look to implement an algorithm that would
return only the free-space points, so the service can randomly select a point
from that set.&lt;/li&gt;
  &lt;li&gt;Once the target goal is obtained, the action server is used to communicate the
goal and handle follow-on communication with the robot until the target is reached or
  a failure event occurs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;

&lt;p&gt;See the results of an example run below.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/bP-OWmqL1qU&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;usage&quot;&gt;Usage&lt;/h3&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/surfertas/turtlepi.git
# On first terminal
cd ~/turtlepi_gazebo/launch/
roslaunch turtlepi.launch
# Wait until you see the odom received message.
# On second terminal
cd ~/turtlepi_navigate/launch/
roslaunch turtlepi_gentarget.launch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://robots.stanford.edu/papers/fox.aaai99.pdf&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Tue, 23 May 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/05/23/autotarget.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/05/23/autotarget.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Turtlepi #6: Dealing with Quaternions, no not GANs</title>
<description>&lt;p&gt;This is a quick post to better understand the concept of quaternions and their
relation to robotics. The catalyst for this particular post was the use of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tf::Quaternion&lt;/code&gt; in a recent ROS implementation, and a self-directed question
inquiring whether or not I truly understood and remembered the concept of a quaternion. The
answer was no, and is still no, but I can definitely say that I remember more now
than I did the day before.&lt;/p&gt;

&lt;h3 id=&quot;advantages-of-quaternions&quot;&gt;Advantages of Quaternions&lt;/h3&gt;
&lt;p&gt;Quaternions are used to represent rotations and are an alternative to the often
referenced Euler angles. Despite the steeper
learning curve, the benefits warrant the effort. Some cited benefits include:
[1]&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;No need to worry about &lt;a href=&quot;https://en.wikipedia.org/wiki/Gimbal_lock&quot;&gt;Gimbal lock&lt;/a&gt;, a problem specific to Euler
angles.&lt;/li&gt;
  &lt;li&gt;Greater efficiency in terms of memory footprint and computation relative to
matrix and angle/axis representations.&lt;/li&gt;
  &lt;li&gt;Depending on the use case, quaternions contain only a rotation, as opposed
to a translation and scaling.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;what-is-a-quaternion&quot;&gt;What is a Quaternion&lt;/h3&gt;
&lt;p&gt;Mathematically, a quaternion is represented by a scalar and a vector, \((a_0,
\textbf{a})\), and in the expanded form as a linear combination: \[ a = a_0 +
a_1i + a_2j + a_3k \]&lt;/p&gt;

&lt;p&gt;For a unit quaternion representing a rotation, the scalar component encodes the
magnitude of the rotation, while the vector component encodes the axis of rotation.&lt;/p&gt;

&lt;h3 id=&quot;using-quaternions-and-related-properties&quot;&gt;Using Quaternions and related Properties&lt;/h3&gt;
&lt;p&gt;The quaternion under consideration represents a rotation in 3 dimensions. A
property of rotations in 3 dimensions is that any combination of rotations can
be represented by a rotation about a single axis. Thus if we are given two
rotations represented by quaternions \(r\) and \(s\) that are applied
to a vector represented by \(\textbf{v}_1\), we can get the new
representation, \(\textbf{v}_2\), by considering the following equation: \[\textbf{v}_2
= (rs) (0 \ \textbf{v}_1)^T(rs)^{-1}\]&lt;/p&gt;

&lt;p&gt;Before moving forward, at this point we need to review a few properties related
to quaternions to understand what operations are taking place.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The inverse of a quaternion, \(r^{-1}\), is not obtained by simply inverting
each element; for a unit quaternion it is the representation’s conjugate, which simply means
that the vector elements of the quaternion are negated. Thus the quaternion
inverse of \((r_0, \textbf{r})\) is equivalent to \((r_0, -\textbf{r})\).&lt;/li&gt;
  &lt;li&gt;Multiplication, when considering quaternions, is again not particularly
straightforward in the common sense. Multiplication represents the composition
of two rotations: \(rs\) actually represents one rotation
followed by another rotation.&lt;/li&gt;
  &lt;li&gt;Rotations are not commutative. This means that \(rs \neq sr\);
in words, the order of rotations matters. This can be quickly confirmed
using your right hand and applying two rotations in different orders.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;quaternion-multiplication&quot;&gt;Quaternion Multiplication&lt;/h3&gt;
&lt;p&gt;Now back to the equation \[\textbf{v}_2 = (rs) (0 \ \textbf{v}_1)^T(rs)^{-1}\]. Obtaining the inverse is self-explanatory, so we
will focus on quaternion multiplication.&lt;/p&gt;

&lt;p&gt;If we expand, \(r\), and represent the
rotation as a linear combination, we get \[r = r_0 + r_1i + r_2j + r_3k\] and
likewise for \(s\) we get \[s = s_0 + s_1i + s_2j + s_3k\]&lt;/p&gt;

&lt;p&gt;Before we proceed, let’s consider the fundamental formula of quaternion algebra
discovered by William Rowan Hamilton. For some context, this formula just happened to
occur to Mr. Hamilton as he was walking along the Royal Canal in Dublin. [2] Impressive to say
the least.
\[ i^2 = j^2 = k^2 = ijk = -1\]&lt;/p&gt;

&lt;p&gt;From the fundamental formula we can further derive a few other identities,
specifically:
\[ i^2 = j^2 = k^2 = -1\]
\[ ij = -ji = k\]
\[ jk = -kj = i\]
\[ ki = -ik = j\]&lt;/p&gt;

&lt;p&gt;which can be used alongside the associated multiplication table [2]:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/quaternion.png&quot; alt=&quot;quaternion&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Using the identities and the multiplication table we can compute the result of
\(rs\) as \(n = r \times s = n_0 + n_1i + n_2j + n_3k\) where
\[n_0=(r_0s_0−r_1s_1−r_2s_2−r_3s_3)\]
\[n_1=(r_0s_1+r_1s_0+r_2s_3−r_3s_2)\]
\[n_2=(r_0s_2−r_1s_3+r_2s_0+r_3s_1)\]
\[n_3=(r_0s_3+r_1s_2−r_2s_1+r_3s_0)\]&lt;/p&gt;

&lt;p&gt;I will refrain from writing out all the computations, but the calculation is
really as simple as aligning the first rotation along the y-axis of the
multiplication table and the second rotation along the x-axis of the
multiplication table, then applying the identities to simplify.&lt;/p&gt;
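&lt;p&gt;The multiplication can be checked with a small Python sketch using plain tuples rather than tf::Quaternion, consistent with the identities \(ij = k\), \(jk = i\), \(ki = j\) above:&lt;/p&gt;

```python
def qmul(r, s):
    # Hamilton product rs for quaternions (q0, q1, q2, q3), derived from
    # i^2 = j^2 = k^2 = ijk = -1 and ij = k, jk = i, ki = j.
    r0, r1, r2, r3 = r
    s0, s1, s2, s3 = s
    return (r0*s0 - r1*s1 - r2*s2 - r3*s3,
            r0*s1 + r1*s0 + r2*s3 - r3*s2,
            r0*s2 - r1*s3 + r2*s0 + r3*s1,
            r0*s3 + r1*s2 - r2*s1 + r3*s0)

def qconj(q):
    # conjugate: negate the vector part (the inverse for unit quaternions)
    return (q[0], -q[1], -q[2], -q[3])

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j))   # -> (0, 0, 0, 1): ij = k
print(qmul(j, i))   # -> (0, 0, 0, -1): ji = -k, so rs != sr
```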

&lt;h3 id=&quot;result&quot;&gt;Result&lt;/h3&gt;
&lt;p&gt;Now that we have the inverse and multiplication results in hand, we can simply
rely on plain matrix multiplication to compute the new vector, given the original
vector and the two rotations.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://stackoverflow.com/questions/1840314/when-do-i-need-to-use-quaternions&lt;/li&gt;
  &lt;li&gt;http://mathworld.wolfram.com/Quaternion.html&lt;/li&gt;
  &lt;li&gt;https://www.mathworks.com/help/aerotbx/ug/quatmultiply.html&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 18 May 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/05/18/quaternion.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/05/18/quaternion.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Turtlepi #5: Getting familiar with boost and its relation to ROS</title>
<description>&lt;p&gt;Having been reading a lot more ROS-related code in recent days, I am starting to see a lot more use of the boost library,
thus I am writing a quick post as a reminder to myself of the different functionality
that boost offers. I will keep this an open post, as I am sure to be adding new functions as I progress.&lt;/p&gt;

&lt;h2 id=&quot;boostbind&quot;&gt;boost::bind()&lt;/h2&gt;

&lt;p&gt;Taken from the documentation, the purpose of boost::bind is introduced as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;boost::bind is a generalization of the standard functions std::bind1st and
std::bind2nd. It supports arbitrary function objects, functions, function
pointers, and member function pointers, and is able to bind any argument to a
specific value or route input arguments into
arbitrary positions. bind does not place any requirements on the function
object; in particular, it does not need the result_type, first_argument_type
and second_argument_type standard typedefs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A few examples help to clarify and move us from abstraction to concreteness.&lt;/p&gt;

&lt;p&gt;Consider the standard library functions std::bind1st() and std::bind2nd()
presented below. Note that we will stick with the same examples presented in the documentation.&lt;/p&gt;

&lt;h4 id=&quot;boost-vs-standard-library&quot;&gt;Boost vs. Standard Library&lt;/h4&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// standard library&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind2nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ptr_fun&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// f(x, 5)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// boost&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;              &lt;span class=&quot;c1&quot;&gt;// f(x, 5)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// standard library&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind1st&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ptr_fun&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// f(5, x)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// boost&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;               &lt;span class=&quot;c1&quot;&gt;// f(5,x)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thus in boost, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_1&lt;/code&gt; acts as a placeholder for the first input to the function
call that is not “bound”. The docs indicate that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boost::bind&lt;/code&gt; is more flexible,
and the examples given definitely make you think that is the case.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;               &lt;span class=&quot;c1&quot;&gt;// f(y, x)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;               &lt;span class=&quot;c1&quot;&gt;// g(x, 9, x)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;// g(z, z, z)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;// g(x, x, x)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;relation-to-ros&quot;&gt;Relation to ROS&lt;/h4&gt;
&lt;p&gt;Why does this matter in relation to ROS? One use case we can consider is when one wants to pass
additional arguments to a callback.&lt;/p&gt;
&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;imageCallback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensor_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ImageConstPtr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;additional_arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// some code&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Subscriber&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;nh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscribe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sensor_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/topic&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;boost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imageCallback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;additional_arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h4 id=&quot;important&quot;&gt;Important&lt;/h4&gt;
&lt;p&gt;One thing to note is that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_1&lt;/code&gt; is a placeholder marking the argument slot into which the incoming
message of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sensor_msgs::Image&lt;/code&gt; will be delivered. Indeed, if we look at
the function definition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;imageCallback()&lt;/code&gt;, we can see that the message
occupies the first slot.&lt;/p&gt;
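&lt;p&gt;The same binding pattern can be mimicked in plain Python with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;functools.partial&lt;/code&gt;, which, like boost::bind, pre-binds the extra argument and leaves the first slot open for the incoming message. This is only an illustrative sketch; the names below are hypothetical, not ROS API:&lt;/p&gt;

```python
from functools import partial

# Hypothetical callback mirroring imageCallback(): the message lands in the
# first slot, while the extra argument is pre-bound, as boost::bind does above.
def image_callback(msg, additional_arg):
    return (msg, additional_arg)

# Bind the trailing argument now; the caller fills in msg later.
bound_cb = partial(image_callback, additional_arg=42)

result = bound_cb("fake_image_message")
# result == ("fake_image_message", 42)
```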

&lt;h2 id=&quot;boostshared_ptr&quot;&gt;boost::shared_ptr&lt;/h2&gt;
&lt;p&gt;boost::shared_ptr is a wrapper around a raw C++ pointer that manages the
lifetime of the object it points to. A “smart” pointer is preferred over the
traditional approach, which leaves the responsibility of deleting the object to the programmer
and thus raises the risk of memory leaks. Since C++11, the standard
library has included this functionality, so we can use std::shared_ptr in
place of the boost version in our own code if necessary.&lt;/p&gt;

&lt;p&gt;Smart pointers delete the objects they own automatically, and are
generally considered the safer and preferred approach. An example using std::shared_ptr is presented below: [3]&lt;/p&gt;
&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shared_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MyObjectPtr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// nice short alias&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;MyObjectPtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// Empty&lt;/span&gt;

    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;MyObjectPtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MyObject&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// There is now one &quot;reference&quot; to the created object&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;p1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// Copy the pointer.&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// There are now two references to the object.&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// p2 is destroyed, leaving one reference to the object.&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// p1 is destroyed, leaving a reference count of zero. &lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// The object is deleted.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I like the idea that with a bit more code, we can guarantee safe programming, at
least with respect to pointers.&lt;/p&gt;
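&lt;p&gt;As a rough analogy, CPython manages object lifetimes with the same reference-counting idea. A small sketch (pure Python, not ROS code; assumes CPython, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sys.getrefcount&lt;/code&gt; exposes the count):&lt;/p&gt;

```python
import sys

class MyObject:
    pass

p1 = MyObject()                  # one reference to the object
p2 = p1                          # copy the "pointer": two references
count = sys.getrefcount(p1) - 1  # subtract the temporary reference the call itself makes
del p2                           # back to one reference
# when p1 also goes away, the count hits zero and the object is freed
```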

&lt;h4 id=&quot;shared-pointers-and-ros&quot;&gt;Shared pointers and ROS&lt;/h4&gt;
&lt;p&gt;How do shared pointers come into play when programming with ROS? Their use
matters for intraprocess publishing, i.e. when
the publisher and subscriber to a particular topic exist in the same node. If we
want to skip the serialize/deserialize step, which is
processor intensive and a source of latency, we need to
publish the message as a shared pointer. See the example from the ROS Wiki: [4]&lt;/p&gt;
&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Publisher&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;advertise&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;topic_name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;std_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StringPtr&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;hello world&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;publish&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Note that std_msgs::StringPtr is a redefinition of boost::shared_ptr.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// From the docs [5]:&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//  typedef boost::shared_ptr&amp;lt; ::std_msgs::String&amp;gt; std_msgs::StringPtr&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//  &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;blockquote&gt;
  &lt;p&gt;This form of publishing is what can make nodelets such a large win over nodes
in separate processes.
Note that when publishing in this fashion, there is an implicit contract
between you and roscpp: you may not modify the message you’ve sent after you
send it, since that pointer will be passed directly to any intraprocess
subscribers. If you want to send another message, you must allocate a new one
and send that. - ROS WIKI&lt;/p&gt;
&lt;/blockquote&gt;
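&lt;p&gt;The contract is easy to demonstrate outside of ROS. The toy in-process “topic” below (plain Python, illustrative only) hands the subscriber a reference to the very object that was published, so mutating the message after publishing changes what the subscriber sees:&lt;/p&gt;

```python
from queue import Queue

topic = Queue()            # stands in for an intraprocess topic

def publish(msg):
    topic.put(msg)         # no serialization: the reference itself is handed over

msg = {"data": "hello world"}
publish(msg)
msg["data"] = "mutated!"   # violates the contract: the queued object changes too

received = topic.get()
# received["data"] is "mutated!", not "hello world" -- which is why roscpp
# asks you to allocate a fresh message for every publish
```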

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://www.boost.org/doc/libs/1_64_0/libs/bind/doc/html/bind.html&lt;/li&gt;
  &lt;li&gt;http://answers.ros.org/question/12045/how-to-deliver-arguments-to-a-callback-function/&lt;/li&gt;
  &lt;li&gt;http://stackoverflow.com/questions/106508/what-is-a-smart-pointer-and-when-should-i-use-one?rq=1&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/roscpp/Overview/Publishers%20and%20Subscribers&lt;/li&gt;
  &lt;li&gt;http://docs.ros.org/electric/api/std_msgs/html/String_8h.html&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 11 May 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/05/11/ros-boost.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/05/11/ros-boost.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Turtlepi #4: Resolving the spinning Turtlebot in RViz when using the navigation stack</title>
        <description>&lt;h3 id=&quot;turtlebot-spins-out-of-control&quot;&gt;Turtlebot spins out of control&lt;/h3&gt;

&lt;p&gt;Not sure if others have had similar issues, but when trying to use the
navigation stack I found that the turtlebot had the tendency to spin out of
control when given a target that required rotation as a first move.&lt;/p&gt;
&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/iNyqzGzs_M8?rel=0&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; encrypted-media&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;h3 id=&quot;solution&quot;&gt;Solution&lt;/h3&gt;

&lt;p&gt;In order to resolve this issue, we first need to inspect the relevant configuration file for the DWA planner,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dwa_local_planner_params.yaml&lt;/code&gt; found in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/opt/ros/kinetic/share/turtlebot_navigation/param&lt;/code&gt; directory.  The parameters
that need to be considered are specified as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DWAPlannerROS:

# Robot Configuration Parameters - Kobuki
  max_vel_x: 0.5  # 0.55
  min_vel_x: 0.0

  max_vel_y: 0.0  # diff drive robot
  min_vel_y: 0.0  # diff drive robot

  max_trans_vel: 0.5  # choose slightly less than the base's capability
  min_trans_vel: 0.1  # this is the min trans velocity when there is negligible rotational velocity
  trans_stopped_vel: 0.1

  # Warning!
  #   do not set min_trans_vel to 0.0 otherwise dwa will always think
  #   translational velocities
  #   are non-negligible and small in place rotational velocities will be
  #   created.

  max_rot_vel: 5.0  # choose slightly less than the base's capability
  min_rot_vel: 0.4  # this is the min angular velocity when there is negligible translational velocity
  rot_stopped_vel: 0.4

  acc_lim_x: 1.0 # maximum is theoretically 2.0.
  acc_lim_theta: 2.0
  acc_lim_y: 0.0  # diff drive robot
…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The current solution offered is capping the maximum rotation velocity by setting
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_rot_vel&lt;/code&gt; parameter to a smaller value than the default specification of
5.0. Reducing this value to 1.0 resolved the issue. [1]&lt;/p&gt;

&lt;p&gt;In order to do this, we can either directly amend the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dwa_local_planner_params.yaml&lt;/code&gt; file to set a
new default, or we can redefine the parameter by using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;param
name=&quot;move_base/DWAPlannerROS/max_rot_vel&quot; value=&quot;1.0&quot;/&amp;gt;&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;include&amp;gt;&lt;/code&gt; tag
that launches the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base.launch.xml&lt;/code&gt; file, as mentioned
&lt;a href=&quot;http://answers.ros.org/question/239340/turtlebot-spinning/&quot;&gt;here&lt;/a&gt;.[2]&lt;/p&gt;

&lt;h3 id=&quot;result&quot;&gt;Result&lt;/h3&gt;
&lt;p&gt;We can see that with the redefinition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_rot_vel&lt;/code&gt; the spinning issue has
been resolved.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/esL_YW-dmvg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://github.com/turtlebot/turtlebot_apps/pull/140&lt;/li&gt;
  &lt;li&gt;http://answers.ros.org/question/239340/turtlebot-spinning/&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 08 May 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rviz/turtlebot/2017/05/08/turtlepi-spin.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rviz/turtlebot/2017/05/08/turtlepi-spin.html</guid>
        
        
        <category>RViz</category>
        
        <category>turtlebot</category>
        
      </item>
    
      <item>
        <title>Turtlepi #3: Getting RGB image to display in RViz using the Astra Pro Camera</title>
        <description>&lt;p&gt;Update: 2020/1/22&lt;/p&gt;

&lt;p&gt;If you find this article at all helpful, please consider a supportive contribution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/koraboblogqrcode.png&quot; alt=&quot;qrcode&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.korabo.io/checkout-page/5e2815526ca5270022f1a858/5e23f0b31b70f70022ff5933&quot;&gt;
  &lt;img src=&quot;https://img.shields.io/badge/Korabo-Contribute-blue&quot; alt=&quot;Contribute&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is somewhat specific to the Turtlebot 2 that is being shipped with the
Astra Pro by Clearpath Robotics.&lt;/p&gt;

&lt;p&gt;At least in my case, there were some obstacles in getting the
RGB camera and depth camera working simultaneously out of the box. This post is
an attempt to document a working solution.&lt;/p&gt;

&lt;p&gt;It appears that others have had similar issues
&lt;a href=&quot;http://answers.ros.org/question/246595/astra_launch-and-astra_camera-packages-problems-with-rgb/#260862&quot;&gt;fwiw&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;getting-the-launch-file&quot;&gt;Getting the Launch file&lt;/h3&gt;
&lt;p&gt;I will quickly outline the solution to get the Astra Pro working such that depth
and RGB image topics are recognized by RViz. (I specify the objective as getting
RViz to recognize the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/camera/rgb/image_raw&lt;/code&gt; topic, as this result was
necessary for the current ongoing project.)&lt;/p&gt;

&lt;p&gt;First get the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;astra_pro.launch&lt;/code&gt; file following the below steps:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkdir -p turtlebot_ws/src
cd turtlebot_ws/src
catkin_init_workspace
git clone https://github.com/tonybaltovski/ros_astra_launch.git --branch upstream
git clone https://github.com/tonybaltovski/ros_astra_camera.git --branch upstream
cd ..
rosdep install --from-paths src --ignore-src --rosdistro=indigo -y
catkin_make
source devel/setup.bash
rosrun astra_camera create_udev_rules
(you may need to reboot here)
roslaunch astra_launch astra_pro.launch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can investigate the launch file to get a better idea of what is being
called, and what parameters are being set.&lt;/p&gt;

&lt;p&gt;This will allow for the depth camera to function as well as the RGB camera.
After launching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roslaunch astra_launch astra_pro.launch&lt;/code&gt;, we can run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rostopic list&lt;/code&gt; and the list of
all available topics will be outputted to the terminal.&lt;/p&gt;

&lt;p&gt;We should be able to confirm that the below topics related to the depth camera and
RGB image camera are available as well.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/camera/depth/camera_info
/camera/depth/image
/camera/depth/image/compressed
/camera/depth/image/compressed/parameter_descriptions
/camera/depth/image/compressed/parameter_updates
/camera/depth/image_raw
…

/camera/rgb/astra_pro_uvc/parameter_descriptions
/camera/rgb/astra_pro_uvc/parameter_updates
/camera/rgb/camera_info
/camera/rgb/image_raw
/camera/rgb/image_raw/compressed
/camera/rgb/image_raw/compressed/parameter_descriptions
/camera/rgb/image_raw/compressed/parameter_updates
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At this point &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rosrun image_view image_view image:=/camera/rgb/image_raw&lt;/code&gt; works
but trying to subscribe to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/camera/rgb/image_raw&lt;/code&gt; topic in RViz will result in the
following status error message:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;CameraInfo/P resulted in an invalid position calculation (nans or infs)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This requires a relatively manual step that is most likely not obvious to
newcomers. To resolve this we need to:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Create a yaml file related to the RGB camera calibrations, which will be named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rgb_Astra_Orbbec.yaml&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Edit the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;astra_pro.launch&lt;/code&gt; to reflect the location of the yaml file.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;camera-calibration&quot;&gt;Camera calibration&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/astracamera.png&quot; alt=&quot;camera&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In order to generate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rgb_Astra_Orbbec.yaml&lt;/code&gt; file we can follow the tutorial
“How to Calibrate a Monocular Camera”.[1]  A couple points to note:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;They use a large checkerboard for calibration, with the square size set
to 108mm. You can use a smaller checkerboard, but make sure that you set the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--square&lt;/code&gt; argument to reflect the size that you use. A checkerboard printed on
A4 paper has roughly 2.5cm squares, based on my
measurements. Note that a checkerboard template is made available in the
tutorial.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;After calibrating, hit commit (it really does take a minute or so, and the
screen does appear to freeze temporarily). This should create a directory in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.ros&lt;/code&gt; called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;camera_info&lt;/code&gt;, if it doesn’t exist already, and place a file named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rgb_Astra_Orbbec.yaml&lt;/code&gt; inside it.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;edits-to-launch-file&quot;&gt;Edits to Launch file&lt;/h3&gt;
&lt;p&gt;Before making these amendments, launch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;astra_pro.launch&lt;/code&gt; and run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rostopic echo
/camera/rgb/camera_info&lt;/code&gt;. You should notice that the values are not populated
and filled with zeros.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;---
header:
  seq: 416
  stamp:
    secs: 1493810620
    nsecs:  89788527
  frame_id: camera_rgb_optical_frame
height: 0
width: 0
distortion_model: ''
D: []
K: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
R: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
P: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
binning_x: 0
binning_y: 0
roi:
  x_offset: 0
  y_offset: 0
  height: 0
  width: 0
  do_rectify: False
---
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Open the launch file and direct your attention to lines 19-23:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- By default, calibrations are stored to
file://${ROS_HOME}/camera_info/${NAME}.yaml,
where ${NAME} is of the form &quot;[rgb|depth]_[serial#]&quot;, e.g.
&quot;depth_B00367707227042B&quot;.
See camera_info_manager docs for calibration URL details. --&amp;gt;
&amp;lt;arg name=&quot;rgb_camera_info_url&quot; default=&quot;&quot; /&amp;gt;
&amp;lt;arg name=&quot;depth_camera_info_url&quot; default=&quot;&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Set the default argument for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rgb_camera_info_url&lt;/code&gt; to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file:///$HOME/.ros/camera_info/rgb_Astra_Orbbec.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now go to line 132, where you should see&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- &amp;lt;param name=&quot;camera_info_url&quot; value=&quot;file:///tmp/cam.yaml&quot;/&amp;gt; --&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Uncomment and replace with:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;param name=&quot;camera_info_url&quot; value=&quot;$(arg rgb_camera_info_url)&quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This should resolve the issue and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rostopic echo /camera/rgb/camera_info&lt;/code&gt;
should output the correct values derived from the calibration yaml file we
generated earlier.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;---
header:
  seq: 44
  stamp:
    secs: 1493810762
    nsecs: 374881266
  frame_id: camera_rgb_optical_frame
height: 480
width: 640
distortion_model: plumb_bob
D: [0.1924862242666659, -0.1428745350678355, -0.008005953314755045,
-0.01514558091529794, 0.0]
K: [631.9627446307516, 0.0, 292.3340769022051, 0.0, 626.7628503190605,
231.5643918983762, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [663.3839721679688, 0.0, 285.5615141729359, 0.0, 0.0, 662.3782348632812,
228.5056059693034, 0.0, 0.0, 0.0, 1.0, 0.0]
binning_x: 0
binning_y: 0
roi:
  x_offset: 0
  y_offset: 0
  height: 0
  width: 0
  do_rectify: False
---
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Further, subscribing to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/camera/rgb/image_raw&lt;/code&gt; should not output a status
error in RViz, and you should be able to visualize the RGB image properly.&lt;/p&gt;

&lt;h3 id=&quot;reference&quot;&gt;Reference&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Wed, 03 May 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rviz/turtlebot/2017/05/03/turtlebot-astrapro.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rviz/turtlebot/2017/05/03/turtlebot-astrapro.html</guid>
        
        
        <category>RViz</category>
        
        <category>turtlebot</category>
        
      </item>
    
      <item>
        <title>Turtlepi #2: Collecting data from simulation</title>
        <description>&lt;h3 id=&quot;intro&quot;&gt;Intro:&lt;/h3&gt;
&lt;p&gt;I am in the process of setting up an environment in Gazebo+RViz+ROS to collect
navigation related data of a Turtlebot. Given a target, I need to record the
data related to sensor input, controls, the relative target location, and the
current location of the Turtlebot within a given map.&lt;/p&gt;

&lt;h3 id=&quot;plan&quot;&gt;Plan:&lt;/h3&gt;
&lt;p&gt;The logic that I plan to implement initially is to send a generic target as a
goal, and record the relevant data until the task is complete, where completion
is defined as success, or some error where a reset is necessary.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SimpleActionClient&lt;/code&gt; presented here, &lt;a href=&quot;wiki.ros.org/actionlib&quot;&gt;actionlib&lt;/a&gt;,
is the relevant class and exposes the interface that I require for this exercise.
The action client publishes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base/goal&lt;/code&gt;, which provides a goal for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base&lt;/code&gt; to pursue in a given map.&lt;/p&gt;

&lt;p&gt;The message type associated with the topic is
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;move_base_msgs/MoveBaseActionGoal&lt;/code&gt;  which is defined as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
actionlib_msgs/GoalID goal_id
  time stamp
  string id
move_base_msgs/MoveBaseGoal goal
  geometry_msgs/PoseStamped target_pose
    std_msgs/Header header
      uint32 seq
      time stamp
      string frame_id
    geometry_msgs/Pose pose
      geometry_msgs/Point position
        float64 x
        float64 y
        float64 z
      geometry_msgs/Quaternion orientation
        float64 x
        float64 y
        float64 z
        float64 w
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;functions&quot;&gt;Functions&lt;/h3&gt;
&lt;p&gt;Many of the functions that will come in handy are declared
&lt;a href=&quot;https://github.com/ros/actionlib/blob/indigo-devel/include/actionlib/client/simple_action_client.h&quot;&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The functions that will be specifically used are as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;sendGoal()&lt;/li&gt;
  &lt;li&gt;waitForResult()&lt;/li&gt;
  &lt;li&gt;getState()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;with the declarations displayed below:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;void sendGoal(const Goal&amp;amp; goal,
    SimpleDoneCallback done_cb = SimpleDoneCallback(),
    SimpleActiveCallback active_cb = SimpleActiveCallback(),
    SimpleFeedbackCallback feedback_cb = SimpleFeedbackCallback());

bool waitForResult(const ros::Duration&amp;amp; timeout = ros::Duration(0,0) );

ResultConstPtr getResult() const;

SimpleClientGoalState getState() const;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;psuedo-code&quot;&gt;Pseudo-code&lt;/h3&gt;
&lt;p&gt;The logic can be quickly shown with pseudo-code:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;while ( true )
    sendGoal()
    while ( waitForResult() )
        collect the data mentioned previously and store it
    end while
    getState() → log whether the goal succeeded or not
end while
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
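&lt;p&gt;The loop above can be exercised with a small mock of the SimpleActionClient interface (pure Python, hypothetical class and method names chosen to mirror the C++ API; the real client comes from actionlib):&lt;/p&gt;

```python
class MockActionClient:
    """Mimics the subset of the SimpleActionClient interface used above."""

    def send_goal(self, goal):
        self._goal = goal
        self._state = "ACTIVE"

    def wait_for_result(self):
        # The real call blocks until move_base finishes; here we finish at once.
        self._state = "SUCCEEDED"
        return True

    def get_state(self):
        return self._state

client = MockActionClient()
log = []
for target in [(1.0, 2.0), (3.0, 4.0)]:       # generic targets sent as goals
    client.send_goal({"target_pose": target})
    if client.wait_for_result():              # data collection would happen while waiting
        log.append(client.get_state())        # record success or failure
# log == ["SUCCEEDED", "SUCCEEDED"]
```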

&lt;h3 id=&quot;remarks&quot;&gt;Remarks&lt;/h3&gt;
&lt;p&gt;Will post on the results once completed.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;https://github.com/ros/actionlib/blob/indigo-devel/include/actionlib/client/simple_action_client.h&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/move_base&lt;/li&gt;
  &lt;li&gt;http://wiki.ros.org/navigation/Tutorials/SendingSimpleGoals&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Fri, 28 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/gazebo/turtlebot/2017/04/28/rviz-gazebo-2.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/gazebo/turtlebot/2017/04/28/rviz-gazebo-2.html</guid>
        
        
        <category>gazebo</category>
        
        <category>turtlebot</category>
        
      </item>
    
      <item>
        <title>Turtlepi #1: RViz + Gazebo-Turtlebot localization in simulation</title>
        <description>&lt;h3 id=&quot;intro&quot;&gt;Intro&lt;/h3&gt;
&lt;p&gt;Posting on some results related to working with Gazebo + RViz. The task at hand
was importing a new world model into gazebo, building a map, and using
localization and navigation packages to allow for the Turtlebot to navigate to a
given target. I am currently working on a larger project, as part of my work at
&lt;a href=&quot;http://idein.jp/&quot;&gt;Idein Inc.&lt;/a&gt;  which requires data
generated from simulation, thus my venture into Gazebo + RViz.&lt;/p&gt;

&lt;p&gt;The general feeling I have is that RViz + Gazebo is not so friendly to a
newcomer, and requires a decent amount of head banging to get things to work. That
said, since we are dealing with a lot of moving parts, the complexity of the
system and the API should be somewhat expected. To consider that this is the
case after a lot has already been abstracted is quite impressive.&lt;/p&gt;

&lt;p&gt;source:
&lt;a href=&quot;https://github.com/surfertas/turtlepi&quot;&gt;https://github.com/surfertas/turtlepi&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;map-generation&quot;&gt;Map Generation&lt;/h3&gt;
&lt;p&gt;Gazebo has a &lt;a href=&quot;http://gazebosim.org/tutorials?cat=build_world&amp;amp;tut=building_editor&quot;&gt;build
editor&lt;/a&gt;
which allows one to create a “world”. This will clearly come in use at some
point later, but considering the time constraints, instead of creating a new
world model, I relied on models created by erlerobot found
&lt;a href=&quot;https://github.com/erlerobot/gym-gazebo/tree/master/gym_gazebo/envs/assets/models/Circuit_ql_2&quot;&gt;here&lt;/a&gt;.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;model.config&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;model.sdf&lt;/code&gt; were placed into a directory called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;circuit2&lt;/code&gt;, and
this was placed in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.gazebo/models&lt;/code&gt;.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;circuit2.world&lt;/code&gt; file was placed in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/worlds&lt;/code&gt; directory under a ros package that I created called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;turtlepi_gazebo&lt;/code&gt;, which
allows for easier access by other programs.&lt;/p&gt;

&lt;p&gt;Now that we have a world environment we can generate a map.
&lt;a href=&quot;&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_map.launch&lt;/code&gt;&lt;/a&gt;, was constructed to handle the launch of files related to teleop, gmapping, and RViz.&lt;/p&gt;

&lt;p&gt;After a painstakingly manual task of teleoping the turtlebot around, we get two
outputs, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.pgm&lt;/code&gt; image, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.yaml&lt;/code&gt; file referencing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.pgm&lt;/code&gt; image. In
terms of steps:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ roslaunch generate_map.launch 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Use the teleop keypad to move around the map. You can confirm the construction
of the map in RViz which should have loaded as a result of roslaunch.&lt;/p&gt;

&lt;p&gt;Once you are satisfied with the constructed map, run map_server.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ rosrun map_server map_saver -f &amp;lt;file_name_to_save_to&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You should now see the previously mentioned files generated.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/turtlepimap.png&quot; alt=&quot;Map&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;localization&quot;&gt;Localization&lt;/h3&gt;
&lt;p&gt;This is really all we need to see the turtlebot in action, localizing and navigating
its way based on the generated map and given target.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ roslaunch turtlepi_localize.launch map_file:=&amp;lt;location_of_.yaml_file&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Your mileage may vary, but it took a while for the map to load and set up; in
the meantime ROS streamed the following warning:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[ WARN] …:Timed out waiting for transform from base_footprint to map to become
available before running costmap, tf error:…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Setting Global Options -&amp;gt; Fixed Frame from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_link&lt;/code&gt; resolved the
issue. (Though flipping it back afterwards seemed to make localization and
navigation work better.)&lt;/p&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/turtlepirviz.png&quot; alt=&quot;rviz&quot; /&gt;
&lt;img src=&quot;/static/img/posts/turtlepigazebo.png&quot; alt=&quot;gazebo&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;thoughts&quot;&gt;Thoughts&lt;/h3&gt;
&lt;p&gt;I am just happy I got this working, but there are clearly a lot of roadblocks ahead.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;http://gazebosim.org/tutorials?cat=build_world&amp;amp;tut=building_editor&lt;/li&gt;
  &lt;li&gt;http://learn.turtlebot.com/2015/02/03/8/&lt;/li&gt;
  &lt;li&gt;https://github.com/erlerobot/gym-gazebo/blob/master/gym_gazebo/envs/assets/worlds/circuit2.world&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Thu, 27 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/gazebo/turtlebot/2017/04/27/rviz-gazebo.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/gazebo/turtlebot/2017/04/27/rviz-gazebo.html</guid>
        
        
        <category>gazebo</category>
        
        <category>turtlebot</category>
        
      </item>
    
      <item>
        <title>IMDB-WIKI: trying a small model for age classification</title>
        <description>&lt;p&gt;Update: 2020/1/22&lt;/p&gt;

&lt;p&gt;If you find this article at all helpful, please consider a supportive contribution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/koraboblogqrcode.png&quot; alt=&quot;qrcode&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.korabo.io/checkout-page/5e2815526ca5270022f1a858/5e23f0b31b70f70022ff5933&quot;&gt;
  &lt;img src=&quot;https://img.shields.io/badge/Korabo-Contribute-blue&quot; alt=&quot;Contribute&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;In the paper &lt;a href=&quot;https://www.vision.ee.ethz.ch/en/publications/papers/proceedings/eth_biwi_01229.pdf&quot;&gt;“DEX: Deep EXpectation of apparent age from a single
image”&lt;/a&gt;,
the authors display remarkable results in classifying the age of an
individual from a single image. The results were obtained using an
ensemble of convolutional neural networks: a VGG-16 architecture
pre-trained on ImageNet was fine-tuned on the IMDB-WIKI data set, and
then trained on the ChaLearn LAP data set. The data set is split into 20
subsets; though the split was random, the authors were careful to
maintain the distribution of the original data set in each subset. This
results in 20 trained models, where the final prediction is the average of
the ensemble of 20 networks. See the paper for the exact details, as this is just a high-level
summary.&lt;/p&gt;

&lt;p&gt;The data set is large enough that memory management is
non-trivial. The IMDB data set is 269GB, while the face-crop version is 7GB. Pre-processing
this data set, let alone training on it, is beyond the scope/capacity of an individual
with limited resources. The challenge for this particular exercise is to see the results of using a
small subset of the original data with a lower-capacity architecture.&lt;/p&gt;

&lt;p&gt;For some perspective, the architecture proposed by the authors took 5 days to train on the
entire IMDB + WIKI data set.&lt;/p&gt;

&lt;p&gt;source: &lt;a href=&quot;https://github.com/surfertas/deep_learning/tree/master/projects&quot;&gt;https://github.com/surfertas/deep_learning/tree/master/projects&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;UPDATES:
Apr/07/2018: &lt;a href=&quot;http://surfertas.github.io/imdb/deeplearning/machinelearning/2018/04/07/imdb-update.html&quot;&gt;IMDB-WIKI: notes on refactoring data preprocess pipeline&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;data-faces-only-7gb&quot;&gt;Data: faces only (7GB)&lt;/h3&gt;
&lt;p&gt;The data is relatively “raw”, as the provided data (at least the face-crop
version) is inconsistent in dimensions and color channels. The authors
suggest this is because the data was collected using a web-crawler. To
give an example, the dimensions of a few samples from the training set are
listed below: some are colored and some are gray scale, and
the sizes are not consistent. The formatting steps taken here were to convert all
images to gray scale and resize them to \(128\times128\). A simple
python script can handle this processing.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/imdbwikidatadim.png&quot; alt=&quot;data&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That was just handling the images. Now we need the labels, which again are not
easily obtained. The meta data is stored separately, and unfortunately in a .mat file. (Yes,
Matlab.) The meta information stored is as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;dob: date of birth (Matlab serial date number)&lt;/li&gt;
  &lt;li&gt;photo_taken: year when the photo was taken&lt;/li&gt;
  &lt;li&gt;full_path: path to file&lt;/li&gt;
  &lt;li&gt;gender: 0 for female and 1 for male, NaN if unknown&lt;/li&gt;
  &lt;li&gt;name: name of the celebrity&lt;/li&gt;
  &lt;li&gt;face_location: location of the face. To crop the face in Matlab run
img(face_location(2):face_location(4),face_location(1):face_location(3),:))&lt;/li&gt;
  &lt;li&gt;face_score: detector score (the higher the better). Inf implies that no face was
found in the image and the face_location then just returns the entire image&lt;/li&gt;
  &lt;li&gt;second_face_score: detector score of the face with the second highest score.
This is useful to ignore images with more than one face. second_face_score is
NaN if no second face was detected.&lt;/li&gt;
  &lt;li&gt;celeb_names (IMDB only): list of all celebrity names&lt;/li&gt;
  &lt;li&gt;celeb_id (IMDB only): index of celebrity name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The label that we require for training is the age parameter, which is not
stored as meta information, and requires some calculation. The age value can be
obtained by taking the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;photo_taken&lt;/code&gt; and subtracting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dob&lt;/code&gt;, the date of birth. Sounds easy? Not quite, as the
dob is stored as a Matlab serial number.&lt;/p&gt;

&lt;p&gt;Luckily we can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scipy.io.loadmat&lt;/code&gt; to load the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.mat&lt;/code&gt; file to a python
consumable (kind of) format. We can access the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dob&lt;/code&gt; by some proper indexing,
and convert the Matlab serial number to a usable format by using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetime.date.fromordinal(serial_number).year&lt;/code&gt;. Now that we have the dates
in a consistent format, extracting the age is trivial.&lt;/p&gt;
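&lt;p&gt;To make the conversion concrete, here is a small sketch (the helper names are mine). One wrinkle worth noting: Matlab serial day 1 is January 1 of year 0, while Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date.fromordinal(1)&lt;/code&gt; is January 1 of year 1, so the usual recipe subtracts a 366-day offset:&lt;/p&gt;

```python
from datetime import date

def matlab_datenum_to_date(datenum):
    """Convert a Matlab serial date number to a Python date.

    Matlab counts days from January 1, year 0; Python ordinals start
    at January 1, year 1, which is 366 days later.
    """
    return date.fromordinal(int(datenum) - 366)

def estimated_age(dob_datenum, photo_taken):
    # photo_taken is stored in the metadata as a plain year
    return photo_taken - matlab_datenum_to_date(dob_datenum).year
```

&lt;p&gt;For coarse age labels, the simpler &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fromordinal(serial_number).year&lt;/code&gt; is off by roughly a year, which hardly matters in practice.&lt;/p&gt;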

&lt;p&gt;Now the inputs, \(128\times128\) gray scale images, and labels, estimated age, can be
packaged together and dumped to a pickle file. The script can be found
&lt;a href=&quot;https://github.com/surfertas/deep_learning/blob/master/projects/imdbwiki-challenge/imdb_preprocess.py&quot;&gt;here&lt;/a&gt;.
The script allows the user to specify the size of the training set, by setting
the parameter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--partial&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;training&quot;&gt;Training&lt;/h3&gt;
&lt;p&gt;Now we are ready for training. Keep in mind that we are using only a subset of the data and
an experimental self-designed model with much smaller
capacity. Further, the model was not pre-trained on ImageNet, so we need to
adjust accordingly to increase the odds of success.&lt;/p&gt;

&lt;p&gt;The original paper uses \(101\) age classes, which was appropriate for the
data set size and learning architecture used. As we are only using a subset of
the data and a very simple model, the number of possible classes was reduced to 4, {Young,
Middle, Old, Very Old}, with the bucketing defined by Young \((age &amp;lt; 30)\),
Middle \((30 \leq age &amp;lt; 45)\), Old \((45 \leq age &amp;lt; 60)\), and Very Old
\((60 \leq age)\). The
distribution of the data is concentrated around \(30-60\); see the below image for
the distribution of the ages in each data set. [2] The images were preprocessed
using ZCA whitening.&lt;/p&gt;
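&lt;p&gt;The bucketing above can be written as a small helper (the class indices are my own choice for illustration):&lt;/p&gt;

```python
def age_to_class(age):
    """Map an age in years to {0: Young, 1: Middle, 2: Old, 3: Very Old}."""
    if age >= 60:
        return 3  # Very Old
    if age >= 45:
        return 2  # Old
    if age >= 30:
        return 1  # Middle
    return 0      # Young
```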

&lt;p&gt;&lt;img src=&quot;/static/img/posts/imdbwikiage.png&quot; alt=&quot;AgeDist&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The model used for this exercise is shown below (the Chainer framework was used for the
implementation). The model consists of two convolution layers, with
ReLU activation, batch normalization, max-pooling, and dropout applied, followed by two
fully connected layers; essentially every form of regularization that might help reduce training time and
the risk of overfitting was used, to compensate for the limited data and capacity of the
model.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chainer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Chain&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Convolution2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;bn1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BatchNormalization&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Convolution2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;bn2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BatchNormalization&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;fc3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;625&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;fc4&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;625&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__call__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bn1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_pooling_2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bn2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_pooling_2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;
&lt;p&gt;The first 10,000 images were used, with the distribution of the data as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;YOUNG: 0.1381&lt;/li&gt;
  &lt;li&gt;MIDDLE: 0.4439&lt;/li&gt;
  &lt;li&gt;OLD: 0.3386&lt;/li&gt;
  &lt;li&gt;VERY OLD: 0.0794&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/imdbacc.png&quot; alt=&quot;accuracy&quot; /&gt;
&lt;img src=&quot;/static/img/posts/imdbloss.png&quot; alt=&quot;loss&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Well, as one would expect, the results are quite poor on this first pass. The model
overfit the training set, while the test accuracy shows essentially no
improvement and seemingly can do little better than random guessing.
Realistically this was fully expected, as
the capacity of the model and the size of the data set were extremely constrained.
The lack of pre-training on ImageNet likely had an impact as
well. The exercise was really just that, an exercise, and a good lesson in how data
is not always in consumable form, and in the general
infrastructure necessary to properly train and learn on data of any meaningful
size.&lt;/p&gt;

&lt;p&gt;That said, it is likely too early to form a conclusion. Will make further attempts
with different architectures.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;p&gt;[1] Rothe, R., Radu, T., Van Gool, L.  DEX: Deep EXpectation of apparent age
from a single image.
[2] https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/&lt;/p&gt;
</description>
        <pubDate>Tue, 18 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2017/04/18/imdbwiki.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2017/04/18/imdbwiki.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Denoising Autoencoder</title>
        <description>&lt;h3 id=&quot;dae-and-chainer&quot;&gt;DAE and Chainer&lt;/h3&gt;
&lt;p&gt;Getting up to speed with Chainer has been quite rewarding, as I am finding the
framework quite intuitive and its source code user friendly;
any roadblocks can be smoothly resolved with a bit of source code mining.
I have found porting implementations from one framework to another an
efficient way to learn a new tool, thus I have been working to port a TensorFlow
tutorial to a Chainer tutorial. The latest addition is a Denoising Auto-Encoder.&lt;/p&gt;

&lt;p&gt;source: &lt;a href=&quot;https://github.com/surfertas/chainer-tutorials/blob/master/06_autoencoder.py&quot;&gt;06_autoencoder.py&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;auto-encoder-auto-associator-diabolo-network&quot;&gt;Auto-Encoder (Auto-associator, Diabolo Network)&lt;/h3&gt;
&lt;p&gt;Just to provide context, an Auto-Encoder is an unsupervised neural network, and
in the kindly simplified wording by Mr. Y. Bengio, the neural network is trained
to encode the input \(x\), into some representation \(c(x)\) so that the
input can be reconstructed from that representation. Hence the target of the
neural network is the input itself. [1]&lt;/p&gt;

&lt;p&gt;Literature commonly introduces the concept of the encoder, which maps the inputs
to a hidden layer, typically of smaller dimension than the inputs, to create a
“bottleneck”. (If the hidden layer is linear and the MSE criterion is used for training,
then the units of the hidden layer can be associated with the principal
components under PCA. Simply put, the model finds the best representation of the
inputs constrained to the number of hidden units.)&lt;/p&gt;

&lt;p&gt;The deeplearning.net tutorial [2], quickly introduces the key concepts necessary
to understand the tutorial. The encoder is defined by a deterministic mapping,
\[y =\sigma(Wx + b)\]&lt;/p&gt;

&lt;p&gt;The latent representation \(y\), or code is then mapped back (with a decoder)
into a reconstruction \(z\) of the same shape as \(x\). The mapping happens
through a similar transformation, \[z = \sigma(W'y + b')\]&lt;/p&gt;

&lt;h3 id=&quot;tied-weights&quot;&gt;Tied Weights&lt;/h3&gt;
&lt;p&gt;Further, the tutorial implementation uses tied weights, where the weight matrix
\(W'\) of the decoder is related to the weight matrix \(W\) of the
encoder by \[W' = W^T\] The model is trained to optimize the parameters \(W, b,
b'\) to minimize the average reconstruction error.&lt;/p&gt;

&lt;h3 id=&quot;denoising&quot;&gt;Denoising&lt;/h3&gt;
&lt;p&gt;The denoising introduces stochasticity by “corrupting” the input in a
probabilistic manner. This amounts to adding noise to the input to try to
confuse the model, the idea being to produce a more robust model capable of
reconstruction. The tutorial implementation uses a corruption-level parameter
that adjusts the amount of noise and is fed into the binomial distribution used
for sampling.&lt;/p&gt;
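&lt;p&gt;The corruption and the tied-weight encode/decode pass can be sketched framework-agnostically in NumPy (untrained random weights; the shapes and names are illustrative):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, corruption_level):
    # Zero each input unit with probability `corruption_level`,
    # sampling the mask from a binomial distribution.
    mask = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
    return x * mask

def dae_forward(x, W, b, b_prime, corruption_level=0.3):
    x_tilde = corrupt(x, corruption_level)
    y = sigmoid(x_tilde @ W + b)     # encoder: y = sigma(W x + b)
    z = sigmoid(y @ W.T + b_prime)   # decoder with tied weights: W' = W^T
    return z
```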

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;
&lt;p&gt;After 1 epoch, the model has yet to learn the reconstruction of the MNIST
digits, while after 100 epochs we can observe that the model has learned to
reconstruct the inputs relatively well.&lt;/p&gt;

&lt;div style=&quot;text-align=center&quot;&gt;
&lt;img src=&quot;/static/img/posts/epoch1dae.png&quot; width=&quot;350&quot; height=&quot;350&quot; /&gt;
&lt;img src=&quot;/static/img/posts/epoch100dae.png&quot; width=&quot;350&quot; height=&quot;350&quot; /&gt;
&lt;/div&gt;

&lt;h3 id=&quot;reference&quot;&gt;Reference&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Bengio,Y., Learning deep architectures for AI, Foundations and Trends in
Machine Learning 1(2) pages 1-127.&lt;/li&gt;
  &lt;li&gt;Denoising Autoencoders(dA), http://deeplearning.net/tutorial/dA.html&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 12 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2017/04/12/autoencoder.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2017/04/12/autoencoder.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Creating your own robot: GAZEBO</title>
        <description>&lt;h3 id=&quot;intro&quot;&gt;Intro&lt;/h3&gt;
&lt;p&gt;The current project I am working on will require data generation, collection,
and processing of events that occur within a simulated environment, so I plan
to use Gazebo + ROS heavily. The base robot model will be a Turtlebot 2, though
the model definition will require some modifications; this means editing the
robot model defined in the .sdf file (Simulation Description Format).&lt;/p&gt;

&lt;h3 id=&quot;model-database-structure&quot;&gt;Model Database Structure&lt;/h3&gt;
&lt;p&gt;Gazebo has a nice step-by-step intro to building your own robot, which I will
quickly introduce here. Before getting started with the construction, a brief
detour (well worth the read
&lt;a href=&quot;http://gazebosim.org/tutorials?tut=model_structure&amp;amp;cat=build_robot&quot;&gt;here&lt;/a&gt;)
discusses the best practices for structuring the file directory for
robot models.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/robotmodeldir.png&quot; alt=&quot;modeldir&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Further, the writers of the tutorial have kindly provided a
&lt;a href=&quot;https://bitbucket.org/osrf/gazebo_models/src/edd450e39e1f/turtlebot/?at=default&quot;&gt;link&lt;/a&gt;
to a repository with many predefined models, which is of great use to those
just getting involved. Each model is, to a certain extent, a self-contained
tutorial.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/turtlemodeldir.png&quot; alt=&quot;turtledir&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;robot-construction&quot;&gt;Robot Construction&lt;/h3&gt;
&lt;p&gt;In terms of creating the robot model, the tutorial is self contained, and very
straightforward. By the end of the tutorial, you should have a model of a robot
with 2 wheels + 1 caster. A couple of concepts to direct attention:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Keep &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;static&amp;gt;True&amp;lt;/static&amp;gt;&lt;/code&gt; while building out the robot, as this will allow
the robot to be ignored by the physics engine, and will be easier to visualize
the intermediate result, instead of having parts falling apart having succumbed
to gravitational forces.&lt;/li&gt;
  &lt;li&gt;With respect to the orientation of the axes, the right-hand rule is
followed: the robot’s front-facing surface points in the positive x direction,
the left-facing surface in the positive y direction, and the top-facing surface
in the positive z direction.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;recap&quot;&gt;Recap&lt;/h3&gt;
&lt;p&gt;Here is a snapshot of the final result after messing around with the file. I
also copy/pasted the turtlebot model and loaded a couple into the Gazebo
environment. The takeaway thus far is that building out a personal robot for
the coming experimentation will be a relatively straightforward process.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/finalturtlegazebo.png&quot; alt=&quot;finalturtlegazebo&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Make a Mobile Robot.
&lt;a href=&quot;http://gazebosim.org/tutorials?tut=build_robot&amp;amp;cat=build_robot&quot;&gt;http://gazebosim.org/tutorials?tut=build_robot&amp;amp;cat=build_robot&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Mon, 10 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/04/10/gazebo-robot.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/04/10/gazebo-robot.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Y.A.DQN.P: Yet another DQN Post</title>
        <description>&lt;h3 id=&quot;intro&quot;&gt;Intro:&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;http://rll.berkeley.edu/deeprlcourse/docs/hw3.pdf&quot;&gt;Assignment
#3&lt;/a&gt;[1] from Berkeley’s DRL course is another great exercise, pushing students
to implement (in part) and train the DQN algorithm presented in 2013 by Mnih et
al. in the paper “Playing Atari with Deep Reinforcement Learning”.[2] The
implementation uses off-policy training with experience replay, epsilon-greedy
exploration, and an additional target network for stabilization (introduced in
later papers). The relevant lectures by John Schulman can be found at
&lt;a href=&quot;https://youtu.be/h1-pj4Y9-kM?list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX&quot;&gt;CS294-112 2/15/2017 and CS294
2/22/2017&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;source: &lt;a href=&quot;https://github.com/surfertas/rldm/blob/master/drl-berkeley-cs294/dqn/dqn.py&quot;&gt;DQN&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;review&quot;&gt;Review:&lt;/h3&gt;
&lt;p&gt;A considerable amount of the code was provided (kudos to Szymon Sidor from
OpenAI), and the student is left to implement a few key portions in TensorFlow.
Specifically, the student implements the update of the replay buffer, sampling
from the replay buffer, the definition of the error value used for
optimization, the target network updates, and the set-up of the training step
for the deep convolutional neural network used to approximate the Q-function,
which estimates the value of taking a given action in a given state.&lt;/p&gt;
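&lt;p&gt;As a concrete reference, the replay-buffer mechanics described above can be sketched in a few lines of Python (names here are illustrative, not the assignment’s actual API):&lt;/p&gt;

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (obs, action, reward, next_obs, done) tuples."""

    def __init__(self, capacity):
        # deque silently evicts the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation of online updates
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))  # regroup into (obs..., actions..., ...)
```

&lt;p&gt;The fixed capacity plus uniform sampling is what gives replay memory its stabilizing effect: each minibatch mixes old and new experience.&lt;/p&gt;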

&lt;p&gt;The implementation was more of an exercise in getting a handle on the starter
code and understanding the API of the relevant objects in the system. As a
result, one should come away with an increased understanding of TensorFlow (and
NumPy + Python, to say the least), in addition to a greater appreciation for
the abstraction that Keras and other frameworks provide.&lt;/p&gt;

&lt;p&gt;Further, knowing that training the model would require 1 million+ steps, GPU
acceleration is pretty much a must for fast experimentation. GPU access was not
provided, and is left to the student to source. (This may be different for
enrolled Berkeley students.) I decided on the AMI provided by bitfusion.io,
“&lt;a href=&quot;https://aws.amazon.com/marketplace/pp/B01EYKBEQ0?qid=1491136276858&amp;amp;sr=0-1&amp;amp;ref_=srh_res_product_title&quot;&gt;Bitfusion Ubuntu 14
Tensorflow&lt;/a&gt;”,
where a g2.2xlarge costs $0.715/hr (including software costs of $0.065).
Wanting to avoid paying the $0.065 software cost, I first attempted a
supposedly cheaper route, starting with a bare-bones Ubuntu 14.04 AMI and
building the necessary software myself. I failed miserably and crawled back to
bitfusion.io. (Lesson here: don’t be a “@#@$ for a tick”.)&lt;/p&gt;

&lt;p&gt;Thank you &lt;a href=&quot;http://www.bitfusion.io/&quot;&gt;bitfusion.io&lt;/a&gt;! The AMI is decked out…&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Ubuntu 14 AMI pre-installed with Nvidia Drivers, Cuda 7.5 Toolkit, cuDNN 5.1,
TensorFlow 1.0.0, TFLearn, TensorFlow Serving, TensorFlow TensorBoard, Keras,
Magenta, scikit-learn, Python 2 &amp;amp; 3 support, Hyperas, PyCuda, Pandas, NumPy,
SciPy, Matplotlib, h5py, Enum34, SymPy, OpenCV and Jupyter to leverage Nvidia
GPU as well as CPU instances.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After gaining access to the instance over ssh, all
that was left was a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo -H pip install gym[atari]&lt;/code&gt; and copying the relevant
files over. In hindsight, I really wish I had started with this. (For what it’s
worth, I am in no way associated with bitfusion.io, nor Berkeley for that matter.)&lt;/p&gt;

&lt;p&gt;Once the required code was implemented and the GPU access set up appropriately,
the rest was just about hitting the start button to initiate training.&lt;/p&gt;

&lt;h3 id=&quot;dqn-algorithm&quot;&gt;DQN algorithm:&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/dqnalgo.png&quot; alt=&quot;DQN&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;cnn-model&quot;&gt;CNN model:&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/dqnmodel.png&quot; alt=&quot;DQNCNN&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;key-pointsconcepts&quot;&gt;Key Points/Concepts:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;The algorithm is a hybrid of online and batch Q-value iteration,
interleaving optimization with data collection.&lt;/li&gt;
  &lt;li&gt;Replay memory increases data efficiency and stability, propagates rewards
beyond one step, and addresses the i.i.d. breakdown associated with purely
online updates.&lt;/li&gt;
  &lt;li&gt;The target network is held fixed over timesteps (~10k steps) to
approximate \(Q(s_{t+1},a_{t+1})\). John comments that this allows \(Q\) to
chase a relatively non-moving target (\(Q\) chases \(TQ\)).
Another way to think about this in my own words:
\(r + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})\) has one time step of real-world
data derived from \(r\), which \(Q(s_t,a_t)\) lacks, and so is a better estimate
of the expected return under a given policy from state \(s_t\) after taking
action \(a_t\). We want \(Q(s_t,a_t)\) to better approximate the expected
return, or said differently, to catch up to
\(r + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})\); and as in any situation, a
slow-moving target is easier to catch than a fast-moving one.&lt;/li&gt;
&lt;/ol&gt;
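&lt;p&gt;Point 3 in code form: the target values are computed from a frozen copy of the network, which is only re-synced every ~10k steps. A minimal numpy sketch (the “network” below is a stand-in lookup table, purely illustrative):&lt;/p&gt;

```python
import numpy as np

def q_targets(rewards, next_obs, dones, q_target_net, gamma=0.99):
    """y = r + gamma * max_a Q_target(s', a), with bootstrapping
    switched off on terminal transitions."""
    next_q = q_target_net(next_obs)        # shape: (batch, n_actions)
    max_next_q = next_q.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q

# Stand-in "network": a lookup table over two states.
q_table = np.array([[1.0, 3.0],
                    [2.0, 0.0]])
target_net = lambda obs: q_table[obs]

y = q_targets(np.array([1.0, 1.0]), np.array([0, 1]),
              np.array([0.0, 1.0]), target_net)
```

&lt;p&gt;Here the terminal transition contributes only its reward, while the non-terminal one bootstraps off the frozen network.&lt;/p&gt;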

&lt;h3 id=&quot;results&quot;&gt;Results:&lt;/h3&gt;
&lt;p&gt;Ideally one would test different parameter settings, but considering
personal time constraints, I left the parameters at their default values.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Replay buffer size: 1m&lt;/li&gt;
  &lt;li&gt;Exploration: epsilon linear decay, floor at 0.1&lt;/li&gt;
  &lt;li&gt;Target network: 10k step updates&lt;/li&gt;
  &lt;li&gt;Gamma: 0.99&lt;/li&gt;
  &lt;li&gt;Batch size: 32&lt;/li&gt;
  &lt;li&gt;Frame history length: 4&lt;/li&gt;
  &lt;li&gt;Gradient norm clipping: 10&lt;/li&gt;
&lt;/ul&gt;
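&lt;p&gt;The exploration schedule above is just a linear interpolation with a floor; a one-function sketch (the 1M-step decay horizon is my assumption, not a value from the assignment):&lt;/p&gt;

```python
def epsilon(step, start=1.0, floor=0.1, decay_steps=1_000_000):
    # linearly anneal from `start` down to `floor`, then hold at the floor
    frac = min(step / decay_steps, 1.0)
    return start + frac * (floor - start)
```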

&lt;p&gt;The default settings were sufficient to produce acceptable results, as we see
the AI agent is able to start scoring against the opponent.
&lt;img src=&quot;/static/img/posts/dqncnn.png&quot; alt=&quot;DQNCNN&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The following video was taken after the agent had started scoring
consistently on average.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/kyPaDo0XlwE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;h3 id=&quot;modifications&quot;&gt;Modifications:&lt;/h3&gt;
&lt;p&gt;A few suggestions were made based on subsequently published papers that have
been shown to speed up training and improve general performance. (Note: not implemented.)&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Double DQN:[3] use \(r + \gamma Q^{target}(s’,\mathrm{argmax}_{a’} Q(s’,a’))\)
for the target Q-value.&lt;/li&gt;
  &lt;li&gt;Dueling nets:[4] parameterize \(Q\) as \(Q_\theta(s,a) = V_{\theta}(s) +
F_{\theta}(s,a) - mean_{a’}F_{\theta}(s,a’)\)&lt;/li&gt;
  &lt;li&gt;Prioritized Experience:[5] use the last Bellman error to weight the
sampling, with either proportional, \(p_i =|\delta_i| + \epsilon\),
or rank-based, \(p_i = \frac{1}{rank_i}\), weighting.&lt;/li&gt;
&lt;/ol&gt;
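&lt;p&gt;To make modification 1 concrete: Double DQN keeps the vanilla target except that the online network selects the action while the target network evaluates it, which reduces the max-operator’s overestimation bias. A numpy sketch (both “networks” are illustrative stand-ins):&lt;/p&gt;

```python
import numpy as np

def double_dqn_targets(rewards, next_obs, dones,
                       q_online, q_target, gamma=0.99):
    # the online net picks the greedy action...
    best_a = q_online(next_obs).argmax(axis=1)
    # ...and the frozen target net scores that action
    rows = np.arange(len(best_a))
    eval_q = q_target(next_obs)[rows, best_a]
    return rewards + gamma * (1.0 - dones) * eval_q
```

&lt;p&gt;With disagreeing networks the effect is visible: if the online net prefers an action that the target net values modestly, the target uses the modest estimate rather than the target net’s own (possibly inflated) maximum.&lt;/p&gt;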

&lt;h3 id=&quot;final-thoughts&quot;&gt;Final Thoughts:&lt;/h3&gt;
&lt;p&gt;With relatively cheap access to GPU acceleration, very accessible starter code,
and great insight from publicly available lectures, anyone can implement a
history-making algorithm, given the time and patience. This is great, and I hope
that more first-class institutions continue to share such high-quality material
with the general public.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Deep RL Assignment 3: Q-Learning on Atari.
&lt;a href=&quot;http://rll.berkeley.edu/deeprlcourse/docs/hw3.pdf&quot;&gt;http://rll.berkeley.edu/deeprlcourse/docs/hw3.pdf&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou,
Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing atari with deep
reinforcement learning. arXiv:1312.5602, 2013.&lt;/li&gt;
  &lt;li&gt;Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with
double Q-learning. arXiv:1509.06461, 2015.&lt;/li&gt;
  &lt;li&gt;Wang, Z., de Freitas, N., and Lanctot, M. Dueling network architectures for
deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.&lt;/li&gt;
  &lt;li&gt;Schaul, T., Quan, J., Antonoglou, I., and Silver, D. Prioritized experience
replay. arXiv preprint arXiv:1511.05952, 2015.&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sun, 02 Apr 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/04/02/DQN.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/04/02/DQN.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Behavioral Cloning</title>
<description>&lt;p&gt;This post is a quick summary of findings from working on assignment 1 from &lt;a href=&quot;http://rll.berkeley.edu/deeprlcourse/#lecture-videos&quot;&gt;CS
294: Deep Reinforcement
Learning&lt;/a&gt;, Spring 2017,
open sourced by Berkeley. [1]
Reviewing the material, I have to say the quality and depth are overwhelming
and, despite being one sample, attest to the education provided by Berkeley and
its professors. I envy the
students who have the opportunity to engage and participate live in a course
structured by Professor Levine and team. Making this lecture accessible is a
great service, and I can only share my appreciation through a simple virtual
“Thank you”.&lt;/p&gt;

&lt;p&gt;source code: &lt;a href=&quot;https://github.com/surfertas/rldm/tree/master/drl-berkeley-cs294/behavioral-clone&quot;&gt;behavioral
cloning&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;problem-definition&quot;&gt;Problem Definition&lt;/h3&gt;
&lt;p&gt;In part 1 of the assignment, we are tasked with finding a policy for a selection
of agents, with each agent having its own observation space,
\(o \in R^{n_{o}}\), and action space, \(a \in R^{n_{a}}\). The
policy in the deterministic case is defined as \(a =
\pi(o)\), and in the
stochastic case is a distribution over actions conditioned on the observation,
\(\pi(a | o)\). We want to learn the policy that results
in the largest return on evaluation, defined here as \(\sum_{t=0}^n r_t\),
which is difficult in practice. The lecture introduces the concept of
behavioral cloning, which addresses the question: given examples derived from
an expert, can we learn a sufficient policy? Note that I don’t describe the
policy as optimal, as the learned policy is only as good as its input, in other
words the so-called expert in this case. The expert can be formulated in many
ways, such as human control, MPC (Model Predictive Control), or even a simple
logic-based control algorithm. The term expert is clearly subjective, and we
can easily see how the resulting policy is really only as good as the expert
provided.&lt;/p&gt;

&lt;p&gt;For this particular case, an expert policy is provided, and we are essentially
left to solve a supervised learning problem: finding the function approximator
that best fits the given data, where the data is defined by the expert and the
number of samples generated for training is specified by the user. The problem
simplifies to a matching game, where we need to find and match the function
approximator used by the expert.&lt;/p&gt;

&lt;h3 id=&quot;experimentation-hopper-v1&quot;&gt;Experimentation Hopper-v1&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/hopperv1.png&quot; alt=&quot;Hopper&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Wanting to start with a relatively easy environment provided by OpenAI Gym,
based on the MuJoCo simulation engine, I chose the Hopper-v1 environment, which
is composed of an observation space of 11 dimensions and an action space of 3
dimensions. A simple 2-layer NN was constructed, with the hidden layer composed
of 64 units, ReLU as the activation function, and the rest of the relevant
parameters left at the default values specified in Keras. Using a batch size of
128 and training for 10 epochs, we can see that the neural net is able to learn
the policy of the expert given enough samples, where the amount of data is
controlled by the number of rollouts. The only pre-processing done was a
normalization of the data set, using the mean, \(\mu_{train}\), and standard
deviation, \(\sigma_{train}\), of the training data, with the same
normalization procedure applied to the test set using \(\mu_{train}\) and
\(\sigma_{train}\). Further, to handle the edge case of a zero deviation, small
Gaussian noise was added to any zero-valued \(\sigma\). A quick exercise
confirms the sensitivity to the amount of data, as we train on 5, 10, 20, 40,
200, and 400 rollouts. Each rollout is of variable length but consists of
multiple tuples of \((o,a)\). Provided enough data, the capacity of this neural
network was sufficient to “clone” the policy implicit in the training data
supplied by the expert.&lt;/p&gt;
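&lt;p&gt;The normalization step, including the zero-deviation edge case, looks roughly like this (a sketch of the idea; the noise scale is an arbitrary choice of mine):&lt;/p&gt;

```python
import numpy as np

def fit_normalizer(train_obs, noise_scale=1e-6):
    """Compute mu/sigma on the training set only; reuse both at test time."""
    mu = train_obs.mean(axis=0)
    sigma = train_obs.std(axis=0)
    # a zero-valued sigma would divide by zero, so perturb it slightly
    zero = sigma == 0
    sigma[zero] = np.abs(np.random.normal(0.0, noise_scale, zero.sum())) + noise_scale
    return mu, sigma

def normalize(obs, mu, sigma):
    return (obs - mu) / sigma
```

&lt;p&gt;Fitting on the training data and reusing the same mu/sigma on the test set is what keeps the evaluation honest.&lt;/p&gt;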

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Hopper-v1.png&quot; alt=&quot;Hopper&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Hopper-v1-loss.png&quot; alt=&quot;Hopper&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;experimentation-humanoid-v1&quot;&gt;Experimentation Humanoid-v1&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/posts/humanoidv1.png&quot; alt=&quot;Humanoid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Let’s see if we can generate a failure case by testing the capacity of this
particular neural net on a more complex environment. Humanoid-v1 seems to be a
good test, with the environment consisting of an observation space of 376
dimensions and an action space of 17 dimensions. The capacity of the simple
neural net was still sufficient, provided that enough samples of the expert
policy were supplied.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Humanoid-v1-2layer.png&quot; alt=&quot;Humanoid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Humanoid-v1-2layer-loss.png&quot; alt=&quot;Humanoid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Just as an exercise, we can increase the capacity of the model. Using a
deeper neural network with 2 hidden layers of 512 units each, plus some
regularization via dropout (30%) after each layer, yields relatively
satisfactory results, again given enough data, but such capacity is clearly
unnecessary when given samples from an expert policy in this particular
situation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Humanoid-v1-3layer.png&quot; alt=&quot;Humanoid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/Humanoid-v1-3layer-loss.png&quot; alt=&quot;Humanoid&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion:&lt;/h3&gt;
&lt;p&gt;In summary, given access to enough data samples from an expert policy and a
model with large enough capacity, we can find a pretty good policy that defines
the action to take for a given observation. The problem is finding that expert
in the real world and collecting enough data samples to train a high-capacity
model.&lt;/p&gt;

&lt;p&gt;(All mistakes are mine. Please kindly contact me on any mistakes found.)&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Hopper-v1 and Humanoid-v1 images credited to [2]&lt;/li&gt;
  &lt;li&gt;Experiments were based on a trial version of Mujoco [3]&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Deep RL Assignment 1: Imitation Learning.
&lt;a href=&quot;http://rll.berkeley.edu/deeprlcourse/docs/hw1.pdf&quot;&gt;http://rll.berkeley.edu/deeprlcourse/docs/hw1.pdf&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;OpenAI, Hopper-v1.
&lt;a href=&quot;https://gym.openai.com/envs/Hopper-v1&quot;&gt;https://gym.openai.com/envs/Hopper-v1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;MuJoCo advanced physics simulation.
&lt;a href=&quot;https://www.roboti.us/index.html&quot;&gt;https://www.roboti.us/index.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Fri, 24 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/03/24/behavior-clone.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/03/24/behavior-clone.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>ROS+RViz: Husky p-controller #2</title>
        <description>&lt;p&gt;The marker location is determined from the laser scan data which is in the reference
frame of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser&lt;/code&gt;. Technically, RViz will do the conversion for us, but
as an exercise it was worth transforming the location from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser&lt;/code&gt; frame to
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;odom&lt;/code&gt; frame before publishing the data for use by RViz.&lt;/p&gt;

&lt;p&gt;The steps taken were as follows:&lt;/p&gt;

&lt;h3 id=&quot;include-the-necessary-headers&quot;&gt;Include the necessary headers&lt;/h3&gt;
&lt;p&gt;Include relevant header files declaring the necessary message types.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-cpp&quot; data-lang=&quot;cpp&quot;&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;geometry_msgs/TransformStamped.h&amp;gt;
#include &amp;lt;geometry_msgs/PoseStamped.h&amp;gt;
#include &amp;lt;tf2_geometry_msgs/tf2_geometry_msgs.h&amp;gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;declare-the-buffer-for-storage&quot;&gt;Declare the buffer for storage&lt;/h3&gt;
&lt;p&gt;Declare the buffer, and listener in the header file. The buffer stores the
transform related data, and has the public member function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookupTransform&lt;/code&gt;
that we will be using to get the transform between two frames.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-cpp&quot; data-lang=&quot;cpp&quot;&gt;&lt;span class=&quot;n&quot;&gt;tf2_ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfBuffer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tf2_ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TransformListener&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listener_&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tfBuffer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;declare-relevant-messages&quot;&gt;Declare relevant messages&lt;/h3&gt;
&lt;p&gt;Declare a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs::TransformStamped&lt;/code&gt; message, transformStamped, and
use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookupTransform()&lt;/code&gt; function to get the transform. The point here is
that the frame names don’t need a leading ‘/’, and the first parameter is the
frame that we want to transform to, or said differently, the target frame. The
full specification of the function can be found
&lt;a href=&quot;http://docs.ros.org/indigo/api/tf2_ros/html/c++/classtf2__ros_1_1BufferClient.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-cpp&quot; data-lang=&quot;cpp&quot;&gt;&lt;span class=&quot;n&quot;&gt;geometry_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TransformStamped&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transformStamped&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transformStamped&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfBuffer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lookupTransform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;odom&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;base_laser&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tf2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TransformException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ROS_WARN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;what&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;stamp-the-data&quot;&gt;“Stamp” the data&lt;/h3&gt;
&lt;p&gt;Finally we need to “stamp” the source point before passing to the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transform()&lt;/code&gt;
function. The api of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs/PoseStamped&lt;/code&gt; can be found
&lt;a href=&quot;http://docs.ros.org/api/geometry_msgs/html/msg/PoseStamped.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-cpp&quot; data-lang=&quot;cpp&quot;&gt;&lt;span class=&quot;n&quot;&gt;geometry_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PoseStamped&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;geometry_msgs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PoseStamped&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pose_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stamp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;frame_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;base_laser&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tfBuffer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pose_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pose_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;odom&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now we have the location of the pillar originally in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/base_laser&lt;/code&gt; frame
transformed to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/odom&lt;/code&gt; frame.&lt;/p&gt;

&lt;p&gt;The full source code can be found
&lt;a href=&quot;https://github.com/surfertas/ros_pkg/tree/master/git/husky_highlevel_controller&quot;&gt;here&lt;/a&gt;,
under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ros_pkg/git/husky_highlevel_controller/src/HuskyHighlevelController.cpp&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Don’t forget to make the necessary changes to the CMakeLists.txt and package.xml
files.&lt;/p&gt;

&lt;h3 id=&quot;ros-wiki&quot;&gt;ROS Wiki&lt;/h3&gt;
&lt;p&gt;I also found small issues with the &lt;a href=&quot;http://wiki.ros.org/tf2/Tutorials/Writing%20a%20tf2%20listener%20%28C%2B%2B%29&quot;&gt;ROS
Wiki&lt;/a&gt;
and made a quick edit. Basically, the prior version described the parameters to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookupTransform()&lt;/code&gt;
incorrectly.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;ROS Wiki, tf2 Tutorials. Writing a tf2 listener.&lt;/li&gt;
  &lt;li&gt;Fankhauser, P., Dominic, J., and Wermelinger, Martin.(2017) Course 3, Programming for Robotics (ROS).&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Wed, 22 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/03/22/ros-husky-pcontroller.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/03/22/ros-husky-pcontroller.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>ROS+RViz: Husky p-controller #1</title>
        <description>&lt;p&gt;A simple proportional controller implemented in C++ using ROS and simulated in
Gazebo with visualization done through Rviz…package available
&lt;a href=&quot;https://github.com/liana1215/ros_pkg/tree/master/git/husky_highlevel_controller&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is the result of working through exercise 3, provided by the &lt;a href=&quot;http://www.rsl.ethz.ch/education-students/lectures/ros.html&quot;&gt;introductory
course&lt;/a&gt; on ROS at
ETH Zurich. My personal opinion is that this is one of the most clear and
concise introductions I have come across; it also provides a guidepost on best
practices, which I will surely look to implement in my own work.&lt;/p&gt;

&lt;p&gt;The p-controller is a controller that simply adjusts velocity in proportion
to the distance to the target, in this case the pillar. Despite the controller
working, I had to make some on-the-fly hacks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/huskylaserscan.png&quot; alt=&quot;Husky&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The main factors that need to be resolved:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Distance to the pillar. Since the pillar is the only object in the world
(the problem is clearly simplified for us, thanks!), the distance is simply the
minimum range that the laser scan returns, found in ranges[].&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Orientation of the pillar from the perspective of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_laser&lt;/code&gt; (see the
figure above)
is alpha, and can be determined quickly: in discovering the minimum distance we
also get the index, i.e. the number of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;angle_increments&lt;/code&gt;, for free, and can
use this to calculate the orientation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The linear velocity in the x and y directions was set to the distance to
the pillar, so the velocity decreases as the proximity to the pillar
increases. Further, to avoid a collision, the velocity was set to 0 if Husky
was within 0.5 m of the pillar. The linear velocity and angular velocity (which
was simply set to alpha) are pumped into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs::Twist&lt;/code&gt; message and
published for listening by Gazebo and RViz.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;In order to place the marker in the correct position (here is a tutorial on
&lt;a href=&quot;http://wiki.ros.org/rviz/DisplayTypes/Marker&quot;&gt;markers&lt;/a&gt; btw), we need the cartesian coordinates, which again can be quickly
determined with some trivial geometry. The x, y coordinates are then set in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;visualization_msgs::Marker&lt;/code&gt; message and subsequently published using the
appropriate publisher.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
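&lt;p&gt;The steps above can be sketched in Python (the original package is C++; the function below is a hypothetical, simplified stand-in whose inputs mimic the fields of a sensor_msgs/LaserScan message, and the 0.5m stop threshold matches the description above):&lt;/p&gt;

```python
import math

def p_control(ranges, angle_min, angle_increment, stop_dist=0.5):
    """Proportional controller toward the nearest object (the pillar)."""
    # 1. distance to the pillar: the minimum laser range
    idx = min(range(len(ranges)), key=lambda i: ranges[i])
    dist = ranges[idx]

    # 2. orientation alpha: the index of the minimum gives the number of
    #    angle increments "for free"
    alpha = angle_min + idx * angle_increment

    # 3. linear velocity proportional to distance; stop inside stop_dist
    v = 0.0 if dist < stop_dist else dist
    w = alpha  # angular velocity simply set to alpha

    # 4. cartesian coordinates of the pillar, for placing the RViz marker
    x, y = dist * math.cos(alpha), dist * math.sin(alpha)
    return v, w, (x, y)
```

&lt;p&gt;In the actual node these values are packed into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_msgs::Twist&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;visualization_msgs::Marker&lt;/code&gt; messages and published.&lt;/p&gt;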

&lt;p&gt;That’s really it…and you get a pretty cool-looking simulation running. This is
a snapshot of Husky running in RViz, with the laser scan, marker, robot model, and tf all set
accordingly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/huskyrviz.png&quot; alt=&quot;Husky&quot; /&gt;&lt;/p&gt;

</description>
        <pubDate>Tue, 14 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/03/14/ros-husky-pcontroller.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/03/14/ros-husky-pcontroller.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Laplacian pyramids, application to blends #2</title>
        <description>&lt;h3 id=&quot;related-posts&quot;&gt;Related Posts:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/computervision/2017/02/28/laplacian.html&quot;&gt;Laplacian pyramids, application to blends #1&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the previous post we covered the construction of the Gaussian Pyramid,
followed by a brief overview of the procedure to construct the Laplacian
Pyramid. In this post, we will relate the procedure to the application of
blending two different surfaces, or images in the case of photography.&lt;/p&gt;

&lt;h3 id=&quot;blending-procedure-outline&quot;&gt;Blending procedure outline&lt;/h3&gt;
&lt;p&gt;A simple outline of the blending procedure is as follows.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Construct the Laplacian Pyramid for each image.&lt;/li&gt;
  &lt;li&gt;Construct a Gaussian Pyramid for the mask.&lt;/li&gt;
  &lt;li&gt;Apply the respective mask with the appropriate dimensions and blend the two
images, repeating this step for each layer.&lt;/li&gt;
  &lt;li&gt;Collapse the pyramid by expanding the layer with the smallest dimensions, to
that of the next layer, and adding the two layers together. This procedure
should be applied recursively, until the base is hit and all layers have been
accumulated.&lt;/li&gt;
&lt;/ol&gt;
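&lt;p&gt;A minimal sketch of steps 3 and 4 in Python (illustrative only; the helper names are my own, and a crude nearest-neighbour expand stands in for a proper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cv2.pyrUp&lt;/code&gt;-style expand):&lt;/p&gt;

```python
import numpy as np

def blend_pyramids(lap_a, lap_b, gauss_mask):
    # step 3: per-layer weighted average alpha*A + (1 - alpha)*B,
    # with the mask pyramid supplying alpha at matching dimensions
    return [m * a + (1.0 - m) * b
            for a, b, m in zip(lap_a, lap_b, gauss_mask)]

def collapse(pyramid, expand):
    # step 4: expand the smallest layer to the next layer's size and add,
    # repeating until the base is hit and all layers are accumulated
    out = pyramid[-1]
    for layer in reversed(pyramid[:-1]):
        out = expand(out, layer.shape) + layer
    return out

def nn_expand(img, shape):
    # crude nearest-neighbour upsampling; a real pipeline would use
    # an upsample + Gaussian-filter expand such as cv2.pyrUp
    r = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return r[:shape[0], :shape[1]]
```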

&lt;h3 id=&quot;mask-generation&quot;&gt;Mask generation&lt;/h3&gt;
&lt;p&gt;Once we have the Laplacian Pyramid, the only step remaining is to create an
appropriate mask for blending the two images together. The mask determines
how natural the blend appears, since it controls how much of each image
to use at each pixel location. The transformation is a simple weighted average,
where \(\alpha\) determines the weighting:
\[\alpha(\text{image } A) + (1-\alpha)(\text{image } B)\]&lt;/p&gt;

&lt;p&gt;In order to create a blend that appears natural, a mask that allows for a smooth
transition is necessary; thus a continuous function along a particular axis is
desired, as opposed to a step function. The sigmoid function is a decent starting
point, as the input domain is squashed into the range of \(0\) to \(1\), which is
exactly what we need. That said, the sigmoid function in its raw form results in
a mask whose slope is too “steep”, which produces a rather abrupt
transition from image A to image B.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img src=&quot;/static/img/posts/sigmask.png&quot; width=&quot;150&quot; height=&quot;150&quot; /&gt;  
&lt;img src=&quot;/static/img/posts/sigplot.png&quot; width=&quot;150&quot; height=&quot;150&quot; /&gt; 
&lt;/div&gt;

&lt;p&gt;In order to mold the mask into something more applicable, we can add a “fatness”
parameter to control the slope, and hence the speed of the transition.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img src=&quot;/static/img/posts/fatmask.png&quot; width=&quot;150&quot; height=&quot;150&quot; /&gt;  
&lt;img src=&quot;/static/img/posts/fatplot.png&quot; width=&quot;150&quot; height=&quot;150&quot; /&gt; 
&lt;/div&gt;

&lt;p&gt;Just from the mask alone we can visually confirm the smooth transition from the
left side to the right. Applying this kind of mask to blend each layer of the
Laplacian yields rather favorable results. The masks can be quickly
generated using the functions below, written in Python.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scipy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sp&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scipy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;special&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;sigmask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;special&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;expit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;fatmask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fatness&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fatness&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# vectorized; map() returns an iterator in Python 3&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;hand-eye-mask-template&quot;&gt;Hand-eye mask template&lt;/h3&gt;
&lt;p&gt;Now, to produce the hand-eye blend, we need a mask with the shape of an ellipse,
or some shape that matches the general characteristics of a human
eye, while maintaining the above properties that result in a smooth
transition. The 2D Gaussian function is close to ideal for generating a mask
suitable for this particular task, with a greater deviation along the x-axis
resulting in an elliptical shape resembling that of an eye.&lt;/p&gt;
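&lt;p&gt;A quick sketch of such a mask in Python (the function name and sigma values are my own, chosen for illustration; a larger deviation along x stretches the Gaussian into an eye-like ellipse):&lt;/p&gt;

```python
import numpy as np

def gaussian_mask(rows, cols, sigma_x, sigma_y):
    # un-normalized 2D Gaussian centered on the image; sigma_x > sigma_y
    # stretches the mask along the x-axis into an elliptical, eye-like shape
    y, x = np.mgrid[0:rows, 0:cols]
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    return np.exp(-(((x - cx) ** 2) / (2 * sigma_x ** 2)
                    + ((y - cy) ** 2) / (2 * sigma_y ** 2)))
```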

&lt;p&gt;&lt;img src=&quot;/static/img/posts/lapleye.png&quot; alt=&quot;Laplacian eye&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/gpyrmask.png&quot; alt=&quot;Gaussian mask&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/lpyramid.png&quot; alt=&quot;Laplacian hand&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Once we have the mask template in hand, we can generate the Gaussian Pyramid and
apply the blending operation at each layer of the Laplacian, followed by the
collapse of the pyramid to generate the final image. Note that the images were
rescaled for visualization; each layer has successively smaller dimensions than
those of the prior layer.&lt;/p&gt;

&lt;p&gt;The blended result for each layer.
&lt;img src=&quot;/static/img/posts/outpyr.png&quot; alt=&quot;Laplacian eye&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;final-blended-product&quot;&gt;Final blended product&lt;/h3&gt;
&lt;p&gt;The final collapsed image.&lt;/p&gt;
&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img src=&quot;/static/img/posts/handeye.png&quot; width=&quot;200&quot; height=&quot;300&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;(All mistakes are mine, any corrections appreciated.)&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Burt, Peter J and Adelson, Edward H. A Multiresolution Spline with
Application to Image Mosaics&lt;/li&gt;
  &lt;li&gt;Burt, Peter J and Adelson, Edward H. The Laplacian Pyramid as a Compact Image
Code&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Thu, 09 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/computervision/2017/03/09/laplacian.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/computervision/2017/03/09/laplacian.html</guid>
        
        
        <category>computervision</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #6</title>
        <description>&lt;h3 id=&quot;previous-posts&quot;&gt;Previous Posts:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/22/actor-critic-1.html&quot;&gt;Actor-Critic and Policy Gradient Methods #1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/24/actor-critic-2.html&quot;&gt;Actor-Critic and Policy Gradient Methods #2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/29/actor-critic-3.html&quot;&gt;Actor-Critic and Policy Gradient Methods #3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/31/actor-critic-4.html&quot;&gt;Actor-Critic and Policy Gradient Methods #4&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/03/2/actor-critic-5.html&quot;&gt;Actor-Critic and Policy Gradient Methods #5&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Finally, after discussing the policy gradient method, and variance derived from a
Monte-Carlo driven algorithm, we can trek into the space of Actor-Critic
algorithms.&lt;/p&gt;

&lt;p&gt;To quickly recap, the derivation of a policy gradient requires the calculation
of an integral, which can be estimated via Monte-Carlo techniques; this
transforms the integration problem into one of calculating a weighted average of
samples.[1] In the case of an RL agent, the samples equate to episodes, so
the agent can obtain an estimate by acting in the environment. As we
discussed previously, a small sample set can misrepresent the expected value, so a
large number of episodes is required to counter the variance.&lt;/p&gt;

&lt;p&gt;To address this high variance, the literature indicates that a general approach
is to introduce
a control variate: a baseline, or a parameterized function approximator, that has
zero mean. The Actor-Critic algorithm introduces a “critic”, which, framed
intuitively, is a judge that evaluates the actions of the actor. If the actor is the
source of high variance, the critic supplies some sanity and provides
low-variance feedback on the quality of the performance.[2]&lt;/p&gt;

&lt;p&gt;The introduction of a critic not only reduces variance but also maintains
the favorable convergence properties of policy gradient methods, where
convergence is obtained if the estimated gradients are unbiased and the
learning rates satisfy:
\[\sum_{k=0}^{\infty}\alpha_{a,k} = \infty\]
\[\sum_{k=0}^{\infty}{\alpha^2_{a,k}}&amp;lt;\infty\]&lt;/p&gt;

&lt;p&gt;As suggested earlier, the introduced term has zero mean, which guarantees that
the expected value of the estimator is unchanged and the bias is not
impacted, while the same cannot be said for the variance.&lt;/p&gt;

&lt;p&gt;This assumes the term is independent of the action space. We can do a
quick walk-through of the math to show that the introduction of a control
variate does not in fact affect the expected value.&lt;/p&gt;

&lt;p&gt;Introduce the variate:
\[\nabla_{\theta}J(\theta) =
E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)(Q^{\pi_{\theta}}(s,a)-B(s))]\]&lt;/p&gt;

&lt;p&gt;By linearity we can break up the expected value:
\[\nabla_{\theta}J(\theta) =
E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)]-E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)B(s)]\]&lt;/p&gt;

&lt;p&gt;Focus on the right-hand term:
\[E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)B(s)]\]&lt;/p&gt;

&lt;p&gt;Using the likelihood-ratio trick in reverse:
\[\sum_Sd^{\pi_{\theta}}(s)\sum_A\nabla_{\theta}\pi_{\theta}(s,a)B(s)\]&lt;/p&gt;

&lt;p&gt;Pull out \(B(s)\) as independent of actions and \(\nabla\) under linearity:
\[\sum_Sd^{\pi_{\theta}}(s)B(s)\nabla_{\theta}\sum_A\pi_{\theta}(s,a)\]&lt;/p&gt;

&lt;p&gt;We know that PMFs satisfy \(\sum_x p(x) = 1\) and that the derivative of a
constant is 0:
\[\sum_Sd^{\pi_{\theta}}(s)B(s)\cdot 0 = 0\]&lt;/p&gt;

&lt;p&gt;Thus we derive that the expected value is unchanged:
\[\nabla_{\theta}J(\theta) =
E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)(Q^{\pi_{\theta}}(s,a)-0)]\]&lt;/p&gt;
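&lt;p&gt;The result can also be checked numerically. Below is a small sketch with a hypothetical two-action Bernoulli policy (the parameter \(\theta\) and action values \(Q\) are made up for illustration): subtracting a baseline equal to \(E[Q]\) leaves the mean of the Monte-Carlo gradient estimate unchanged while sharply reducing its variance.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3                        # hypothetical policy parameter
p = 1.0 / (1.0 + np.exp(-theta))   # Bernoulli policy: P(a = 1)
Q = np.array([1.0, 3.0])           # hypothetical action values

def grad_log_pi(a):
    # d/dtheta log pi(a) = a - p for a Bernoulli(sigmoid(theta)) policy
    return a - p

def estimate(baseline, n=200_000):
    # Monte-Carlo estimate of E[grad log pi(a) * (Q(a) - B)]
    a = rng.binomial(1, p, size=n)
    g = grad_log_pi(a) * (Q[a] - baseline)
    return g.mean(), g.var()

mean_no_b, var_no_b = estimate(0.0)
mean_b, var_b = estimate(p * Q[1] + (1 - p) * Q[0])  # baseline B = E[Q]
```

&lt;p&gt;With this setup both estimates agree with the analytic gradient \(p(1-p)(Q_1-Q_0)\), but the baselined estimator shows far lower variance.&lt;/p&gt;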

&lt;p&gt;In summary, an algorithm that uses the policy gradient method enjoys
convergence if certain conditions on bias and the
learning rate are satisfied, though it remains exposed to high variance. This translates directly into high
sample complexity: a large number of episodes is needed to derive a satisfactory estimate using Monte-Carlo methods.
Variance can be reduced using a zero-mean term, the critic, which supplies low-variance
knowledge without impacting the expected value, and is thus capable of
maintaining the convergence properties inherited from the policy gradient method. In
the next post, we will cover different critics, followed by current state-of-the-art
algorithms.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Grondman et.al. A Survey of Actor-Critic Reinforcement Learning: Standard and
Natural Policy Gradients&lt;/li&gt;
  &lt;li&gt;E. Greensmith, P. L. Bartlett, and J. Baxter. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Wed, 08 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/03/08/actor-critic-6.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/03/08/actor-critic-6.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>ROS+Gazebo: Husky teleop in Robocup World</title>
        <description>&lt;p&gt;A quick post to show how to get a teleop capable husky simulation working in a
robocup environment. This was presented as an exercise in the lecture
material as part of the &lt;a href=&quot;http://www.rsl.ethz.ch/education-students/lectures/ros.html&quot;&gt;Programming for Robotics –
ROS&lt;/a&gt; series, which is a
great set of introductory material to ROS using C++. (Highly recommended)&lt;/p&gt;

&lt;p&gt;The specific exercise asks you to get the husky simulation working with teleop
capabilities in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;robocup14_spl_field.world&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The exercise is straightforward, with any roadblocks easily resolved by searching ROS Answers.&lt;/p&gt;

&lt;p&gt;To run, first construct the launch file below and place it in the launch folder
of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;teleop_twist_keyboard&lt;/code&gt; package.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span class=&quot;cp&quot;&gt;&amp;lt;?xml version=&quot;1.0&quot;?&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;launch&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;arg&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;world&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;default=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;robocup14_spl_field&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;include&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;file=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;$(find husky_gazebo)/launch/husky_empty_world.launch&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;nt&quot;&gt;&amp;lt;arg&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;world_name&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;value=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/usr/share/gazebo-2.2/worlds/$(arg world).world&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;/include&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;node&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;teleop&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;pkg=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;teleop_twist_keyboard&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;type=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;teleop_twist_keyboard.py&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;output=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;screen&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/launch&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
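&lt;p&gt;Assuming the file is saved as, say, husky_robocup.launch (the filename is arbitrary) inside the launch folder of the teleop_twist_keyboard package, it can then be started with:&lt;/p&gt;

```shell
roslaunch teleop_twist_keyboard husky_robocup.launch
```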

&lt;p&gt;A few points to consider:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;husky_empty_world.launch&lt;/code&gt; file is found in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;husky_gazebo&lt;/code&gt; package,
so you need to specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find husky_gazebo&lt;/code&gt; or give the path explicitly.&lt;/li&gt;
  &lt;li&gt;Despite this being a C++-based series, teleop_twist_keyboard is
a Python script, so the type must be set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;teleop_twist_keyboard.py&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;After launching, if you encounter the output below, follow
&lt;a href=&quot;http://answers.ros.org/question/199401/problem-with-indigo-and-gazebo-22/&quot;&gt;these&lt;/a&gt;
instructions to resolve it. Gazebo should then launch rather quickly.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Warning [gazebo.cc:215] Waited 1seconds for namespaces.
Warning [gazebo.cc:215] Waited 1seconds for namespaces.
Error [gazebo.cc:220] Waited 11 seconds for namespaces. Giving up.
Error [Node.cc:90] No namespace found
Error [Node.cc:90] No namespace found

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If successful you should see Gazebo launch with the Husky robot centered in the
Robocup world. After teleoping a bit, you too can send the Husky into the
goal…&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/huskygazebo1.png&quot; alt=&quot;Husky&quot; /&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 06 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/ros/2017/03/06/ros-husky-robocup.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/ros/2017/03/06/ros-husky-robocup.html</guid>
        
        
        <category>ros</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #5</title>
        <description>&lt;h3 id=&quot;previous-posts&quot;&gt;Previous Posts:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/22/actor-critic-1.html&quot;&gt;Actor-Critic and Policy Gradient Methods #1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/24/actor-critic-2.html&quot;&gt;Actor-Critic and Policy Gradient Methods #2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/29/actor-critic-3.html&quot;&gt;Actor-Critic and Policy Gradient Methods #3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/31/actor-critic-4.html&quot;&gt;Actor-Critic and Policy Gradient Methods #4&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Having developed a decent understanding of the policy gradient method in the
derivation of the gradient of the objective function, \(\nabla J(\theta)\), we can
use the gradient ascent algorithm to optimize an objective function,
parameterized by \(\theta\), with the \(J(\theta)\) representing the quality of a
policy.&lt;/p&gt;

&lt;p&gt;Before we step ahead to the Actor-Critic algorithm, it is best to introduce the
REINFORCE algorithm, which has only an actor, and uses the policy gradient method to optimize the
objective function. The lack of a critic, which has variance-reducing
properties, exposes the algorithm to high variance. The pseudo-code is taken
from D. Silver’s lectures on Policy Gradient Methods.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/pgmreinforce.png&quot; alt=&quot;REINFORCE&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Let’s consider the Policy Gradient Theorem.&lt;/p&gt;

\[\nabla_{\theta}J(\theta) =
E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)]\]

&lt;p&gt;\(Q^{\pi_{\theta}}(s,a)\) is the action-value function, or the expected return of
discounted rewards after taking action \(a\) and following policy \(\pi\)
thereafter. This value can be swapped for other estimates, and indeed much of the
path of research has been to experiment with different estimators. In the
case of REINFORCE, the simple episodic case where no critic, or baseline, is
considered, the total reward at the end of an episode is used, and represents an
unbiased estimate of \(Q^{\pi_{\theta}}(s,a)\), where an episode is a sequence
\( {s_1,a_1,r_2,s_2,a_2,r_3…s_{T-1},a_{T-1},r_T} \). This has two
implications: one is that an update can only be made once an episode completes,
and the second is that high variance cannot be avoided, at least without the introduction of a
baseline. We will talk about critics, baselines, and their role in variance
reduction and convergence in the following post.&lt;/p&gt;

&lt;p&gt;To understand the concept of variance, we can consider an overly simplistic
exercise. Let’s assume that the true expected value of a given policy,
\(E_{\pi_{\theta}}[.]\), is \(10\), and consider episodes of length
\(10\).
As \(n \rightarrow \infty\), where \(n\) is the number of episodes, the sampled expected
value should converge to the true expected value. Now consider the case where
we have \(1\) sample, \( {1,1,…10} \), and for illustration purposes assume no
discounting. The sampled return over this \(1\) episode is \(19\), where
\(19\) is equal to
\(\sum_t r_t\). We can see that this greatly differs from the true expected value,
and thus impacts the speed at which the algorithm converges.&lt;/p&gt;
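&lt;p&gt;The arithmetic of this example, and the convergence claim, can be sketched quickly (the normal return distribution below is made up purely to illustrate the variance of Monte-Carlo estimates):&lt;/p&gt;

```python
import numpy as np

# the example episode: rewards {1, 1, ..., 1, 10}, no discounting
rewards = [1] * 9 + [10]
G = sum(rewards)  # sampled return of the single episode: 19

# a hypothetical high-variance return distribution with true mean 10
rng = np.random.default_rng(0)
true_value = 10.0
returns = rng.normal(true_value, 8.0, size=50_000)

few_episodes = returns[:5].mean()   # a small sample can be far off
many_episodes = returns.mean()      # converges toward the true value
```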

</description>
        <pubDate>Thu, 02 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/03/02/actor-critic-5.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/03/02/actor-critic-5.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Laplacian pyramids, application to blends #1</title>
        <description>&lt;p&gt;Update: 2020/Feb/3rd
Hi thanks for stopping by. Why not checkout https://www.korabo.io.
The app allows you to set percentage share with collaborators and generate different types of entry points to a checkout like the shield below.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.korabo.io/checkout-page/5e2815526ca5270022f1a858/5e23f0b31b70f70022ff5933&quot;&gt;
    &lt;img src=&quot;https://img.shields.io/badge/Korabo-donate-blue&quot; alt=&quot;donate&quot; /&gt;
 &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;+++++++&lt;/p&gt;

&lt;p&gt;Working through some classic CV algorithms to refresh the memory, I am
constantly reminded of the ingenuity of the algorithms and the researchers
behind the work. In this memory refresh (post), I plan to work through the
algorithm behind Laplacian Pyramids and its application to image
composition, or, stated differently, blending.&lt;/p&gt;

&lt;p&gt;The objective of the procedure is to produce a seamless image, given two images
as inputs, or as Burt and Adelson stated,&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;How can the two surfaces be gently distorted so that they can be joined together with a smooth seam?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The location of the blend is variable and case dependent. The mask controls the location
and degree of the blend, and the set of possible masks that one can contrive is
large.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img src=&quot;/static/img/posts/eye.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;  
&lt;img src=&quot;/static/img/posts/mask.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;
&lt;img src=&quot;/static/img/posts/hand.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Taking two images, A (left) and B (right), and a mask, M (center), presented above, we can blend the
images using Laplacian pyramids to produce the
following image. There is scope for automation, but for this exercise the
location of the eye was manually engineered to “work”.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img src=&quot;/static/img/posts/handeye.png&quot; width=&quot;200&quot; height=&quot;300&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;In the following section I will introduce the procedure that underlies the
Laplacian Pyramid, and discuss the relevance and importance of engineering an
appropriate mask.&lt;/p&gt;

&lt;h3 id=&quot;constructing-the-laplacian-pyramid&quot;&gt;Constructing the Laplacian Pyramid&lt;/h3&gt;
&lt;p&gt;The details of the algorithm can be found in the publications by Burt and Adelson
[1,2], so I will skimp on them here and recommend that the reader work through the
papers. A top-level overview of the algorithm is as follows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Construct the gaussian pyramids.&lt;/li&gt;
  &lt;li&gt;Construct the laplacian pyramids.&lt;/li&gt;
  &lt;li&gt;Create the blended pyramids.&lt;/li&gt;
  &lt;li&gt;Collapse the blended pyramids to reconstruct the original image exactly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To start, we need to determine the number of layers of the pyramid, which can be
done given the dimensions of the original image and the kernel, satisfying the
following equations: \[C = M_c2^{N}+1\] \[R = M_r2^N + 1\]
where \(C\) is the column size, \(R\) is the row size, \(M_c\) and \(M_r\) are
equivalent to \(\frac{1}{2}\)(kernel size \(-\ 1\)), and \(N\) is the number of
layers. Thus, if we have the dimensions of the image and the kernel, we can
calculate the number of layers:
\[N = \lfloor\log_2(\frac{R-1}{M_r})\rfloor\]&lt;/p&gt;
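&lt;p&gt;As a quick sanity check, the layer count can be computed directly (a small sketch; the function name is my own):&lt;/p&gt;

```python
import math

def n_layers(R, kernel_size):
    # M_r = (kernel_size - 1) / 2;  N = floor(log2((R - 1) / M_r))
    m_r = (kernel_size - 1) / 2.0
    return math.floor(math.log2((R - 1) / m_r))
```

&lt;p&gt;For example, a 257-row image with a \(5\times5\) kernel gives \(M_r = 2\) and \(N = 7\).&lt;/p&gt;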

&lt;p&gt;Once we have determined the number of layers, we can construct the gaussian pyramid,
using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce()&lt;/code&gt; function to convolve and then downsample over the given
number of layers.
The open source OpenCV library[3] has the pyrDown() function, which uses the
\(5\times5\) kernel shown below to convolve, and downsamples by rejecting the
even-numbered rows and columns at each successive layer.&lt;/p&gt;

\[\frac{1}{256}\begin{bmatrix}
1 &amp;amp; 4 &amp;amp; 6 &amp;amp; 4 &amp;amp; 1 \\
4 &amp;amp; 16 &amp;amp; 24 &amp;amp; 16 &amp;amp; 4 \\
6 &amp;amp; 24 &amp;amp; 36 &amp;amp; 24 &amp;amp; 6 \\
4 &amp;amp; 16 &amp;amp; 24 &amp;amp; 16 &amp;amp; 4 \\
1 &amp;amp; 4 &amp;amp; 6 &amp;amp; 4 &amp;amp; 1 \\
\end{bmatrix}\]

&lt;p&gt;The algorithm itself is rather trivial to implement, using the filter2D()
function found in the OpenCV library, which handles the padding and convolution
step.&lt;/p&gt;

&lt;p&gt;A personal rendition of the reduce function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mypyrDown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;borderType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BORDER_REFLECT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[::&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This should leave us with N - 1 downsampled images, with the original image used as
the base of the “pyramid”. The reduce operation is in effect
applying a low-pass filter. The convolve step is an averaging operation, which
takes pixels in the neighborhood and applies a linear transformation,
dispersing the local per-pixel information. The
downsample reduces redundancy and correlation between pixels, working on the
idea that neighboring pixels are correlated. The following image is the gaussian
pyramid, with each layer scaled to match the original. We can see clearly how the
fine features are removed and we are left with a rather coarse representation in
the final layer.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/gpyramid.png&quot; alt=&quot;Gaussian Pyramid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Construction of the Laplacian Pyramid involves upsampling, padding, then
convolving with the same kernel used in the downsample. pyrUp() can be used for
this. The construction itself recursively applies pyrUp() to a layer in the
Gaussian Pyramid and subtracts the result from the next finer layer. The procedure starts with
the smallest layer. At each step, the procedure effectively isolates the
error between two images, the original image and the averaged image, with the
error interpreted as the high-frequency information lost in the convolution and
downsample step. Again, the implementation is relatively trivial, and really
highlights the ingenuity of the designers. Simple yet powerful.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;c1&quot;&gt;#The code was written for clarity over efficiency
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mypyrUp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;uns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;uns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[::&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,::&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter2D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;borderType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BORDER_REFLECT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mygaussPyramid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;levels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;lst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;levels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mypyrDown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;lst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lst&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mylaplacianPyramid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;retlst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;layers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;layers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;retlst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mypyrUp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;retlst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussPyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;retlst&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
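&lt;p&gt;To see the exact-reconstruction property at work without OpenCV, here is a toy 1D version (a sketch of my own with a simple 3-tap kernel, not the code above): the Laplacian layer stores exactly the detail that expanding the coarse layer loses, so adding the two back together recovers the signal.&lt;/p&gt;

```python
import numpy as np

def reduce1d(x, k):
    # convolve with reflect padding, then drop every other sample
    y = np.convolve(np.pad(x, 1, mode="reflect"), k, mode="valid")
    return y[::2]

def expand1d(x, k, n):
    # upsample by inserting zeros, smooth, and rescale to preserve magnitude
    up = np.zeros(n)
    up[::2] = x
    return np.convolve(np.pad(up, 1, mode="reflect"), k, mode="valid") * 2

k = np.array([0.25, 0.5, 0.25])
signal = np.arange(8, dtype=np.float64)

coarse = reduce1d(signal, k)                      # gaussian layer
lap = signal - expand1d(coarse, k, len(signal))   # laplacian layer
recon = expand1d(coarse, k, len(signal)) + lap    # exact by construction
```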

&lt;p&gt;&lt;img src=&quot;/static/img/posts/lpyramid.png&quot; alt=&quot;Laplacian Pyramid&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Just to expand on the effect of a low-pass filter, we can
consider the unsharp masking method used to sharpen images: \[g_{sharp} = f +
\gamma(f-h_{blur}\star f)\] The process extracts the difference, or error, between an
image and the averaged image, and adds back some proportion of this difference.
The idea is that the error represents areas of rapid intensity change, and the
method accentuates those areas by adding a scaled copy of the error back in. The following images display
the results of the unsharp mask operation. [Original, Blurred, Sharpened]&lt;/p&gt;

&lt;div style=&quot;text-align: center&quot;&gt;
&lt;img src=&quot;/static/img/posts/beach.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;  
&lt;img src=&quot;/static/img/posts/blurbeach.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;
&lt;img src=&quot;/static/img/posts/sharpbeach.png&quot; width=&quot;100&quot; height=&quot;150&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;See the associated code for completeness.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;gbeach&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GaussianBlur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;beach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sharp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addWeighted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;beach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gbeach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
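&lt;p&gt;Numerically, that addWeighted() call is just the unsharp formula with \(\gamma = 0.5\); a small sketch with made-up pixel values:&lt;/p&gt;

```python
import numpy as np

f = np.array([10.0, 200.0, 30.0])        # hypothetical pixel values
blurred = np.array([80.0, 80.0, 80.0])   # stand-in for the Gaussian-blurred copy
gamma = 0.5

# g_sharp = f + gamma * (f - blurred), i.e. 1.5*f - 0.5*blurred
sharp = f + gamma * (f - blurred)
```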

&lt;p&gt;Will cover blending and collapsing in the next post, followed by a brief
discussion on masks…&lt;/p&gt;

&lt;p&gt;(All mistakes are mine, any corrections appreciated.)&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Burt, Peter J and Adelson, Edward H. A Multiresolution Spline with
Application to Image Mosaics&lt;/li&gt;
  &lt;li&gt;Burt, Peter J and Adelson, Edward H. The Laplacian Pyramid as a Compact Image
Code&lt;/li&gt;
  &lt;li&gt;Bradski, Gary. The OpenCV Library&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Tue, 28 Feb 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/computervision/2017/02/28/laplacian.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/computervision/2017/02/28/laplacian.html</guid>
        
        
        <category>computervision</category>
        
      </item>
    
      <item>
        <title>Overflow in Image Processing</title>
<description>&lt;p&gt;Something that I found simple yet profound is the idea of overflow encountered
in image processing, which is really a by-product of how pixel values are represented in
computation.&lt;/p&gt;

&lt;p&gt;Typically the intensity of a pixel is represented by 8 bits, or 1 byte which has
the capacity to represent \(256\) values, \(2^8\), ranging from \(0\) to \(255\).&lt;/p&gt;

&lt;p&gt;One example of an issue that arises as a result is apparent when implementing
an operation as simple as averaging.&lt;/p&gt;

&lt;p&gt;For the purpose of discussion, let’s say we have a grayscale image represented by a
\(2\times2\) matrix as such:&lt;/p&gt;

\[\begin{bmatrix} 240 &amp;amp; 240 \\ 10 &amp;amp; 120 \end{bmatrix}\]

&lt;p&gt;The average function applied to two identical images should obviously map to
the same matrix, equivalent to applying the identity operation.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; import numpy as np 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; image &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; np.array&lt;span class=&quot;o&quot;&gt;([[&lt;/span&gt;240,240],[10,120]]&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;  
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;image + image&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; / 2  
array&lt;span class=&quot;o&quot;&gt;([[&lt;/span&gt;240, 240],
       &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; 10, 120]]&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This is not particularly surprising, and at this point one may wonder what the
point of this post is. Bear with me.&lt;/p&gt;

&lt;p&gt;First, the above matrix calculation used values represented by the type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int64&lt;/code&gt;, which
has 64 bits available and the capacity to represent a large range of values,
specifically \(-2^{63}\) to \(2^{63}-1\). A bit of overkill for this
exercise, but it serves the purpose.&lt;/p&gt;

&lt;p&gt;For images represented by 8 bits, specifically &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint8&lt;/code&gt; in numpy, the problem
of overflow becomes apparent very quickly. Working through the same exercise as
above, but constraining the capacity to that of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint8&lt;/code&gt;, we discover that
the calculations fail to produce the correct results.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; image &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; np.array&lt;span class=&quot;o&quot;&gt;([[&lt;/span&gt;240,240],[10,120]]&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;.astype&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'uint8'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;image + image&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; / 2  
array&lt;span class=&quot;o&quot;&gt;([[&lt;/span&gt;112, 112],
       &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; 10, 120]], &lt;span class=&quot;nv&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;uint8&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;What happened? If we break down the calculations, the issue becomes clear.
First, the averaging function adds two unsigned integers
represented by 8 bits. \(240 + 240\) would equal \(480\), but because of the
capacity constraints, the operation is equivalent to applying a modulo 256 after
the addition, \((240 + 240) \bmod 256 = 224\). The result wraps around and
outputs 224; dividing this by 2 results in 112. A completely
expected but undesired result. This can be easily resolved by increasing the
capacity of the representation, by converting from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint8&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int64&lt;/code&gt;, for
example.&lt;/p&gt;
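&lt;p&gt;A minimal sketch of both behaviors, the wraparound and the fix:&lt;/p&gt;

```python
import numpy as np

image = np.array([[240, 240], [10, 120]], dtype=np.uint8)

# uint8 arithmetic wraps: (240 + 240) % 256 = 224, so the "average" is 112
wrapped = (image + image) // 2

# promote before adding so the intermediate sum cannot overflow
safe = (image.astype(np.int64) + image.astype(np.int64)) // 2
```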

&lt;p&gt;Why does this matter? Simply put, left unattended the results will be visually
apparent, with the intensity of the pixels artificially undervalued.&lt;/p&gt;

&lt;p&gt;A visualization exercise will make the result more clear.&lt;/p&gt;

&lt;p&gt;We start by introducing the original. As a disclaimer, this is a private picture
taken of an original &lt;a href=&quot;https://www.heatherbrownart.com/&quot;&gt;Heather Brown&lt;/a&gt; painting 
that I purchased a while back. 
&lt;img src=&quot;/static/img/posts/sunoriginal.png&quot; alt=&quot;Original&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To make the difference more apparent, we use two identical images and take the
average, which should result in the same image, equivalent to applying the
identity operation. It’s clear even visually that the result is what we want.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/sunave.png&quot; alt=&quot;No overflow with int64&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The image below applies the same operation, but this time using an unsigned
8-bit integer representation. The difference is clear if we compare with the
desired output above, and we can visually confirm the undervaluation of the
pixel intensities.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/sunoverflow.png&quot; alt=&quot;Overflow with uint8&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This can easily be avoided if we pay attention to the intermediate operations and
remain sensitive to the types involved.&lt;/p&gt;

&lt;p&gt;A very simple concept, easily overlooked, but worth knowing.&lt;/p&gt;

</description>
        <pubDate>Tue, 14 Feb 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/computervision/2017/02/14/overflow.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/computervision/2017/02/14/overflow.html</guid>
        
        
        <category>computervision</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #4</title>
        <description>&lt;h3 id=&quot;previous-posts&quot;&gt;Previous Posts:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/22/actor-critic-1.html&quot;&gt;Actor-Critic and Policy Gradient Methods #1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/24/actor-critic-2.html&quot;&gt;Actor-Critic and Policy Gradient Methods #2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/rldm/2017/01/29/actor-critic-3.html&quot;&gt;Actor-Critic and Policy Gradient Methods #3&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post, we will break down the ratio trick that helped us in the previous post
to digest the policy gradient theorem, using the steps outlined by D.
Meyer[1].&lt;/p&gt;

&lt;p&gt;First off, let’s recall the identity \(\nabla_{\theta}\log(w) =
\frac{1}{w}\nabla_{\theta}w\). Keep it in mind, as it will come into use later.&lt;/p&gt;

&lt;p&gt;Start with a function: \[y=\pi(s,a;\theta)\]
Take the log of both sides and link to new variable \(z\):
\[z=\log y=\log \pi(s,a;\theta)\]
Take the derivative, recalling the chain rule definition:
\[\frac{\partial z}{\partial\theta}=\frac{\partial z}{\partial y}\frac{\partial
y}{\partial \theta}\]
where
\[\frac{\partial z}{\partial y}=\frac{1}{\pi(s,a;\theta)}\]
\[\frac{\partial y}{\partial \theta}=\frac{\partial \pi(s,a;\theta)}{\partial
\theta}=\nabla_{\theta}\pi(s,a;\theta)\]
thus
\[\frac{\partial
z}{\partial\theta}=\frac{\nabla_{\theta}\pi(s,a;\theta)}{\pi(s,a;\theta)}\]
and using the identity we introduced earlier, we arrive upon:
\[\frac{\partial
z}{\partial\theta}=\frac{\nabla_{\theta}\pi(s,a;\theta)}{\pi(s,a;\theta)}=
\nabla_{\theta}\log\pi_{\theta}(s,a)\]&lt;/p&gt;

&lt;p&gt;Finally:
\[\nabla_{\theta}\pi(s,a;\theta)=\pi(s,a;\theta)\nabla_{\theta}\log\pi_{\theta}(s,a)\]&lt;/p&gt;
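&lt;p&gt;A quick numerical sanity check of the identity (a toy two-action softmax policy of my own, differentiated by finite differences):&lt;/p&gt;

```python
import numpy as np

def pi0(theta):
    # probability of action 0 under a softmax over two preferences
    e = np.exp(theta)
    return e[0] / e.sum()

theta = np.array([0.3, -0.7])
eps = 1e-6
d = np.array([eps, 0.0])

# finite-difference gradients w.r.t. theta[0]
grad_pi = (pi0(theta + d) - pi0(theta - d)) / (2 * eps)
grad_log_pi = (np.log(pi0(theta + d)) - np.log(pi0(theta - d))) / (2 * eps)

# grad log(pi) should equal grad(pi) / pi
```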

&lt;p&gt;which is the final result that we were after, and the “trick” that helps make
the policy gradient method palatable to a certain extent.&lt;/p&gt;

&lt;p&gt;At last we can move on to looking at some algorithms…&lt;/p&gt;

&lt;p&gt;(All mistakes are mine, corrections appreciated.)&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Meyer, David. Notes on policy gradients and the log derivative trick for reinforcement learning&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Tue, 31 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/01/31/actor-critic-4.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/01/31/actor-critic-4.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #3</title>
<description>&lt;p&gt;Ok, so now we have the objective function defined, \(J(\theta)\), which, to
remind ourselves, is a function evaluating the quality of the policy,
where higher is better. We also have an update rule that is basically hill “climbing”, as
we attempt to ascend to an optimum, preferably global but most likely
local: \(\theta' = \theta + \alpha\nabla_{\theta}J(\theta)\).&lt;/p&gt;

&lt;p&gt;Now lets get a grip on the \(\nabla_{\theta}J(\theta)\). We talked about gradients
in a separate series “Gradient Descent with a dash of Linear Algebra”, and
understand that the gradient is a column vector of partial derivatives with
respect to each parameter that parameterizes the policy, \(\pi(a|s, \theta)\).&lt;/p&gt;

&lt;p&gt;The policy gradient theorem states: for any differentiable policy, for any
policy objective function \(J(\theta)\), the policy gradient is given by:&lt;/p&gt;

\[\nabla_{\theta}J(\theta)=\mathbf{E}_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)]\]

&lt;p&gt;The question is how do we get from  \(\nabla_{\theta}J(\theta)\) to
\(E_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)]\)?
We will work backwards step by step, starting from the end result stated above.&lt;/p&gt;

&lt;p&gt;First, rewrite the expected value over policies,
\(E_{\pi_{\theta}}[..]\)
as \(\sum_{s}d(s)\sum_{a}\pi_{\theta}(s,a)\) to get:
\[\nabla_{\theta}J(\theta)=\sum_{s}d(s)\sum_{a}\pi_{\theta}(s,a)\nabla_{\theta}\log\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)\]
The \(Q^{\pi_{\theta}}(s,a)\) is the state-action value function, and an estimate
of the return obtained from taking action, \(a\), from state \(s\), and following
policy, \(\pi\), there after. We’ll leave this as is.&lt;/p&gt;

&lt;p&gt;Next use the \(\textbf{so-called ratio trick}\), \(\nabla_{\theta}\pi_{\theta}(s,a) =
\pi_{\theta}(s,a)\frac{\nabla_{\theta}\pi_{\theta}(s,a)}{\pi_{\theta}(s,a)}=\pi_{\theta}(s,a)\nabla\log\pi_{\theta}(s,a)\)
, which we will break down in the next post, to unwind further. This leaves us
with:
\[\nabla_{\theta}J(\theta)=\sum_{s}d(s)\sum_{a}\nabla_{\theta}\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)\]&lt;/p&gt;

&lt;p&gt;As a result of linearity, we can pull the \(\nabla\) outside the summations
(treating \(d(s)\) and \(Q\) as fixed):\[\nabla_{\theta}J(\theta)=\nabla_{\theta}\sum_{s}d(s)\sum_{a}\pi_{\theta}(s,a)Q^{\pi_{\theta}}(s,a)\]&lt;/p&gt;

&lt;p&gt;We can re-introduce the expectation over policies for clarity.&lt;/p&gt;

\[\nabla_{\theta}J(\theta)=\nabla_{\theta}\mathbf{E}_{\pi_{\theta}}[Q^{\pi_{\theta}}(s,a)]\]

&lt;p&gt;What does this mean?! I guess it makes some intuitive sense in that
we are trying to find the direction, given by the gradient, in which the
expected value of the Q function for a given policy is higher. The reason that
researchers want to work away from this formulation is that the gradient of the
expected value could be relatively difficult to compute.&lt;/p&gt;
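&lt;p&gt;The equivalence of the two forms can be checked numerically in a toy setting (a sketch of my own: a fixed two-action \(Q\), ignoring the dependence of \(Q\) and \(d\) on \(\theta\), which the full theorem handles):&lt;/p&gt;

```python
import numpy as np

Q = np.array([1.0, 3.0])           # fixed action values (hypothetical)

def probs(theta):
    # softmax policy over two actions
    e = np.exp(theta)
    return e / e.sum()

theta = np.array([0.2, -0.1])
eps = 1e-6
d = np.array([eps, 0.0])

# direct gradient of E_pi[Q] w.r.t. theta[0], by finite differences
lhs = ((probs(theta + d) * Q).sum() - (probs(theta - d) * Q).sum()) / (2 * eps)

# score-function form: sum_a pi(a) * grad log pi(a) * Q(a)
dlogp = (np.log(probs(theta + d)) - np.log(probs(theta - d))) / (2 * eps)
rhs = (probs(theta) * dlogp * Q).sum()
```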

&lt;p&gt;Now we need to just get a handle on this ratio trick that is used often when
talking of policy gradient methods.&lt;/p&gt;

&lt;p&gt;(All mistakes are mine, corrections appreciated.)&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References:&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching_files/pg.pdf&quot;&gt;Silver, David. Lecture 7: Policy Gradient&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Meyer, David. Notes on policy gradients and the log derivative trick for reinforcement learning&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/&quot;&gt;Mohamed, Shakir. Machine Learning Trick of the Day (5): Log Derivative
Trick&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
        <pubDate>Sun, 29 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/01/29/actor-critic-3.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/01/29/actor-critic-3.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #2</title>
<description>&lt;p&gt;This is a continuation of a series of posts to assess the actor-critic policy
gradient algorithm. In the previous post we left off by introducing the gradient
ascent update \[\Delta\theta=\alpha\nabla_{\theta}J(\theta)\], and left a couple
of questions unanswered. Specifically, what is the \(J(\theta)\) function, and how do
we calculate its gradient?&lt;/p&gt;

&lt;p&gt;tl;dr: \(J(\theta)\) is the objective function and represents the quality of a
given policy. There are three variations typically introduced but the policy
gradient method is uniformly applicable.&lt;/p&gt;

&lt;p&gt;First, \(J(\theta)\), which evaluates the quality or “goodness” of
a particular policy, \(\pi(a|s,\theta)\), is typically presented in three forms:
one for the episodic case, one for the continuing case, and one where the average reward
per time step is considered.&lt;/p&gt;

&lt;p&gt;For the episodic case, the start value can be used, thus&lt;/p&gt;

\[J_1(\theta) = V^{\pi_{\theta}}(s_1) = \mathbf{E}_{\pi_{\theta}}[v_1]\]

&lt;p&gt;In English, assuming my understanding is correct, this indicates that the
policy objective function \(J(\theta)\) is defined as the expected return, i.e.
the cumulative discounted reward, starting from \(s_1\) under a given policy.&lt;/p&gt;

&lt;p&gt;For the continuous case, the average value is typically considered, where the
objective function is defined as:
\[J_{avV}(\theta)=\sum_sd^{\pi_{\theta}}(s)V^{\pi_{\theta}}(s)\]&lt;/p&gt;

&lt;p&gt;\(d^{\pi_{\theta}}\) is the probability of being in state, \(s\), under policy,
\(\pi_{\theta}\). Thus \(J_{avV}(\theta)\) is just the probability of being in
state, \(s\), multiplied by the expected return from state, \(s\), given a
particular policy, \(\pi\). Basically, as the name indicates, it’s the
\(\textit{average}\) expected return over all possible states, with the probability
of each state being drawn from the stationary distribution,
\(d^{\pi_{\theta}}(s)\).&lt;/p&gt;

&lt;p&gt;Finally, the average reward per time step is defined as: \[J_{avR}(\theta) =
\sum_{s}d^{\pi_{\theta}}(s)\sum_{a}\pi_{\theta}(s,a)\mathbf{R}_s^a\]&lt;/p&gt;

&lt;p&gt;Breaking down the symbols, starting from the right side: we take the reward
obtained for a particular action, weight it by the probability of that action
under the policy, sum over all actions, and finally weight by the state
distribution and sum over all possible states.&lt;/p&gt;
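&lt;p&gt;To make \(J_{avR}(\theta)\) concrete, the following is a minimal sketch over a
made-up 3-state, 2-action MDP; the distribution, policy, and rewards are purely
illustrative numbers, not from any source.&lt;/p&gt;

```python
import numpy as np

# Purely illustrative 3-state, 2-action MDP.
d = np.array([0.5, 0.3, 0.2])        # stationary distribution d^pi(s)
pi = np.array([[0.7, 0.3],           # pi_theta(s, a): action probs per state
               [0.4, 0.6],
               [0.9, 0.1]])
R = np.array([[1.0, 0.0],            # R_s^a: expected reward for (s, a)
              [0.5, 2.0],
              [0.0, 1.5]])

# J_avR(theta) = sum_s d(s) * sum_a pi(s, a) * R(s, a)
J_avR = np.sum(d * np.sum(pi * R, axis=1))
```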

&lt;p&gt;As defining the objective functions took a bit longer than expected, I will
continue the interpretation in the next post.&lt;/p&gt;

</description>
        <pubDate>Tue, 24 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/01/24/actor-critic-2.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/01/24/actor-critic-2.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Actor-Critic and Policy Gradient Methods #1</title>
<description>&lt;p&gt;In the following posts, I will attempt to break down the Actor-Critic Policy
Gradient Algorithm.&lt;/p&gt;

&lt;p&gt;Research often highlights the practicality of the policy
gradient method for models with high-dimensional, continuous action
spaces, as opposed to the discrete and limited action spaces where some variant of
Q-learning, or more generally value function approximation, has been applied. In
contrast to value function approximation, where convergence is not guaranteed,
albeit with significant results in practice (DQN), the variants of the policy
gradient method offer guarantees of convergence to at least a
local optimum. The downsides are, likewise, the risk of converging to a local optimum,
inefficiency (high sample complexity), and the fact that, stand alone, the
associated variance is high as well.&lt;/p&gt;

&lt;p&gt;In terms of progression, in the coming posts I will:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Break down the policy gradient method, and introduce the associated algorithm.&lt;/li&gt;
  &lt;li&gt;Attempt to address the so-called “likelihood ratio” trick that is oftentimes
just rushed through in the material, which will require a detour through the chain
rule and the likelihood estimator.&lt;/li&gt;
  &lt;li&gt;Touch on the topic of variance, what this actually means, and a comparison
to an algorithm that exhibits high bias.&lt;/li&gt;
  &lt;li&gt;Wrap back around to the Actor-Critic algorithm, which introduces the baseline
to address the issue of high variance that the policy gradient method exhibits
on a stand-alone basis.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So why go through this, and take notes? The current so-called state of the art
in Deep Reinforcement Learning is a variant of the Actor-Critic architecture and
has been given the name of Asynchronous Advantage Actor-Critic, or A3C, and
presented &lt;a href=&quot;https://arxiv.org/abs/1602.01783&quot;&gt;here&lt;/a&gt; by Mnih et al.&lt;/p&gt;

&lt;p&gt;So let’s begin…
The policy gradient method essentially applies gradient-based optimization to find
the \(\theta\) that maximizes \(J(\theta)\). If your background is in
dealing with neural nets or some optimization problem where the objective is to
minimize the cost function, you may be used to seeing the gradient
\(\textit{descent}\) update used. In this case we are trying to
\(\textbf{maximize}\) the value of the policy as a goal, and thus need to
ascend, or increase, which leads us to the (approximate) gradient ascent update rule:
  \[\theta' = \theta + \alpha\frac{\partial J(\theta)}{\partial \theta}\]
  or equivalently:
  \[\Delta\theta=\alpha\nabla_{\theta}J(\theta)\]&lt;/p&gt;

&lt;p&gt;For clarity, we are trying to find the best \(\theta\), that maximizes the
“quality” of \(\pi(a |s;\theta)\).&lt;/p&gt;
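&lt;p&gt;The update rule itself can be sketched in a few lines. Since we have not yet
defined \(J(\theta)\) for a policy, the sketch below ascends a made-up concave
stand-in with a known maximum at \(\theta = 3\); it only illustrates the mechanics
of the update.&lt;/p&gt;

```python
# Gradient ascent on a made-up concave stand-in for J(theta),
# maximized at theta = 3. Illustrative only.
def J(theta):
    return -(theta - 3.0) ** 2

def grad_J(theta):
    return -2.0 * (theta - 3.0)

theta, alpha = 0.0, 0.1
for _ in range(200):
    theta = theta + alpha * grad_J(theta)  # theta' = theta + alpha * dJ/dtheta
```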

&lt;p&gt;A couple questions should be considered now. First, what is \(J(\theta)\), the
quality of the policy, and secondly, how can we obtain the gradient of this
particular function?&lt;/p&gt;

&lt;p&gt;Will answer in the following post…&lt;/p&gt;

</description>
        <pubDate>Sun, 22 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rldm/2017/01/22/actor-critic-1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rldm/2017/01/22/actor-critic-1.html</guid>
        
        
        <category>rldm</category>
        
      </item>
    
      <item>
        <title>Rust lifetime #3</title>
        <description>&lt;p&gt;Following on in my dive into the concept of ownership in Rust, will assess the
idea of lifetimes. I will follow the documentation at
&lt;a href=&quot;https://doc.rust-lang.org/book/references-and-borrowing.html&quot;&gt;rust-lang.org&lt;/a&gt;. The tutorial breaks down the concept in three chunks:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Ownership (Covered in a previous post)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;References and Borrowing (Covered in a previous post)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Lifetimes&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post will summarize my general understanding of lifetimes.&lt;/p&gt;

&lt;p&gt;tl;dr: the concept of lifetimes seems quite complex, but it makes the “lifetimes” of
values transparent and promotes safer programming.&lt;/p&gt;

&lt;h1 id=&quot;point-1-use-after-free-prevention&quot;&gt;Point #1: “use after free” prevention&lt;/h1&gt;
&lt;p&gt;The documentation provides a rather intuitive break down of an example of what
the concept of lifetimes is attempting to prevent.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;I acquire a handle to some kind of resource.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I lend a reference to the resource to you.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I free the resource, as I am done with the resource, and decide to
deallocate.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;You decide to use the resource.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This would cause an error. The reference you have been lent is now pointing at
an invalid resource. The ‘use after free’ concept should be familiar and applies
to this scenario.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;you&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;                    &lt;span class=&quot;c&quot;&gt;//introduce you&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;me&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;    &lt;span class=&quot;c&quot;&gt;//introduce scope value, me&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;you&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;me&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;              &lt;span class=&quot;c&quot;&gt;//me lends the reference to &quot;resource&quot; to you&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;you&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;              &lt;span class=&quot;c&quot;&gt;//you is still referencing resource of me&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The above would fail, as you are referencing a resource that has gone out of
scope. The Rust compiler is able to catch this as a result of having visibility
into the “lifetimes” of different values.&lt;/p&gt;

&lt;h1 id=&quot;point-2-syntax&quot;&gt;Point #2: syntax&lt;/h1&gt;
&lt;p&gt;Some syntax to keep in mind. Rust allows functions to have generic parameters
placed between &amp;lt;&amp;gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;             &lt;span class=&quot;c&quot;&gt;//foo has one lifetime, 'a&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;'b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;         &lt;span class=&quot;c&quot;&gt;//bar has two lifetimes, for the case where we had two reference&lt;/span&gt;
                            &lt;span class=&quot;c&quot;&gt;//parameters with different lifetimes. &lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;      &lt;span class=&quot;c&quot;&gt;//a reference to an i32, with a lifetime 'a&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;//a mutable reference to an i32, with a lifetime 'a&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;            &lt;span class=&quot;c&quot;&gt;//when dealing with structs, need to ensure&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;            &lt;span class=&quot;c&quot;&gt;//that any reference to Foo doesn't outlive a&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;                           &lt;span class=&quot;c&quot;&gt;//reference to the i32 that the struct contains&lt;/span&gt;
    &lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h1 id=&quot;point-3-static-lifetime&quot;&gt;Point #3: ‘static lifetime&lt;/h1&gt;
&lt;p&gt;Static lifetime is a special type of lifetime that has a lifetime over the
entire program.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FOO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;            &lt;span class=&quot;c&quot;&gt;//adds an i32 to the data segment of the binary.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'static&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FOO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;     &lt;span class=&quot;c&quot;&gt;//x refers to the i32&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

</description>
        <pubDate>Fri, 20 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rust/2017/01/20/rust-lifetime.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rust/2017/01/20/rust-lifetime.html</guid>
        
        
        <category>rust</category>
        
      </item>
    
      <item>
        <title>LSTM cell break down</title>
<description>&lt;p&gt;I came across a video by Martin Gorner, which can be viewed
&lt;a href=&quot;https://www.youtube.com/watch?v=vq2nnJ4g6N0&quot;&gt;here&lt;/a&gt;. The presentation provides a
top-level, yet self-contained overview of deep learning and TensorFlow.&lt;/p&gt;

&lt;p&gt;I found his presentation of the LSTM cell informative; in particular, the way he
chose to present the concept had an impact on my learning process.&lt;/p&gt;

&lt;p&gt;The specific segment on the LSTM cell contained a walk through of the components
of the cell, stepping through each gate.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/posts/lstmbreakdown.png&quot; alt=&quot;LSTM&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Concatenate: \(X= X_t | H_{t-1}\)&lt;/li&gt;
  &lt;li&gt;Forget gate: \(f = \sigma(XW_f + b_f)\)&lt;/li&gt;
  &lt;li&gt;Update gate: \(u = \sigma(XW_u + b_u)\)&lt;/li&gt;
  &lt;li&gt;Result gate: \(r = \sigma(XW_r + b_r)\)&lt;/li&gt;
  &lt;li&gt;Input:       \(X' = \tanh(XW_c + b_c)\)&lt;/li&gt;
  &lt;li&gt;new C:       \(C_t = f * C_{t-1} + u * X'\)&lt;/li&gt;
  &lt;li&gt;new H:       \(H_t = r * \tanh(C_t)\)&lt;/li&gt;
  &lt;li&gt;Output:      \(Y_t = \text{softmax}(H_tW + b)\)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;The concatenate step takes \(X_t\) and concatenates with \(H_{t-1}\)
resulting in a vector with dimensions \(p+n\). As the other gates are
working in dimension
\(n\), we need a transformation that maps \(\mathbf{R}^{p+n} \rightarrow
\mathbf{R}^{n}\), thus need to apply \(W_i\) where \(i\in {f,u,r,c}\).&lt;/li&gt;
  &lt;li&gt;\(C_t\) is the result of applying the forget gate, \(f\), which regulates
what is forgotten and what remains. The result of applying \(f\), element
wise, is summed with the update gate, \(u\), applied to \(X’\) elementwise.
The internal state of the cell is updated in this manner.&lt;/li&gt;
  &lt;li&gt;Note that the sigmoid non-linearity, \(\sigma\), applied in the forget,
update, and result gate, squashes the result between 0 and 1, thus acting as a
discount factor.&lt;/li&gt;
  &lt;li&gt;Another point raised is the reason we apply the \(\tanh\) non-linear
function. Note that \(C_t\) is an accumulating sum and thus can
easily grow without bound. By applying \(\tanh\) the
result is squeezed between -1 and 1, thus helping to mitigate the risk of
divergence.&lt;/li&gt;
&lt;/ul&gt;
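&lt;p&gt;The eight steps above can be sketched directly with NumPy. The weights below are
random and the sizes made up; the softmax output layer (step 8) is omitted to keep
the focus on the cell itself.&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p, n = 4, 3                                  # made-up input / state dimensions
rng = np.random.default_rng(0)
Wf, Wu, Wr, Wc = [rng.normal(size=(p + n, n)) for _ in range(4)]
bf, bu, br, bc = [np.zeros(n) for _ in range(4)]

def lstm_step(x_t, h_prev, c_prev):
    X = np.concatenate([x_t, h_prev])        # 1. concatenate X_t | H_{t-1}
    f = sigmoid(X @ Wf + bf)                 # 2. forget gate
    u = sigmoid(X @ Wu + bu)                 # 3. update gate
    r = sigmoid(X @ Wr + br)                 # 4. result gate
    Xp = np.tanh(X @ Wc + bc)                # 5. candidate input X'
    c_t = f * c_prev + u * Xp                # 6. new cell state C_t
    h_t = r * np.tanh(c_t)                   # 7. new hidden state H_t
    return h_t, c_t

h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=p), h, c)
```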

&lt;p&gt;Keep in mind that this is one possible structure for an LSTM cell…the
possible variations are limitless, though in practice the difference in performance across
cell structures appears limited.&lt;/p&gt;

</description>
        <pubDate>Mon, 16 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2017/01/16/lstm-breakdown.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2017/01/16/lstm-breakdown.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Gradient descent with a dash of Linear Algebra #4</title>
        <description>&lt;p&gt;This is #4 of a series where I am trying to digest the
&lt;a href=&quot;http://www.deeplearningbook.org/slides/sgd_and_cost_structure.pdf&quot;&gt;presentation&lt;/a&gt;,
by Ian Goodfellow.&lt;/p&gt;

&lt;p&gt;In the last post, we assessed the impact of a gradient step on the cost
function, \(J(\theta)\). Now we consider the types of destinations that the
method will encounter as the gradient steps progress.&lt;/p&gt;

&lt;p&gt;In the following slides he talks about critical points. From single-variable
calculus, we know that if a function, \(f\), is differentiable and \(x\) is an optimum,
then \(f'(x) = 0\), and \(x\) is a critical point associated with the
function, \(f\).&lt;/p&gt;

&lt;p&gt;Since the gradient is the first derivative, and a zero gradient implies a
critical point, with the Hessian matrix, \(\textbf{H}\), in hand we can classify
the optima as a minima, maxima, or saddle point.&lt;/p&gt;

&lt;p&gt;If all the eigenvalues, \(\lambda_i &amp;gt; 0\), then the critical point is a minimum,
which he shorthands as \(\lambda_{min} &amp;gt; 0\). If all eigenvalues, \(\lambda_i &amp;lt;
0\), then the critical point is a maximum, while if mixed it is classified as a saddle
point.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Is there a way to back out the same information without calculating eigenvalues?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If we take advantage of a couple properties we can.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;\(\det(H) = \lambda_1\lambda_2…\lambda_n\)&lt;/li&gt;
  &lt;li&gt;\(\text{tr}(H) = \lambda_1 + \lambda_2 +…+\lambda_n\)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;where \(\det(H)\) can be found without calculating eigenvalues, and
\(\text{tr}(H) = \sum_{i=1}^nh_{ii}\), where \(h_{ii}\) represents the components of
\(\textbf{H}\)
along the diagonal.&lt;/p&gt;

&lt;p&gt;Thus, in the two-variable case (\(H \in \mathbf{R}^{2\times2}\)), we can use the commonly defined rules as follows:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;if \(\det(H) &amp;gt; 0 \wedge \text{tr}(H) &amp;gt; 0 \rightarrow \text{Minima}\)&lt;/li&gt;
  &lt;li&gt;if \(\det(H) &amp;gt; 0 \wedge \text{tr}(H) &amp;lt; 0 \rightarrow \text{Maxima}\)&lt;/li&gt;
  &lt;li&gt;else \(\rightarrow \text{Saddle point}\)&lt;/li&gt;
&lt;/ol&gt;
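&lt;p&gt;A minimal sketch of this det/trace classification for a \(2\times2\) Hessian,
with made-up example matrices:&lt;/p&gt;

```python
import numpy as np

def classify(H):
    # det and trace together recover the eigenvalue signs (2x2 case)
    det, tr = np.linalg.det(H), np.trace(H)
    if det > 0:
        # eigenvalues share a sign; the trace decides which
        return "minimum" if tr > 0 else "maximum"
    return "saddle point"                    # mixed-sign eigenvalues

H_min = np.array([[2.0, 0.0], [0.0, 1.0]])   # eigenvalues 2, 1
H_max = np.array([[-2.0, 0.0], [0.0, -1.0]]) # eigenvalues -2, -1
H_sad = np.array([[1.0, 0.0], [0.0, -3.0]])  # eigenvalues 1, -3
```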

&lt;p&gt;and we are done…for now.&lt;/p&gt;
</description>
        <pubDate>Mon, 16 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/math/2017/01/16/gradient-4.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/math/2017/01/16/gradient-4.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>math</category>
        
      </item>
    
      <item>
        <title>Rust borrowing #2</title>
        <description>&lt;p&gt;Following on in my dive into the concept of ownership in Rust, will assess the
idea of references and borrowing. I will follow the documentation at
&lt;a href=&quot;https://doc.rust-lang.org/book/references-and-borrowing.html&quot;&gt;rust-lang.org&lt;/a&gt;. The tutorial breaks down the concept in three chunks:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Ownership (Covered in the previous post)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;References and Borrowing&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Lifetimes&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post will summarize my general understanding of borrowing.&lt;/p&gt;

&lt;p&gt;tl;dr: The concept of borrowing prevents iterator invalidation and use after
free, helping to create a safer environment. If you want to mutate a passed
variable, you need to utilize a mutable reference.&lt;/p&gt;

&lt;h1 id=&quot;point-1&quot;&gt;Point #1:&lt;/h1&gt;
&lt;p&gt;Essentially, under references and borrowing, a resource is lent by the owner and
“borrowed” by the function.&lt;/p&gt;

&lt;p&gt;For example, consider the following, given in the documentation:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    
    &lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;     &lt;span class=&quot;c&quot;&gt;//v1 has ownership&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;     &lt;span class=&quot;c&quot;&gt;//v2 has ownership&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;answer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;//ownership is lent to foo&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;//foo consumes and returns the borrowed resource.&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;       &lt;span class=&quot;c&quot;&gt;//This is ok! as ownership has been returned to v1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h1 id=&quot;point-2&quot;&gt;Point #2&lt;/h1&gt;
&lt;p&gt;References are immutable. Thus a variable passed by reference, let’s say v1 in
the above case, cannot be amended within foo. Doing so will give an error.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.push&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;      &lt;span class=&quot;c&quot;&gt;//illegal operation, v is borrowed, cant mutate.&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[];&lt;/span&gt;     &lt;span class=&quot;c&quot;&gt;//v has ownership&lt;/span&gt;

&lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;            &lt;span class=&quot;c&quot;&gt;//foo borrows&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h1 id=&quot;point-3&quot;&gt;Point #3&lt;/h1&gt;
&lt;p&gt;If we want to mutate a borrowed variable, we need to use a “mutable reference”. The example
given is as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;      &lt;span class=&quot;c&quot;&gt;//x has ownership and x is mutable.&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;//y borrows a mutable reference.&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;        &lt;span class=&quot;c&quot;&gt;//y can now mutate x, but needs to use '*' operator&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;//x returns 6 &lt;/span&gt;
    &lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The rules defined in the documentation:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;First, any borrow must last for a scope no greater than that of the owner.
Second, you may have one or the other of these two kinds of borrows, but not
both at the same time. 1) one or more references (&amp;amp;T) to a resource, 2) exactly
one mutable reference (&amp;amp;mut T).&lt;/p&gt;
&lt;/blockquote&gt;

</description>
        <pubDate>Sat, 14 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rust/2017/01/14/rust-borrowing.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rust/2017/01/14/rust-borrowing.html</guid>
        
        
        <category>rust</category>
        
      </item>
    
      <item>
        <title>Rust ownership #1</title>
        <description>&lt;p&gt;I will follow the documentation at
&lt;a href=&quot;https://doc.rust-lang.org/book/ownership.html&quot;&gt;rust-lang.org&lt;/a&gt; to get a better
grip on ownership. The tutorial breaks down the concept in three chunks:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Ownership&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;References and Borrowing&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Lifetimes&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post will summarize my general understanding of ownership.&lt;/p&gt;

&lt;p&gt;tl;dr: Keep in mind the ownership concept when reassigning variables with
existing bindings. If the type is not a primitive, an error will occur at compile time.&lt;/p&gt;

&lt;h1 id=&quot;point-1&quot;&gt;Point #1:&lt;/h1&gt;
&lt;p&gt;Basically, variable bindings have ownership of what the variable is bound
to. The example given is:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Ok, to have “ownership” means that v owns vec![1,2,3]. Thus if we set a new
variable, y = v, then v no longer has ownership, and any later use of v will
result in a compile-time error. This reflects Rust’s guarantee that bindings to
resources are unique.&lt;/p&gt;

&lt;p&gt;This is not allowed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The phrase used is “move”: the error will refer to the attempted use of the
“moved” value v.&lt;/p&gt;

&lt;h1 id=&quot;point-2&quot;&gt;Point #2:&lt;/h1&gt;
&lt;p&gt;The trait, Copy, is introduced, where a trait is a feature that adds
incremental behavior.&lt;/p&gt;

&lt;p&gt;The basic concept is that if it is a primitive type that is bound, then the
Copy trait comes into play and we can reassign ownership and still address the
original variable.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Ok, this is fine now and will compile, as 5 is a primitive. Essentially the
Copy trait invokes a copy of the value, so y does not reference the data bound
to v.&lt;/p&gt;
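As an aside (my own addition, not from the tutorial): the move in the earlier vec example can be sidestepped by borrowing v instead of transferring ownership, or by cloning when an independent copy is needed. A minimal sketch:

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Borrow: y is a reference, so v keeps ownership.
    let y = &v;
    println!("{} {}", v[0], y[0]);

    // Clone: z owns an independent copy of the data.
    let z = v.clone();
    println!("{} {}", v[0], z[0]);
}
```

Both lines compile because v is never moved out of.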

</description>
        <pubDate>Thu, 12 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rust/2017/01/12/rust-ownership.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rust/2017/01/12/rust-ownership.html</guid>
        
        
        <category>rust</category>
        
      </item>
    
      <item>
        <title>Gradient descent with a dash of Linear Algebra #3</title>
        <description>&lt;p&gt;This is #3 of a series where I am trying to digest the
&lt;a href=&quot;http://www.deeplearningbook.org/slides/sgd_and_cost_structure.pdf&quot;&gt;presentation&lt;/a&gt;
by Ian Goodfellow.&lt;/p&gt;

&lt;p&gt;In the last post, we broke down and assessed the Hessian matrix, \(H\). We
continue on in this post by analyzing the impact of a gradient step on the cost
function and finding the optimal gradient step with respect to cost reduction.&lt;/p&gt;

&lt;p&gt;This is followed by the introduction of the Taylor series approximation, and the
expansion of the cost function, \(J(\theta)\). The Taylor series is typically defined
as \[\sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n\]
and expanded would look like: \[f(x)= f(a) + f'(a)(x-a) +
\frac{f''(a)}{2!}(x-a)^2+\dots\]&lt;/p&gt;

&lt;p&gt;When applied to the cost function case, keep in mind that
\((\vec{\theta}-\vec{\theta}_0)\cdot(\vec{\theta}-\vec{\theta}_0)=(\vec{\theta}-\vec{\theta}_0)^T(\vec{\theta}-\vec{\theta}_0)\).
Note that I use the vector symbol to make it clear that we are dealing with
vectors. This is applicable to all \(\theta\) when referring generally to
parameters.&lt;/p&gt;

&lt;p&gt;\[J(\theta) = J(\theta_0) +
(\theta-\theta_0)^T\textbf{g}+\frac{1}{2}(\theta-\theta_0)^T\textbf{H}(\theta-\theta_0)+…\]&lt;/p&gt;

&lt;p&gt;Specifically, Ian approximates the cost function, \(J(\theta)\), out to the 2nd
order to assess the sensitivity of the cost to a gradient step,
resulting in: \[J(\theta-\epsilon \textbf{g}) \approx J(\theta) -\epsilon
\textbf{g}^T\textbf{g} +
\frac{1}{2}\epsilon^2\textbf{g}^T\textbf{H}\textbf{g}\]&lt;/p&gt;

&lt;p&gt;For easier comprehension, we can move the minus sign out:
\[J(\theta-\epsilon \textbf{g}) \approx J(\theta) -(\epsilon
\textbf{g}^T\textbf{g} -
\frac{1}{2}\epsilon^2\textbf{g}^T\textbf{H}\textbf{g})\]&lt;/p&gt;

&lt;p&gt;and break down the equation as:
\[\text{new } J(\theta)  \approx \text{old } J(\theta) - \text{adjustment to cost
as a result of gradient step } \epsilon\]&lt;/p&gt;

&lt;p&gt;Now if \(\textbf{g}^T\textbf{H}\textbf{g} \leq 0\), it is guaranteed that the term
\((\epsilon \textbf{g}^T\textbf{g} -
\frac{1}{2}\epsilon^2\textbf{g}^T\textbf{H}\textbf{g})\) is positive, thus we are
reducing the old \(J(\theta)\) by some amount.&lt;/p&gt;

&lt;p&gt;So the question is: what is the optimal step size, the \(\epsilon\) that results
in the largest cost reduction? We can take the derivative with respect to
\(\epsilon\), set it to 0, and solve for \(\epsilon\).
\[\frac{\partial}{\partial \epsilon}J(\theta-\epsilon\textbf{g}) =-(\textbf{g}^T\textbf{g} -
\epsilon\textbf{g}^T\textbf{H}\textbf{g})=0\]
\[\epsilon^* =
\frac{\textbf{g}^T\textbf{g}}{\textbf{g}^T\textbf{H}\textbf{g}}\]&lt;/p&gt;
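To make the formula concrete, here is a small numerical sanity check (my own addition, not from the slides); the gradient value is made up, and the Hessian is the \(2\times2\) example worked out in post #1:

```rust
// Compute epsilon* = (g^T g) / (g^T H g) for a 2x2 Hessian.
fn optimal_step(g: [f64; 2], h: [[f64; 2]; 2]) -> f64 {
    let gtg: f64 = g.iter().map(|x| x * x).sum();
    // g^T H g = sum over i, j of g_i * H_ij * g_j
    let mut gthg = 0.0;
    for i in 0..2 {
        for j in 0..2 {
            gthg += g[i] * h[i][j] * g[j];
        }
    }
    gtg / gthg
}

fn main() {
    let g = [1.0, 2.0]; // hypothetical gradient
    let h = [[2.0, 1.0], [1.0, 0.0]]; // Hessian from post #1
    // g^T g = 5, g^T H g = 6, so epsilon* = 5/6
    println!("epsilon* = {}", optimal_step(g, h));
}
```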

&lt;p&gt;He also touches upon the worst case scenario, when \(\textbf{g}\) aligns
with the eigenvector of \(\lambda_{max}\). This goes back to the previous post:
if the gradient and the eigenvector run parallel, the angle between them is 0,
thus \(\cos^2(0) = 1\), and we are left with \(\lambda_{max}\) without any
discounting. The “adjustment to cost as a result of gradient step” is then the
smallest when \(\textbf{g}\) aligns with the eigenvector of \(\lambda_{max}\),
and the adjustment reduces to: \[(\epsilon -
\frac{1}{2}\epsilon^2\lambda_{max})\textbf{g}^T\textbf{g}\]&lt;/p&gt;

&lt;p&gt;Will continue in the next post… all mistakes are mine; if you find any, please
point them out and I will amend.&lt;/p&gt;

</description>
        <pubDate>Thu, 12 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/math/2017/01/12/gradient-3.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/math/2017/01/12/gradient-3.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>math</category>
        
      </item>
    
      <item>
        <title>Gradient descent with a dash of Linear Algebra #2</title>
        <description>&lt;p&gt;This is #2 of a series where I am trying to digest the
&lt;a href=&quot;http://www.deeplearningbook.org/slides/sgd_and_cost_structure.pdf&quot;&gt;presentation&lt;/a&gt;
by Ian Goodfellow.&lt;/p&gt;

&lt;p&gt;We left off, having defined the cost function, \(J(\theta)\), the gradient,
\(\nabla_{\theta}J(\theta)\), and the Hessian, \(\textbf{H}\).&lt;/p&gt;

&lt;p&gt;\(\textbf{H}\) can be diagonalized, as \(\textbf{H}\) is symmetric (by the
spectral theorem; no invertibility assumption is needed).
\[\textbf{H}=\textbf{Q}\Lambda\textbf{Q}^T\]
where \(\textbf{Q}\) is the matrix of eigenvectors, \(v_i\), and \(\Lambda\) is the
diagonal matrix containing the eigenvalues, \(\lambda_i\). Thus, for example, for a
\(2\times2\) Hessian matrix with unique eigenvalues, \(\lambda_1,\lambda_2\), the
expansion would be:&lt;/p&gt;

\[\textbf{H}=\begin{bmatrix}v_1 &amp;amp; v_2 \end{bmatrix}\begin{bmatrix}\lambda_1 &amp;amp; 0
\\ 0 &amp;amp; \lambda_2 \end{bmatrix}\begin{bmatrix}v_1^T \\ v_2^T \end{bmatrix}\]

&lt;p&gt;Further, he defines \(\textbf{H}\) applied in the direction \(d\) as
\(d^T\textbf{H}d=\sum_i\lambda_i\cos^2(\theta_i)\), where \(\theta_i\) in this case is
defined as the angle between the eigenvector, \(v_i\), and the direction vector
\(d\).
\(\textit{This is important, as it is used in the simplification to follow}\). Just
remember that if the directions are aligned and \(v_i\) is a scalar multiple of
\(d\), the vectors run parallel, thus the angle between them is \(0\), and
\(\cos^2(\theta_i) = 1\).&lt;/p&gt;
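To see the identity \(d^T\textbf{H}d=\sum_i\lambda_i\cos^2(\theta_i)\) in action, here is a small numerical check (my own addition) using the symmetric \(2\times2\) matrix \([[2,1],[1,0]]\) from post #1, whose eigenvalues work out to \(1\pm\sqrt{2}\):

```rust
// Verify d^T H d = sum_i lambda_i cos^2(theta_i) for a unit vector d,
// with H = [[2, 1], [1, 0]] (eigenvalues 1 + sqrt(2) and 1 - sqrt(2)).
fn quad_form(h: [[f64; 2]; 2], d: [f64; 2]) -> f64 {
    d[0] * (h[0][0] * d[0] + h[0][1] * d[1]) + d[1] * (h[1][0] * d[0] + h[1][1] * d[1])
}

fn eigen_sum(d: [f64; 2]) -> f64 {
    let lambdas = [1.0 + 2.0_f64.sqrt(), 1.0 - 2.0_f64.sqrt()];
    lambdas
        .iter()
        .map(|&l| {
            // eigenvector for eigenvalue l is (1, l - 2), normalized
            let norm = (1.0 + (l - 2.0) * (l - 2.0)).sqrt();
            let v = [1.0 / norm, (l - 2.0) / norm];
            // cos(theta) = v . d, since both vectors are unit length
            let cos = v[0] * d[0] + v[1] * d[1];
            l * cos * cos
        })
        .sum()
}

fn main() {
    let d = [0.6, 0.8]; // a unit vector
    println!("d^T H d  = {}", quad_form([[2.0, 1.0], [1.0, 0.0]], d));
    println!("sum form = {}", eigen_sum(d));
}
```

The two printed values agree, which is exactly what the simplification in the next post relies on.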

&lt;p&gt;Will continue in the next post… all mistakes are mine; if you find any, please
point them out and I will amend.&lt;/p&gt;

</description>
        <pubDate>Wed, 11 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/math/2017/01/11/gradient-2.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/math/2017/01/11/gradient-2.html</guid>
        
        
        <category>deeplearning</category>
        
        <category>math</category>
        
      </item>
    
      <item>
        <title>Gradient descent with a dash of Linear Algebra #1</title>
        <description>&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/deeplearning/2017/01/10/gradient-1.html&quot;&gt;Gradient descent with a dash of Linear Algebra #1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/deeplearning/math/2017/01/11/gradient-2.html&quot;&gt;Gradient descent with a dash of Linear Algebra #2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/deeplearning/math/2017/01/12/gradient-3.html&quot;&gt;Gradient descent with a dash of Linear Algebra #3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://surfertas.github.io/deeplearning/math/2017/01/16/gradient-4.html&quot;&gt;Gradient descent with a dash of Linear Algebra #4&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Updates:
Dec/10/2019: added links and fixed grammar.&lt;/p&gt;

&lt;p&gt;In this series of posts I am trying to digest the presentation by Ian Goodfellow
found in the deep learning resource &lt;a href=&quot;http://www.deeplearningbook.org/slides/sgd_and_cost_structure.pdf&quot;&gt;deep learning
book&lt;/a&gt;, as it is sparse in
details (which I am sure were expanded on in the live session) and dense in
mathematical notation, at least from the perspective of a non-math person.
The purpose of this post is to break down the math and reinforce my understanding.&lt;/p&gt;

&lt;p&gt;Ian starts by introducing the cost function, \(J(\theta)\), that we want to
minimize; the “gradient”, \(g=\nabla_{\theta}J(\theta)\), which is basically
a column vector of partial derivatives with respect to each
parameter, \(\frac{\partial J(\theta)}{\partial \theta_i}\), applied to
\(\vec{\theta}\); and \(\textbf{H}\), the Hessian matrix, which holds the partial
derivatives of each component of the gradient, resulting in an
\(i\times j\) matrix.&lt;/p&gt;

&lt;p&gt;A quick example to move away from abstraction and to crystallize what we are
dealing with… Consider a hypothetical cost function \(J(\theta_1,\theta_2) =
\theta_1^2 + \theta_1\theta_2\). In this case the gradient vector would be
\[\begin{bmatrix}\frac{\partial J(\theta)}{\partial \theta_1} &amp;amp;   \frac{\partial
J(\theta)}{\partial \theta_2}\end{bmatrix}=\begin{bmatrix}2\theta_1+\theta_2 &amp;amp;
\theta_1\end{bmatrix}\]
The Hessian matrix, \(\textbf{H}\), in this case would be:&lt;/p&gt;

\[\begin{bmatrix} \frac{\partial}{\partial\theta_1}\frac{\partial}{\partial
\theta_1}J(\theta) &amp;amp; \frac{\partial}{\partial \theta_1}\frac{\partial}{\partial
\theta_2}J(\theta)\\ \frac{\partial}{\partial\theta_2}\frac{\partial}{\partial
\theta_1}J(\theta) &amp;amp; \frac{\partial}{\partial \theta_2}\frac{\partial}{\partial
\theta_2}J(\theta) \end{bmatrix}= \begin{bmatrix}2 &amp;amp; 1\\ 1 &amp;amp; 0 \end{bmatrix}\]
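The worked example can be verified numerically; below is a small sketch (my own addition) that recovers the analytic gradient \(\begin{bmatrix}2\theta_1+\theta_2 &amp;amp; \theta_1\end{bmatrix}\) by central differences:

```rust
// Numerical check of the worked example: J(t1, t2) = t1^2 + t1*t2.
// Central differences should recover the analytic gradient [2*t1 + t2, t1].
fn j(t: [f64; 2]) -> f64 {
    t[0] * t[0] + t[0] * t[1]
}

fn gradient(t: [f64; 2]) -> [f64; 2] {
    let h = 1e-5;
    let mut g = [0.0; 2];
    for i in 0..2 {
        let (mut up, mut dn) = (t, t);
        up[i] += h;
        dn[i] -= h;
        // central difference: (J(t + h e_i) - J(t - h e_i)) / (2h)
        g[i] = (j(up) - j(dn)) / (2.0 * h);
    }
    g
}

fn main() {
    let t = [3.0, 1.0];
    let g = gradient(t);
    // analytic: [2*3 + 1, 3] = [7, 3]
    println!("numeric gradient at (3, 1): [{:.4}, {:.4}]", g[0], g[1]);
}
```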

&lt;p&gt;Ok… one post to handle one slide… this may take a series of posts…&lt;/p&gt;
</description>
        <pubDate>Tue, 10 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2017/01/10/gradient-1.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2017/01/10/gradient-1.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>Select sort in Rust</title>
        <description>&lt;p&gt;A first pass at using Rust: an implementation of the selection sort algorithm…&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;70&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arr_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arr_len&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sm_ind&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arr_len&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.lt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sm_ind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sm_ind&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.swap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sm_ind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arr_len&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nd&quot;&gt;print!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{} &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A few things learned in this exercise:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Need to use mut ahead of the variable name, as declarations are
immutable by default.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Type checking happens at compile time; a lot of my errors were the
result of mismatched types.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Feels like I just translated an implementation done in C, thus I would be
interested in seeing what a veteran &lt;a href=&quot;http://www.rustaceans.org/&quot;&gt;Rustacean&lt;/a&gt; implementation would look like.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
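On point 3, a more iterator-flavored sketch of selection sort (my own take, not a canonical "veteran" implementation) might look like:

```rust
// Selection sort over a mutable slice, using iterator adapters to find
// the position of the smallest element in the unsorted tail.
fn selection_sort(xs: &mut [i32]) {
    for i in 0..xs.len() {
        // index (relative to i) of the smallest element in xs[i..];
        // the tail is never empty inside the loop, so unwrap is safe
        let min_rel = xs[i..]
            .iter()
            .enumerate()
            .min_by_key(|&(_, v)| v)
            .map(|(j, _)| j)
            .unwrap();
        xs.swap(i, i + min_rel);
    }
}

fn main() {
    let mut xs = [50, 20, 70, 10, 60, 30];
    selection_sort(&mut xs);
    println!("{:?}", xs);
}
```

Taking a `&mut [i32]` slice rather than a fixed-size array also makes the function reusable for vectors.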

</description>
        <pubDate>Mon, 09 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/rust/algorithms/2017/01/09/rust-practice.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/rust/algorithms/2017/01/09/rust-practice.html</guid>
        
        
        <category>rust</category>
        
        <category>algorithms</category>
        
      </item>
    
      <item>
        <title>LSTM with dropout</title>
        <description>&lt;p&gt;Working through a &lt;a href=&quot;https://www.tensorflow.org/tutorials/recurrent/&quot;&gt;tf
tutorial&lt;/a&gt;, the authors
introduce a paper by &lt;a href=&quot;https://arxiv.org/abs/1409.2329&quot;&gt;Zaremba et al. 2014&lt;/a&gt;,
which addresses the issue of applying dropout as a form of regularization to
RNNs. A naive application results in unsatisfactory outcomes, thus they propose
applying dropout only to the non-recurrent inputs, \(h_t^{l-1}\),
where an LSTM is defined as:
\[\textbf{LSTM}: h_t^{l-1}, h_{t-1}^{l}, c_{t-1}^l \rightarrow h_t^l,c_t^l\]
\(h_{t-1}^{l}\) is the recurrent input, the current layer, \(l\), from the
previous time step, and \(c_{t-1}^l\) is the memory cell from the previous time step.&lt;/p&gt;

&lt;p&gt;A graphical representation introduced by the paper:
&lt;img src=&quot;/static/img/posts/lstm.png&quot; alt=&quot;LSTM&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The main contribution was applying the dropout function, \(D\), to the
non-recurrent input, thus \(D(h_t^{l-1})\).&lt;/p&gt;

&lt;p&gt;This comment stuck with me and basically sums it up nicely.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Standard dropout perturbs the recurrent connections, which makes it difficult
for LSTM to learn to store information for long periods of time. By not using
dropout on the recurrent connections, the LSTM can benefit from dropout
regularization without sacrificing its valuable memorization ability.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wishing that I had a few LSTMs to embed into my brain…&lt;/p&gt;

</description>
        <pubDate>Mon, 09 Jan 2017 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/deeplearning/2017/01/09/lstm-dropout.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/deeplearning/2017/01/09/lstm-dropout.html</guid>
        
        
        <category>deeplearning</category>
        
      </item>
    
      <item>
        <title>The Random Walker</title>
        <description>&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;In the following exercise, a simple random walk in one dimension will be defined
and assessed. An analytical approach is formulated and a solution presented.
Specifically, a random walk with a partially reflecting barrier at \(0\) will be considered,
with an empirical comparison suggesting convergence to the analytical
solution as a large number of trials is considered.&lt;/p&gt;

&lt;h3 id=&quot;introduction&quot;&gt;Introduction&lt;/h3&gt;
&lt;p&gt;Random walks are a well-studied subject with applications arising across
many fields. Random walks are commonly utilized to model field-specific
behaviours, despite the phenomena appearing unique to the particular field. A
typical random walk is a &lt;a href=&quot;http://www.stat.berkeley.edu/~mgoldman/Section0227.pdf&quot;&gt;special kind of Markov
chain&lt;/a&gt;, where the states are all
integers, the largest move per transition is one step in either direction, and there
is no probability assigned to remaining in the current state.&lt;/p&gt;

&lt;h3 id=&quot;simple-random-walk&quot;&gt;Simple Random Walk&lt;/h3&gt;
&lt;p&gt;A random walk is commonly defined as a stochastic sequence given by&lt;/p&gt;

&lt;p&gt;\[S_n = i + \sum_{k=1}^n X_k \ \text{s.t.} \ (n \in \Bbb{N}) \]&lt;/p&gt;

&lt;p&gt;where \(\{S_n\}\) is a stochastic sequence and \(\{X_k\}\) are \(i.i.d\) random
variables. Specifically, a simple random walk is defined as a sequence of \(X_k\)
where the random variables take a value of \(1\) or \(-1\) with probability \(p \in
[0,1]\) and \(1-p\), respectively. Let \(S = \{S_0,S_1,S_2…\}\) be the partial sum
process associated with \(X\). The sequence \(S\) is the simple random walk with
parameter \(p\). Further context can be found &lt;a href=&quot;http://www.math.uah.edu/stat/bernoulli/Walk.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Further, a simple random walk is considered symmetric if \(Pr(X_k=1) =
Pr(X_k=-1)\) in the one-dimensional case.&lt;/p&gt;

&lt;h3 id=&quot;random-walks-with-additional-properties&quot;&gt;Random walks with additional properties&lt;/h3&gt;
&lt;p&gt;Random walks with absorbing barriers are commonly studied in applications such as
the “Gambler’s ruin” or the “Monkey at the cliff” problems. Consider
a discrete random walk confined to a range \([a,b]\) with absorbing barriers at \(a\)
and \(b\). In this example, \(a\) and \(b\) are considered absorbing states,
characterized by the walk ending when either state is reached.&lt;/p&gt;

&lt;p&gt;When considering reflecting barriers, saying \(a\) is a reflecting barrier means
that as soon as the walk reaches \(a\), on the next step the walk returns with
probability \(1\).&lt;/p&gt;

&lt;p&gt;In the following example, a discrete random walk with a partially reflecting
barrier at 0 and an absorbing barrier at a user-defined \(n\) will be considered.
Specifically, if \(x = 0\), the probability we remain in the current state is \(Pr(x=0) = \frac{1}{2} \)
and \(Pr(x=1) = \frac{1}{2}\); otherwise \(x=k\) and \(Pr(x=k+1) = \frac{1}{2}\)
and \(Pr(x=k-1) = \frac{1}{2}\).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The question we look to answer is: what is the expected number of steps, \(E_i\),
to reach \(n\) from \(i\), considering the prior mentioned probabilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A recurrence can be set up to solve for \(E_i\).&lt;/p&gt;

&lt;p&gt;Initial values:&lt;/p&gt;

&lt;p&gt;\[E_n = 0 \tag{1} \]&lt;/p&gt;

&lt;p&gt;\[E_0 = 1 + 0.5E_0 + 0.5E_1 \tag{2} \]&lt;/p&gt;

&lt;p&gt;\[E_i = 1 + 0.5E_{i-1} + 0.5E_{i+1} \tag{3} \]&lt;/p&gt;

&lt;p&gt;Note that \((3)\) simplifies to \(E_{i+1} = 2E_i - E_{i-1} - 2\) with the associated
characteristic equation defined as \(x^2-2x+1\). Solving for the roots results in
a double root at \(1\), thus the homogeneous solution has the form \(E_i = a + bi\). In
order to find the associated particular solution use
\[E_i = c + di + ei^2 \tag{4} \]&lt;/p&gt;

&lt;p&gt;Plugging \((4)\) into \((3)\), we solve for \(e = -1\), and setting \(c,d=0\), we find that
\(E_i = -i^2\). Thus the general solution is defined as&lt;/p&gt;

&lt;p&gt;\[E_i = a + bi - i^2 \tag{5} \]&lt;/p&gt;

&lt;p&gt;Finally, using \((1)\) and \((5)\), we find that \(0 = a+bn-n^2\), and using \((2)\) and \((5)\) we
find that \(b = -1\), which is followed by \(a = n + n^2\). Thus generally, the
expected number of steps, \(E_i\), to reach \(n\) from \(i\) is given by&lt;/p&gt;

&lt;p&gt;\[E_i = n + n^2 - i - i^2 \tag{6} \]&lt;/p&gt;
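The result in \((6)\) can be sanity-checked empirically; below is a small Monte Carlo sketch (my own addition, in Rust rather than Elm, with a hand-rolled xorshift PRNG so the example needs no external crates):

```rust
// Monte Carlo check of (6): E_i = n + n^2 - i - i^2, for a walk with a
// partially reflecting barrier at 0 and an absorbing barrier at n.
struct Rng(u64);

impl Rng {
    fn next_bool(&mut self) -> bool {
        // xorshift64: adequate for a fair coin flip here
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0 & 1 == 1
    }
}

fn walk_steps(n: i64, start: i64, rng: &mut Rng) -> u64 {
    let mut x = start;
    let mut steps = 0;
    while x < n {
        steps += 1;
        if x == 0 {
            // partially reflecting: stay at 0 or step to 1, each with prob 1/2
            if rng.next_bool() {
                x = 1;
            }
        } else if rng.next_bool() {
            x += 1;
        } else {
            x -= 1;
        }
    }
    steps
}

fn main() {
    let (n, i, trials) = (10i64, 5i64, 100_000u64);
    let mut rng = Rng(0x2545F4914F6CDD1D);
    let total: u64 = (0..trials).map(|_| walk_steps(n, i, &mut rng)).sum();
    let empirical = total as f64 / trials as f64;
    let analytic = (n + n * n - i - i * i) as f64; // 10 + 100 - 5 - 25 = 80
    println!("empirical: {:.2}, analytic: {}", empirical, analytic);
}
```

With \(n = 10\), \(i = 5\), the empirical mean settles near the analytic value of 80 as trials grow, mirroring the convergence observed in the web application.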

&lt;h3 id=&quot;a-random-walk-with-partially-reflecting-barrier-and-absorbing-barrier&quot;&gt;A random walk with partially reflecting barrier and absorbing barrier&lt;/h3&gt;
&lt;p&gt;In the below application, the analytical solution to the random walk with a
partially reflecting barrier at \(0\) and absorbing barrier at \(n\) is calculated and
compared to an empirical solution: the mean derived from a user-defined
number of trials. The accepted \(n\) values are restricted to \(n &amp;lt; 30\) and the
difference between \(n\) and \(i\) is restricted to \(n-i &amp;lt; 10\) for practical
reasons. The application is initialized as follows: \(trials = 100\), \(n =
10\), \(i = 5 \).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/2016/05/10/app-randomwalk.html&quot;&gt;RandomWalker&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The web application was constructed using HTML, CSS, and JavaScript, with the
bulk of the application written in Elm. The Github repository can be found
&lt;a href=&quot;https://github.com/liana1215/randomwalk&quot;&gt;here&lt;/a&gt;. All feedback will be
appreciated!&lt;/p&gt;
</description>
        <pubDate>Tue, 10 May 2016 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/2016/05/10/randomwalk.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/2016/05/10/randomwalk.html</guid>
        
        
      </item>
    
      <item>
        <title>Random Walk</title>
        <description>&lt;html&gt;
&lt;head&gt;
  &lt;meta charset=&quot;UTF-8&quot; /&gt;
  &lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/randombundle.js&quot;&gt;&lt;/script&gt;
  &lt;link rel=&quot;stylesheet&quot; href=&quot;https://surfertas.github.io/assets/css/style.css&quot; /&gt;
&lt;/head&gt;
&lt;!--test--&gt;
&lt;body&gt;
    &lt;div id=&quot;container&quot;&gt;
    &lt;h1&gt; Random Walk with Reflecting Boundaries &lt;/h1&gt;
        &lt;form onsubmit=&quot;return sendComment(this);&quot;&gt;
            &lt;input placeholder=&quot;trials&quot; id=&quot;trials&quot; type=&quot;number&quot; required=&quot;&quot; /&gt;
            &lt;input placeholder=&quot;n&quot; id=&quot;n&quot; type=&quot;number&quot; required=&quot;&quot; /&gt;
            &lt;input placeholder=&quot;i&quot; id=&quot;i&quot; type=&quot;number&quot; required=&quot;&quot; /&gt;
            &lt;input type=&quot;submit&quot; name=&quot;button&quot; id=&quot;button&quot; required=&quot;&quot; /&gt;
        &lt;/form&gt;
        &lt;p id=&quot;message&quot;&gt;&amp;nbsp;&lt;/p&gt;
      &lt;div style=&quot;height:1000px; width: 1000px;&quot; id=&quot;walker&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;

&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var walkerDiv = document.getElementById(&quot;walker&quot;);
    var app = Elm.embed(Elm.Walker, walkerDiv, {inputs: {n:0, i:0, trials:0}});
    
    function sendComment(form) {
        var n = parseInt(form.n.value)
        var i = parseInt(form.i.value)
        var trials = parseInt(form.trials.value)
        
        if (n &lt; i) {
            text = &quot;'n' needs to be greater than 'i'.&quot;
        } else {
            if (n &gt; 30) {
                text = &quot;'n' is too large, needs to be less than 30.&quot; 
            } else if (n - i &gt; 10) {
                text = &quot;'n' and 'i' difference needs to be &lt; 10.&quot;
            } else {
                app.ports.inputs.send({n: n, i: i, trials: trials});
                form.reset();
                text = &quot;&amp;nbsp;&quot;;
            }
        } 
        document.getElementById(&quot;message&quot;).innerHTML = text;
        return false;
    }
&lt;/script&gt;
&lt;/html&gt;

</description>
        <pubDate>Tue, 10 May 2016 00:00:00 +0000</pubDate>
        <link>https://surfertas.github.io/2016/05/10/app-randomwalk.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/2016/05/10/app-randomwalk.html</guid>
        
        
      </item>
    
      <item>
        <title>journey with elm</title>
        <description>&lt;p&gt;An attempt to log my journey through learning elm. I will work through tutorials
found on the net, tweaking each provided example so that I can
get a better grip on the material. The examples will be in reverse chronological
order, with the top displaying the most recent challenge.&lt;/p&gt;

&lt;p&gt;Tutorials:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://elm-by-example.org/&quot;&gt;elm-by-example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.classes.cs.uchicago.edu/archive/2015/winter/22300-1/Schedule.html&quot;&gt;cs223&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;p&gt;A random walk, continuing on with the use of the Random module…&lt;/p&gt;

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/randomwalk.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;randomball&quot; style=&quot;width: 60%; height: 500px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;randomball&quot;);

    Elm.embed(Elm.Randomwalk, div);

&lt;/script&gt;
&lt;/html&gt;

&lt;hr /&gt;

&lt;p&gt;Estimating pi…&lt;/p&gt;

&lt;p&gt;The approximation is based on the idea that:
\[\lim_{n \to \infty} \frac{t}{n} = \frac{Area \ of \ Circle \ Quad}{Area \ of \ Circumscribed \ Square \ Quad} = \frac{\frac{\pi r^2}{4}}{r^2} = \frac{\pi}{4}\]&lt;/p&gt;
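The limit above can also be cross-checked outside the browser; a minimal sketch (my own addition, in Rust rather than Elm, with a hand-rolled xorshift PRNG so it needs no external crates):

```rust
// Monte Carlo estimate of pi: sample points uniformly in the unit square
// and count those landing inside the quarter circle (t of n).
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn uniform01(state: &mut u64) -> f64 {
    // take the top 53 bits so the result fits a f64 mantissa
    (next_u64(state) >> 11) as f64 / (1u64 << 53) as f64
}

fn estimate_pi(n: u64, seed: u64) -> f64 {
    let mut state = seed;
    let mut t = 0u64;
    for _ in 0..n {
        let (x, y) = (uniform01(&mut state), uniform01(&mut state));
        if x * x + y * y <= 1.0 {
            t += 1;
        }
    }
    4.0 * t as f64 / n as f64
}

fn main() {
    println!("pi ~ {:.4}", estimate_pi(1_000_000, 0x9E3779B97F4A7C15));
}
```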

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/pi.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;pi&quot; style=&quot;width: 300px; height: 300px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;pi&quot;);

    Elm.embed(Elm.Pi, div);

&lt;/script&gt;
&lt;/html&gt;

&lt;hr /&gt;

&lt;p&gt;Admittedly, this one took a bit of brute force to get done. I constrained the
movement of the eyes by adjusting the relevant R values along the x- and y-axes.&lt;/p&gt;

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/daruma.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;daruma&quot; style=&quot;width: 60%; height: 300px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;daruma&quot;);

    Elm.embed(Elm.Daruma, div);

&lt;/script&gt;
&lt;/html&gt;

&lt;hr /&gt;
&lt;p&gt;Learned a bit about signals here, and put in some work to get the coordinates
to track mouse movement…&lt;/p&gt;

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/mousesignals.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;mouse&quot; style=&quot;width: 50%; height: 400px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;mouse&quot;);

    // embed our Elm program in that &lt;div&gt;
    Elm.embed(Elm.MouseSignals, div);

&lt;/script&gt;
&lt;/html&gt;

&lt;hr /&gt;

&lt;p&gt;Instead of calculating and displaying the Fibonacci sequence, I chose to
generate the sequence of perfect numbers up to 1000.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Perfect number, a positive integer that is equal to the sum of its proper
divisors. The smallest perfect number is 6, which is the sum of 1, 2, and 3.
Other perfect numbers are 28, 496, and 8,128. The discovery of such numbers is
lost in prehistory. It is known, however, that the Pythagoreans (founded c. 525
bc) studied perfect numbers for their “mystical” properties.
&lt;br /&gt;
-Encyclopedia Britannica&lt;/p&gt;
&lt;/blockquote&gt;
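&lt;p&gt;The definition translates directly into code; a brute-force Python sketch (illustrative only, not the Elm used in the demo):&lt;/p&gt;

```python
def proper_divisors(n):
    """Divisors of n strictly less than n."""
    return [d for d in range(1, n) if n % d == 0]

def perfect_numbers(limit):
    """All perfect numbers up to and including limit."""
    return [n for n in range(1, limit + 1) if sum(proper_divisors(n)) == n]

print(perfect_numbers(1000))  # [6, 28, 496]
```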

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/perfnumBars.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;bars&quot; style=&quot;width: 50%; height:500px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;bars&quot;);

    // embed our Elm program in that &lt;div&gt;
    Elm.embed(Elm.PerfnumBars, div);

&lt;/script&gt;
&lt;/html&gt;

&lt;p&gt;If anyone has suggestions on improving the properDivisors implementation,
please advise.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;&lt;span class=&quot;kr&quot;&gt;module&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;PerfnumBars&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;where&lt;/span&gt;

&lt;span class=&quot;kr&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exposing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lightBlue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lightGrey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lightPurple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;purple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;yellow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Graphics.Collage&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exposing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filled&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Graphics.Element&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exposing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;down&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;right&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exposing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reverse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;drop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Maybe&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exposing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;withDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;properDivisors&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;properDivisors&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;perfectNum&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;perfectNum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;properDivisors&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;//&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;indexedPerfectNum&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;indexedPerfectNum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(,)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;perfectNum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;kr&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colors&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lightBlue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lightGrey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lightPurple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;kr&quot;&gt;in&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;drop&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colors&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;withDefault&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;bar&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;flow&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;right&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;collage&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filled&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rect&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toFloat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;show&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flow&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;down&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bar&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;indexedPerfectNum&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
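&lt;p&gt;On the properDivisors question: one common improvement is to collect divisors in pairs up to &amp;radic;n instead of scanning all of [1..n]. A Python sketch of the idea (note it excludes n itself, whereas the Elm version above includes n and halves the sum):&lt;/p&gt;

```python
def proper_divisors_fast(n):
    """Proper divisors of n in O(sqrt(n)) time: every divisor d with
    d*d <= n pairs with n // d, so only d up to sqrt(n) is tested."""
    if n <= 1:
        return []
    divs = {1}  # 1 divides everything; n itself is deliberately excluded
    d = 2
    while d * d <= n:
        if n % d == 0:
            divs.add(d)
            divs.add(n // d)  # the paired divisor above sqrt(n)
        d += 1
    return sorted(divs)
```

The same pairing trick carries over to Elm with List.filter over [2..floor (sqrt n)].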

&lt;hr /&gt;
&lt;p&gt;Really no additional explanation required here, just changed the wording a bit.&lt;/p&gt;

&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://surfertas.github.io/static/js/elm-source/helloelm.js&quot;&gt;&lt;/script&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;div id=&quot;hello&quot; style=&quot;width: 50%; height: 400px&quot;&gt;&lt;/div&gt;
&lt;/body&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
    var div = document.getElementById(&quot;hello&quot;);

    // embed our Elm program in that &lt;div&gt;
    Elm.embed(Elm.Helloworld1, div);

&lt;/script&gt;
&lt;/html&gt;

</description>
        <pubDate>Wed, 09 Mar 2016 17:08:56 +0000</pubDate>
        <link>https://surfertas.github.io/elm/2016/03/09/elm-journey.html</link>
        <guid isPermaLink="true">https://surfertas.github.io/elm/2016/03/09/elm-journey.html</guid>
        
        
        <category>elm</category>
        
      </item>
    
  </channel>
</rss>
