

Data Augmentation: a minimal example using TensorFlow Dataset API

24 Jan 2018 » tensorflow, dl

While working with Udacity’s driving data, I wanted to augment the available data to increase the size of the data set, in hopes of improving the results of training PilotNet, an end-to-end deep learning model developed by Nvidia. [1]

I decided to use TensorFlow’s Dataset API to build the data pipeline. It is a great API that abstracts away a lot of the plumbing while still letting the developer customize the pipeline for a given task. That said, I had a hard time finding best practices for data augmentation, and for structuring the associated pipeline, with the Dataset API. After digging through the TensorFlow documentation, I found the concatenate() method. [2] Unfortunately, the documentation gives no example of how to construct an augmentation pipeline, so I will use this post to introduce a minimal one. A full working data pipeline applied to the Udacity dataset is available here. The DataHandler class defined in that source code was put together quickly, so any advice on improving the pipeline, or best-practice tips, would be appreciated.
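To make the behavior of concatenate() concrete, here is a tiny sketch. It assumes TensorFlow 2.x with eager execution, which differs from the TF 1.x session-based style of early 2018. The integers and the negation are placeholders standing in for real samples and a real augmentation.

```python
import tensorflow as tf

# Placeholder for the original samples: the int64 values 0..4.
original = tf.data.Dataset.range(5)

# An "augmented" copy derived from the original with map(). Negation is only
# a stand-in for a real augmentation such as an image flip.
augmented = original.map(lambda x: -x)

# concatenate() yields every element of `original`, then every element of
# `augmented`, so the combined dataset is twice as long.
combined = original.concatenate(augmented)

values = [int(v.numpy()) for v in combined]  # eager iteration (TF 2.x)
print(values)  # [0, 1, 2, 3, 4, 0, -1, -2, -3, -4]
```

Both datasets must have compatible element types and shapes (here, scalar int64) for concatenate() to accept them.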

Minimal example: Using concatenate() to augment original data

Please see this Jupyter notebook for the minimal example.

The notebook walks through using the TensorFlow API to load images based on information found in a CSV file, and then using the concatenate() method to create an augmented dataset.
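The steps described above can be sketched roughly as shown below. This is not the notebook's code. It assumes TensorFlow 2.x, PNG images, and a hypothetical CSV with a header row and `image_path,steering` columns. The function names `load_example`, `flip_example`, `build_dataset` and `write_demo_files` are illustrative. The augmentation is a horizontal flip with the steering angle negated, a common choice for driving data, which may differ from what the notebook uses.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def load_example(image_path, steering):
    """Read the image file named in a CSV row and decode it (uint8, HxWx3)."""
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    return image, steering


def flip_example(image, steering):
    """Assumed augmentation: mirror left-right and negate the steering angle."""
    return tf.image.flip_left_right(image), -steering


def build_dataset(csv_path):
    # Hypothetical CSV layout: a header row, then "image_path,steering" rows.
    lines = tf.data.TextLineDataset(csv_path).skip(1)
    rows = lines.map(
        lambda line: tuple(tf.io.decode_csv(line, record_defaults=[[""], [0.0]])))
    original = rows.map(load_example)
    augmented = original.map(flip_example)
    # All original samples first, followed by their flipped copies.
    return original.concatenate(augmented)


def write_demo_files(directory):
    """Write two tiny PNGs plus a CSV pointing at them (stand-in data only)."""
    rows = ["image_path,steering"]
    for i, angle in enumerate([0.1, -0.25]):
        pixels = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) + i
        path = os.path.join(directory, f"img_{i}.png")
        tf.io.write_file(path, tf.io.encode_png(pixels))
        rows.append(f"{path},{angle}")
    csv_path = os.path.join(directory, "samples.csv")
    with open(csv_path, "w") as f:
        f.write("\n".join(rows))
    return csv_path


with tempfile.TemporaryDirectory() as tmp:
    # Iterate inside the block: the dataset reads the files lazily.
    dataset = build_dataset(write_demo_files(tmp))
    samples = [(img.numpy(), float(s.numpy())) for img, s in dataset]

print(len(samples))  # 4: two originals followed by two flipped copies
```

Because concatenate() leaves the originals and their augmented copies in two contiguous blocks, a training pipeline would typically call shuffle() on the combined dataset before batch(). Concatenation also fixes the augmented set up front. Applying a random augmentation inside map() instead would give different variants each epoch without changing the dataset size.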


  1. https://arxiv.org/abs/1604.07316
  2. https://www.tensorflow.org/api_docs/python/tf/data/Dataset#concatenate