While working with Udacity's driving dataset, I wanted to augment the available data to increase the size of the data set, in hopes of improving the results of training PilotNet, an end-to-end deep learning model developed by Nvidia. [1]
I decided to use TensorFlow's Dataset API to build the data pipeline. It is a great
API that abstracts a lot away while still giving the developer the flexibility to
customize the pipeline for a given task. That said, I had a hard time finding
best practices for data augmentation and the associated pipeline using the
Dataset API. After some digging through the TensorFlow documentation, I found
the definition of the concatenate()
method. [2] Unfortunately, there were no
examples of how to construct an augmentation pipeline with it, so I will use this
post to introduce a minimal example. A full working data
pipeline applied to the Udacity dataset is available
here.
The DataHandler
class defined in the source code was put together quickly,
so any advice on how to improve the pipeline, or best-practice tips, would be
appreciated.
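Before getting into the notebook, the core pattern the whole pipeline relies on can be shown in a few lines. The sketch below uses placeholder integer elements and a trivial map() as a stand-in for a real augmentation function; it is only meant to illustrate how concatenate() stitches the original and augmented datasets together.

```python
import tensorflow as tf

# Placeholder dataset standing in for the original examples.
originals = tf.data.Dataset.from_tensor_slices(tf.range(4))

# Derive an "augmented" copy; a real pipeline would map an image
# transformation here instead of multiplying by 10.
augmented = originals.map(lambda x: x * 10)

# concatenate() appends the augmented elements after the originals,
# yielding 0, 1, 2, 3, 0, 10, 20, 30.
combined = originals.concatenate(augmented)
```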
Minimal example: Using concatenate() to augment original data
Please see this Jupyter notebook for the minimal example.
The notebook walks through using the TensorFlow API to load images based on
information found in a CSV file, and using the concatenate()
method to
create an augmented dataset.
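For readers who just want the gist, here is a hedged sketch of what the notebook does, written against the TensorFlow 2.x tf.data / tf.io names. The file name driving_log.csv, the two-column layout (image path, steering angle), and the left-right flip with a negated angle are assumptions made for illustration, not the exact contents of the notebook or the Udacity CSV.

```python
import tensorflow as tf

def parse_line(line):
    # Assumed two-column layout per CSV row: image path, steering angle.
    img_path, angle = tf.io.decode_csv(line, record_defaults=["", 0.0])
    # Read and decode the image referenced by the row.
    image = tf.io.decode_jpeg(tf.io.read_file(img_path), channels=3)
    return image, angle

def flip_example(image, angle):
    # Horizontal flip, negating the steering angle to match.
    return tf.image.flip_left_right(image), -angle

# Assumed CSV file: one header row, then "image_path,steering_angle" lines.
lines = tf.data.TextLineDataset("driving_log.csv").skip(1)

originals = lines.map(parse_line)
augmented = originals.map(flip_example)

# concatenate() appends the augmented examples after the originals,
# doubling the size of the dataset.
dataset = originals.concatenate(augmented)
```

The augmented dataset is built lazily with map(), so the flipped images are never materialized on disk; concatenate() simply chains the two datasets into one stream that can be shuffled, batched, and fed to training as usual.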
References
- https://arxiv.org/abs/1604.07316
- https://www.tensorflow.org/api_docs/python/tf/data/Dataset#concatenate