...
We downloaded the archives ILSVRC2012_img_train.tar, ILSVRC2012_img_val.tar, and ILSVRC2012_bbox_train_v2.tar.gz from the official website, and then processed the dataset to make it compatible with both TensorFlow and PyTorch. NCI hosts this dataset in two formats: raw and TFRecords. Both are described below.
Raw
...
Data Format
As mentioned above, several archives were downloaded directly from the ImageNet site. The first (ILSVRC2012_img_train.tar) contains the training data, the second (ILSVRC2012_img_val.tar) contains the validation data, and the third (ILSVRC2012_bbox_train_v2.tar.gz) contains the bounding boxes for objects inside the images. That is, the bounding boxes identify the objects that are used in the actual training.
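As a rough illustration of how such bounding boxes can be read, the sketch below parses one annotation with Python's standard library. It assumes the annotations are PASCAL VOC-style XML files with `object`, `name`, and `bndbox` elements; the sample document, its coordinate values, and the helper name `parse_boxes` are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the PASCAL VOC-style layout that the
# bounding-box archive is assumed to use; the numbers are made up.
SAMPLE_XML = """
<annotation>
  <folder>n01440764</folder>
  <filename>n01440764_10040</filename>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>n01440764</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>300</xmax><ymax>350</ymax></bndbox>
  </object>
</annotation>
"""


def parse_boxes(xml_text):
    """Return a list of (wnid, xmin, ymin, xmax, ymax) tuples, one per object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        wnid = obj.findtext("name")
        bnd = obj.find("bndbox")
        # Pixel coordinates of the box corners, read in a fixed order.
        coords = tuple(
            int(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((wnid, *coords))
    return boxes


print(parse_boxes(SAMPLE_XML))
```

For a real file, the same function could be given the text read from the extracted annotation instead of `SAMPLE_XML`.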
...
    train/n01440764/n01440764_10026.JPEG
    train/n01440764/n01440764_12433.JPEG
    train/n01440764/n01440764_172.JPEG
    train/n01440764/n01440764_31283.JPEG
    train/n01440764/n01440764_5776.JPEG
    train/n01440764/n01440764_7946.JPEG
    train/n01440764/n01440764_10027.JPEG
    train/n01440764/n01440764_12435.JPEG
    train/n01440764/n01440764_1735.JPEG
    train/n01440764/n01440764_31293.JPEG
    train/n01440764/n01440764_5781.JPEG
    train/n01440764/n01440764_7950.JPEG
    train/n01440764/n01440764_10029.JPEG
    train/n01440764/n01440764_12446.JPEG
    train/n01440764/n01440764_17454.JPEG
    train/n01440764/n01440764_3129.JPEG
    train/n01440764/n01440764_5785.JPEG
    train/n01440764/n01440764_7963.JPEG
    train/n01440764/n01440764_10040.JPEG
    train/n01440764/n01440764_1244.JPEG
    train/n01440764/n01440764_17501.JPEG
    train/n01440764/n01440764_31406.JPEG
    train/n01440764/n01440764_5802.JPEG
    train/n01440764/n01440764_7967.JPEG
    train/n01440764/n01440764_10042.JPEG
    train/n01440764/n01440764_1245.JPEG
    train/n01440764/n01440764_17514.JPEG
    train/n01440764/n01440764_3146.JPEG
    train/n01440764/n01440764_5848.JPEG
    train/n01440764/n01440764_7982.JPEG
    ...
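In the listing above, each training image sits in a directory named after its WordNet ID (here n01440764), so the directory name can serve as the class label. Folder-based loaders such as torchvision's `ImageFolder` rely on this kind of layout. The following is a minimal sketch using only the standard library; the helper name `group_by_wnid` is hypothetical, and the sample paths are copied from the listing.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_wnid(paths):
    """Map each WordNet ID (the parent directory name) to its image file names."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # The directory directly above the file is the class label.
        groups[path.parent.name].append(path.name)
    return dict(groups)


sample = [
    "train/n01440764/n01440764_10026.JPEG",
    "train/n01440764/n01440764_12433.JPEG",
    "train/n01440764/n01440764_172.JPEG",
]
groups = group_by_wnid(sample)
# Print the number of images found per class.
print({wnid: len(files) for wnid, files in groups.items()})
```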
TensorFlow Record
...
Format
Although the raw images are ready for deep learning with PyTorch, TensorFlow requires some additional steps. Reading the raw images directly makes training slower with TensorFlow. To speed up training, we need to convert the raw images into TensorFlow Records (TFRecords) using the script and instructions given on the following page: https://github.com/tensorflow/models/blob/master/research/slim/datasets/download_and_convert_imagenet.sh
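One way to sanity-check a converted copy is to compare the files on disk with the shard names the conversion is expected to produce. The sketch below assumes the script's default of 1024 training and 128 validation shards and a `<split>-<index>-of-<total>` naming pattern; the copy hosted here may use different values, and the helper names `shard_names` and `missing_shards` are hypothetical.

```python
def shard_names(split, num_shards):
    """Return the expected shard file names for one split."""
    return [f"{split}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


def missing_shards(present, split, num_shards):
    """Return the expected shard names that are absent from `present`."""
    present = set(present)
    return [name for name in shard_names(split, num_shards) if name not in present]


# Assumed defaults: 1024 training shards and 128 validation shards.
train = shard_names("train", 1024)
validation = shard_names("validation", 128)
print(train[0], train[-1], validation[-1])

# Example: pretend the last validation shard was not copied.
print(missing_shards(validation[:-1], "validation", 128))
```

In practice, `present` would be the file names returned by listing the TFRecords directory.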
...