...

We downloaded the following archives from the official ImageNet website: ILSVRC2012_img_train.tar, ILSVRC2012_img_val.tar, and ILSVRC2012_bbox_train_v2.tar.gz. We then processed the dataset to make it compatible with both TensorFlow and PyTorch. NCI hosts this dataset in two formats: raw and TFRecords. Both are described below.

Raw

...

Data Format

As mentioned above, several archives are downloaded directly from the ImageNet site. The first (ILSVRC2012_img_train.tar) contains the training data, the second (ILSVRC2012_img_val.tar) contains the validation data, and the third (ILSVRC2012_bbox_train_v2.tar.gz) contains the bounding boxes for objects inside the images. That is, the bounding boxes identify the objects that are used in the actual training.

...

Raw ImageNet data:
train/n01440764/n01440764_10026.JPEG  train/n01440764/n01440764_12433.JPEG  train/n01440764/n01440764_172.JPEG    train/n01440764/n01440764_31283.JPEG  train/n01440764/n01440764_5776.JPEG  train/n01440764/n01440764_7946.JPEG
train/n01440764/n01440764_10027.JPEG  train/n01440764/n01440764_12435.JPEG  train/n01440764/n01440764_1735.JPEG   train/n01440764/n01440764_31293.JPEG  train/n01440764/n01440764_5781.JPEG  train/n01440764/n01440764_7950.JPEG
train/n01440764/n01440764_10029.JPEG  train/n01440764/n01440764_12446.JPEG  train/n01440764/n01440764_17454.JPEG  train/n01440764/n01440764_3129.JPEG   train/n01440764/n01440764_5785.JPEG  train/n01440764/n01440764_7963.JPEG
train/n01440764/n01440764_10040.JPEG  train/n01440764/n01440764_1244.JPEG   train/n01440764/n01440764_17501.JPEG  train/n01440764/n01440764_31406.JPEG  train/n01440764/n01440764_5802.JPEG  train/n01440764/n01440764_7967.JPEG
train/n01440764/n01440764_10042.JPEG  train/n01440764/n01440764_1245.JPEG   train/n01440764/n01440764_17514.JPEG  train/n01440764/n01440764_3146.JPEG   train/n01440764/n01440764_5848.JPEG  train/n01440764/n01440764_7982.JPEG
...
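
The listing above suggests a layout of one sub-directory per synset ID (for example n01440764), with image files named <synset>_<number>.JPEG. As a minimal sketch under that assumption, the following Python groups the training files by synset and splits a file name into its parts. The function names and the directory path passed in are hypothetical and are not part of the NCI dataset tooling.

```python
from collections import defaultdict
from pathlib import Path


def index_by_synset(train_dir):
    """Map each synset ID (the sub-directory name) to its sorted JPEG file names."""
    index = defaultdict(list)
    # Assumed layout: <train_dir>/<synset>/<synset>_<number>.JPEG
    for image_path in Path(train_dir).glob("*/*.JPEG"):
        index[image_path.parent.name].append(image_path.name)
    return {synset: sorted(names) for synset, names in index.items()}


def split_image_name(file_name):
    """Split a name such as 'n01440764_10026.JPEG' into ('n01440764', 10026)."""
    stem = Path(file_name).stem
    synset, _, number = stem.rpartition("_")
    return synset, int(number)
```

A class-per-directory layout like this is also the one that torchvision's ImageFolder dataset expects, which is likely why the raw format can be used with PyTorch without conversion.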


TensorFlow Record

...

Format

Although the raw images can be used directly for deep learning with PyTorch, TensorFlow requires some additional steps, because reading the raw images directly slows down training. To speed up training, we need to convert the raw images into TensorFlow Records (TFRecords) using the script and instructions on the following page: https://github.com/tensorflow/models/blob/master/research/slim/datasets/download_and_convert_imagenet.sh
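
To sanity-check a converted shard without importing TensorFlow, the records in a TFRecord file can be counted with the standard library alone. This is a minimal sketch that assumes the standard TFRecord framing (a little-endian 8-byte payload length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload). It skips the CRCs rather than verifying them, and the function names are hypothetical.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Assumed framing per record: uint64 length, uint32 length CRC,
    payload bytes, uint32 payload CRC. The CRCs are read but not checked.
    """
    with open(path, "rb") as handle:
        while True:
            header = handle.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = handle.read(length)
            footer = handle.read(4)  # 4-byte payload CRC, ignored here
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record")
            yield payload


def count_records(path):
    """Count the records in one TFRecord file."""
    return sum(1 for _ in iter_tfrecord_payloads(path))
```

Each payload is a serialized record that still has to be parsed (for example with TensorFlow itself) to get at the image data and labels, so this sketch is only useful for counting records or detecting truncated files.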

...