The MNIST (Modified National Institute of Standards and Technology) database of handwritten digits is one of the most widely studied datasets in machine learning. It has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in fixed-size images of 28×28 pixels.

A sample image from the MNIST test dataset is shown below.

We have provided a copy of the MNIST dataset as part of the NCI AI/ML data collection under the project wb00. Please join the project to access this dataset.

The directory tree for the MNIST dataset is listed below:

$ tree /g/data/wb00/MNIST/
/g/data/wb00/MNIST/
├── npz                         # pass to TensorFlow: tf.keras.datasets.mnist.load_data(path="/g/data/wb00/MNIST/npz/mnist.npz")
│   └── mnist.npz
└── raw                         # pass to PyTorch (torchvision): datasets.MNIST("/g/data/wb00", ...)
    ├── t10k-images-idx3-ubyte  # test set images
    ├── t10k-labels-idx1-ubyte  # test set labels
    ├── train-images-idx3-ubyte # training set images
    └── train-labels-idx1-ubyte # training set labels

2 directories, 5 files
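
If neither framework is available, the files under raw can also be read directly. The following is a minimal sketch, not part of the NCI examples, that parses the uncompressed IDX format using only the Python standard library. The helper names read_idx, get_image and describe_test_set are hypothetical and introduced here for illustration; RAW_DIR is assumed from the directory listing above.

```python
import struct
from pathlib import Path

# Assumed location, taken from the directory listing above.
RAW_DIR = Path("/g/data/wb00/MNIST/raw")


def read_idx(path):
    """Read an uncompressed IDX file that stores unsigned bytes.

    Returns (dims, data): a tuple of dimension sizes and the payload bytes.
    """
    raw = Path(path).read_bytes()
    # Header: two zero bytes, a type code (0x08 = unsigned byte),
    # and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError(f"{path}: not an unsigned-byte IDX file")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    header_end = 4 + 4 * ndim
    dims = struct.unpack(f">{ndim}I", raw[4:header_end])
    data = raw[header_end:]
    expected = 1
    for size in dims:
        expected *= size
    if len(data) != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {len(data)}")
    return dims, data


def get_image(dims, data, index):
    """Return image number `index` as a list of rows of pixel values (0-255)."""
    _, rows, cols = dims
    start = index * rows * cols
    return [
        list(data[start + r * cols:start + (r + 1) * cols])
        for r in range(rows)
    ]


def describe_test_set(raw_dir=RAW_DIR):
    """Summarise the test split: image dimensions, label count, first label."""
    img_dims, _ = read_idx(Path(raw_dir) / "t10k-images-idx3-ubyte")
    lbl_dims, labels = read_idx(Path(raw_dir) / "t10k-labels-idx1-ubyte")
    return f"images {img_dims}, labels {lbl_dims}, first label {labels[0]}"
```

On a system where /g/data/wb00 is mounted and you are a member of the project, print(describe_test_set()) should report the dimensions of the test split. For real training workloads the framework loaders noted in the tree above are likely the more convenient route; this sketch is mainly useful for inspecting the files.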

You can find examples of accessing the MNIST dataset in the examples directory of the NCI-ai-ml environment, i.e. "${NCI_AI_ML_ROOT}/examples".