Link to Dev list discussion: https://lists.apache.org/thread.html/b339c4dd7068f6ae3c4262b82481e3b53f85fff368884e525703cb16@%3Cdev.mxnet.apache.org%3E

Feature Shepherd: Anirudh Acharya

Problem

Data IO is often a bottleneck for training and inference workflows with image data. As datasets grow too large to fit in main memory, data loading can drag down the performance of the whole workflow. It is therefore beneficial to store images in the binary recordIO format, which is much more compact than raw image files, occupies less memory, and is more efficient to load.
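To make the idea of packing many images into one sequential binary file concrete, here is a minimal sketch of length-prefixed record framing. It is a simplification for illustration, not MXNet's actual on-disk recordIO layout; the helper names `write_record` and `read_records` are hypothetical.

```python
import io
import struct


def write_record(stream, payload: bytes) -> None:
    """Append one record: a 4-byte little-endian length, then the payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield payloads back in the order they were written."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack("<I", header)
        yield stream.read(length)


# Pack three stand-in "images" into one in-memory file and read them back.
buf = io.BytesIO()
for blob in (b"image-0-bytes", b"image-1", b""):
    write_record(buf, blob)
buf.seek(0)
records = list(read_records(buf))
```

Because every record sits back to back in a single file, a reader can stream through the dataset with sequential reads instead of opening one file per image.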

The goal of this project is to provide an easy-to-use, intuitive interface to pre-process image data and create recordIO files. Currently, customers have to clone the whole MXNet repository to use a command line tool that pre-processes image datasets and creates recordIO files from them. This is inconvenient. With the proposed change, customers will be able to use this functionality straight out of the PyPI package.

...

As a user, I’d like to have an API to convert a dataset of raw images into binary format and pack them as RecordIO files.

Open Questions


Proposed Approach

Implement a new API in MXNet's Data IO API that accepts an image list file or a numpy array, converts that data into the recordIO file format, and stores the file. The conversion will be parallelized, and the user will be given the option to set the number of threads used to perform it. The proposed API will have the same functionality as the existing CLI tool, which customers currently use to create .rec files, but customers will have the convenience of using this functionality from the PyPI package itself.
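The user-selectable thread count could look roughly like the sketch below. It assumes a thread pool from Python's standard library and uses a hypothetical `encode_image` stand-in for the real per-image work; none of these names come from the proposal. Note that this sketch returns results in input order, which the proposal itself does not promise.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_image(item):
    """Stand-in for per-image work (decode, transform, re-encode)."""
    index, pixels = item
    return index, bytes(pixels)


def pack_parallel(images, num_workers=1):
    """Encode images on num_workers threads; return payloads in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, even when the
        # underlying tasks finish out of order.
        return list(pool.map(encode_image, enumerate(images)))


packed = pack_parallel([[0, 255], [1, 2, 3], []], num_workers=2)
```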

...

Code Block: arr2rec (Python)

# Proposed to be exposed as mx.io.np2rec.
def np2rec(data, labels, transforms, dataset_params, output_path):
    """
    Convert numpy representation of images to binary files.

    Input Parameters -
    ----------
    data - numpy array holding all the image data.
        Supported array shapes -
        (N,H,W) - image with uint8 data
        (N,3,H,W) - image with RGB values (float or uint8)
        (N,4,H,W) - image with RGBA values.
        N is the number of the images.
        H and W are the rows and columns of the image.
        The pixel values should be in the range of [0...1] for float data type
        and [0...255] for int data type. Values outside this range will be clipped.
    labels - numpy array holding labels for each of the images. Should be of length N.
    transforms - gluon.transforms.Compose object
    dataset_params - dict object whose description is given in the appendix
    output_path - string object containing the path to the output location

    Return type -
    ----------
    rec_file_path - str object depicting the path of the output rec file
    """
    # Specification only: the body is not defined in this proposal.
    raise NotImplementedError("np2rec is a proposed API")
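The shape and value-range rules in the docstring can be sketched as a small validation step. This is a pure-Python illustration under stated assumptions: `check_batch` and `clip_pixel` are hypothetical helper names, shapes are passed as plain tuples rather than numpy arrays, and the layout labels are made up for this example.

```python
def check_batch(shape, num_labels):
    """Return a layout label for a supported shape, or raise ValueError."""
    if len(shape) == 3:
        layout = "HW"      # (N, H, W)
    elif len(shape) == 4 and shape[1] == 3:
        layout = "RGB"     # (N, 3, H, W)
    elif len(shape) == 4 and shape[1] == 4:
        layout = "RGBA"    # (N, 4, H, W)
    else:
        raise ValueError(f"unsupported image array shape: {shape}")
    if shape[0] != num_labels:
        raise ValueError("labels must have one entry per image (length N)")
    return layout


def clip_pixel(value):
    """Clip to [0, 1] for float values and to [0, 255] for int values."""
    if isinstance(value, float):
        return min(max(value, 0.0), 1.0)
    return min(max(value, 0), 255)
```

Validating up front lets the converter reject an unsupported array before any worker starts encoding.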

Backward Compatibility

After the API is implemented, the existing CLI tool will continue to exist, but users will also be directed to the new API and its accompanying documentation and tutorials.

Performance Benchmarks


Alternative Approach

One of the initial approaches I came up with involved passing each of the image transforms and each of the dataset_params as a separate parameter to the API. This would create an API with potentially 10-15 parameters, and adding or removing transforms or parameters later would be difficult and could break the API. Hence using gluon.transforms was preferred.

APPENDIX

Dataset Params

These parameters describe how the records will be packed sequentially in the .rec file.


| Parameter | Default Value/Optional | Description |
|---|---|---|
| num_workers | 1 | Have multiple workers doing the job. This option will imply shuffling the dataset. |
| label_width | 1 | Specify the label_width in the list; by default set to 1. |
| pack_label | 0 | Whether to also pack multi-dimensional labels in the record file. |
| nsplit | 1 | Used for part generation; logically split the .lst file into NSPLIT parts by position. |
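One way the dataset_params dict could be resolved against the defaults listed above is sketched here. The helper names are hypothetical, and `split_by_position` assumes that "by position" means contiguous, near-equal chunks of the list entries, which the proposal does not spell out.

```python
DEFAULT_DATASET_PARAMS = {
    "num_workers": 1,
    "label_width": 1,
    "pack_label": 0,
    "nsplit": 1,
}


def resolve_dataset_params(user_params=None):
    """Overlay user-supplied values on the defaults; reject unknown keys."""
    params = dict(DEFAULT_DATASET_PARAMS)
    for key, value in (user_params or {}).items():
        if key not in DEFAULT_DATASET_PARAMS:
            raise KeyError(f"unknown dataset parameter: {key}")
        params[key] = value
    return params


def split_by_position(entries, nsplit):
    """Split entries into nsplit contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(entries), nsplit)
    parts, start = [], 0
    for i in range(nsplit):
        end = start + base + (1 if i < extra else 0)
        parts.append(entries[start:end])
        start = end
    return parts


params = resolve_dataset_params({"num_workers": 4})
parts = split_by_position(list(range(7)), 3)
```

Keeping these options in one dict means a new option only needs a new default entry, not a change to the function signature.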

Supported Gluon Transforms

These transforms perform pre-processing of images. Each of them will be implemented as a gluon transform function. The proposed API spec accepts transforms.Compose, which is of type SequentialBlock. This SequentialBlock will contain a stack of transforms that will be applied in order. To keep this extensible, users can define their own HybridBlock and include it in the SequentialBlock.


| Parameter | Default Value/Optional | Description |
|---|---|---|
| resize | Optional | Resize the image to the new size [width, height]. |
| center_crop | 0 | Specify whether to crop the center of the image to make it square. 1 - perform cropping; 0 - no cropping. |
| quality | 95 for JPEG; 9 for PNG | JPEG quality for encoding (1-100, default: 95). PNG compression for encoding (1-9, default: 3). |
| color | -1 | Force color (1), gray image (0) or keep source unchanged (-1). |
| encoding | '.jpg' | Encoding type. Can be '.jpg' or '.png'. |
| inter_method | 1 | Image interpolation method: NN(0) BILINEAR(1) CUBIC(2) AREA(3) LANCZOS4(4) AUTO(9) RAND(10). |
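The stacked-transform design described in this section can be illustrated with a plain-Python stand-in. `SimpleCompose`, `center_crop`, and `invert` below are hypothetical names for this sketch and are not gluon classes; images are represented as nested lists rather than NDArrays.

```python
class SimpleCompose:
    """Apply a list of callables in order (a stand-in for a stacked transform block)."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, image):
        for transform in self.transforms:
            image = transform(image)
        return image


def center_crop(image):
    """Crop a row-major 2-D list to its largest centered square."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top, left = (height - side) // 2, (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


def invert(image):
    """A user-defined step: invert 8-bit pixel values."""
    return [[255 - p for p in row] for row in image]


# A built-in style step and a user-defined step share one pipeline.
pipeline = SimpleCompose([center_crop, invert])
result = pipeline([[0, 10, 20, 30], [40, 50, 60, 70]])
```

Adding or removing a step changes only the list passed to the pipeline, so the signature of the conversion API stays the same.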