...

As a user, I’d like to have an API to convert a dataset of raw images into binary format and pack them as RecordIO files.

Open Questions


Proposed Approach

Add a new function to MXNet's Data IO API that accepts either an image list file or a NumPy array, converts the data into RecordIO format, and writes the resulting .rec file to disk. The conversion will be parallelized, and users will be able to set the number of threads it uses. The new API will offer the same functionality as the existing CLI tool that customers currently use to create .rec files, with the added convenience of being available directly from the PyPI package.
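The conversion flow described above could look roughly like the following sketch. It is a minimal, stdlib-only illustration under stated assumptions, not the proposed API itself: `read_image_list`, `convert_to_records`, and the `encode`/`write_record` callbacks are hypothetical names. It assumes the tab-separated .lst layout used by MXNet's im2rec.py (index, one or more labels, image path in the last column). In a real implementation, `encode` might wrap `mx.recordio.pack_img` and `write_record` might call `mx.recordio.MXIndexedRecordIO.write_idx`. Those calls are left out so the sketch runs without MXNet installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: one parsed .lst entry as (index, labels, image path).
Entry = Tuple[int, List[float], str]


def read_image_list(lines: Iterable[str]) -> List[Entry]:
    """Parse tab-separated .lst lines: index, one or more labels, path last.

    Lines with fewer than three fields are skipped (assumed malformed).
    """
    entries: List[Entry] = []
    for line in lines:
        fields = [f.strip() for f in line.strip().split("\t")]
        if len(fields) < 3:  # need at least index, label, path
            continue
        labels = [float(x) for x in fields[1:-1]]
        entries.append((int(fields[0]), labels, fields[-1]))
    return entries


def convert_to_records(
    entries: List[Entry],
    encode: Callable[[Entry], bytes],         # hypothetical: entry -> packed bytes
    write_record: Callable[[int, bytes], None],  # hypothetical: (index, bytes) -> sink
    num_threads: int = 4,                     # user-selectable thread count
) -> None:
    """Encode entries on a thread pool, then write them in list order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in submission order, so the output keeps
        # the .lst order even though encoding runs concurrently.
        for (idx, _, _), payload in zip(entries, pool.map(encode, entries)):
            # All writes happen on the calling thread, so a single,
            # non-shared writer object is sufficient.
            write_record(idx, payload)
```

Because `ThreadPoolExecutor.map` submits every task up front, a very large image list would need to be processed in chunks to keep memory bounded; that refinement is omitted here for brevity.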

...

After the API is implemented, the existing CLI tool will remain available, but users will also be directed to the new API and its accompanying documentation and tutorials.

Performance Benchmarks


Alternative Approach

One of my initial approaches was to expose each image transform and each of the dataset_params as a separate parameter of the API. This would produce an API with potentially 10-15 parameters, and adding or removing transforms or parameters later would be difficult and could break the API. Accepting transforms built with gluon.transforms was therefore preferred.
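To make the trade-off concrete, here is a toy sketch of the preferred design: the API takes a single composable `transform` callable instead of one parameter per transform. `pack_images`, `flip_horizontal`, and `crop_top_left` are hypothetical, images are modeled as nested lists, and `compose` is a stdlib stand-in for the chaining that `gluon.data.vision.transforms.Compose` provides in MXNet, so the example runs without MXNet.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical toy image type: an HxW grid of pixel values.
Image = List[List[int]]
Transform = Callable[[Image], Image]


def compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transforms left to right (stand-in for gluon's Compose)."""
    def apply(img: Image) -> Image:
        for t in transforms:
            img = t(img)
        return img
    return apply


def pack_images(images: Sequence[Image],
                transform: Optional[Transform] = None) -> List[Image]:
    """Hypothetical API: one `transform` argument instead of 10-15 flags."""
    return [transform(img) if transform is not None else img for img in images]


# Toy transforms: new ones can be added without changing pack_images' signature.
def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def crop_top_left(size: int) -> Transform:
    """Return a transform that keeps the top-left size x size region."""
    return lambda img: [row[:size] for row in img[:size]]
```

For example, `pack_images([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], transform=compose([flip_horizontal, crop_top_left(2)]))` flips each row and then crops, returning `[[[3, 2], [6, 5]]]`. Supporting a new transform only means writing another callable and adding it to the list; the signature of `pack_images` stays the same.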

...