...

  1. Why is RecordIO the preferred format for image data in MXNet? Are there alternatives to it, such as Apache Parquet or Avro?
  2. What are the options for editing an already created .rec file?
    1. The ideal solution is to rewrite the file. Files are always read and written as streams of data, so records cannot be added in the middle of a file in place. This is not a drawback of the API: the same limitation applies to reading and writing any generic text file in Python or other programming languages. Individual records can still be read and written by index with the read_idx() and write_idx() methods of the MXIndexedRecordIO object.

      Code Block
      '''
      Existing RecordIO files can be read, edited, and rewritten using the
      two snippets below, which write and then read an indexed RecordIO file.
      '''
      import mxnet as mx


      # Write a record
      label1 = [2, 3]
      id1 = 2
      header1 = mx.recordio.IRHeader(0, label1, id1, 0)
      with open('img.jpg', 'rb') as fin:
          img = fin.read()
      s1 = mx.recordio.pack(header1, img)
      write_record = mx.recordio.MXIndexedRecordIO('img.idx', 'img.rec', 'w')
      write_record.write_idx(id1, s1)
      write_record.close()  # flush the record and its index entry before reading


      # Read the record back by its index
      read_record = mx.recordio.MXIndexedRecordIO('img.idx', 'img.rec', 'r')
      item = read_record.read_idx(id1)
      header, img = mx.recordio.unpack_img(item)  # decodes the image (requires OpenCV)
      print(header.label)
      read_record.close()
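
The point about streams can be illustrated without MXNet. The sketch below uses a toy length-prefixed record file (an assumption made for illustration, not MXNet's actual RecordIO layout) and shows that adding or replacing a record in the middle amounts to streaming the old file into a new one. The helper names `write_records`, `read_records`, and `rewrite_with_edit` are hypothetical.

```python
import os
import struct
import tempfile


def write_records(path, records):
    """Write each bytes payload as a 4-byte little-endian length plus the payload."""
    with open(path, "wb") as f:
        for payload in records:
            f.write(struct.pack("<I", len(payload)))
            f.write(payload)


def read_records(path):
    """Yield the payloads of a file produced by write_records, in order."""
    with open(path, "rb") as f:
        while True:
            head = f.read(4)
            if not head:
                break
            (length,) = struct.unpack("<I", head)
            yield f.read(length)


def rewrite_with_edit(src, dst, position, new_payload, replace=False):
    """Stream src into dst, inserting (or replacing) the record at `position`.

    src and dst must be different files because src is read lazily
    while dst is being written.
    """
    def edited():
        for i, payload in enumerate(read_records(src)):
            if i == position:
                yield new_payload
                if replace:
                    continue  # drop the old record at this position
            yield payload

    write_records(dst, edited())


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "old.rec")
dst = os.path.join(workdir, "new.rec")

write_records(src, [b"a", b"bb", b"ccc"])
rewrite_with_edit(src, dst, 1, b"NEW")  # insert before the second record
result = list(read_records(dst))
print(result)
```

Every record after the edit position changes its byte offset, which is why an in-place insert is not possible and why an indexed format needs its index file regenerated after such a rewrite.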


...

Code Block: im2rec
def mx.io.im2rec(list_file, transforms, dataset_params, output_path):
    """
    Convert an image.list file containing the paths to the raw images to a binary .rec file.

    Input Parameters
    ----------
    list_file - str object containing the path to the list file
    transforms - gluon.transforms.Compose object
    dataset_params - dict object whose description is given in the appendix
    output_path - str object containing the path to the output location

    Return type
    ----------
    rec_file_path - str object containing the path of the output rec file
    """
    return rec_file_path
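
As a rough illustration of the `list_file` input, the sketch below writes and parses a list file in the tab-separated `index<TAB>label<TAB>path` layout used by MXNet's existing im2rec tooling. It handles a single numeric label per image only; the helper names and the sample paths are hypothetical, and nothing here is part of the proposed API.

```python
import os
import tempfile


def write_list_file(path, samples):
    """Write (label, image_path) pairs as 'index<TAB>label<TAB>path' lines."""
    with open(path, "w") as f:
        for index, (label, image_path) in enumerate(samples):
            f.write(f"{index}\t{label}\t{image_path}\n")
    return path


def read_list_file(path):
    """Parse a single-label list file back into (index, label, path) tuples."""
    entries = []
    with open(path) as f:
        for line in f:
            index, label, image_path = line.rstrip("\n").split("\t")
            entries.append((int(index), float(label), image_path))
    return entries


# Hypothetical sample data: two images with class labels 0 and 1.
list_path = os.path.join(tempfile.mkdtemp(), "image.lst")
write_list_file(list_path, [(0, "cat/1.jpg"), (1, "dog/2.jpg")])
entries = read_list_file(list_path)
print(entries)
```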

arr2rec API Specification

...

Code Block: arr2rec
def mx.io.np2rec(data, labels, transforms, dataset_params, output_path):
    """
    Convert a numpy representation of images to a binary .rec file.

    Input Parameters
    ----------
    data - numpy array holding all the image data.
        Supported array shapes:
        (N,H,W) - image with uint8 data
        (N,3,H,W) - image with RGB values (float or uint8)
        (N,4,H,W) - image with RGBA values
        N is the number of images.
        H and W are the number of rows and columns of each image.
        Pixel values should be in the range [0...1] for float data and [0...255] for int data. Values outside this range will be clipped.
    labels - numpy array holding the label for each image. Should be of length N.
    transforms - gluon.transforms.Compose object
    dataset_params - dict object whose description is given in the appendix
    output_path - str object containing the path to the output location

    Return type
    ----------
    rec_file_path - str object containing the path of the output rec file
    """
    return rec_file_path
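
The shape and value-range rules in the docstring above could be checked along the following lines. This is only a sketch under stated assumptions: `prepare_images` is a hypothetical helper, not part of the proposed API, and returning uint8 for integer input is a choice made here for illustration.

```python
import numpy as np


def prepare_images(data, labels):
    """Validate the array layout and clip pixel values to the documented range."""
    data = np.asarray(data)
    labels = np.asarray(labels)

    # Accept (N, H, W), (N, 3, H, W) and (N, 4, H, W) only.
    is_plain = data.ndim == 3
    is_rgb_or_rgba = data.ndim == 4 and data.shape[1] in (3, 4)
    if not (is_plain or is_rgb_or_rgba):
        raise ValueError(f"unsupported image array shape: {data.shape}")

    # One label entry per image.
    if len(labels) != data.shape[0]:
        raise ValueError("labels must have length N")

    if np.issubdtype(data.dtype, np.floating):
        return np.clip(data, 0.0, 1.0)
    if np.issubdtype(data.dtype, np.integer):
        # Widen first so that clipping works for any integer dtype (assumption).
        return np.clip(data.astype(np.int64), 0, 255).astype(np.uint8)
    raise TypeError(f"unsupported dtype: {data.dtype}")


ints = prepare_images(np.array([[[-5, 300]]]), [0])
floats = prepare_images(np.array([[[-0.5, 0.5, 1.5]]]), [1])
print(ints.tolist(), floats.tolist())
```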


Backward Compatibility

...


Parameter | Default Value/Optional | Description
num_workers | 1 | Have multiple workers doing the job. This option will imply shuffling the dataset.
label_width | 1 | specify the label_width in the list, by default set to 1
batch_size | 4096 |
pass_through | False |
pack_label | 0 | whether to also pack multi-dimensional label in the record file
nsplit | 1 | used for part generation, logically split the .lst file to NSPLIT parts by position
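
One way a `dataset_params` dict might be combined with the defaults in the table above is sketched below. The helper `resolve_dataset_params` is hypothetical, only a subset of the parameters is shown, and rejecting unknown keys is an assumption rather than documented behavior.

```python
# Defaults copied from the parameter table (subset shown for illustration).
DEFAULT_DATASET_PARAMS = {
    "num_workers": 1,
    "pack_label": 0,
    "nsplit": 1,
}


def resolve_dataset_params(user_params=None):
    """Return the defaults overridden by any user-supplied values."""
    user_params = dict(user_params or {})
    unknown = set(user_params) - set(DEFAULT_DATASET_PARAMS)
    if unknown:
        # Assumption: unknown keys are treated as an error, not ignored.
        raise KeyError(f"unknown dataset parameters: {sorted(unknown)}")
    params = dict(DEFAULT_DATASET_PARAMS)
    params.update(user_params)
    return params


params = resolve_dataset_params({"num_workers": 4})
print(params)
```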

...