...

  1. Why is RecordIO a preferred format for image data in MXNet? Are there alternatives to it, such as Apache Parquet or Avro?
  2. What are the options for editing an already created .rec file?
    1. The ideal solution is to rewrite the file. Files are always read and written as streams of data, so it is not possible to insert records in the middle of a file in place. This should not be seen as a drawback of the API, because the same limitation applies to reading and writing any generic text file in Python or other programming languages. Reading, writing, and editing of record files can be accomplished with the read_idx() and write_idx() methods of the MXIndexedRecordIO object.

      Code Block
      '''
      Existing RecordIO files can be read, edited, and rewritten using the
      two snippets below, which write and read an indexed RecordIO file.
      '''
      import mxnet as mx


      # Write a record
      label1 = [2, 3]
      id1 = 2
      header1 = mx.recordio.IRHeader(0, label1, id1, 0)
      with open('img.jpg', 'rb') as fin:
          img = fin.read()
      s1 = mx.recordio.pack(header1, img)
      write_record = mx.recordio.MXIndexedRecordIO('img.idx', 'img.rec', 'w')
      write_record.write_idx(id1, s1)
      write_record.close()  # flush the record and index files before reading


      # Read the record back
      read_record = mx.recordio.MXIndexedRecordIO('img.idx', 'img.rec', 'r')
      item = read_record.read_idx(id1)
      header, img = mx.recordio.unpack_img(item)  # decodes the image (requires OpenCV)
      print(header.label)
      read_record.close()
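
The "rewrite the file" approach described above can be sketched as a copy loop: read every record from the existing file, substitute or skip the ones being edited, and write the result to a new file. The sketch below is a minimal illustration, not part of the MXNet API. The helper name rewrite_records and the InMemoryRecords stand-in are hypothetical and exist only so the example runs without MXNet installed. The helper assumes only the parts of the MXIndexedRecordIO interface used in the snippet above (the keys list, read_idx() and write_idx()).

```python
def rewrite_records(reader, writer, replace=None, drop=()):
    """Copy records from reader to writer, applying edits along the way.

    reader: object exposing `keys` and `read_idx(key)`
    writer: object exposing `write_idx(key, buf)`
    replace: dict mapping key -> new packed record; keys that are not
             present in the reader are appended as new records
    drop: keys to leave out of the rewritten file
    """
    pending = dict(replace or {})
    dropped = set(drop)
    for key in reader.keys:
        if key in dropped:
            continue
        # pop() so that whatever remains in `pending` is a brand-new record
        buf = pending.pop(key, None)
        if buf is None:
            buf = reader.read_idx(key)
        writer.write_idx(key, buf)
    for key, buf in pending.items():
        if key not in dropped:
            writer.write_idx(key, buf)


class InMemoryRecords:
    """Hypothetical stand-in that mimics the three members used above."""

    def __init__(self, records=None):
        self._records = dict(records or {})
        self.keys = list(self._records)

    def read_idx(self, key):
        return self._records[key]

    def write_idx(self, key, buf):
        self._records[key] = buf
        self.keys.append(key)


source = InMemoryRecords({0: b"cat", 1: b"dog", 2: b"bird"})
target = InMemoryRecords()
# Replace record 1, add a new record 5, and drop record 0.
rewrite_records(source, target, replace={1: b"wolf", 5: b"fish"}, drop=[0])
print(target.keys)
```

With MXNet, the same helper would presumably be called with an MXIndexedRecordIO opened in 'r' mode as the reader and a second MXIndexedRecordIO, pointing at new .idx and .rec paths and opened in 'w' mode, as the writer. The replacement values would be built with mx.recordio.pack(), both objects would be closed afterwards, and the new files would then take the place of the old ones.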


...