Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Status

Current state: UNDER DISCUSSION ACCEPTED

Discussion thread: <TBD>mail of vote result

JIRA: SAMZA-2421

Released:

...

  1. Schema of the incoming Avro records is expected to be the same for all messages between two flushes of the SystemProducer. This is because an Avro file expects all of its entries to conform to a schema and here blob is an Avro file.
  2. OutgoingMessageEnvelope.key will be ignored here. This is because the Avro blobs created consist of only the schema taken from the OutgoingMessageEnvelope.message.getSchema and all the messages which are Avro records. There is no way to insert a key into the records as that would make the record incompatible with the schema. Other implementations of AzureBlobWriter can utilize the key in a meaning fashion.
  3. Optionally, OutgoingMessageEnvelope.message can be a byte array provided the first message to the SystemProducer or the first message sent after a flush is an Avro record. The first message being an Avro record is needed to get the schema for the Avro file. If the subsequent messages are byte arrays, then the byte arrays are appended to the Avro file directly without any check to see if the byte array indeed is an encoded Avro record which adheres to the schema of the Avro file. It is responsibility of the user of the SystemProducer to ensure that the byte arrays are valid failing which the resulting Azure file will be unreadable.
  4. SystemProducer is agnostic to schema evolution as long as the user adheres to the same schema between two flushes. Hence, if the user wants to change the schema then a flush prior to the change is mandatory. This is because, all messages sent between two flushes go into a single Avro file and are expected to adhere to the schema of the file.
Code Block
public class AvroAzureBlobWriter implements AzureBlobWriter { 
	// Avro's DataFileWriter is used to write the incoming Avro records. The sink for DataFileWriter is AzureBlobOutputStream.
    /**
     * write the message to blob via DataFileWriter which writes to AzureBlobOutputStream
     */
	public void write(OutgoingMessageEnvelope messageEnvelope) {
	}
    
	/**
     * flush the contents of DataFileWriter and underlying AzureBlobOutputStream 
     * this operation uploads a block.
     */
	public void flush() {
	}

   /**
    * Close the DataFileWriter and underlying AzureBlobOutputStream
    * this operation commits the blob with all blocks uploaded so far.
    */
	public void close() {
	}
}

...