Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add version information for serialization.encoding (HIVE-7142)

...

  • MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records like CSV, tab-separated control-A separated records (sorry, quote is not supported yet).
  • LazySimpleSerDe: This SerDe can be used to read the same data format as MetadataTypedColumnsetSerDe and TCTLSeparatedProtocol, however, it creates Objects in a lazy way which provides better performance. It Starting in Hive 0.14.0 it also supports read/write data with a specified encode charset, for example:

    Code Block
    ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');
  • ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
  • DynamicSerDe: This SerDe also read/write Thrift serialized objects, but it understands Thrift DDL so the schema of the object can be provided at runtime. Also it supports a lot of different protocols, including TBinaryProtocol, TJSONProtocol, TCTLSeparatedProtocol (which writes data in delimited records).

...

  • Instance of a Java class (Thrift or native Java)
  • A standard Java object (we use java.util.List to represent Struct and Array, and use java.util.Map to represent Map)
  • A lazily-initialized object (For for example, a Struct of string fields stored in a single Java string object with starting offset for each field)

...