...

  • In some cases Hive creates a PrimitiveTypeEntry or ObjectInspector from a Java class type, for example in ObjectInspectorFactory.getReflectionObjectInspector(). In these cases no data type params are available to add to the PrimitiveTypeEntry/ObjectInspector, so we might have to default to some kind of type attributes, such as max-precision decimal or max-length char/varchar. This happens in a few places:
    • TypedSerDe (used by ThriftByteStreamTypedSerDe). Might be ok if it is just using Thrift types.
    • S3LogDeserializer (in contrib). Might be ok; it looks like it is only a deserializer, and only for a custom S3 struct.
    • MetadataTypedColumnsetSerDe. Might be ok; it looks like it might just use strings.
    • GenericUDFUtils.ConversionHelper.ConversionHelper(), as well as GenericUDFBridge. These are used by old-style UDFs, in particular for the return type of the UDF, so in the general case it is not always possible to have type parameters for the return type of a UDF. GenericUDFs would be required if we want to return a char length/decimal precision as part of the return type metadata, since they can customize the return type ObjectInspector.
  • If cast operators remain implemented as UDFs, then each cast UDF should probably be implemented as a GenericUDF so that the return type ObjectInspector can be set with the type params. In addition, the type parameters need to be passed into the cast UDF somehow before its initialize() method is called.
  • Hive code does a lot of pointer-based equality checks on PrimitiveTypeEntry/TypeInfo/ObjectInspector objects, so a varchar(15) object inspector is not equal to a varchar(10) object inspector. This may have some advantages, such as requiring conversion/length enforcement in this case, but it may not always be the desirable behavior.
  • The SerDe initialize() method receives a string representation of the column types in the "column.types" property. Unfortunately we can't just reuse the syntax from the column definition here (e.g., "decimal(10,2)"), because the current convention for this type string appears to use commas as the delimiter between column types, so "decimal(10,2)" would be read as 2 different column types, "decimal(10" and "2)". A few SerDes seem to use String.split(",") on this type string, so changing the format of this string value could affect 3rd party SerDe implementations.
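
The "column.types" problem in the last item can be illustrated with a small standalone sketch. It does not use any Hive classes; the class name ColumnTypesSplitDemo, the helper splitTopLevel(), and the sample type string are hypothetical and exist only for illustration. It contrasts a plain String.split(",") with one possible alternative that splits only on commas outside parentheses. This is not a proposal for how Hive should parse the string, just a way to see the failure mode.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnTypesSplitDemo {

    // Hypothetical helper (not part of Hive): split a type string on commas,
    // but only when the comma is not inside parentheses.
    public static List<String> splitTopLevel(String types) {
        List<String> out = new ArrayList<>();
        int depth = 0;   // current parenthesis nesting depth
        int start = 0;   // start index of the current type name
        for (int i = 0; i < types.length(); i++) {
            char c = types.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                out.add(types.substring(start, i));
                start = i + 1;
            }
        }
        out.add(types.substring(start));
        return out;
    }

    public static void main(String[] args) {
        // Sample value, assuming a comma-delimited "column.types" string.
        String columnTypes = "int,decimal(10,2),string";

        // Naive split: the comma inside decimal(10,2) is treated as a delimiter.
        String[] naive = columnTypes.split(",");
        System.out.println("naive split: " + naive.length + " entries");

        // Parenthesis-aware split keeps decimal(10,2) as a single entry.
        List<String> aware = splitTopLevel(columnTypes);
        System.out.println("paren-aware split: " + aware.size() + " entries");
    }
}
```

Note that a splitter like this only helps SerDes that adopt it; any existing implementation that calls String.split(",") directly would still see the broken entries, which is the compatibility concern raised above.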

MetaStore Changes

There are a few different options here:

...