...

{"test1":"hi","test2":"there","test3":12,"test4":12.5,"test5":null,"test6":true,"test7":false,"test8":["element1","element2"],"test9":[],"test10":[1]}

The inferred schema should be a STRUCT with the following fields:

Name      Schema
test1     STRING
test2     STRING
test3     INT64
test4     FLOAT64
test5     STRING
test6     BOOLEAN
test7     BOOLEAN
test8     ARRAY{STRING}
test9     ARRAY{STRING}
test10    ARRAY{INT64}

This can be achieved with a recursive method, inferSchema(JsonNode jsonValue), which 1) maps scalar JSON values (null, booleans, numbers, strings) directly to a Schema, and 2) calls itself on the elements of arrays and the fields of objects, building the schema of a complex value from the schemas of its constituent parts:

JsonConverter.java
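import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

// Recursively infers a Connect Schema from a JSON value.
private Schema inferSchema(JsonNode jsonValue) {
    switch (jsonValue.getNodeType()) {
        case NULL:
            // A null carries no type information; fall back to an optional STRING.
            return Schema.OPTIONAL_STRING_SCHEMA;
        case BOOLEAN:
            return Schema.BOOLEAN_SCHEMA;
        case NUMBER:
            // Integral numbers map to INT64, all other numbers to FLOAT64.
            if (jsonValue.isIntegralNumber()) {
                return Schema.INT64_SCHEMA;
            } else {
                return Schema.FLOAT64_SCHEMA;
            }
        case ARRAY:
            // The element schema is inferred from the first element only;
            // an empty array falls back to an ARRAY of optional STRING.
            Schema elementSchema = jsonValue.size() > 0
                    ? inferSchema(jsonValue.get(0))
                    : Schema.OPTIONAL_STRING_SCHEMA;
            return SchemaBuilder.array(elementSchema).build();
        case OBJECT:
            // One STRUCT field per JSON property, recursing into each value.
            SchemaBuilder structBuilder = SchemaBuilder.struct();
            Iterator<Map.Entry<String, JsonNode>> it = jsonValue.fields();
            while (it.hasNext()) {
                Map.Entry<String, JsonNode> entry = it.next();
                structBuilder.field(entry.getKey(), inferSchema(entry.getValue()));
            }
            return structBuilder.build();
        case STRING:
            return Schema.STRING_SCHEMA;
        case BINARY:
        case MISSING:
        case POJO:
        default:
            // No schema is inferred for these node types; the method returns null.
            return null;
    }
}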
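To make the recursion concrete without depending on Jackson or Kafka Connect, the following is a minimal, dependency-free sketch of the same technique. It is an illustration under stated assumptions, not the proposed implementation: plain Java values (Map, List, String, Long, Double, Boolean, null) stand in for JsonNode, and the hypothetical class SchemaInferenceSketch renders schemas as strings instead of building Schema objects. For the example document its output matches the table above, with the optional STRING fallback spelled out for test5 and test9.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: plain Java values stand in for Jackson's JsonNode, and
// schemas are rendered as strings rather than Kafka Connect Schema objects.
public class SchemaInferenceSketch {

    // Mirrors the structure of inferSchema(JsonNode) for one JSON value.
    public static String infer(Object value) {
        if (value == null) {
            // No type information: fall back to an optional STRING.
            return "OPTIONAL STRING";
        }
        if (value instanceof Boolean) {
            return "BOOLEAN";
        }
        if (value instanceof Long || value instanceof Integer) {
            return "INT64";
        }
        if (value instanceof Double) {
            return "FLOAT64";
        }
        if (value instanceof String) {
            return "STRING";
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            // Element schema comes from the first element; [] falls back to optional STRING.
            String element = list.isEmpty() ? "OPTIONAL STRING" : infer(list.get(0));
            return "ARRAY{" + element + "}";
        }
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            StringJoiner fields = new StringJoiner(", ", "STRUCT{", "}");
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                // Recurse into each field value, as the OBJECT case does.
                fields.add(entry.getKey() + ": " + infer(entry.getValue()));
            }
            return fields.toString();
        }
        // The original returns null for unsupported node types; this sketch throws instead.
        throw new IllegalArgumentException("Unsupported value: " + value.getClass());
    }

    public static void main(String[] args) {
        // The example document from this section, built by hand (the JDK has no JSON parser).
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("test1", "hi");
        doc.put("test2", "there");
        doc.put("test3", 12L);
        doc.put("test4", 12.5);
        doc.put("test5", null);
        doc.put("test6", true);
        doc.put("test7", false);
        doc.put("test8", List.of("element1", "element2"));
        doc.put("test9", List.of());
        doc.put("test10", List.of(1L));
        // Prints: STRUCT{test1: STRING, test2: STRING, test3: INT64, test4: FLOAT64,
        // test5: OPTIONAL STRING, test6: BOOLEAN, test7: BOOLEAN, test8: ARRAY{STRING},
        // test9: ARRAY{OPTIONAL STRING}, test10: ARRAY{INT64}} (on one line)
        System.out.println(infer(doc));
    }
}
```

As in inferSchema, an array's element schema is taken from its first element only, so the sketch does not check that the remaining elements agree with it.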


In the current model, when a JSON value's type cannot be inferred from the value itself (null, or an empty array []), its schema is assumed to be STRING: null becomes an optional STRING, and [] becomes an ARRAY of optional STRING.


Compatibility, Deprecation, and Migration Plan

...