...

{"test1":"hi","test2":"there","test3":12,"test4":12.5,"test5":null,"test6":true,"test7":false,"test8":["element1","element2"],"test9":[],"test10":[1]}

Inferring a schema for this document should yield a STRUCT with the following fields:

name      schema
test1     STRING
test2     STRING
test3     INT64
test4     FLOAT64
test5     STRING
test6     BOOLEAN
test7     BOOLEAN
test8     ARRAY{STRING}
test9     ARRAY{STRING}
test10    ARRAY{INT64}

This can be achieved by introducing a recursive method, inferSchema(JsonNode jsonValue), which is capable of both (1) inferring the schema of simple JSON values directly, and (2) calling itself recursively to break JSON documents with complex data types (arrays and objects) down into their constituent parts, building nested Schemas so that all JSON documents are covered:

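JsonConverter.java

import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

// Method of JsonConverter: recursively infers a Connect Schema from a Jackson JsonNode.
private Schema inferSchema(JsonNode jsonValue) {
    switch (jsonValue.getNodeType()) {
        case NULL:
            // No type information is available: fall back to an optional STRING.
            return Schema.OPTIONAL_STRING_SCHEMA;
        case BOOLEAN:
            return Schema.BOOLEAN_SCHEMA;
        case NUMBER:
            // Integral numbers map to INT64, all other numbers to FLOAT64.
            if (jsonValue.isIntegralNumber()) {
                return Schema.INT64_SCHEMA;
            } else {
                return Schema.FLOAT64_SCHEMA;
            }
        case ARRAY:
            // The element schema is inferred from the first element only;
            // an empty array falls back to optional STRING elements.
            Iterator<JsonNode> elements = jsonValue.elements();
            Schema elementSchema = elements.hasNext()
                    ? inferSchema(elements.next())
                    : Schema.OPTIONAL_STRING_SCHEMA;
            return SchemaBuilder.array(elementSchema).build();
        case OBJECT:
            // Recurse into every field to build a (possibly nested) STRUCT.
            SchemaBuilder structBuilder = SchemaBuilder.struct();
            Iterator<Map.Entry<String, JsonNode>> it = jsonValue.fields();
            while (it.hasNext()) {
                Map.Entry<String, JsonNode> entry = it.next();
                structBuilder.field(entry.getKey(), inferSchema(entry.getValue()));
            }
            return structBuilder.build();
        case STRING:
            return Schema.STRING_SCHEMA;
        case BINARY:
        case MISSING:
        case POJO:
        default:
            // No schema can be inferred for these node types.
            return null;
    }
}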
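
The recursion can also be sketched with only the Java standard library, which may help readers without Kafka Connect and Jackson on the classpath. This is a simplified, hypothetical model rather than the JsonConverter code: it assumes the JSON document has already been parsed into plain Java objects (LinkedHashMap for objects, List for arrays, Long/Double for numbers, String, Boolean and null), renders schemas as strings such as ARRAY{INT64} instead of Schema instances, and does not track optionality. The names SchemaInferenceSketch and infer are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stdlib-only model of inferSchema. JSON values are plain Java
// objects and schemas are rendered as strings; this is NOT the Kafka Connect API.
public class SchemaInferenceSketch {

    static String infer(Object value) {
        if (value == null) {
            // Nothing to infer from: fall back to STRING (optional in the real code).
            return "STRING";
        } else if (value instanceof Boolean) {
            return "BOOLEAN";
        } else if (value instanceof Long || value instanceof Integer) {
            return "INT64";
        } else if (value instanceof Double || value instanceof Float) {
            return "FLOAT64";
        } else if (value instanceof String) {
            return "STRING";
        } else if (value instanceof List) {
            List<?> elements = (List<?>) value;
            // Element schema comes from the first element; empty arrays fall back to STRING.
            String elementSchema = elements.isEmpty() ? "STRING" : infer(elements.get(0));
            return "ARRAY{" + elementSchema + "}";
        } else if (value instanceof Map) {
            // Recurse into every field, keeping the map's iteration order.
            StringJoiner fields = new StringJoiner(",", "STRUCT{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                fields.add(entry.getKey() + ":" + infer(entry.getValue()));
            }
            return fields.toString();
        }
        // Any other type has no schema, mirroring the "return null" default case.
        return null;
    }

    public static void main(String[] args) {
        // The example document, already parsed into Java objects.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("test1", "hi");
        doc.put("test2", "there");
        doc.put("test3", 12L);
        doc.put("test4", 12.5);
        doc.put("test5", null);
        doc.put("test6", true);
        doc.put("test7", false);
        doc.put("test8", Arrays.asList("element1", "element2"));
        doc.put("test9", Collections.emptyList());
        doc.put("test10", Arrays.asList(1L));
        System.out.println(infer(doc));
    }
}
```

Running main prints STRUCT{test1:STRING,test2:STRING,test3:INT64,test4:FLOAT64,test5:STRING,test6:BOOLEAN,test7:BOOLEAN,test8:ARRAY{STRING},test9:ARRAY{STRING},test10:ARRAY{INT64}}, which matches the field table for the example document.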


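In the current model, when the type of a JSON value cannot be meaningfully inferred (null, or an empty array []), its schema is assumed to be STRING: a null value is given an optional STRING schema, and an empty array is given an ARRAY whose elements are optional STRINGs.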
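
The fallback rule, and the fact that an array's element schema is taken from its first element only, can be illustrated with a small stdlib-only sketch. It is a hypothetical model rather than the Kafka Connect API: FallbackSketch and infer are illustrative names, the string FALLBACK stands in for Schema.OPTIONAL_STRING_SCHEMA, and only the few value types needed for the illustration are handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the fallback rule only: values whose type cannot be
// inferred (null, or an array with no elements) get an optional STRING schema.
public class FallbackSketch {

    // Stands in for Schema.OPTIONAL_STRING_SCHEMA.
    static final String FALLBACK = "OPTIONAL_STRING";

    static String infer(Object value) {
        if (value == null) {
            return FALLBACK;
        }
        if (value instanceof List) {
            List<?> elements = (List<?>) value;
            // As in the proposed inferSchema, only the first element is inspected.
            return "ARRAY{" + (elements.isEmpty() ? FALLBACK : infer(elements.get(0))) + "}";
        }
        if (value instanceof Long) {
            return "INT64";
        }
        if (value instanceof String) {
            return "STRING";
        }
        throw new IllegalArgumentException("not covered by this sketch: " + value);
    }

    public static void main(String[] args) {
        System.out.println(infer(null));                     // OPTIONAL_STRING
        System.out.println(infer(Collections.emptyList()));  // ARRAY{OPTIONAL_STRING}
        System.out.println(infer(Arrays.asList(null, 1L)));  // ARRAY{OPTIONAL_STRING}
        System.out.println(infer(Arrays.asList(1L, null)));  // ARRAY{INT64}
    }
}
```

The last two calls show that, because only the first element is inspected, an array that begins with null falls back to optional STRING elements even when later elements are integers.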

Compatibility, Deprecation, and Migration Plan

...