...
{"test1":"hi","test2":"there","test3":12,"test4":12.5,"test5":null,"test6":true,"test7":false,"test8":["element1","element2"],"test9":[],"test10":[1]}
The inferred schema should yield a STRUCT with the following fields:
name | schema |
---|---|
test1 | STRING |
test2 | STRING |
test3 | INT64 |
test4 | FLOAT64 |
test5 | STRING |
test6 | BOOLEAN |
test7 | BOOLEAN |
test8 | ARRAY{STRING} |
test9 | ARRAY{STRING} |
test10 | ARRAY{INT64} |
This can be achieved with a recursive method, inferSchema(JsonNode jsonValue), which (1) infers the schema directly for simple JSON values, and (2) calls itself recursively to break arrays and objects down into their constituent parts and schemas:
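```java
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

private Schema inferSchema(JsonNode jsonValue) {
    switch (jsonValue.getNodeType()) {
        case NULL:
            // A null value carries no type information; assume an optional STRING.
            return Schema.OPTIONAL_STRING_SCHEMA;
        case BOOLEAN:
            return Schema.BOOLEAN_SCHEMA;
        case NUMBER:
            if (jsonValue.isIntegralNumber()) {
                return Schema.INT64_SCHEMA;
            } else {
                return Schema.FLOAT64_SCHEMA;
            }
        case ARRAY:
            // The element schema is inferred from the first element only;
            // an empty array falls back to optional STRING elements.
            SchemaBuilder arrayBuilder = SchemaBuilder.array(
                jsonValue.elements().hasNext()
                    ? inferSchema(jsonValue.elements().next())
                    : Schema.OPTIONAL_STRING_SCHEMA);
            return arrayBuilder.build();
        case OBJECT:
            // Recurse into each field to build the STRUCT schema.
            SchemaBuilder structBuilder = SchemaBuilder.struct();
            Iterator<Map.Entry<String, JsonNode>> it = jsonValue.fields();
            while (it.hasNext()) {
                Map.Entry<String, JsonNode> entry = it.next();
                structBuilder.field(entry.getKey(), inferSchema(entry.getValue()));
            }
            return structBuilder.build();
        case STRING:
            return Schema.STRING_SCHEMA;
        case BINARY:
        case MISSING:
        case POJO:
        default:
            // No schema can be inferred for these node types.
            return null;
    }
}
```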
In the current model, when a JSON value carries too little information for its type to be inferred (null or an empty array, []), the schema is assumed to be STRING: null maps to an optional STRING, and [] maps to an ARRAY of optional STRING elements.
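The inference rules and the STRING fallback can be tried out without Jackson or Kafka Connect on the classpath. The following is a minimal sketch, not the proposed implementation: it assumes the JSON document has already been parsed into plain Java values (Map, List, String, Long, Double, Boolean, null), and it returns a textual description instead of a Connect Schema. The class name SchemaInferenceSketch and the method describe are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaInferenceSketch {

    // Hypothetical helper: returns a textual schema description for a
    // JSON-like value built from plain Java types (an assumption of this sketch).
    public static String describe(Object value) {
        if (value == null) {
            // null carries no type information; fall back to STRING
            return "STRING";
        }
        if (value instanceof Boolean) {
            return "BOOLEAN";
        }
        if (value instanceof Integer || value instanceof Long) {
            return "INT64";
        }
        if (value instanceof Number) {
            // any remaining numeric type is treated as floating point
            return "FLOAT64";
        }
        if (value instanceof String) {
            return "STRING";
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            // only the first element is inspected; an empty list falls back to STRING
            String element = list.isEmpty() ? "STRING" : describe(list.get(0));
            return "ARRAY{" + element + "}";
        }
        if (value instanceof Map) {
            // recurse into every field of the object
            List<String> fields = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                fields.add(e.getKey() + ": " + describe(e.getValue()));
            }
            return "STRUCT{" + String.join(", ", fields) + "}";
        }
        // Unlike inferSchema above, which returns null, this sketch rejects unknown types.
        throw new IllegalArgumentException("Unsupported value type: " + value.getClass());
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("test3", 12L);
        doc.put("test5", null);
        doc.put("test8", List.of("element1", "element2"));
        doc.put("test9", List.of());
        doc.put("test10", List.of(1L));
        // prints: STRUCT{test3: INT64, test5: STRING, test8: ARRAY{STRING}, test9: ARRAY{STRING}, test10: ARRAY{INT64}}
        System.out.println(describe(doc));
    }
}
```

As in the method above, only the first array element determines the element type, so a mixed array such as [1, "a"] would be described as ARRAY{INT64} under this approach.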
Compatibility, Deprecation, and Migration Plan
...