Old Griffin architecture problems
Griffin currently depends heavily on a single query engine's raw API to define and implement data quality measurements, which is problematic for several reasons:
- It couples Griffin to one particular query engine, which does not work for most organizations, since different organizations deploy different query engines.
- The data quality definition is not clear enough to be understood consistently by different stakeholders; a clearer abstraction of data quality is needed.
- The scheduler in Griffin is limited; it should integrate easily with each data platform team's own scheduling strategy.
- Griffin's sole mission is to reduce MTTD (mean time to detect), but it currently lacks integration points for alerting.
Next generation Griffin architecture considerations
Since the sole mission of Griffin is to reduce MTTD, the next generation architecture should address the following:
- During the define phase, the next generation architecture should use more expressive rules to define data quality requirements. SQL-based rules are a good candidate: they are abstract enough that data quality rules can be dispatched to different query engines, yet concrete enough that all data quality stakeholders can understand them and align on them easily (see the rule sketch after this list).
- During the measure phase, the next generation Griffin should standardize the measure pipeline into distinct stages: a recording stage, a checking stage, and an alerting stage. This makes it easy for different data platform teams to integrate with Griffin at whichever stage suits them (see the pipeline sketch after this list).
- During the analyze phase, the next generation Griffin should provide standardized solutions, such as built-in anomaly detection algorithms, since in most cases the related stakeholders need support in defining what counts as an anomaly (see the detector sketch after this list).
- Last but not least, the next generation Griffin should provide data quality reports and scorecards for requirements at different levels.
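
To make the SQL-based rule idea concrete, here is a minimal sketch of what such a rule definition could look like. The `DqRule` shape, the `orders` table, and the `order_id` column are illustrative assumptions, not an existing Griffin API.

```scala
// Hypothetical example of a SQL-based data quality rule.
// The table and column names (orders, order_id) are assumptions for illustration.
case class DqRule(name: String, dimension: String, sql: String, threshold: Double)

val orderIdCompleteness = DqRule(
  name      = "orders_order_id_completeness",
  dimension = "completeness",
  // Plain SQL, so the same rule can be dispatched to whichever query engine
  // (Spark SQL, Hive, Presto, ...) an organization already runs.
  sql = """SELECT 1.0 - SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*)
          |       AS completeness_ratio
          |FROM orders""".stripMargin,
  threshold = 0.99 // the checking stage alerts when the measured ratio falls below this
)
```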
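One possible way to model the recording, checking, and alerting stages as pluggable integration points is sketched below. All trait and method names here are assumptions chosen for illustration, not existing Griffin interfaces.

```scala
// Sketch of the standardized measure pipeline: record -> check -> alert.
case class Metric(ruleName: String, value: Double, timestampMs: Long)

trait RecordingStage { def record(ruleName: String, sql: String): Metric }     // run the rule's SQL, persist the metric
trait CheckingStage  { def check(metric: Metric, threshold: Double): Boolean } // pass/fail against the threshold
trait AlertingStage  { def alert(metric: Metric): Unit }                        // notify the owning team's channel

// A data platform team can swap in its own implementation of any single stage
// (for example its in-house alerting system) without touching the other two.
def runPipeline(ruleName: String, sql: String, threshold: Double,
                rec: RecordingStage, chk: CheckingStage, alt: AlertingStage): Unit = {
  val metric = rec.record(ruleName, sql)
  if (!chk.check(metric, threshold)) alt.alert(metric)
}
```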
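As one example of what a standardized anomaly detection solution might look like, the sketch below flags a metric value whose z-score against recent history exceeds a threshold. The choice of a z-score detector and the 3-sigma default are assumptions, not a prescription for the actual algorithm.

```scala
// Minimal z-score detector over a window of recent metric values.
def isAnomaly(history: Seq[Double], latest: Double, zThreshold: Double = 3.0): Boolean = {
  if (history.size < 2) return false // not enough data to judge
  val mean   = history.sum / history.size
  val stddev = math.sqrt(history.map(v => math.pow(v - mean, 2)).sum / history.size)
  if (stddev == 0.0) latest != mean
  else math.abs(latest - mean) / stddev > zThreshold
}

// Example: a sudden drop in a completeness ratio is flagged, a normal value is not.
val recent = Seq(0.99, 0.995, 0.992, 0.994, 0.991)
println(isAnomaly(recent, 0.80))  // true
println(isAnomaly(recent, 0.993)) // false
```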