...
Apache Atlas uses the JanusGraph graph database at the heart of its metadata repository. This graph shows the interconnected relationships between data sources; the data sets they host; the business meaning of the data elements within each data set; the classification of these elements in terms of quality, confidentiality and retention; and who (people and processes) is using them and for which purposes.
The current implementation of the graph db is Titan 0.5. This is a fairly back-level version of Titan, and there has been some work to provide support for Titan 1.0 by adding a graph abstraction layer. There is still work to do to complete this abstraction layer, particularly in the catalog service, which uses a back-level version of TinkerPop/Gremlin that is not supported by Titan 1.0.
In the meantime, a new graph initiative called JanusGraph has been spawned from Titan to take the code base forward.
So, what should our graph strategy be? Do we focus on a single graph database and, if so, which one? Or do we allow a range of graph databases that can be chosen depending on the deployment? If we support a range of graph databases, can standard abstraction layers such as Apache TinkerPop be used?
JanusGraph uses a pluggable persistence store to save the metadata content and a pluggable search index to serve its search API. Apache Atlas can take advantage of this configurability to support a range of size, scalability and performance requirements.
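To make the pluggable store and index concrete, here is a minimal Java sketch of two deployment profiles expressed as JanusGraph-style configuration properties. The class name `GraphBackendProfiles`, its method names, the directory layout and the choice of backends are hypothetical and only illustrate the idea; the keys follow JanusGraph's `storage.*` and `index.<name>.*` naming, with `search` assumed as the index name. It is not taken from the Atlas code base.

```java
import java.util.Properties;

// Hypothetical helper: builds JanusGraph-style settings for two deployment sizes.
class GraphBackendProfiles {

    // Small single-node profile: embedded BerkeleyDB store plus a local Lucene index.
    public static Properties embeddedProfile(String dataDir) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "berkeleyje");
        p.setProperty("storage.directory", dataDir + "/berkeley");
        p.setProperty("index.search.backend", "lucene");
        p.setProperty("index.search.directory", dataDir + "/lucene");
        return p;
    }

    // Larger profile: remote HBase store plus a remote Elasticsearch index.
    public static Properties clusteredProfile(String storageHosts, String indexHosts) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "hbase");
        p.setProperty("storage.hostname", storageHosts);
        p.setProperty("index.search.backend", "elasticsearch");
        p.setProperty("index.search.hostname", indexHosts);
        return p;
    }

    public static void main(String[] args) {
        // Print both profiles so the difference between deployments is visible.
        embeddedProfile("/var/lib/atlas").forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println();
        clusteredProfile("hbase1,hbase2", "es1,es2").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Only the property values differ between the two profiles, so the same graph code could in principle run against either one. Which combinations Atlas actually supports and tests is a separate question that this sketch does not answer.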
...