Comment: Updated to reflect work is complete

...

Jira: ATLAS-1757 (ASF JIRA)

 

Apache Atlas uses the JanusGraph graph database at the heart of its metadata repository.  This graph is used to show the interconnected relationships between data sources; the data sets they host; the business meaning of the data elements within each data set; the classification of these elements in terms of quality, confidentiality and retention; and who (people and processes) is using them and for which purposes.

The current implementation of the graph db is Titan 0.5.  This is a fairly old version of Titan, and there has been some work to provide support for Titan 1.0 by adding a graph abstraction layer.  There is still work to do to complete this abstraction layer, particularly in the catalog service, which uses an older version of TinkerPop/Gremlin that is not supported by Titan 1.0.

In the meantime, a new graph initiative called JanusGraph has been spawned from Titan to take the code base forward.

So, what should our graph strategy be?  Do we focus on a single graph database and, if so, which one?  Or do we allow a range of graph databases that can be used depending on the deployment?  If we support a range of graph databases, can standard abstraction layers such as Apache TinkerPop be used?

  

JanusGraph uses a pluggable persistence store to save the metadata content and a search index for its search API.  Apache Atlas can take advantage of this configurability to support a range of size, scalability and performance requirements.
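
As a rough illustration of this configurability, the fragment below sketches how a deployment might select the persistence store and the search index in atlas-application.properties.  The host names are placeholders, and the exact backend identifiers (for example hbase2 versus hbase) depend on the Atlas release, so treat this as a sketch to check against the configuration guide for your version rather than a recommended setup.

```properties
# Persistence store for the metadata graph (placeholder hosts; identifier varies by release)
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hostname=zk1.example.com,zk2.example.com,zk3.example.com
atlas.graph.storage.hbase.table=atlas

# Search index used by the search API
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=zk1.example.com:2181/solr
```

A smaller deployment could, under the same assumptions, point atlas.graph.storage.backend at an embedded store such as berkeleyje and atlas.graph.index.search.backend at elasticsearch, leaving the rest of Atlas unchanged.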

ATLAS-1757 (ASF JIRA) is where the discussion about our graph strategy is occurring, and it will be used to coordinate the implementation of whatever is decided.  All welcome ...

 

...