
Apache Atlas should bring together all of the knowledge the organization has about each data source, so that users have enough information to tell data sources apart during the data selection process. Figure 1 below shows the types of information about a data source that should be available to the catalog search through Apache Atlas.

 

Figure 1: Drill-down from catalog search results to explore the content and qualities of a data set

 

This metadata is assembled from notifications sent by data processing engines (via the bridges/hooks), from metadata discovery pipelines, and from user interfaces and API calls. The result is a rich description of the data source and its content. All of this detail is needed to support the catalog search, because an organization is likely to have many hundreds of data sources that appear to hold the same type of data, yet each may differ in quality, coverage of attributes, scope of instances, currency, precision, etc.
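To make the idea of "one rich description per data source" concrete, here is a minimal Python sketch, assuming metadata arrives as simple notifications tagged with the channel they came from. The names `MetadataNotification`, `DataSourceProfile`, `assemble_profiles` and the origin labels `"hook"`, `"discovery"` and `"user"` are hypothetical and are not part of the Apache Atlas API; the sample data sources are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three metadata channels described in the text:
# bridges/hooks, discovery pipelines, and user interfaces / API calls.
ORIGINS = {"hook", "discovery", "user"}


@dataclass
class MetadataNotification:
    """One piece of metadata about a data source (hypothetical shape)."""
    source_name: str
    origin: str
    properties: dict


@dataclass
class DataSourceProfile:
    """Consolidated description of a single data source."""
    source_name: str
    properties: dict = field(default_factory=dict)
    contributed_by: set = field(default_factory=set)


def assemble_profiles(notifications):
    """Merge notifications into one profile per data source.

    Assumption for this sketch only: a later notification overwrites an
    earlier value for the same property.
    """
    profiles = {}
    for note in notifications:
        if note.origin not in ORIGINS:
            raise ValueError(f"unknown origin: {note.origin}")
        profile = profiles.setdefault(note.source_name, DataSourceProfile(note.source_name))
        profile.properties.update(note.properties)
        profile.contributed_by.add(note.origin)
    return profiles


# Two hypothetical sources that both hold customer e-mail addresses
# but differ in completeness and currency.
notes = [
    MetadataNotification("crm_customers", "hook", {"columns": ["id", "name", "email"]}),
    MetadataNotification("crm_customers", "discovery", {"completeness": 0.97, "days_since_update": 1}),
    MetadataNotification("crm_customers", "user", {"description": "Customer master data"}),
    MetadataNotification("web_signups", "hook", {"columns": ["email", "signup_date"]}),
    MetadataNotification("web_signups", "discovery", {"completeness": 0.62, "days_since_update": 140}),
]
profiles = assemble_profiles(notes)

for name, profile in sorted(profiles.items()):
    print(name, profile.properties.get("completeness"), sorted(profile.contributed_by))
```

Recording which channels contributed to each profile is a design choice for the sketch: it lets a reviewer see, for example, that `web_signups` has no user-supplied description yet.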

<more to come>

During the search for data for a data project, the Atlas user needs to be able to search iteratively, review the results and refine the search, narrowing the list of candidate data sources as quickly as possible. Once they have identified the assets of interest, they can request that the data be provisioned to a sandbox for further analysis.
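The search, review and refine loop can be sketched as successive filters over a candidate list. This is a minimal illustration under stated assumptions: the candidate records, the `refine` helper and the `request_sandbox_provisioning` function are all hypothetical, and the real catalog search would run these refinements as queries against Apache Atlas rather than as in-memory filters.

```python
# Hypothetical results of an initial catalog search for "customer".
candidates = [
    {"name": "crm_customers", "completeness": 0.97, "has_email": True, "days_since_update": 1},
    {"name": "web_signups", "completeness": 0.62, "has_email": True, "days_since_update": 140},
    {"name": "billing_accounts", "completeness": 0.91, "has_email": False, "days_since_update": 3},
    {"name": "legacy_customers", "completeness": 0.88, "has_email": True, "days_since_update": 400},
]


def refine(results, predicate, label):
    """Apply one refinement and report how far it narrowed the list."""
    narrowed = [r for r in results if predicate(r)]
    print(f"{label}: {len(results)} -> {len(narrowed)} candidates")
    return narrowed


def request_sandbox_provisioning(selected, sandbox):
    """Hypothetical: build one provisioning request per chosen data source."""
    return [{"source": r["name"], "target": sandbox} for r in selected]


# Each step reviews the previous results and adds one more criterion.
step1 = refine(candidates, lambda r: r["has_email"], "must include email")
step2 = refine(step1, lambda r: r["completeness"] >= 0.85, "completeness >= 85%")
step3 = refine(step2, lambda r: r["days_since_update"] <= 30, "updated in last 30 days")

requests = request_sandbox_provisioning(step3, "analyst-sandbox-1")
print(requests)
```

Printing the size of the list after each step mirrors the review part of the loop: the user can see which criterion removed the most candidates before deciding on the next refinement.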

 


Supporting Architecture

The architecture that supports the catalog search is shown in Figures 2 and 3. In both cases, the catalog search UI accesses Apache Atlas through the Catalog Open Metadata Access Service (OMAS) REST API. This interface interacts with an Open Metadata Repository Service (OMRS) Connector, which it retrieves from the Open Connector Framework (OCF). All OMRS connectors support the same interfaces:

  • The entity and relationship types supported by the metadata repository or repositories
  • The entity and relationship APIs to access all types of metadata in a common manner
  • Specialized, type-safe interfaces for the core metadata types that are included in the Apache Atlas build.
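The three groups of interfaces listed above can be pictured as one abstract contract that every connector implements. The sketch below is hypothetical: the class and method names (`OMRSConnector`, `get_type_names`, `get_entity`, `find_entities`, `get_relationships`, `get_data_source`) are illustrative stand-ins and are not the actual OMRS API. The type-safe method is built on top of the generic calls, and a tiny in-memory implementation is included only so the contract can be exercised.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Entity:
    guid: str
    type_name: str
    properties: dict


@dataclass
class Relationship:
    guid: str
    type_name: str
    end1_guid: str
    end2_guid: str


@dataclass
class DataSource:
    """Type-safe view of one core metadata type (hypothetical)."""
    guid: str
    name: str
    owner: str


class OMRSConnector(ABC):
    """Hypothetical sketch of the common connector contract."""

    # 1. The entity and relationship types the repository supports.
    @abstractmethod
    def get_type_names(self) -> set: ...

    # 2. Generic access that works the same way for every type.
    @abstractmethod
    def get_entity(self, guid: str) -> Entity: ...

    @abstractmethod
    def find_entities(self, type_name: str, **match) -> list: ...

    @abstractmethod
    def get_relationships(self, guid: str) -> list: ...

    # 3. Type-safe access to a core type, built on the generic calls.
    def get_data_source(self, guid: str) -> DataSource:
        entity = self.get_entity(guid)
        if entity.type_name != "DataSource":
            raise TypeError(f"{guid} is a {entity.type_name}, not a DataSource")
        return DataSource(entity.guid, entity.properties["name"], entity.properties["owner"])


class InMemoryConnector(OMRSConnector):
    """Minimal implementation used only to demonstrate the contract."""

    def __init__(self, entities, relationships):
        self._entities = {e.guid: e for e in entities}
        self._relationships = relationships

    def get_type_names(self):
        return {e.type_name for e in self._entities.values()} | {r.type_name for r in self._relationships}

    def get_entity(self, guid):
        return self._entities[guid]

    def find_entities(self, type_name, **match):
        return [e for e in self._entities.values()
                if e.type_name == type_name and all(e.properties.get(k) == v for k, v in match.items())]

    def get_relationships(self, guid):
        return [r for r in self._relationships if guid in (r.end1_guid, r.end2_guid)]


conn = InMemoryConnector(
    [Entity("ds1", "DataSource", {"name": "crm_customers", "owner": "sales"}),
     Entity("c1", "Column", {"name": "email"})],
    [Relationship("r1", "SchemaAttribute", "ds1", "c1")],
)
ds = conn.get_data_source("ds1")
print(ds)
```

Because the type-safe method lives in the base class, any connector that implements the generic calls gets it for free; whether the real OMRS layers its type-safe interfaces this way is not stated here and is an assumption of the sketch.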

Two implementations of the OMRS Connector are provided for Apache Atlas: a Local Atlas OMRS Connector for accessing a local Apache Atlas metadata repository, and an Enterprise OMRS Connector for making federated queries across many metadata repositories.
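Because the Catalog OMAS obtains its connector from the OCF rather than constructing it directly, switching between the single-repository and enterprise set-ups can be a matter of configuration. The following minimal sketch assumes a simple registry keyed by a `connector_type` setting; `LocalAtlasConnector`, `EnterpriseConnector`, `CONNECTOR_FACTORIES`, `get_connector` and the configuration keys are hypothetical and do not reproduce the real OCF API.

```python
from typing import Callable, Dict


class LocalAtlasConnector:
    """Stand-in for the Local Atlas OMRS Connector (hypothetical)."""

    def __init__(self, config):
        self.repository_url = config["repository_url"]

    def describe(self):
        return f"local repository at {self.repository_url}"


class EnterpriseConnector:
    """Stand-in for the Enterprise OMRS Connector (hypothetical)."""

    def __init__(self, config):
        self.members = list(config["member_repositories"])

    def describe(self):
        return f"federated query over {len(self.members)} repositories"


# Hypothetical registry playing the role the OCF plays in the text:
# the caller asks for a connector by configuration, not by class.
CONNECTOR_FACTORIES: Dict[str, Callable[[dict], object]] = {
    "local-atlas": LocalAtlasConnector,
    "enterprise": EnterpriseConnector,
}


def get_connector(config: dict):
    """Return a connector instance for the configured connector type."""
    try:
        factory = CONNECTOR_FACTORIES[config["connector_type"]]
    except KeyError:
        raise ValueError(f"no connector registered for {config.get('connector_type')!r}") from None
    return factory(config)


single = get_connector({"connector_type": "local-atlas", "repository_url": "http://atlas.example.com"})
enterprise = get_connector({"connector_type": "enterprise",
                            "member_repositories": ["atlas-a", "atlas-b", "other-c"]})
print(single.describe())
print(enterprise.describe())
```

The calling code is identical in both cases; only the configuration passed to `get_connector` differs, which is the property the two figures rely on.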


Figure 2: Catalog search using a single instance of an Apache Atlas repository

Figure 2 shows the Catalog OMAS API calling the Local Atlas OMRS Connector, which provides access to the local metadata repository. At the core of the Atlas metadata repository is a graph that holds the metadata entities and the relationships linking them. The graph is supported by other data stores that hold logs and other supporting information. The repository service provides a search API over all of these stores, as well as a query and update interface.
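The division of labour in the local repository (a graph of entities and relationships, supporting stores such as a log, and a service offering search, query and update) can be illustrated with a small in-memory model. This is a hypothetical sketch: `LocalMetadataRepository` and its methods are invented names, and the model does not reflect how Apache Atlas actually stores or indexes its graph.

```python
from collections import defaultdict


class LocalMetadataRepository:
    """Hypothetical in-memory stand-in for the local repository:
    a graph of entities and relationships plus a supporting log store."""

    def __init__(self):
        self.entities = {}              # graph vertices: guid -> properties
        self.edges = defaultdict(set)   # graph edges: guid -> {(relationship type, other guid)}
        self.log = []                   # supporting store: change log entries

    # Update interface
    def add_entity(self, guid, type_name, **properties):
        self.entities[guid] = {"type": type_name, **properties}
        self.log.append(f"created {type_name} {guid}")

    def relate(self, rel_type, guid1, guid2):
        self.edges[guid1].add((rel_type, guid2))
        self.edges[guid2].add((rel_type, guid1))
        self.log.append(f"linked {guid1} -{rel_type}- {guid2}")

    # Query interface: walk the graph from one entity
    def neighbours(self, guid, rel_type=None):
        return sorted(other for r, other in self.edges[guid] if rel_type in (None, r))

    # Search interface: one call that looks in both the graph and the log store
    def search(self, text):
        text = text.lower()
        entity_hits = sorted(g for g, props in self.entities.items()
                             if any(text in str(v).lower() for v in props.values()))
        log_hits = [entry for entry in self.log if text in entry.lower()]
        return entity_hits, log_hits


repo = LocalMetadataRepository()
repo.add_entity("ds1", "DataSource", name="crm_customers")
repo.add_entity("c1", "Column", name="customer_email")
repo.relate("SchemaAttribute", "ds1", "c1")

print(repo.search("customer"))
print(repo.neighbours("ds1"))
```

The point the sketch tries to capture is that a single `search` call spans more than one store, while `neighbours` is a graph-specific query.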


Figure 3: Catalog search across an enterprise

Figure 3 shows the Catalog OMAS API calling the Enterprise OMRS Connector. This connector calls the local OMRS connector and also makes REST API calls to the OMRS connectors of remote metadata repositories.
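A federated query of this kind can be sketched as a fan-out: send the same request to the local connector and every remote one, then merge what comes back. In this minimal sketch, assuming each member exposes a `find_entities(text)` call, the REST calls are replaced by in-process stubs. `StubConnector` and `EnterpriseConnectorSketch` are hypothetical names, and de-duplicating by `guid` while tolerating unreachable members are assumptions, not documented behaviour of the Enterprise OMRS Connector.

```python
from concurrent.futures import ThreadPoolExecutor


class StubConnector:
    """Hypothetical stand-in for a local or remote OMRS connector."""

    def __init__(self, name, entities, fail=False):
        self.name, self.entities, self.fail = name, entities, fail

    def find_entities(self, text):
        if self.fail:
            raise ConnectionError(f"{self.name} unreachable")
        return [e for e in self.entities if text in e["name"]]


class EnterpriseConnectorSketch:
    """Fans one query out to every member connector and merges the results."""

    def __init__(self, local, remotes):
        self.members = [local, *remotes]   # local connector is consulted first

    def find_entities(self, text):
        merged, errors = {}, []
        with ThreadPoolExecutor() as pool:
            # Submit all queries at once so slow repositories run in parallel.
            futures = {pool.submit(m.find_entities, text): m for m in self.members}
            # Iterate in submission order so the merge is deterministic.
            for future in futures:
                try:
                    for entity in future.result():
                        merged.setdefault(entity["guid"], entity)  # first copy wins
                except ConnectionError as exc:
                    errors.append(str(exc))          # one bad member does not fail the query
        return list(merged.values()), errors


local = StubConnector("local", [{"guid": "g1", "name": "crm_customers"}])
remote_a = StubConnector("atlas-b", [{"guid": "g1", "name": "crm_customers"},
                                     {"guid": "g2", "name": "customer_orders"}])
remote_b = StubConnector("other-c", [], fail=True)

results, errors = EnterpriseConnectorSketch(local, [remote_a, remote_b]).find_entities("customer")
print([e["guid"] for e in results], errors)
```

Returning the errors alongside the results lets the catalog search UI show partial results while still telling the user that some repositories could not be reached.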

If a remote repository is an Apache Atlas instance, the OMRS connector called is that instance's Local Atlas OMRS Connector. Other metadata repositories can be connected by implementing their own OMRS connector that translates OMRS requests into calls to their native API.
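For a non-Atlas repository, the connector acts as an adapter between the common OMRS-style request and the repository's own API. The sketch below assumes a hypothetical native client, `LegacyCatalogClient`, whose `list_tables` call returns rows with its own field names; `LegacyCatalogOMRSConnector`, the supported-type mapping and the repository-qualified `guid` format are likewise illustrative assumptions, not part of any real product.

```python
class LegacyCatalogClient:
    """Hypothetical native API of a non-Atlas metadata repository."""

    def __init__(self, tables):
        self._tables = tables

    def list_tables(self, keyword):
        return [t for t in self._tables if keyword.lower() in t["TABLE_NAME"].lower()]


class LegacyCatalogOMRSConnector:
    """Hypothetical connector that translates a common find_entities call
    into the native list_tables call and maps the rows back."""

    SUPPORTED_TYPES = {"DataSource"}   # assumption: tables map to DataSource entities

    def __init__(self, client, repository_id):
        self.client = client
        self.repository_id = repository_id

    def find_entities(self, type_name, text):
        if type_name not in self.SUPPORTED_TYPES:
            return []  # this repository holds no metadata of that type
        rows = self.client.list_tables(text)
        return [
            {
                # Prefix with the repository id so guids stay unique across a federation.
                "guid": f"{self.repository_id}:{row['TABLE_ID']}",
                "type_name": type_name,
                "properties": {"name": row["TABLE_NAME"], "owner": row.get("OWNER_ID")},
            }
            for row in rows
        ]


client = LegacyCatalogClient([
    {"TABLE_ID": 17, "TABLE_NAME": "CUSTOMER_MASTER", "OWNER_ID": "dba"},
    {"TABLE_ID": 42, "TABLE_NAME": "ORDERS"},
])
connector = LegacyCatalogOMRSConnector(client, "legacy-1")
found = connector.find_entities("DataSource", "customer")
print(found)
```

Keeping all of the translation inside the connector means the Enterprise OMRS Connector can treat this repository exactly like an Atlas one.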

 

 


 

<more to come> 

 
