...

Typically, end users want to use meaningful business terms to describe the data they need. They may also want to see related descriptions of the data, a profile of its data values, and its lineage. Information about the owners/stewards of the data and the organization they come from, and any license associated with the data, is also relevant. To provide this information, the VDC project needs to: expand the types defined in Apache Atlas; extend the glossary so that it supports categories and other types of semantic relationships that help the end user locate the right data; and provide a new catalog API and interface for discovering data based on these values.
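As a rough illustration of what locating data through business terms could look like, the following sketch models a glossary term that carries categories, synonyms and links to data assets. It is not the Apache Atlas type system or API: the `GlossaryTerm` class, its fields and the `find_assets` helper are hypothetical names chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Hypothetical glossary term; not an Apache Atlas type."""
    name: str
    description: str
    categories: set[str] = field(default_factory=set)  # e.g. "Finance"
    synonyms: set[str] = field(default_factory=set)    # one kind of semantic relationship
    assets: list[str] = field(default_factory=list)    # names of linked data assets


def find_assets(terms: list[GlossaryTerm], query: str) -> list[str]:
    """Return the assets linked to any term whose name or synonym matches the query."""
    wanted = query.lower()
    found: list[str] = []
    for term in terms:
        labels = {term.name.lower()} | {s.lower() for s in term.synonyms}
        if wanted in labels:
            found.extend(term.assets)
    return found


glossary = [
    GlossaryTerm(
        name="Customer",
        description="A person or organization that buys goods or services.",
        categories={"Sales"},
        synonyms={"Client"},
        assets=["crm.customer_master"],
    ),
    GlossaryTerm(
        name="Invoice",
        description="A request for payment.",
        categories={"Finance"},
        assets=["billing.invoices"],
    ),
]

# A search by synonym still reaches the asset linked to the preferred term.
matches = find_assets(glossary, "client")
```

The point of the sketch is only that the end user searches with a business word ("client") and is led to a physical asset through the glossary, rather than needing to know the asset's technical name.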

...

Consider the case where the end user is searching for additional sources for their project and the data that they need has not been provisioned into HDFS; it is still on the source systems. However, these data sources are already catalogued in another metadata repository. To be valuable, Apache Atlas's catalog search needs to be able to extend its search to reach data and metadata repositories beyond Hadoop in order to locate all available data. Once the end user has identified interesting sources, they may then request that the data be provisioned into HDFS for further analysis. The VDC project will introduce the frameworks, integration and adapter capability to allow a more enterprise-wide view of the potential data sources, plus a metadata-driven connector framework for connecting to both data and metadata repositories.
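The scenario above can be sketched as a search that fans out over several repository connectors and merges the results. This is a minimal sketch under stated assumptions, not the VDC connector framework itself: `MetadataConnector`, `InMemoryConnector`, `AssetSummary` and `federated_search` are hypothetical names, and the in-memory connector stands in for a real repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AssetSummary:
    """Hypothetical search hit returned by a connector."""
    name: str
    repository: str    # which repository reported the asset
    provisioned: bool  # True if the data is already available in HDFS


class MetadataConnector(Protocol):
    """Hypothetical interface that each repository connector would implement."""
    def search(self, term: str) -> list[AssetSummary]: ...


class InMemoryConnector:
    """Stand-in for a real repository: maps asset name -> provisioned flag."""

    def __init__(self, repository: str, assets: dict[str, bool]) -> None:
        self.repository = repository
        self.assets = assets

    def search(self, term: str) -> list[AssetSummary]:
        needle = term.lower()
        return [
            AssetSummary(name, self.repository, provisioned)
            for name, provisioned in self.assets.items()
            if needle in name.lower()
        ]


def federated_search(term: str, connectors: list[MetadataConnector]) -> list[AssetSummary]:
    """Query every connector in order and concatenate the hits."""
    results: list[AssetSummary] = []
    for connector in connectors:
        results.extend(connector.search(term))
    return results


hadoop = InMemoryConnector("hadoop-catalog", {"customer_orders": True})
external = InMemoryConnector(
    "external-catalog", {"customer_accounts": False, "supplier_list": False}
)

hits = federated_search("customer", [hadoop, external])
# Assets found outside Hadoop are candidates for a provisioning request.
to_provision = [hit.name for hit in hits if not hit.provisioned]
```

Here the search finds one asset already in Hadoop and one that is only catalogued externally; the latter is what the end user would ask to have provisioned into HDFS.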

...