
Forward to ==> Area 1 - Collaboration

 

In the Atlas build, model files are added in alphanumeric order to build up the type library.  Therefore the model files are numbered to ensure that each file's dependencies have been resolved before it is loaded.  For the open metadata types, the model file number is shown in the package name to document the intended load order.  The numbering scheme of the metadata areas described in Building out the Open Metadata Typesystem provides the high-level numbering system.  For example, area 1 has models 0100-0199, area 2 has models 0200-0299, etc.  Each area's sub-models are dispersed along its range, ensuring there is space to insert additional models in the future.
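The load-order convention can be illustrated with a minimal Python sketch.  The file names below are hypothetical; only the four-digit prefix and the area ranges come from the description above, and the helper functions are illustrative rather than part of the Atlas build.

```python
import re

# Hypothetical model file names; only the four-digit prefix convention
# is taken from the text.
model_files = [
    "0110-example_b.json",
    "0010-example_a.json",
    "0205-example_c.json",
    "0030-example_d.json",
]


def load_order(files):
    """Return the files in plain alphanumeric order.

    Zero-padded prefixes make string order match numeric order, so a
    model is loaded after every lower-numbered model.
    """
    return sorted(files)


def area_of(file_name):
    """Derive the metadata area from the prefix (0100-0199 is area 1)."""
    match = re.match(r"(\d{4})", file_name)
    if match is None:
        raise ValueError(f"no four-digit model number in {file_name!r}")
    return int(match.group(1)) // 100


for name in load_order(model_files):
    print(area_of(name), name)
```

Because the prefixes are zero-padded to four digits, a new model can be slotted between two existing numbers without renaming anything else.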

In addition to the 7 metadata areas, the open metadata models introduce some base definitions and structures that are used throughout the open metadata models.  These need to be loaded first and so we have added an Area 0 that includes these base definitions.  The current models defined for Apache Atlas have been moved to 1000.  Thus the model numbers from 0000 to 0099 will look something like figure 1.


Figure 1: Summary of Area 0's packages

 

 

Details of the new models are shown below.  The original Apache models are linked from the page Atlas Model.

Base Model

The base model defines key concepts such as Referenceable, Asset, Infrastructure, Process and DataSet.  It has been restructured so that Asset now inherits from Referenceable.


Figure 2: Existing base model

 

 

Linked Media Types

Linked media types describe the simple structures that are used repeatedly in open metadata to connect it to documents and entities in other types of repositories.


Figure 3: Linked media for any referenceable entity

  • External references link metadata to elements in external repositories.

     

  • Media such as images allow an icon, thumbnail and larger images to be associated with a metadata element.  They are intended to be displayed with the metadata content.  These images enrich the description of the object and may include, for example, design drawings, photographs or illustrations of the component in action.

     

 

External Identifiers

External Identifiers are identifiers for a Referenceable that are used in external systems.


Figure 4: Correlating metadata with information in other systems

 

Property Facets

Property facets allow any entity to be extended with additional properties.  This is particularly useful for storing metadata that originated in another type of metadata repository or tool, since it allows vendor/tool specific values to be stored.


Figure 5: Adding custom properties to any referenceable entity
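As a rough illustration of the idea, the sketch below attaches a facet of tool-specific values to an entity without changing the entity's own properties.  The dictionary layout, the field names ("facets", "source", "properties") and the sample values are assumptions made for this example; they are not the type definitions from the model.

```python
def add_facet(entity, source, properties):
    """Attach a group of tool-specific properties to an entity.

    Each facet records the (hypothetical) source tool it came from, so
    values from different tools do not overwrite one another.
    """
    entity.setdefault("facets", []).append(
        {"source": source, "properties": dict(properties)}
    )
    return entity


# Hypothetical entity and vendor values, for illustration only.
table = {"qualifiedName": "example.db.customers"}
add_facet(table, "vendor-tool-a", {"profilingScore": "0.93"})
add_facet(table, "vendor-tool-b", {"legacyId": "TBL-0042"})

print(table["facets"])
```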

 

Locations

It is important to understand where assets are located to ensure they are properly protected and comply with data sovereignty laws.  The open metadata model allows location information to be captured at many levels of granularity.


Figure 6: Understanding where data assets and services are located

The NestedLocation relationship allows hierarchical grouping of locations to be represented.  Notice that locations can be organized into multiple hierarchies.

The AdjacentLocation relationship links locations that touch one another.

The notion of a location is variable and the classifications FixedLocation, SecureLocation and CyberLocation help to clarify the nature of the location.

  • FixedLocation means that the location represents a physical place where, for example, Hosts (see 0030 below), servers (see 0040 below) and hence data may be located.  This could be an area of a data center, the building the data center is located in, or even the country where the server/data is located.
  • SecureLocation indicates that there is restricted access to the location.
  • CyberLocation means that the location describes something in cyber space.
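The location structure described above can be sketched in a few lines of Python.  The class, its field names and the sample locations are hypothetical; only the NestedLocation relationship and the classification names come from the model.  The sketch shows one location belonging to two hierarchies at once.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    name: str
    # Classification names, e.g. "FixedLocation" or "SecureLocation".
    classifications: set = field(default_factory=set)
    # NestedLocation links to the locations that enclose this one.
    parents: list = field(default_factory=list)


def enclosing_locations(location):
    """Return the names of all locations reachable upward via NestedLocation."""
    names = set()
    stack = list(location.parents)
    while stack:
        current = stack.pop()
        if current.name not in names:
            names.add(current.name)
            stack.extend(current.parents)
    return names


# Hypothetical locations: the data center sits in both a geographic
# hierarchy and an organizational one.
country = Location("Example Country", {"FixedLocation"})
region = Location("Example Sales Region")
data_center = Location(
    "Data Center 1", {"FixedLocation", "SecureLocation"}, [country, region]
)
server_room = Location("Server Room A", {"FixedLocation"}, [data_center])

print(sorted(enclosing_locations(server_room)))
```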

 

Hosts and Platforms

The host and platform metadata entities provide a simple model for the IT infrastructure (nodes, computers, etc.) that data resources are hosted on.


Figure 7: Defining the platform that the data assets and services run on

A Host is an IT Infrastructure concept associated with the hardware running the systems.  It provides a mechanism for describing a unit of hardware that provides the ability to host software servers.

The host can be linked to its location through the HostLocation relationship.

The operating platform is an informational structure to describe the operating system of the host.  Many hosts could have the same operating platform.

 

Complex Hosts

In today's systems, hardware is managed to get the maximum use out of it.  Therefore the concept of a Host is typically virtualized to allow a single computer to be used for many hosts and for multiple computers to collectively support a single host.

The complex hosts model handles environments where many nodes are acting together as a cluster, and where virtualized containers (such as Docker) are being used.


Figure 8: Supporting server clusters and server virtualization (server containers)

A HostCluster describes a collection of hosts that together are providing a service.  Clusters are often used to provide horizontal scaling of services.

A VirtualContainer provides the services of a host to the software servers deployed on it (see 0040 below).  When the server makes requests for storage, network access, etc., the VirtualContainer delegates the requests to the equivalent services of the actual host it is deployed on.

VirtualContainers can be hosted on other VirtualContainers, but to actually run they need to ultimately be deployed on to a real physical Host.
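The rule that a chain of VirtualContainers must end at a physical Host can be sketched as follows.  The class layout, the deployed_on field and the physical_host helper are assumptions made for this example, not definitions from the model.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Host:
    name: str


@dataclass
class VirtualContainer:
    name: str
    # What this container is deployed on: another container or a real host.
    deployed_on: Optional[Union["VirtualContainer", Host]] = None


def physical_host(node):
    """Follow the deployment chain until a physical Host is reached."""
    seen = set()
    while isinstance(node, VirtualContainer):
        if id(node) in seen:
            raise ValueError("deployment chain contains a cycle")
        seen.add(id(node))
        if node.deployed_on is None:
            raise ValueError(f"{node.name} is not deployed on a physical host")
        node = node.deployed_on
    return node


# Hypothetical deployment: a container nested in another container.
machine = Host("rack-12-node-3")
outer = VirtualContainer("outer-container", deployed_on=machine)
inner = VirtualContainer("inner-container", deployed_on=outer)

print(physical_host(inner).name)
```

A container with no deployment target raises an error here, which mirrors the point that a VirtualContainer cannot actually run until it is deployed on a real host.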

 

 

Software Servers

Software servers describe the middleware software servers (such as application servers, data movement engines and database servers) that run on the Hosts.  Within the software server model we capture the userId that the server operates under.  Most metadata repositories are run in a secure mode requiring incoming requests to include the requester’s security credentials.  Therefore we have an identifier for each unique logged on security identity (aka userId).  This identity is recorded within specific entities and relationships when they are created or updated.  By storing the user identifier for the server, it is possible to correlate the server with the changes to the metadata (and related data assets) that it makes.

See model 0310 Actors in Area 3 for details of how user identifiers are correlated with people and teams.
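The correlation between a server and the changes it makes can be sketched as a simple match on the user identifier.  The record layouts, field names and sample values below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class SoftwareServer:
    name: str
    user_id: str  # the userId the server operates under


@dataclass
class MetadataChange:
    entity: str
    updated_by: str  # userId recorded when the entity was created or updated


def changes_made_by(server, changes):
    """Return the changes whose recorded userId matches the server's userId."""
    return [change for change in changes if change.updated_by == server.user_id]


# Hypothetical server and change records.
etl_server = SoftwareServer("data-movement-engine", "svc_etl")
history = [
    MetadataChange("customers table", "svc_etl"),
    MetadataChange("orders table", "jsmith"),
    MetadataChange("invoices file", "svc_etl"),
]

for change in changes_made_by(etl_server, history):
    print(change.entity)
```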


Figure 9: Servers and their connectivity and capabilities

Open metadata may also capture the network endpoint(s) that the server is connected to and the host it is deployed to.

The endpoint defines the parameters needed to connect to the server.  It also features in the Connection model used by applications and tools to call the server.  Thus through the endpoint entity it is possible to link the connection to the underlying server.

Within the server are many capabilities.  These range from full applications (see 0170 in Area 1) to security plugins to logging and encryption libraries.  Different organizations and tools can choose the granularity in which the capabilities are captured in order to provide appropriate context to data assets and the decisions made around them.

 

Software Servers and Assets

Assets are managed or consumed by Server Capabilities.  The model below shows how this relationship is captured.


Figure 10: Assets linked to server capabilities

 

 

Networks and Gateways

The network model for open metadata is very simple, allowing hosts to be grouped into the networks they are connected to.  This can show details such as where hosts are isolated in private networks, and where the gateways onto the Internet are.

 


Figure 11: The networks that specific hosts connect to

 

 

Cloud Platforms and Services

The cloud platforms and services model shows that cloud computing is not so different from what we have been doing before.  Cloud infrastructure and services are classified as such to show that the organization is not completely in control of the technology supporting its data and processes.

 


Figure 12: Cloud platforms and services

The cloud provider is the organization that provides and runs the infrastructure for a cloud service.  Typically the host it offers is actually a HostCluster. 

The cloud provider may offer infrastructure as a service (IaaS), in which case, an organization can deploy VirtualContainers onto the cloud provider's HostCluster (see model 0035 above).

If the cloud provider is offering platform as a service (PaaS), an application may deploy server capability onto the cloud platform.

If the cloud provider is offering Software as a Service (SaaS) then it has provided APIs backed by pre-deployed server capability that an organization can call as a cloud service.

 

 

 


  See https://odpi.github.io/egeria/open-metadata-publication/website/open-metadata-types/Area-0-models.html.




 
