  1. OrganizationProvider
    1. Use Case: Given a finite list of organization IDs, pull latest details about each. 
      1. Input: List of organization ids
      2. Output: github:organization
    2. Use Case: Given a finite list of user IDs, pull latest details on each organization any of them belong to.
      1. Input: List of user ids
      2. Output: github:organization
  2. RepositoryProvider
    1. Use Case: Given a finite list of organization IDs, pull latest details on each public repository belonging to any of them.
      1. Input: List of organization ids
      2. Output: github:repository
    2. Use Case: Given a finite list of user IDs, pull latest details on each public repository belonging to any of them.
      1. Input: List of user ids
      2. Output: github:repository
  3. UserProvider
    1. Use Case: Given a finite list of user IDs, pull latest details on all of them.
      1. Input: List of user ids
      2. Output: github:user
  4. UserFollowingProvider
    1. Use Case: Given a finite list of user IDs, pull latest details on each user any of them are following, maintaining the follow connection.
      1. Input: List of user ids
      2. Output: github:follow (github:user, github:user)
    2. Use Case: Given a finite list of user IDs, pull latest details on each user any of them are following, maintaining the follow connection.
      1. Input: List of user ids
      2. Output: github:follow (github:user, github:user)

Look for a high-quality java library.

Search github / stack-overflow / google and see if you can find a high-quality java library to simplify the code involved in getting the data.

It should be:

  1. Publicly available source code
  2. FOSS-friendly license (Apache 2.0, MIT, etc...)
  3. In a public maven repository
  4. Active

First look on web site of data source

https://developer.github.com/libraries/

No official java library

Look at list of third party options

#1) GitHub Java API (org.eclipse.egit.github.core)

Publicly available source code - https://github.com/eclipse/egit-github/tree/master/org.eclipse.egit.github.core

 

Eclipse Public License v1 (acceptable) - https://github.com/eclipse/egit-github/blob/master/org.eclipse.egit.github.core/about.html

https://www.apache.org/legal/resolved says EPL can be used, but only as binaries. That's fine, we aren't going to redistribute the source.

In a public maven repository - http://search.maven.org/#search%7Cga%7C1%7Cegit-github

Active - The last release was in 2013, that's not ideal, but there are still people committing on the project and working on issues

#2) GitHub API for Java

Publicly available source code - http://github-api.kohsuke.org/

MIT License (acceptable) - http://github-api.kohsuke.org/license.html

In a public maven repository - https://oss.sonatype.org/#nexus-search;quick~github-api

Active - There have been 80 releases, that's a great sign.

#3) http://github.jcabi.com/

Publicly available source code - https://github.com/jcabi/jcabi-github

License - not an open license.  Deal-breaker.

#2 looks like best bet.

Document how each provider class will acquire data.

Make notes on how you plan to use the java library to get the source data into each provider.

  1. OrganizationProvider
  2. RepositoryProvider
  3. UserProvider
  4. UserFollowingProvider
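
As a rough illustration of the kind of note this step asks for, here is a minimal Java sketch of how the follow-style provider might acquire its data. `FollowSource`, `Follow` and `UserFollowingSketch` are hypothetical names invented for this sketch; in a real provider the `FollowSource` would be backed by whichever client library was selected, and its actual method names may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the chosen client library; not a real API.
interface FollowSource {
    // Returns the ids of the users that the given user is following.
    List<String> following(String userId);
}

// One follow connection: (follower, followee).
final class Follow {
    final String follower;
    final String followee;

    Follow(String follower, String followee) {
        this.follower = follower;
        this.followee = followee;
    }
}

final class UserFollowingSketch {
    // For each input user id, look up who they follow and keep the connection.
    static List<Follow> collect(List<String> userIds, FollowSource source) {
        List<Follow> edges = new ArrayList<>();
        for (String userId : userIds) {
            for (String followee : source.following(userId)) {
                edges.add(new Follow(userId, followee));
            }
        }
        return edges;
    }
}
```

The input is a finite list of user ids and the output keeps both ends of each connection, which matches the use case listed for this provider.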

Figure out permissions

If the data source requires special permissions to get at the dataset you are looking at, figure out how to get those permissions and document the process.

Document all the information that will be needed to connect to the data source.

Make an empty module

Create an empty module in your own project or in streams-contrib.  Make sure it's part of the reactor.

Create a base configuration object

Create a json schema file (src/main/jsonschema) with fields containing all the information needed to establish basic connectivity with the data source.

These fields should include:

  • everything needed to connect
  • everything needed to authenticate

Example:

https://git-wip-us.apache.org/repos/asf?p=incubator-streams.git;a=blob;f=streams-contrib/streams-provider-twitter/src/main/jsonschema/com/twitter/TwitterConfiguration.json;h=69048d123022a2e138932c8a14ef9e846438bc41;hb=HEAD
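
To make the shape of the resulting object concrete, here is a hand-written Java sketch of what a GitHub configuration bean might hold. The class name and the fields (`protocol`, `host`, `port`, `oauthToken`) are assumptions made for illustration; in practice the class would be generated from the json schema rather than written by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a bean that would normally be generated from the json schema.
final class GithubConfigurationSketch {
    // Everything needed to connect (hypothetical field names).
    String protocol;
    String host;
    Integer port;
    // Everything needed to authenticate (hypothetical field name).
    String oauthToken;

    // Returns the names of required fields that are still unset.
    List<String> missingFields() {
        List<String> missing = new ArrayList<>();
        if (protocol == null || protocol.isEmpty()) missing.add("protocol");
        if (host == null || host.isEmpty()) missing.add("host");
        if (port == null) missing.add("port");
        if (oauthToken == null || oauthToken.isEmpty()) missing.add("oauthToken");
        return missing;
    }
}
```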

Create a reference.conf

Create a reference.conf file in src/main/resources containing a HOCON snippet that matches the base configuration schema and holds just the connection details.

This file should contain only the connection details, no credentials.

By putting these in reference.conf, you ensure that they get set by default for anyone who uses the module, thus relieving you of the need to bake default values into either the code or the json schemas.

Example:

https://git-wip-us.apache.org/repos/asf?p=incubator-streams.git;a=blob;f=streams-contrib/streams-provider-twitter/src/main/resources/reference.conf;h=b5c9f6f1f58ccc4c3c56e9b18dbddf42aa2d3192;hb=HEAD

Create a credential resource file for testing

Create an application.conf file containing a HOCON snippet matching the base configuration schema containing your credentials.

This file should contain only your credentials; you need only one credential file for each provider you are working with.

Example:

Create a unit test that demonstrates reading the test configuration resource into the configuration object

The test should demonstrate that the test resource gets loaded from the HOCON snippet into the JVM properties, and then, using StreamsConfigurator, into an instance of the base configuration object.

Example:

TODO
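
A minimal sketch of the flow such a test exercises, using only the JDK. This is not the StreamsConfigurator API: `java.util.Properties` stands in for the HOCON loader, so the resource text here is limited to flat, unquoted `key = value` lines, and the class names and the `github.host` / `github.oauthToken` keys are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class TestConfigLoadingSketch {
    // Hypothetical base configuration object.
    static final class Config {
        String host;
        String oauthToken;
    }

    // Step 1: parse the resource text and copy every entry into the JVM system properties.
    static void loadIntoSystemProperties(String resourceText) {
        Properties parsed = new Properties();
        try {
            parsed.load(new StringReader(resourceText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : parsed.stringPropertyNames()) {
            System.setProperty(name, parsed.getProperty(name));
        }
    }

    // Step 2: build the configuration object from the system properties under a prefix.
    static Config fromSystemProperties(String prefix) {
        Config config = new Config();
        config.host = System.getProperty(prefix + ".host");
        config.oauthToken = System.getProperty(prefix + ".oauthToken");
        return config;
    }
}
```

A real test would load the resource from the test classpath and assert on the generated configuration class instead of this hand-written one.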

Create a base provider that just opens a re-usable connection object to the data source

Create a java class which implements StreamsProvider.  

This provider doesn't need to implement any of the read* methods, just prepare.  Calling prepare should result in a Provider with a live connection to the data source.  

Appropriate validation on the configuration and on the resulting connection object should be added.

Example:

TODO
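
The prepare step described above could be sketched as follows. This does not implement the real StreamsProvider interface; `BaseProviderSketch` and `ConnectionSketch` are hypothetical classes that only show the order of operations: validate the configuration, open the connection, then validate the connection.

```java
// Hypothetical connection handle; a real one would wrap the client library.
final class ConnectionSketch {
    final String host;
    private boolean open = true;

    ConnectionSketch(String host) {
        this.host = host;
    }

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

final class BaseProviderSketch {
    private final String host;
    private final String oauthToken;
    private ConnectionSketch connection;

    BaseProviderSketch(String host, String oauthToken) {
        this.host = host;
        this.oauthToken = oauthToken;
    }

    // Validates the configuration, opens the connection, then validates the connection.
    void prepare() {
        if (host == null || host.isEmpty()) {
            throw new IllegalStateException("host is required");
        }
        if (oauthToken == null || oauthToken.isEmpty()) {
            throw new IllegalStateException("oauthToken is required");
        }
        connection = new ConnectionSketch(host);
        if (!connection.isOpen()) {
            throw new IllegalStateException("connection to " + host + " is not open");
        }
    }

    // Returns the re-usable connection, or null if prepare has not been called.
    ConnectionSketch connection() {
        return connection;
    }
}
```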

Write a primary class to manage the HTTP connections and implement accessor methods.

Give it a singleton getInstance method driven from the configuration object


Example:

org.apache.streams.twitter.api.Twitter
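
One way to read "singleton driven from the configuration object" is one shared instance per distinct configuration. The sketch below shows that idea with a concurrent map; `ClientSketch` and its fields are hypothetical, and the referenced Twitter class may be organized differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ClientSketch {
    // One shared instance per distinct configuration key.
    private static final Map<String, ClientSketch> INSTANCES = new ConcurrentHashMap<>();

    final String host;
    final String oauthToken;

    private ClientSketch(String host, String oauthToken) {
        this.host = host;
        this.oauthToken = oauthToken;
    }

    // Returns the existing instance for this configuration, creating it on first use.
    static ClientSketch getInstance(String host, String oauthToken) {
        String key = host + "\n" + oauthToken;
        return INSTANCES.computeIfAbsent(key, k -> new ClientSketch(host, oauthToken));
    }
}
```

Keying on the configuration means two providers that share credentials also share one set of HTTP connections, while a second set of credentials gets its own instance.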

Create an integration test that demonstrates basic connectivity 

Make an 'IT' in src/test/java that loads the test configuration with your credentials in it, instantiates the primary connection class, and asserts that the connection object is instantiated and authorized.

Example:

Authentication

Find request signing documentation

Most APIs require requests to be cryptographically signed.  The exact details and protocols may differ. 

Locate and review current documentation about this topic.
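
For orientation, here is a minimal sketch of one common signing style, an HMAC-SHA256 over a canonical request string, using only JDK classes. The canonical layout (method, path and body joined by newlines) and the class name are assumptions; each API defines its own scheme, and some rely on a bearer token instead of signatures, so the provider's documentation is the authority.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class RequestSigningSketch {
    // Signs a hypothetical canonical form of the request and returns the signature as lowercase hex.
    static String sign(String secret, String method, String path, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String canonical = method + "\n" + path + "\n" + body;
            byte[] digest = mac.doFinal(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }
}
```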

Integration

Our goal is to create interfaces that let us access important entities, events, and relationships from the data provider in their native format, via java objects generated from schemas.

Create at least one java interface to wrap the data provider

REST interfaces typically have a tree structure.

A call to the interface will typically contain:

  • path, which might contain path parameters
  • a set of query parameters 
  • and/or a request entity

Typically, for each path we want to call we will create a java method on one of several interfaces, enumerate the request parameters, describe the request as a java bean, and describe the response as a java bean.
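
The pattern can be sketched for a single path as follows. All type names here are hypothetical, the path and query parameter names are illustrative and should be checked against the provider's API documentation, and real beans would be generated from schemas rather than hand-written.

```java
import java.util.ArrayList;
import java.util.List;

// Request bean: one path parameter plus two query parameters.
final class RepositoriesRequest {
    String org;        // path parameter
    int page = 1;      // query parameter
    int perPage = 30;  // query parameter
}

// Response bean: reduced here to repository names only.
final class RepositoriesResponse {
    List<String> repositoryNames = new ArrayList<>();
}

// One java method per path we want to call.
interface RepositoriesSketch {
    RepositoriesResponse listRepositories(RepositoriesRequest request);
}

final class RequestPaths {
    // Renders the request bean into a path and query string.
    static String toPath(RepositoriesRequest request) {
        return "/orgs/" + request.org + "/repos?page=" + request.page + "&per_page=" + request.perPage;
    }
}
```

Keeping the request and response as beans lets the same interface be backed by a live HTTP client in production and by an in-memory stub in unit tests.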

Example:

Create an integration test that demonstrates connectivity

Make an 'IT' in src/test/java that loads the test configuration with your credentials in it, instantiates the primary class, and then tests that the connection object is connected and authorized.