...
- OrganizationProvider
- Use Case: Given a finite list of organization IDs, pull latest details about each.
- Input: List of organization ids
- Output: github:organization
- Use Case: Given a finite list of user IDs, pull latest details on each organization any of them belong to.
- Input: List of user ids
- Output: github:organization
- RepositoryProvider
- Use Case: Given a finite list of organization IDs, pull latest details on each public repository belonging to any of them.
- Input: List of organization ids
- Output: github:repository
- Use Case: Given a finite list of user IDs, pull latest details on each public repository belonging to any of them.
- Input: List of user ids
- Output: github:repository
- UserProvider
- Use Case: Given a finite list of user IDs, pull latest details on all of them.
- Input: List of user ids
- Output: github:user
- UserFollowingProvider
- Use Case: Given a finite list of user IDs, pull latest details on each user any of them are following, maintaining the follow connection.
- Input: List of user ids
- Output: github:follow (github:user, github:user)
Look for a high-quality java library.
Search github / stack-overflow / google and see if you can find a high-quality java library to simplify the code involved in getting the data.
It should be:
- Publicly available source code
- FOSS-friendly license (Apache 2.0, MIT, etc...)
- In a public maven repository
- Active
First look on web site of data source
https://developer.github.com/libraries/
No official java library
Look at list of third party options
#1) GitHub Java API (org.eclipse.egit.github.core)
Publicly available source code - https://github.com/eclipse/egit-github/tree/master/org.eclipse.egit.github.core
Eclipse Public License v1 (acceptable) - https://github.com/eclipse/egit-github/blob/master/org.eclipse.egit.github.core/about.html
https://www.apache.org/legal/resolved says EPL code can be used, but only in binary form - that's fine, we aren't going to redistribute the source.
In a public maven repository - http://search.maven.org/#search%7Cga%7C1%7Cegit-github
Active - The last release was in 2013, which is not ideal, but people are still committing to the project and working on issues.
#2) GitHub API for Java
Publicly available source code - http://github-api.kohsuke.org/
MIT License (acceptable) - http://github-api.kohsuke.org/license.html
In a public maven repository - https://oss.sonatype.org/#nexus-search;quick~github-api
Active - There have been 80 releases, that's a great sign.
#3) jcabi-github
Publicly available source code - https://github.com/jcabi/jcabi-github
License - not an open license. Deal-breaker.
#2 looks like the best bet.
Document how each provider class will acquire data.
Make notes on how you plan to use the java library to get the source data into each provider.
- OrganizationProvider
- RepositoryProvider
- UserProvider
- UserFollowingProvider
Figure out permissions
If the data source requires special permissions to get at the dataset you are looking at, figure out how to get those permissions and document the process.
Document all the information that will be needed to connect to the data source.
Make an empty module
Create an empty module in your own project or in streams-contrib. Make sure it's part of the reactor.
Create a base configuration object
Create a json schema file (src/main/jsonschema) with fields containing all the information needed to establish basic connectivity with the data source.
These fields should include:
- everything needed to connect
- everything needed to authenticate
Example:
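A minimal sketch of what such a schema could look like for a GitHub module. The javaType, the property names (protocol, host, port, oauthToken), and the draft-03 style are assumptions made for illustration; they are not taken from an existing module.

```json
{
  "$schema": "http://json-schema.org/draft-03/schema",
  "id": "#",
  "type": "object",
  "javaType": "org.apache.streams.github.GithubConfiguration",
  "javaInterfaces": ["java.io.Serializable"],
  "properties": {
    "protocol": { "type": "string", "description": "Protocol used to reach the API" },
    "host": { "type": "string", "description": "API host name" },
    "port": { "type": "integer", "description": "API port" },
    "oauthToken": { "type": "string", "description": "Token used to authenticate" }
  }
}
```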
Create a reference.conf
Create a reference.conf file in src/main/resources containing a HOCON snippet matching the base configuration schema containing just the connection details.
This file should contain only the connection details, no credentials.
By putting these in reference.conf, you ensure that they are set by default for anyone who uses the module, which relieves you of the need to bake default values into either the code or the json schemas.
Example:
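A possible reference.conf, assuming a hypothetical top-level `github` key and hypothetical connection fields named protocol, host, and port:

```hocon
# Connection defaults only; no credentials belong in this file.
github {
  protocol = "https"
  host = "api.github.com"
  port = 443
}
```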
Create a credential resource file for testing
Create an application.conf file containing a HOCON snippet matching the base configuration schema containing your credentials.
This file should contain only your credentials. A single credential file is enough for all of the providers you are working with.
Example:
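A possible application.conf for local testing, assuming a hypothetical `github` key and a hypothetical `oauthToken` field. The value is a placeholder; keep the real file out of version control.

```hocon
# Credentials only; connection defaults come from reference.conf.
github {
  oauthToken = "REPLACE_WITH_YOUR_TOKEN"
}
```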
Create a unit test that demonstrates reading the test configuration resource into the configuration object
The test should demonstrate that the test resource is loaded from the HOCON snippet into the JVM properties, and then, using StreamsConfigurator, into an instance of the base configuration object.
Example:
TODO
Create a base provider that just opens a re-usable connection object to the data source
Create a java class which implements StreamsProvider.
This provider doesn't need to implement any of the read* methods, just prepare. Calling prepare should result in a Provider with a live connection to the data source.
Appropriate validation on the configuration and on the resulting connection object should be added.
Example:
TODO
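Until a real example is filled in, here is a minimal sketch of the prepare step under stated assumptions. `GithubConfiguration` is a hand-written stand-in for the bean that would be generated from the json schema, its field names are hypothetical, and `GithubBaseProvider` does not implement the real StreamsProvider interface; it only mirrors the role of prepare so that the sketch compiles on its own. Opening the actual connection is left as a comment.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical stand-in for the configuration bean generated from the json schema. */
final class GithubConfiguration {
    final String protocol;
    final String host;
    final int port;
    final String oauthToken;

    GithubConfiguration(String protocol, String host, int port, String oauthToken) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.oauthToken = oauthToken;
    }
}

/** Sketch only: mirrors the role of prepare(), not the real StreamsProvider interface. */
final class GithubBaseProvider {
    private final GithubConfiguration config;
    private URI baseUri; // stays null until prepare() succeeds

    GithubBaseProvider(GithubConfiguration config) {
        this.config = Objects.requireNonNull(config, "config is required");
    }

    /** Validates the configuration and derives the base URI used by later requests. */
    void prepare() {
        requireText(config.protocol, "protocol");
        requireText(config.host, "host");
        requireText(config.oauthToken, "oauthToken");
        if (config.port < 1 || config.port > 65535) {
            throw new IllegalArgumentException("port out of range: " + config.port);
        }
        baseUri = URI.create(config.protocol + "://" + config.host + ":" + config.port + "/");
        // A real provider would open and verify its connection object here.
    }

    boolean isPrepared() {
        return baseUri != null;
    }

    URI getBaseUri() {
        return baseUri;
    }

    private static void requireText(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " is required");
        }
    }
}
```

Failing fast inside prepare keeps configuration mistakes from surfacing later as confusing read errors.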
Write a primary class to manage the HTTP connections and implement accessor methods.
Give it a singleton getInstance method driven from the configuration object
Example:
org.apache.streams.twitter.api.Twitter
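One way to sketch the getInstance idea for a GitHub module: keep one shared instance per distinct configuration in a concurrent map. The class names `Github` and `GithubConfiguration` and the configuration fields are hypothetical, and the stand-in bean below is hand-written; the only real requirement it illustrates is that the configuration needs equals and hashCode to serve as a cache key.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the generated configuration bean, usable as a map key. */
final class GithubConfiguration {
    final String protocol;
    final String host;
    final int port;
    final String oauthToken;

    GithubConfiguration(String protocol, String host, int port, String oauthToken) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.oauthToken = oauthToken;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GithubConfiguration)) return false;
        GithubConfiguration other = (GithubConfiguration) o;
        return port == other.port
            && Objects.equals(protocol, other.protocol)
            && Objects.equals(host, other.host)
            && Objects.equals(oauthToken, other.oauthToken);
    }

    @Override
    public int hashCode() {
        return Objects.hash(protocol, host, port, oauthToken);
    }
}

/** Primary class sketch: one shared instance per distinct configuration. */
final class Github {
    private static final Map<GithubConfiguration, Github> INSTANCES = new ConcurrentHashMap<>();

    private final GithubConfiguration config;

    private Github(GithubConfiguration config) {
        this.config = config;
        // A real implementation would set up its HTTP client here.
    }

    /** Returns the cached instance for an equal configuration, creating it on first use. */
    static Github getInstance(GithubConfiguration config) {
        Objects.requireNonNull(config, "config is required");
        return INSTANCES.computeIfAbsent(config, Github::new);
    }

    GithubConfiguration getConfiguration() {
        return config;
    }
}
```

Two callers that pass equal configurations receive the same instance, so HTTP connections can be shared across providers.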
Create an integration test that demonstrates basic connectivity
Make an 'IT' in src/test/java that loads the test configuration with your credentials in it, instantiates the primary connection class, and asserts that the connection object is instantiated and authorized.
Example:
Authentication
Find request signing documentation
Most APIs require requests to be cryptographically signed. The exact details and protocols may differ.
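As a generic illustration of request signing, the sketch below computes an HMAC-SHA256 signature over a canonical string using only the JDK. The canonical layout (method, path, and timestamp joined by newlines) and the class name `RequestSigner` are assumptions; each data source documents its own scheme, and some use bearer tokens instead of signatures.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Generic HMAC-SHA256 request signer; the canonical string layout is an assumption. */
final class RequestSigner {
    private static final String ALGORITHM = "HmacSHA256";

    private final byte[] secret;

    RequestSigner(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Signs "METHOD\nPATH\nTIMESTAMP" and returns the Base64-encoded MAC. */
    String sign(String method, String path, long timestampSeconds) {
        String canonical = method.toUpperCase(Locale.ROOT) + "\n" + path + "\n" + timestampSeconds;
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            byte[] digest = mac.doFinal(canonical.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not compute " + ALGORITHM + " signature", e);
        }
    }
}
```

Including a timestamp in the signed string is a common way to limit replay of captured requests.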
Integration
Our goal is to create interfaces that let us access important entities, events, and relationships from the data provider in their native format, via java objects generated from schemas.
Create at least one java interface to wrap the data provider
REST interfaces typically have a tree structure:
- http://api.github.com/
- orgs/
- repos/
- users/
A call to the interface will typically contain:
- a path, which might contain path parameters,
- a set of query parameters
- and/or a request entity
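The parts of a call listed above can be sketched as a small URL builder. The helper name `RequestUrls` and the `{name}` placeholder syntax are assumptions made for illustration; a real module would more likely rely on an HTTP client library for this.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper that combines a path template, path parameters, and query parameters. */
final class RequestUrls {
    private RequestUrls() {
    }

    static String build(String baseUrl, String pathTemplate,
                        Map<String, String> pathParams, Map<String, String> queryParams) {
        String path = pathTemplate;
        // Substitute each {name} placeholder with its encoded value.
        for (Map.Entry<String, String> e : pathParams.entrySet()) {
            path = path.replace("{" + e.getKey() + "}", encode(e.getValue()));
        }
        if (path.contains("{")) {
            throw new IllegalArgumentException("unresolved path parameter in: " + path);
        }
        StringBuilder url = new StringBuilder(baseUrl).append(path);
        char separator = '?';
        // Append query parameters in the iteration order of the supplied map.
        for (Map.Entry<String, String> e : queryParams.entrySet()) {
            url.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // URLEncoder applies form encoding (a space becomes '+'), which is adequate for this sketch.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Passing a LinkedHashMap for the query parameters keeps them in insertion order, which makes the resulting URL predictable in tests.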
Typically, for each path we want to call we will create a java method on one of several interfaces, enumerate the request parameters, describe the request as a java bean, and describe the response as a java bean.
Example:
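A minimal sketch of one such interface for the orgs/ branch, with a request bean, a response bean, and an in-memory stub. All type, method, and field names here are hypothetical, and the beans are written by hand only so the sketch compiles on its own; in a real module they would be generated from schemas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Request bean: one field per request parameter (names are illustrative). */
final class OrganizationRepositoriesRequest {
    final String org;  // path parameter
    final String type; // query parameter

    OrganizationRepositoriesRequest(String org, String type) {
        this.org = org;
        this.type = type;
    }
}

/** Response bean; a real module would generate this from a json schema. */
final class Repository {
    final long id;
    final String fullName;

    Repository(long id, String fullName) {
        this.id = id;
        this.fullName = fullName;
    }
}

/** One java method per REST path under orgs/. */
interface Organizations {
    /** Corresponds to GET orgs/{org}/repos. */
    List<Repository> repositories(OrganizationRepositoriesRequest request);
}

/** In-memory stub for tests; a real implementation would issue the HTTP call. */
final class StubOrganizations implements Organizations {
    @Override
    public List<Repository> repositories(OrganizationRepositoriesRequest request) {
        if (request.org == null || request.org.isEmpty()) {
            throw new IllegalArgumentException("org is required");
        }
        List<Repository> result = new ArrayList<>();
        result.add(new Repository(1L, request.org + "/example"));
        return Collections.unmodifiableList(result);
    }
}
```

Keeping the interface separate from its implementation lets providers be tested against a stub without network access.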
Create an integration test that demonstrates connectivity
Make an 'IT' in src/test/java that loads the test configuration with your credentials in it, instantiates the primary class, and then asserts that the connection object is connected and authorized.
...