WIP
This wiki serves as a resource to document some of the reasoning behind the design decisions for Sqoop2.
Why Tomcat for the Sqoop2 Server?
Tomcat provides the basic web-server container to host Sqoop as a service. One of the design goals of Sqoop2 was to provide REST APIs for creating Sqoop jobs. Tomcat has its quirks, and as of 2014 there are better alternatives we can use for a JVM-based web server. BTW, we welcome patches to support Jetty or Netty for the Sqoop server.
What is the Sqoop2 Repository and why do we use Derby? Can we use a document store to save the Sqoop entities?
What are the main design goals of Sqoop2?
The overarching goals are documented here. The more subtle ones are added below.
- Allow development of data connectors against a stable API, independent of Sqoop2 implementation internals (such as the choice of execution engine, dependencies on Hadoop components, etc.). For example, the Oracle connector can't assume that a tnsnames.ora exists in the environment, and the Kite connector can't assume that hive-site.xml will exist. A connector can still ask for the location of hive-site.xml or tnsnames.ora as an input when creating a link, though.
- Connectors focus on how to get data in and out of data systems. The framework handles the execution life cycle: kicking off tasks / workers and such. We never rely on the framework to handle data reads and writes (even though most frameworks have IO capability); this is the responsibility of the connectors.
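To make the two goals above more concrete, here is a minimal sketch of the split in plain Java. It does not use the real Sqoop2 connector API: `LinkConfig`, `LinkValidator`, `Extractor`, `Loader` and `Framework` are hypothetical names invented for illustration. The sketch assumes the user passes the hive-site.xml location as a link input, the connector validates it instead of assuming it exists, and the framework only drives the life cycle while the connector callbacks do all reads and writes.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical link configuration: the hive-site.xml location is an explicit
// input supplied when the link is created, never assumed from the environment.
final class LinkConfig {
    final String hiveSiteXmlPath; // null when the user did not provide it

    LinkConfig(String hiveSiteXmlPath) {
        this.hiveSiteXmlPath = hiveSiteXmlPath;
    }
}

// Hypothetical connector-side validation of the link inputs.
final class LinkValidator {
    static List<String> validate(LinkConfig link) {
        List<String> errors = new ArrayList<>();
        if (link.hiveSiteXmlPath == null || link.hiveSiteXmlPath.isEmpty()) {
            errors.add("hive-site.xml location is required");
        } else if (!Files.isRegularFile(Paths.get(link.hiveSiteXmlPath))) {
            errors.add("hive-site.xml not found at " + link.hiveSiteXmlPath);
        }
        return errors;
    }
}

// Hypothetical connector contracts: all data reads and writes live here.
interface Extractor<T> {
    List<T> extract(LinkConfig link, int partition);
}

interface Loader<T> {
    void load(LinkConfig link, List<T> records);
}

// Hypothetical framework: it only schedules the work (one call per partition)
// and never touches the data systems itself.
final class Framework {
    static <T> int run(LinkConfig link, int partitions,
                       Extractor<T> from, Loader<T> to) {
        int moved = 0;
        for (int p = 0; p < partitions; p++) {
            List<T> batch = from.extract(link, p); // connector reads
            to.load(link, batch);                  // connector writes
            moved += batch.size();
        }
        return moved;
    }
}

final class ConnectorSketch {
    public static void main(String[] args) {
        LinkConfig link = new LinkConfig(args.length > 0 ? args[0] : null);
        for (String error : LinkValidator.validate(link)) {
            System.out.println("link error: " + error);
        }

        // In-memory stand-ins for a real source and target.
        List<String> sink = new ArrayList<>();
        int moved = Framework.<String>run(link, 2,
                (l, p) -> Arrays.asList("row-" + p + "-a", "row-" + p + "-b"),
                (l, records) -> sink.addAll(records));
        System.out.println("moved " + moved + " records: " + sink);
    }
}
```

Because the framework in this sketch sees only the `Extractor` and `Loader` callbacks, swapping the execution engine would not require touching connector code, which is the point of the stable API goal.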
Adding fun facts about the design is encouraged!