Sqoop2 Design Bits

WIP

This wiki page serves as a resource for documenting some of the reasoning behind the design decisions for Sqoop2.

Why Tomcat for the Sqoop2 Server?

Tomcat provides the basic web-server container to host Sqoop as a service. This matters because one of the design goals of Sqoop2 was to provide REST APIs for creating Sqoop jobs. Tomcat has its quirks, and as of 2014 there are better alternatives we could use for a JVM-based web server. BTW, we welcome patches to support Jetty or Netty for the Sqoop server.

What is the Sqoop2 Repository and why do we use Derby? Can we use a document store to save the Sqoop entities?

Sqoop2 job information is persisted in the repository. We chose Derby as the initial implementation, probably for its simplicity. Since we have a well-defined Repository API, it is possible to add support for additional database implementations to store a Sqoop2 job and its associated information. The Sqoop2 entities such as the Connector Configurables, Links, Jobs, LinkConfigs and JobConfigs are currently modeled in a way that is best represented in a relational database. It should still be possible to store them in a document store such as MongoDB, but constraints such as unique names across connectors might then have to be enforced in code, unlike in an RDBMS where the database enforces them.
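To make the "constraints modeled in code" point concrete, here is a minimal sketch of what enforcing unique names could look like when the backing store has no UNIQUE constraint. Everything in it is hypothetical: DocumentLinkStore, createLink and the in-memory maps are illustrative stand-ins and are not part of the actual Sqoop2 Repository API or of any MongoDB driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names are NOT part of the Sqoop2 Repository API.
final class DocumentLinkStore {
    // Stand-in for a document collection keyed by a generated id.
    private final Map<Long, Map<String, Object>> documents = new HashMap<>();
    // Name index maintained by hand, because the store has no UNIQUE constraint.
    private final Map<String, Long> idsByName = new HashMap<>();
    private long nextId = 1;

    // Stores a link document, rejecting a name that is already in use.
    synchronized long createLink(String name, String connectorName) {
        if (idsByName.containsKey(name)) {
            throw new IllegalStateException("Link name already in use: " + name);
        }
        long id = nextId++;
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", name);
        doc.put("connector", connectorName);
        documents.put(id, doc);
        idsByName.put(name, id);
        return id;
    }

    // Number of link documents currently stored.
    synchronized int size() {
        return documents.size();
    }
}
```

In a relational repository the same rule would typically be a single UNIQUE constraint on the name column; with a document store, the check and the write have to be kept atomic by the application (here, via synchronized methods).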

What are the main design goals of Sqoop2?

The overarching goals are documented here. More subtle ones will be added to this page as they come up.

From Gwen Shapira,

Remove the dependency between the connectors and the framework. The same way that an Oracle connector won't be able to assume that a tnsnames.ora exists in the environment, we don't want the Kite connector to assume that hive-site.xml will exist. The connector can still ask for the location of hive-site.xml as an input when creating a link, though.
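As a rough illustration of that idea, the sketch below takes the hive-site.xml location as an explicit link input instead of looking for it in the environment. HiveLinkConfig and its methods are hypothetical names invented for this example; they are not the real Kite connector or Sqoop2 configuration classes.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical link configuration; NOT the actual Kite connector classes.
final class HiveLinkConfig {
    private final Path hiveSiteXml;

    // The location is supplied by the user when the link is created,
    // rather than discovered from the server's environment or classpath.
    HiveLinkConfig(String hiveSiteXmlLocation) {
        if (hiveSiteXmlLocation == null || hiveSiteXmlLocation.trim().isEmpty()) {
            throw new IllegalArgumentException("hive-site.xml location must be provided");
        }
        this.hiveSiteXml = Paths.get(hiveSiteXmlLocation.trim());
    }

    Path hiveSiteXml() {
        return hiveSiteXml;
    }

    // Lets the connector report a clear validation error instead of failing later.
    boolean isReadable() {
        return Files.isReadable(hiveSiteXml);
    }
}
```

The design choice here is that a missing file surfaces as a validation problem on the link, not as a hidden assumption about how the Sqoop server host was set up.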

Adding some fun facts about the design is encouraged!
