Status

State: Draft

Discussion thread: https://lists.apache.org/thread.html/7aaea178f8e3deeb1239726cb5a11fb1f504b45742946e8992866f57@%3Cdev.airflow.apache.org%3E

JIRA: AIRFLOW-2651

Motivation

This is an idea that I've used in the past for my own work, that appears in this PR (https://github.com/apache/airflow/pull/3526), and that now comes up again in Ash's user survey (https://ash.berlintaylor.com/writings/2019/02/airflow-user-survey-2019/):

More Operators: 11 comments

Requests for more operators/sensors. One good request was to have “composable” operators to [avoid an] explosion of XtoY operators. Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this that would be ace.

So, I created this AIP because I think it would be great to have this in Airflow.

Considerations

To prevent a proliferation of A-to-B operators, we could create hooks that are accessible via a common interface. This allows for interchangeable operators, e.g. a CopyOperator that takes two such hooks and copies from system A using hook A to system B using hook B. PR https://github.com/apache/airflow/pull/3526 already created a collection of filesystem hooks using Python's file object API. This is explained in more detail in the corresponding JIRA ticket, AIRFLOW-2651.

The result is a few filesystem hooks:

  • ftp
  • hdfs
  • local
  • s3
  • sftp

And a few operators that accept any hook adhering to the file object interface, e.g. (not included in the PR):

  • CopyFileOperator
  • DeleteFileOperator
  • CopyTreeOperator
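
To make the idea concrete, here is a minimal sketch of what a common file object interface and a generic copy could look like. It is not the API from PR 3526: the names FsHook, LocalFsHook, InMemoryFsHook and copy_file are hypothetical, and the in-memory hook only stands in for a remote system such as S3 or FTP.

```python
import io
import os
import shutil
import tempfile
from abc import ABC, abstractmethod


class FsHook(ABC):
    """Hypothetical common interface: every filesystem hook hands out file objects."""

    @abstractmethod
    def open(self, path, mode="rb"):
        """Return a binary file-like object for `path`."""


class LocalFsHook(FsHook):
    """Hook for the local filesystem; delegates to the built-in open()."""

    def open(self, path, mode="rb"):
        return open(path, mode)


class _StoringBuffer(io.BytesIO):
    """Write buffer that saves its contents into a dict when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFsHook(FsHook):
    """Stand-in for a remote system; keeps file contents in a dict."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode="rb"):
        if "r" in mode:
            return io.BytesIO(self.files[path])
        return _StoringBuffer(self.files, path)


def copy_file(src_hook, src_path, dest_hook, dest_path):
    """Copy between any two hooks; this is what a CopyFileOperator could run."""
    with src_hook.open(src_path, "rb") as src, dest_hook.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


# Usage: copy a local file to the in-memory "remote" system.
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "data.txt")
    with open(local_path, "wb") as f:
        f.write(b"hello")
    remote = InMemoryFsHook()
    copy_file(LocalFsHook(), local_path, remote, "bucket/data.txt")
```

Because copy_file only relies on open() returning a file object, adding a new system would mean writing one hook rather than one operator per source/destination pair.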

This is specific to filesystems, but I think this idea can be extended to other systems, such as databases via the Python DB API (PEP 249).
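
As a rough illustration of the database case, the sketch below copies rows between two connections using only DB API cursor methods (execute, fetchmany, executemany). The function name copy_rows and the table layout are assumptions made for this example, and sqlite3 is used for both ends purely so the sketch is self-contained. Note that the placeholder style (`?` here) differs between drivers, which a real implementation would have to account for.

```python
import sqlite3


def copy_rows(src_conn, dest_conn, select_sql, insert_sql, batch_size=1000):
    """Hypothetical helper: stream rows from one DB API connection to another."""
    src_cur = src_conn.cursor()
    dest_cur = dest_conn.cursor()
    src_cur.execute(select_sql)
    copied = 0
    while True:
        rows = src_cur.fetchmany(batch_size)
        if not rows:
            break
        dest_cur.executemany(insert_sql, rows)
        copied += len(rows)
    dest_conn.commit()
    return copied


# Usage with two in-memory SQLite databases standing in for systems A and B.
# Connection.execute()/executemany() are sqlite3 shortcuts, used for setup only.
src = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
dest.execute("CREATE TABLE users (id INTEGER, name TEXT)")

copied = copy_rows(
    src,
    dest,
    "SELECT id, name FROM users ORDER BY id",
    "INSERT INTO users VALUES (?, ?)",
)
```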

Since the PR above went stale, I think it would be wise to discuss whether this is desirable before anybody puts a lot of time and effort into it. If it is, we should also discuss whether the work from PR 3526 can be used as a basis or whether large changes are required.