Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In the current implementation of ApplicationRunner, there are a few issues: 

  1. Instantiation of specific implementation of ApplicationRunner is exposed to the user, which requires user to choose a specific implementation of ApplicationRunner in source code, depending on the deployment environment (I.e. YARN vs standalone). 
  2. ApplicationRunner only supports high-level API and does not fully support low-level API: 

    1. In standalone environment, user's program written in StreamTask/AsyncStreamTask classes is only supported in LocalApplicationRunner w/ a run() method 

    2. In YARN, RemoteApplicationRunner only support high-level API applications and falls back to JobRunner for low-level API applications. 

  3. There is no unified API to allow user to specify high-level API and low-level API

...

  1. in initialization either. 

  2. There is no defined standard lifecycle of a user

...

  1. application process in both YARN and standalone deployment.

...

  1. Hence,

...

  1. no consistent pattern to insert user

...

  1. code into

...

  1. the application’s full lifecycle 

    1. There is no standard method to insert user-defined application initialization sequence 

...

      1. In YARN, all

...

      1. application processeare

...

      1.  initialized (i.e. configure re-writer, stream processor initialization, etc.) by

...

      1. the build-in

...

      1. main functions in Samza framework (i.e. ApplicationRunnerMain on launch host and LocalContainerRunner on NodeManagers). 

...

      1. In standalone,

...

      1. user

...

      1. can put

...

      1. arbitrary code in user

...

      1. main

...

      1. function to initialize the application process. 

    1. There

...

    1. is no defined method to allow user to inject customized logic

...

    1. before start/stop the user defined processors (I.e. StreamProcessors defined by user applicationeither, in addition to initialization 

Motivation

Our goal is to allow users to write high-level or low-level API applications once and deploy in both YARN and standalone environments without code change. The following requirements are necessary to achieve our goal: 

  1. H
  2. ides
  3. ide the choice of specific implementation of ApplicationRunner via configuration, not in source code. 
  4. Define a

  5. set of standard application lifecycle methods for applications written in both low- and high-level APIs
  6. unified API to allow user to specify processing logic in high- and low-level API in all environment (I.e. all ApplicationRunners) w/ all lifecycle methods 

  7. Define a set of standard application execution methods to deploy both low- and high-level APIs application in YARN

  8.  deployed in YARN
  9. and standalone environments. 

  10. Define a

  11. set of
  12. standard processor life-

  13. cycle 
  14. cycle aware 

  15. hooks 
  16. API to allow

  17.  
  18. user’s

  19. customized logic to be injected in all lifecycle stages in
  20. customized logic to be injected before and after start/stop the processors in both YARN and standalone environments

  21. . Support applications written in low-level API in all environment (I
  22. .

  23. e. all ApplicationRunners) w/ all lifecycle methods
  24.  

Note that we need to define the following concepts clearly: 

...

  1. Application’s execution methods in runtime are the functions to change the deployment status of an application runtime instance

...

  1. (I.e.

...

  1.  run/status/

...

  1. kill/waitForFinish) 
    1. ApplicationRunner’s execution methods are the deployment environment specific implementation

...

    1. to the above API methods 
  1. Application’s processor lifecycle aware methods are the user-defined functions to inject customized logic to be executed before or

...

  1. after we start or stop the stream processing logic defined by the user application (I.e. beforeStart/afterStart/beforeStop/afterStop are called when we start/stop the StreamProcessors in local host) 

Proposed Changes

The proposed changes are the followings: 

...

  1. Define a

...

  1. unified API 

...

  1. ApplicationBase as a single entry point for users to implement all user-

...

  1. customized logic, for both high-level API and

...

  1. low-level API

...

  1.  
    1. User implements a single describe() method to implement all user processing logic before creating the runtime application instance 

      1. Sub-classes StreamApplication and TaskApplication provide specific describe(methods for high-level API and low-level API, respectively

...

Implement customized logic before / after calling the application lifecycle methods (e.g. start/stop) 

      1.  

  1. Define a unified API class ApplicationSpec to contain 

    1. High- and low-level processing logic defined via ApplicationBase.describe() 

    2. User implemented ProcessorLifecycleListener class that includes customized logic to be invoked before and after starting/stopping the StreamProcessor(s) in the user application 

      1. Methods

...

      1. are beforeStart/afterStart/beforeStop/afterStop 

  1. Define the

...

  1. class ApplicationRuntime to represent a runtime instance of a user application,

...

  1. with a

...

  1. set of standard execution methods to change the deployment status of the runtime application instance in different environments (I.e. YARN and standalone)

...

    1. A runtime instance of user application is defined as a bundle of both 

      1. ApplicationSpec that

...

      1. contains all user

...

      1. customized logic. 

  1. This would be setup through the user application’s describe() method when constructing the ApplicationSpec object 

      1. ApplicationRunner that deploys and runs the user code. This would be instantiated from the configuration, not exposed to user at all. 

    1. The set of

...

    1. standard execution methods for an ApplicationRuntime object includes 

...

    1. run/

...

    1. kill/status/waitForFinish 

A high-level overview of the proposed changes is illustrated below: 
Image Removed

Figure-1: high-level user programming model Image Removed


Figure-2: Interaction and lifecycle of runtime API objects (using StreamApplication as  in LocalApplicationRunner as an example). 


The above design achieves the following goals: 

...