Work in Progress
Overview
The sequence diagrams below describe in detail the interactions that occur while defining, submitting, and executing a MapReduce job on a secure Hadoop 2.x cluster. Each diagram covers a different phase of the overall process.
- Bootstrap
- Job Definition
- Job Submission
- Job Initiation
- Map Task Execution
- Reduce Task Execution
- Job Completion
- Client Monitoring
Legend
The descriptions of the interactions in the sequence diagrams below take this form.
message [Protocol] ( input ) : output
The [Protocol] portion describes the protocol, authentication mechanism, and identities exchanged.
| Abbreviation | Description |
|---|---|
| | Kerberos Protocol |
| | RPC protocol with SASL mutual authentication using Kerberos tickets. |
| | RPC protocol with SASL mutual authentication using delegation tokens. |
| | RPC protocol with SASL mutual authentication using delegation tokens. |
| | Shuffle data transfer protocol between ShuffleService and ReduceTask. HTTP protocol with TODO. |
| | Block data transfer protocol between the DataNode and a client. HTTP protocol with block tokens plus SHA1 hash exchange. |
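The interaction notation from the legend, message [Protocol] ( input ) : output, can be split into its parts mechanically. The following is a minimal sketch, not part of the diagrams themselves. It assumes that inputs are comma-separated and that the ": output" part may be absent; the names Interaction and parse_interaction, and the sample label in the usage note, are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Interaction(NamedTuple):
    """One diagram label split into its parts (hypothetical helper type)."""
    message: str
    protocol: str
    inputs: List[str]
    output: Optional[str]


# Assumed grammar: message [Protocol] ( input, input, ... ) : output
_PATTERN = re.compile(
    r"""^\s*(?P<message>[^\[\]()]+?)\s*     # message name
        \[\s*(?P<protocol>[^\]]*?)\s*\]\s*  # protocol in square brackets
        \(\s*(?P<inputs>[^)]*?)\s*\)\s*     # inputs in parentheses
        (?::\s*(?P<output>.*?))?\s*$        # optional colon and output
    """,
    re.VERBOSE,
)


def parse_interaction(label: str) -> Interaction:
    """Parse a label written in the legend's notation."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"label does not follow the legend notation: {label!r}")
    # Assumption: several inputs are separated by commas.
    inputs = [part.strip() for part in match.group("inputs").split(",") if part.strip()]
    return Interaction(
        message=match.group("message"),
        protocol=match.group("protocol"),
        inputs=inputs,
        output=match.group("output") or None,
    )
```

For example, a made-up label such as "getToken [RPC] ( renewer ) : c-nn-dt" would yield the message getToken, the protocol RPC, the single input renewer, and the output c-nn-dt.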
Suffixes are used in many cases to denote type.
| Abbreviation | Description |
|---|---|
| tgt | Kerberos Ticket Granting Ticket |
| kt | Kerberos Service Ticket: u-jt-kt = A Kerberos Ticket for User u to access the JobTracker jt |
| kp | Kerberos Principal: nn-kp = The Kerberos principal for the NameNode nn |
| dt | Delegation Token: c-nn-dt = A delegation token for the identity of the Client that can be presented to the NameNode. |
| tkn | Access Token: am-tkn = An access token that can be presented to the ApplicationMaster for access. |
| tkn-sk | Token Secret Key |
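As a rough illustration of how the suffix convention can be applied, the sketch below classifies an identifier by its suffix and returns the leading parts separately. The function name classify and the SUFFIXES mapping are hypothetical; the suffixes and their meanings are taken from the table above.

```python
from typing import List, Tuple

# Suffix meanings as listed in the legend.
SUFFIXES = {
    "tgt": "Kerberos Ticket Granting Ticket",
    "kt": "Kerberos Service Ticket",
    "kp": "Kerberos Principal",
    "dt": "Delegation Token",
    "tkn": "Access Token",
    "tkn-sk": "Token Secret Key",
}


def classify(identifier: str) -> Tuple[List[str], str, str]:
    """Return (leading parts, suffix, suffix meaning) for an identifier."""
    # Check longer suffixes first so a multi-part suffix such as
    # "tkn-sk" is matched as a whole.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if identifier == suffix:
            return [], suffix, SUFFIXES[suffix]
        if identifier.endswith("-" + suffix):
            prefix = identifier[: -(len(suffix) + 1)]
            return prefix.split("-"), suffix, SUFFIXES[suffix]
    raise ValueError(f"unknown suffix in identifier: {identifier!r}")
```

With the examples from the table, classify("u-jt-kt") separates the parts u and jt from the kt suffix, and classify("am-tkn") separates am from the tkn suffix.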
Kerberos principals use the principal abbreviation and the kp suffix.
| Abbreviation | Description |
|---|---|
| nn-kp | NameNode's Kerberos Principal |
| dn-kp | DataNode's Kerberos Principal (Unique principal for each DataNode on every node) |
| jt-kp | JobTracker's Kerberos Principal |
| tt-kp | TaskTracker's Kerberos Principal (Unique principal for each TaskTracker on every node) |
Kerberos tickets use the consumer principal abbreviation, provider principal abbreviation and kt suffix.
| Abbreviation | Description |
|---|---|
| u-nn-kt | Kerberos service ticket for User u to access NameNode nn |
| u-jt-kt | Kerberos service ticket for User u to access JobTracker jt |
| dn-nn-kt | Kerberos service ticket for DataNode dn to access NameNode nn |
| jt-nn-kt | Kerberos service ticket for JobTracker jt to access NameNode nn |
| tt-jt-kt | Kerberos service ticket for TaskTracker tt to access JobTracker jt |
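The two naming rules (principal abbreviation plus kp; consumer abbreviation, provider abbreviation, plus kt) can be sketched as small helpers. This is only an illustration: the function names and the COMPONENTS mapping are hypothetical, and the component abbreviations (u, nn, dn, jt, tt) are the ones that appear in the descriptions on this page.

```python
# Component abbreviations as used in the descriptions on this page.
COMPONENTS = {
    "u": "User",
    "nn": "NameNode",
    "dn": "DataNode",
    "jt": "JobTracker",
    "tt": "TaskTracker",
}


def principal(component: str) -> str:
    """Name of a component's Kerberos principal: abbreviation + kp."""
    return f"{component}-kp"


def service_ticket(consumer: str, provider: str) -> str:
    """Name of a service ticket: consumer + provider + kt."""
    return f"{consumer}-{provider}-kt"


def describe_ticket(name: str) -> str:
    """Expand a ticket name back into a readable description."""
    parts = name.split("-")
    if len(parts) != 3 or parts[2] != "kt":
        raise ValueError(f"not a service ticket name: {name!r}")
    consumer, provider, _ = parts
    return (
        f"Kerberos service ticket for {COMPONENTS[consumer]} {consumer} "
        f"to access {COMPONENTS[provider]} {provider}"
    )
```

For instance, service_ticket("u", "jt") produces u-jt-kt, the example given in the suffix table, and describe_ticket turns such a name back into the wording used in the table above.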
Secure MapReduce2 - Bootstrap
This diagram illustrates the interactions that occur when a Hadoop system is starting up and stabilizing. It involves various master components generating secret keys and slave components registering with the masters to receive these secret keys.
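The general idea of a master generating a secret key and sharing it with registered slave components can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of the actual Hadoop implementation: it assumes that a token's password is an HMAC of the token identifier under the master's secret key, and the Master and Worker classes, their methods, and the identifier format are all hypothetical. In a real cluster the registration step would run over a mutually authenticated channel.

```python
import hashlib
import hmac
import secrets


class Master:
    """Hypothetical master component that owns a secret key."""

    def __init__(self) -> None:
        # Generated once at startup (bootstrap).
        self.secret_key = secrets.token_bytes(20)
        self.registered = set()

    def register(self, worker_id: str) -> bytes:
        # Assumption: the caller was already authenticated (for example
        # via Kerberos) before the key is handed out.
        self.registered.add(worker_id)
        return self.secret_key

    def issue_token(self, identifier: bytes) -> bytes:
        # Assumed scheme: password = HMAC(secret key, identifier).
        return hmac.new(self.secret_key, identifier, hashlib.sha1).digest()


class Worker:
    """Hypothetical slave component that registers to receive the key."""

    def __init__(self, worker_id: str, master: Master) -> None:
        self.worker_id = worker_id
        self.secret_key = master.register(worker_id)

    def verify(self, identifier: bytes, password: bytes) -> bool:
        # Recompute the HMAC locally; no call back to the master is needed.
        expected = hmac.new(self.secret_key, identifier, hashlib.sha1).digest()
        return hmac.compare_digest(expected, password)
```

The point of sharing the key at registration time is that a worker can later check a presented token on its own, without contacting the master for every request.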
Secure MapReduce2 - Job Definition
This diagram illustrates the steps taken by a client to define a MapReduce job that will later be submitted.
Secure MapReduce2 - Job Submission
This diagram illustrates the steps taken during the submission of a MapReduce job.
Secure MapReduce2 - Job Initiation
This diagram illustrates the steps taken when a MapReduce job is scheduled for execution.
Secure MapReduce2 - Map Task Execution
This diagram illustrates the steps taken when the Map portion of a MapReduce job is executed.
Secure MapReduce2 - Reduce Task Execution
This diagram illustrates the steps taken when the Reduce portion of a MapReduce job is executed.
Secure MapReduce2 - Job Completion
This diagram illustrates the steps taken after a MapReduce job has completed.
Secure MapReduce2 - Client Monitoring
This diagram illustrates the steps taken by a Client to monitor the status of a Job throughout the Job's lifecycle. The timeframe for this diagram spans several of the diagrams above, from Job Submission through Job Completion.