Running HelloREEF with no client

The difference between HelloREEF and HelloREEFNoClient

HelloREEF comes in several variants that serve different needs. One of them, HelloREEFNoClient, creates the Driver and Evaluators without a Client. In many cluster scenarios a single Client accesses multiple Drivers, so not every Driver needs its own Client; that is the case HelloREEFNoClient covers.

Running HelloREEFNoClient is nearly identical to running HelloREEF:

 

> java -cp lang/java/reef-examples/target/reef-examples-{$REEF_VERSION}-shaded.jar org.apache.reef.examples.hello.HelloREEFNoClient

The output should be the same as for HelloREEF, with evaluator.stdout containing the “Hello, REEF!” message.

Running HelloREEF on YARN

REEF applications can be run on multiple runtime environments. Using HelloREEFYarn, we will see how to configure and launch REEF applications on YARN.


Prerequisites

You have compiled REEF locally, and have YARN installed and correctly configured.

How to configure REEF on YARN

The only difference between running a REEF application on YARN vs locally is the runtime configuration:

final LauncherStatus status = DriverLauncher
        .getLauncher(YarnClientConfiguration.CONF.build())
        .run(getDriverConfiguration(), JOB_TIMEOUT);

How to launch HelloREEFYarn

Running HelloREEFYarn is very similar to running HelloREEF:

> yarn jar lang/java/reef-examples/target/reef-examples-{$REEF_VERSION}-shaded.jar org.apache.reef.examples.hello.HelloREEFYarn

You can see how REEF applications work on YARN environments in Introduction to REEF.


Running HelloREEF on Azure Batch

Prerequisites

You have compiled REEF locally and have an Azure Batch pool configured. See the communication configuration instructions to enable external Batch communication. We suggest using the data-science-vm image published by microsoft-ads, which has Java pre-installed.

Running HelloREEF on Azure Batch using Java

How to configure REEF Java on Azure Batch

REEF Azure Batch runtime configuration is provided through a helper class (AzureBatchRuntimeConfiguration.java), which reads an Avro configuration file.

The location of the configuration file can be given either through the system environment variable REEF_AZBATCH_CONF or as a direct file path.

Load configuration through an environment variable:

Configuration config = AzureBatchRuntimeConfiguration.fromEnvironment();

Load configuration through a file path:

String pathName = "./dummyFilePath";
Configuration config = AzureBatchRuntimeConfiguration.fromTextFile(new File(pathName));
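
The two loading options can be combined in a launcher. The following is a minimal sketch and not part of REEF: the class AzBatchConfigLocator and its method resolveConfigPath are hypothetical names, and the order of precedence (explicit path first, then REEF_AZBATCH_CONF) is an assumption chosen for illustration. It uses only the Java standard library; the REEF call is mentioned in a comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of REEF: decides where the Azure Batch
// configuration file should be read from.
final class AzBatchConfigLocator {
  static final String ENV_VAR = "REEF_AZBATCH_CONF";

  // Prefers an explicit path; otherwise falls back to the environment variable.
  // The environment is passed in as a map so the logic can be exercised in tests.
  static Optional<Path> resolveConfigPath(final String explicitPath, final Map<String, String> env) {
    if (explicitPath != null && !explicitPath.trim().isEmpty()) {
      return Optional.of(Paths.get(explicitPath.trim()));
    }
    final String fromEnv = env.get(ENV_VAR);
    if (fromEnv != null && !fromEnv.trim().isEmpty()) {
      return Optional.of(Paths.get(fromEnv.trim()));
    }
    return Optional.empty();
  }

  public static void main(final String[] args) {
    final String explicit = args.length > 0 ? args[0] : null;
    final Optional<Path> path = resolveConfigPath(explicit, System.getenv());
    // With a path in hand, the file would be passed to
    // AzureBatchRuntimeConfiguration.fromTextFile(path.get().toFile()).
    System.out.println(path.map(Path::toString).orElse("no configuration file specified"));
  }
}
```

Failing early when neither source is set gives a clearer error than letting the runtime configuration fail later.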

Sample configuration file:

{
  "language": "Java",
  "Bindings": [
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.IsWindows",
      "value": "false"
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureBatchAccountKey",
      "value": "dummyvalue1234562Wbg0CqnIdyFiZXr1G5URGnfRTVQnQ50LvB5+wnrr5ERS87TH/8K93ViZn/qfH0SGH4DKQ=="
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureBatchAccountName",
      "value": "reefbatchaccountname"
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureBatchAccountUri",
      "value": "https://reefbatchaccountname.westus2.batch.azure.com"
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureBatchPoolId",
      "value": "myreefpool"
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureStorageAccountName",
      "value": "reefstoragename"
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureStorageAccountKey",
      "value": "dummyvalue123456Wh5+f8lN4H3BnwgIHi3Xj/ohNZt5sm8ZWK8jnKWWKD2r9WeBw8Yad5CGjyd7s9lSY01RDw=="
    },
    {
      "key": "org.apache.reef.runtime.azbatch.parameters.AzureStorageContainerName",
      "value": "reef-container"
    }
  ]
}
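
If you prefer to generate this file rather than edit it by hand (for example, to keep account keys out of source control), the layout is simple enough to render directly. The sketch below is illustrative only: AzBatchConfigWriter is a hypothetical class, not part of REEF, and it assumes the file needs nothing beyond the "language" and "Bindings" fields shown in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of REEF: renders parameter bindings in the
// same JSON layout as the sample configuration file.
final class AzBatchConfigWriter {
  static final String PREFIX = "org.apache.reef.runtime.azbatch.parameters.";

  // Escapes backslashes and double quotes; control characters are not
  // handled in this sketch.
  static String escape(final String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Keys are short parameter names (e.g. "AzureBatchPoolId"); the package
  // prefix used in the sample file is added here.
  static String render(final Map<String, String> bindings) {
    final StringJoiner entries = new StringJoiner(",\n");
    for (final Map.Entry<String, String> e : bindings.entrySet()) {
      entries.add("    {\n"
          + "      \"key\": \"" + escape(PREFIX + e.getKey()) + "\",\n"
          + "      \"value\": \"" + escape(e.getValue()) + "\"\n"
          + "    }");
    }
    return "{\n  \"language\": \"Java\",\n  \"Bindings\": [\n" + entries + "\n  ]\n}\n";
  }

  public static void main(final String[] args) {
    final Map<String, String> bindings = new LinkedHashMap<>();
    bindings.put("IsWindows", "false");
    bindings.put("AzureBatchPoolId", "myreefpool");
    System.out.print(render(bindings));
  }
}
```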

An example configuration can be seen in HelloReefAzBatch.java.

How to launch HelloReefAzBatch

Running HelloReefAzBatch Java with no client:

java -cp lang/java/reef-examples/target/reef-examples-{$REEF_VERSION}-SNAPSHOT-shaded.jar org.apache.reef.examples.hello.HelloReefAzBatch

Warning: Due to a limitation of the current implementation, HelloReefAzBatch client is not supported unless the client is running on an Azure Batch node.

Running HelloREEF on Azure Batch using .NET

Warning: Only Windows VMs are supported.

How to configure REEF .NET on Azure Batch

As with REEF Java on Azure Batch, an example configuration is provided in HelloREEF.cs.

How to run REEF .NET on Azure Batch

Running HelloREEF .NET with client:

reef\lang\cs\bin\.netcore\Debug\Org.Apache.REEF.Examples.HelloREEF\net461>Org.Apache.REEF.Examples.HelloREEF.exe "azurebatch"

How to configure REEF .NET Driver Client communication on Azure Batch

By default, an external entity cannot directly communicate with an Azure Batch node. In order to enable this communication, the Azure Batch Pool will need to have a configured InboundNATPool.

The InboundNATPool maps individual frontend ports to individual Batch nodes. For REEF usage, you will need to account for the number of nodes in the Batch pool and the number of tasks expected to run on a node. The frontend port range must span at least as many ports as there will be nodes. Likewise, there must be as many InboundEndPoints as tasks you expect to run on a node. Once configured, the list of possible backend ports should be specified in AzureBatchRuntimeClientConfiguration, as in HelloREEF.cs.


Example InboundNATPool InboundEndPoints:

Name       Backend port  Frontend port range  Protocol
Endpoint1  2000          1-100                TCP
Endpoint2  2001          101-200              TCP

Endpoint1 maps each node's backend port (2000) to a frontend port between 1 and 100. The client can then reach the backend port through the VM's public IP address and frontend port, e.g. $(External IP):1 maps to $(Internal IP):2000. The user can retrieve a node's public IP address and frontend port through the Azure Batch ComputeNode InboundEndPoint.

In REEF, Driver-Client communication relies on backend ports that are open to the public, so the maximum number of Driver tasks that can run on the same node is the number of backend ports defined in the InboundNATPool. This configuration has two InboundEndPoints (Endpoint1 and Endpoint2), and therefore only two Drivers can run on one node. If more than two Drivers try to run on a node, there won't be enough ports available for port binding.

Likewise, each frontend port range in this configuration spans 100 ports (1-100 and 101-200), and therefore only 100 nodes can properly use the port mappings. If more than 100 nodes are running tasks, they will run out of frontend ports for port mapping.
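
The two limits described above can be computed from the endpoint definitions. The following is a rough sketch, not Azure Batch SDK code: the Endpoint class and both method names are hypothetical, and taking the narrowest frontend range as the node limit is an assumption that generalizes the 100-port example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of InboundNATPool limits; not an Azure Batch SDK type.
final class NatPoolCapacity {
  static final class Endpoint {
    final String name;
    final int backendPort;
    final int frontendStart;
    final int frontendEnd; // inclusive

    Endpoint(final String name, final int backendPort, final int frontendStart, final int frontendEnd) {
      this.name = name;
      this.backendPort = backendPort;
      this.frontendStart = frontendStart;
      this.frontendEnd = frontendEnd;
    }

    int frontendPortCount() {
      return frontendEnd - frontendStart + 1;
    }
  }

  // Each endpoint contributes one backend port, so the endpoint count
  // bounds the number of Drivers per node.
  static int maxDriversPerNode(final List<Endpoint> endpoints) {
    return endpoints.size();
  }

  // Each node takes one frontend port from every endpoint, so the
  // narrowest frontend range bounds the number of nodes (assumption).
  static int maxNodes(final List<Endpoint> endpoints) {
    if (endpoints.isEmpty()) {
      return 0;
    }
    int min = Integer.MAX_VALUE;
    for (final Endpoint e : endpoints) {
      min = Math.min(min, e.frontendPortCount());
    }
    return min;
  }

  public static void main(final String[] args) {
    final List<Endpoint> endpoints = Arrays.asList(
        new Endpoint("Endpoint1", 2000, 1, 100),
        new Endpoint("Endpoint2", 2001, 101, 200));
    System.out.println("Drivers per node: " + maxDriversPerNode(endpoints)); // 2
    System.out.println("Nodes: " + maxNodes(endpoints)); // 100
  }
}
```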

Assume a user's pool consists of 2 nodes with the following mapping established:

Node Id  Endpoint   Public IP address  Frontend port  Backend port
node1    Endpoint1  13.0.0.20          1              2000
node1    Endpoint2  13.0.0.20          101            2001
node2    Endpoint1  13.0.0.20          2              2000
node2    Endpoint2  13.0.0.20          102            2001

To communicate with node1 on port 2001, the user connects to "13.0.0.20:101".

To communicate with node2 on port 2000, the user connects to "13.0.0.20:2".
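
The lookup a client performs can be sketched as follows. This is an illustration under stated assumptions: the Mapping class and the publicAddress method are hypothetical, and the table is hard-coded from the example above, whereas a real client would read it from the Azure Batch ComputeNode InboundEndPoint information.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative lookup of a node's public address; not an Azure Batch SDK type.
final class NodeEndpointLookup {
  static final class Mapping {
    final String nodeId;
    final String publicIp;
    final int frontendPort;
    final int backendPort;

    Mapping(final String nodeId, final String publicIp, final int frontendPort, final int backendPort) {
      this.nodeId = nodeId;
      this.publicIp = publicIp;
      this.frontendPort = frontendPort;
      this.backendPort = backendPort;
    }
  }

  // The two-node example mapping, hard-coded for illustration.
  static List<Mapping> exampleTable() {
    return Arrays.asList(
        new Mapping("node1", "13.0.0.20", 1, 2000),
        new Mapping("node1", "13.0.0.20", 101, 2001),
        new Mapping("node2", "13.0.0.20", 2, 2000),
        new Mapping("node2", "13.0.0.20", 102, 2001));
  }

  // Returns "publicIp:frontendPort" for the given node and backend port, if mapped.
  static Optional<String> publicAddress(final List<Mapping> table, final String nodeId, final int backendPort) {
    for (final Mapping m : table) {
      if (m.nodeId.equals(nodeId) && m.backendPort == backendPort) {
        return Optional.of(m.publicIp + ":" + m.frontendPort);
      }
    }
    return Optional.empty();
  }

  public static void main(final String[] args) {
    final List<Mapping> table = exampleTable();
    System.out.println(publicAddress(table, "node1", 2001).orElse("unmapped")); // 13.0.0.20:101
    System.out.println(publicAddress(table, "node2", 2000).orElse("unmapped")); // 13.0.0.20:2
  }
}
```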

Restrict the access when using InboundNATPool

Users can use NetworkSecurityGroupRules to specify which IP addresses are allowed to reach the ports from outside, and so restrict who can contact the listener. An example can be found here.

 

Azure Batch with Docker containers

Azure Batch can run jobs and tasks within a Docker container. Using Docker containers to execute REEF jobs has the benefit of isolating the runtime dependencies in lightweight Docker containers instead of on the VM node. This section describes how you can configure the pool and REEF to execute REEF jobs inside Docker containers.

1. Create Dockerfile for your OS with REEF and other dependencies

The Docker container must be configured to execute REEF jobs, since the jobs will execute within the Docker container environment. You can add additional dependencies as you see fit. The following dockerfiles list the dependencies REEF requires for Windows-based and Linux-based containers.

Option: Windows Docker image

The following example dockerfile targets the windowsservercore image on Docker Hub and sets up the prerequisites: Java and the required path variables.

FROM microsoft/windowsservercore:latest
ADD http://javadl.oracle.com/webapps/download/AutoDL?BundleId=207775 c:\jre-8u91-windows-x64.exe
RUN powershell -Command Start-Process -FilePath C:\jre-8u91-windows-x64.exe -PassThru -Wait -ArgumentList \"/s /L c:\Java64.log\"
ENV 'JAVA_HOME' 'C:\Program Files\Java\jre1.8.0_91\'
ENV 'PATH' 'C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Users\ContainerAdministrator\AppData\Local\Microsoft\WindowsApps;C:\Program Files\Java\jre1.8.0_91\bin\'
# The following reference is added since the default Windows Server core 
# image is missing this important dll to be able to run java programs
ADD vcruntime140.dll C:/Windows/System32/vcruntime140.dll
RUN del c:\jre-8u91-windows-x64.exe

# Add your own dependencies here

Option: Ubuntu Docker image

The following example dockerfile installs the JDK and the utilities required by REEF:

FROM ubuntu
RUN apt-get update && apt-get install -y default-jdk unzip
ENV JAVA_HOME /usr/bin/java

# Add your own dependencies here

2. Create a Docker image from the dockerfile and publish the Docker image to Azure Container service.

3. Create a Pool in the Azure Batch account for Container workloads

Use the following settings when creating the pool:

  • Set the Max tasks per node to be one (more on this below).

  • Enable Inter-node communication option.

  • If REEF client-driver communication is necessary, you will also need to configure the pool to use the same set of ports that you pass to REEF in step 4.

4. Configure REEF to use the Docker containers on the pool

Add the following additional properties to the Runtime Configuration:

return AzureBatchRuntimeClientConfiguration.ConfigurationModule
    // All other configuration that applies to Azure Batch pools without containers
    // is also required here. The following additional configuration is required.
    .Set(AzureBatchRuntimeClientConfiguration.ContainerRegistryServer, @"mycontainerservice.azurecr.io")
    .Set(AzureBatchRuntimeClientConfiguration.ContainerRegistryUsername, @"<registry name from container service – Access Keys section>")
    .Set(AzureBatchRuntimeClientConfiguration.ContainerRegistryPassword, @"<password from container service – Access Keys section>")
    .Set(AzureBatchRuntimeClientConfiguration.ContainerImageName, @"mycontainerservice.azurecr.io/mydockerimage")
    // Provide at least three ports below that must be reserved for Docker container
    // execution on the VM nodes (one each for the HTTP server, Wake and the name server).
    .Set(AzureBatchRuntimeClientConfiguration.AzureBatchPoolDriverPortsList, new List<string> { "2000", "2001", "2002" })
    .Build();
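
Because the container's port mapping is fixed when the container is created, it can help to check the port list before submitting a job. The sketch below is a hypothetical pre-flight check, not part of REEF, and is written in Java like the other illustrative sketches on this page. It assumes only what is stated above (at least three ports) plus two assumptions of its own: ports must be distinct and within the valid TCP range.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for the driver port list; not part of REEF.
final class DriverPortsCheck {
  // One port each for the HTTP server, Wake and the name server.
  static final int MIN_PORTS = 3;

  // Returns the port number, or -1 if the text is not a valid TCP port.
  static int parsePort(final String text) {
    try {
      final int port = Integer.parseInt(text.trim());
      return (port >= 1 && port <= 65535) ? port : -1;
    } catch (final NumberFormatException e) {
      return -1;
    }
  }

  // True if every entry is a valid port, no port repeats, and there are
  // at least MIN_PORTS entries.
  static boolean isUsable(final List<String> ports) {
    final Set<Integer> distinct = new HashSet<>();
    for (final String text : ports) {
      final int port = parsePort(text);
      if (port < 0 || !distinct.add(port)) {
        return false;
      }
    }
    return distinct.size() >= MIN_PORTS;
  }

  public static void main(final String[] args) {
    System.out.println(isUsable(Arrays.asList("2000", "2001", "2002"))); // true
    System.out.println(isUsable(Arrays.asList("2000", "2001"))); // false
  }
}
```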

Limitations

Containers must be limited to a single Docker container per node at a time. This is done by setting Max tasks per node to one when you create the Azure Batch pool for containers. A brief explanation of this limitation follows:

For a service to communicate to clients external to the container, it must be executing on ports that are explicitly mapped to ports at the host level. The port mapping can only be set when creating the Docker container. Since we do not have knowledge of the ports available to us from Azure Batch, we are limited to executing only one container at a time with a predefined port list to ensure that it will start successfully.

Running a REEF Webserver: HelloREEFHttp

REEF also has a webserver interface to handle HTTP requests. This webserver can be used in many different ways, such as for interprocess communication or in conjunction with the REST API.

To demonstrate a possible use for this interface, HelloREEFHttp serves as a simple webserver to execute shell commands requested from user input. The first thing we should do is register a handler to receive the HTTP requests.

Prerequisites

Again, you have compiled REEF locally.

 

HttpServerShellCmdHandler

HttpServerShellCmdHandler implements HttpHandler, which requires three methods to be overridden: getUriSpecification, setUriSpecification, and onHttpRequest.

  • UriSpecification defines the URI specification for the handler. More than one handler can exist per application, so each handler is distinguished by this specification. Since HelloREEFHttp defines UriSpecification as Command, an HTTP request looks like http://{host_address}:{host_port}/Command/{request}.
  • onHttpRequest defines a hook for when an HTTP request for this handler is invoked.
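
To make the URI layout concrete, the sketch below splits a request URL into the specification and the {request} part using only the Java standard library. It is an illustration, not REEF's dispatch code: the class CommandUri and the method extractRequest are hypothetical names, and the host and port in the usage are placeholders.

```java
import java.net.URI;
import java.util.Optional;

// Illustrative parsing of http://{host_address}:{host_port}/Command/{request};
// not REEF's own request dispatching.
final class CommandUri {
  static final String URI_SPECIFICATION = "Command";

  // Returns the {request} part if the URL targets the "Command" specification.
  static Optional<String> extractRequest(final String url) {
    final String path = URI.create(url).getPath();
    final String prefix = "/" + URI_SPECIFICATION + "/";
    if (path == null || !path.startsWith(prefix)) {
      return Optional.empty();
    }
    return Optional.of(path.substring(prefix.length()));
  }

  public static void main(final String[] args) {
    // Host and port are placeholders for illustration.
    System.out.println(extractRequest("http://localhost:8080/Command/ls").orElse("not for this handler"));
    System.out.println(extractRequest("http://localhost:8080/Other/ls").orElse("not for this handler"));
  }
}
```

A URL whose first path segment is not Command yields an empty result, which mirrors how separate handlers are told apart by their specification.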