
This page includes tips and troubleshooting information regarding Apache Beam's multi-language pipelines framework. For full documentation on multi-language pipelines, see the multi-language pipelines section of the Apache Beam programming guide.

"java: command not found" when starting the pipeline

This usually occurs because the java command is not available when a multi-language pipeline that uses a Java transform is submitted to the Beam runner. Multi-language wrappers implemented in the pipeline SDK may try to start a Java expansion service automatically, so the java command being available on the system is a prerequisite. This can be resolved by installing a JDK on the machine from which the job is submitted and adding the JDK directory that contains the java binary to the PATH environment variable.
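
As a quick check before submitting, you can verify from Python that the java binary is visible to the submitting process. This is a minimal sketch using only the standard library; it mirrors the PATH lookup that happens when the SDK shells out to start the expansion service:

```python
import shutil

# shutil.which searches the PATH environment variable for the named binary,
# the same place the SDK looks when it tries to launch "java" automatically.
java_path = shutil.which("java")
if java_path:
    print("java found at:", java_path)
else:
    print("java not found; install a JDK and add its bin directory to PATH")
```

If this prints "java not found", fix PATH in the environment you submit the job from (not just in your interactive shell profile).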

gRPC error "failed to connect to all addresses" when submitting the job

This usually occurs when an expansion service used by the pipeline is not available. For example, you may simply have forgotten to start the expansion service before running the job, or a Java expansion service that is started automatically by a wrapper in the pipeline SDK may have failed to start.
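
Before digging further, it can help to confirm that the expansion service endpoint is actually listening. Below is a minimal standard-library check; localhost:8097 is only an assumed address, so substitute whatever host and port you started the service on:

```python
import socket

def expansion_service_reachable(host="localhost", port=8097, timeout=2.0):
    """Return True if a TCP connection to the given endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the (assumed) address before submitting the job.
if not expansion_service_reachable():
    print("Expansion service not reachable; start it before running the job.")
```

A successful TCP connection does not prove the service is healthy, but a failed one reliably reproduces the "failed to connect to all addresses" condition before you pay the cost of a job submission.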

"KeyError: 'beam:coders:javasdk:0.1'" when expanding the transform using expansion service

This occurs when the Java expansion service returns an expanded transform that uses a Java-specific coder for one of its outputs. Cross-language transforms require the coders used at SDK boundaries to be Beam standard coders that all SDKs can interpret. Note that internal sub-transforms of the expanded transform may use Java-specific coders; what matters are the final outputs produced by the expanded Java transform.

The solution is to update the user's transform to produce output PCollection types that use standard coders at the SDK boundaries.

"java.lang.IllegalArgumentException: Unknown Coder URN beam:coder:pickled_python:v1" when runner a Python pipeline

This usually means that an expansion request sent from the Python SDK to a Java expansion service contained a Python-specific PickleCoder, which the Java SDK cannot interpret. For example, this could be due to one of the following.

  • The input PCollection that is fed into the cross-language transform on the Python side uses a Python-specific type. In this case, the PTransform that produced the input PCollection has to be updated to produce outputs that use Beam's standard coders, so that the Java side can interpret them.
  • The input PCollection uses standard coders, but Python type inference picks the PickleCoder anyway. This can usually be resolved by annotating the predecessor Python transform with the correct type annotation using the with_output_types tag. See this Jira for an example.

How to set Java log level from a Python pipeline that uses Java transforms

For supported runners, you can set the log level of Java transforms by adding the corresponding pipeline option as a local pipeline option on the Python side. For example, to suppress all logs from the Java package org.apache.kafka, you can do the following.

  1. Add a Python PipelineOption that represents the corresponding Java PipelineOption available here. This can simply be added to the Python program that starts the Beam job.
    import json

    from apache_beam.options.pipeline_options import PipelineOptions

    class JavaLoggingOptions(PipelineOptions):
      @classmethod
      def _add_argparse_args(cls, parser):
        parser.add_argument(
            '--sdkHarnessLogLevelOverrides',
            default={},
            type=json.loads,
            help=(
              'Java log level overrides'))
  2. Specify the additional PipelineOption as a parameter when running the Beam pipeline.

--sdkHarnessLogLevelOverrides "{\"org.apache.kafka\":\"ERROR\"}"
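
The value passed to --sdkHarnessLogLevelOverrides is plain JSON; the option defined above parses it with json.loads into a dict mapping Java package (or class) names to log levels. A quick sanity check of the format:

```python
import json

# The same string as the shell-quoted argument above, without shell escaping.
overrides = json.loads('{"org.apache.kafka": "ERROR"}')
print(overrides)  # {'org.apache.kafka': 'ERROR'}
```

If json.loads raises a ValueError here, the shell quoting on the command line is usually the culprit: the JSON string must reach Python with its double quotes intact.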
     



