...

Works In Progress

Portability Framework

The primary Beam vision: Any SDK on any runner. This is a cross-cutting effort across Java, Python, and Go, and every Beam runner.

Apache Spark 2.0 Runner

JStorm Runner

MapReduce Runner

Tez Runner

Go SDK

JIRA: sdk-go / BEAM-2083

...

Python 3 Support

Work is in progress to add Python 3 support to Beam. The current goal is to make the Beam codebase compatible with both Python 2.7 and Python 3.4.
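As a rough illustration of what "compatible with both Python 2.7 and Python 3.4" tends to involve, here is a minimal sketch of code that behaves the same on both interpreters. The helper names (`to_text`, `mean`) are hypothetical and are not taken from the Beam codebase; they only show two common pitfalls, bytes versus text and integer division.

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists only on Python 2)


def to_text(value, encoding="utf-8"):
    """Return `value` as a text string on both Python 2.7 and Python 3.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, bytes) and not isinstance(value, text_type):
        # Byte strings are decoded explicitly instead of relying on
        # Python 2's implicit ASCII coercion.
        return value.decode(encoding)
    return text_type(value)


def mean(values):
    """Average that avoids Python 2's integer division."""
    values = list(values)
    return sum(values) / float(len(values))
```

Keeping such differences in small helpers is one way to avoid version checks scattered through the code; it is not necessarily the approach taken in the Beam issues themselves.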

Contributions are welcome! If you are interested in helping, select an unassigned issue in the Kanban board and assign it to yourself. Comment on the issue if you cannot assign it yourself. When submitting a new PR, please tag @RobbeSneyders, @aaltay, and @tvalentyn.

Next Java LTS version support (Java 11 / 18.9)

Work to support the next LTS release of Java is in progress. For details about the scope and the individual tasks, please see the JIRA ticket.

IO Performance Testing

We are also writing Performance Tests for IOs and developing a Performance Testing Framework for them. Contributions are welcome in the following areas:

  • developing more IO Performance Tests (IOITs)
  • providing the necessary Kubernetes infrastructure (e.g. for databases or filesystems to be used in tests)
  • running Performance Tests on runners other than Dataflow and Direct
  • improving the existing Performance Testing Framework and its documentation

See the documentation and the initial proposal (for file-based tests).

If you’re willing to help in this area, tag the following people in PRs: @chamikaramj, @DariuszAniszewski, @lgajowy, @szewi, @kkucharc

Euphoria Java 8 DSL

An easy-to-use Java 8 DSL for the Beam Java SDK. It provides a high-level abstraction of Beam transformations that is easy to both read and write, and it can be used as a complement to existing Beam pipelines (convertible back and forth). For a glimpse of the API, see the WordCount example.

Improving the contributor experience

Making it easier to write code, run tests, and release. Investigating using Docker for Jenkins builds, automating the release process, and improving the reliability of tests.

Ideas and help welcome! Contact: Alan Myrvold, Mark Liu, Yifan Zou

Beam SQL

Beam SQL has many areas open for contribution: support for new operators, new connectors, performance measurement and improvement, fuller specification and testing, etc.

Add benchmarks to continuous integration

Run Nexmark benchmark queries after each commit for the Spark, Flink, and Direct runners, and export response times to performance dashboards.

Extract metrics in a runner agnostic way

Metrics are pushed by the runners to configurable sinks (an HTTP REST sink is available). This is already enabled in the Flink and Spark runners. Work is in progress for Dataflow.
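To make the idea of pushing metrics to an HTTP REST sink more concrete, the sketch below builds a POST request carrying a JSON metrics payload. Everything in it is an assumption made for illustration: the function name, the payload shape, and the URL are hypothetical and do not reflect Beam's actual sink configuration or wire format. The request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_metrics_request(sink_url, metrics):
    """Build (but do not send) a POST request carrying metrics as JSON.

    Hypothetical helper; the payload layout is an assumption, not Beam's.
    """
    body = json.dumps({"metrics": metrics}, sort_keys=True).encode("utf-8")
    return Request(
        sink_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and counter values, for illustration only.
request = build_metrics_request(
    "http://localhost:8080/metrics",
    [{"name": "elements_read", "value": 1024}],
)
```

Because the payload is plain JSON over HTTP, nothing in the receiving service needs to know which runner produced the metrics, which is the point of extracting them in a runner-agnostic way.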