Self Link: https://s.apache.org/beam-design-docs
Documents by category
Project Incubation (2016)
- Original Drive Folder for Incubation Docs [Google Drive folder]
- Technical Vision [doc], [slides]
- Repository Structure [doc]
- Flink runner: Current status and development roadmap [doc]
- Spark Runner Technical Vision [doc]
- PPMC deep dive [slides]
...
- IOChannelFactory Redesign [doc]
- Configurable BeamFileSystem [doc]
- New API for writing files in Beam [doc]
- Dynamic file-based sinks [doc]
- Beam GCP Debuggability Metrics [doc]
- KafkaIO
- CDAP IO [doc]
- Schema Aware Beam IOs [doc]
- Client-Side Throttling Overview [doc]
Metrics
- Defining and Adding SDK Metrics via FN API [doc]
- Histogram Style Metrics [doc]
- Get Metrics API: Metric Extraction via proto RPC API [doc]
- Metrics API [doc]
- I/O Metrics [doc]
- Metrics extraction independent from runners / execution engines [doc]
- Watermark Metrics [doc]
- Support Dropwizard Metrics in Beam [doc]
- Beam GCP Debuggability Metrics [doc]
...
- More Expressive PAsserts [doc]
- Mergebot design document [doc]
- Performance tests for commonly used file-based I/O PTransforms [doc]
- Performance tests results analysis and basic regression detection [doc]
- Eventual PAssert [doc]
- Testing I/O Transforms in Apache Beam [doc]
- Reproducible Environment for Jenkins Tests By Using Container [doc]
- Keeping precommit times fast [doc]
- Increase Beam post-commit tests stability [doc]
- Beam-Site Automation Reliability [doc]
- Managing outdated dependencies [doc]
- Automation For Beam Dependency Check [doc]
- Test performance of core Apache Beam operations [doc]
- Add static code analysis quality gates to Beam [doc]
- Portable batch & streaming load tests in all sdks [doc]
- Storing, displaying and detecting anomalies in test results [doc]
- Add ARM Support to Beam SDK Container Images [doc]
Deployment
- Beam on Flink on Kubernetes [doc]
...
- Beam Python User State and Timer APIs [doc]
- Python Kafka connector [doc]
- Python 3 support [doc]
- Splittable DoFn for Python SDK [doc]
- Parquet IO for Python SDK [doc]
- Building Python Wheels [doc]
- Beam Type Hints for Python 3 [doc]
- Pandas Dataframe API for Beam [doc]
- Batched DoFns [doc]
- PEP 585 Type Hints for Python 3.9+ [doc]
- The Current State of Beam Python Type Hinting (as of 2.52.0) [doc]
- Enrichment transform [doc]
Go
- Apache Beam Go SDK design [doc]
- Go SDK Vanity Import Path [doc] (unimplemented)
  - Needs to be adjusted to account for Go Modules.
- Go SDK Integration Tests [doc]
- Design RFC
  - Assumes Beam knowledge, but points out how Go's features informed the SDK design.
- User Defined Coders + Original Schema Sketch
  - Schemas: https://s.apache.org/beam-go-schemas (doesn't include rows)
- Splittable DoFns for the Go SDK [doc]
- Self-Checkpointing SDFs for the Go SDK [doc]
- Bundle Finalization in the Go SDK [doc]
- Watermark Estimation in the Go SDK [doc]
- State and Timers in the Go SDK [doc]
- Using Generics for Registration [doc]
- Side Input Window Mapping [doc]
- MultiMap Side Input Support [doc]
- One-Pagers:
  - Investigation: Go Expansion Service Auto-Startup for Dev Environments [doc]
Machine Learning
- Custom Inference Functions [doc]
- Model Updates using Side Inputs [doc]
- RunInference: ML Inference in Beam [doc]
- beam.MLTransform [doc]
- Embeddings in MLTransform [doc]
- TensorFlow Model Handler [doc]
- Hugging Face Model Handler [doc]
- Per Key Inference [doc]
- Benchmarking RunInference with Multi-Process Shared Models [doc]
Other
- Euphoria - High-Level Java 8 DSL [doc]
- Apache Beam Code Review Guide [doc]
- Nexmark
- Slowly Changing Side Inputs (or Slowly Changing Dimensions Support) [doc]
Some of the documents are available on this Google Drive.
...