Contents

...

Stream-based utilities

Since it is possible to release different modules with different language level requirements, we could consider creating a commons-numbers-complex-stream module to hold the utilities currently in class ComplexUtils.

From a management point of view, it would avoid carrying the maintenance burden of an outdated API once the whole component switches to Java 8.

Release 1.0 should not ship with ComplexUtils.

Difficulty: Minor
Potential mentors:
Gilles Sadowski, mail: erans (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

RocketMQ

RocketMQ Connect Cassandra


Content

The Cassandra sink connector allows writing data to Apache Cassandra. In this project, you need to implement a Cassandra sink connector based on the OpenMessaging connect API, and run it on the RocketMQ connect runtime.
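
A minimal sketch of the shape such a sink task could take is below. The OpenMessaging connect API names (SinkTask, SinkDataEntry and their accessors) are assumed from rocketmq-connect examples and may differ in the current API, other lifecycle callbacks (e.g. commit) are omitted, and the Cassandra side uses the DataStax Java driver 4.x. Treat this as an illustration under those assumptions, not the connector itself.

    // Assumed OpenMessaging connect API names; verify against rocketmq-connect.
    import java.util.Collection;
    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;
    import io.openmessaging.KeyValue;
    import io.openmessaging.connector.api.data.SinkDataEntry;
    import io.openmessaging.connector.api.sink.SinkTask;

    public class CassandraSinkTask extends SinkTask {
        private CqlSession session;

        @Override
        public void start(KeyValue config) {
            // Contact points and keyspace would come from the connector config.
            session = CqlSession.builder().build();
        }

        @Override
        public void put(Collection<SinkDataEntry> entries) {
            for (SinkDataEntry entry : entries) {
                // A real connector derives table and columns from the entry schema;
                // getQueueOffset()/getPayload() are assumed accessor names.
                session.execute(SimpleStatement.newInstance(
                        "INSERT INTO demo.events (id, payload) VALUES (?, ?)",
                        entry.getQueueOffset(), String.valueOf(entry.getPayload())));
            }
        }

        @Override
        public void stop() {
            session.close();
        }
    }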

You should learn before applying for this topic

Cassandra / [Apache RocketMQ|https://rocketmq.apache.org/] / [Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect] / OpenMessaging Connect API

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect InfluxDB

Content

The InfluxDB sink connector allows moving data from Apache RocketMQ to InfluxDB: it writes data from a topic in Apache RocketMQ to InfluxDB. The InfluxDB source connector, in turn, is used to export data from an InfluxDB server to RocketMQ.

In this project, you need to implement an InfluxDB sink connector (the source connector is optional) based on the OpenMessaging connect API.
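
For orientation, this is the kind of call a sink task would make per consumed record, using the influxdb-java client; the URL, database, measurement and field names are placeholders.

    import java.util.concurrent.TimeUnit;
    import org.influxdb.InfluxDB;
    import org.influxdb.InfluxDBFactory;
    import org.influxdb.dto.Point;

    public class InfluxWriteExample {
        public static void main(String[] args) {
            InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086");
            influxDB.setDatabase("rocketmq_demo");
            // One data point per consumed RocketMQ record.
            influxDB.write(Point.measurement("cpu")
                    .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                    .tag("host", "node-1")
                    .addField("usage", 0.42)
                    .build());
            influxDB.close();
        }
    }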

You should learn before applying for this topic

InfluxDB / [Apache RocketMQ|https://rocketmq.apache.org/] / [Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect] / OpenMessaging Connect API

Mentor

duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

The Operator for RocketMQ Exporter

The exporter exposes the endpoint of monitoring-data collection to the Prometheus server in the form of an HTTP service. The Prometheus server obtains the monitoring data to be collected by accessing the endpoint provided by the exporter. RocketMQ-Exporter is such an exporter: it first collects data from the RocketMQ cluster, then normalizes the collected data to meet the requirements of the Prometheus system with the help of the third-party client library provided by Prometheus, and Prometheus regularly pulls data from the exporter. This topic requires implementing an operator for the RocketMQ exporter to facilitate deploying the exporter on the Kubernetes platform.
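
A minimal sketch of the exporter pattern described above, using the Prometheus Java client (simpleclient and simpleclient_httpserver); the metric name, value and port are illustrative only, and the Kubernetes operator itself is the actual deliverable of this topic.

    import io.prometheus.client.Gauge;
    import io.prometheus.client.exporter.HTTPServer;

    public class MiniExporter {
        public static void main(String[] args) throws Exception {
            Gauge brokerTps = Gauge.build()
                    .name("rocketmq_broker_tps")
                    .help("Broker put TPS (demo value).")
                    .register();
            brokerTps.set(42.0); // A real exporter polls the RocketMQ cluster here.
            // Prometheus pulls from this HTTP endpoint at each scrape interval.
            new HTTPServer(5557);
        }
    }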

You should learn before applying for this topic

RocketMQ-Exporter Repo
RocketMQ-Exporter Overview
Kubernetes Operator
RocketMQ-Operator

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect IoTDB

Content

The IoTDB sink connector allows moving data from Apache RocketMQ to IoTDB. It writes data from a topic in Apache RocketMQ to IoTDB.

IoTDB (Internet of Things Database) is a data management system for time-series data, which provides users with specific services such as data collection, storage and analysis. Thanks to its lightweight structure, high performance and usable features, together with its seamless integration with the Hadoop and Spark ecosystems, IoTDB meets the requirements of massive dataset storage, high-throughput data ingestion and complex data analysis in the industrial IoT field.

In this project, there are update operations on historical data, so it is necessary to ensure the sequential transmission and consumption of data via RocketMQ. If no update operations are in use, there is no need to guarantee the order of data: IoTDB can process data that arrives out of order.
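
To make the ordering requirement concrete, here is a sketch of RocketMQ's ordered consumption using the standard Java client; group, topic and name-server address are placeholders, and the IoTDB write is stubbed out.

    import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
    import org.apache.rocketmq.client.consumer.listener.ConsumeOrderlyStatus;
    import org.apache.rocketmq.client.consumer.listener.MessageListenerOrderly;
    import org.apache.rocketmq.common.message.MessageExt;

    public class OrderedConsumerSketch {
        public static void main(String[] args) throws Exception {
            DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("iotdb-sink-group");
            consumer.setNamesrvAddr("localhost:9876");
            consumer.subscribe("iotdb-ingest", "*");
            // MessageListenerOrderly drains one message queue at a time,
            // preserving order within each queue.
            consumer.registerMessageListener((MessageListenerOrderly) (msgs, ctx) -> {
                for (MessageExt msg : msgs) {
                    System.out.println(new String(msg.getBody())); // write to IoTDB here
                }
                return ConsumeOrderlyStatus.SUCCESS;
            });
            consumer.start();
        }
    }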

So, in this project, you need to implement an IoTDB sink connector based on the OpenMessaging connect API, and run it on the RocketMQ connect runtime.

You should learn before applying for this topic

IoTDB / [Apache RocketMQ|https://rocketmq.apache.org/] / [Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect] / OpenMessaging Connect API

Mentor

duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect Elasticsearch

Content

The Elasticsearch sink connector allows moving data from Apache RocketMQ to Elasticsearch 6.x and 7.x. It writes data from a topic in Apache RocketMQ to an index in Elasticsearch, and all data for a topic have the same type.

Elasticsearch is often used for text queries, analytics and as a key-value store (use cases). The connector covers both the analytics and key-value store use cases.

For the analytics use case, each message in RocketMQ is treated as an event, and the connector uses topic + message queue + offset as a unique identifier for events, which are then converted into unique documents in Elasticsearch. For the key-value store use case, it supports using keys from RocketMQ messages as document IDs in Elasticsearch, and provides configurations ensuring that updates to a key are written to Elasticsearch in order.
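
As a sketch of the analytics-mode identifier described above, using the Elasticsearch high-level REST client's IndexRequest (the index name and id format are placeholders):

    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.common.xcontent.XContentType;

    public class EventDocIds {
        // topic + message queue + offset uniquely identifies an event, so a
        // replayed message overwrites the same document instead of duplicating it.
        public static IndexRequest toIndexRequest(String topic, int queueId,
                                                  long offset, String json) {
            String docId = topic + "+" + queueId + "+" + offset;
            return new IndexRequest("rocketmq-events").id(docId)
                    .source(json, XContentType.JSON);
        }
    }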

So, in this project, you need to implement a sink connector based on the OpenMessaging connect API that will be executed on the RocketMQ connect runtime.

You should learn before applying for this topic

Elasticsearch / [Apache RocketMQ|https://rocketmq.apache.org/] / [Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect] / OpenMessaging Connect API

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ CLI Admin Tool Developed by Golang

Apache RocketMQ provides a CLI admin tool, developed in Java, for querying, managing and diagnosing various problems. It also provides a set of API interfaces that can be called by Java applications to create, delete and query topics, query messages, and perform other functions. This topic requires implementing the CLI admin tool and a set of API interfaces in the Golang language, through which Go applications can create and query topics and perform other operations.
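
For reference, this is roughly what the existing Java admin API looks like; the Golang tool and API would mirror operations like this (addresses and topic name are placeholders).

    import org.apache.rocketmq.common.TopicConfig;
    import org.apache.rocketmq.tools.admin.DefaultMQAdminExt;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            DefaultMQAdminExt admin = new DefaultMQAdminExt();
            admin.setNamesrvAddr("localhost:9876");
            admin.start();
            try {
                TopicConfig topicConfig = new TopicConfig("demo-topic");
                topicConfig.setReadQueueNums(8);
                topicConfig.setWriteQueueNums(8);
                // Create or update the topic on a given broker.
                admin.createAndUpdateTopicConfig("127.0.0.1:10911", topicConfig);
            } finally {
                admin.shutdown();
            }
        }
    }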

You should learn before applying for this topic

Apache RocketMQ
Apache RocketMQ Go Client

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Schema Registry

Content

In order to help RocketMQ improve its event-management capabilities, better decouple producers and receivers, and keep events forward compatible, we need a service for event metadata management called a schema registry.

The schema registry will provide a GraphQL interface for developers to define standard schemas for their events, share them across the organization and safely evolve them in a way that is backward compatible and future-proof.

You should learn before applying for this topic

Apache RocketMQ/Apache RocketMQ SDK/

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

CloudEvents support for RocketMQ

Context

Events are everywhere. However, event producers tend to describe events differently.

The lack of a common way of describing events means developers must constantly re-learn how to consume events. It also limits the potential for libraries, tooling and infrastructure to aid the delivery of event data across environments, like SDKs, event routers or tracing systems. The portability and productivity we can achieve from event data is hindered overall.

CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems.
RocketMQ, as an event streaming platform, also hopes to improve the interoperability of different event platforms by being compatible with the CloudEvents standard and supporting the CloudEvents SDK. In this topic, you need to improve the binding spec and implement the RocketMQ CloudEvents SDK (Java, Golang or others).
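
To get a feel for the task, here is a hedged sketch that builds a CloudEvent with the CloudEvents Java SDK and maps it onto a RocketMQ message. The ce_* property names mimic a binary-content-mode binding and are an assumption, since defining the actual binding spec is part of this topic.

    import java.net.URI;
    import java.nio.charset.StandardCharsets;
    import io.cloudevents.CloudEvent;
    import io.cloudevents.core.builder.CloudEventBuilder;
    import org.apache.rocketmq.common.message.Message;

    public class CloudEventsSketch {
        static Message toMessage(CloudEvent event) {
            Message msg = new Message("demo-topic",
                    event.getData() == null ? new byte[0] : event.getData().toBytes());
            // Binary content mode: attributes travel as message user properties.
            msg.putUserProperty("ce_specversion", event.getSpecVersion().toString());
            msg.putUserProperty("ce_id", event.getId());
            msg.putUserProperty("ce_source", event.getSource().toString());
            msg.putUserProperty("ce_type", event.getType());
            return msg;
        }

        public static void main(String[] args) {
            CloudEvent event = CloudEventBuilder.v1()
                    .withId("abc-123")
                    .withSource(URI.create("/demo/producer"))
                    .withType("com.example.demo.created")
                    .withData("text/plain", "hello".getBytes(StandardCharsets.UTF_8))
                    .build();
            System.out.println(toMessage(event));
        }
    }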

You should learn before applying for this topic

Apache RocketMQ/Apache RocketMQ SDK/CloudEvents

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Channel for Knative

Context

Knative is a Kubernetes-based platform for building, deploying and managing modern serverless applications. Knative provides a set of middleware components that are essential to building modern, source-centric, container-based applications that can run anywhere: on-premises, in the cloud, or even in a third-party data centre. Knative consists of the Serving and Eventing components. Eventing is a system designed to address a common need for cloud-native development; it provides composable primitives to enable late-binding event sources and event consumers. Eventing also defines an event forwarding and persistence layer, called a Channel, where each channel is a separate Kubernetes Custom Resource. This topic requires you to implement a RocketMQChannel based on Apache RocketMQ.

You should learn before applying for this topic

How Knative works
RocketMQSource for Knative
Apache RocketMQ Operator

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Ingestion for Druid

Context

Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. In this topic, you should develop a RocketMQ indexing service that enables the configuration of supervisors on the Overlord, which facilitate ingestion from RocketMQ by managing the creation and lifetime of RocketMQ indexing tasks. These indexing tasks read events using RocketMQ's own partition and offset mechanism. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.

You should learn before applying for this topic

Apache Druid Data Ingestion

Mentor

vongosling@apache.org, duhengforever@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Connect Hudi

Context

Hudi can ingest and manage the storage of large analytical datasets over DFS (HDFS or cloud stores). It can act as either a source or sink for a stream processing platform such as Apache RocketMQ, and it can also be used as a state store inside a processing DAG (similar to how RocksDB is used by Flink). This is an item on the roadmap of Apache RocketMQ. In this topic, you should implement a full Hudi source and sink based on the RocketMQ connect framework, which is the most important implementation of OpenConnect.

You should learn before applying for this topic

Apache RocketMQ Connect Framework
Apache Hudi

Mentor

vongosling@apache.org, duhengforever@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Connect Flink

Context

There are many ways in which Apache Flink and Apache RocketMQ can integrate to provide elastic data processing at large scale. RocketMQ can be used as a streaming source and streaming sink in Flink DataStream applications, which is the main implementation and most popular usage in the RocketMQ community. Developers can ingest data from RocketMQ into a Flink job that makes computations and processes real-time data, and then send the data back to a RocketMQ topic as a streaming sink. More details: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink.
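
A sketch of the DataStream usage mentioned above, based on the rocketmq-flink module in rocketmq-externals; the class names (RocketMQSource, RocketMQConfig, SimpleKeyValueDeserializationSchema) and the config keys are taken from that repository and may have changed, so verify against the current code.

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.rocketmq.flink.RocketMQConfig;
    import org.apache.rocketmq.flink.RocketMQSource;
    import org.apache.rocketmq.flink.common.serialization.SimpleKeyValueDeserializationSchema;

    public class RocketMQFlinkJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty(RocketMQConfig.NAME_SERVER_ADDR, "localhost:9876");
            props.setProperty(RocketMQConfig.CONSUMER_GROUP, "flink-demo");
            props.setProperty(RocketMQConfig.CONSUMER_TOPIC, "demo-topic");

            // Consume from RocketMQ, transform, and print (a real job would
            // write back to a RocketMQ topic via a corresponding sink).
            env.addSource(new RocketMQSource<>(
                        new SimpleKeyValueDeserializationSchema("id", "value"), props))
               .map(Object::toString)
               .print();

            env.execute("rocketmq-flink-demo");
        }
    }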

With more and more DW or OLAP engineers using RocketMQ for their data-processing work, another potential integration need arose: developers can take advantage of RocketMQ as both a streaming source and a streaming table sink for Flink SQL or Table API queries. Also, Flink 1.9.0 makes the Table API a first-class citizen, so it is time to support SQL in RocketMQ. This is the topic of Apache RocketMQ connect Flink.

You should learn before applying for this topic

Apache RocketMQ Flink Connector
Apache Flink Table API

Extension

For students with expertise in the streaming field, you could go on to implement and provide an exactly-once streaming source and an at-least-once (or exactly-once) streaming sink, as described in issue #500.

Mentor

nicholasjiang@apache.org, duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Scaler for KEDA

Context

KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. KEDA has a number of “scalers” that can both detect if a deployment should be activated or deactivated, and feed custom metrics for a specific event source. In this topic, you need to implement the RocketMQ scalers.

You should learn before applying for this topic

Helm/Apache RocketMQ Operator/Apache RocketMQ Docker Image
Apache RocketMQ multi-replica mechanism(based on DLedger)
How KEDA works

Mentor

wlliqipeng@apache.org, vongosling@apache.org


Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Camel

camel-minio - Component to store/load files from blob store

min.io is an S3-like blob store, so users have more freedom than being locked into AWS.

We can create a camel-minio component for it, based on the MinIO Java client:
https://github.com/minio/minio-java
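
A sketch of what a route using the proposed component could look like; the minio: URI scheme and its options are hypothetical, since designing them is the point of this task.

    import org.apache.camel.builder.RouteBuilder;

    public class MinioRoute extends RouteBuilder {
        @Override
        public void configure() {
            // Hypothetical endpoint URI; the component does not exist yet.
            from("file:data/outbox")
                .to("minio:my-bucket?endpoint=http://localhost:9000"
                    + "&accessKey=minio&secretKey=minio123");
        }
    }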

Difficulty: Major
Potential mentors:
Claus Ibsen, mail: davsclaus (at) apache.org
Project Devs, mail: dev (at) camel.apache.org

Camel grpc component doesn't transfer the Message headers

Headers that are added to the Message in the Camel Exchange before making a call to the camel-grpc component are not received at the gRPC consumer. The expectation is that these headers would be added to the gRPC stub before sending over the wire (as happens in other components such as camel-http4).

Our team has come up with a workaround for this, but it is extremely cumbersome. We had to extend the GrpcProducer to introduce a custom GrpcExchangeForwarder that copies headers from the exchange to the stub before invoking the sync/async method.

On the consumer side, we had to extend the GrpcConsumer to use a custom ServerInterceptor to capture the gRPC headers and a custom MethodHandler to transfer the gRPC headers to the Camel exchange headers.
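
This is not the team's actual code; it is a generic gRPC sketch of the producer-side idea: a ClientInterceptor that copies selected exchange headers into the gRPC Metadata before the call goes over the wire.

    import java.util.Map;
    import io.grpc.CallOptions;
    import io.grpc.Channel;
    import io.grpc.ClientCall;
    import io.grpc.ClientInterceptor;
    import io.grpc.ForwardingClientCall.SimpleForwardingClientCall;
    import io.grpc.Metadata;
    import io.grpc.MethodDescriptor;

    public class HeaderCopyInterceptor implements ClientInterceptor {
        private final Map<String, String> exchangeHeaders;

        public HeaderCopyInterceptor(Map<String, String> exchangeHeaders) {
            this.exchangeHeaders = exchangeHeaders;
        }

        @Override
        public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
                MethodDescriptor<ReqT, RespT> method, CallOptions options, Channel next) {
            return new SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, options)) {
                @Override
                public void start(Listener<RespT> listener, Metadata headers) {
                    // Copy Camel exchange headers into gRPC metadata (ASCII values only).
                    exchangeHeaders.forEach((k, v) -> headers.put(
                            Metadata.Key.of(k, Metadata.ASCII_STRING_MARSHALLER), v));
                    super.start(listener, headers);
                }
            };
        }
    }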

Difficulty: Major
Potential mentors:
Vishal Vijayan, mail: vijayanv (at) apache.org
Project Devs, mail: dev (at) camel.apache.org

...

Create a camel component for etcd v3

Difficulty: Minor
Potential mentors:
Luca Burgazzoli, mail: lb (at) apache.org
Project Devs, mail: dev (at) camel.apache.org

Beam

BeamSQL aggregation analytics functionality

BeamSQL has a long list of aggregation/aggregation-analytics functionalities to support.

To begin with, you will need to support this syntax:

    analytic_function_name ( [ argument_list ] )
    OVER (
      [ PARTITION BY partition_expression_list ]
      [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
      [ window_frame_clause ]
    )

This will require touching core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement the physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed manner.
4. Benchmarks to measure the performance of your implementation.

To understand what SQL analytics functionality is, you could check this great explanation doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To know about Beam's programming model, check: https://beam.apache.org/documentation/programming-guide/#overview
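
For a concrete target, here is a small sketch of invoking one analytic function through Beam's Java SDK once the feature exists; SqlTransform is existing Beam API, while support for the OVER clause in the query string is exactly what this project would add (schema and field names are made up).

    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    public class AnalyticQuerySketch {
        // Running total per user, ordered by event time.
        public static PCollection<Row> runningTotal(PCollection<Row> purchases) {
            return purchases.apply(SqlTransform.query(
                    "SELECT user_id, "
                    + "SUM(amount) OVER (PARTITION BY user_id ORDER BY event_time) "
                    + "AS running_total "
                    + "FROM PCOLLECTION"));
        }
    }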

Difficulty: Major
Potential mentors:
Rui Wang, mail: amaliujia (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

...