APISIX

Check the API version for every request

To make sure the dashboard is using the correct API version, we should include the APISIX version in every API response.

Please:

  1. Add the dashboard API version variable to the config file.
  2. Check every API response in the request.ts file, and show an alert when the dashboard version is not compatible with the APISIX version.

Difficulty: Major

mentors: juzhiyuan@apache.org
Potential mentors:
Ming Wen, mail: wenming (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

...

add commit message checker

To ensure the quality of every commit message, please add a commit message checker similar to https://github.com/vuejs/vue-next/blob/master/scripts/verifyCommit.js.

Difficulty: Minor

mentors: juzhiyuan@apache.org
Potential mentors:
Ming Wen, mail: wenming (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

Add X-API-KEY for api request

Our dashboard API uses an API key to authenticate requests; please add the API key header in the global request handler, using [1] as a reference. Please note, this key should be supplied via a config file, such as an .env file. I recommend fetching this key via a fetchAPIKey API.

[1] b3b3065#diff-084c3d9c2786b7cd963be84e40a38725R32

Difficulty: Minor

mentors: juzhiyuan@apache.org
Potential mentors:
Ming Wen, mail: wenming (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

implement Apache APISIX echo plugin

APISIX currently provides a simple example plugin, but it does not provide useful functionality.

So we can provide a useful plugin to help users understand as fully as possible how to develop an APISIX plugin.

This plugin could implement the corresponding functionality in the common phases such as init, rewrite, access, balancer, header filter, body filter and log. The specific functionality is still being considered.

Difficulty: Major

mentors: agile6v@apache.org, wenming@apache.org, yousa@apache.org
Potential mentors:
Ming Wen, mail: wenming (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org


feature: Support follow redirect

When a client request passes through APISIX to an upstream and the upstream returns 301 or 302, APISIX by default returns the response directly to the client. The client receives the 301 or 302 response and then issues the request again to the address specified by the Location header. Sometimes the client wants APISIX to follow the redirect on its behalf, so APISIX can provide this capability to support more scenarios.

Difficulty: Major

mentors: agile6v@apache.org, wenming@apache.org, yousa@apache.org
Potential mentors:
Ming Wen, mail: wenming (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org



...

Apache IoTDB integration with MiNiFI/NiFi

IoTDB is a database for storing time series data.

MiNiFi is a data flow engine to transfer data from A to B, e.g., from PLC4X to IoTDB.

This proposal is for integrating IoTDB with MiNiFi/NiFi:

  • let MiNiFi/NiFi support writing data into IoTDB.


Difficulty: Major

Potential mentors:
Xiangdong Huang, mail: hxd (at) apache.org
Project Devs, mail: dev (at) iotdb.apache.org

Apache IoTDB

Database Connection Pool and integration with some web framework

IoTDB is a time series database.

When using a database in an application, a database connection pool is very helpful for high performance and for saving resources.

Besides, when developing a website using Spring or some other web framework, many developers do not manage the database connection manually. Instead, developers just need to tell the framework which database they will use, and the web framework can handle everything well.

This proposal is for

  • letting IoTDB support some database connection pools like Apache Commons DBCP and C3P0;
  • integrating IoTDB with one web framework (e.g., Spring).
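
For the first bullet, a minimal sketch of wiring Apache Commons DBCP2 to IoTDB's JDBC driver could look like the snippet below; the driver class name, URL format and example statement are assumptions based on the iotdb-jdbc module, not a definitive setup.

    import java.sql.Connection;
    import java.sql.Statement;

    import org.apache.commons.dbcp2.BasicDataSource;

    public class IoTDBPoolExample {
      public static void main(String[] args) throws Exception {
        BasicDataSource pool = new BasicDataSource();
        pool.setDriverClassName("org.apache.iotdb.jdbc.IoTDBDriver"); // assumed driver class
        pool.setUrl("jdbc:iotdb://127.0.0.1:6667/");                  // assumed URL format
        pool.setUsername("root");
        pool.setPassword("root");
        pool.setInitialSize(4);
        pool.setMaxTotal(16);

        // Connections are borrowed from and returned to the pool instead of being re-created.
        try (Connection conn = pool.getConnection();
             Statement stmt = conn.createStatement()) {
          stmt.execute("SHOW TIMESERIES"); // assumed IoTDB SQL; any statement works the same way
        }
        pool.close();
      }
    }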

You should know:

  • IoTDB
  • At least one DB connection pool
  • Know Spring or some other web framework

mentors: hxd@apache.org

Difficulty: Major
Potential mentors:
Xiangdong Huang, mail: hxd (at) apache.org
Project Devs, mail: dev (at) iotdb.apache.org


Apache IoTDB trigger module for streaming computing

IoTDB is a time-series data management system and the data usually comes in a streaming way.

In the IoT area, when a data point comes, a trigger can be called because of the following scenario:

  • (single data point calculation) the data point is an outlier, or the data value reaches a warning threshold. IoTDB needs to publish the data point to those who subscribed to the event.
  • (multiple time series data point calculation) a device sends several metrics to IoTDB, e.g., vehicle d1 sends average speed and running time to IoTDB. Then users may want to get the mileage of the vehicle (speed x time). IoTDB needs to calculate the result and save it to another time series.
  • (Time window calculation) a device reports its temperature every second. Though the temperature is not too high, if it keeps increasing for 5 seconds, IoTDB needs to report the event to those who subscribe to it.

As there are many streaming computing projects already, we can integrate one of them into IoTDB.

  • If IoTDB runs on Edge, we can integrate Apache StreamPipes or Apache Edgent.
  • If IoTDB runs on a server, the above also work, and Apache Flink is also a good choice.

The process is:

  • The user registers a trigger into IoTDB.
  • When a data point comes, IoTDB saves it and checks whether there are triggers on it.
  • If so, it calls a streaming computing framework to do something.


You may need to know:

  • At least one streaming computing project.
  • SQL parser or some other DSL parser tool.

You have to modify the source code of the IoTDB server engine module.
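
Purely as an illustration of the register/save/fire flow above, a hypothetical trigger SPI is sketched below. None of these types exist in IoTDB today; they only mirror the three steps and are not a proposed design.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical: what a trigger hook exposed by the server engine might look like.
    interface Trigger {
      // Called by the write path after a data point of the hooked time series is persisted.
      void onDataPoint(String timeseriesPath, long timestamp, double value);
    }

    // Hypothetical registry the SQL layer would fill when a user registers a trigger.
    class TriggerRegistry {
      private final Map<String, Trigger> triggersByPath = new ConcurrentHashMap<>();

      // Step 1: the user registers a trigger on a time series (e.g., via a new SQL statement).
      void register(String timeseriesPath, Trigger trigger) {
        triggersByPath.put(timeseriesPath, trigger);
      }

      // Steps 2 and 3: the storage engine calls this after saving a point; the trigger can hand
      // the point to a streaming framework (StreamPipes, Edgent, Flink, ...) or publish an alert.
      void fire(String timeseriesPath, long timestamp, double value) {
        Trigger trigger = triggersByPath.get(timeseriesPath);
        if (trigger != null) {
          trigger.onDataPoint(timeseriesPath, timestamp, value);
        }
      }
    }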

Difficulty: Major (a little hard)

Potential mentors:
Xiangdong Huang, mail: hxd (at) apache.org
Project Devs, mail: dev (at) iotdb.apache.org

...

Apache IoTDB integration with Prometheus

IoTDB is a highly efficient time series database.

Prometheus is a monitoring and alerting toolkit, which supports collecting data from other systems, servers, and IoT devices, saving data into a DB, visualizing data and provides some query APIs.


Prometheus allows users to use their own database, rather than just the built-in Prometheus DB, for storing time series data.

This proposal is for integrating IoTDB with Prometheus.


You should know:

  • How to use Prometheus
  • How to use IoTDB
  • Java and Go language

Difficulty: Major

mentors: hxd@apache.org

Potential mentors:
Xiangdong Huang, mail: hxd (at) apache.org
Project Devs, mail: dev (at) iotdb.apache.org

...


Apache Nemo

Dynamic Task Sizing on Nemo

This is an umbrella issue to keep track of the issues related to the dynamic task sizing feature on Nemo.

Dynamic task sizing needs to consider a workload and try to decide on the optimal task size based on the runtime metrics and characteristics. It should have an effect on the parallelism and the partitioning, i.e., how many partitions an intermediate dataset should be divided/shuffled into, while effectively handling skew.

Difficulty: Major
Potential mentors:
Won Wook Song, mail: wonook (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Beam

Add Daffodil IO for Apache Beam

From https://daffodil.apache.org/:

Daffodil is an open source implementation of the DFDL specification that uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON. This allows the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of the reverse by serializing or “unparsing” an XML or JSON infoset back to the original data format.

We should create a Beam IO that accepts a DFDL schema as an argument and can then produce and consume data in the specified format. I think it would be most natural for Beam users if this IO could produce Beam Rows, but an initial version that just operates with Infosets could be useful as well.
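
As a non-authoritative sketch of the parsing half, the Beam DoFn below compiles a DFDL schema once and emits each record's XML infoset as a String; the Daffodil japi entry points (Daffodil.compiler(), compileSource, onPath, XMLTextInfosetOutputter) are assumed from the Daffodil documentation, and a real IO would also cover unparsing and Beam Rows.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.net.URI;

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.daffodil.japi.Daffodil;
    import org.apache.daffodil.japi.DataProcessor;
    import org.apache.daffodil.japi.ParseResult;
    import org.apache.daffodil.japi.infoset.XMLTextInfosetOutputter;
    import org.apache.daffodil.japi.io.InputSourceDataInputStream;

    public class DaffodilParseFn extends DoFn<byte[], String> {
      private final String dfdlSchemaUri;            // location of the DFDL schema
      private transient DataProcessor dataProcessor; // rebuilt on each worker in @Setup

      public DaffodilParseFn(String dfdlSchemaUri) {
        this.dfdlSchemaUri = dfdlSchemaUri;
      }

      @Setup
      public void setup() throws Exception {
        // Compile the DFDL schema once per DoFn instance (japi usage assumed).
        dataProcessor = Daffodil.compiler()
            .compileSource(URI.create(dfdlSchemaUri))
            .onPath("/");
      }

      @ProcessElement
      public void processElement(@Element byte[] record, OutputReceiver<String> out) throws Exception {
        ByteArrayOutputStream xml = new ByteArrayOutputStream();
        ParseResult result = dataProcessor.parse(
            new InputSourceDataInputStream(new ByteArrayInputStream(record)),
            new XMLTextInfosetOutputter(xml, false));
        if (!result.isError()) {
          out.output(xml.toString("UTF-8")); // XML infoset for downstream transforms
        }
      }
    }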

Difficulty: Major
Potential mentors:
Brian Hulette, mail: bhulette (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

Implement an Azure blobstore filesystem for Python SDK

This is similar to BEAM-2572, but for Azure's blobstore.

Difficulty: Major
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org


BeamSQL aggregation analytics functionality

Mentor email: ruwang@google.com. Feel free to send emails for your questions.

Project Information
---------------------
BeamSQL has a long list of aggregation/aggregation analytics functionalities to support.

To begin with, you will need to support this syntax:

            analytic_function_name ( [ argument_list ] )
            OVER (
            [ PARTITION BY partition_expression_list ]
            [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
            [ window_frame_clause ]
            )
            

As there is a long list of analytics functions, a good starting point is to support rank() first.

This will require touching core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed manner.
4. Build benchmarks to measure performance of your implementation.

To understand what SQL analytics functionality is, you could check this great explanation doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To know about Beam's programming model, check: https://beam.apache.org/documentation/programming-guide/#overview
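
To make the target concrete, the sketch below shows how a user could invoke RANK() through Beam's SqlTransform once this project adds support for it; the tiny in-memory schema and rows are made up for illustration, and the query would fail today precisely because the analytics syntax is not implemented yet.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    public class RankOverExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        Schema schema = Schema.builder().addStringField("region").addDoubleField("amount").build();
        PCollection<Row> orders = p.apply(Create.of(
                Row.withSchema(schema).addValues("eu", 10.0).build(),
                Row.withSchema(schema).addValues("eu", 7.5).build(),
                Row.withSchema(schema).addValues("us", 3.0).build())
            .withRowSchema(schema));

        // PCOLLECTION is how SqlTransform refers to its single input PCollection.
        PCollection<Row> ranked = orders.apply(
            SqlTransform.query(
                "SELECT region, amount, "
                    + "RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk "
                    + "FROM PCOLLECTION"));

        p.run().waitUntilFinish();
      }
    }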

Difficulty: Major
Potential mentors:
Rui Wang, mail: amaliujia (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

...

RocketMQ Connect Hbase

Content

The HBase sink connector allows moving data from Apache RocketMQ to HBase. It writes data from a topic in RocketMQ to a table in the specified HBase instance. Auto-creation of tables and auto-creation of column families are also supported.

So, in this project, you need to implement an HBase sink connector based on the OpenMessaging connect API, which will be executed on the RocketMQ connect runtime.
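
As a rough illustration only, the snippet below sketches the HBase write path that such a sink task's put(...) could wrap; the OpenMessaging SinkTask/configuration plumbing is left out because its exact interface depends on the connect API version, and the column family name is an arbitrary placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSinkWriter implements AutoCloseable {
      private final Connection connection;
      private final TableName table;

      public HBaseSinkWriter(String zookeeperQuorum, String tableName) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", zookeeperQuorum);
        this.connection = ConnectionFactory.createConnection(conf);
        this.table = TableName.valueOf(tableName);
      }

      // Writes one RocketMQ message (key + body) as a single row in the target table.
      public void write(byte[] rowKey, byte[] body) throws Exception {
        try (Table t = connection.getTable(table)) {
          Put put = new Put(rowKey);
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("body"), body);
          t.put(put);
        }
      }

      @Override
      public void close() throws Exception {
        connection.close();
      }
    }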

You should learn before applying for this topic
HBase / Apache RocketMQ / Apache RocketMQ Connect / OpenMessaging Connect API

Mentor

chenguangsheng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect Cassandra


Content

The Cassandra sink connector allows writing data to Apache Cassandra. In this project, you need to implement a Cassandra sink connector based on the OpenMessaging connect API, and run it on the RocketMQ connect runtime.

You should learn before applying for this topic

Cassandra / Apache RocketMQ (https://rocketmq.apache.org/) / Apache RocketMQ Connect (https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect) / OpenMessaging Connect API

Mentor

chenguangsheng@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect InfluxDB

Content

The InfluxDB sink connector allows moving data from Apache RocketMQ to InfluxDB. It writes data from a topic in Apache RocketMQ to InfluxDB, while the InfluxDB source connector is used to export data from an InfluxDB server to RocketMQ.

In this project, you need to implement an InfluxDB sink connector (the source connector is optional) based on the OpenMessaging connect API, and run it on the RocketMQ connect runtime.

You should learn before applying for this topic

InfluxDB / Apache RocketMQ (https://rocketmq.apache.org/) / Apache RocketMQ Connect (https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect) / OpenMessaging Connect API

Mentor

duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

The Operator for RocketMQ Exporter

The exporter exposes the endpoint of monitoring data collection to the Prometheus server in the form of an HTTP service. The Prometheus server can obtain the monitoring data to be collected by accessing the endpoint provided by the exporter. RocketMQ Exporter is such an exporter: it first collects data from the RocketMQ cluster, and then normalizes the collected data to meet the requirements of the Prometheus system with the help of the third-party client library provided by Prometheus. Prometheus regularly pulls data from the exporter. This topic needs to implement an operator for the RocketMQ exporter to facilitate the deployment of the exporter on the Kubernetes platform.

You should learn before applying for this topic

RocketMQ-Exporter Repo
RocketMQ-Exporter Overview
Kubernetes Operator
RocketMQ-Operator

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

RocketMQ Connect IoTDB

Content

The IoTDB sink connector allows moving data from Apache RocketMQ to IoTDB. It writes data from a topic in Apache RocketMQ to IoTDB.

IoTDB (Internet of Things Database) is a data management system for time series data, which can provide users specific services such as data collection, storage and analysis. Due to its lightweight structure, high performance and usable features, together with its seamless integration with the Hadoop and Spark ecology, IoTDB meets the requirements of massive dataset storage, high-throughput data input and complex data analysis in the industrial IoT field.

In this project, there are some update operations for historical data, so it is necessary to ensure the sequential transmission and consumption of data via RocketMQ. If there is no update operation in use, then there is no need to guarantee the order of data; IoTDB will process the data even if it arrives out of order.

So, in this project, you need to implement an IoTDB sink connector based on the OpenMessaging connect API, and run it on the RocketMQ connect runtime.

You should learn before applying for this topic

IoTDB / Apache RocketMQ (https://rocketmq.apache.org/) / Apache RocketMQ Connect (https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect) / OpenMessaging Connect API

Mentor

hxd@apache.org, duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Schema Registry

Content

In order to help RocketMQ improve its event management capabilities, and at the same time better decouple producers and receivers and keep events forward compatible, we need a service for event metadata management called a schema registry.

The schema registry will provide a GraphQL interface for developers to define standard schemas for their events, share them across the organization and safely evolve them in a way that is backward compatible and future proof.

You should learn before applying for this topic

Apache RocketMQ / Apache RocketMQ SDK

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Channel for Knative

Context

Knative is a Kubernetes-based platform for building, deploying and managing modern serverless applications. Knative provides a set of middleware components that are essential to building modern, source-centric, and container-based applications that can run anywhere: on-premises, in the cloud, or even in a third-party data centre. Knative consists of the Serving and Eventing components. Eventing is a system that is designed to address a common need for cloud-native development and provides composable primitives to enable late-binding event sources and event consumers. Eventing also defines an event forwarding and persistence layer, called a Channel. Each channel is a separate Kubernetes Custom Resource. This topic requires you to implement a RocketMQ channel based on Apache RocketMQ.

You should learn before applying for this topic

How Knative works
RocketMQSource for Knative
Apache RocketMQ Operator

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Ingestion for Druid

Context

Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. In this topic, you should develop the RocketMQ indexing service, which enables the configuration of supervisors on the Overlord and facilitates ingestion from RocketMQ by managing the creation and lifetime of RocketMQ indexing tasks. These indexing tasks read events using RocketMQ's own partition and offset mechanism. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.

You should learn before applying for this topic

Apache Druid Data Ingestion

Mentor

vongosling@apache.org, duhengforever@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Connect Hudi

Context

Hudi can ingest and manage the storage of large analytical datasets over DFS (HDFS or cloud stores). It can act as either a source or sink for a stream processing platform such as Apache RocketMQ. It can also be used as a state store inside a processing DAG (similar to how RocksDB is used by Flink). This is an item on the roadmap of Apache RocketMQ. In this topic, you should implement a full Hudi source and sink based on the RocketMQ connect framework, which is the most important implementation of OpenConnect.

You should learn before applying for this topic

Apache RocketMQ Connect Framework
Apache Hudi

Mentor

vongosling@apache.org, duhengforever@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ Scaler for KEDA

Context

KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. KEDA has a number of “scalers” that can both detect if a deployment should be activated or deactivated, and feed custom metrics for a specific event source. In this topic, you need to implement the RocketMQ scalers.

You should learn before applying for this topic

Helm / Apache RocketMQ Operator / Apache RocketMQ Docker Image
Apache RocketMQ multi-replica mechanism (based on DLedger)
How KEDA works

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

Apache RocketMQ CLI Admin Tool Developed by Golang

Context

Apache RocketMQ provides a CLI admin tool developed in Java for querying, managing and diagnosing various problems. At the same time, it also provides a set of API interfaces that can be called by Java applications to create, delete and query topics, query messages, and so on. This topic requires implementing the CLI admin tool and a set of API interfaces in the Go language, through which Go applications can create, query and otherwise operate on topics.

You should learn before applying for this topic

Apache RocketMQ
Apache RocketMQ Go Client

Mentor

wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org


RocketMQ Connect Elasticsearch

Content

The Elasticsearch sink connector allows moving data from Apache RocketMQ to Elasticsearch 6.x and 7.x. It writes data from a topic in Apache RocketMQ to an index in Elasticsearch, and all data for a topic have the same type.

Elasticsearch is often used for text queries, analytics and as a key-value store (use cases). The connector covers both the analytics and key-value store use cases.

For the analytics use case, each message in RocketMQ is treated as an event, and the connector uses topic + message queue + offset as a unique identifier for events, which are then converted to unique documents in Elasticsearch. For the key-value store use case, it supports using keys from RocketMQ messages as document ids in Elasticsearch and provides configurations ensuring that updates to a key are written to Elasticsearch in order.

So, in this project, you need to implement a sink connector based on the OpenMessaging connect API, and it will be executed on the RocketMQ connect runtime.
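
As an illustration of the analytics case described above (one message, one document, with the id derived from topic + message queue + offset so retries stay idempotent), a possible write path with the Elasticsearch high-level REST client is sketched below; the 7.x client API is assumed and the SinkTask plumbing is omitted.

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType;

    public class ElasticsearchSinkWriter implements AutoCloseable {
      private final RestHighLevelClient client =
          new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

      // Indexes one message; the caller passes the RocketMQ coordinates and the JSON payload.
      public void write(String index, String topic, int queueId, long offset, String json)
          throws Exception {
        String docId = topic + "-" + queueId + "-" + offset; // unique per message
        IndexRequest request = new IndexRequest(index).id(docId).source(json, XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT);
      }

      @Override
      public void close() throws Exception {
        client.close();
      }
    }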

You should learn before applying for this topic

Elasticsearch / Apache RocketMQ (https://rocketmq.apache.org/) / Apache RocketMQ Connect (https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect) / OpenMessaging Connect API

Mentor

duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

CloudEvents support for RocketMQ

Context

Events are everywhere. However, event producers tend to describe events differently.

The lack of a common way of describing events means developers must constantly re-learn how to consume events. This also limits the potential for libraries, tooling and infrastructure to aid the delivery of event data across environments, like SDKs, event routers or tracing systems. The portability and productivity we can achieve from event data is hindered overall.

CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems.
RocketMQ, as an event streaming platform, also hopes to improve the interoperability of different event platforms by being compatible with the CloudEvents standard and supporting the CloudEvents SDK. In this topic, you need to improve the binding spec and implement the RocketMQ CloudEvents SDK (Java, Golang or others).
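
As a non-authoritative sketch, the snippet below builds a CloudEvent with the CloudEvents Java SDK (v2 builder API assumed) and carries it in a RocketMQ message, mapping attributes to user properties and the data to the body; the concrete attribute/property names are exactly what the binding spec in this topic would have to define.

    import java.net.URI;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    import io.cloudevents.CloudEvent;
    import io.cloudevents.core.builder.CloudEventBuilder;
    import org.apache.rocketmq.common.message.Message;

    public class CloudEventsMapping {
      public static Message toRocketMQ(String topic) {
        CloudEvent event = CloudEventBuilder.v1()
            .withId(UUID.randomUUID().toString())
            .withType("com.example.order.created")
            .withSource(URI.create("/orders"))
            .withDataContentType("application/json")
            .withData("{\"orderId\":42}".getBytes(StandardCharsets.UTF_8))
            .build();

        // Binary-style mapping: data becomes the message body, attributes become user properties.
        Message message = new Message(topic, event.getData().toBytes());
        message.putUserProperty("ce_specversion", event.getSpecVersion().toString());
        message.putUserProperty("ce_id", event.getId());
        message.putUserProperty("ce_type", event.getType());
        message.putUserProperty("ce_source", event.getSource().toString());
        return message;
      }
    }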

You should learn before applying for this topic

Apache RocketMQ / Apache RocketMQ SDK / CloudEvents

Mentor

duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org
Apache RocketMQ Connect Flink

Context

There are many ways that Apache Flink and Apache RocketMQ can integrate to provide elastic data processing at a large scale. RocketMQ can be used as a streaming source and streaming sink in Flink DataStream applications, which is the main implementation and popular usage in the RocketMQ community. Developers can ingest data from RocketMQ into a Flink job that makes computations and processes real-time data, and then send the data back to a RocketMQ topic as a streaming sink. More details can be found at https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink.

With more and more DW or OLAP engineers using RocketMQ for their data processing work, another potential integration need arose. Developers can take advantage of RocketMQ as both a streaming source and a streaming table sink for Flink SQL or Table API queries. Also, Flink 1.9.0 makes the Table API a first-class citizen. It's time to support SQL in RocketMQ. This is the topic for Apache RocketMQ connect Flink.
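
To make the user-facing goal concrete, the sketch below registers RocketMQ as a table for Flink SQL through the Table API; the 'rocketmq' connector identifier and its options do not exist yet and are placeholders for exactly what this topic would define.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class RocketMQTableExample {
      public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical DDL that the finished table source/sink would support.
        tableEnv.executeSql(
            "CREATE TABLE orders ("
                + "  order_id STRING,"
                + "  amount   DOUBLE"
                + ") WITH ("
                + "  'connector' = 'rocketmq',"               // placeholder connector identifier
                + "  'topic' = 'orders',"
                + "  'nameserver.address' = 'localhost:9876'" // placeholder option name
                + ")");

        tableEnv.executeSql("SELECT order_id, SUM(amount) FROM orders GROUP BY order_id").print();
      }
    }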

You should learn before applying for this topic

Apache RocketMQ Flink Connector
Apache Flink Table API

Extension

For students with expertise in the streaming field, you could continue to implement and provide an exactly-once streaming source and an at-least-once (or exactly-once) streaming sink, like the issue #500 said.

Mentor

nicholasjiang@apache.org, duhengforever@apache.org, vongosling@apache.org

Difficulty: Major
Potential mentors:
duheng, mail: duheng (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

OpenWebBeans

Implement lightweight CDI-centric HTTP server

Apache OpenWebBeans is an IoC container implementing the CDI specification.
With the rise of Kubernetes and, more generally, cloud adoption, it becomes more and more important to have fast, light and reliable servers.
That ecosystem is mainly composed of MicroProfile servers.
However, their stack is quite heavy for most applications, and the OpenWebBeans MicroProfile servers are not CDI-centric (Meecrowave and TomEE are Tomcat-centric).
This is why the need arises for a light HTTP server (likely Netty-based) that is embeddable in a CDI context (as a bean).
It will be close to a light embedded servlet container but likely more reactive in the way the server will need to scale.
It must handle fixed-size payloads (with a Content-Length header) but also chunking.
File upload is an optional bonus.
This task will require:

1. implementing an HTTP server with Netty (or alike); a minimal sketch follows this list,
2. defining a light HTTP API (at least supporting filter-like interception, possibly interceptor based, but in a reactive fashion - CompletionStage),
3. making it configurable (MicroProfile Config or so) and embeddable.
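
As a minimal starting point for item 1 only, the sketch below boots an embeddable Netty HTTP server that answers aggregated requests; the filter-style API, chunked streaming and MicroProfile Config wiring from items 2 and 3 are deliberately left out.

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelFutureListener;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.SimpleChannelInboundHandler;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;
    import io.netty.handler.codec.http.DefaultFullHttpResponse;
    import io.netty.handler.codec.http.FullHttpRequest;
    import io.netty.handler.codec.http.FullHttpResponse;
    import io.netty.handler.codec.http.HttpHeaderNames;
    import io.netty.handler.codec.http.HttpObjectAggregator;
    import io.netty.handler.codec.http.HttpResponseStatus;
    import io.netty.handler.codec.http.HttpServerCodec;
    import io.netty.handler.codec.http.HttpVersion;
    import io.netty.util.CharsetUtil;

    public class LightHttpServer {
      public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
          ServerBootstrap bootstrap = new ServerBootstrap()
              .group(boss, workers)
              .channel(NioServerSocketChannel.class)
              .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) {
                  ch.pipeline()
                      .addLast(new HttpServerCodec())
                      // Aggregates chunks into one FullHttpRequest; a real server would also stream.
                      .addLast(new HttpObjectAggregator(1024 * 1024))
                      .addLast(new SimpleChannelInboundHandler<FullHttpRequest>() {
                        @Override
                        protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
                          FullHttpResponse resp = new DefaultFullHttpResponse(
                              HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
                              Unpooled.copiedBuffer("hello from a CDI bean\n", CharsetUtil.UTF_8));
                          resp.headers()
                              .set(HttpHeaderNames.CONTENT_TYPE, "text/plain")
                              .set(HttpHeaderNames.CONTENT_LENGTH, resp.content().readableBytes());
                          ctx.writeAndFlush(resp).addListener(ChannelFutureListener.CLOSE);
                        }
                      });
                }
              });
          bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
          boss.shutdownGracefully();
          workers.shutdownGracefully();
        }
      }
    }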

Once this light server is ready, the next step for a Java application to embrace the cloud is to make it native.
This is generally done through GraalVM.
Today OpenWebBeans proxy generation is not stable so making it native is not trivial.
The end of the task will therefore be to implement a proxy SPI in OpenWebBeans, enabling pre-generated proxies to be produced and reloaded at runtime (per bean).
The delivery of this task can be a Runnable (with a companion main(String[])).

 
You should know:
• Java
• HTTP


Difficulty: Major

mentors: tandraschko@apache.org, rmannibucau@apache.org
Potential mentors:
Thomas Andraschko, mail: tandraschko (at) apache.org
Project Devs, mail: dev (at) openwebbeans.apache.org
