Contents
- APISIX
- Apache Airflow
- Apache Fineract
- Apache IoTDB
- Apache IoTDB Database Connection Pool and integration with some web framework
- Apache IoTDB trigger module for streaming computing
- Apache Nemo
- Beam
- Camel
- RocketMQ
- OpenWebBeans
APISIX
Check the API version for every request
In order to make sure the dashboard is using the correct API version, we should add the APISIX version to every API response.
Please
- Add the dashboard API version variable in the config file.
- Check every API response in the request.ts file, and show an alert when the dashboard version is not compatible with the APISIX version.
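The check described above could look roughly like the following. This is a sketch in Python for brevity; the real implementation belongs in request.ts. The `X-APISIX-Version` header name and the major.minor compatibility rule are assumptions, not decided design.

```python
DASHBOARD_API_VERSION = "2.0.1"  # would come from the dashboard config file

def is_compatible(dashboard_version: str, apisix_version: str) -> bool:
    # treat versions as compatible when major and minor match (assumed rule)
    return dashboard_version.split(".")[:2] == apisix_version.split(".")[:2]

def check_response(headers: dict) -> bool:
    # False means the dashboard should show the incompatibility alert
    apisix_version = headers.get("X-APISIX-Version")
    return apisix_version is not None and is_compatible(
        DASHBOARD_API_VERSION, apisix_version
    )
```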
mentors: juzhiyuan@apache.org
Potential mentors:
Project Devs, mail: dev (at) apisix.apache.org
...
add commit message checker
To ensure the quality of every commit message, please add a commit message checker just like https://github.com/vuejs/vue-next/blob/master/scripts/verifyCommit.js
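A minimal checker in the spirit of the linked verifyCommit.js could be sketched as below (Python stand-in; the referenced script is JavaScript, and the exact list of allowed commit types is an assumption to be agreed on):

```python
import re
import sys

# Pattern modeled on vuejs/vue-next's verifyCommit.js: "type(scope): subject"
COMMIT_RE = re.compile(
    r"^(revert: )?(feat|fix|docs|style|refactor|perf|test|workflow|build|ci|chore|types)"
    r"(\(.+\))?: .{1,50}"
)

def verify_commit(msg: str) -> bool:
    return bool(COMMIT_RE.match(msg))

if __name__ == "__main__":
    # e.g. invoked from a git commit-msg hook with the message file as argv[1]
    with open(sys.argv[1]) as f:
        if not verify_commit(f.readline()):
            sys.exit("invalid commit message format")
```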
Difficulty: Minor
mentors: juzhiyuan@apache.org
Potential mentors:
Project Devs, mail: dev (at) apisix.apache.org
Add X-API-KEY for api request
Our dashboard API uses an API key to authenticate requests; please add the API key header in the global request handler, like [1] for reference. Please note, this key should come from a config file, such as a .env file. I recommend fetching this key via the fetchAPIKey API.
[1] b3b3065#diff-084c3d9c2786b7cd963be84e40a38725R32
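The idea can be sketched as follows (Python stand-in for brevity; the real change goes in the dashboard's global request handler in request.ts, and `fetchAPIKey` here is simulated by an environment lookup as a .env file would provide):

```python
import os

def fetch_api_key() -> str:
    # stand-in for the fetchAPIKey API; reads the key as a .env file would supply it
    return os.environ.get("API_KEY", "")

def with_api_key(headers: dict) -> dict:
    """Return a copy of the request headers with X-API-KEY attached."""
    out = dict(headers)
    out["X-API-KEY"] = fetch_api_key()
    return out
```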
Difficulty: Minor
mentors: juzhiyuan@apache.org
implement Apache APISIX echo plugin
APISIX currently provides a simple example plugin, but it does not provide useful functionality.
So we can provide a useful plugin to help users understand as fully as possible how to develop an APISIX plugin.
This plugin could implement the corresponding functionality in the common phases such as init, rewrite, access, balancer, header filter, body filter and log. The specific functionality is still being considered.
Difficulty: Major
mentors: agile6v@apache.org, wenming@apache.org,
yousa@apache.org
Potential mentors:
Project Devs, mail: dev (at) apisix.apache.org
feature: Support follow redirect
When a client request passes through APISIX to an upstream, if the upstream returns 301 or 302, APISIX by default returns the response directly to the client. The client receives the 301 or 302 response and then initiates the request again to the address specified by Location. Sometimes the client wants APISIX to follow the redirect on its behalf, so APISIX can provide this capability to support more scenarios.
Difficulty: Major
mentors: agile6v@apache.org, wenming@apache.org,
yousa@apache.org
Potential mentors:
Project Devs, mail: dev (at) apisix.apache.org
...
Apache IoTDB integration with MiNiFi/NiFi
IoTDB is a database for storing time series data.
MiNiFI is a data flow engine to transfer data from A to B, e.g., from PLC4X to IoTDB.
This proposal is for integrating IoTDB with MiNiFi/NiFi:
- let MiNiFi/NiFi support writing data into IoTDB.
Difficulty: major
mentors:
Apache IoTDB trigger module for streaming computing
IoTDB is a time-series data management system and the data usually comes in a streaming way.
In the IoT area, when a data point comes, a trigger can be called in scenarios like the following:
- (single data point calculation) the data point is an outlier, or the data value reaches a warning threshold. IoTDB needs to publish the data point to those who subscribed to the event.
- (multiple time series data point calculation) a device sends several metrics to IoTDB, e.g., vehicle d1 sends average speed and running time to IoTDB. Then users may want to get the mileage of the vehicle (speed x time). IoTDB needs to calculate the result and save it to another time series.
- (time window calculation) a device reports its temperature every second. Though the temperature is not too high, if it keeps increasing for 5 seconds, IoTDB needs to report the event to those who subscribe to it.
As there are many streaming computing projects already, we can integrate one of them into IoTDB.
- If IoTDB runs on the Edge, we can integrate Apache StreamPipes or Apache Edgent.
- If IoTDB runs on a Server, the above also works and Apache Flink is also a good choice.
The process is:
- A user registers a trigger into IoTDB.
- When a data point comes, IoTDB saves it and checks whether there are triggers on it.
- If so, it calls a streaming computing framework to do something.
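The register/save/check/fire flow above can be sketched as follows. This is a hypothetical illustration in Python; IoTDB's actual trigger API would be defined in Java inside the server engine module.

```python
from collections import defaultdict

class TriggerRegistry:
    """Hypothetical sketch of the trigger flow, not IoTDB's real interface."""

    def __init__(self):
        self._triggers = defaultdict(list)

    def register(self, path, predicate, action):
        # e.g. path "root.d1.temp", predicate = warning-threshold check
        self._triggers[path].append((predicate, action))

    def insert(self, storage, path, timestamp, value):
        # 1) save the data point, 2) check triggers on the path, 3) fire actions
        storage.setdefault(path, []).append((timestamp, value))
        fired = []
        for predicate, action in self._triggers[path]:
            if predicate(value):
                fired.append(action(path, timestamp, value))
        return fired
```

Usage: register a threshold trigger on a path, then every insert on that path both stores the point and evaluates the trigger.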
You may need to know:
- At least one streaming computing project.
- SQL parser or some other DSL parser tool.
You have to modify the source code of the IoTDB server engine module.
Difficulty: A little hard
mentors:
...
Apache IoTDB integration with Prometheus
IoTDB is a highly efficient time series database.
Prometheus is a monitoring and alerting toolkit, which supports collecting data from other systems, servers, and IoT devices, saving data into a DB, visualizing data and providing some query APIs.
Prometheus allows users to use their own database rather than just the Prometheus DB for storing time series data.
This proposal is for integrating IoTDB with Prometheus.
You should know:
- How to use Prometheus
- How to use IoTDB
- Java and Go language
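One core design question in such an integration is how to map a Prometheus metric with labels onto an IoTDB time-series path. The mapping below (sorted label values appended under an assumed `root.prometheus` prefix) is purely illustrative, not an established convention:

```python
def to_iotdb_path(metric: str, labels: dict) -> str:
    """Map a Prometheus metric + labels to an IoTDB path (assumed scheme)."""
    parts = ["root", "prometheus", metric]
    for key in sorted(labels):          # sort for a deterministic path
        parts.append(str(labels[key]))
    return ".".join(parts)
```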
difficulty: Major
mentors:
hxd@apache.org
...
IoTDB Database Connection Pool and integration with some web framework
IoTDB is a time series database.
When using a database in an application, a database connection pool is very helpful for high performance and saving resources.
Besides, when developing a website using Spring or some other web framework, many developers do not manage the database connection manually. Instead, they just declare which database they will use and the web framework handles everything.
This proposal is for
- letting IoTDB support some database connection pools like Apache Commons DBCP and C3P0;
- integrating IoTDB with one web framework (e.g., Spring).
You should know:
- IoTDB
- At least one DB connection pool
- Know Spring or some other web framework
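What a connection pool buys you can be illustrated in a few lines (a generic Python sketch; in this project the pooling itself would come from Apache Commons DBCP or C3P0 wrapping IoTDB's JDBC connections in Java):

```python
import queue

class SimplePool:
    """Minimal illustration: connections are created once, then borrowed
    and returned rather than opened/closed per query."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())   # pay the connection cost up front

    def acquire(self):
        return self._idle.get()         # borrow an idle connection

    def release(self, conn):
        self._idle.put(conn)            # return it instead of closing it
```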
mentors:
hxd@apache.org
Apache Nemo
Dynamic Task Sizing on Nemo
This is an umbrella issue to keep track of the issues related to the dynamic task sizing feature on Nemo.
Dynamic task sizing needs to consider a workload and try to decide on the optimal task size based on the runtime metrics and characteristics. It should have an effect on the parallelism and the partitions, on how many partitions an intermediate data should be divided/shuffled into, and should effectively handle skews in the meanwhile.
Beam
Add Daffodil IO for Apache Beam
From https://daffodil.apache.org/:
Daffodil is an open source implementation of the DFDL specification that uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON. This allows the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of the reverse by serializing or “unparsing” an XML or JSON infoset back to the original data format.
We should create a Beam IO that accepts a DFDL schema as an argument and can then produce and consume data in the specified format. I think it would be most natural for Beam users if this IO could produce Beam Rows, but an initial version that just operates with Infosets could be useful as well.
Implement an Azure blobstore filesystem for Python SDK
This is similar to BEAM-2572, but for Azure's blobstore.
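Beam filesystems are registered per URI scheme, so a natural first step is parsing the blob URL into its components. The `azfs://<account>/<container>/<blob>` layout below is an assumption about the design, not a settled Beam convention:

```python
import re

# Assumed URL shape: azfs://<account>/<container>/<blob path>
AZFS_RE = re.compile(r"^azfs://(?P<account>[^/]+)/(?P<container>[^/]+)/(?P<blob>.+)$")

def parse_azfs(path: str):
    """Split an assumed azfs:// URL into (account, container, blob)."""
    m = AZFS_RE.match(path)
    if m is None:
        raise ValueError(f"not a valid azfs path: {path}")
    return m.group("account"), m.group("container"), m.group("blob")
```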
BeamSQL aggregation analytics functionality
Mentor email: ruwang@google.com. Feel free to send emails for your questions.
Project Information
---------------------
BeamSQL has a long list of aggregation/aggregation analytics functionalities to support.
To begin with, you will need to support this syntax:
analytic_function_name ( [ argument_list ] ) OVER ( [ PARTITION BY partition_expression_list ] [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ window_frame_clause ] )
As there is a long list of analytics functions, a good starting point is to support rank() first.
This will require touching core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed manner.
4. Build benchmarks to measure performance of your implementation.
To understand what SQL analytics functionality is, you could check this great explanation doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
To know about Beam's programming model, check: https://beam.apache.org/documentation/programming-guide/#overview
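For intuition, the semantics of `rank() OVER (PARTITION BY k ORDER BY v)` can be reproduced outside SQL (plain Python, not Beam code — ties share a rank and the next distinct value skips ahead):

```python
def rank_over(rows, partition_key, order_key):
    """Return {partition: [(row, rank), ...]} with SQL rank() semantics."""
    parts = {}
    for row in rows:
        parts.setdefault(partition_key(row), []).append(row)
    result = {}
    for key, part in parts.items():
        part.sort(key=order_key)
        ranked, prev, rank = [], object(), 0   # sentinel: no previous key
        for i, row in enumerate(part, start=1):
            if order_key(row) != prev:
                rank, prev = i, order_key(row)  # ties keep the old rank
            ranked.append((row, rank))
        result[key] = ranked
    return result
```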
...
RocketMQ Connect Hbase
Content
The Hbase sink connector allows moving data from Apache RocketMQ to Hbase. It writes data from a topic in RocketMQ to a table in the specified HBase instance. Auto-creation of tables and the auto-creation of column families are also supported.
So, in this project, you need to implement an Hbase sink connector based on OpenMessaging connect API, which will be executed on RocketMQ connect runtime.
You should learn before applying for this topic
Hbase/Apache RocketMQ/Apache RocketMQ Connect/ OpenMessaging Connect API
Mentor
RocketMQ Connect Cassandra
Content
The Cassandra sink connector allows writing data to Apache Cassandra. In this project, you need to implement a Cassandra sink connector based on OpenMessaging connect API, and run it on RocketMQ connect runtime.
You should learn before applying for this topic
Cassandra/[Apache RocketMQ|https://rocketmq.apache.org/]/[Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect]/ OpenMessaging Connect API
Mentor
RocketMQ Connect InfluxDB
Content
The InfluxDB sink connector allows moving data from Apache RocketMQ to InfluxDB. It writes data from a topic in Apache RocketMQ to InfluxDB, while the InfluxDB source connector is used to export data from the InfluxDB Server to RocketMQ.
In this project, you need to implement an InfluxDB sink connector (source connector is optional) based on OpenMessaging connect API, and run it on RocketMQ connect runtime.
You should learn before applying for this topic
InfluxDB/[Apache RocketMQ|https://rocketmq.apache.org/]/[Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect]/ OpenMessaging Connect API
Mentor
duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
The Operator for RocketMQ Exporter
Context
The exporter exposes the endpoint of monitoring data collection to the Prometheus server in the form of an HTTP service. The Prometheus server can obtain the monitoring data to be collected by accessing the endpoint provided by the exporter. RocketMQ exporter is such an exporter. It first collects data from the RocketMQ cluster, and then normalizes the collected data to meet the requirements of the Prometheus system with the help of the third-party client library provided by Prometheus. Prometheus regularly pulls data from the exporter. This topic needs to implement an operator for the RocketMQ exporter to facilitate the deployment of the exporter on the Kubernetes platform.
You should learn before applying for this topic
RocketMQ-Exporter Repo
RocketMQ-Exporter Overview
Kubernetes Operator
RocketMQ-Operator
Mentor
duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
RocketMQ Connect IoTDB
Content
The IoTDB sink connector allows moving data from Apache RocketMQ to IoTDB. It writes data from a topic in Apache RocketMQ to IoTDB.
IoTDB (Internet of Things Database) is a data management system for time series data, which can provide users specific services, such as data collection, storage and analysis. Due to its lightweight structure, high performance and usable features together with its seamless integration with the Hadoop and Spark ecology, IoTDB meets the requirements of massive dataset storage, high-throughput data input and complex data analysis in the industrial IoT field.
In this project, there are some update operations for historical data, so it is necessary to ensure the sequential transmission and consumption of data via RocketMQ. If there is no update operation in use, then there is no need to guarantee the order of data; IoTDB will process the data even if it arrives out of order.
So, in this project, you need to implement an IoTDB sink connector based on OpenMessaging connect API, and run it on RocketMQ connect runtime.
You should learn before applying for this topic
IoTDB/[Apache RocketMQ|https://rocketmq.apache.org/]/[Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect]/ OpenMessaging Connect API
Mentor
hxd@apache.org, duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
Apache RocketMQ Schema Registry
Content
In order to help RocketMQ improve its event management capabilities, and at the same time better decouple the producer and receiver and keep events forward compatible, we need a service for event metadata management called a schema registry.
The schema registry will provide a GraphQL interface for developers to define standard schemas for their events, share them across the organization and safely evolve them in a way that is backward compatible and future proof.
You should learn before applying for this topic
Apache RocketMQ/Apache RocketMQ SDK
Mentor
hxd@apache.org, duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
Apache RocketMQ Channel for Knative
Context
Knative is a Kubernetes-based platform for building, deploying and managing modern serverless applications. Knative provides a set of middleware components that are essential to building modern, source-centric, and container-based applications that can run anywhere: on-premises, in the cloud, or even in a third-party data centre. Knative consists of the Serving and Eventing components. Eventing is a system that is designed to address a common need for cloud-native development and provides composable primitives to enable late-binding event sources and event consumers. Eventing also defines an event forwarding and persistence layer, called a Channel. Each channel is a separate Kubernetes Custom Resource. This topic requires you to implement a RocketMQChannel based on Apache RocketMQ.
You should learn before applying for this topic
How Knative works
RocketMQSource for Knative
Apache RocketMQ Operator
Mentor
wlliqipeng@apache.org, vongosling@apache.org
Apache RocketMQ Ingestion for Druid
Context
Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. In this topic, you should develop the RocketMQ indexing service, which enables the configuration of supervisors on the Overlord that facilitate ingestion from RocketMQ by managing the creation and lifetime of RocketMQ indexing tasks. These indexing tasks read events using RocketMQ's own partition and offset mechanism. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.
You should learn before applying for this topic
Apache Druid Data Ingestion
Mentor
vongosling@apache.org, duhengforever@apache.org
Apache RocketMQ Connect Hudi
Context
Hudi could ingest and manage the storage of large analytical datasets over DFS (HDFS or cloud stores). It can act as either a source or sink for a stream processing platform such as Apache RocketMQ. It can also be used as a state store inside a processing DAG (similar to how RocksDB is used by Flink). This is an item on the roadmap of Apache RocketMQ. In this topic, you should implement a full Hudi source and sink based on the RocketMQ connect framework, which is the most important implementation of OpenConnect.
You should learn before applying for this topic
Apache RocketMQ Connect Framework
Apache Hudi
Mentor
Apache RocketMQ Scaler for KEDA
Context
KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. KEDA has a number of "scalers" that can both detect if a deployment should be activated or deactivated, and feed custom metrics for a specific event source. In this topic, you need to implement the RocketMQ scalers.
You should learn before applying for this topic
Helm/Apache RocketMQ Operator/Apache RocketMQ Docker Image
Apache RocketMQ multi-replica mechanism (based on DLedger)
How KEDA works
Mentor
wlliqipeng@apache.org, vongosling@apache.org
Apache RocketMQ CLI Admin Tool Developed by Golang
Apache RocketMQ provides a CLI admin tool developed in Java for querying, managing and diagnosing various problems. At the same time, it also provides a set of API interfaces, which can be called by Java application programs to implement creation, deletion, topic queries, message queries and other functions. This topic requires implementing the CLI admin tool and a set of API interfaces in the Go language, through which Go applications can implement the creation, query and other operations of topics.
You should learn before applying for this topic
Apache RocketMQ
Apache RocketMQ Go Client
Mentor
RocketMQ Connect Elasticsearch
Content
The Elasticsearch sink connector allows moving data from Apache RocketMQ to Elasticsearch 6.x and 7.x. It writes data from a topic in Apache RocketMQ to an index in Elasticsearch, and all data for a topic have the same type.
Elasticsearch is often used for text queries, analytics and as a key-value store (use cases). The connector covers both the analytics and key-value store use cases.
For the analytics use case, each message in RocketMQ is treated as an event and the connector uses topic+message queue+offset as a unique identifier for events, which are then converted to unique documents in Elasticsearch. For the key-value store use case, it supports using keys from RocketMQ messages as document ids in Elasticsearch and provides configurations ensuring that updates to a key are written to Elasticsearch in order.
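The topic+message queue+offset identifier described above can be sketched as a document id (Python illustration; the "+" separator is just one possible choice):

```python
def es_doc_id(topic: str, queue_id: int, offset: int) -> str:
    """Build a unique document id from the event's position in RocketMQ."""
    return f"{topic}+{queue_id}+{offset}"

def index_event(index: dict, topic: str, queue_id: int, offset: int, doc: dict) -> None:
    # writing by id makes a redelivered message overwrite, not duplicate
    index[es_doc_id(topic, queue_id, offset)] = doc
```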
So, in this project, you need to implement a sink connector based on OpenMessaging connect API, which will be executed on RocketMQ connect runtime.
You should learn before applying for this topic
Elasticsearch/[Apache RocketMQ|https://rocketmq.apache.org/]/[Apache RocketMQ Connect|https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect]/ OpenMessaging Connect API
Mentor
wlliqipeng@apache.org, vongosling@apache.org
CloudEvents support for RocketMQ
Context
Events are everywhere. However, event producers tend to describe events differently.
The lack of a common way of describing events means developers must constantly re-learn how to consume events. This also limits the potential for libraries, tooling and infrastructure to aid the delivery of event data across environments, like SDKs, event routers or tracing systems. The portability and productivity we can achieve from event data is hindered overall.
CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems.
RocketMQ, as an event streaming platform, also hopes to improve the interoperability of different event platforms by being compatible with the CloudEvents standard and supporting the CloudEvents SDK. In this topic, you need to improve the binding spec and implement the RocketMQ CloudEvents SDK (Java, Golang or others).
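For a flavor of what the common format looks like, here is a minimal CloudEvents 1.0 JSON envelope for a RocketMQ message (Python sketch; the attribute names follow the CloudEvents spec, while the `source`/`type` values are made-up examples):

```python
import datetime
import json
import uuid

def to_cloudevent(topic: str, body: dict) -> str:
    """Wrap a message body in a CloudEvents 1.0 JSON envelope."""
    event = {
        # required context attributes per the CloudEvents spec
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"/rocketmq/{topic}",          # example value
        "type": "org.apache.rocketmq.message",   # example value
        # optional attributes + payload
        "datacontenttype": "application/json",
        "time": datetime.datetime.utcnow().isoformat() + "Z",
        "data": body,
    }
    return json.dumps(event)
```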
Content
The Elasticsearch sink connector allows moving data from Apache RocketMQ to Elasticsearch 6.x, and 7.x. It writes data from a topic in Apache RocketMQ to an index in Elasticsearch and all data for a topic have the same type.
Elasticsearch is often used for text queries, analytics and as an key-value store (use cases). The connector covers both the analytics and key-value store use cases.
For the analytics use case, each message is in RocketMQ is treated as an event and the connector uses topic+message queue+offset as a unique identifier for events, which then converted to unique documents in Elasticsearch. For the key-value store use case, it supports using keys from RocketMQ messages as document ids in Elasticsearch and provides configurations ensuring that updates to a key are written to Elasticsearch in order.
So, in this project, you need to implement a sink connector based on OpenMessaging connect API, and will executed on RocketMQ connect runtime.
You should learn before applying for this topic
Elasticsearch / Apache RocketMQ (https://rocketmq.apache.org/) / Apache RocketMQ Connect (https://github.com/apache/rocketmq-externals/tree/master/rocketmq-connect) / OpenMessaging Connect API / CloudEvents
Mentor
Apache RocketMQ Connect Flink
Context
There are many ways that Apache Flink and Apache RocketMQ can integrate to provide elastic data processing at a large scale. RocketMQ can be used as a streaming source and streaming sink in Flink DataStream applications, which is the main implementation and popular usage in the RocketMQ community. Developers can ingest data from RocketMQ into a Flink job that makes computations and processes real-time data, then send the data back to a RocketMQ topic as a streaming sink. More details can be found at https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink.
With more and more DW or OLAP engineers using RocketMQ for their data processing work, another potential integration need arose. Developers could take advantage of RocketMQ as both a streaming source and a streaming table sink for Flink SQL or Table API queries. Also, Flink 1.9.0 makes the Table API a first-class citizen. It's time to support SQL in RocketMQ. This is the topic for Apache RocketMQ Connect Flink.
You should learn before applying for this topic
Apache RocketMQ Flink Connector
Apache Flink Table API
Extension
Students with expertise in the streaming field could go further and implement an exactly-once streaming source and an at-least-once (or exactly-once) streaming sink, as described in issue #500.
Mentor
nicholasjiang@apache.org, duhengforever@apache.org, vongosling@apache.org
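The exactly-once extension hinges on one idea: the source snapshots its consume offsets into Flink checkpoints and rewinds to the last snapshot on recovery, so no record is counted twice. The sketch below is a plain-Java simulation of that bookkeeping; it uses no Flink types, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the offset bookkeeping of an exactly-once streaming source:
// snapshot() is what a Flink checkpoint would trigger, restore() is what
// failure recovery would trigger.
public class CheckpointedSource {
    private final Map<Integer, Long> offsets = new HashMap<>(); // queueId -> next offset
    private Map<Integer, Long> lastCheckpoint = new HashMap<>();

    // Consume one record from a queue; returns the offset just read.
    public long poll(int queueId) {
        return offsets.merge(queueId, 1L, Long::sum) - 1;
    }

    // Snapshot current offsets into the checkpoint state.
    public void snapshot() {
        lastCheckpoint = new HashMap<>(offsets);
    }

    // On failure, rewind to the last successful checkpoint.
    public void restore() {
        offsets.clear();
        offsets.putAll(lastCheckpoint);
    }

    public long nextOffset(int queueId) {
        return offsets.getOrDefault(queueId, 0L);
    }
}
```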
OpenWebBeans
Implement lightweight CDI-centric HTTP server
Apache OpenWebBeans is an IoC container implementing the CDI specification.
With the rise of Kubernetes and more generally the Cloud adoption, it becomes more and more key to be able to have fast, light and reliable servers.
That ecosystem is mainly composed of Microprofile servers.
However, their stack is quite huge for most applications, and the existing Microprofile servers are not CDI-centric (Meecrowave and TomEE are Tomcat-centric).
This is why the need arises for a light HTTP server (likely Netty-based) that is embeddable in a CDI context (as a bean).
It will be close to a light embedded servlet container but likely more reactive in the way the server will need to scale.
It must handle fixed size payload (with Content-Length header) but also chunking.
File upload is an optional bonus.
This task will require:
1. implement an HTTP server with Netty (or alike),
2. define a light HTTP API (at least supporting filter-like interception, possibly interceptor-based, but in a reactive fashion - CompletionStage),
3. make it configurable (Microprofile Config or similar) and embeddable.
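To make the "filter-like interception in a reactive fashion" point concrete, here is one possible shape for that light HTTP API. Every name is hypothetical; nothing below exists in OpenWebBeans today, it only sketches the CompletionStage-based contract the task asks for.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class LightHttpApiSketch {
    // Hypothetical request/response types.
    record HttpRequest(String method, String path) {}
    record HttpResponse(int status, String body) {}

    // Handlers are asynchronous: they return a CompletionStage instead of
    // blocking, so the Netty event loop is never held up.
    interface HttpHandler {
        CompletionStage<HttpResponse> handle(HttpRequest request);
    }

    // Filter-like interception: a filter wraps the downstream chain and
    // may short-circuit or decorate the eventual response.
    interface HttpFilter {
        CompletionStage<HttpResponse> filter(HttpRequest request, HttpHandler next);
    }

    // Compose a filter around a handler.
    static HttpHandler wrap(HttpFilter filter, HttpHandler next) {
        return request -> filter.filter(request, next);
    }

    // Demo: a handler behind a filter that rejects non-GET requests
    // without ever calling downstream.
    public static int statusFor(String method) {
        HttpHandler handler = req ->
            CompletableFuture.completedFuture(new HttpResponse(200, "hello " + req.path()));
        HttpFilter onlyGet = (req, next) ->
            "GET".equals(req.method())
                ? next.handle(req)
                : CompletableFuture.completedFuture(new HttpResponse(405, ""));
        return wrap(onlyGet, handler)
            .handle(new HttpRequest(method, "/ping"))
            .toCompletableFuture().join().status();
    }
}
```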
Once this light server is ready, the next step for a Java application to embrace the cloud is to make it native.
This is generally done through GraalVM.
Today, OpenWebBeans proxy generation is not stable, so making it native is not trivial.
The end of the task will therefore be to implement a proxy SPI in OpenWebBeans, enabling pre-generated proxies to be reloaded at runtime (per bean).
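One possible shape for such a proxy SPI is sketched below. All names are hypothetical and do not exist in OpenWebBeans today: the container would ask the SPI for a pre-generated proxy class before falling back to runtime bytecode generation, which is the part GraalVM native images cannot do.

```java
public class ProxySpiSketch {
    // Hypothetical SPI: implementations locate proxy classes that were
    // generated at build time instead of emitting bytecode at runtime.
    public interface ProxyFactorySpi {
        // Return the pre-generated proxy class for the given name,
        // or null to fall back to runtime generation.
        Class<?> loadProxyClass(ClassLoader loader, String proxyClassName);
    }

    // Illustrative naming convention linking a bean class to its
    // pre-generated proxy class (the suffix is an assumption).
    public static String proxyClassName(Class<?> beanClass) {
        return beanClass.getName() + "$$OwbProxy";
    }
}
```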
The delivery of this task can be a Runnable (with a companion main(String[])).
You should know:
• Java
• HTTP
Difficulty: Major
mentors: tandraschko@apache.org, rmannibucau@apache.org
Potential mentors:
Project Devs, mail: dev (at) openwebbeans.apache.org