Adopt Pulsar as the messaging technology backing the distributed James server
https://www.mail-archive.com/server-dev@james.apache.org/msg71462.html
A good long term objective for the PMC is to drop RabbitMQ in favor of Pulsar (third parties could package their own components using RabbitMQ if they wish...)
This means:
- Solve the bugs that were found during the Pulsar MailQueue review
- Pulsar MailQueue needs to allow listing blobs in order to be deduplication friendly.
- Provide an event bus based on Pulsar
- Provide a task manager based on Pulsar
- Package a distributed server backed by Pulsar, deprecate then replace the current one.
- (optionally) Support mail queue priorities
While contributions would of course be welcomed on this topic, we could
offer it as part of GSOC 2022, and we could co-mentor it with mentors of
the Pulsar community (see [3])
[3] https://lists.apache.org/thread/y9s7f6hmh51ky30l20yx0dlz458gw259
Would such a plan gain traction around here?
Implement a web ui for James administration
James today provides a command line tool to do administration tasks like creating a domain, listing users, setting quota, etc.
It requires access to the JMX port, and even if a lot of admins are comfortable with such tools, to make our user base broader we should probably expose the same commands over REST and provide a fancy default web UI.
The task would need some basic skills with frontend tools to design an administration board, knowledge of what REST means, and enough Java understanding to add commands to the existing REST backend.
In the team, we have a strong focus on tests (who wants a mail server that is not tested enough?) so we will explain and/or teach the student how to get the right test coverage of the features using modern tools like Cucumber, Selenium, rest-assured, etc.
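As a rough illustration of the direction (a minimal sketch using only the JDK's built-in HTTP server; the actual James webadmin framework, route paths, and service names would differ), exposing an existing admin command such as domain creation over REST could look like this:

import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

public class AdminRestSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        // PUT /domains/{name}: delegate to the same logic the CLI/JMX command uses today.
        server.createContext("/domains", exchange -> {
            String path = exchange.getRequestURI().getPath();        // e.g. /domains/example.org
            String domain = path.replaceFirst("^/domains/?", "");
            // hypothetical service call, stands in for the existing JMX-backed command:
            // domainService.createDomain(domain);
            exchange.sendResponseHeaders(204, -1);                    // no response body
            exchange.close();
        });
        server.start();
    }
}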
[GSOC] James as a (distributed) MX server
Why ?
Alternatives like Postfix...
- Do not offer a unified view of the mail queue across nodes
- Require stateful persistent storage
Given Apache James's recent push to adopt a distributed mail queue based on Pulsar supporting delays (JAMES-3687), it starts making sense to develop MX related tooling.
I propose myself to mentor a GSoC on this topic.
Benefits for the student
At the end of this GSOC you will...
- Have a solid understanding of email relaying and associated mechanics
- Understand James modular architecture (mailet/ matcher / routes)
- Have hands-on expertise in SQL / NoSQL, working with technologies like Cassandra, Redis, JPA...
- Identify, fix and solve architecture problems.
- Conduct performance tests and develop an operational mindset
Inventory...
James ships a couple of MX related tools within smtp-hooks/mailets in default packages. It would make sense to me to move those as an extension.
James supports today...
Checks against DNS blacklists, with the `DNSRBLHandler` or `URIRBLHandler` SMTP hooks for instance. This can be moved as an extension IMO.
We would need a little performance benchmark to document performance implications of activating DNS-RBL.
Finally, as quoted by a Gitter user: it would make more sense to have this done as a MailHook rather than a RcptHook, as it would avoid doing the same job over and over again for each recipient. See JAMES-3820.
Grey listing. There's an existing implementation using JDBC as an underlying storage.
Move it as an extension.
Remove JDBC storage, and propose 2 storage possibilities: in-memory for a single node, Redis for a distributed topology.
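A minimal sketch of what the storage abstraction behind such a grey listing extension could look like (interface and class names here are illustrative, not an existing James API), with the in-memory flavour for a single node:

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction: a Redis-backed implementation would share this interface.
interface GreylistStore {
    Instant firstSeen(String sender, String recipient, String clientIp);
}

class InMemoryGreylistStore implements GreylistStore {
    private final Map<String, Instant> entries = new ConcurrentHashMap<>();

    @Override
    public Instant firstSeen(String sender, String recipient, String clientIp) {
        String key = sender + "|" + recipient + "|" + clientIp;
        return entries.computeIfAbsent(key, k -> Instant.now());
    }
}

// Usage: temporarily reject the mail while the (sender, recipient, ip) triplet is younger than the greylist delay.
class GreylistPolicy {
    static boolean shouldTempFail(GreylistStore store, String sender, String recipient, String ip) {
        Instant first = store.firstSeen(sender, recipient, ip);
        return Duration.between(first, Instant.now()).compareTo(Duration.ofMinutes(5)) < 0;
    }
}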
Some work around whitelist mailets? Move it as an extension, propose JPA, Cassandra, and XML configured implementations? With a route to manage entries in there for JPA + Cassandra?
I would expect a student to do his own little audit and come up with extra suggestions!
Commons Statistics
[GSoC] Summary statistics API for Java 8 streams
Placeholder for tasks that could be undertaken in this year's GSoC.
Ideas:
- Design an updated summary statistics API for use with Java 8 streams based on the summary statistic implementations in the Commons Math stat.descriptive package including moments, rank and summary sub-packages.
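For context, the JDK already offers a basic stream-friendly accumulator, and the envisioned API could follow the same supplier/accumulator/combiner shape while adding moments, rank and summary statistics. A minimal sketch (the `MomentStatistics` name is purely illustrative, not an existing class):

import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

public class StreamStatsSketch {
    public static void main(String[] args) {
        // What the JDK provides today: count/min/max/sum/average.
        DoubleSummaryStatistics basic = DoubleStream.of(1.0, 2.0, 4.0, 8.0).summaryStatistics();
        System.out.println(basic.getAverage());

        // The new API could plug into streams the same way, but expose higher moments,
        // rank and summary statistics (illustrative only):
        // MomentStatistics stats = DoubleStream.of(1.0, 2.0, 4.0, 8.0)
        //         .collect(MomentStatistics::new, MomentStatistics::accept, MomentStatistics::combine);
    }
}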
Commons Numbers
Add support for extended precision floating-point numbers
Add implementations of extended precision floating point numbers.
An extended precision floating point number is a series of floating-point numbers that are non-overlapping such that:
double-double (a, b):
|a| > |b|
a == a + b
Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: QD).
Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits; or create intermediate overflow of integer values.
This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.
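For illustration, the core building block of such extended precision arithmetic is the error-free two-sum transformation, which splits a sum into a rounded result plus its exact rounding error (a minimal sketch; the eventual Commons Numbers API would differ):

public class TwoSumSketch {
    // Shewchuk/Knuth two-sum: returns (sum, error) such that sum + error == a + b exactly for finite doubles.
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bVirtual = s - a;
        double aVirtual = s - bVirtual;
        double err = (a - aVirtual) + (b - bVirtual);
        return new double[] {s, err};   // non-overlapping (high, low) pair
    }

    public static void main(String[] args) {
        double[] dd = twoSum(1e16, 3.14159);
        System.out.println(dd[0] + " + " + dd[1]);   // high and low parts of the exact sum
    }
}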
An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:
long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
System.out.println(Arrays.stream(values).average().getAsDouble());
System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
    .reduce(BigDecimal.ZERO, BigDecimal::add)
    .divide(BigDecimal.valueOf(values.length)).doubleValue());
long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
    .reduce(BigDecimal.ZERO, BigDecimal::add)
    .divide(BigDecimal.valueOf(values2.length)).doubleValue());
Outputs:
-1.0
9.223372036854776E18
0.0
-0.5
Commons Imaging: Placeholder for 1.0 release
A placeholder ticket, to link other issues and organize tasks related to the 1.0 release of Commons Imaging.
The 1.0 release of Commons Imaging has been postponed several times. Now we have a clearer idea of what's necessary for 1.0 (see issues with fixVersion 1.0 and 1.0-alpha3, and other open issues), and the tasks are interesting as they involve both basic and advanced programming, such as organizing how test images are loaded, or working on performance improvements at the byte level following image format specifications.
The tasks are not too hard to follow, as normally there are example images that need to work with Imaging, as well as other libraries in C, C++, Rust, PHP, etc., that process these images correctly. Our goal with this issue is to a) improve our docs, b) improve our tests, c) fix possible security issues, d) get the parsers in Commons Imaging ready for the 1.0 release.
Assigning the label for GSoC 2023 as a full-time project, although it would also be possible to work on a smaller set of tasks for 1.0 part-time.
RocketMQ
GSoC Observability Improvement for RocketMQ Streams
RocketMQ Streams
RocketMQ Streams is a lightweight stream processing framework; applications gain stream processing ability by depending on RocketMQ Streams as an SDK.
Background
Repo of RocketMQ Streams: https://github.com/apache/rocketmq-streams (the architecture document and the RocketMQ Streams examples can be found there)
Task
The observability needs to be enhanced in the following aspects:
- The metrics of client/processor/thread/state/rocksdb;
- The topology of the streaming process;
- Multi-output of metrics;
This task needs you to study the implementation details of RocketMQ Streams and find out the key indicators in the stream processing process, then design and implement a complete set of observability solutions and finally use them for runtime problem diagnosis.
Mentor
nize, Committer of Apache RocketMQ, karp@apache.org
GSoC Make RocketMQ support higher versions of Java
RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability.
Page: https://rocketmq.apache.org
Github Repo: https://github.com/apache/rocketmq
Background
RocketMQ is a widely used message middleware system in the Java community, which mainly supports Java 8. As Java has evolved, many new features and improvements have been added to the language and the Java Virtual Machine (JVM). However, RocketMQ still lacks compatibility with the latest Java versions, preventing users from taking advantage of new features and performance improvements. Therefore, we are seeking community support to upgrade RocketMQ to support higher versions of Java and enable the use of new features and JVM parameters.
Task
We aim to update the RocketMQ codebase to support newer versions of Java in a cross-compile manner. The goal is to enable RocketMQ to work with Java 17 while maintaining backward compatibility with previous versions of Java. This will involve identifying and updating any dependencies that need to be changed to support the new Java versions, as well as testing and verifying that the new version of RocketMQ works correctly. With these updates, users will be able to take advantage of the latest Java features and performance improvements. We hope that the community can come together to support this task and make RocketMQ a more versatile and powerful middleware system.
Relevant Skills
- Java language
- Having a good understanding of the new features in higher versions of Java, particularly LTS versions.
Mentor
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
GSoC Integrate RocketMQ 5.0 client with Spring
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq
Background
The RocketMQ 5.0 client has been released recently, and we need to integrate it with Spring.
Related issue: https://github.com/apache/rocketmq-clients/issues/275
Task
- Get familiar with RocketMQ 5.0 Java client usage; you can see more details at https://github.com/apache/rocketmq-clients/tree/master/java and https://rocketmq.apache.org/docs/quickStart/01quickstart (a rough usage sketch follows this list)
- Integrate with Spring.
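A rough sketch of plain RocketMQ 5.0 Java client usage that such a Spring integration would wrap, based on the producer quickstart linked above (package and builder names are recalled from that quickstart and should be double-checked against the rocketmq-clients repository):

import org.apache.rocketmq.client.apis.ClientConfiguration;
import org.apache.rocketmq.client.apis.ClientServiceProvider;
import org.apache.rocketmq.client.apis.message.Message;
import org.apache.rocketmq.client.apis.producer.Producer;
import org.apache.rocketmq.client.apis.producer.SendReceipt;

public class ProducerQuickstartSketch {
    public static void main(String[] args) throws Exception {
        ClientServiceProvider provider = ClientServiceProvider.loadService();
        ClientConfiguration configuration = ClientConfiguration.newBuilder()
                .setEndpoints("localhost:8081")   // proxy endpoint
                .build();
        // A Spring integration would typically expose this producer as an auto-configured bean/template.
        try (Producer producer = provider.newProducerBuilder()
                .setClientConfiguration(configuration)
                .setTopics("TestTopic")
                .build()) {
            Message message = provider.newMessageBuilder()
                    .setTopic("TestTopic")
                    .setBody("Hello RocketMQ 5.0".getBytes())
                    .build();
            SendReceipt receipt = producer.send(message);
            System.out.println("Message sent: " + receipt.getMessageId());
        }
    }
}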
Relevant Skills
- Java language
- Basic knowledge of RocketMQ 5.0
- Spring
Mentor
Rongtong Jin, PMC of Apache RocketMQ, jinrongtong@apache.org
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
GSoC Implement python client for RocketMQ 5.0
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Background
RocketMQ 5.0 has released various language clients including Java, CPP, and Golang; to cover all major programming languages, a Python client needs to be implemented.
Task
The developer is required to be familiar with the Java implementation and capable of developing a Python client, while ensuring consistent functionality and semantics.
Relevant Skills
Python language
Basic knowledge of RocketMQ 5.0
Mentor
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
RocketMQ TieredStore Integration with High Availability Architecture
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Background
With the official release of RocketMQ 5.1.0, tiered storage has arrived as a new independent module in the Technical Preview milestone. This allows users to unload messages from local disks to other cheaper storage, extending message retention time at a lower cost.
Reference RIP-57: https://github.com/apache/rocketmq/wiki/RIP-57-Tiered-storage-for-RocketMQ
In addition, RocketMQ introduced a new high availability architecture in version 5.0.
Reference RIP-44: https://github.com/apache/rocketmq/wiki/RIP-44-Support-DLedger-Controller
However, currently RocketMQ tiered storage only supports single replicas.
Task
Currently, tiered storage only supports single replicas, and there are still the following issues in the integration with the high availability architecture:
- Metadata synchronization: how to reliably synchronize metadata between master and slave nodes.
- Disallowing message uploads beyond the confirm offset: to avoid message rollback, the maximum uploaded offset cannot exceed the confirm offset.
- Starting tiered storage upload when a slave changes to master, and stopping tiered storage upload when the master becomes a slave: only the master node has write and delete permissions, and after the slave node is promoted, it needs to quickly resume tiered storage uploads from the last breakpoint.
- Design of slave pull protocol: how a newly launched empty slave can properly synchronize data through the tiered storage architecture. (If synchronization is performed based on the first or last file, resumption of breakpoints may not be possible when switching again).
So you need to provide a complete plan to solve the above issues and ultimately complete the integration of tiered storage and high availability architecture, while verifying it through the existing tiered storage file version and OpenChaos testing.
Relevant Skills
- Interest in messaging middleware and distributed storage systems
- Java development skills
- Having a good understanding of RocketMQ tiered storage and high availability architecture
[GSoC] [RocketMQ] The performance tuning of RocketMQ proxy
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability.
Page: https://rocketmq.apache.org
Repo: https://github.com/apache/rocketmq
Background
RocketMQ 5.0 has released a new module called `proxy`, which supports gRPC and remoting protocol. Additionally, it can be deployed in two modes, namely Local and Cluster modes. The performance tuning task will provide contributors with a comprehensive understanding of Apache RocketMQ and its intricate data flow, presenting a unique opportunity for beginners to acquaint themselves with and actively participate in our community.
Task
The task is to tune the RocketMQ proxy for optimal performance in terms of latency and throughput. It requires a thorough knowledge of the Java implementation and the ability to fine-tune Netty, gRPC, the operating system, and RocketMQ itself. We anticipate that the developer responsible for this task will provide a performance report with measurements of both latency and throughput.
Relevant Skills
Basic knowledge of RocketMQ 5.0, Netty, gRPC, and operating systems.
Mailing List: dev@rocketmq.apache.org
Mentor
Zhouxiang Zhan, committer of Apache RocketMQ, zhouxzhan@apache.org
RocketMQ DLedger Controller Performance Optimization
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability.
Page: https://rocketmq.apache.org
Repo: https://github.com/apache/rocketmq
Background
RocketMQ 5.0 introduced a new component, the controller, which controls the high availability master-slave switch in multi-replica scenarios. It uses the DLedger Raft library as a consensus replication state machine for metadata. As a completely independent component, it can run normally in some scenarios, but in large-scale clusters it is necessary to maintain a large number of broker groups, which poses a great challenge for operational capabilities and wastes resources. When dealing with a large number of broker groups, we need to optimize performance in large-scale scenarios, leveraging the high-performance writing of DLedger itself and performing some optimization of the current Controller architecture.
Task
1. Polish the usage of DLedger
Currently, on the Controller side, a single-threaded task queue is used for read and write requests to DLedger, meaning only one read/write request can be processed at a time. DLedger itself implements many optimizations for multi-client reads and writes and can ensure linearizable reads. However, all read and write processing is currently performed through a single logical DLedger client, which will become a serious performance bottleneck in large-scale scenarios.
2. Optimization of DLedger features usage
DLedger itself can implement many optimizations, such as ReadIndex and FollowerRead reads. Once implemented, we can fully leverage read performance. Currently, all Broker nodes communicate with the Leader node of the Controller. In large-scale scenarios, this will cause the requests of each Controller group to be concentrated on the Leader node, and the other Follower nodes will not share the Leader's request processing, which will cause single-point performance bottlenecks for the Leader.
3. Full asynchronous + parallel processing
Currently, DLedger itself is fully asynchronous, but on the Controller side all requests to DLedger are synchronous, and many Controller-side operations, such as heartbeat checks and other timed tasks, are performed synchronously in a single thread. In large-scale scenarios, the logic of these single-threaded synchronous operations will block a large number of requests from the Broker side, so asynchronous + parallel processing can be used for optimization (a minimal asynchronous sketch follows the task descriptions below).
4. Correctness testing and performance testing
After completing the above optimizations, it is necessary to conduct correctness testing on the new version and use distributed chaos testing frameworks such as OpenChaos to verify correct operation under fault scenarios such as network partition and random crashes.
After completing the correctness testing, a detailed performance testing report can be produced by comparing the new and old versions.
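For item 3 above, a minimal sketch of the general direction (illustrative only; `checkHeartbeat` and the broker list are placeholders, not actual Controller classes): fan out per-broker checks with CompletableFuture so that a single slow broker no longer blocks the others.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncHeartbeatSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Placeholder for the real per-broker heartbeat/liveness check.
    private void checkHeartbeat(String brokerAddr) { /* ... */ }

    public void checkAll(List<String> brokers) {
        List<CompletableFuture<Void>> futures = brokers.stream()
                .map(b -> CompletableFuture.runAsync(() -> checkHeartbeat(b), pool))
                .collect(Collectors.toList());
        // Wait for all checks to finish instead of serializing them in one thread.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
}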
Skills Required
- Strong interest in message middleware and distributed storage systems
- Proficient in Java development
- In-depth understanding of distributed consensus algorithms
- In-depth understanding of the high-availability module of RocketMQ and the DLedger library
- Understanding of distributed chaos testing and performance testing.
[GSoC] RocketMQ TieredStore Integration with HDFS
Github Issue: https://github.com/apache/rocketmq/issues/6282
Apache RocketMQ and HDFS
- Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
- Hadoop Distributed File System (HDFS) is a distributed file system designed to store and manage large data sets across multiple servers or clusters. HDFS provides a reliable, scalable, and fault-tolerant platform for storing and accessing data that can be accessed by a variety of applications running on the hadoop cluster.
Background
High-speed storage media, such as solid-state drives (SSDs), are typically more expensive than traditional hard disk drives (HDDs). To minimize storage costs, the local data disk size of a RocketMQ broker is often limited. HDFS can store large amounts of data at a lower cost, and it has better support for storing and retrieving data sequentially rather than randomly. In order to preserve message data over a long period or facilitate message export, the RocketMQ project previously introduced a tiered storage plugin. Now it is necessary to implement a storage plugin to save data on HDFS.
Relevant Skills
- Interest in messaging middleware and distributed storage systems
- Java development skills
- Having a good understanding of the RocketMQ and HDFS models
Anyways, the most important relevant skill is motivation and readiness to learn during the project!
Tasks
- Understand the basic concepts and principles in distributed systems
- Provide related design documents
- Develop a plugin that uses HDFS as the backend storage to store RocketMQ message data (see the sketch after this list)
- Write effective unit test code
- *Suggest improvements to the tiered storage interface
- *Whatever comes to your mind; further ideas are always welcome
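As a rough sketch of the storage side of such a plugin (a hypothetical helper class for illustration, not the actual RocketMQ tiered storage SPI defined in RIP-57), writing and reading segment data with the Hadoop FileSystem API might look like this:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical segment store, NOT the real RocketMQ tiered storage interface.
public class HdfsSegmentStore {
    private final FileSystem fs;
    private final Path baseDir;

    public HdfsSegmentStore(String hdfsUri, String baseDir) throws Exception {
        this.fs = FileSystem.get(URI.create(hdfsUri), new Configuration());
        this.baseDir = new Path(baseDir);
    }

    // Write a whole segment file (e.g. a commitlog segment) to HDFS.
    public void writeSegment(String segmentName, byte[] data) throws Exception {
        try (FSDataOutputStream out = fs.create(new Path(baseDir, segmentName), true)) {
            out.write(data);
        }
    }

    // Read a byte range back, e.g. to serve a pull request for old messages.
    public byte[] readSegment(String segmentName, long offset, int length) throws Exception {
        byte[] buffer = new byte[length];
        try (FSDataInputStream in = fs.open(new Path(baseDir, segmentName))) {
            in.readFully(offset, buffer);
        }
        return buffer;
    }
}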
Learning Material
- RocketMQ HomePage (https://rocketmq.apache.org) Github: https://github.com/apache/rocketmq
- RocketMQ Tiered Storage Design (https://github.com/apache/rocketmq/wiki/RIP-57-Tiered-storage-for-RocketMQ)
- HDFS HomePage (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html)
Name and contact information
- Mentor: Zhimin Li, Apache RocketMQ Committer, lizhimin@apache.org
- Mailing List: dev@rocketmq.apache.org
EventMesh
Apache EventMesh Integrate eventmesh runtime on Kubernetes
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
Currently, EventMesh has good usability in microservice scenarios. However, EventMesh's support for Kubernetes is still relatively weak. We hope the community can contribute to EventMesh's integration with Kubernetes.
Task
1. Discuss your implementation ideas with the mentors
2. Learn the details of the Apache EventMesh project
3. Integrate EventMesh with Kubernetes
Recommended Skills
1. Familiar with Java
2. Familiar with Kubernetes
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh Optimize the event-bridge on EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3494
Background
Through EventMesh's event-bridge feature, we can connect data to heterogeneous data storage. We hope that the community can optimize the current event-bridge capability of EventMesh to realize the data connection of different event stores.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Verify the ability of different EventMesh cluster instances to synchronize data, sort out the corresponding verification step documents, and optimize the current EventMesh bridge features
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh EventMesh official website docs by version and demo show
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3488
Background
We hope that the community can contribute to the maintenance of documents, including the archiving of Chinese and English content of documents for different release versions, the maintenance of official website documents, the improvement of project quick start documents, feature introductions, etc.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Improve and supplement the content of documents on GitHub, maintain official website documents, record an EventMesh quick user experience, and feature display videos
Recommended Skills
1. Familiar with Markdown
2. Familiar with Java/Go
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Doris
[GSoC][Doris] Page Cache Improvement
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris accelerates high-concurrency queries utilizing page cache, where the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which reveals a few problems:
- Hot data will be phased out in large queries
- The page cache configuration is immutable and does not support GC.
Task
- Phase One: Identify the impacts on queries when the decompressed data is stored in memory and SSD, respectively, and then determine whether full page cache is required.
- Phase Two: Improve the cache strategy for Apache Doris based on the results from Phase One.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, yangyongqiang@apache.org
- Mentor: Haopeng Li, Apache Doris PMC member & Committer, lihaopeng@apache.org
- Mailing List: dev@doris.apache.org
[GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris supports acceleration of queries on external data sources to meet users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to Apache Doris based on a unified framework.
Objective
- Enable Apache Doris to access one or more of these data sources via the Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
- Compile relevant documentation. See an example here: https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive
Task
Phase One:
- Get familiar with the Multi-Catalog structure of Apache Doris, including the metadata synchronization mechanism in FE and the data reading mechanism of BE.
- Investigate how metadata should be acquired and how data access works regarding the picked data source(s); produce the corresponding design documentation.
Phase Two:
- Develop connections to the picked data source(s) and implement access to metadata and data.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mailing List: dev@doris.apache.org
[GSoC][Doris] Dictionary Encoding Acceleration
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding will be implemented on string data types by default. The dictionary size of a column for one segment is 1M at most. The dictionary encoding technology accelerates string processing during queries, converting strings into INT, for example.
Task
- Phase One: Get familiar with the implementation of Apache Doris dictionary encoding; learn how Apache Doris dictionary encoding accelerates queries.
- Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mailing List: dev@doris.apache.org
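As a rough illustration of the dictionary encoding idea behind the Doris task above (purely illustrative, not Doris internals): each distinct string in a column segment is mapped to a small integer code, so queries can compare and group by codes instead of strings.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryEncodingSketch {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Encode a string into its dictionary code, assigning a new code on first sight.
    int encode(String value) {
        return codes.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }

    // Decode a code back to the original string.
    String decode(int code) {
        return values.get(code);
    }
}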
SkyWalking
[GSOC] [SkyWalking] Self-Observability of the query subsystem in BanyanDB
Background
SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store Metrics, Tracing and Logging data.
Objectives
- Support EXPLAIN[1] for both measure query and stream query
- Add self-observability including trace and metrics for query subsystem
- Support EXPLAIN in the client SDK & CLI and add query plan visualization in the UI
[1]: EXPLAIN in MySQL
Recommended Skills
- Familiar with Go
- Have a basic understanding of database query engine
- Have experience with Apache SkyWalking or other APMs
Mentor
- Mentor: Jiajing Lu, Apache SkyWalking PMC, lujiajing@apache.org
- Mentor: Hongtao Gao, Apache SkyWalking PMC, Apache ShardingSphere PMC, hanahmily@apache.org
- Mailing List: dev@skywalking.apache.org
Apache EventMesh Optimize the eventmesh-admin of EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3495
Background
At present, eventmesh-admin provides a management interface for EventMesh storage, but it only implements the management function for RocketMQ, which needs to be further expanded. At the same time, it could provide a CLI for users to quickly start and experience EventMesh.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Implement the management interface for other eventmesh storage
4. Support a CLI to quickly start EventMesh
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
[GSOC] [SkyWalking] Unify query planner and executor in BanyanDB
Background
SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store Metrics, Tracing and Logging data.
Objectives
- Fully unify/merge the query planner and executor for Measure and TopN
Recommended Skills
- Familiar with Go
- Have a basic understanding of database query engine
- Have experience with Apache SkyWalking
Mentor
Apache EventMesh Support source/sink connectors on EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3492
Background
Through EventMesh's source/sink connectors, we can connect data to heterogeneous data storage. We hope that the community can provide source/sink connector capabilities, such as connecting RocketMQ data to different RocketMQ clusters or Kafka clusters.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Implement one of the source/sink connectors based on the above background
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
[GSOC][SkyWalking] Add Terraform provider for Apache SkyWalking
Now the deployment methods for SkyWalking are limited: we only have a Helm Chart for users to deploy in Kubernetes, so users that are not using Kubernetes have to do all the housekeeping stuff to set up SkyWalking on, for example, VMs.
This issue aims to add a Terraform provider, so that users can conveniently spin up a cluster for demonstration or testing. We should then evolve the provider, allow users to customize it to their needs, and finally let them use it in their production environments.
In this task, we will mainly focus on support for AWS. In the Terraform provider, users need to provide their access key / secret key, and the provider does the rest: create VMs, create the database/OpenSearch or RDS, download the SkyWalking tars, configure SkyWalking, start the SkyWalking components (OAP/UI), create public IPs/domain names, etc.
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Flink Integration)
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will proceed with the log clustering implementation with a revised architecture, and this task will require the student to focus on Flink and its integration with the SkyWalking OAP.
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Algorithm Optimization)
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will proceed with the log clustering implementation with a revised architecture, and this task will require the student to focus on algorithm optimization for the clustering technique.
StreamPipes
Improving End-to-End Test Infrastructure of Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly over the past few years, with new features and contributors joining the project. However, as the project continues to evolve, e2e test coverage must also be improved to ensure that all features remain functional. Modern frameworks, such as Cypress, make it quite easy and fun to automatically test even complex application functionalities. As StreamPipes approaches its 1.0 release, it is important to improve e2e testing to ensure the robustness of the project and its use in real-world scenarios.
Tasks
- [ ] Write e2e tests using Cypress to cover most functionalities and user interface components of StreamPipes.
- [ ] Add more complex testing scenarios to ensure the reliability and robustness of StreamPipes in real-world use cases (e.g. automated tests for version updates)
- [ ] Add e2e tests for the new Python client to ensure its integration with the main system and its functionalities (#774)
- [ ] Document the testing infrastructure and the testing approach to allow for easy maintenance and future contributions.
❗ ***Important Note*** ❗
Do not create any account on behalf of Apache StreamPipes in Cypress or using the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- Familiarity with testing frameworks, such as Cypress or Selenium
- Experience with TypeScript or Java
- Basic knowledge of Angular is helpful
- Familiarity with Docker and containerization is a plus
Learning Material
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Philipp Zehnder
email: zehnder[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
ShenYu
Apache ShenYu Gsoc 2023 - Design license scanning function
Background
At present, shenyu needs to manually check whether the license is correct one by one when releasing the version.
Tasks
- Discuss with the tutor to complete the requirement design and technical design of the scanning license.
- Finish the initial version of license scanning.
- Complete the corresponding test.
Relevant Skills
- Familiar with Java.
Apache ShenYu Gsoc 2023 - Shenyu-Admin Internationalization
Background
Shenyu is a native API gateway for service proxy, protocol translation and API governance. It can manage and maintain the API through Shenyu-admin, and it supports internationalization in Chinese and English. Unfortunately, Shenyu-admin is only internationalized on the front end. The message prompts returned by the back-end interface are still in English. Therefore, we need to implement internationalization support for the back-end interface. This will lay a good foundation for ShenYu to move towards more language support.
Relevant skills
- Spring resources
- Spring internationalization
- Front-end React framework
API reference
java.util.Locale;
org.springframework.context.MessageSource;
org.springframework.context.support.ResourceBundleMessageSource;
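A minimal sketch of resolving a locale-specific message with the classes listed above (the bundle basename and message key are illustrative):

import java.util.Locale;
import org.springframework.context.MessageSource;
import org.springframework.context.support.ResourceBundleMessageSource;

public class I18nSketch {
    public static void main(String[] args) {
        ResourceBundleMessageSource messageSource = new ResourceBundleMessageSource();
        messageSource.setBasenames("messages");     // messages_en.properties, messages_zh_CN.properties
        messageSource.setDefaultEncoding("UTF-8");

        MessageSource source = messageSource;
        // The Locale would be derived from the request, e.g. the Location header in the example below.
        System.out.println(source.getMessage("token.error", null, Locale.ENGLISH));
        System.out.println(source.getMessage("token.error", null, Locale.SIMPLIFIED_CHINESE));
    }
}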
Interface effect example
## zh request example
POST http://localhost:9095/plugin
Content-Type: application/json
Location: cn-zh
X-Access-Token: xxx

{ "name": "test-create-plugin", "role": "test-create-plugin", "enabled": true, "sort": 100 }

Response
{ "code": 600, "message": "未登录" }

## en request example
POST http://localhost:9095/plugin
Content-Type: application/json
Location: en
X-Access-Token: xxx

{ "name": "test-create-plugin", "role": "test-create-plugin", "enabled": true, "sort": 100 }

Response
{ "code": 600, "message": "token is error" }
Task List
- Discuss with the tutor how to achieve the internationalization of the shenyu-admin backend
- Translate some prompt messages
- Connect with the front-end internationalization: obtain the client region information through the HTTP protocol and support the language of the corresponding region.
- Leave an extension interface for other languages' internationalization support, so as to facilitate localization by subsequent users.
OPC-UA browser for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we were graduated as an Apache top level project in December 2022. We will of course continue developing new features and never rest to make StreamPipes even more amazing.
StreamPipes really shines when connecting Industrial IoT data. Such data sources typically originate from machine controllers, called PLCs (e.g., Siemens S7). But there are also newer protocols such as OPC-UA which allow browsing the available data within the controller. Our goal is to make connectivity of industrial data sources a matter of minutes.
Currently, data sources can be connected using the built-in module `StreamPipes Connect` from the UI. We provide a set of adapters for popular protocols that can be customized, e.g., connection details can be added.
To make it even easier to connect industrial data sources with StreamPipes, we plan to add an OPC-UA browser. This will be part of the entry page of StreamPipes connect and should allow users to enter connection details of an existing OPC-UA server. Afterwards, a new view in the UI shows available data nodes from the server, their status and current value. Users should be able to select values that should be part of a new adapter. Afterwards, a new adapter can be created by reusing the current workflow to create an OPC-UA data source.
This is a really cool project for participants interested in full-stack development who would like to get a deeper understanding of industrial IoT protocols. Have fun!
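A first sketch of the data model such a browser could expose for each discovered node (field names are illustrative and would be refined during the project):

import java.util.List;

// Hypothetical DTO describing one browsable OPC-UA node, e.g. as returned by a backend REST endpoint.
public class OpcUaBrowseNode {
    public String nodeId;                    // e.g. "ns=2;s=Machine1.Temperature"
    public String displayName;
    public String dataType;                  // e.g. "Double"
    public Object currentValue;              // last sampled value, if readable
    public boolean selectable;               // whether the user may include it in a new adapter
    public List<OpcUaBrowseNode> children;   // child nodes for hierarchical browsing
}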
Tasks
- [ ] get familiar with the OPC-UA protocol
- [ ] develop mockups which demonstrate the user workflow
- [ ] develop a data model for discovering data from OPC-UA
- [ ] create the backend business logic for the OPC-UA browser
- [ ] create the frontend views to asynchronously browse data and to create a new adapter
- [ ] write Junit, Component and E2E tests
- [ ] Whatever comes to your mind 💡 further ideas are always welcome
Relevant Skills
- Interest in Industrial IoT and protocols such as OPC-UA
- Java development skills
- Angular/Typescript development skills
Anyways, the most important relevant skill is motivation and readiness to learn during the project!
Learning Material
- StreamPipes documentation (https://streampipes.apache.org/docs/docs/user-guide-introduction.html)
- Our current OPC-UA adapter (https://github.com/apache/streampipes/tree/dev/streampipes-extensions/streampipes-connect-adapters-iiot/src/main/java/org/apache/streampipes/connect/iiot/adapters/opcua)
- Eclipse Milo, which we currently use for OPC-UA connectivity (https://github.com/eclipse/milo)
- Apache PLC4X, which has an API for browsing (https://plc4x.apache.org/)
Reference
Github issue can be found here
Name and contact information
- Mentor: Dominik Riemer (riemer[at]apache.org)
- Mailing list: dev[at]streampipes.apache.org
- Website: streampipes.apache.org
Apache ShenYu Gsoc 2023 - ShenYu End-To-End SpringCloud plugin test case
Background:
ShenYu is a native API gateway for service proxy, protocol translation and API governance, but ShenYu lacks End-To-End tests.
Relevant skills:
1. Understand the architecture of ShenYu
2. Understand SpringCloud micro-services and the ShenYu SpringCloud proxy plugin.
3. Understand the ShenYu e2e framework and architecture.
How to code
1. Please refer to org.apache.shenyu.e2e.testcase.plugin.DividePluginCases (a simplified sketch of such a check follows below)
How to test
1. Start shenyu-admin in Docker
2. Start shenyu-bootstrap in Docker
3. Run the test case org.apache.shenyu.e2e.testcase.plugin.PluginsTest#testDivide
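For orientation, here is a minimal JUnit 5 sketch of the kind of assertion such an end-to-end test makes. It deliberately does not use the ShenYu e2e framework referenced above; the gateway port (9195) and the route path are assumptions for illustration only, and the containers from the steps above are assumed to be running.
```java
// A plain JUnit 5 sketch (NOT the ShenYu e2e framework) of an end-to-end check for the
// SpringCloud plugin: call a route through the gateway and assert the backend answers.
// The port and path below are assumptions for illustration.
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class SpringCloudPluginE2ESketch {

    private final HttpClient client = HttpClient.newHttpClient();

    @Test
    void requestThroughGatewayReachesSpringCloudService() throws Exception {
        // Assumes shenyu-bootstrap listens on 9195 and a SpringCloud service is
        // registered under /springcloud/** in shenyu-admin.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9195/springcloud/hello"))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The proxied backend should answer, proving the plugin routed the call.
        assertEquals(200, response.statusCode());
    }
}
```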
Task List
1. Develop e2e tests of the SpringCloud plugin.
2. Write ShenYu e2e SpringCloud plugin documentation in shenyu-website.
3. Refactor the existing plugin test cases.
Links:
website: https://shenyu.apache.org/
issues: https://github.com/apache/shenyu/issues/4474
Name and contact information (Apache StreamPipes)
- Mentor: Dominik Riemer (riemer[at]apache.org).
- Mailing list: (dev[at]streampipes.apache.org)
- Website: streampipes.apache.org
Apache ShenYu Gsoc 2023 - Support for Kubernetes Service Discovery
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Support the registration of microservices deployed in K8s Pod to shenyu-admin and use K8s as the register center.
2. Discuss with mentors, and complete the requirements design and technical design of Shenyu K8s Register Center.
3. Complete the initial version of Shenyu K8s Register Center.
4. Complete the CI test of Shenyu K8s Register Center, verify the correctness of the code.
5. Write the necessary documentation, deployment guides, and instructions for users to connect microservices running inside the K8s Pod to ShenYu
Relevant Skills
1. Know the use of Apache ShenYu, especially the register center
2. Familiar with Java and Golang
3. Familiar with Kubernetes and can use Java or Golang to develop
Code Insights for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing. Since we are approaching our `1.0` release at full stream, we also want the project to get more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.
Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (Backend, UI, Python library) and report the coverage to codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock in the current test coverage as a lower threshold that we always want to meet (meaning CI builds fail in case coverage drops, etc.). Over time we can then increase the required coverage level step by step.
Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.
Tasks
- [ ] calculate test coverage for all main parts of the repo
- [ ] send coverage to codeCov
- [ ] determine coverage threshold and let CI fail if below
- [ ] include sonarcloud in CI setup
- [ ] include automatic coverage report in PR validation (see an example here ) -> optional
- [ ] include automatic sonarcloud report in PR validation -> optional
- [ ] whatever comes to your mind 💡 further ideas are always welcome
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes on SonarCloud or CodeCov, and do not use the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- basic knowledge about GitHub workflows
Learning Material
- GitHub workflow docs
- Apache StreamPipes workflows
- Sonarcloud for Monorepos
- Using code cov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Tim Bossenmaier
email: bossenti[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
Apache ShenYu Gsoc 2023 - Design and implement shenyu ingress-controller in k8s
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Discuss with mentors, and complete the requirements design and technical design of shenyu-ingress-controller.
2. Complete the initial version of shenyu-ingress-controller, implement the reconciliation of the k8s Ingress API, and make ShenYu the ingress gateway of k8s.
3. Complete the ci test of shenyu-ingress-controller, verify the correctness of the code.
Relevant Skills
1. Know the use of Apache ShenYu
2. Familiar with Java and Golang
3. Familiar with Kubernetes and can use java or golang to develop Kubernetes Controller
Description
Issues : https://github.com/apache/shenyu/issues/4438
website : https://shenyu.apache.org/
...
ShardingSphere
Apache EventMesh Support source/sink connectors on EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3492
Background
Through eventmesh's source/sink connectors, we can connect data to heterogeneous data storage. We hope that the community can provide source/sink connector capabilities, such as connecting rocketmq data to different rocketmq clusters or kafka clusters.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Implement one of the source/sink connectors based on the above background
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache ShardingSphere Add ShardingSphere Kafka source connector
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
The community just added the CDC (change data capture) feature recently. The change feed will be published on the created network connection after logging in, and then it can be consumed.
Since Kafka is a popular distributed event streaming platform, it's useful to import the change feed into Kafka for later processing.
Task
- Get familiar with ShardingSphere CDC client usage, create a publication and subscribe to the change feed.
- Get familiar with Kafka connector development, develop a source connector, and integrate it with ShardingSphere CDC. Persist the change feed to Kafka topics properly (see the sketch after this section).
- Add unit tests and an E2E integration test.
Relevant Skills
1. Java language
2. Basic knowledge of CDC and Kafka
3. Maven
References
- https://github.com/apache/shardingsphere/issues/22500
- https://kafka.apache.org/documentation/#connect_development
- https://github.com/apache/kafka/tree/trunk/connect/file/src
- https://github.com/confluentinc/kafka-connect-jdbc
Mentor
Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org
Xinze Guo, Committer of Apache ShardingSphere, azexin@apache.org
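As a pointer for the Kafka connector task above, here is a minimal sketch of a Kafka Connect SourceTask. The ShardingSphere CDC side is only stubbed out with an in-memory queue and comments; it is not the real CDC client API, just an illustration of where change events would be turned into SourceRecords.
```java
// Minimal Kafka Connect SourceTask sketch. The CDC subscription is a stand-in
// (a queue that a real connector would feed from the ShardingSphere CDC client).
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class ShardingSphereCdcSourceTask extends SourceTask {

    private String topic;
    // Stand-in for the CDC subscription: a real connector would log in to the
    // ShardingSphere CDC endpoint and push received change events into this queue.
    private final BlockingQueue<String> changeFeed = new LinkedBlockingQueue<>();
    private final AtomicLong position = new AtomicLong();

    @Override
    public void start(Map<String, String> props) {
        topic = props.get("topic");
        // TODO: create the CDC subscription here and feed changeFeed asynchronously.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        String change = changeFeed.poll(500, TimeUnit.MILLISECONDS);
        if (change == null) {
            return Collections.emptyList();
        }
        // sourcePartition/sourceOffset let Kafka Connect resume from the last position.
        SourceRecord record = new SourceRecord(
                Collections.singletonMap("source", "shardingsphere-cdc"),
                Collections.singletonMap("position", position.incrementAndGet()),
                topic,
                Schema.STRING_SCHEMA,
                change);
        return Collections.singletonList(record);
    }

    @Override
    public void stop() {
        // Nothing to close in this sketch; a real task would shut down the CDC client.
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}
```
A matching SourceConnector would pass the topic and CDC connection settings to this task via the `props` map.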
Apache EventMesh EventMesh official website docs by version and demo show
Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3488
Background
We hope that the community can contribute to the maintenance of the documentation, including archiving the Chinese and English content of documents for different release versions, maintaining the official website documents, improving the project quick start documents, introducing features, etc.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Improve and supplement the documents on GitHub, maintain the official website documents, and record quick-start experience and feature demonstration videos for EventMesh
Recommended Skills
1. Familiar with Markdown
2. Familiar with Java/Go
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh Optimize the eventmesh-admin of EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3495
Background
At present, eventmesh-admin provides a management interface for eventmesh storage, but it only implements the management function for rocketmq, which needs to be further expanded. At the same time, a CLI could be provided for users to quickly start and experience EventMesh.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Implement the management interface for other eventmesh storage
4. Support a CLI for quickly starting eventmesh
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh Optimize the event-bridge on EventMesh
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3494
Background
Through eventmesh's event bridge feature, we can connect data to heterogeneous data storage. We hope that the community can optimize the current event-bridge capability of EventMesh to realize data connection across different event stores.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Verify the ability of different EventMesh cluster instances to synchronize data, sort out the corresponding verification step documents, and optimize the current EventMesh bridge features
Recommended Skills
1. Familiar with Java
2. Familiar with MQ is better
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Background
The ShardingSphere SQL federation engine provides support for complex SQL statements, and it can well support cross-database join queries, subqueries, aggregation queries and other statements. An important part of the SQL federation engine is converting the SQL statement parsed by ShardingSphere into a Calcite SqlNode, so that Calcite can be used to implement SQL optimization and federated queries.
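For readers unfamiliar with Calcite, the following standalone sketch shows what a SqlNode is: Calcite parses a MySQL statement into a SqlNode and unparses it back with the MySQL dialect. It is only an illustration of the target representation, not the SQLNodeConverterEngine itself (which builds the SqlNode from ShardingSphere's own parsed statement), and the sample SQL is made up.
```java
// Standalone illustration: parse a statement into a Calcite SqlNode and print it back.
// Comparing such unparsed output against expected SQL is essentially what the
// converter test cases below do.
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.MysqlSqlDialect;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class SqlNodeIllustration {
    public static void main(String[] args) throws SqlParseException {
        SqlParser parser = SqlParser.create(
                "SELECT order_id, SUM(price) FROM t_order GROUP BY order_id HAVING SUM(price) > 10");
        SqlNode sqlNode = parser.parseQuery();
        // Unparse the tree with the MySQL dialect.
        System.out.println(sqlNode.toSqlString(MysqlSqlDialect.DEFAULT).getSql());
    }
}
```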
Task
This issue is to solve the MySQL exception that occurs during SQLNodeConverterEngine conversion. The specific case list is as follows.
- select_char
- select_extract
- select_from_dual
- select_from_with_table
- select_group_by_with_having_and_window
- select_not_between_with_single_table
- select_not_in_with_single_table
- select_substring
- select_trim
- select_weight_string
- select_where_with_bit_expr_with_ampersand
- select_where_with_bit_expr_with_caret
- select_where_with_bit_expr_with_div
- select_where_with_bit_expr_with_minus_interval
- select_where_with_bit_expr_with_mod
- select_where_with_bit_expr_with_mod_sign
- select_where_with_bit_expr_with_plus_interval
- select_where_with_bit_expr_with_signed_left_shift
- select_where_with_bit_expr_with_signed_right_shift
- select_where_with_bit_expr_with_vertical_bar
- select_where_with_boolean_primary_with_comparison_subquery
- select_where_with_boolean_primary_with_is
- select_where_with_boolean_primary_with_is_not
- select_where_with_boolean_primary_with_null_safe
- select_where_with_expr_with_and_sign
- select_where_with_expr_with_is
- select_where_with_expr_with_is_not
- select_where_with_expr_with_not
- select_where_with_expr_with_not_sign
- select_where_with_expr_with_or_sign
- select_where_with_expr_with_xor
- select_where_with_predicate_with_in_subquery
- select_where_with_predicate_with_regexp
- select_where_with_predicate_with_sounds_like
- select_where_with_simple_expr_with_collate
- select_where_with_simple_expr_with_match
- select_where_with_simple_expr_with_not
- select_where_with_simple_expr_with_odbc_escape_syntax
- select_where_with_simple_expr_with_row
- select_where_with_simple_expr_with_tilde
- select_where_with_simple_expr_with_variable
- select_window_function
- select_with_assignment_operator
- select_with_assignment_operator_and_keyword
- select_with_case_expression
- select_with_collate_with_marker
- select_with_date_format_function
- select_with_exists_sub_query_with_project
- select_with_function_name
- select_with_json_value_return_type
- select_with_match_against
- select_with_regexp
- select_with_schema_name_in_column_projection
- select_with_schema_name_in_shorthand_projection
- select_with_spatial_function
- select_with_trim_expr
- select_with_trim_expr_from_expr
You need to compare the differences between the actual and expected results, and then correct the logic in SQLNodeConverterEngine so that the actual result is consistent with the expected one.
After you make changes, remember to add the case to SUPPORTED_SQL_CASE_IDS to ensure it can be tested.
Notice, this PR can be a good example: https://github.com/apache/shardingsphere/pull/14492
Apache EventMesh Integrate eventmesh runtime on Kubernetes
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
Currently, EventMesh has good usability in microservice scenarios. However, EventMesh's support for Kubernetes is still relatively weak. We hope the community can contribute to EventMesh's integration with Kubernetes.
Task
1. Discuss your implementation idea with the mentors
2. Learn the details of the Apache EventMesh project
3. Integrate EventMesh with Kubernetes
Recommended Skills
1. Familiar with Java
2. Familiar with Kubernetes
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Relevant Skills
1. Master JAVA language
2. Have a basic understanding of Antlr g4 file
3. Be familiar with MySQL and Calcite SqlNode
Targets files
SQLNodeConverterEngineIT
Mentor
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
TrafficControl
Add server indicator if a server is a cache
https://github.com/apache/trafficcontrol/issues/7076
Apache ShardingSphere Enhance ComputeNode reconciliation
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the new CRDs Cluster and ComputeNode as below:
- WIP: [New Feature] Introduce new CRD Cluster #167
- [Feat] Introduce new CRD as ComputeNode for better usability #166
Currently we are trying to promote ComputeNode as the major CRD to represent a special ShardingSphere Proxy deployment, and we plan to use Cluster to indicate a special ShardingSphere Proxy cluster.
Task
This issue is to enhance ComputeNode reconciliation availability. The specific case list is as follows.
- Add IT test case for Deployment spec volume
- Add IT test case for Deployment spec template init containers
- Add IT test case for Deployment spec template spec containers
- Add IT test case for Deployment spec volume mounts
- Add IT test case for Deployment spec container ports
- Add IT test case for Deployment spec container image tag
- Add IT test case for Service spec ports
- Add IT test case for ConfigMap data serverconfig
- Add IT test case for ConfigMap data logback
Notice, this issue can be a good example:
- chore: add more Ginkgo tests for ComputeNode #203
Relevant Skills
- Master Go language, Ginkgo test framework
- Have a basic understanding of Apache ShardingSphere Concepts
- Be familiar with Kubernetes Operator, kubebuilder framework
Targets files
ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
GSOC Varnish Cache support in Apache Traffic Control
Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.
Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.
There are multiple aspects to this project:
- Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
- Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
- Testing: Adding automated tests for new code
Skills:
- Proficiency in Go is required
- A basic knowledge of HTTP and caching is preferred, but not required for this project.
StreamPipes
Apache ShardingSphere Add the feature of switching logging framework
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere provides two adapters: ShardingSphere-JDBC and ShardingSphere-Proxy.
Now, ShardingSphere uses logback for logging, but consider the following situations:
- Users may need to switch the logging framework to meet special needs; for example, log4j2 can provide better asynchronous performance;
- When using the JDBC adapter, the user application may not use logback, which may cause some conflicts.
Why doesn't the log facade suffice? Because ShardingSphere provides users with clustered logging configurations (such as changing the log level online), this requires dynamic construction of loggers, which cannot be achieved with the log facade alone.
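As one possible shape for the SPI in task 1 below, here is a minimal sketch based on the JDK ServiceLoader. The interface name, its methods, and the idea of selecting a provider by a type string coming from the logging rule are illustrative assumptions, not the final design.
```java
// Minimal sketch of a pluggable logging SPI selected via java.util.ServiceLoader.
// Each logging framework (logback, log4j2, ...) would ship one provider implementation
// registered in META-INF/services.
import java.util.ServiceLoader;

// SPI contract (names are assumptions for illustration).
interface ShardingSphereLoggerFactory {
    String type();                                     // e.g. "LOGBACK" or "LOG4J2", chosen via the logging rule
    void setLevel(String loggerName, String level);    // dynamic reconfiguration, e.g. from cluster events
    void log(String loggerName, String level, String message);
}

final class LoggerFactoryRegistry {
    // Picks the provider whose type matches the configured logging rule.
    static ShardingSphereLoggerFactory load(String configuredType) {
        for (ShardingSphereLoggerFactory factory : ServiceLoader.load(ShardingSphereLoggerFactory.class)) {
            if (factory.type().equalsIgnoreCase(configuredType)) {
                return factory;
            }
        }
        throw new IllegalStateException("No logging provider found for type: " + configuredType);
    }
}
```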
Task
1. Design and implement logging SPI to support multiple logging frameworks (such as logback and log4j2)
2. Allow users to choose which logging framework to use through the logging rule
Relevant Skills
1. Master JAVA language
2. Basic knowledge of logback and log4j2
3. Maven
Mentor
Longtao Jiang, Committer of Apache ShardingSphere, jianglongtao@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Apache ShardingSphere Support mainstream database metadata table query
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere has designed its own metadata database to simulate metadata queries that support various databases.
More details:
https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052
Task
- Support PostgreSQL And openGauss `\d tableName`
- Support PostgreSQL And openGauss `\d+`
- Support PostgreSQL And openGauss `\d+ tableName`
- Support PostgreSQL And openGauss `\l`
- Support query for MySQL metadata `TABLES`
- Support query for MySQL metadata `COLUMNS`
- Support query for MySQL metadata `schemata`
- Support query for MySQL metadata `ENGINES`
- Support query for MySQL metadata `FILES`
- Support query for MySQL metadata `VIEWS`
Notice, these issues can be a good example.
https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182
Relevant Skills
- Master JAVA language
- Have a basic understanding of Zookeeper
- Be familiar with MySQL/Postgres SQLs
Mentor
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
SkyWalking
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Algorithm Optimization)
Apache SkyWalking is an application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will proceed with the log clustering implementation with a revised architecture, and this task will require the student to focus on algorithm optimization for the clustering technique.
[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
Apache SkyWalking is an application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This task is about enhancing Python agent performance; the tracking issue can be seen here: https://github.com/apache/skywalking/issues/10408
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Flink Integration)
Apache SkyWalking is an application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will proceed with the log clustering implementation with a revised architecture, and this task will require the student to focus on Flink and its integration with SkyWalking OAP.
[GSOC] [SkyWalking] Self-Observability of the query subsystem in BanyanDB
Background
SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store Metrics, Tracing and Logging data.
Objectives
- Support EXPLAIN[1] for both measure query and stream query
- Add self-observability including trace and metrics for query subsystem
- Support EXPLAIN in the client SDK & CLI and add query plan visualization in the UI
[1]: EXPLAIN in MySQL
Recommended Skills
- Familiar with Go
- Have a basic understanding of database query engine
- Have an experience of Apache SkyWalking or other APMs
Mentor
- Mentor: Jiajing Lu, Apache SkyWalking PMC, lujiajing@apache.org
- Mentor: Hongtao Gao, Apache SkyWalking PMC, Apache ShardingSphere PMC, hanahmily@apache.org
- Mailing List: dev@skywalking.apache.org
[GSOC] [SkyWalking] Unify query planner and executor in BanyanDB
Background
SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store Metrics, Tracing and Logging data.
Objectives
- Fully unify/merge the query planner and executor for Measure and TopN
Recommended Skills
- Familiar with Go
- Have a basic understanding of database query engine
- Have an experience of Apache SkyWalking
Mentor
- Mentor: Jiajing Lu, Apache SkyWalking PMC, lujiajing@apache.org
- Mentor: Hongtao Gao, Apache SkyWalking PMC, Apache ShardingSphere PMC, hanahmily@apache.org
- Mailing List: dev@skywalking.apache.org
[GSOC][SkyWalking] Add Terraform provider for Apache SkyWalking
Currently the deployment methods for SkyWalking are limited: we only have a Helm Chart for users to deploy in Kubernetes, so users that are not using Kubernetes have to do all the housekeeping themselves to set up SkyWalking on, for example, VMs.
This issue aims to add a Terraform provider so that users can conveniently spin up a cluster for demonstration or testing. We should evolve the provider to allow users to customize it to their needs, so that they can eventually use it in their production environment.
In this task, we will mainly focus on support for AWS. In the Terraform provider, users need to provide their access key / secret key, and the provider does the rest: create VMs, create the database/OpenSearch or RDS, download the SkyWalking tars, configure SkyWalking, start the SkyWalking components (OAP/UI), create public IPs/domain names, etc.
Code Insights for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing. Since we are approaching our `1.0` release at full stream, we also want the project to get more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.
Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (Backend, UI, Python library) and report the coverage to codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock in the current test coverage as a lower threshold that we always want to meet (meaning CI builds fail in case coverage drops, etc.). Over time we can then increase the required coverage level step by step.
Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.
Tasks
- [ ] calculate test coverage for all main parts of the repo
- [ ] send coverage to codeCov
- [ ] determine coverage threshold and let CI fail if below
- [ ] include sonarcloud in CI setup
- [ ] include automatic coverage report in PR validation (see an example here ) -> optional
- [ ] include automatic sonarcloud report in PR validation -> optional
- [ ] whatever comes to your mind 💡 further ideas are always welcome
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes on SonarCloud or CodeCov, and do not use the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- basic knowledge about GitHub workflows
Learning Material
- GitHub workflow docs
- Apache StreamPipes workflows
- Sonarcloud for Monorepos
- Using code cov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Tim Bossenmaier
email: bossenti[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
OPC-UA browser for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing.
StreamPipes really shines when connecting Industrial IoT data. Such data sources typically originate from machine controllers called PLCs (e.g., Siemens S7). But there are also newer protocols such as OPC-UA which allow browsing the available data within the controller. Our goal is to make connectivity of industrial data sources a matter of minutes.
Currently, data sources can be connected using the built-in module `StreamPipes Connect` from the UI. We provide a set of adapters for popular protocols that can be customized, e.g., connection details can be added.
To make it even easier to connect industrial data sources with StreamPipes, we plan to add an OPC-UA browser. This will be part of the entry page of StreamPipes connect and should allow users to enter connection details of an existing OPC-UA server. Afterwards, a new view in the UI shows available data nodes from the server, their status and current value. Users should be able to select values that should be part of a new adapter. Afterwards, a new adapter can be created by reusing the current workflow to create an OPC-UA data source.
This is a really cool project for participants interested in full-stack development who would like to get a deeper understanding of industrial IoT protocols. Have fun!
Tasks
- [ ] get familiar with the OPC-UA protocol
- [ ] develop mockups which demonstrate the user workflow
- [ ] develop a data model for discovering data from OPC-UA
- [ ] create the backend business logic for the OPC-UA browser
- [ ] create the frontend views to asynchronously browse data and to create a new adapter
- [ ] write Junit, Component and E2E tests
- [ ] whatever comes to your mind 💡 further ideas are always welcome
Relevant Skills
- interest in Industrial IoT and protocols such as OPC-UA
- Java development skills
- Angular/Typescript development skills
Anyway, the most important skill is motivation and readiness to learn during the project!
Learning Material
- StreamPipes documentation (https://streampipes.apache.org/docs/docs/user-guide-introduction.html)
- our current OPC-UA adapter (https://github.com/apache/streampipes/tree/dev/streampipes-extensions/streampipes-connect-adapters-iiot/src/main/java/org/apache/streampipes/connect/iiot/adapters/opcua)
- Eclipse Milo, which we currently use for OPC-UA connectivity (https://github.com/eclipse/milo)
- Apache PLC4X, which has an API for browsing (https://plc4x.apache.org/)
Reference
Github issue can be found here: https://github.com/apache/streampipes/issues/1390
Name and contact information
ShenYu
Apache ShenYu Gsoc 2023 - Support for Kubernetes Service Discovery
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Support the registration of microservices deployed in K8s Pod to shenyu-admin and use K8s as the register center.
2. Discuss with mentors, and complete the requirements design and technical design of Shenyu K8s Register Center.
3. Complete the initial version of Shenyu K8s Register Center.
4. Complete the CI test of Shenyu K8s Register Center, verify the correctness of the code.
5. Write the necessary documentation, deployment guides, and instructions for users to connect microservices running inside the K8s Pod to ShenYu
Relevant Skills
1. Know the use of Apache ShenYu, especially the register center
2. Familiar with Java and Golang
3. Familiar with Kubernetes and can use Java or Golang to develop
Apache ShenYu Gsoc 2023 - Design and implement shenyu ingress-controller in k8s
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Discuss with mentors, and complete the requirements design and technical design of shenyu-ingress-controller.
2. Complete the initial version of shenyu-ingress-controller, implement the reconciliation of the k8s Ingress API, and make ShenYu the ingress gateway of k8s.
3. Complete the ci test of shenyu-ingress-controller, verify the correctness of the code.
Relevant Skills
1. Know the use of Apache ShenYu
2. Familiar with Java and Golang
3. Familiar with Kubernetes and can use java or golang to develop Kubernetes Controller
Description
Issues : https://github.com/apache/shenyu/issues/4438
website : https://shenyu.apache.org/
Improving End-to-End Test Infrastructure of Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly over the past few years, with new features and contributors joining the project. However, as the project continues to evolve, e2e test coverage must also be improved to ensure that all features remain functional. Modern frameworks, such as Cypress, make it quite easy and fun to automatically test even complex application functionalities. As StreamPipes approaches its 1.0 release, it is important to improve e2e testing to ensure the robustness of the project and its use in real-world scenarios.
Tasks
- [ ] Write e2e tests using Cypress to cover most functionalities and user interface components of StreamPipes.
- [ ] Add more complex testing scenarios to ensure the reliability and robustness of StreamPipes in real-world use cases (e.g. automated tests for version updates)
- [ ] Add e2e tests for the new Python client to ensure its integration with the main system and its functionalities (#774: https://github.com/apache/streampipes/issues/774)
- [ ] Document the testing infrastructure and the testing approach to allow for easy maintenance and future contributions.
❗ ***Important Note*** ❗
Do not create any account on behalf of Apache StreamPipes in Cypress or using the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- Familiarity with testing frameworks, such as Cypress or Selenium
- Experience with TypeScript or Java
- Basic knowledge of Angular is helpful
- Familiarity with Docker and containerization is a plus
Learning Material
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Philipp Zehnder
email: zehnder[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
ShardingSphere
Apache ShenYu Gsoc 2023 - Design license scanning function
Background
At present, ShenYu needs to manually check whether each license is correct, one by one, when releasing a version.
Tasks
- Discuss with the tutor to complete the requirement design and technical design of the license scanning.
- Finish the initial version of the license scanning.
- Complete the corresponding test.
Relevant Skills
- Familiar with Java.
Apache ShenYu Gsoc 2023 - Shenyu-Admin Internationalization
Background
Shenyu is a native API gateway for service proxy, protocol translation and API governance. It can manage and maintain APIs through Shenyu-admin, and supports internationalization in Chinese and English. Unfortunately, Shenyu-admin is only internationalized on the front end; the message prompts returned by the back-end interface are still in English. Therefore, we need to implement internationalization support for the back-end interface. This will lay a good foundation for Shenyu to move towards more language support.
Relevant skills
- Related skills spring resources
- Spring Internationalization
- Front-end react framework
API reference
java.util.Locale;
org.springframework.context.MessageSource;
org.springframework.context.support.ResourceBundleMessageSource;
Interface effect example
## zh request example
POST http://localhost:9095/plugin
Content-Type: application/json
Location: cn-zh
X-Access-Token: xxx
{ "name": "test-create-plugin", "role": "test-create-plugin", "enabled": true, "sort": 100 }
Response
{ "code": 600, "message": "未登录" }
### en request example
POST http://localhost:9095/plugin
Content-Type: application/json
Location: en
X-Access-Token: xxx
{ "name": "test-create-plugin", "role": "test-create-plugin", "enabled": true, "sort": 100 }
Response
{ "code": 600, "message": "token is error" }
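To make the API reference above concrete, here is a small sketch that resolves a message by locale with Spring's ResourceBundleMessageSource. The bundle basename "messages" and the key "token.error" are assumptions for illustration; in Shenyu-admin the Locale would be derived from the request (e.g. the Location header shown above).
```java
// Sketch of locale-based message resolution with Spring's MessageSource.
// Expects messages.properties, messages_zh_CN.properties and messages_en.properties
// on the classpath (bundle name and key are illustrative assumptions).
import java.util.Locale;
import org.springframework.context.MessageSource;
import org.springframework.context.support.ResourceBundleMessageSource;

public class AdminMessageDemo {
    public static void main(String[] args) {
        ResourceBundleMessageSource source = new ResourceBundleMessageSource();
        source.setBasenames("messages");
        source.setDefaultEncoding("UTF-8");

        MessageSource messages = source;
        // The locale would normally be parsed from the incoming request headers.
        System.out.println(messages.getMessage("token.error", null, Locale.SIMPLIFIED_CHINESE));
        System.out.println(messages.getMessage("token.error", null, Locale.ENGLISH));
    }
}
```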
Task List
- Discuss with the tutor how to achieve the internationalization of the shenyu-admin backend
- Translate some prompt messages
- Connect with the front-end internationalization: obtain the client region information through the HTTP protocol and support the language of the corresponding region.
- Leave an extension interface for other languages' internationalization support, so as to facilitate localization by subsequent users.
Apache ShardingSphere Add the feature of switching logging framework
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere provides two adapters: ShardingSphere-JDBC and ShardingSphere-Proxy.
Now, ShardingSphere uses logback for logging, but consider the following situations:
- Users may need to switch the logging framework to meet special needs; for example, log4j2 can provide better asynchronous performance;
- When using the JDBC adapter, the user application may not use logback, which may cause some conflicts.
Why doesn't the log facade suffice? Because ShardingSphere provides users with clustered logging configurations (such as changing the log level online), this requires dynamic construction of loggers, which cannot be achieved with the log facade alone.
Task
1. Design and implement logging SPI to support multiple logging frameworks (such as logback and log4j2)
2. Allow users to choose which logging framework to use through the logging rule
Relevant Skills
1. Master JAVA language
2. Basic knowledge of logback and log4j2
3. Maven
Mentor
Longtao Jiang, Committer of Apache ShardingSphere, jianglongtao@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Apache ShardingSphere Support mainstream database metadata table query
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere has designed its own metadata database to simulate metadata queries that support various databases.
More details:
https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052
Task
- Support PostgreSQL And openGauss `\d tableName`
- Support PostgreSQL And openGauss `\d+`
- Support PostgreSQL And openGauss `\d+ tableName`
- Support PostgreSQL And openGauss `\l`
- Support query for MySQL metadata `TABLES`
- Support query for MySQL metadata `COLUMNS`
- Support query for MySQL metadata `schemata`
- Support query for MySQL metadata `ENGINES`
- Support query for MySQL metadata `FILES`
- Support query for MySQL metadata `VIEWS`
Notice, these issues can be a good example.
https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182
Relevant Skills
- Master JAVA language
- Have a basic understanding of Zookeeper
- Be familiar with MySQL/Postgres SQLs
Mentor
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Apache ShenYu Gsoc 2023 - ShenYu End-To-End SpringCloud plugin test case
Background:
Shenyu is a native API gateway for service proxy, protocol translation and API governance, but Shenyu lacks End-To-End tests.
Relevant skills:
1.Understand the architecture of ShenYu
2.Understand SpringCloud micro-service and ShenYu SpringCloud proxy plugin.
3.Understand ShenYu e2e framework and architecture.
How to code
1. Please refer to org.apache.shenyu.e2e.testcase.plugin.DividePluginCases
How to test
1. Start shenyu-admin in Docker
2. Start shenyu-bootstrap in Docker
3. Run the test case org.apache.shenyu.e2e.testcase.plugin.PluginsTest#testDivide
Task List
1. Develop e2e tests of the SpringCloud plugin.
2. Write ShenYu e2e SpringCloud plugin documentation in shenyu-website.
3. Refactor the existing plugin test cases.
Links:
website: https://shenyu.apache.org/
issues: https://github.com/apache/shenyu/issues/4474
TrafficControl
GSOC Varnish Cache support in Apache Traffic Control
Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.
Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.
There are multiple aspects to this project:
- Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
- Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
- Testing: Adding automated tests for new code
Skills:
- Proficiency in Go is required
- A basic knowledge of HTTP and caching is preferred, but not required for this project.
Add server indicator if a server is a cache
Beam
[GSoC][Beam] Build out Beam Machine Learning Use Cases
Today, you can do all sorts of Machine Learning using Apache Beam (https://beam.apache.org/documentation/ml/overview/).
Many of our users, however, have a hard time getting started with ML and understanding how Beam can be applied to their day to day work. The goal of this project is to build out a series of Beam pipelines as Jupyter Notebooks demonstrating real world ML use cases, from NLP to image recognition to using large language models. As you go, there may be bugs or friction points as well which will provide opportunities to contribute back to Beam's core ML libraries.
Mentor for this will be Danny McCormick
[GSoC][Beam] Advancing the Rust SDK on Beam
Beam has an experimental, ongoing implementation for a Rust SDK.
This project involves advancing that implementation and making sure it's compliant with Beam standards.
Good resource materials:
This project is large.
[GSoC][Beam] Advancing the Beam-on-Ray runner
There is a community effort to build a Beam runner to run Beam pipelines on top of Ray: https://github.com/ray-project/ray_beam_runner/
This involves pushing that project forward. It will require writing lots of Python code, and specifically going through the list of issues (https://github.com/ray-project/ray_beam_runner/issues) and solving as many of them as possible to make sure the runner is compliant.
Good resource docs:
- https://docs.google.com/document/d/1vt78s48Q0aBhaUCHrVrTUsProJSP8-EBqDDRGTPEr0Y/edit#heading=h.bhgd35otqjly
- https://docs.google.com/document/d/1k5JpMzIP-l7BssLpp6-gSNR-796ttZcACaY1p2CEFho/edit#heading=h.f5fgmvcor6lo
This project is large.
Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
The ShardingSphere SQL federation engine provides support for complex SQL statements, and it can well support cross-database join queries, subqueries, aggregation queries and other statements. An important part of the SQL federation engine is converting the SQL statement parsed by ShardingSphere into a Calcite SqlNode, so that Calcite can be used to implement SQL optimization and federated queries.
Task
This issue is to solve the MySQL exception that occurs during SQLNodeConverterEngine conversion. The specific case list is as follows.
- select_char
- select_extract
- select_from_dual
- select_from_with_table
- select_group_by_with_having_and_window
- select_not_between_with_single_table
- select_not_in_with_single_table
- select_substring
- select_trim
- select_weight_string
- select_where_with_bit_expr_with_ampersand
- select_where_with_bit_expr_with_caret
- select_where_with_bit_expr_with_div
- select_where_with_bit_expr_with_minus_interval
- select_where_with_bit_expr_with_mod
- select_where_with_bit_expr_with_mod_sign
- select_where_with_bit_expr_with_plus_interval
- select_where_with_bit_expr_with_signed_left_shift
- select_where_with_bit_expr_with_signed_right_shift
- select_where_with_bit_expr_with_vertical_bar
- select_where_with_boolean_primary_with_comparison_subquery
- select_where_with_boolean_primary_with_is
- select_where_with_boolean_primary_with_is_not
- select_where_with_boolean_primary_with_null_safe
- select_where_with_expr_with_and_sign
- select_where_with_expr_with_is
- select_where_with_expr_with_is_not
- select_where_with_expr_with_not
- select_where_with_expr_with_not_sign
- select_where_with_expr_with_or_sign
- select_where_with_expr_with_xor
- select_where_with_predicate_with_in_subquery
- select_where_with_predicate_with_regexp
- select_where_with_predicate_with_sounds_like
- select_where_with_simple_expr_with_collate
- select_where_with_simple_expr_with_match
- select_where_with_simple_expr_with_not
- select_where_with_simple_expr_with_odbc_escape_syntax
- select_where_with_simple_expr_with_row
- select_where_with_simple_expr_with_tilde
- select_where_with_simple_expr_with_variable
- select_window_function
- select_with_assignment_operator
- select_with_assignment_operator_and_keyword
- select_with_case_expression
- select_with_collate_with_marker
- select_with_date_format_function
- select_with_exists_sub_query_with_project
- select_with_function_name
- select_with_json_value_return_type
- select_with_match_against
- select_with_regexp
- select_with_schema_name_in_column_projection
- select_with_schema_name_in_shorthand_projection
- select_with_spatial_function
- select_with_trim_expr
- select_with_trim_expr_from_expr
You need to compare the difference between actual and expected, and then correct the logic in SQLNodeConverterEngine so that actual can be consistent with expected.
After you make changes, remember to add the case to SUPPORTED_SQL_CASE_IDS to ensure it can be tested.
Notice, this PR can be a good example:
https://github.com/apache/shardingsphere/pull/14492
Relevant Skills
1. Master JAVA language
2. Have a basic understanding of Antlr g4 file
3. Be familiar with MySQL and Calcite SqlNode
Targets files
SQLNodeConverterEngineIT
Mentor
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs
Beam library developers and Beam users would appreciate this : )
This project involves prototyping a few different solutions, so it will be large.
Doris
[GSoC][Doris]Page Cache Improvement
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris accelerates high-concurrency queries utilizing page cache, where the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which reveals a few problems:
- Hot data will be phased out in large queries
- The page cache configuration is immutable and does not support GC.
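To illustrate the first problem listed above (a Java toy example, not Doris's C++ page cache): under a plain LRU policy, one large scan that touches many cold pages exactly once is enough to evict the hot pages.
```java
// Toy LRU cache built on LinkedHashMap's access order, showing scan pollution:
// hot pages are evicted by a stream of pages that are each read only once.
import java.util.LinkedHashMap;
import java.util.Map;

public class LruScanPollution {
    public static void main(String[] args) {
        final int capacity = 4;
        Map<String, byte[]> cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;   // evict least-recently-used entry beyond capacity
            }
        };

        cache.put("hot-page-1", new byte[0]);
        cache.put("hot-page-2", new byte[0]);
        cache.get("hot-page-1");             // hot pages are reused...

        for (int i = 0; i < 10; i++) {       // ...until a big scan streams cold pages through
            cache.put("scan-page-" + i, new byte[0]);
        }

        // The hot pages are gone even though they were the most valuable entries.
        System.out.println(cache.keySet());  // only the last scan pages remain
    }
}
```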
Task
- Phase One: Identify the impacts on queries when the decompressed data is stored in memory and SSD, respectively, and then determine whether full page cache is required.
- Phase Two: Improve the cache strategy for Apache Doris based on the results from Phase One.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Apache ShardingSphere Enhance ComputeNode reconciliation
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the new CRDs Cluster and ComputeNode as below:
- WIP: [New Feature] Introduce new CRD Cluster #167
- [Feat] Introduce new CRD as ComputeNode for better usability #166
Currently we are trying to promote ComputeNode as the major CRD to represent a special ShardingSphere Proxy deployment, and we plan to use Cluster to indicate a special ShardingSphere Proxy cluster.
Task
This issue is to enhance ComputeNode reconciliation availability. The specific case list is as follows.
- Add IT test case for Deployment spec volume
- Add IT test case for Deployment spec template init containers
- Add IT test case for Deployment spec template spec containers
- Add IT test case for Deployment spec volume mounts
- Add IT test case for Deployment spec container ports
- Add IT test case for Deployment spec container image tag
- Add IT test case for Service spec ports
- Add IT test case for ConfigMap data serverconfig
- Add IT test case for ConfigMap data logback
Notice, this issue can be a good example:
- chore: add more Ginkgo tests for ComputeNode #203
Relevant Skills
- Master Go language, Ginkgo test framework
- Have a basic understanding of Apache ShardingSphere Concepts
- Be familiar with Kubernetes Operator, kubebuilder framework
Targets files
ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
- Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, yangyongqiang@apache.org
- Mentor: Haopeng Li, Apache Doris PMC member & Committer, lihaopeng@apache.org
- Mailing List: dev@doris.apache.org
Apache ShardingSphere Add ShardingSphere Kafka source connector
[GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
The community just added the CDC (change data capture) feature recently. The change feed will be published on the created network connection after logging in, and then it can be consumed.
Since Kafka is a popular distributed event streaming platform, it's useful to import the change feed into Kafka for later processing.
Task
- Get familiar with ShardingSphere CDC client usage, create a publication and subscribe to the change feed.
- Get familiar with Kafka connector development, develop a source connector, and integrate it with ShardingSphere CDC. Persist the change feed to Kafka topics properly.
- Add unit tests and an E2E integration test.
Relevant Skills
1. Java language
2. Basic knowledge of CDC and Kafka
3. Maven
References
[GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris supports acceleration of queries on external data sources to meet users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to Apache Doris based on a unified framework.
Objective
- Enable Apache Doris to access one or more of these data sources via the Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
- Compile relevant documentation. See an example here: https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive
Task
Phase One:
- Get familiar with the Multi-Catalog structure of Apache Doris, including the metadata synchronization mechanism in FE and the data reading mechanism in BE.
- Investigate how metadata should be acquired and how data access works for the picked data source(s); produce the corresponding design documentation.
Phase Two:
- Develop connections to the picked data source(s) and implement access to metadata and data.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, morningman@apache.org
- Mentor: Calvin Kirs, Apache Geode PMC & Committer, Kirs@apache.org
- Mailing List: dev@doris.apache.org
...
[GSoC][Beam] Build out Beam Machine Learning Use Cases
Today, you can do all sorts of Machine Learning using Apache Beam (https://beam.apache.org/documentation/ml/overview/).
Many of our users, however, have a hard time getting started with ML and understanding how Beam can be applied to their day to day work. The goal of this project is to build out a series of Beam pipelines as Jupyter Notebooks demonstrating real world ML use cases, from NLP to image recognition to using large language models. As you go, there may be bugs or friction points as well which will provide opportunities to contribute back to Beam's core ML libraries.
Mentor for this will be Danny McCormick
[GSoC][Doris] Dictionary Encoding Acceleration
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding is applied to string data types by default, and the dictionary size of a column for one segment is 1M at most. Dictionary encoding accelerates string handling during queries, for example by converting strings into INT codes (see the sketch below).
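To illustrate the idea independently of Doris internals (the class below is purely a demonstration, not Doris code), dictionary encoding replaces repeated strings in a column with small integer codes, so later filters and aggregations can operate on INTs instead of strings:
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of dictionary-encoding a string column; not Doris code. */
public class DictionaryEncodingDemo {
    public static void main(String[] args) {
        List<String> column = List.of("shanghai", "beijing", "shanghai", "shenzhen", "beijing");

        Map<String, Integer> dictionary = new LinkedHashMap<>(); // distinct value -> code
        List<Integer> encoded = new ArrayList<>();               // column stored as INT codes
        for (String value : column) {
            int code = dictionary.computeIfAbsent(value, v -> dictionary.size());
            encoded.add(code);
        }

        // A predicate like `city = 'beijing'` can now be evaluated as `code == 1`,
        // i.e. integer comparisons instead of string comparisons.
        int target = dictionary.get("beijing");
        long matches = encoded.stream().filter(code -> code == target).count();

        System.out.println("dictionary = " + dictionary); // {shanghai=0, beijing=1, shenzhen=2}
        System.out.println("encoded    = " + encoded);    // [0, 1, 0, 2, 1]
        System.out.println("matches    = " + matches);    // 2
    }
}
```
The project itself works on how Doris builds and uses such per-segment dictionaries, and on the memory cost of keeping them fully encoded.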
Task
- Phase One: Get familiar with the implementation of Apache Doris dictionary encoding; learn how Apache Doris dictionary encoding accelerates queries.
- Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Chen Zhang, Apache Doris Committer, zhangchen@apache.org
- Mentor: Zhijing Lu, Apache Doris Committer, luzhijing@apache.org
- Mailing List: dev@doris.apache.org
[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs
Beam library developers and Beam users would appreciate this : )
This project involves prototyping a few different solutions, so it will be large.
CloudStack
CloudStack GSoC 2023 - Autodetect IPs used inside the VM
Github issue: https://github.com/apache/cloudstack/issues/7142
Description:
With regard to IP information reporting, CloudStack relies entirely on its DHCP databases and so on. When these are not available (L2 networks, etc.), no IP information is shown for a given VM.
I propose we introduce a mechanism for "IP autodetection" and try to discover the IPs used inside the machines by querying the hypervisors. For example, with KVM/libvirt we can simply do something like this:
[root@fedora35 ~]# virsh domifaddr win2k22 --source agent
 Name                          MAC address          Protocol     Address
-------------------------------------------------------------------------------
 Ethernet                      52:54:00:7b:23:6a    ipv4         192.168.0.68/24
 Loopback Pseudo-Interface 1                        ipv6         ::1/128
 -                             -                    ipv4         127.0.0.1/8
The above command queries the qemu-guest-agent inside the Windows VM. The VM needs to have the qemu-guest-agent installed and running, the virtio serial drivers (easily installed in this case with virtio-win-guest-tools.exe), and a guest-agent socket channel defined in libvirt.
Once we have this information we could display it in the UI/API as "Autodetected VM IPs" or something like that.
I imagine it's very similar for VMware and XCP-ng.
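As a rough sketch of the KVM case, the management or agent code could shell out to virsh and parse the guest-agent output; the class name and parsing below are illustrative only and assume virsh is available on the host, they are not existing CloudStack code:
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: queries the qemu-guest-agent via virsh and extracts IPv4 addresses. */
public class GuestIpAutodetect {

    public static List<String> autodetectIps(String domainName) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "virsh", "domifaddr", domainName, "--source", "agent")
                .redirectErrorStream(true)
                .start();

        List<String> ips = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                // Data rows look like: "Ethernet  52:54:00:7b:23:6a  ipv4  192.168.0.68/24"
                if (fields.length >= 4 && "ipv4".equals(fields[fields.length - 2])) {
                    String address = fields[fields.length - 1].split("/")[0];
                    if (!address.startsWith("127.")) { // skip loopback
                        ips.add(address);
                    }
                }
            }
        }
        process.waitFor();
        return ips; // e.g. [192.168.0.68], to be surfaced in the UI/API as "Autodetected VM IPs"
    }

    public static void main(String[] args) throws Exception {
        System.out.println(autodetectIps(args.length > 0 ? args[0] : "win2k22"));
    }
}
```
In a real implementation the same information would more likely be fetched through the hypervisor API (libvirt guest-agent calls for KVM, the equivalent guest-tools interfaces for VMware and XCP-ng) rather than by parsing CLI output, but the data flow is the same.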
Thank you
...