James Server
Adopt Pulsar as the messaging technology backing the distributed James server
https://www.mail-archive.com/server-dev@james.apache.org/msg71462.html
A good long term objective for the PMC is to drop RabbitMQ in favor of Pulsar (third parties could package their own components using RabbitMQ if they wish...)
This means:
- Solve the bugs that were found during the Pulsar MailQueue review
- The Pulsar MailQueue needs to allow listing blobs in order to be deduplication friendly.
- Provide an event bus based on Pulsar
- Provide a task manager based on Pulsar
- Package a distributed server backed by Pulsar, deprecate then replace the current one.
- (optionally) support mail queue priorities
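Here is what the deduplication requirement means in practice. With a deduplicating blob store, one blob may back several mails. A blob can therefore only be garbage-collected once nothing references it, and that includes mails still sitting in the queue. A minimal sketch of that check, assuming plain string blob IDs and a hypothetical `collectable_blobs` helper (not part of the James API):

```python
def collectable_blobs(stored, referenced_by_mailboxes, referenced_by_queue):
    """Return the blob IDs that no component references any more.

    All arguments are sets of hypothetical blob IDs. A blob still listed by
    the mail queue must survive even if no mailbox points at it yet, which is
    why the queue has to be able to list the blobs it holds.
    """
    return stored - referenced_by_mailboxes - referenced_by_queue


print(sorted(collectable_blobs({"b1", "b2", "b3"}, {"b1"}, {"b2"})))  # ['b3']
```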
While contributions would of course be welcomed on this topic, we could offer it as part of GSoC 2022, and we could co-mentor it with mentors of the Pulsar community (see [3]).
[3] https://lists.apache.org/thread/y9s7f6hmh51ky30l20yx0dlz458gw259
Would such a plan gain traction around here?
TrafficControl
GSoC Varnish Cache support in Apache Traffic Control
Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.
Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.
There are multiple aspects to this project:
- Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
- Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
- Testing: Add automated tests for the new code
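The sketch below gives a rough idea of the output that configuration generation produces: a minimal VCL 4.1 file with one `backend` block per origin. It is written in Python only to show the shape of the generated file. The project itself would implement this in Go, and real configuration would also cover routing and caching rules. The `render_vcl` name and its input format are assumptions:

```python
def render_vcl(backends):
    """Render a minimal VCL 4.1 file declaring one backend per origin.

    `backends` maps a (hypothetical) backend name to a (host, port) tuple.
    """
    lines = ["vcl 4.1;", ""]
    for name, (host, port) in sorted(backends.items()):
        lines.append(f"backend {name} {{")
        lines.append(f'    .host = "{host}";')
        lines.append(f'    .port = "{port}";')
        lines.append("}")
        lines.append("")
    return "\n".join(lines)


print(render_vcl({"origin1": ("origin.example.com", 80)}))
```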
Skills:
- Proficiency in Go is required
- A basic knowledge of HTTP and caching is preferred, but not required for this project.
ShardingSphere
Apache ShardingSphere Support metadata table queries for mainstream databases
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus and aims to build a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere has designed its own metadata database to simulate metadata queries for the various databases it supports.
More details:
https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052
Task
- Support PostgreSQL and openGauss `\d tableName`
- Support PostgreSQL and openGauss `\d+`
- Support PostgreSQL and openGauss `\d+ tableName`
- Support PostgreSQL and openGauss `\l`
- Support queries on MySQL metadata table `TABLES`
- Support queries on MySQL metadata table `COLUMNS`
- Support queries on MySQL metadata table `schemata`
- Support queries on MySQL metadata table `ENGINES`
- Support queries on MySQL metadata table `FILES`
- Support queries on MySQL metadata table `VIEWS`
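These tasks come down to one idea: a metadata query such as `SELECT ... FROM information_schema.TABLES` is answered from ShardingSphere's own metadata instead of being forwarded to a backend database. The Python sketch below assumes a hypothetical in-memory `LOGICAL_METADATA` map and a hypothetical `query_tables` helper. It is not ShardingSphere's implementation, which is written in Java:

```python
# Hypothetical logical metadata: schema name -> {table name: table type}.
LOGICAL_METADATA = {
    "sharding_db": {"t_order": "BASE TABLE", "t_order_view": "VIEW"},
    "other_db": {"t_user": "BASE TABLE"},
}


def query_tables(metadata, table_schema=None):
    """Emulate `SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
    FROM information_schema.TABLES [WHERE TABLE_SCHEMA = ?]`.

    Rows are built from the logical metadata, so a client would see logical
    tables rather than whatever physical tables sit behind them.
    """
    rows = []
    for schema, tables in sorted(metadata.items()):
        if table_schema is not None and schema != table_schema:
            continue  # emulates the WHERE TABLE_SCHEMA = ? filter
        for name, table_type in sorted(tables.items()):
            rows.append({"TABLE_SCHEMA": schema, "TABLE_NAME": name, "TABLE_TYPE": table_type})
    return rows


for row in query_tables(LOGICAL_METADATA, "sharding_db"):
    print(row)
```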
Note: the following pull requests are good examples.
https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182
Relevant Skills
- Master the Java language
- Have a basic understanding of ZooKeeper
- Be familiar with MySQL/PostgreSQL SQL
Mentor
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Apache ShardingSphere Add support for switching the logging framework
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus and aims to build a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere provides two adapters: ShardingSphere-JDBC and ShardingSphere-Proxy.
Currently, ShardingSphere uses logback for logging, but consider the following situations:
- Users may need to switch the logging framework to meet special needs; for example, log4j2 can provide better asynchronous performance.
- When using the JDBC adapter, the user application may not use logback, which may cause conflicts.
Why doesn't a logging facade suffice? ShardingSphere provides users with cluster-wide logging configuration (such as changing the log level online). This requires building loggers dynamically, which a logging facade alone cannot do.
Task
1. Design and implement a logging SPI to support multiple logging frameworks (such as logback and log4j2)
2. Allow users to choose which logging framework to use through the logging rule
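Here is a minimal sketch of the SPI idea, in Python for brevity. Each framework gets a provider keyed by a type name, and the logging rule selects one by that name. The class names, the `framework` rule key and the in-memory `levels` map are all hypothetical. In Java, the providers would typically be discovered with `java.util.ServiceLoader` or a project-specific loader, and they would call the real framework APIs instead of storing levels in a dict:

```python
from abc import ABC, abstractmethod


class LoggingProvider(ABC):
    """Hypothetical SPI: one implementation per logging framework."""

    type_name = ""

    def __init__(self):
        # Stand-in for the framework's runtime logger configuration.
        self.levels = {}

    @abstractmethod
    def set_level(self, logger_name: str, level: str) -> None:
        """Change a logger's level at runtime, e.g. when the cluster pushes new settings."""


class LogbackProvider(LoggingProvider):
    type_name = "logback"

    def set_level(self, logger_name, level):
        # A Java implementation would adjust the logger via logback's LoggerContext.
        self.levels[logger_name] = level


class Log4j2Provider(LoggingProvider):
    type_name = "log4j2"

    def set_level(self, logger_name, level):
        # A Java implementation would use log4j2's Configurator.setLevel.
        self.levels[logger_name] = level


PROVIDERS = {cls.type_name: cls for cls in (LogbackProvider, Log4j2Provider)}


def load_provider(logging_rule: dict) -> LoggingProvider:
    """Instantiate the provider named by the rule's hypothetical `framework` key (default: logback)."""
    framework = logging_rule.get("framework", "logback")
    if framework not in PROVIDERS:
        raise ValueError(f"unsupported logging framework: {framework}")
    return PROVIDERS[framework]()


provider = load_provider({"framework": "log4j2"})
provider.set_level("ShardingSphere-SQL", "DEBUG")
print(type(provider).__name__, provider.levels)
```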
Relevant Skills
1. Master the Java language
2. Basic knowledge of logback and log4j2
3. Maven
Mentor
Longtao Jiang, Committer of Apache ShardingSphere, jianglongtao@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Apache ShardingSphere Add ShardingSphere Kafka source connector
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus and aims to build a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
The community recently added a CDC (change data capture) feature. After a client logs in, the change feed is published over the established network connection, where it can be consumed.
Since Kafka is a popular distributed event streaming platform, it is useful to import the change feed into Kafka for later processing.
Task
- Get familiar with ShardingSphere CDC client usage: create a publication and subscribe to the change feed.
- Get familiar with Kafka connector development: develop a source connector, integrate it with ShardingSphere CDC, and persist the change feed to Kafka topics properly.
- Add unit tests and E2E integration tests.
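At its core, a source connector maps each change event to a record with a topic, a key, a value and a source offset. Kafka Connect commits that offset so a restarted task can resume. The Python sketch below mirrors the main fields of Kafka Connect's Java `SourceRecord`. The change-event layout, the `to_source_record` helper and the `<prefix>.<database>.<table>` topic naming are assumptions, not the ShardingSphere CDC format:

```python
from dataclasses import dataclass


@dataclass
class SourceRecordSketch:
    """Mirrors the main fields of Kafka Connect's Java SourceRecord."""
    source_partition: dict
    source_offset: dict
    topic: str
    key: object
    value: dict


def to_source_record(event: dict, topic_prefix: str = "shardingsphere") -> SourceRecordSketch:
    """Map one hypothetical CDC change event to a Kafka Connect-style record."""
    return SourceRecordSketch(
        # Identifies the stream the offset belongs to (here: one logical table).
        source_partition={"database": event["database"], "table": event["table"]},
        # Committed by Kafka Connect so a restarted task can resume after it.
        source_offset={"position": event["position"]},
        topic=f"{topic_prefix}.{event['database']}.{event['table']}",
        # Keying by primary key sends every change of a row to the same
        # partition, preserving per-row ordering.
        key=event["primary_key"],
        value={"type": event["type"], "data": event["data"]},
    )


event = {
    "database": "sharding_db", "table": "t_order", "type": "UPDATE",
    "position": 1024, "primary_key": 7, "data": {"order_id": 7, "status": "PAID"},
}
record = to_source_record(event)
print(record.topic)  # shardingsphere.sharding_db.t_order
```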
Relevant Skills
1. Java language
2. Basic knowledge of CDC and Kafka
3. Maven
References
- https://github.com/apache/shardingsphere/issues/22500
- https://kafka.apache.org/documentation/#connect_development
- https://github.com/apache/kafka/tree/trunk/connect/file/src
- https://github.com/confluentinc/kafka-connect-jdbc
Mentor
Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org
Xinze Guo, Committer of Apache ShardingSphere, azexin@apache.org
Apache ShardingSphere Enhance ComputeNode reconciliation
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus and aims to build a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal for the new CRDs Cluster and ComputeNode, as follows:
- WIP: [New Feature] Introduce new CRD Cluster #167
- [Feat] Introduce new CRD as ComputeNode for better usability #166
Currently, we are promoting ComputeNode as the major CRD representing a specific ShardingSphere Proxy deployment, and we plan to use Cluster to represent a specific ShardingSphere Proxy cluster.
Task
This issue is to enhance the availability of ComputeNode reconciliation. The specific cases are as follows:
- Add IT test case for Deployment spec volume
- Add IT test case for Deployment spec template init containers
- Add IT test case for Deployment spec template spec containers
- Add IT test case for Deployment spec volume mounts
- Add IT test case for Deployment spec container ports
- Add IT test case for Deployment spec container image tag
- Add IT test case for Service spec ports
- Add IT test case for ConfigMap data serverconfig
- Add IT test case for ConfigMap data logback
Note: the following can serve as a good example:
- chore: add more Ginkgo tests for ComputeNode #203
Relevant Skills
- Master Go language, Ginkgo test framework
- Have a basic understanding of Apache ShardingSphere concepts
- Be familiar with Kubernetes Operator, kubebuilder framework
Target files
ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus and aims to build a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
The ShardingSphere SQL federation engine provides support for complex SQL statements and handles cross-database join queries, subqueries, aggregation queries and other statements well. An important part of the SQL federation engine is converting the SQL statements parsed by ShardingSphere into SqlNode, so that Calcite can be used to implement SQL optimization and federated queries.
Task
This issue is to fix the MySQL exceptions that occur during SQLNodeConverterEngine conversion. The specific cases are as follows:
- select_char
- select_extract
- select_from_dual
- select_from_with_table
- select_group_by_with_having_and_window
- select_not_between_with_single_table
- select_not_in_with_single_table
- select_substring
- select_trim
- select_weight_string
- select_where_with_bit_expr_with_ampersand
- select_where_with_bit_expr_with_caret
- select_where_with_bit_expr_with_div
- select_where_with_bit_expr_with_minus_interval
- select_where_with_bit_expr_with_mod
- select_where_with_bit_expr_with_mod_sign
- select_where_with_bit_expr_with_plus_interval
- select_where_with_bit_expr_with_signed_left_shift
- select_where_with_bit_expr_with_signed_right_shift
- select_where_with_bit_expr_with_vertical_bar
- select_where_with_boolean_primary_with_comparison_subquery
- select_where_with_boolean_primary_with_is
- select_where_with_boolean_primary_with_is_not
- select_where_with_boolean_primary_with_null_safe
- select_where_with_expr_with_and_sign
- select_where_with_expr_with_is
- select_where_with_expr_with_is_not
- select_where_with_expr_with_not
- select_where_with_expr_with_not_sign
- select_where_with_expr_with_or_sign
- select_where_with_expr_with_xor
- select_where_with_predicate_with_in_subquery
- select_where_with_predicate_with_regexp
- select_where_with_predicate_with_sounds_like
- select_where_with_simple_expr_with_collate
- select_where_with_simple_expr_with_match
- select_where_with_simple_expr_with_not
- select_where_with_simple_expr_with_odbc_escape_syntax
- select_where_with_simple_expr_with_row
- select_where_with_simple_expr_with_tilde
- select_where_with_simple_expr_with_variable
- select_window_function
- select_with_assignment_operator
- select_with_assignment_operator_and_keyword
- select_with_case_expression
- select_with_collate_with_marker
- select_with_date_format_function
- select_with_exists_sub_query_with_project
- select_with_function_name
- select_with_json_value_return_type
- select_with_match_against
- select_with_regexp
- select_with_schema_name_in_column_projection
- select_with_schema_name_in_shorthand_projection
- select_with_spatial_function
- select_with_trim_expr
- select_with_trim_expr_from_expr
For each case, compare the actual result with the expected one, then correct the logic in SQLNodeConverterEngine so that the actual result matches the expected one.
After you make changes, remember to add the case ID to SUPPORTED_SQL_CASE_IDS to ensure it is tested.
Note: the following pull request is a good example.
https://github.com/apache/shardingsphere/pull/14492
Relevant Skills
1. Master the Java language
2. Have a basic understanding of ANTLR g4 files
3. Be familiar with MySQL and Calcite SqlNode
Target files
SQLNodeConverterEngineIT
Mentor
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
StreamPipes
Code Insights for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes is a self-service (Industrial) IoT toolbox that enables non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules, including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines, and several visualization modules for live and historic data exploration. Under the hood, StreamPipes uses an event-driven microservice paradigm of standalone, so-called analytics microservices, which makes the system easy to extend for individual needs.
Background
StreamPipes has grown significantly in recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top-level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing. However, as we head full steam toward our `1.0` release, we also want the project to become more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.
Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (Backend, UI, Python library) and report the coverage to Codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock the current test coverage as a lower threshold that we always want to achieve (meaning that CI builds fail if coverage drops below it). Over time, we can then increase the required coverage level step by step.
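The coverage gate described above boils down to a simple comparison that a CI step can run once the coverage report exists. Here is a minimal Python sketch. It assumes the covered and total line counts have already been extracted from the report, and the `coverage_gate` name is hypothetical. Codecov can also enforce targets itself through its status checks:

```python
def coverage_gate(covered_lines: int, total_lines: int, floor_percent: float) -> int:
    """Return a process exit code: 0 if coverage meets the locked floor, 1 otherwise."""
    percent = 100.0 if total_lines == 0 else 100.0 * covered_lines / total_lines
    status = "OK" if percent >= floor_percent else "BELOW FLOOR"
    print(f"coverage {percent:.1f}% (floor {floor_percent:.1f}%): {status}")
    return 0 if percent >= floor_percent else 1


# Hypothetical numbers; in CI they would come from the coverage report.
exit_code = coverage_gate(covered_lines=4200, total_lines=10000, floor_percent=40.0)
```

In CI, the step would end with `raise SystemExit(exit_code)`, so a drop below the floor fails the build. Raising `floor_percent` over time implements the step-by-step increase.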
Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.
Tasks
- [ ] calculate test coverage for all main parts of the repo
- [ ] send coverage to Codecov
- [ ] determine coverage threshold and let CI fail if below
- [ ] include SonarCloud in CI setup
- [ ] include automatic coverage report in PR validation (see an example here) -> optional
- [ ] include automatic SonarCloud report in PR validation -> optional
- [ ] whatever comes to mind 💡 further ideas are always welcome
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes in SonarCloud or Codecov, and do not use the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- basic knowledge of GitHub workflows
Learning Material
- GitHub workflow docs
- Apache StreamPipes workflows
- SonarCloud for Monorepos
- Using Codecov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Tim Bossenmaier
email: bossenti[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
Improving End-to-End Test Infrastructure of Apache StreamPipes
Apache StreamPipes
Apache StreamPipes is a self-service (Industrial) IoT toolbox that enables non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules, including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines, and several visualization modules for live and historic data exploration. Under the hood, StreamPipes uses an event-driven microservice paradigm of standalone, so-called analytics microservices, which makes the system easy to extend for individual needs.
Background
StreamPipes has grown significantly over the past few years, with new features and contributors joining the project. However, as the project continues to evolve, e2e test coverage must also be improved to ensure that all features remain functional. Modern frameworks, such as Cypress, make it quite easy and fun to automatically test even complex application functionalities. As StreamPipes approaches its 1.0 release, it is important to improve e2e testing to ensure the robustness of the project and its use in real-world scenarios.
Tasks
- [ ] Write e2e tests using Cypress to cover most functionalities and user interface components of StreamPipes.
- [ ] Add more complex testing scenarios to ensure the reliability and robustness of StreamPipes in real-world use cases (e.g. automated tests for version updates)
- [ ] Add e2e tests for the new Python client to ensure its integration with the main system and its functionalities (#774: https://github.com/apache/streampipes/issues/774)
- [ ] Document the testing infrastructure and the testing approach to allow for easy maintenance and future contributions.
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes in Cypress or using the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- Familiarity with testing frameworks, such as Cypress or Selenium
- Experience with TypeScript or Java
- Basic knowledge of Angular is helpful
- Familiarity with Docker and containerization is a plus
Learning Material
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Philipp Zehnder
email: zehnder[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
OPC-UA browser for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes is a self-service (Industrial) IoT toolbox that enables non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules, including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines, and several visualization modules for live and historic data exploration. Under the hood, StreamPipes uses an event-driven microservice paradigm of standalone, so-called analytics microservices, which makes the system easy to extend for individual needs.
Background
StreamPipes has grown significantly in recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top-level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing.
StreamPipes really shines when connecting Industrial IoT data. Such data sources typically originate from machine controllers, called PLCs (e.g., Siemens S7). But there are also newer protocols such as OPC-UA, which allow browsing the data available within the controller. Our goal is to make connecting industrial data sources a matter of minutes.
Currently, data sources can be connected using the built-in module `StreamPipes Connect` from the UI. We provide a set of adapters for popular protocols that can be customized, e.g., connection details can be added.
To make it even easier to connect industrial data sources with StreamPipes, we plan to add an OPC-UA browser. This will be part of the entry page of StreamPipes Connect and should allow users to enter the connection details of an existing OPC-UA server. Afterwards, a new view in the UI shows the available data nodes from the server, their status and current value. Users should be able to select the values that should be part of a new adapter. Finally, a new adapter can be created by reusing the current workflow for creating an OPC-UA data source.
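As a starting point for the data model: OPC-UA exposes an address space of nodes, where Object nodes act like folders and Variable nodes carry values. The Python sketch below models a browsed tree and turns the user's selection into the list of NodeIds for a new adapter. The class and function names are hypothetical. The real backend would browse the server asynchronously, for example through Eclipse Milo in Java:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class BrowseNode:
    """One entry of a hypothetical OPC-UA browse tree shown in the UI."""
    node_id: str                  # e.g. "ns=2;s=Machine1.Temperature"
    display_name: str
    node_class: str               # "Object" (folder-like) or "Variable" (has a value)
    value: Optional[object] = None
    children: List["BrowseNode"] = field(default_factory=list)


def variables(node: BrowseNode) -> Iterator[BrowseNode]:
    """Depth-first walk yielding the Variable nodes a user could select."""
    if node.node_class == "Variable":
        yield node
    for child in node.children:
        yield from variables(child)


def adapter_selection(root: BrowseNode, selected_ids: set) -> List[str]:
    """Keep selected NodeIds that exist as variables, in browse order, for the new adapter."""
    return [n.node_id for n in variables(root) if n.node_id in selected_ids]


root = BrowseNode("ns=0;i=85", "Objects", "Object", children=[
    BrowseNode("ns=2;s=Machine1", "Machine1", "Object", children=[
        BrowseNode("ns=2;s=Machine1.Temperature", "Temperature", "Variable", 71.3),
        BrowseNode("ns=2;s=Machine1.Speed", "Speed", "Variable", 1200),
    ]),
])
print(adapter_selection(root, {"ns=2;s=Machine1.Speed"}))  # ['ns=2;s=Machine1.Speed']
```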
This is a really cool project for participants interested in full-stack development who would like to get a deeper understanding of industrial IoT protocols. Have fun!
Tasks
- [ ] get familiar with the OPC-UA protocol
- [ ] develop mockups which demonstrate the user workflow
- [ ] develop a data model for discovering data from OPC-UA
- [ ] create the backend business logic for the OPC-UA browser
- [ ] create the frontend views to asynchronously browse data and to create a new adapter
- [ ] write JUnit, component and E2E tests
- [ ] whatever comes to mind 💡 further ideas are always welcome
Relevant Skills
- interest in Industrial IoT and protocols such as OPC-UA
- Java development skills
- Angular/TypeScript development skills
Anyway, the most important relevant skill is motivation and readiness to learn during the project!
Learning Material
- StreamPipes documentation (https://streampipes.apache.org/docs/docs/user-guide-introduction.html)
- Our current OPC-UA adapter (https://github.com/apache/streampipes/tree/dev/streampipes-extensions/streampipes-connect-adapters-iiot/src/main/java/org/apache/streampipes/connect/iiot/adapters/opcua)
- Eclipse Milo, which we currently use for OPC-UA connectivity (https://github.com/eclipse/milo)
- Apache PLC4X, which has an API for browsing (https://plc4x.apache.org/)
Reference
Github issue can be found here: https://github.com/apache/streampipes/issues/1390
Name and contact information
- Mentor: Dominik Riemer (riemer[at]apache.org).
- Mailing list: (dev[at]streampipes.apache.org)
- Website: streampipes.apache.org
RocketMQ
GSoC Implement a Python client for RocketMQ 5.0
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Background
RocketMQ 5.0 has released clients in various languages, including Java, C++, and Go. To cover all major programming languages, a Python client needs to be implemented.
Related Repo: https://github.com/apache/rocketmq-clients
Task
The developer is required to be familiar with the Java implementation and capable of developing a Python client, while ensuring consistent functionality and semantics.
Relevant Skills
- Python language
- Basic knowledge of RocketMQ 5.0
Mentor
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
GSoC Integrate RocketMQ 5.0 client with Spring
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq
Background
The RocketMQ 5.0 client was released recently, and we need to integrate it with Spring.
Related issue: https://github.com/apache/rocketmq-clients/issues/275
Task
- Get familiar with RocketMQ 5.0 Java client usage; you can find more details at https://github.com/apache/rocketmq-clients/tree/master/java and https://rocketmq.apache.org/docs/quickStart/01quickstart
- Integrate with Spring.
Relevant Skills
- Java language
- Basic knowledge of RocketMQ 5.0
- Spring
Mentor
Rongtong Jin, PMC of Apache RocketMQ, jinrongtong@apache.org
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
GSoC Make RocketMQ support higher versions of Java
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq
Background
RocketMQ is a widely used message middleware system in the Java community, which mainly supports Java 8. As Java has evolved, many new features and improvements have been added to the language and the Java Virtual Machine (JVM). However, RocketMQ still lacks compatibility with the latest Java versions, preventing users from taking advantage of new features and performance improvements. Therefore, we are seeking community support to upgrade RocketMQ to support higher versions of Java and enable the use of new features and JVM parameters.
Task
We aim to update the RocketMQ codebase to support newer versions of Java through cross-compilation. The goal is to enable RocketMQ to work with Java 17 while maintaining backward compatibility with previous versions of Java. This will involve identifying and updating any dependencies that need to change to support the new Java versions, as well as testing and verifying that the new version of RocketMQ works correctly. With these updates, users will be able to take advantage of the latest Java features and performance improvements. We hope that the community can come together to support this task and make RocketMQ a more versatile and powerful middleware system.
Relevant Skills
- Java language
- Having a good understanding of the new features in higher versions of Java, particularly LTS versions.
Mentor
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
[GSoC] [RocketMQ] The performance tuning of RocketMQ proxy
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability.
Page: https://rocketmq.apache.org
Repo: https://github.com/apache/rocketmq
Background
RocketMQ 5.0 has released a new module called `proxy`, which supports both the gRPC and remoting protocols. It can be deployed in two modes, namely Local and Cluster modes. The performance tuning task will give contributors a comprehensive understanding of Apache RocketMQ and its intricate data flow, presenting a unique opportunity for beginners to get acquainted with and actively participate in our community.
Task
The task is to tune the RocketMQ proxy for optimal performance in terms of both latency and throughput. The developer should have a thorough knowledge of the Java implementation and the ability to fine-tune Netty, gRPC, the operating system, and RocketMQ itself. We anticipate that the developer responsible for this task will provide a performance report with measurements of both latency and throughput.
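For the performance report, latency is usually reported as percentiles rather than only an average, because tail latency (p99) is often what users notice. Here is a minimal Python sketch that summarizes one benchmark run. It assumes one latency sample per message and a known run duration, and the `summarize` name is hypothetical. Percentiles use the nearest-rank method:

```python
import math
import statistics


def summarize(latencies_ms, duration_s):
    """Summarize one run: throughput plus mean and nearest-rank percentile latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank method: the smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    return {
        "throughput_per_s": n / duration_s,
        "mean_ms": statistics.mean(ordered),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "max_ms": ordered[-1],
    }


report = summarize(latencies_ms=list(range(1, 101)), duration_s=2.0)
print(report)
```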
Relevant Skills
Basic knowledge of RocketMQ 5.0, Netty, gRPC, and operating systems.
Mailing List: dev@rocketmq.apache.org
Mentor
Zhouxiang Zhan, committer of Apache RocketMQ, zhouxzhan@apache.org
RocketMQ TieredStore Integration with High Availability Architecture
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Background
With the official release of RocketMQ 5.1.0, tiered storage has arrived as a new independent module at the Technical Preview milestone. This allows users to offload messages from local disks to cheaper storage, extending message retention time at a lower cost.
Reference RIP-57: https://github.com/apache/rocketmq/wiki/RIP-57-Tiered-storage-for-RocketMQ
In addition, RocketMQ introduced a new high availability architecture in version 5.0.
Reference RIP-44: https://github.com/apache/rocketmq/wiki/RIP-44-Support-DLedger-Controller
However, currently RocketMQ tiered storage only supports single replicas.
Task
Currently, tiered storage only supports single replicas, and there are still the following issues in the integration with the high availability architecture:
- Metadata synchronization: how to reliably synchronize metadata between master and slave nodes.
- Disallowing message uploads beyond the confirm offset: to avoid message rollback, the maximum uploaded offset cannot exceed the confirm offset.
- Starting tiered storage uploads when a slave becomes the master, and stopping them when the master becomes a slave: only the master node has write and delete permissions, and after a slave node is promoted, it needs to quickly resume tiered storage uploads from the last breakpoint.
- Design of the slave pull protocol: how a newly launched empty slave can properly synchronize data through the tiered storage architecture. (If synchronization is based on the first or last file, it may not be possible to resume from a breakpoint after another switchover.)
So you need to provide a complete plan to solve the above issues and ultimately complete the integration of tiered storage and high availability architecture, while verifying it through the existing tiered storage file version and OpenChaos testing.
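The confirm-offset rule and the role switch from the task list can be expressed as a small state machine. The Python sketch below is a hypothetical model, not RocketMQ code, and it makes three assumptions. Offsets are message positions in one queue. `confirm_offset` is treated as an exclusive bound up to which replicas have acknowledged. `remote_committed_offset` is what the tiered store already holds:

```python
from dataclasses import dataclass


@dataclass
class TieredUploader:
    """Hypothetical upload state of one queue on one broker replica."""
    is_master: bool
    uploaded_offset: int = 0  # first offset not yet in tiered storage

    def upload_bound(self, confirm_offset: int, local_max_offset: int) -> int:
        """Exclusive end offset that may be uploaded now."""
        if not self.is_master:
            # Only the master writes to tiered storage.
            return self.uploaded_offset
        # Never upload past the confirm offset: unconfirmed messages could be
        # rolled back after a failover and must not reach the remote tier.
        return max(self.uploaded_offset, min(confirm_offset, local_max_offset))

    def upload(self, confirm_offset: int, local_max_offset: int) -> range:
        """Return the offsets uploaded in this round and advance the cursor."""
        end = self.upload_bound(confirm_offset, local_max_offset)
        batch = range(self.uploaded_offset, end)
        self.uploaded_offset = end
        return batch

    def on_role_change(self, is_master: bool, remote_committed_offset: int) -> None:
        """Switch role; a newly promoted master resumes from the remote tier's breakpoint."""
        self.is_master = is_master
        if is_master:
            self.uploaded_offset = remote_committed_offset


node = TieredUploader(is_master=False)
print(list(node.upload(confirm_offset=5, local_max_offset=8)))  # []
node.on_role_change(is_master=True, remote_committed_offset=2)
print(list(node.upload(confirm_offset=5, local_max_offset=8)))  # [2, 3, 4]
```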
Relevant Skills
- Interest in messaging middleware and distributed storage systems
- Java development skills
- Having a good understanding of RocketMQ tiered storage and high availability architecture
SkyWalking
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Algorithm Optimization)
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will continue the log clustering implementation with a revised architecture, and this task requires the student to focus on algorithm optimization for the clustering technique.
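The basic idea behind log clustering is that lines differing only in variable parts (IDs, numbers, addresses) are grouped under a shared template. The Python sketch below uses simple regex masking and exact-template grouping. It is a deliberately simplified stand-in, not SkyWalking's algorithm, and the patterns are illustrative. Algorithms such as Drain refine this approach with a parse tree and similarity thresholds:

```python
import re
from collections import defaultdict

# Tokens that usually carry variable data; the patterns are illustrative.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+)*$"),                # numbers, versions, dotted IPv4
    re.compile(r"^0x[0-9a-fA-F]+$"),             # hex values
    re.compile(r"^[0-9a-f]{8}-[0-9a-f-]{27}$"),  # UUID-like ids
]


def template_of(line: str) -> str:
    """Replace variable-looking tokens with <*> to get a log template."""
    return " ".join(
        "<*>" if any(p.match(token) for p in VARIABLE_PATTERNS) else token
        for token in line.split()
    )


def cluster(lines):
    """Group log lines by template; returns {template: [lines]}."""
    groups = defaultdict(list)
    for line in lines:
        groups[template_of(line)].append(line)
    return dict(groups)


lines = [
    "Connected to 10.0.0.1 in 35 ms",
    "Connected to 10.0.0.2 in 12 ms",
    "Cache miss for key user42",
]
for template, members in cluster(lines).items():
    print(len(members), template)
```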
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Flink Integration)
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will continue the log clustering implementation with a revised architecture, and this task requires the student to focus on Flink and its integration with SkyWalking OAP.
[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This task is about enhancing Python agent performance. The tracking issue is https://github.com/apache/skywalking/issues/10408
[GSOC] [SkyWalking] Pending Task on K8s
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This task is about a pending task on K8s.
[SkyWalking] Add Terraform provider for Apache SkyWalking
Currently, the deployment methods for SkyWalking are limited: we only provide a Helm Chart for deploying to Kubernetes, and users who are not on Kubernetes have to do all the housekeeping themselves to set up SkyWalking on, for example, a VM.
This issue aims to add a Terraform provider so that users can conveniently spin up a cluster for demonstration or testing. We should then evolve the provider so that users can customize it to their needs and eventually use it in their production environment.
In this task, we will mainly focus on support for AWS. In the Terraform provider, users provide their access key / secret key, and the provider does the rest: creates VMs, creates the storage (OpenSearch or RDS), downloads the SkyWalking tarballs, configures SkyWalking, starts the SkyWalking components (OAP/UI), creates public IPs/domain names, etc.
Doris
[GSoC][Doris] Dictionary Encoding Acceleration
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
In Apache Doris, dictionary encoding is performed during data writing and compaction. It is applied to string data types by default, and the dictionary of a column within one segment is at most 1M in size. Dictionary encoding accelerates queries on strings by, for example, converting them into INT values.
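As a rough illustration of why dictionary encoding speeds up string queries, the Python sketch below encodes a column into a dictionary plus integer codes and evaluates an equality filter by comparing integers. It is a toy model, not Doris's actual (C++) implementation: the function names are hypothetical, and it ignores segments, compaction and the per-segment dictionary size limit.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes): distinct strings in first-seen order, one int code per row."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        code = index.get(v)
        if code is None:  # first occurrence: assign the next code
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Evaluate `col = literal`: one string lookup, then integer comparisons per row."""
    if literal not in dictionary:
        return []  # the literal never occurs in this column, so no row matches
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


city = ["beijing", "shanghai", "beijing", "shenzhen"]
dictionary, codes = dict_encode(city)
print(dictionary, codes)  # ['beijing', 'shanghai', 'shenzhen'] [0, 1, 0, 2]
print(filter_equals(dictionary, codes, "beijing"))  # [0, 2]
```

The string comparison happens once per dictionary instead of once per row; the price is the memory held by the dictionary itself, which relates to the memory question in Phase Two.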
Task
- Phase One: Get familiar with the implementation of Apache Doris dictionary encoding, and learn how Apache Doris dictionary encoding accelerates queries.
- Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Chen Zhang, Apache Doris Committer, zhangchen@apache.org
- Mentor: Zhijing Lu, Apache Doris Committer, luzhijing@apache.org
- Mailing List: dev@doris.apache.org
[GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris supports acceleration of queries on external data sources to meet users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to Apache Doris based on a unified framework.
Objective
- Enable Apache Doris to access one or more of these data sources via the Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
- Compile relevant documentation. See an example here: https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive
Task
Phase One:
- Get familiar with the Multi-Catalog structure of Apache Doris, including the metadata synchronization mechanism in FE and the data reading mechanism of BE.
- Investigate how metadata should be acquired and how data access works for the chosen data source(s); produce the corresponding design documentation.
Phase Two:
- Develop connections to the chosen data source(s) and implement access to their metadata and data.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, morningman@apache.org
- Mentor: Calvin Kirs, Apache Geode PMC & Committer, Kirs@apache.org
- Mailing List: dev@doris.apache.org
[GSoC][Doris] Page Cache Improvement
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris accelerates high-concurrency queries using a page cache, which stores decompressed data.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which has a few problems:
- Hot data gets evicted by large queries
- The page cache configuration is immutable and does not support GC.
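To make the first problem concrete, the Python sketch below models a plain LRU page cache and shows a large scan evicting a frequently used page. The class and page names are hypothetical and this is not Doris's cache code; it only illustrates the general LRU behavior described above.

```python
from collections import OrderedDict
from typing import Optional


class LRUPageCache:
    """Minimal LRU cache mapping page ids to decompressed page bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()  # ordered from least to most recently used

    def get(self, page_id: str) -> Optional[bytes]:
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)  # a hit makes the page most recently used
        return self.pages[page_id]

    def put(self, page_id: str, data: bytes) -> None:
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


cache = LRUPageCache(capacity=4)
cache.put("hot", b"frequently read page")
cache.get("hot")
# A large query scans 4 pages that are each read only once.
for i in range(4):
    cache.put(f"scan-{i}", b"cold page")
print(cache.get("hot"))  # None: the hot page was evicted by the scan
```

Scan-resistant policies such as 2Q or LRU-K, or not admitting pages read by large scans, are common ways to address this; which approach fits Doris is what Phase Two would determine.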
Task
- Phase One: Identify the impact on queries of storing the decompressed data in memory and on SSD, respectively, and then determine whether a full page cache is required.
- Phase Two: Improve the cache strategy for Apache Doris based on the results from Phase One.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, yangyongqiang@apache.org
- Mentor: Haopeng Li, Apache Doris PMC member & Committer, lihaopeng@apache.org
- Mailing List: dev@doris.apache.org
EventMesh
Apache EventMesh: EventMesh official website docs by version and demo showcase
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
We hope the community can contribute to maintaining the documentation, including archiving the Chinese and English documentation for different release versions, maintaining the official website documentation, improving the project quick start guide, introducing features, etc.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Improve and supplement the documentation on GitHub, maintain the official website documentation, and record EventMesh quick-start and feature demonstration videos
Recommended Skills
1. Familiar with Markdown
2. Familiar with Java/Go
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh: Integrate the EventMesh runtime on Kubernetes
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
Currently, EventMesh has good usability in microservice scenarios. However, EventMesh's support for Kubernetes is still relatively weak. We hope the community can contribute an integration of EventMesh with Kubernetes.
Task
1. Discuss your implementation idea with the mentors
2. Learn the details of the Apache EventMesh project
3. Integrate EventMesh with Kubernetes
Recommended Skills
1. Familiar with Java
2. Familiar with Kubernetes
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
ShenYu
Apache ShenYu Gsoc 2023 - Support for Kubernetes Service Discovery
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Support registering microservices deployed in K8s Pods to shenyu-admin, using K8s as the register center.
2. Discuss with the mentors, and complete the requirements design and technical design of the ShenYu K8s Register Center.
3. Complete the initial version of the ShenYu K8s Register Center.
4. Complete the CI tests of the ShenYu K8s Register Center to verify the correctness of the code.
5. Write the necessary documentation, deployment guides, and instructions for users to connect microservices running inside K8s Pods to ShenYu.
Relevant Skills
1. Know how to use Apache ShenYu, especially the register center
2. Familiar with Java and Golang
3. Familiar with Kubernetes and able to develop with Java or Golang
Community Development
Add server indicator if a server is a cache
Apache Nemo
Dynamic Work Stealing on Nemo for handling skews
We aim to handle the problems of throttled (heterogeneous) resources and skewed input data. To solve them, we propose dynamic work stealing, which dynamically tracks task statuses and lets tasks steal workloads from each other. To do this, we have the following action items:
- Dynamically collecting task statistics during execution
- Detecting skewed tasks periodically
- Splitting the data allocated in skewed tasks and reallocating them into new tasks
- Synchronizing the optimization procedure
- Evaluation of the resulting implementations
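The detection and splitting steps listed above can be sketched in Python as follows. This is a single-process illustration under assumptions, not Nemo's API (Nemo itself is written in Java): TaskStats, detect_skewed, split_task and rebalance are hypothetical names, a task's statistics are reduced to its list of unprocessed records, and a task counts as skewed when its remaining work exceeds a chosen factor times the median. Runtime statistics collection and synchronization are left out.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class TaskStats:
    task_id: str
    remaining: List[int]  # hypothetical: ids of records this task has not processed yet


def detect_skewed(tasks: List[TaskStats], factor: float = 2.0) -> List[TaskStats]:
    """Flag tasks whose remaining work exceeds `factor` x the median remaining work."""
    med = median(len(t.remaining) for t in tasks)
    threshold = factor * max(med, 1)  # avoid a zero threshold when most tasks are done
    return [t for t in tasks if len(t.remaining) > threshold]


def split_task(task: TaskStats, parts: int) -> List[TaskStats]:
    """Split a task's remaining records into up to `parts` contiguous chunks."""
    size = math.ceil(len(task.remaining) / parts)
    return [
        TaskStats(f"{task.task_id}-split{n}", task.remaining[i:i + size])
        for n, i in enumerate(range(0, len(task.remaining), size))
    ]


def rebalance(tasks: List[TaskStats], factor: float = 2.0, parts: int = 2) -> List[TaskStats]:
    """Replace every skewed task with smaller tasks that idle executors could pick up."""
    skewed = {t.task_id for t in detect_skewed(tasks, factor)}
    result: List[TaskStats] = []
    for t in tasks:
        result.extend(split_task(t, parts) if t.task_id in skewed else [t])
    return result


tasks = [TaskStats("a", list(range(10))), TaskStats("b", [0, 1]), TaskStats("c", [0, 1, 2])]
print([t.task_id for t in rebalance(tasks)])  # ['a-split0', 'a-split1', 'b', 'c']
```

The median-based threshold is only one possible skew heuristic; the project's evaluation item is where alternatives would be compared.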
Airavata
Airavata Jupyter Platform Services
- UI Framework
- To host the Jupyter environment, we will need to wrap the notebooks in a user interface and connect it with Apache Airavata services
- Leverage Airavata communications from within the Django Portal - https://github.com/apache/airavata-django-portal
- Explore whether the platform is better developed as VSCode extensions that leverage Jupyter extensions such as https://github.com/Microsoft/vscode-jupyter
- Alternatively, explore developing a standalone native application using ElectronJS
- Draft a platform architecture: an Airavata-based infrastructure with functionality similar to Colab.
- Authenticate with Airavata Custos Framework - https://github.com/apache/airavata-custos
- Extend the notebook filesystem using a virtual file system approach, integrating with Airavata-based storage and catalogs
- Register the notebooks with the Airavata app catalog and experiment catalog.
Advanced Possibilities:
Explore Multi-tenanted JupyterHub
- Can Kubernetes namespace isolation accomplish this?
- Make deployment of Jupyter support as part of the default core
- Data-level and user-level tenancy can be assumed; how do we make sure the infrastructure isolates tenants, for example so that one gateway cannot crash a shared hosting environment?
- How can JupyterHub leverage computational resources?
Dashboards to get quick statistics
Gateway admins need periodic reports for various reporting and planning needs.
Features Include:
- Compute resources that had at least one job submitted during the period <start date - end date>
- User groups created within a given period, with the number of users in each group, their permission levels, and the number of jobs each user has submitted.
- Applications and the number of jobs for each application in a given period, grouped by job status.
- Number of users that submitted at least one job during the period <start date - end date>
- Total number of unique users
- User registration trends
- Number of experiments for a given period <start date - end date>, grouped by experiment status
- Total CPU-hours used by each user, sorted, aggregated quarterly, and plotted over a period of time
- Total CPU-hours consumed by each application, sorted, aggregated quarterly, and plotted over a period of time
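Several of these reports reduce to filtering job records by a date range and aggregating them. The Python sketch below computes three of them over an in-memory list. The JobRecord fields are hypothetical and do not reflect Airavata's actual data model; in a real deployment the data would come from Airavata's own storage, and plotting is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set, Tuple


@dataclass
class JobRecord:  # hypothetical shape of a job record
    user: str
    application: str
    status: str
    submitted: date
    cpu_hours: float


def quarter(d: date) -> str:
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def cpu_hours_by_user_quarter(jobs: List[JobRecord], start: date, end: date):
    """Total CPU-hours per (quarter, user), sorted by quarter, then by usage descending."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for j in jobs:
        if start <= j.submitted <= end:
            totals[(quarter(j.submitted), j.user)] += j.cpu_hours
    return sorted(totals.items(), key=lambda kv: (kv[0][0], -kv[1]))


def jobs_by_app_and_status(jobs: List[JobRecord], start: date, end: date) -> Dict[Tuple[str, str], int]:
    """Number of jobs per (application, job status) within the period."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for j in jobs:
        if start <= j.submitted <= end:
            counts[(j.application, j.status)] += 1
    return dict(counts)


def active_users(jobs: List[JobRecord], start: date, end: date) -> Set[str]:
    """Users that submitted at least one job within the period."""
    return {j.user for j in jobs if start <= j.submitted <= end}


jobs = [
    JobRecord("alice", "gaussian", "COMPLETED", date(2023, 1, 10), 5.0),
    JobRecord("bob", "gaussian", "FAILED", date(2023, 2, 1), 8.0),
    JobRecord("alice", "lammps", "COMPLETED", date(2023, 4, 3), 2.5),
    JobRecord("alice", "gaussian", "COMPLETED", date(2023, 5, 3), 1.5),
]
start, end = date(2023, 1, 1), date(2023, 6, 30)
print(cpu_hours_by_user_quarter(jobs, start, end))
# [(('2023-Q1', 'bob'), 8.0), (('2023-Q1', 'alice'), 5.0), (('2023-Q2', 'alice'), 4.0)]
```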
Enhance File Transports in MFT
Complete all transports in MFT
- Currently, SCP and S3 are known to work
- Others need effort to optimize, test, and declare readiness
- Develop a complete, fully functional MFT command-line interface
- Have a feature-complete Python SDK
- A minimal implementation will be provided; students need to complete and test it.
Custos Backup and Restore
Custos does not have the capability to efficiently back up and restore a live instance. This is essential for highly available services.
Airavata Rich Client based on ElectronJS
Using the SEAGrid Rich Client as an example, develop a native application based on ElectronJS that mimics the Airavata Django Portal.
Reference example - https://github.com/SciGaP/seagrid-rich-client