...
Contents
- James Server
- Commons Statistics
- Commons Numbers
- Commons Math
- Commons Imaging
- RocketMQ
- EventMesh
- StreamPipes
- ShardingSphere
- Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements
- Apache ShardingSphere Enhance ComputeNode reconciliation
- ShenYu
- TrafficControl
- Doris
- SkyWalking
- [GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
- [SkyWalking] Build the OAP into GraalVM native image
- Beam
- Teaclave
- Airflow
- SeaTunnel
- Airflow
- CloudStack
- Apache Nemo
- Apache Gora
- Apache Fineract
- Apache Dubbo
- Dubbo GSoC 2023 - HTTP/3 Rest Support
- Dubbo GSoC 2023 - Automatically configure pixiu as istio ingress gateway
- Apache Commons All
- Airavata
...
Code Insights for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly in recent years. We have introduced many new features and attracted both users and contributors. As the cherry on the cake, we graduated as an Apache top-level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing. However, since we are approaching our `1.0` release at full steam, we also want the project to become more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.
Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (Backend, UI, Python library) and report the coverage to Codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock in the current test coverage as a lower threshold that we always want to meet (meaning that CI builds fail if coverage drops below it). Over time, we can then increase the required coverage level step by step.
Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.
Tasks
- [ ] calculate test coverage for all main parts of the repo
- [ ] send coverage to Codecov
- [ ] determine coverage threshold and let CI fail if below
- [ ] include SonarCloud in CI setup
- [ ] include automatic coverage report in PR validation (see an example here) -> optional
- [ ] include automatic SonarCloud report in PR validation -> optional
- [ ] whatever comes to your mind 💡 further ideas are always welcome
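The threshold task above can be pictured as a small gate that a CI step runs once the coverage numbers are known. The following is only a sketch under assumptions: the component names, the percentages and the `checkCoverage` helper are hypothetical, and Codecov's own status checks may make a hand-written gate unnecessary.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// checkCoverage compares the measured coverage (in percent) of each component
// with its locked-in lower threshold and returns the sorted names of all
// components that are missing or below the threshold.
// The component names used in main are hypothetical placeholders.
func checkCoverage(measured, threshold map[string]float64) []string {
	var failing []string
	for component, minimum := range threshold {
		got, ok := measured[component]
		if !ok || got < minimum {
			failing = append(failing, component)
		}
	}
	sort.Strings(failing)
	return failing
}

func main() {
	// Hypothetical numbers; real values would come from the coverage reports.
	threshold := map[string]float64{"backend": 20.0, "ui": 10.0, "python": 60.0}
	measured := map[string]float64{"backend": 21.5, "ui": 10.4, "python": 60.0}

	if failing := checkCoverage(measured, threshold); len(failing) > 0 {
		fmt.Println("coverage below threshold for:", failing)
		os.Exit(1) // a non-zero exit code makes the CI job fail
	}
	fmt.Println("coverage thresholds met")
}
```

Raising the required level step by step then amounts to editing the threshold values whenever the measured coverage has improved.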
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes in SonarCloud or Codecov, and do not use the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- basic knowledge about GitHub workflows
Learning Material
- GitHub workflow docs
- Apache StreamPipes workflows
- Sonarcloud for Monorepos
- Using Codecov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Tim Bossenmaier
email: bossenti[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
ShardingSphere
Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
The ShardingSphere SQL federation engine provides support for complex SQL statements, and it can well support cross-database join queries, subqueries, aggregation queries and other statements. An important part of SQL federation engine is to convert the SQL statement parsed by ShardingSphere into SqlNode, so that Calcite can be used to implement SQL optimization and federated query.
Task
This issue is to solve the MySQL exception that occurs during SQLNodeConverterEngine conversion. The specific case list is as follows.
- select_char
- select_extract
- select_from_dual
- select_from_with_table
- select_group_by_with_having_and_window
- select_not_between_with_single_table
- select_not_in_with_single_table
- select_substring
- select_trim
- select_weight_string
- select_where_with_bit_expr_with_ampersand
- select_where_with_bit_expr_with_caret
- select_where_with_bit_expr_with_div
- select_where_with_bit_expr_with_minus_interval
- select_where_with_bit_expr_with_mod
- select_where_with_bit_expr_with_mod_sign
- select_where_with_bit_expr_with_plus_interval
- select_where_with_bit_expr_with_signed_left_shift
- select_where_with_bit_expr_with_signed_right_shift
- select_where_with_bit_expr_with_vertical_bar
- select_where_with_boolean_primary_with_comparison_subquery
- select_where_with_boolean_primary_with_is
- select_where_with_boolean_primary_with_is_not
- select_where_with_boolean_primary_with_null_safe
- select_where_with_expr_with_and_sign
- select_where_with_expr_with_is
- select_where_with_expr_with_is_not
- select_where_with_expr_with_not
- select_where_with_expr_with_not_sign
- select_where_with_expr_with_or_sign
- select_where_with_expr_with_xor
- select_where_with_predicate_with_in_subquery
- select_where_with_predicate_with_regexp
- select_where_with_predicate_with_sounds_like
- select_where_with_simple_expr_with_collate
- select_where_with_simple_expr_with_match
- select_where_with_simple_expr_with_not
- select_where_with_simple_expr_with_odbc_escape_syntax
- select_where_with_simple_expr_with_row
- select_where_with_simple_expr_with_tilde
- select_where_with_simple_expr_with_variable
- select_window_function
- select_with_assignment_operator
- select_with_assignment_operator_and_keyword
- select_with_case_expression
- select_with_collate_with_marker
- select_with_date_format_function
- select_with_exists_sub_query_with_project
- select_with_function_name
- select_with_json_value_return_type
- select_with_match_against
- select_with_regexp
- select_with_schema_name_in_column_projection
- select_with_schema_name_in_shorthand_projection
- select_with_spatial_function
- select_with_trim_expr
- select_with_trim_expr_from_expr
You need to compare the difference between actual and expected, and then correct the logic in SQLNodeConverterEngine so that actual can be consistent with expected.
After you make changes, remember to add the case to SUPPORTED_SQL_CASE_IDS to ensure it can be tested.
Notice, this pull request can be a good example:
https://github.com/apache/shardingsphere/pull/14492
Relevant Skills
1. Master the Java language
2. Have a basic understanding of Antlr g4 files
3. Be familiar with MySQL and Calcite SqlNode
Targets files
SQLNodeConverterEngineIT
https://github.com/apache/shardingsphere/blob/master/test/it/optimizer/src/test/java/org/apache/shardingsphere/test/it/optimize/SQLNodeConverterEngineIT.java
Mentor
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Apache ShardingSphere Add the feature of switching logging framework
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere provides two adapters: ShardingSphere-JDBC and ShardingSphere-Proxy.
Currently, ShardingSphere uses logback for logging, but consider the following situations:
- Users may need to switch the logging framework to meet special needs; for example, log4j2 can provide better asynchronous performance.
- When using the JDBC adapter, the user application may not use logback, which may cause some conflicts.
Why doesn't the log facade suffice? Because ShardingSphere provides users with clustered logging configurations (such as changing the log level online), which requires dynamic construction of loggers, and this cannot be achieved with the log facade alone.
Task
1. Design and implement logging SPI to support multiple logging frameworks (such as logback and log4j2)
2. Allow users to choose which logging framework to use through the logging rule
Relevant Skills
1. Master the Java language
2. Basic knowledge of logback and log4j2
3. Maven
Mentor
Longtao Jiang, Committer of Apache ShardingSphere, jianglongtao@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
Apache ShardingSphere Support mainstream database metadata table query
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere
Background
ShardingSphere has designed its own metadata database to simulate metadata queries that support various databases.
More details:
https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052
Task
- Support PostgreSQL and openGauss `\d tableName`
- Support PostgreSQL and openGauss `\d+`
- Support PostgreSQL and openGauss `\d+ tableName`
- Support PostgreSQL and openGauss `\l`
- Support query for MySQL metadata `TABLES`
- Support query for MySQL metadata `COLUMNS`
- Support query for MySQL metadata `SCHEMATA`
- Support query for MySQL metadata `ENGINES`
- Support query for MySQL metadata `FILES`
- Support query for MySQL metadata `VIEWS`
Notice, these issues can be a good example.
https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182
Relevant Skills
- Master JAVA language
- Have a basic understanding of Zookeeper
- Be familiar with MySQL/Postgres SQLs
Mentor
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Apache ShardingSphere Enhance ComputeNode reconciliation
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the new CRDs Cluster and ComputeNode, as follows:
- WIP: [New Feature] Introduce new CRD Cluster #167
- [Feat] Introduce new CRD as ComputeNode for better usability #166
Currently we are promoting ComputeNode as the major CRD to represent a special ShardingSphere Proxy deployment, and we plan to use Cluster to represent a special ShardingSphere Proxy cluster.
Task
This issue is to enhance ComputeNode reconciliation availability. The specific case list is as follows.
- Add IT test case for Deployment spec volume
- Add IT test case for Deployment spec template init containers
- Add IT test case for Deployment spec template spec containers
- Add IT test case for Deployment spec volume mounts
- Add IT test case for Deployment spec container ports
- Add IT test case for Deployment spec container image tag
- Add IT test case for Service spec ports
- Add IT test case for ConfigMap data serverconfig
- Add IT test case for ConfigMap data logback
Notice, this issue can be a good example:
- chore: add more Ginkgo tests for ComputeNode #203
Relevant Skills
- Master Go language, Ginkgo test framework
- Have a basic understanding of Apache ShardingSphere Concepts
- Be familiar with Kubernetes Operator, kubebuilder framework
Targets files
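ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Apache ShardingSphere Add ShardingSphere Kafka source connector
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
The community recently added a CDC (change data capture) feature. After a client logs in, the change feed is published over the established network connection and can then be consumed.
Since Kafka is a popular distributed event streaming platform, it is useful to import the change feed into Kafka for later processing.
Task
- Become familiar with ShardingSphere CDC client usage: create a publication and subscribe to the change feed.
- Become familiar with Kafka connector development: develop a source connector, integrate it with ShardingSphere CDC, and persist the change feed to Kafka topics properly.
- Add unit tests and E2E integration tests.
Relevant Skills
- Java language
- Basic knowledge of CDC and Kafka
- Maven
References
- issues/22500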
- https://kafka.apache.org/documentation/#connect_development
- https://github.com/apache/kafka/tree/trunk/connect/file/src
- https://github.com/confluentinc/kafka-connect-jdbc
Local Test Steps
- Modify `conf/server.yaml`, uncomment `cdc-server-port: 33071` to enable CDC. (Refer to step 2)
- Configure proxy, refer to `Prerequisites` and `Procedure` in build to configure proxy (Newer version could be used too, current stable version is 5.3.1).
- Start proxy server, it'll start CDC server too.
- Download ShardingSphere source code from https://github.com/apache/shardingsphere , modify and run `org.apache.shardingsphere.data.pipeline.cdc.client.example.Bootstrap`. It'll print `records:` by default in `Bootstrap`.
- Execute some INSERT/UPDATE/DELETE SQLs in proxy to generate change feed, and then check in `Bootstrap` console.
Mentor
Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org
Xinze Guo, Committer of Apache ShardingSphere, azexin@apache.org
Apache ShardingSphere Write a converter to generate DistSQL
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
Currently we try to promote StorageNode as major CRD to represent a set of storage units for ShardingSphere.
Task
The basic task is for the storage node controller to manage the lifecycle of a set of storage units, such as PostgreSQL, in Kubernetes.
We do not want to reinvent the wheel with something like pg-operator, so consider using a predefined parameter group to generate the target CRD.
- [ ] Generate DistSQL according to the Golang struct `EncryptionRule`
- [ ] Generate DistSQL according to the Golang struct `ShardingRule`
- [ ] Generate DistSQL according to the Golang struct `ReadWriteSplittingRule`
- [ ] Generate DistSQL according to the Golang struct `MaskRule`
- [ ] Generate DistSQL according to the Golang struct `ShadowRule`
Relevant Skills
1. Master Go language, Ginkgo test framework
2. Have a basic understanding of Apache ShardingSphere Concepts and DistSQL
Targets files
DistSQL Converter - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/distsql/converter.go, etc.
Example
A struct defined as below:
```golang
type EncryptRule struct{}

// ToDistSQL renders the rule as a DistSQL statement; the body is a placeholder here.
func (t EncryptRule) ToDistSQL() string { return "" }
```
Invoking ToDistSQL() generates a DistSQL statement for the EncryptRule, for example:
```SQL
CREATE ENCRYPT RULE t_encrypt (....
```
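To make the example above more concrete, here is a minimal sketch of what such a converter could look like. It rests on assumptions: the fields of `EncryptRule`, the `EncryptColumn` and `Property` types and the exact clause layout are hypothetical and only loosely modeled on DistSQL, so the generated text should be checked against the DistSQL reference of the targeted ShardingSphere version.

```go
package main

import (
	"fmt"
	"strings"
)

// Property is a hypothetical key/value pair for algorithm properties.
// A slice is used instead of a map so that the output order is deterministic.
type Property struct {
	Key   string
	Value string
}

// EncryptColumn is a hypothetical description of one encrypted column.
type EncryptColumn struct {
	Name      string
	Cipher    string
	Algorithm string
	Props     []Property
}

// EncryptRule carries hypothetical fields; the operator's real struct may differ.
type EncryptRule struct {
	Table   string
	Columns []EncryptColumn
}

// clause renders a single column definition.
// Quoting and escaping of values are deliberately omitted in this sketch.
func (c EncryptColumn) clause() string {
	props := make([]string, 0, len(c.Props))
	for _, p := range c.Props {
		props = append(props, fmt.Sprintf("'%s'='%s'", p.Key, p.Value))
	}
	algorithm := fmt.Sprintf("TYPE(NAME='%s'", c.Algorithm)
	if len(props) > 0 {
		algorithm += ",PROPERTIES(" + strings.Join(props, ",") + ")"
	}
	algorithm += ")"
	return fmt.Sprintf("(NAME=%s,CIPHER=%s,ENCRYPT_ALGORITHM(%s))", c.Name, c.Cipher, algorithm)
}

// ToDistSQL joins all column clauses into one CREATE ENCRYPT RULE statement.
func (t EncryptRule) ToDistSQL() string {
	cols := make([]string, 0, len(t.Columns))
	for _, c := range t.Columns {
		cols = append(cols, c.clause())
	}
	return fmt.Sprintf("CREATE ENCRYPT RULE %s (COLUMNS(%s));", t.Table, strings.Join(cols, ","))
}

func main() {
	rule := EncryptRule{
		Table: "t_encrypt",
		Columns: []EncryptColumn{{
			Name:      "user_id",
			Cipher:    "user_cipher",
			Algorithm: "AES",
			Props:     []Property{{Key: "aes-key-value", Value: "123456abc"}},
		}},
	}
	fmt.Println(rule.ToDistSQL())
}
```

Keeping the rendering of one column in its own method means the other rule types in the task list could follow the same pattern: one small method per clause, joined by the top-level `ToDistSQL`.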
References:
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Apache ShardingSphere Introduce New CRD ShardingSphereChaos
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the background of ChaosEngineering as follows:
Introduce ChaosEngineering for ShardingSphere #32
The ShardingSphereChaos controller is aiming at different chaos tests.
Task
Propose a generic controller for ShardingSphereChaos, which reconciles the CRD ShardingSphereChaos and prepares, executes and verifies tests.
- [ ] Support common ShardingSphere features, prepare test rules and dataset
- [ ] Generate chaos type according to the backend implementation
- [ ] Verify testing result with DistSQL or other tools
Relevant Skills
1. Master Go language, Ginkgo test framework
2. Have a deep understanding of Apache ShardingSphere concepts and practices.
3. Kubernetes operator pattern, kube-builder
Targets files
ShardingSphereChaos Controller - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/controllers/chaos_controller.go, etc.
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Apache ShardingSphere Introduce new CRD as StorageNode for better usability
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the new CRDs Cluster and ComputeNode, as follows:
- #167
- #166
Currently we try to promote StorageNode as major CRD to represent a set of storage units for ShardingSphere.
Task
The basic task is for the storage node controller to manage the lifecycle of a set of storage units, such as PostgreSQL, in Kubernetes.
We do not want to reinvent the wheel with something like pg-operator, so consider using a predefined parameter group to generate the target CRD.
- [ ] Create a PostgreSQL cluster when a StorageNode with pg parameters is created
- [ ] Update the PostgreSQL cluster when the StorageNode is updated
- [ ] Delete the PostgreSQL cluster when the StorageNode is deleted. Note that this may need a deletion strategy.
- [ ] Reconcile the StorageNode according to the status of the PostgreSQL cluster.
- [ ] The status of the StorageNode would be consumed by common DistSQLs related to storage units
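The lifecycle items above boil down to comparing a desired state with an observed state and picking one action per reconcile pass. The following is a minimal sketch of that decision only, assuming heavily simplified, hypothetical types (`StorageNodeSpec`, `ClusterState`, `decide`); a real controller built with kubebuilder would work on the actual CRD types and the Kubernetes API instead.

```go
package main

import "fmt"

// Action names the single step a reconcile pass would take.
type Action string

const (
	ActionCreate Action = "create"
	ActionUpdate Action = "update"
	ActionDelete Action = "delete"
	ActionNone   Action = "none"
)

// StorageNodeSpec is a hypothetical, simplified desired state.
type StorageNodeSpec struct {
	Replicas int
	Version  string
	Deleting bool // stands in for "the StorageNode is being deleted"
}

// ClusterState is a hypothetical view of the observed PostgreSQL cluster.
type ClusterState struct {
	Exists   bool
	Replicas int
	Version  string
}

// decide maps desired and observed state to one action.
// Deletion is checked first so that a StorageNode being removed never
// triggers a create or an update.
func decide(spec StorageNodeSpec, observed ClusterState) Action {
	switch {
	case spec.Deleting && observed.Exists:
		return ActionDelete
	case spec.Deleting:
		return ActionNone
	case !observed.Exists:
		return ActionCreate
	case observed.Replicas != spec.Replicas || observed.Version != spec.Version:
		return ActionUpdate
	default:
		return ActionNone
	}
}

func main() {
	spec := StorageNodeSpec{Replicas: 3, Version: "15"}
	fmt.Println(decide(spec, ClusterState{}))
	fmt.Println(decide(spec, ClusterState{Exists: true, Replicas: 2, Version: "15"}))
	fmt.Println(decide(spec, ClusterState{Exists: true, Replicas: 3, Version: "15"}))
}
```

Separating the decision from its execution keeps the logic easy to cover with table-driven or Ginkgo tests, and a deletion strategy could later be added as an extra input to `decide`.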
Relevant Skills
1. Master Go language, Ginkgo test framework
2. Have a basic understanding of Apache ShardingSphere Concepts
3. Be familiar with Kubernetes Operator, kubebuilder framework
Targets files
StorageNode Controller - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/controllers/storagenode_controller.go
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Apache ShardingSphere Introduce JVM chaos to ShardingSphere
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
There is a proposal about the background of ChaosEngineering as follows:
Introduce ChaosEngineering for ShardingSphere #32
And we also proposed a generic controller for ShardingSphereChaos as follows:
[GSoC 2023] Introduce New CRD ShardingSphereChaos #272
The ShardingSphereChaos controller is aiming at different chaos tests. This JVMChaos is an important one.
Task
Write several scripts to implement different JVMChaos for main features of ShardingSphere. The specific case list is as follows.
- Add scripts injecting chaos to DataSharding
- Add scripts injecting chaos to ReadWriteSplitting
- Add scripts injecting chaos to DatabaseDiscovery
- Add scripts injecting chaos to Encryption
- Add scripts injecting chaos to Mask
- Add scripts injecting chaos to Shadow
Basically, these scripts will cause unexpected behaviour while the related DistSQL is executing.
Relevant Skills
- Master Go language, Ginkgo test framework
- Have a deep understanding of Apache ShardingSphere concepts and practices.
- Familiarity with JVM bytecode mechanisms such as Byteman and ByteBuddy.
Target files
JVMChaos Scripts - https://github.com/apache/shardingsphere-on-cloud/chaos/jvmchaos/scripts/
Mentor
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
The ShardingSphere SQL federation engine supports complex SQL statements, including cross-database join queries, subqueries, aggregation queries and other statements. An important part of the SQL federation engine is converting the SQL statement parsed by ShardingSphere into a SqlNode, so that Calcite can be used for SQL optimization and federated queries.
Task
This issue is to fix the MySQL exceptions that occur during SQLNodeConverterEngine conversion. The specific case list is as follows.
- select_char
- select_extract
- select_from_dual
- select_from_with_table
- select_group_by_with_having_and_window
- select_not_between_with_single_table
- select_not_in_with_single_table
- select_substring
- select_trim
- select_weight_string
- select_where_with_bit_expr_with_ampersand
- select_where_with_bit_expr_with_caret
- select_where_with_bit_expr_with_div
- select_where_with_bit_expr_with_minus_interval
- select_where_with_bit_expr_with_mod
- select_where_with_bit_expr_with_mod_sign
- select_where_with_bit_expr_with_plus_interval
- select_where_with_bit_expr_with_signed_left_shift
- select_where_with_bit_expr_with_signed_right_shift
- select_where_with_bit_expr_with_vertical_bar
- select_where_with_boolean_primary_with_comparison_subquery
- select_where_with_boolean_primary_with_is
- select_where_with_boolean_primary_with_is_not
- select_where_with_boolean_primary_with_null_safe
- select_where_with_expr_with_and_sign
- select_where_with_expr_with_is
- select_where_with_expr_with_is_not
- select_where_with_expr_with_not
- select_where_with_expr_with_not_sign
- select_where_with_expr_with_or_sign
- select_where_with_expr_with_xor
- select_where_with_predicate_with_in_subquery
- select_where_with_predicate_with_regexp
- select_where_with_predicate_with_sounds_like
- select_where_with_simple_expr_with_collate
- select_where_with_simple_expr_with_match
You need to compare the actual result with the expected one, and then correct the logic in SQLNodeConverterEngine so that actual is consistent with expected.
After you make changes, remember to add the case to SUPPORTED_SQL_CASE_IDS to ensure it is tested.
Note: the following pull request is a good example.
https://github.com/apache/shardingsphere/pull/14492
Relevant Skills
1. Master the Java language
2. Have a basic understanding of ANTLR g4 files
3. Be familiar with MySQL and Calcite SqlNode
Target files
SQLNodeConverterEngineIT
Apache ShardingSphere Introduce New CRD ShardingSphereChaos
Apache ShardingSphere
Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.
Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere
Background
The background of ChaosEngineering is described in the following proposal:
Introduce ChaosEngineering for ShardingSphere #32
The ShardingSphereChaos controller targets different kinds of chaos tests.
Task
Propose a generic controller for ShardingSphereChaos that reconciles the ShardingSphereChaos CRD and prepares, executes and verifies tests.
- [ ] Support common ShardingSphere features, prepare test rules and dataset
- [ ] Generate the chaos type according to the backend implementation
- [ ] Verify the test results with DistSQL or other tools
Relevant Skills
1. Master Go language, Ginkgo test framework
2. Have a deep understanding of Apache ShardingSphere concepts and practices.
3. Kubernetes operator pattern, kubebuilder
Target files
ShardingSphereChaos Controller - https://github.com/apache/shardingsphere-on-cloud/shardingsphere-operator/pkg/controllers/chaos_controller.go, etc.
Mentor
Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org
Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org
Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org
Trista Pan, PMC of Apache ShardingSphere, panjuan@apache.org
...
[GSOC][SkyWalking] Add Terraform provider for Apache SkyWalking
Currently the deployment methods for SkyWalking are limited: we only provide a Helm Chart for deploying on Kubernetes, so users who are not using Kubernetes have to do all the housekeeping themselves to set up SkyWalking on, for example, a VM.
This issue aims to add a Terraform provider so that users can conveniently spin up a cluster for demonstration or testing. We should then evolve the provider to let users customize it as they need, so that they can eventually use it in their production environments.
In this task, we will mainly focus on support for AWS. In the Terraform provider, users provide their access key / secret key, and the provider does the rest: create VMs, create the database/OpenSearch or RDS, download the SkyWalking tars, configure SkyWalking, start the SkyWalking components (OAP/UI), create public IPs/domain names, etc.
[SkyWalking] Build the OAP into GraalVM native image
Currently the SkyWalking OAP is bundled as a tarball when releasing, and its start time is long. We are looking for a way to distribute the binary executable more conveniently and to speed up the bootstrap time. We found that GraalVM is a good fit: not only can it solve the two aforementioned points, but it also lets us rewrite our LAL or even MAL system in the future with a more secure and isolated method, wasm, which GraalVM supports too!
So this task is to adjust the OAP, build it into a GraalVM native image, and make all tests in the OAP pass.
...
[GSOC] [SkyWalking] AIOps Log clustering with Flink (Flink Integration)
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This year we will proceed with the log clustering implementation using a revised architecture, and this task requires the student to focus on Flink and its integration with the SkyWalking OAP.
Mentor
- Mentor: Yanlong He, Apache SkyWalking PMC, heyanlong@apache.org
- Co-mentor: Yihao Chen (Superskyyy), Apache SkyWalking PMC, yihaochen@apache.org
- Mailing List: dev@skywalking.apache.org
[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
Apache SkyWalking is an application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures. This task is about enhancing Python agent performance; the tracking issue can be seen here: https://github.com/apache/skywalking/issues/10408
Mentor
- Mentor: Yihao Chen (Superskyyy), Apache SkyWalking PMC, yihaochen@apache.org
- Mentor: Zhenxu Ke, Apache SkyWalking PMC, kezhenxu94@apache.org
- Mailing List: dev@skywalking.apache.org
...
[GSoC][Teaclave (incubating)] Data Privacy Policy Definition and Function Verification
Background
The Apache Teaclave (incubating) is a cutting-edge solution for confidential computing, providing Function-as-a-Service (FaaS) capabilities that enable the decoupling of data and function providers. Despite its impressive functionality and security features, Teaclave currently lacks a mechanism for data providers to enforce policies on the data they upload. For example, data providers may wish to restrict access to certain columns of data for third-party function providers. Open Policy Agent (OPA) offers flexible control over service behavior and has been widely adopted by the cloud-native community. If Teaclave were to integrate OPA, data providers could apply policies to their data, enhancing Teaclave’s functionality. Another potential security loophole in Teaclave is the absence of a means to verify the expected behavior of a function. This gap leaves the system vulnerable to exploitation by malicious actors. Fortunately, most of Teaclave’s interfaces can be reused, with the exception of the function uploading phase, which may require an overhaul to address this issue. Overall, the integration of OPA and the addition of a function verification mechanism would make Teaclave an even more robust and secure solution for confidential computing.
Benefits
If this proposal moves on smoothly, new functionality will be added to the Teaclave project that enables verifying that a function's behavior strictly conforms to a prescribed policy.
Deliverables
- Milestones: Basic policies (e.g., addition, subtraction) of the data can be verified by Teaclave; Complex policies can be verified.
- Components: Verifier for the function code; Policy language adapters (adapt policy language to verifier); Policy language parser; Function source code converter (append policies to the functions).
- Documentation: The internal working mechanism of the verification; How to write policies for the data.
Timeline Estimation
- 0.5 month: Policy language parser and/or policy language design (if Rego is not an ideal choice).
- 1.5 to 2 months: Verification contracts rewriting on the function source code based on the policy parsed.
- About 1 month: The function can be properly verified formally (by, e.g., querying the Z3 SMT solver).
Difficulty: Major
Project size: ~350 hours (large)
Potential mentors:
Mingshen Sun, Apache Teaclave (incubating) PPMC, mssun@apache.org
Airflow
[GSoC][Airflow] Automation for PMC
This is a project to implement a tool for PMC task automation.
This is a large project.
The mentor will be aizhamal.
SeaTunnel
Apache SeaTunnel(Incubating) Http Client For SeaTunnel Zeta
Apache SeaTunnel(Incubating)
SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of data records stably and efficiently every day, and has been used in production by nearly 100 companies.
SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as the currently supported SeaTunnel Zeta, Flink and Spark. SeaTunnel supports more than 100 Connectors, and the number is surging.
Website: https://seatunnel.apache.org/
GitHub: https://github.com/apache/incubator-seatunnel
Background
Currently, to use SeaTunnel, the user must first create and write a config file that specifies the engine that runs the job, as well as engine-related parameters, and then define the Source, Transform, and Sink of the job. We hope to provide a client that allows users to define the engine, Source, Transform, and Sink information of the job directly through code in the client, without having to start with a config file. The user can then submit the job definition through the client, and SeaTunnel will run these jobs. After the job is submitted, the user can obtain the status of the running job through the client. For jobs that are already running, users can use this client to manage them, such as stopping jobs, temporary jobs, and so on.
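As a rough illustration of the workflow described above, the sketch below defines a job in code, submits it, queries its status and stops it. Everything in it is hypothetical: the `JobDefinition` and `InMemoryClient` names, the status strings and the plugin names are placeholders, not an existing SeaTunnel API, and the in-memory client merely stands in for a real client that would talk HTTP to the engine.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class JobDefinition:
    """Hypothetical in-code replacement for the config file."""
    engine: str
    env: dict = field(default_factory=dict)
    source: list = field(default_factory=list)
    transform: list = field(default_factory=list)
    sink: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize the definition the way a client might send it to the engine.
        return json.dumps({
            "engine": self.engine,
            "env": self.env,
            "source": self.source,
            "transform": self.transform,
            "sink": self.sink,
        })


class InMemoryClient:
    """Stand-in for the proposed client; it only records jobs locally."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job: JobDefinition) -> str:
        job_id = uuid.uuid4().hex
        # Status names are placeholders, not real engine states.
        self._jobs[job_id] = {"definition": job.to_json(), "status": "RUNNING"}
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]

    def stop(self, job_id: str) -> None:
        self._jobs[job_id]["status"] = "STOPPED"


# Plugin names below are placeholders for real connectors.
job = JobDefinition(
    engine="zeta",
    env={"parallelism": 1},
    source=[{"plugin": "ExampleSource"}],
    sink=[{"plugin": "ExampleSink"}],
)
client = InMemoryClient()
job_id = client.submit(job)
status_before = client.status(job_id)
client.stop(job_id)
status_after = client.status(job_id)
print(status_before, status_after)
```

Keeping the job definition as a plain data object separate from the client is one possible design: the same definition could then be submitted to any supported engine.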
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache SeaTunnel project
3. Discuss and complete design and development
Relevant Skills
- Familiarity with Java and HTTP
- Familiarity with SeaTunnel is a plus
Mentor
- Mentor: Jun Gao, Apache SeaTunnel(Incubating) PPMC Member, gaojun2048@apache.org
- Mentor: Li Liu, Apache SeaTunnel(Incubating) Committer, ic4y@apache.org
- Mailing List: dev@seatunnel.apache.org
CloudStack
CloudStack GSoC 2023 - Improve ConfigDrive to store network information
Github issue: https://github.com/apache/cloudstack/issues/2872
ConfigDrive / cloud-init supports a network_data.json file which can contain network information for a VM.
By providing the network information using ConfigDrive to a VM we can eliminate the need for DHCP and thus the Virtual Router in some use-cases.
An example JSON file:
{
  "links": [
    {
      "ethernet_mac_address": "52:54:00:0d:bf:93",
      "id": "eth0",
      "mtu": 1500,
      "type": "phy"
    }
  ],
  "networks": [
    {
      "id": "eth0",
      "ip_address": "192.168.200.200",
      "link": "eth0",
      "netmask": "255.255.255.0",
      "network_id": "dacd568d-5be6-4786-91fe-750c374b78b4",
      "routes": [
        {
          "gateway": "192.168.200.1",
          "netmask": "0.0.0.0",
          "network": "0.0.0.0"
        }
      ],
      "type": "ipv4"
    },
    {
      "id": "eth0",
      "ip_address": "2001:db8:100::1337",
      "link": "eth0",
      "netmask": "64",
      "network_id": "dacd568d-5be6-4786-91fe-750c374b78b4",
      "routes": [
        {
          "gateway": "2001:db8:100::1",
          "netmask": "0",
          "network": "::"
        }
      ],
      "type": "ipv6"
    }
  ],
  "services": [
    {
      "address": "8.8.8.8",
      "type": "dns"
    }
  ]
}
In Basic Networking and Advanced Networking zones which are using a shared network you wouldn't require a VR anymore.
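To make the structure of the file concrete, here is a minimal sketch of how such a document could be assembled from a VM's NIC details. The `build_network_data` helper and the shape of its input are assumptions made for illustration only; they are not CloudStack code, and the output simply mirrors the fields of the example JSON above.

```python
import json


def build_network_data(nics, dns_servers):
    """Assemble a network_data.json-style dict (hypothetical helper)."""
    links, networks = [], []
    for nic in nics:
        # One "link" entry per physical interface.
        links.append({
            "ethernet_mac_address": nic["mac"],
            "id": nic["name"],
            "mtu": nic.get("mtu", 1500),
            "type": "phy",
        })
        # One "network" entry per address, with a default route.
        for addr in nic["addresses"]:
            is_v6 = ":" in addr["ip"]
            networks.append({
                "id": nic["name"],
                "ip_address": addr["ip"],
                "link": nic["name"],
                "netmask": addr["netmask"],
                "network_id": nic["network_id"],
                "routes": [{
                    "gateway": addr["gateway"],
                    "netmask": "0" if is_v6 else "0.0.0.0",
                    "network": "::" if is_v6 else "0.0.0.0",
                }],
                "type": "ipv6" if is_v6 else "ipv4",
            })
    services = [{"address": server, "type": "dns"} for server in dns_servers]
    return {"links": links, "networks": networks, "services": services}


# Input values are taken from the example JSON file.
nics = [{
    "name": "eth0",
    "mac": "52:54:00:0d:bf:93",
    "network_id": "dacd568d-5be6-4786-91fe-750c374b78b4",
    "addresses": [
        {"ip": "192.168.200.200", "netmask": "255.255.255.0",
         "gateway": "192.168.200.1"},
        {"ip": "2001:db8:100::1337", "netmask": "64",
         "gateway": "2001:db8:100::1"},
    ],
}]
network_data = build_network_data(nics, ["8.8.8.8"])
print(json.dumps(network_data, indent=2))
```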
...
Dubbo GSoC 2023 - HTTP/3 Rest Support
HTTP/3 has been formalized as a standard in the last year. Dubbo, as a framework that supports publishing and invoking Web services, needs to support the HTTP/3 protocol.
This task needs to expand the implementation of the current rest protocol to support publishing HTTP/3 services and calling HTTP/3 services.
Dubbo GSoC - Pixiu supports gRPC/dubbo protocol with WASM plug-in
Pixiu acts as a gateway, forwarding traffic to various services.
Pixiu needs to support communication between different applications in the browser, which requires WASM support in the browser. Currently, it only supports the HTTP protocol.
This project needs to implement the following communication protocols under WASM (gRPC is preferred):
1. Support gRPC protocol
2. Support dubbo protocol
Reference for calling gRPC from the front end: https://github.com/grpc/grpc-web
Dubbo GSoC 2023 - Automatically configure pixiu as istio ingress gateway
In an istio mesh environment, a public dubbo/dubbo-go provider can be exposed outside the cluster over http/https through the istio ingress gateway. This requires the ingress gateway to convert http to the dubbo protocol, which is the main scenario for pixiu. This project needs to complete:
1. Customize pixiu so that it can be used as an istio ingress gateway, proxying http/https requests and converting them into dubbo requests;
2. The gateway supports basic user authentication methods.
Basic reference: https://istio.io/latest/blog/2019/custom-ingress-gateway/
https://cloud.ibm.com/docs/containers?topic=containers-istio-custom-gateway
...