Apache ShardingSphere Support mainstream database metadata table query

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

ShardingSphere maintains its own metadata database so that it can emulate the metadata queries supported by various databases.

More details:

https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052

Task

  • Support PostgreSQL and openGauss `\d tableName`
  • Support PostgreSQL and openGauss `\d+`
  • Support PostgreSQL and openGauss `\d+ tableName`
  • Support PostgreSQL and openGauss `\l`
  • Support query for MySQL metadata `TABLES`
  • Support query for MySQL metadata `COLUMNS`
  • Support query for MySQL metadata `SCHEMATA`
  • Support query for MySQL metadata `ENGINES`
  • Support query for MySQL metadata `FILES`
  • Support query for MySQL metadata `VIEWS`
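
As an illustration of the kind of client query the MySQL tasks above refer to, here is a minimal JDBC sketch. The class name, method name, and the choice of selected columns are assumptions made for this example; only the `information_schema.TABLES` table and its standard MySQL columns come from MySQL itself. How ShardingSphere answers the query internally is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class MetadataQuerySketch {
    // A typical metadata query a MySQL client or tool may send.
    static final String TABLES_QUERY =
        "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = ?";

    // Runs the query on an already opened connection (for example one
    // pointing at a ShardingSphere-Proxy instance; that setup is assumed).
    static List<String> listTables(Connection connection, String schema) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(TABLES_QUERY)) {
            statement.setString(1, schema);
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    names.add(resultSet.getString("TABLE_NAME"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // No database is contacted here; the query text is only printed.
        System.out.println(TABLES_QUERY);
    }
}
```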

Note: the following pull requests are good examples.

https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182

Relevant Skills

  •  Master the Java language
  •  Have a basic understanding of ZooKeeper
  •  Be familiar with MySQL/PostgreSQL SQL


Mentor

Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org

Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Chuxin Chen, mail: tuichenchuxin (at) apache.org
Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Enhance ComputeNode reconciliation

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere 

Background

There is a proposal for two new CRDs, Cluster and ComputeNode.

Currently we are promoting ComputeNode as the major CRD, representing a specific ShardingSphere Proxy deployment, and we plan to use Cluster to represent a specific ShardingSphere Proxy cluster.

Task

This issue aims to make ComputeNode reconciliation more robust. The specific test cases are as follows.

  •  Add IT test case for Deployment spec volume
  •  Add IT test case for Deployment spec template init containers
  •  Add IT test case for Deployment spec template spec containers
  •  Add IT test case for Deployment spec volume mounts
  •  Add IT test case for Deployment spec container ports
  •  Add IT test case for Deployment spec container image tag
  •  Add IT test case for Service spec ports
  •  Add IT test case for ConfigMap data serverconfig
  •  Add IT test case for ConfigMap data logback

Note: the following is a good example.

  • chore: add more Ginkgo tests for ComputeNode #203

Relevant Skills

  1. Master Go language, Ginkgo test framework
  2. Have a basic understanding of Apache ShardingSphere Concepts
  3. Be familiar with Kubernetes Operator, kubebuilder framework

Target files

ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go

Mentor

Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Chuxin Chen, mail: tuichenchuxin (at) apache.org
Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Add ShardingSphere Kafka source connector

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

The community recently added a CDC (change data capture) feature. After a client logs in, the change feed is published over the established network connection, where it can be consumed.

Since Kafka is a popular distributed event streaming platform, it is useful to import the change feed into Kafka for later processing.

Task

  1. Become familiar with ShardingSphere CDC client usage: create a publication and subscribe to the change feed.
  2. Become familiar with Kafka connector development: develop a source connector, integrate it with ShardingSphere CDC, and persist the change feed to Kafka topics properly.

Relevant Skills

1. Java language

2. Basic knowledge of CDC and Kafka

3. Maven

References

Mentor

Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hongsheng Zhong, mail: zhonghongsheng (at) apache.org
Project Devs, mail: dev (at) shardingsphere.apache.org

Commons Statistics

[GSoC] Summary statistics API for Java 8 streams

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas:

  • Design an updated summary statistics API for use with Java 8 streams based on the summary statistic implementations in the Commons Math stat.descriptive package including moments, rank and summary sub-packages.
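
To make the idea concrete, here is a minimal sketch of one possible shape for a stream-friendly statistic. It is not the Commons Math implementation; the class name `VarianceSketch` and its methods are hypothetical. It uses a running (Welford-style) update for the mean and sum of squared deviations, plus a pairwise `combine` step so that partial results from parallel streams can be merged.

```java
import java.util.function.DoubleConsumer;
import java.util.stream.DoubleStream;

public class VarianceSketch implements DoubleConsumer {
    private long n;       // number of values seen
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    @Override
    public void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Merges another partial result into this one (needed for parallel streams).
    public void combine(VarianceSketch other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    public long getN() { return n; }

    public double getMean() { return mean; }

    // Sample variance; NaN when fewer than two values were seen.
    public double getVariance() { return n < 2 ? Double.NaN : m2 / (n - 1); }

    public static void main(String[] args) {
        VarianceSketch stats = DoubleStream.of(1, 2, 3, 4)
            .collect(VarianceSketch::new, VarianceSketch::accept, VarianceSketch::combine);
        System.out.println(stats.getMean() + " " + stats.getVariance());
    }
}
```

The three-argument `DoubleStream.collect` call mirrors how `DoubleSummaryStatistics` is used in the JDK, which is one reason a supplier/accumulator/combiner shape may suit an updated API.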
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail:

Commons Numbers

Add support for extended precision floating-point numbers

Add implementations of extended precision floating point numbers.

An extended precision floating point number is a series of floating-point numbers that are non-overlapping such that:

double-double (a, b):
            |a| > |b|
            a == a + b

Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: QD).

Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits; or create intermediate overflow of integer values.
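
As a rough illustration of the double-double idea and of the cancellation problem, here is a minimal sketch. It is not code from Commons Numbers; the class and method names are hypothetical. `twoSum` is the classic error-free addition that returns a pair (hi, lo) satisfying the conditions above, and `sum` uses it to carry the rounding error of each addition in a second double.

```java
public class DoubleDoubleSketch {
    // Error-free addition: hi is the rounded sum, lo is the rounding error,
    // so hi + lo equals a + b exactly (barring overflow).
    static double[] twoSum(double a, double b) {
        double hi = a + b;
        double bVirtual = hi - a;
        double aVirtual = hi - bVirtual;
        double lo = (a - aVirtual) + (b - bVirtual);
        return new double[] {hi, lo};
    }

    // Sums the values while accumulating the rounding errors separately.
    static double sum(double[] values) {
        double hi = 0;
        double lo = 0;
        for (double v : values) {
            double[] s = twoSum(hi, v);
            hi = s[0];
            lo += s[1];
        }
        return hi + lo;
    }

    public static void main(String[] args) {
        double[] values = {1e100, 1.0, -1e100};
        double naive = 0;
        for (double v : values) {
            naive += v;
        }
        System.out.println(naive);       // the 1.0 is lost to cancellation
        System.out.println(sum(values)); // the error term recovers it
    }
}
```

A full double-double type would also need multiplication, division, and renormalisation of the pair; this sketch covers addition only.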

This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.

An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:

    long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
    System.out.println(Arrays.stream(values).average().getAsDouble());
    System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
        .reduce(BigDecimal.ZERO, BigDecimal::add)
        .divide(BigDecimal.valueOf(values.length)).doubleValue());
    long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
    System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
    System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
        .reduce(BigDecimal.ZERO, BigDecimal::add)
        .divide(BigDecimal.valueOf(values2.length)).doubleValue());

Outputs:

    -1.0
    9.223372036854776E18
    0.0
    -0.5

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

Commons Math

[GSoC] Update components including machine learning; linear algebra; special functions

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas (extracted from the "dev" ML):

  1. Redesign and modularize the "ml" package
    -> main goal: enable multi-thread usage.
  2. Abstract the linear algebra utilities
    -> main goal: allow switching to alternative implementations.
  3. Redesign and modularize the "random" package
    -> main goal: general support of low-discrepancy sequences.
  4. Refactor and modularize the "special" package
    -> main goals: ensure accuracy and performance and better API,
    add other functions.
  5. Upgrade the test suite to JUnit 5
    -> additional goal: collect a list of "odd" expectations.

Other suggestions welcome, as well as

  • delineating additional and/or intermediate goals,
  • signalling potential pitfalls and/or alternative approaches to the intended goal(s).
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Gilles Sadowski, mail: erans (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

Commons Imaging

Placeholder for 1.0 release

A placeholder ticket, to link other issues and organize tasks related to the 1.0 release of Commons Imaging.

The 1.0 release of Commons Imaging has been postponed several times. We now have a clearer idea of what is necessary for 1.0 (see issues with fixVersion 1.0 and 1.0-alpha3, and other open issues). The tasks are interesting because they involve both basic and advanced programming, such as organizing how test images are loaded, or working on performance improvements at the byte level while following image format specifications.

The tasks are not too hard to follow, as normally there are example images that need to work with Imaging, as well as other libraries in C, C++, Rust, PHP, etc., that process these images correctly. Our goal with this issue is to a) improve our docs, b) improve our tests, c) fix possible security issues, d) get the parsers in Commons Imaging ready for the 1.0 release.

This issue is labelled for GSoC 2023 as a full-time project, although it would also be possible to work on a smaller set of tasks for 1.0 part-time.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Bruno P. Kinoshita, mail: kinow (at) apache.org
Project Devs, mail:

Apache Dubbo

Dubbo GSoC 2023 - Integration suite on Kubernetes

As a development framework that users depend on directly, Dubbo can have a huge impact on users if problems are introduced during iteration. Dubbo therefore needs a complete set of automated regression testing tools.
Dubbo already has a set of testing tools based on docker-compose, but these tools cannot test compatibility in a Kubernetes environment. We also need a more reliable system for constructing test cases, to ensure that the test cases are sufficiently complete.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo usage scanner

As a development framework that users depend on directly, Dubbo provides many functional features (such as configuring timeouts, retries, etc.). We would like to give users a tool that scans their code to report which features are in use, which are deprecated, which will be deprecated in the future, and so on. Based on this tool, we can provide users with a better migration solution.
Suggestion: consider an implementation based on static code scanning or a javaagent.
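
A very small sketch of the static-scanning direction, under strong simplifying assumptions: it treats source code as plain text and counts attribute assignments with a regular expression. The class name, the feature list, and the sample source (including `GreetingService`) are hypothetical. A real scanner would more likely work on a parsed syntax tree or bytecode.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsageScanSketch {
    // Counts how often each feature name appears as an attribute
    // assignment such as "timeout = 3000" in the given source text.
    static Map<String, Integer> scan(String source, List<String> features) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String feature : features) {
            Pattern pattern = Pattern.compile("\\b" + Pattern.quote(feature) + "\\s*=");
            Matcher matcher = pattern.matcher(source);
            int count = 0;
            while (matcher.find()) {
                count++;
            }
            counts.put(feature, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical user code, held as text so no Dubbo dependency is needed.
        String source = "@DubboReference(timeout = 3000, retries = 2)\n"
            + "private GreetingService service;\n";
        System.out.println(scan(source, Arrays.asList("timeout", "retries", "loadbalance")));
    }
}
```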

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Remove jprotoc in compiler

Dubbo supports communication based on the gRPC protocol through Triple. For this purpose, Dubbo developed a compiler plug-in for proto files based on jprotoc. Because jprotoc is not actively maintained, the Dubbo compiler currently does not run well on the latest protobuf version. We therefore need to consider implementing a new compiler, using gRPC as a reference.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo i18n log

Dubbo is a development framework that users work with directly, and many kinds of misuse can raise exceptions that Dubbo handles. In such cases, users usually have only the logs to go on. We hope to provide an i18n (localized) log output tool to give users a friendlier troubleshooting experience.
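
One possible shape for such a tool, sketched with the JDK's `MessageFormat` only: messages are looked up by a key and a locale, with English as the fallback. The class name, the message key `invoke.timeout`, and the message texts are all hypothetical and are not part of Dubbo.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class I18nLogSketch {
    // Message patterns per language code; a real tool might load these from files.
    private static final Map<String, Map<String, String>> BUNDLES = new HashMap<>();

    static {
        Map<String, String> en = new HashMap<>();
        en.put("invoke.timeout", "Invocation of {0} timed out after {1} ms");
        BUNDLES.put("en", en);
        Map<String, String> de = new HashMap<>();
        de.put("invoke.timeout", "Aufruf von {0} nach {1} ms abgebrochen (Timeout)");
        BUNDLES.put("de", de);
    }

    // Resolves the pattern for the locale, falling back to English and
    // finally to the key itself, then fills in the arguments.
    static String format(Locale locale, String key, Object... args) {
        Map<String, String> english = BUNDLES.get("en");
        Map<String, String> bundle = BUNDLES.getOrDefault(locale.getLanguage(), english);
        String pattern = bundle.getOrDefault(key, english.getOrDefault(key, key));
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.ENGLISH, "invoke.timeout", "GreetingService.hello", "3000"));
        System.out.println(format(Locale.GERMAN, "invoke.timeout", "GreetingService.hello", "3000"));
    }
}
```

Stable message keys also give users something to search for in documentation, independent of the display language.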

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor dubbo project to gradle

As more and more projects build with Gradle and benefit from it, Dubbo also hopes to migrate to Gradle. This task requires you to convert the dubbo project[1] into a Gradle project.


 [1] https://github.com/apache/dubbo

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Metrics on Dubbo Admin

Dubbo Admin is the console for Dubbo. Dubbo's observability is becoming more and more powerful, and we need to be able to observe Dubbo's metrics directly in Dubbo Admin, and even offer users suggestions for resolving problems.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor the http layer

Background

Dubbo currently supports the rest protocol based on http1 and the triple protocol based on http2. However, these two http-based protocols are implemented independently, their underlying implementations cannot be swapped, and each is relatively costly to implement.

Target

To reduce maintenance costs, we hope to abstract http. The goal is an underlying http implementation that is independent of the protocol, so that different protocols can reuse the same implementation.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor Connection

Background

At present, the client-side abstraction of connections across Dubbo's protocols is incomplete. For example, the client's connection abstraction differs greatly between the dubbo and triple protocols. As a result, enhancing connection-related functions in the client is complicated, and implementations cannot be reused. The client also has to implement a lot of repetitive code when extending a protocol.

Target

Reduce the complexity of the client part when extending the protocol, and increase the reuse of connection-related modules.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - IDL management

Background

Dubbo currently supports protobuf as a serialization method. Protobuf relies on proto (IDL) files for code generation, but there is currently no tooling for managing IDL files. For example, Java users need the proto files for each compilation, which is cumbersome, since most users are accustomed to depending on jar packages.

Target

Implement an IDL management and control platform that automatically generates dependency packages in various languages from IDL files and pushes them to the relevant dependency repositories.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:
