
Apache ShardingSphere

Add ShardingSphere Kafka source connector

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layers, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by the fragmentation of underlying databases.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

The community recently added a CDC (change data capture) feature. After a client logs in, the change feed is published on the created network connection, where it can then be consumed.

Since Kafka is a popular distributed event streaming platform, it's useful to import the change feed into Kafka for later processing.

Task

  1. Become familiar with ShardingSphere CDC client usage: create a publication and subscribe to the change feed.
  2. Become familiar with Kafka connector development: develop a source connector integrated with ShardingSphere CDC that persists the change feed to Kafka topics properly (see the sketch below).
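
To make the second task concrete, here is a minimal sketch of what the Kafka Connect side could look like. It uses the real Kafka Connect SourceTask API, but ShardingSphereCdcClient and CdcEvent are hypothetical stand-ins for the ShardingSphere CDC client, which has its own login/subscribe API:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class ShardingSphereSourceTask extends SourceTask {

    private ShardingSphereCdcClient client; // hypothetical CDC client wrapper

    @Override
    public void start(Map<String, String> props) {
        // Log in and subscribe to the change feed (hypothetical calls).
        client = new ShardingSphereCdcClient(props.get("cdc.url"));
        client.login(props.get("cdc.user"), props.get("cdc.password"));
        client.subscribe(props.get("cdc.database"));
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        List<SourceRecord> records = new ArrayList<>();
        for (CdcEvent event : client.fetch()) { // hypothetical batched fetch
            records.add(new SourceRecord(
                    Map.of("database", event.database()),  // source partition
                    Map.of("position", event.position()),  // offset, used to resume
                    "shardingsphere." + event.table(),     // one topic per table
                    Schema.STRING_SCHEMA,
                    event.payloadJson()));
        }
        return records;
    }

    @Override
    public void stop() {
        if (client != null) {
            client.close();
        }
    }

    @Override
    public String version() {
        return "0.1.0-SNAPSHOT";
    }
}

The offset map is what lets Kafka Connect resume the change feed from the last persisted position after a restart.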

Relevant Skills

  1. Java language
  2. Basic knowledge of CDC and Kafka
  3. Maven

Mentor

Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hongsheng Zhong, mail: zhonghongsheng (at) apache.org
Project Devs, mail: dev (at) shardingsphere.apache.org

RocketMQ

GSoC Integrate RocketMQ 5.0 client with Spring

Apache RocketMQ

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq

Background

The RocketMQ 5.0 client has been released recently; we need to integrate it with Spring.

Task

  1. Become familiar with RocketMQ 5.0 Java client usage; see https://github.com/apache/rocketmq-clients/tree/master/java and https://rocketmq.apache.org/docs/quickStart/01quickstart for details.
  2. Integrate it with Spring (see the sketch below).
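
As a rough starting point only (not an official design), the 5.0 client could be exposed to Spring applications as a managed bean; the endpoint and topic below are placeholders:

import org.apache.rocketmq.client.apis.ClientConfiguration;
import org.apache.rocketmq.client.apis.ClientException;
import org.apache.rocketmq.client.apis.ClientServiceProvider;
import org.apache.rocketmq.client.apis.producer.Producer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RocketMqClientConfig {

    // Spring closes the producer on context shutdown via destroyMethod.
    @Bean(destroyMethod = "close")
    public Producer rocketMqProducer() throws ClientException {
        ClientServiceProvider provider = ClientServiceProvider.loadService();
        ClientConfiguration config = ClientConfiguration.newBuilder()
                .setEndpoints("localhost:8081") // placeholder proxy endpoint
                .build();
        // Binding topics up front lets the client fetch routes at startup.
        return provider.newProducerBuilder()
                .setClientConfiguration(config)
                .setTopics("TestTopic")
                .build();
    }
}

A fuller integration would likely add a template-style wrapper, configuration properties for endpoints and credentials, and consumer support, similar in spirit to the existing rocketmq-spring project for the 4.x client.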

Relevant Skills

  1. Java language
  2. Basic knowledge of RocketMQ 5.0
  3. Spring

Mentor

Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Yangkun Ai, mail: aaronai (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

GSoC Support Logging exporter for metrics in RocketMQ

Apache RocketMQ

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq

Background

RocketMQ 5.0 supports metrics based on OpenTelemetry. However, typically metrics data needs to be collected by an OpenTelemetry Collector, in which case the OTLP metrics exporter is used. If there is no collector-based component, it is also possible to print the metrics data directly to logs, which makes troubleshooting problems more convenient.
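
For orientation, here is a minimal sketch of how the OpenTelemetry Java SDK wires up a logging exporter (this is plain OpenTelemetry usage, not RocketMQ's internal metrics code):

import java.time.Duration;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.exporter.logging.LoggingMetricExporter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;

public class LoggingExporterDemo {
    public static void main(String[] args) throws InterruptedException {
        // Print metrics to the log every 10 seconds instead of pushing OTLP
        // to a collector; handy when no collector is deployed.
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
                .registerMetricReader(
                        PeriodicMetricReader.builder(LoggingMetricExporter.create())
                                .setInterval(Duration.ofSeconds(10))
                                .build())
                .build();

        Meter meter = meterProvider.get("rocketmq.demo");
        LongCounter sent = meter.counterBuilder("messages_sent").build();
        sent.add(1);

        Thread.sleep(15_000); // let one export cycle run before shutting down
        meterProvider.shutdown();
    }
}

The task would then be to expose such a logging exporter as a configurable option in RocketMQ itself; see the related issue below for context.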

Related Issue

https://github.com/apache/rocketmq/issues/5678

Relevant Skills

  1. Java language
  2. Basic knowledge of CNCF OpenTelemetry

Mentor

Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Yangkun Ai, mail: aaronai (at) apache.org
Project Devs, mail: dev (at) rocketmq.apache.org

StreamPipes

Code Insights for Apache StreamPipes

Apache StreamPipes

Apache StreamPipes is a self-service (Industrial) IoT toolbox that enables non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules, including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines, and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices, making the system easy to extend for individual needs.

Background

StreamPipes has grown significantly in recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated to an Apache top-level project in December 2022. We will of course continue developing new features and never rest in making StreamPipes even more amazing. However, as we approach our `1.0` release at full steam, we also want the project to become more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.

Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent: we want to measure code coverage consistently across the whole codebase (backend, UI, Python library) and report it to Codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock in the current test coverage as a lower threshold that we always want to meet (meaning CI builds fail if we drop below it). Over time we can then raise the required coverage level step by step.

Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.

Tasks

  • [ ] calculate test coverage for all main parts of the repo
  • [ ] send coverage to Codecov
  • [ ] determine a coverage threshold and let CI fail if coverage drops below it
  • [ ] include SonarCloud in the CI setup
  • [ ] include an automatic coverage report in PR validation (optional)
  • [ ] include an automatic SonarCloud report in PR validation (optional)
  • [ ] whatever comes to your mind 💡 further ideas are always welcome

❗Important Note❗

Do not create any account on behalf of Apache StreamPipes on SonarCloud or Codecov, or use the name of Apache StreamPipes for any account creation. Your mentor will take care of this.

Relevant Skills

  • basic knowledge of GitHub workflows

Name and Contact Information

Name: Tim Bossenmaier

email:  bossenti[at]apache.org

community: dev[at]streampipes.apache.org

website: https://streampipes.apache.org/

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Tim Bossenmaier, mail: bossenti (at) apache.org
Project Devs, mail: dev (at) streampipes.apache.org


Commons Statistics

[GSoC] Summary statistics API for Java 8 streams

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas:

  • Design an updated summary statistics API for use with Java 8 streams, based on the summary statistic implementations in the Commons Math stat.descriptive package, including its moments, rank and summary sub-packages (see the sketch below).
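
To illustrate the kind of API this idea targets, here is a small self-contained sketch (names are illustrative, not the proposed design) of a combinable statistic that works with Java 8 streams, in the spirit of java.util.DoubleSummaryStatistics but extended beyond min/max/sum:

import java.util.stream.DoubleStream;

/** Mean and variance accumulator usable with DoubleStream.collect. */
class MeanVariance {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations (Welford's algorithm)

    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Merges a partial result, enabling parallel streams.
    void combine(MeanVariance other) {
        if (other.n == 0) {
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n) / total;
        mean += delta * other.n / total;
        n = total;
    }

    double mean() { return mean; }
    double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    public static void main(String[] args) {
        MeanVariance s = DoubleStream.of(2, 4, 4, 4, 5, 5, 7, 9)
                .parallel()
                .collect(MeanVariance::new, MeanVariance::accept, MeanVariance::combine);
        System.out.println(s.mean() + " / " + s.variance()); // 5.0 / 4.571...
    }
}
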
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail:


Commons Numbers

Add support for extended precision floating-point numbers

Add implementations of extended precision floating point numbers.

An extended precision floating point number is a series of floating-point numbers that are non-overlapping such that:

double-double (a, b):
            |a| > |b|
            a == a + b

Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: QD).

Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits, or create intermediate overflow of integer values.

This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.

An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:

long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
System.out.println(Arrays.stream(values).average().getAsDouble());
System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
        .reduce(BigDecimal.ZERO, BigDecimal::add)
        .divide(BigDecimal.valueOf(values.length)).doubleValue());

long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
        .reduce(BigDecimal.ZERO, BigDecimal::add)
        .divide(BigDecimal.valueOf(values2.length)).doubleValue());

Outputs:

-1.0
9.223372036854776E18
0.0
-0.5
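
For a flavour of the core building block, below is a sketch of Knuth's two-sum, the error-free transformation from which double-double arithmetic is assembled (illustrative, not the project's final API):

/** Computes a + b exactly as a non-overlapping pair (sum, err). */
final class TwoSum {
    static double[] twoSum(double a, double b) {
        double s = a + b;                 // rounded sum
        double bVirtual = s - a;          // part of b that made it into s
        double err = (a - (s - bVirtual)) + (b - bVirtual); // lost low bits
        return new double[] {s, err};     // s + err == a + b exactly
    }

    public static void main(String[] args) {
        double[] dd = twoSum(1e16, 3.14159);
        // The high part alone loses the 3.14159; the pair preserves it.
        System.out.println(dd[0] + " + " + dd[1]);
    }
}
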
Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail: dev (at) commons.apache.org


Commons Math

[GSoC] Update components including machine learning; linear algebra; special functions

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas (extracted from the "dev" ML):

  1. Redesign and modularize the "ml" package
    -> main goal: enable multi-thread usage.
  2. Abstract the linear algebra utilities
    -> main goal: allow switching to alternative implementations.
  3. Redesign and modularize the "random" package
    -> main goal: general support of low-discrepancy sequences.
  4. Refactor and modularize the "special" package
    -> main goals: ensure accuracy and performance and better API,
    add other functions.
  5. Upgrade the test suite to JUnit 5
    -> additional goal: collect a list of "odd" expectations.
  6. Review and finalize pending issues about the refactoring of the "genetic algorithm" functionality (cf. dedicated branch)

Other suggestions welcome, as well as

  • delineating additional and/or intermediate goals,
  • signalling potential pitfalls and/or alternative approaches to the intended goal(s).
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Gilles Sadowski, mail: erans (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

Commons Imaging

Placeholder for 1.0 release

A placeholder ticket, to link other issues and organize tasks related to the 1.0 release of Commons Imaging.

The 1.0 release of Commons Imaging has been postponed several times. Now we have a clearer idea of what's necessary for 1.0 (see issues with fixVersion 1.0 and 1.0-alpha3, and other open issues), and the tasks are interesting as they involve both basic and advanced programming, such as organizing how test images are loaded or working on performance improvements at the byte level following image format specifications.

The tasks are not too hard to follow, as there are normally example images that need to work with Imaging, as well as other libraries in C, C++, Rust, PHP, etc., that process these images correctly. Our goal with this issue is to a) improve our docs, b) improve our tests, c) fix possible security issues, and d) get the parsers in Commons Imaging ready for the 1.0 release.

Assigning the label for GSoC 2023 as a full-time project, although it would also be possible to work on a smaller set of tasks for 1.0 part-time.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Bruno P. Kinoshita, mail: kinow (at) apache.org
Project Devs, mail:

CloudStack

CloudStack GSoC 2023 - Autodetect IPs used inside the VM

Github issue: https://github.com/apache/cloudstack/issues/7142


Description:

With regards to IP info reporting, CloudStack relies entirely on its DHCP databases and so on. When these are not available (L2 networks, etc.), no IP information is shown for a given VM.

I propose we introduce a mechanism for "IP autodetection" and try to discover the IPs used inside the machines by querying the hypervisors. For example, with KVM/libvirt we can simply do something like this:

 
[root@fedora35 ~]# virsh domifaddr win2k22 --source agent
 Name                          MAC address          Protocol     Address
-------------------------------------------------------------------------------
 Ethernet                      52:54:00:7b:23:6a    ipv4         192.168.0.68/24
 Loopback Pseudo-Interface 1   -                    ipv6         ::1/128
 -                             -                    ipv4         127.0.0.1/8

The above command queries the qemu-guest-agent inside the Windows VM. The VM needs to have the qemu-guest-agent installed and running, as well as the virtio serial drivers (easily done in this case with virtio-win-guest-tools.exe) and a guest-agent socket channel defined in libvirt.

Once we have this information we could display it in the UI/API as "Autodetected VM IPs" or something like that.

I imagine it's very similar for VMWare and XCP-ng.

Thank you

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

CloudStack GSoC 2023 - Extend Import-Export Instances to the KVM Hypervisor

Github issue: https://github.com/apache/cloudstack/issues/7127


Description:

The Import-Export functionality is only allowed for the VMware hypervisor. The functionality is developed within a VM ingestion framework that allows extension to other hypervisors. The Import-Export functionality consists of a few APIs and the UI to interact with them:

  • listUnmanagedInstances: Lists unmanaged virtual machines (not existing in CloudStack but existing on the hypervisor side)
  • importUnmanagedInstance: Import an unmanaged VM into CloudStack (this implies populating the database with the corresponding data)
  • unmanageVirtualMachine: Make CloudStack forget a VM but do not remove it on the hypervisor side

The complexity on KVM lies in parsing the existing XML domains into the different resources and mapping them in CloudStack to populate the database correctly.
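
To give a feel for that parsing step, here is a small self-contained sketch using only JDK XML APIs on a trimmed domain XML (the XML literal stands in for real `virsh dumpxml` output; the actual work is mapping these resources into CloudStack):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomainXmlProbe {
    public static void main(String[] args) throws Exception {
        // A trimmed libvirt domain XML, standing in for `virsh dumpxml <vm>`.
        String xml = "<domain type='kvm'>"
                + "<name>web01</name>"
                + "<devices>"
                + "<disk type='file'><source file='/var/lib/libvirt/images/web01.qcow2'/></disk>"
                + "<interface type='bridge'><mac address='52:54:00:11:22:33'/></interface>"
                + "</devices></domain>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xp = XPathFactory.newInstance().newXPath();

        // Pull out the resources an importUnmanagedInstance call would map.
        System.out.println("name: " + xp.evaluate("/domain/name", doc));
        NodeList disks = (NodeList) xp.evaluate("//disk/source/@file", doc, XPathConstants.NODESET);
        for (int i = 0; i < disks.getLength(); i++) {
            System.out.println("disk: " + disks.item(i).getNodeValue());
        }
        NodeList macs = (NodeList) xp.evaluate("//interface/mac/@address", doc, XPathConstants.NODESET);
        for (int i = 0; i < macs.getLength(); i++) {
            System.out.println("nic:  " + macs.item(i).getNodeValue());
        }
    }
}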

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org


Apache Dubbo

Dubbo GSoC 2023 - Integration suite on Kubernetes

As a development framework that is closely related to users, Dubbo may have a huge impact on users if any problems occur during the iteration process. Therefore, Dubbo needs a complete set of automated regression testing tools.
At present, Dubbo already has a set of testing tools based on docker-compose, but these tools cannot test compatibility in a Kubernetes environment. At the same time, we also need a more reliable test case construction system to ensure that the test cases are sufficiently complete.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

...

Dubbo GSoC 2023 - Refactor the http layer

Background

Dubbo currently supports the rest protocol based on HTTP/1 and the triple protocol based on HTTP/2, but these two HTTP-based protocols are implemented independently: they cannot share the underlying implementation, and their respective implementation costs are relatively high.

Target

In order to reduce maintenance costs, we hope to abstract the HTTP layer so that the underlying implementation is independent of the protocol and different protocols can reuse the related implementations. A purely illustrative sketch of that direction follows below.
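
Purely as an illustration of the direction (these interfaces are hypothetical, not Dubbo's actual API), the idea is one transport-facing abstraction that both rest (HTTP/1) and triple (HTTP/2) could target, so the wire-level implementation is shared:

import java.util.Map;
import java.util.function.Function;

// Hypothetical protocol-agnostic view of one HTTP request/response exchange.
interface HttpExchange {
    String method();
    String path();
    Map<String, String> headers();
    byte[] body();
}

// Hypothetical server transport; rest and triple would both plug in here,
// while the underlying HTTP/1 or HTTP/2 wiring stays interchangeable.
interface HttpServerTransport {
    void start(int port, Function<HttpExchange, byte[]> handler);
    void stop();
}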

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor Connection

Background

At present, the abstraction of connections by the clients of Dubbo's different protocols is imperfect. For example, there is a big discrepancy between the connection abstractions of the dubbo and triple protocol clients. As a result, enhancing connection-related functionality in the client is complicated, implementations cannot be reused, and the client needs a lot of repetitive code when extending a protocol.

Target

Reduce the complexity of the client side when extending a protocol, and increase the reuse of connection-related modules.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - IDL management

Background

Dubbo currently supports protobuf as a serialization method. Protobuf relies on proto files (an IDL) for code generation, but Dubbo currently lacks tooling for managing these IDL files. For example, Java users need the proto files for every compilation, which is troublesome, as everyone is used to depending on jar packages.

Target

Implement an IDL management and control platform that supports automatically generating dependency packages in various languages from IDL files and pushing them to the relevant dependency repositories.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

...