A good long term objective for the PMC is to drop RabbitMQ in
favor of pulsar (third parties could package their own components using
RabbitMQ if they wishes...)

This means:

Solve the bugs that were found during the Pulsar MailQueue review
Pulsar MailQueue need to allow listing blobs in order to be
deduplication friendly.
Provide an event bus based on Pulsar
Provide a task manager based on Pulsar
Package a distributed server backed by pulsar, deprecate then replace
the current one.
(optionally) support mail queue priorities

While contributions would of course be welcomed on this topic, we could
offer it as part of GSOC 2022, and we could co-mentor it with mentors of
the Pulsar community (see [3])

[3] https://lists.apache.org/thread/y9s7f6hmh51ky30l20yx0dlz458gw259

Would such a plan gain traction around here ?

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Benoit Tellier, mail: btellier (at) apache.org

Project Devs, mail: dev (at) james.apache.org

Implement a web ui for James administration

James today provides a command line tool to do administration tasks like creating a domain, listing users, setting quota, etc.
It requires access to JMX port and even if lot of admins are confortable with such tools, to make our user base broader, we probably should expose the same commands in Rest and provide a fancy default web ui.
The task would need some basic skills on frontend tools to design an administration board, knowledge on what REST mean and enough Java understanding to add commands to existing Rest backend.
In the team, we have a strong focus on test (who want a mail server that is not tested enough ?) so we will explain and/or teach the student how to have the right test coverage of the features using modern tools like Cucumber, Selenium, rest-assured, etc.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Matthieu Baechler, mail: matthieu (at) apache.org

Project Devs, mail: dev (at) james.apache.org

[GSOC] James as a (distributed) MX server

Why ?

Alternatives like Postfix...

Do not offer a unified view of the mail queue across nodes
Requires statefull persistent storage

Given Apache James recent push to adopt a distributed mail queue based on Pulsar supporting delays (JAMES-3687), it starts making sense developing tooling for MX related tooling.

I propose myself to mentor a Gsoc on this topic.

Benefits for the student

At the end of this GSOC you will...

Have a solid understanding of email relaying and associated mechanics
Understand James modular architecture (mailet/ matcher / routes)
Have a hands-on expertise in SQL / NoSQL working with technologies like Cassandra, Redis, JPA...
Identify fix and solve architecture problems.
Conduct performance tests and develop an operational mindset

Inventory...

James ships a couple of MX related tools within smtp-hooks/mailets in default packages. It would make sense to me to move those as an extension.

James supports today...

checks agains DNS blacklists. `DNSRBLHandler` or `URIRBLHandler` smtp hook for instance. This can be moved as an extension IMO.

We would need a little performance benchmark to document performance implications of activating DNS-RBL.

Finally as quoted by a gitter guy: it would make more sens to have this done as a MailHook rather as a RcptHook as it would avoid doing the same job again and over again for each recipients. See ~~JAMES-3820~~ .

Grey listing. There's an existing implementation using JDBC as an underlying storage.

Move it as an extension.

Remove JDBC storage, propose 2 storage possibilities: in-memory for single node, REDIS for a distributed topology.

Some work around whitelist mailets? Move it as an extension, propose JPA, Cassandra, and XML configured implementations ? With a route to manage entries in there for JPA + Cassandra ?

I would expect a student to do his own little audit and come up with extra suggestions!

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Benoit Tellier, mail: btellier (at) apache.org

Project Devs, mail: dev (at) james.apache.org

Beam

[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs

Beam library developers and Beam users would appreciate this : )

This project involves prototyping a few different solutions, so it will be large.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Pablo Estrada, mail: pabloem (at) apache.org

Project Devs, mail: dev (at) beam.apache.org

TrafficControl

GSOC Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.

Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.

Testing: Adding automated tests for new code

Skills:

Proficiency in Go is required
A basic knowledge of HTTP and caching is preferred, but not required for this project.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Eric Friedrich, mail: friede (at) apache.org

Project Devs, mail: dev (at) trafficcontrol.apache.org

SkyWalking

[GSOC] [SkyWalking] GSOC 2023 Tasks

This is a placeholder for Apache SkyWalking GSOC 2023 ideas, we are currently brainstorming for projects and will update asap.

There will be at least two projects, one around AIOps algorithms.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Yihao Chen, mail: yihaochen (at) apache.org

Project Devs, mail: dev (at) skywalking.apache.org

ShardingSphere

Apache ShardingSphere Enhance SQLNodeConverterEngine to support more MySQL SQL statements

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

The ShardingSphere SQL federation engine provides support for complex SQL statements, and it can well support cross-database join queries, subqueries, aggregation queries and other statements. An important part of SQL federation engine is to convert the SQL statement parsed by ShardingSphere into SqlNode, so that Calcite can be used to implement SQL optimization and federated query.

Task

This issue is to solve the MySQL exception that occurs during SQLNodeConverterEngine conversion. The specific case list is as follows.

select_char
select_extract
select_from_dual
select_from_with_table
select_group_by_with_having_and_window
select_not_between_with_single_table
select_not_in_with_single_table
select_substring
select_trim
select_weight_string
select_where_with_bit_expr_with_ampersand
select_where_with_bit_expr_with_caret
select_where_with_bit_expr_with_div
select_where_with_bit_expr_with_minus_interval
select_where_with_bit_expr_with_mod
select_where_with_bit_expr_with_mod_sign
select_where_with_bit_expr_with_plus_interval
select_where_with_bit_expr_with_signed_left_shift
select_where_with_bit_expr_with_signed_right_shift
select_where_with_bit_expr_with_vertical_bar
select_where_with_boolean_primary_with_comparison_subquery
select_where_with_boolean_primary_with_is
select_where_with_boolean_primary_with_is_not
select_where_with_boolean_primary_with_null_safe
select_where_with_expr_with_and_sign
select_where_with_expr_with_is
select_where_with_expr_with_is_not
select_where_with_expr_with_not
select_where_with_expr_with_not_sign
select_where_with_expr_with_or_sign
select_where_with_expr_with_xor
select_where_with_predicate_with_in_subquery
select_where_with_predicate_with_regexp
select_where_with_predicate_with_sounds_like
select_where_with_simple_expr_with_collate
select_where_with_simple_expr_with_match
select_where_with_simple_expr_with_not
select_where_with_simple_expr_with_odbc_escape_syntax
select_where_with_simple_expr_with_row
select_where_with_simple_expr_with_tilde
select_where_with_simple_expr_with_variable
select_window_function
select_with_assignment_operator
select_with_assignment_operator_and_keyword
select_with_case_expression
select_with_collate_with_marker
select_with_date_format_function
select_with_exists_sub_query_with_project
select_with_function_name
select_with_json_value_return_type
select_with_match_against
select_with_regexp
select_with_schema_name_in_column_projection
select_with_schema_name_in_shorthand_projection
select_with_spatial_function
select_with_trim_expr
select_with_trim_expr_from_expr

You need to compare the difference between actual and expected, and then correct the logic in SQLNodeConverterEngine so that actual can be consistent with expected.

After you make changes, remember to add case to SUPPORTED_SQL_CASE_IDS to ensure it can be tested.

Notice, these issues can be a good example.
https://github.com/apache/shardingsphere/pull/14492

Relevant Skills

1. Master JAVA language

2. Have a basic understanding of Antlr g4 file

3. Be familiar with MySQL and Calcite SqlNode

Targets files

SQLNodeConverterEngineIT

https://github.com/apache/shardingsphere/blob/master/test/it/optimizer/src/test/java/org/apache/shardingsphere/test/it/optimize/SQLNodeConverterEngineIT.java

Mentor

Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org

Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Zhengqiang Duan, mail: duanzhengqiang (at) apache.org

Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Add the feature of switching logging framework

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

ShardingSphere provides two adapters: ShardingSphere-JDBC and ShardingSphere-Proxy.

Now, ShardingSphere uses logback for logging, but consider the following situations:

Users may need to switch the logging framework to meet special needs, such as log4j2 can provide better asynchronous performance;
When using the JDBC adapter, the user application may not use logback, which may cause some conflicts.

Why doesn't the log facade suffice? Because ShardingSphere provides users with clustered logging configurations (such as changing the log level online), this requires dynamic construction of logger, which cannot be achieved with only the log facade.

Task

1. Design and implement logging SPI to support multiple logging frameworks (such as logback and log4j2)
2. Allow users to choose which logging framework to use through the logging rule

Relevant Skills

1. Master JAVA language

2. Basic knowledge of logback and log4j2

3. Maven

Mentor

Longtao Jiang, Committer of Apache ShardingSphere, jianglongtao@apache.org

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Longtao Jiang, mail: jianglongtao (at) apache.org

Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Support mainstream database metadata table query

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

ShardingSphere has designed its own metadata database to simulate metadata queries that support various databases.

More details:

https://github.com/apache/shardingsphere/issues/21268
https://github.com/apache/shardingsphere/issues/22052

Task

Support PostgreSQL And openGauss `\d tableName`
Support PostgreSQL And openGauss `\d+`
Support PostgreSQL And openGauss `\d+ tableName`
Support PostgreSQL And openGauss `l`
Support query for MySQL metadata `TABLES`
Support query for MySQL metadata `COLUMNS`
Support query for MySQL metadata `schemata`
Support query for MySQL metadata `ENGINES`
Support query for MySQL metadata `FILES`
Support query for MySQL metadata `VIEWS`

Notice, these issues can be a good example.

https://github.com/apache/shardingsphere/pull/22053
https://github.com/apache/shardingsphere/pull/22057/
https://github.com/apache/shardingsphere/pull/22166/
https://github.com/apache/shardingsphere/pull/22182

Relevant Skills

Master JAVA language
Have a basic understanding of Zookeeper
Be familiar with MySQL/Postgres SQLs

Mentor

Chuxin Chen, Committer of Apache ShardingSphere, tuichenchuxin@apache.org

Zhengqiang Duan, PMC of Apache ShardingSphere, duanzhengqiang@apache.org

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Chuxin Chen, mail: tuichenchuxin (at) apache.org

Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Enhance ComputeNode reconciliation

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org/
Github: https://github.com/apache/shardingsphere

Background

There is a proposal about new CRD Cluster and ComputeNode as belows:

Currently we try to promote ComputeNode as major CRD to represent a special ShardingSphere Proxy deployment. And plan to use Cluster indicating a special ShardingSphere Proxy cluster.

Task

This issue is to enhance ComputeNode reconciliation availability. The specific case list is as follows.

Add IT test case for Deployment spec volume
Add IT test case for Deployment spec template init containers
Add IT test case for Deployment spec template spec containers
Add IT test case for Deployment spec volume mounts
Add IT test case for Deployment spec container ports
Add IT test case for Deployment spec container image tag
Add IT test case for Service spec ports
Add IT test case for ConfigMap data serverconfig
Add IT test case for ConfigMap data logback

Notice, these issues can be a good example.
chore: add more Ginkgo tests for ComputeNode #203

Relevant Skills

Master Go language, Ginkgo test framework
Have a basic understanding of Apache ShardingSphere Concepts
Be familiar with Kubernetes Operator, kubebuilder framework

Targets files

ComputeNode IT - https://github.com/apache/shardingsphere-on-cloud/blob/main/shardingsphere-operator/pkg/reconcile/computenode/compute_node_test.go

Mentor

Liyao Miao, Committer of Apache ShardingSphere, miaoliyao@apache.org

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Chuxin Chen, mail: tuichenchuxin (at) apache.org

Project Devs, mail: dev (at) shardingsphere.apache.org

Apache ShardingSphere Add ShardingSphere Kafka source connector

Apache ShardingSphere

Apache ShardingSphere is positioned as a Database Plus, and aims at building a standard layer and ecosystem above heterogeneous databases. It focuses on how to reuse existing databases and their respective upper layer, rather than creating a new database. The goal is to minimize or eliminate the challenges caused by underlying databases fragmentation.

Page: https://shardingsphere.apache.org
Github: https://github.com/apache/shardingsphere

Background

The community just added CDC (change data capture) feature recently. Change feed will be published in created network connection after logging in, then it could be consumed.

Since Kafka is popular distributed event streaming platform, it's useful to import change feed into Kafka for later processing.

Task

Familiar with ShardingSphere CDC client usage, create publication and subscribe change feed.
Familiar with Kafka connector development, develop source connector, integrate with ShardingSphere CDC. Persist change feed to Kafka topics properly.

Relevant Skills

1. Java language

2. Basic knowledge of CDC and Kafka

3. Maven

References

Mentor

Hongsheng Zhong, PMC of Apache ShardingSphere, zhonghongsheng@apache.org

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Hongsheng Zhong, mail: zhonghongsheng (at) apache.org

Project Devs, mail: dev (at) shardingsphere.apache.org

RocketMQ

GSoC Integrate RocketMQ 5.0 client with Spring

Apache RocketMQ

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq

Background

RocketMQ 5.0 client has been released recently, we need to integrate it with Spring.

Task

Familiar with RocketMQ 5.0 java client usage, you could see more details from https://github.com/apache/rocketmq-clients/tree/master/java and https://rocketmq.apache.org/docs/quickStart/01quickstart
Integrate with Spring.

Relevant Skills

Java language
Basic knowledge of RocketMQ 5.0
Spring

Mentor

Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Yangkun Ai, mail: aaronai (at) apache.org

Project Devs, mail: dev (at) rocketmq.apache.org

GSoC Support Logging exporter for metrics in RocketMQ

Apache RocketMQ

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

Page: https://rocketmq.apache.org
Github: https://github.com/apache/rocketmq

Background

RocketMQ 5.0 supports metrics based on OpenTelemetry. However, typically metrics data needs to be collected by an OpenTelemetry Collector, in which case the OTLP metrics exporter is used. If there is no collector-based component, it is also possible to print the metrics data directly to logs, which makes troubleshooting problems more convenient.

Relevant Skills

Java language
Basic knowledge of CNCF OpenTelemetry

Mentor

Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Yangkun Ai, mail: aaronai (at) apache.org

Project Devs, mail: dev (at) rocketmq.apache.org

StreamPipes

Code Insights for Apache StreamPipes

Apache StreamPipes

Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.

Background

StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we were graduated as an Apache top level project in December 2022. We will of course continue developing new features and never rest to make StreamPipes even more amazing. Although, since we are approaching with full stream towards our `1.0` release, we want to project also to get more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.

Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (Backend, UI, Python library) and report the coverage to codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contributing, we would like to lock the current test coverage as an lower threshold that we always want to achieve (meaning in case we drop CI builds fail etc). With time we then can increase the required coverage lever step to step.

More than monitoring our test coverage, we also want to invest in better and more clean code. Therefore, we would like to adopt sonarcloud for our repository.

Tasks

[ ] calculate test coverage for all main parts of the repo
[ ] send coverage to codeCov
[ ] determine coverage threshold and let CI fail if below
[ ] include sonarcloud in CI setup
[ ] include automatic coverage report in PR validation (see an example here ) -> optional
[ ] include automatic sonarcloud report in PR validation -> optional
[ ] what ever comes in your mind 💡 further ideas are always welcome

❗Important Note❗

Do not create any account in behalf of Apache StreamPipes in Sonarcloud or in CodeCov or using the name of Apache StreamPipes for any account creation. Your mentor will take care of it.

Relevant Skills

basic knowledge about GitHub worfklows

Learning Material

GitHub workflow docs
Apache StreamPipes workflows
Sonarcloud for Monorepos
Using code cov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags

References

You can find our corresponding issue on GitHub here

Name and Contact Information

Name: Tim Bossenmaier

email: bossenti[at]apache.org

community: dev[at]streampipes.apache.org

website: https://streampipes.apache.org/

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Tim Bossenmaier, mail: bossenti (at) apache.org

Project Devs, mail: dev (at) streampipes.apache.org

Commons Statistics

[GSoC] Summary statistics API for Java 8 streams

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas:

Design an updated summary statistics API for use with Java 8 streams based on the summary statistic implementations in the Commons Math stat.descriptive package including moments, rank and summary sub-packages.

Difficulty: Minor

Project size: ~350 hour (large)

Potential mentors:

Alex Herbert, mail: aherbert (at) apache.org

Project Devs, mail:

Commons Numbers

Add support for extended precision floating-point numbers

Add implementations of extended precision floating point numbers.

An extended precision floating point number is a series of floating-point numbers that are non-overlapping such that:

double-double (a, b):
            |a| > |b|
            a == a + b

Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: QD).

Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits; or create intermediate overflow of integer values.

This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.

An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:

            long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
            System.out.println(Arrays.stream(values).average().getAsDouble()); System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
            .reduce(BigDecimal.ZERO, BigDecimal::add)
            .divide(BigDecimal.valueOf(values.length)).doubleValue());
            long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
            System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble()); System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
               .reduce(BigDecimal.ZERO, BigDecimal::add)
            .divide(BigDecimal.valueOf(values2.length)).doubleValue());

Outputs:

-1.0
            9.223372036854776E18
            0.0
            -0.5

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Alex Herbert, mail: aherbert (at) apache.org

Project Devs, mail: dev (at) commons.apache.org

Commons Math

[GSoC] Update components including machine learning; linear algebra; special functions

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas (extracted from the "dev" ML):

Redesign and modularize the "ml" package
-> main goal: enable multi-thread usage.
Abstract the linear algebra utilities
-> main goal: allow switching to alternative implementations.
Redesign and modularize the "random" package
-> main goal: general support of low-discrepancy sequences.
Refactor and modularize the "special" package
-> main goals: ensure accuracy and performance and better API,
add other functions.
Upgrade the test suite to Junit 5
-> additional goal: collect a list of "odd" expectations.
Review and finalize pending issues about the refactoring of the "genetic algorithm" functionality (cf. dedicated branch)

Other suggestions welcome, as well as

delineating additional and/or intermediate goals,
signalling potential pitfalls and/or alternative approaches to the intended goal(s).

Difficulty: Minor

Project size: ~350 hour (large)

Potential mentors:

Gilles Sadowski, mail: erans (at) apache.org

Project Devs, mail: dev (at) commons.apache.org

Commons Imaging

Placeholder for 1.0 release

A placeholder ticket, to link other issues and organize tasks related to the 1.0 release of Commons Imaging.

The 1.0 release of Commons Imaging has been postponed several times. Now we have a more clear idea of what's necessary for the 1.0 (see issues with fixVersion 1.0 and 1.0-alpha3, and other open issues), and the tasks are interesting as it involves both basic and advanced programming for tasks such as organize how test images are loaded, or work on performance improvements at byte level and following image format specifications.

The tasks are not too hard to follow, as normally there are example images that need to work with Imaging, as well as other libraries in C, C++, Rust, PHP, etc., that process these images correctly. Our goal with this issue is to a) improve our docs, b) improve our tests, c) fix possible security issues, d) get the parsers in Commons Imaging ready for the 1.0 release.

Assigning the label for GSoC 2023, and full time. Although it would be possible to work on a smaller set of tasks for 1.0 as a part time too.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Bruno P. Kinoshita, mail: kinow (at) apache.org

Project Devs, mail:

CloudStack

CloudStack GSoC 2023 - Autodetect IPs used inside the VM

Github issue: https://github.com/apache/cloudstack/issues/7142

Description:

With regards to IP info reporting, Cloudstack relies entirely on it's DHCP data bases and so on. When this is not available (L2 networks etc) no IP information is shown for a given VM.

I propose we introduce a mechanism for "IP autodetection" and try to discover the IPs used inside the machines by means of querying the hypervisors. For example with KVM/libvirt we can simply do something like this:

{{root@fedora35 ~]# virsh domifaddr win2k22 --source agent
Name MAC address Protocol Address
-------------------------------------------------------------------------------
Ethernet 52:54:00:7b:23:6a ipv4 192.168.0.68/24
Loopback Pseudo-Interface 1 ipv6 ::1/128

- ipv4 127.0.0.1/8}}
The above command queries the qemu-guest-agent inside the Windows VM. The VM needs to have the qemu-guest-agent installed and running as well as the virtio serial drivers (easily done in this case with virtio-win-guest-tools.exe ) as well as a guest-agent socket channel defined in libvirt.

Once we have this information we could display it in the UI/API as "Autodetected VM IPs" or something like that.

I imagine it's very similar for VMWare and XCP-ng.

Thank you

Difficulty: Major

Project size: ~175 hour (medium)

Potential mentors:

Nicolás Vázquez, mail: nvazquez (at) apache.org

Project Devs, mail: dev (at) cloudstack.apache.org

CloudStack GSoC 2023 - Extend Import-Export Instances to the KVM Hypervisor

Github issue: https://github.com/apache/cloudstack/issues/7127

Description:

The Import-Export functionality is only allowed for the Vmware hypervisor. The functionality is developed within a VM ingestion framework that allows the extension to other hypervisors. The Import-Export functionality consists on few APIs and the UI to interact with them:

listUnmanagedInstances: Lists unmanaged virtual machines (not existing in CloudStack but existing on the hypervisor side)
importUnmanagedInstance: Import an unmanaged VM into CloudStack (this implies populating the database with the corresponding data)
unmanageVirtualMachine: Make CloudStack forget a VM but do not remove it on the hypervisor side

The complexity on KVM should be parsing the existing XML domains into different resources and map them in CloudStack to populate the database correctly.

As more and more projects start to develop based on Gradle and profit from Gradle, Dubbo also hopes to migrate to the Gradle project. This task requires you to transform the dubbo project[1] into a gradle project.

[1] https://github.com/apache/dubbo

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

[SKIN] Update Commons Skin Bootstrap

Our Commons components use Commons Skin, a skin, or theme, for Apache Maven Site.

Our skin uses Bootstrap 2.x, but Bootstrap is already at 5.x release, and we are missing several improvements (UIUX, accessibility, browser compatibility) and JS/CSS bugs fixed over the years.

Work happening on Apache Maven Skins. Maybe we could adapt/use that one?

https://issues.apache.org/jira/browse/MSKINS-97

Difficulty: Minor

Project size: ~175 hour (medium)

Potential mentors:

Bruno P. Kinoshita, mail: kinow (at) apache.org

Project Devs, mail:

Airavata

[GSoC] Integrate JupyterHub with Airavata Django Portal

The Airavata Django Portal [1] allows users to create, execute and monitor computational experiments. However, when a user wants to then post-process or visualize the output of that computational experiment they must then download the output files and run tools that they may have on their computer or other systems. By integrating with JupyterHub the Django Portal can give users an environment in which they can explore the experiment's output data and gain insights.

The main requirements are:

from the Django Portal a user can click a button and navigate to a JupyterHub instance that the user is immediately logged into using single sign on
the user can save the Jupyter notebook and later retrieve it
the user's files are available within the context of the running Jupyter instance
- ideally users can also generate new outputs in the Jupyter instance and have them saved back in their portal data storage
users can share their notebooks with other portal users
(bonus) portal admins can suggest notebooks to use with specific applications so that with one click a user can open an experiment in a provided notebook
users can manage their notebooks and can, for example, clone a notebook

[1] https://github.com/apache/airavata-django-portal

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Marcus Christie, mail: marcuschristie (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Apache Superset Dashboards to Airavata Catalogs

Integrate Apache Superset (https://superset.apache.org/) to visualize Airavata Catalogs (https://github.com/apache/airavata/tree/master/modules/registry)

Examples like this and stack overflow threads seem to indicate it is possible - https://medium.com/@s.akashb/apache-superset-integration-with-keycloak-a302840c290c

Integrate with Custos
We can start out by directly interfacing with MariaDB, but need to explore if we can write a superset DB Driver following the Hive example - https://github.com/apache/superset/blob/0409b12a55e893d88f6e992a7df247841a2da8f0/superset/db_engines/hive.py

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Airavata Jupyter Platform Services

UI Framework
1. To host the jupyter environment we will need to envolop the notebooks in a user interface and connect it with Apache Airavata services
2. Leverage Airavata communications from within the Django Portal - https://github.com/apache/airavata-django-portal
3. Explore if the platform is better to be developed as VSCode extensions leveraging jupyter extensions like - https://github.com/Microsoft/vscode-jupyter
4. Alternatively, explore developing a standalone native application using ElectronJS
Draft up a platform architecture - Airavata based infrastructure with functionality similar to collab.
Authenticate with Airavata Custos Framework - https://github.com/apache/airavata-custos
Extend Notebook filesystem using the virtual file system approaching integration with Airavata based storage and catalog
Make the notebooks registered with Airavata app catalog and experiment catalog.

Advanced Possibilities:

Explore Multi-tenanted JupyterHub

Can K8 namespace isolation accomplish?
Make deployment of Jupyter support as part of the default core
Data and the user-level tenancy can be assumed, how to make sure infrastructure can isolate them, like not one gateway crashing a hosting environment.

How to leverage computational resources jupypter hub

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Dashboards to get quick statistics

Gateway admins need period reports for various reporting and planning.

Features Include:

Compute resources across that had at least one job submitted during the period <start date - End date>
User groups created within a given period and how many users are in those and with permission levels and also number of jobs each user have submitted.
List applications and number of jobs for each applications for a given period and group them by job status.
Number of users that at least submitted a single job for the period <start date - End date>
Total number of Unique Users
User Registration Trends
Number of experiments for a given period <Start date - End date> grouped by the experiment status
The total cpu-hours used by a users, sorted, quarterly, plotted over a period of time
The total cpu-hours consumed by application, sorted, quarterly, plotted over a period of time

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Provide meta scheduling capabilities within Airavata

As discussed on the architecture mailing list [1] and summarized at [2], Airavata will need to develop a metascheduler. In the short term, a user request (demeler, gobert) is to have airavata throttle jobs to resources. In the future more informed scheduling strategies needs to be integrated. Hopefully, the actual scheduling algorithms can be borrowed from third party implementations.

[1] - http://markmail.org/message/tdae5y3togyq4duv
[2] - https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Enhance File Transports in MFT

Complete all transports in MFT

Currently SCP, S3 is known to work
Others need effort to optimize, test, and declare readiness
Develop a complete a fully functional MFT Command-line interface
Have a feature-complete Python SDK
A minimum implementation will be prvoided, students need to complete it and test it.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Custos Backup and Restore

Custos does not have the capabilities to efficiently backup and restore a live instance. This is essential for high available services.

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Airavata Rich Client based on ElectronJS

Using SEAGrid Rich Client as an example, develop a native application based on electronJS to mimic Airavata Django Portal.

Reference example - https://github.com/SciGaP/seagrid-rich-client

Difficulty: Major

Project size: ~350 hour (large)

Potential mentors:

Suresh Marru, mail: smarru (at) apache.org

Project Devs, mail: dev (at) airavata.apache.org

Space shortcuts

Child pages

James Server

Why ?

Benefits for the student

Inventory...

Beam

TrafficControl

SkyWalking

ShardingSphere

Apache ShardingSphere

Background

Task

Relevant Skills

Targets files

Mentor

Apache ShardingSphere

Background

Task

Relevant Skills

Mentor

Apache ShardingSphere

Background

Task

Relevant Skills

Mentor

Apache ShardingSphere

Background

Task

Relevant Skills

Targets files

Mentor

Apache ShardingSphere

Background

Task

Relevant Skills

References

Mentor

RocketMQ

Apache RocketMQ

Background

Task

Relevant Skills

Mentor

Apache RocketMQ

Background

Related Issue

Relevant Skills

Mentor

StreamPipes

Apache StreamPipes

Background

Tasks

❗Important Note❗

Relevant Skills

Learning Material

Name and Contact Information

Commons Statistics

Commons Numbers

Commons Math

Commons Imaging

CloudStack

Apache Dubbo

Background

Target

Background

Target

Background

Target

Apache Commons All

Airavata