
Apache ShenYu GSoC 2023 - ShenYu End-To-End SpringCloud plugin test case

Background:

Apache ShenYu is a Java-native API gateway for service proxy, protocol translation and API governance, but ShenYu currently lacks end-to-end tests.

Relevant skills:

1. Understand the architecture of ShenYu.

2. Understand SpringCloud micro-services and the ShenYu SpringCloud proxy plugin.

3. Understand the ShenYu e2e framework and architecture.

How to code

1. Please refer to org.apache.shenyu.e2e.testcase.plugin.DividePluginCases.

How to test

1. Start shenyu-admin in Docker.

2. Start shenyu-bootstrap in Docker.

3. Run the test case org.apache.shenyu.e2e.testcase.plugin.PluginsTest#testDivide.
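For illustration, here is a minimal, hypothetical smoke test in plain Java. It assumes shenyu-bootstrap listens on localhost:9195 and that a SpringCloud selector/rule routing /springcloud/** has already been configured through shenyu-admin; the endpoint and port are illustrative assumptions, not the shenyu-e2e framework API:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SpringCloudPluginSmokeTest {
        public static void main(String[] args) throws Exception {
            // Send a request through the gateway; ShenYu should proxy it
            // to the backing SpringCloud service selected by the rule.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9195/springcloud/order/findById?id=1"))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new AssertionError("expected 200 via gateway, got " + response.statusCode());
            }
            System.out.println("gateway response: " + response.body());
        }
    }

A real test case would instead use the shenyu-e2e assertions and data-provisioning helpers demonstrated in DividePluginCases.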

Task List

1. Develop e2e tests for the SpringCloud plugin.

2. Write the ShenYu e2e SpringCloud plugin documentation in shenyu-website.

3. Refactor the existing plugin test cases.


Links:

website: https://shenyu.apache.org/

issues: https://github.com/apache/shenyu/issues/4474


Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Fengen He, mail: hefengen (at) apache.org
Project Devs, mail: dev (at) shenyu.apache.org

TrafficControl

GSOC Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

  • Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
  • Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
  • Testing: Adding automated tests for new code

Skills:

  • Proficiency in Go is required
  • A basic knowledge of HTTP and caching is preferred, but not required for this project.
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Eric Friedrich, mail: friede (at) apache.org
Project Devs, mail: dev (at) trafficcontrol.apache.org

Apache ShenYu GSoC 2023 - ShenYu WasmPlugin

Background:

Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good scalability in the Java language. However, ShenYu's support for multiple languages is still relatively weak.

The wasm bytecode is designed to be encoded in a size- and load-time-efficient binary format. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

The goal of WasmPlugin is to be able to run wasm bytecode (wasmer-java is a good choice; if you find a better one, please discuss with me), so that other languages (such as Rust/Golang/C++) can be used to write ShenYu plugins, as long as they can be compiled into wasm bytecode.

More documents on wasm and WASI are as follows:

https://github.com/WebAssembly/design
https://github.com/WebAssembly/WASI
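Based on wasmer-java's published examples, embedding wasm in Java looks roughly like the following sketch (the plugin.wasm file and its exported sum function are illustrative assumptions):

    import org.wasmer.Instance;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class WasmDemo {
        public static void main(String[] args) throws Exception {
            // Load wasm bytecode, e.g. compiled from Rust/Golang/C++.
            byte[] bytes = Files.readAllBytes(Paths.get("plugin.wasm"));
            Instance instance = new Instance(bytes);
            // Call a function exported by the wasm module.
            Object[] results = instance.exports.getFunction("sum").apply(5, 37);
            System.out.println("sum(5, 37) = " + results[0]);
            instance.close();
        }
    }

A ShenYu WasmPlugin would wrap this kind of invocation so that plugin callbacks are dispatched into exported wasm functions.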

Relevant Skills

Know the use of Apache ShenYu, especially the plugins.
Familiar with Java and another language that can be compiled into wasm bytecode.

Task List

1. Develop shenyu-wasm-plugin.

2. Write integration tests for shenyu-wasm-plugin.

3. Write the wasm plugin documentation in shenyu-website.


Links:

website: https://shenyu.apache.org/

issues: https://github.com/apache/shenyu/issues/4492

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
ZiCheng Zhang, mail: zhangzicheng (at) apache.org
Project Devs, mail: dev (at) shenyu.apache.org

Doris


[GSoC][Doris] Page Cache Improvement

Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org

Github: https://github.com/apache/doris

Background

Apache Doris accelerates high-concurrency queries utilizing page cache, where the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which reveals a few problems (see the sketch after this list):

  • Hot data will be phased out in large queries
  • The page cache configuration is immutable and does not support GC.
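To make the first problem concrete, here is a tiny Java stand-in for an LRU page cache (the Doris backend is C++; this is only an illustration, not Doris code):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A plain LRU cache: one large scan that touches many cold pages once
    // is enough to evict a hot page that will be needed again.
    class LruPageCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruPageCache(int capacity) {
            super(16, 0.75f, true); // access-order iteration = LRU
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }

        public static void main(String[] args) {
            LruPageCache<String, byte[]> cache = new LruPageCache<>(3);
            cache.put("hot-page", new byte[0]);
            for (int i = 0; i < 3; i++) {
                cache.put("scan-page-" + i, new byte[0]); // one big query
            }
            System.out.println(cache.containsKey("hot-page")); // prints false
        }
    }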

Task

  • Phase One: Identify the impacts on queries when the decompressed data is stored in memory and SSD, respectively, and then determine whether full page cache is required.
  • Phase Two: Improve the cache strategy for Apache Doris based on the results from Phase One.

Learning Material

Page: https://doris.apache.org
Github: https://github.com/apache/doris

Mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Zhijing Lu, mail: luzhijing (at) apache.org
Project Devs, mail: dev (at) doris.apache.org

[GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org

Github: https://github.com/apache/doris

Background

Apache Doris supports acceleration of queries on external data sources to meet users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to Apache Doris based on a unified framework.

Objective

  • Enable Apache Doris to access one or more of these data sources via the Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
  • Compile relevant documentation. See an example here: https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive

Task

Phase One:
  • Get familiar with the Multi-Catalog structure of Apache Doris, including the metadata synchronization mechanism in FE and the data reading mechanism of BE.
  • Investigate how metadata should be acquired and how data access works regarding the picked data source(s); produce the corresponding design documentation.

Phase Two:

  • Develop connections to the picked data source(s) and implement access to metadata and data.

Learning Material

Page: https://doris.apache.org
Github: https://github.com/apache/doris

Mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Zhijing Lu, mail: luzhijing (at) apache.org
Project Devs, mail: dev (at) doris.apache.org

[GSoC][Doris] Dictionary Encoding Acceleration

Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris

Background

In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding is applied to string data types by default, and the dictionary size of a column for one segment is 1M at most. Dictionary encoding accelerates queries on strings by converting them into integer codes, as in the sketch below.
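As a simplified illustration of the idea (Doris's segment code is C++; the names here are made up):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class DictEncodeDemo {
        public static void main(String[] args) {
            String[] column = {"beijing", "shanghai", "beijing", "shenzhen", "beijing"};
            Map<String, Integer> dict = new HashMap<>();
            List<String> reverse = new ArrayList<>();
            int[] codes = new int[column.length];
            // Assign each distinct string a small int code.
            for (int i = 0; i < column.length; i++) {
                final String value = column[i];
                codes[i] = dict.computeIfAbsent(value, v -> {
                    reverse.add(v);
                    return reverse.size() - 1;
                });
            }
            // A predicate like WHERE city = 'beijing' becomes an int scan.
            int target = dict.get("beijing");
            int matches = 0;
            for (int code : codes) {
                if (code == target) matches++;
            }
            System.out.println("dictionary=" + dict + ", matches=" + matches);
        }
    }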

Task

  • Phase One: Get familiar with the implementation of Apache Doris dictionary encoding and learn how it accelerates queries.
  • Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.

Learning Material

Page: https://doris.apache.org
Github: https://github.com/apache/doris

Mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Zhijing Lu, mail: luzhijing (at) apache.org
Project Devs, mail: dev (at) doris.apache.org

Beam


[GSoC][Beam] Build out Beam Machine Learning Use Cases

Today, you can do all sorts of Machine Learning using Apache Beam (https://beam.apache.org/documentation/ml/overview/).
 
Many of our users, however, have a hard time getting started with ML and understanding how Beam can be applied to their day to day work. The goal of this project is to build out a series of Beam pipelines as Jupyter Notebooks demonstrating real world ML use cases, from NLP to image recognition to using large language models. As you go, there may be bugs or friction points as well which will provide opportunities to contribute back to Beam's core ML libraries.


Mentor for this will be Danny McCormick


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

[GSoC][Beam] Advancing the Beam-on-Ray runner

There is a community effort to build a Beam runner to run Beam pipelines on top of Ray: https://github.com/ray-project/ray_beam_runner/

This involves pushing that project forward. It will require writing lots of Python code, and specifically going through the list of issues (https://github.com/ray-project/ray_beam_runner/issues) and solving as many of them as possible to make sure the runner is compliant.

Good resource docs:

This project is large.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

[GSoC][Beam] Advancing the Rust SDK on Beam

Beam has an experimental, ongoing implementation for a Rust SDK.

This project involves advancing that implementation and making sure it's compliant with Beam standards.

Good resource materials:

https://lists.apache.org/thread/xg9xq0btp8k1wh2v1gpqyfhwpsyxq4ds

This project is large.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs

Beam library developers and Beam users would appreciate this : )

This project involves prototyping a few different solutions, so it will be large.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org


Airflow

[GSoC][Airflow] Automation for PMC

This is a project to implement a tool for PMC task automation.


This is a large project.


Mentor will be aizhamal.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) airflow.apache.org

Teaclave

[GSoC][Teaclave (incubating)] Data Privacy Policy Definition and Function Verification

Background

The Apache Teaclave (incubating) is a cutting-edge solution for confidential computing, providing Function-as-a-Service (FaaS) capabilities that enable the decoupling of data and function providers. Despite its impressive functionality and security features, Teaclave currently lacks a mechanism for data providers to enforce policies on the data they upload. For example, data providers may wish to restrict access to certain columns of data for third-party function providers. Open Policy Agent (OPA) offers flexible control over service behavior and has been widely adopted by the cloud-native community. If Teaclave were to integrate OPA, data providers could apply policies to their data, enhancing Teaclave’s functionality. Another potential security loophole in Teaclave is the absence of a means to verify the expected behavior of a function. This gap leaves the system vulnerable to exploitation by malicious actors. Fortunately, most of Teaclave’s interfaces can be reused, with the exception of the function uploading phase, which may require an overhaul to address this issue. Overall, the integration of OPA and the addition of a function verification mechanism would make Teaclave an even more robust and secure solution for confidential computing.

Benefits

If this proposal moves on smoothly, new functionality will be added to the Teaclave project that enables verifying that a function's behavior strictly conforms to a prescribed policy.

Deliverables

  • Milestones: Basic policies (e.g., addition, subtraction) of the data can be verified by Teaclave; Complex policies can be verified.
  • Components: Verifier for the function code; Policy language adapters (adapt policy language to verifier); Policy language parser; Function source code converter (append policies to the functions).
  • Documentation: The internal working mechanism of the verification; How to write policies for the data.

Timeline Estimation

  • 0.5 month: Policy language parser and/or policy language design (if Rego is not an ideal choice).
  • 1.5 − 2 months: Verification contracts rewriting on the function source code based on the policy parsed.
  • ∼1 month: The function can be properly verified formally (by, e.g., querying the Z3 SMT solver).

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Mingshen Sun, Apache Teaclave (incubating) PPMC, mail: mssun (at) apache.org
Project Devs, mail: dev (at) teaclave.apache.org

CloudStack

CloudStack GSoC 2023 - Autodetect IPs used inside the VM

Github issue: https://github.com/apache/cloudstack/issues/7142


Description:

With regards to IP info reporting, CloudStack relies entirely on its DHCP databases and so on. When this is not available (L2 networks etc.) no IP information is shown for a given VM.

I propose we introduce a mechanism for "IP autodetection" and try to discover the IPs used inside the machines by querying the hypervisors. For example, with KVM/libvirt we can simply do something like this:

 
    [root@fedora35 ~]# virsh domifaddr win2k22 --source agent
     Name                          MAC address          Protocol     Address
    -------------------------------------------------------------------------------
     Ethernet                      52:54:00:7b:23:6a    ipv4         192.168.0.68/24
     Loopback Pseudo-Interface 1   -                    ipv6         ::1/128
     -                             -                    ipv4         127.0.0.1/8

The above command queries the qemu-guest-agent inside the Windows VM. The VM needs to have the qemu-guest-agent installed and running, as well as the virtio serial drivers (easily done in this case with virtio-win-guest-tools.exe) and a guest-agent socket channel defined in libvirt.
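For reference, the guest-agent channel mentioned above is commonly declared in the libvirt domain XML like this (standard libvirt configuration, shown here for illustration):

    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
    </channel>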

Once we have this information we could display it in the UI/API as "Autodetected VM IPs" or something like that.

I imagine it's very similar for VMWare and XCP-ng.

Thank you

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

...

CloudStack GSoC 2023 - Configure NFS version for Primary Storage

Github issue: https://github.com/apache/cloudstack/issues/4482


NFS Primary Storage mounts are handled by libvirt.

Currently, libvirt defaults to NFS version 3 when mounting, while it does support NFS version 4 if provided in the XML definition: https://libvirt.org/formatstorage.html#StoragePoolSource

    <source>
      <host name='localhost'/>
      <dir path='/var/lib/libvirt/images'/>
      <format type='nfs'/>
      <protocol ver='4'/>
    </source>

One approach could be to pass an 'nfsvers' argument in the URL provided to the Management Server and then pass this down to the hypervisors, which generate the XML for libvirt.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org


CloudStack GSoC 2023 - Use Calico or Cilium in CKS

Github issue: https://github.com/apache/cloudstack/issues/6637


The Weave project is looking for maintainers, so it may be worth exploring which CNI is widely used and standard/stable for the CKS use-case.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

CloudStack GSoC 2023 - SSL LetsEncrypt the Console Proxy

Github issue: https://github.com/apache/cloudstack/issues/3141


Add a new global option to enable Let's Encrypt on the console proxy, along with a domain name option for automatic Let's Encrypt SSL renewal.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

CloudStack GSoC 2023 - Direct Download extension to Ceph storage

Github issue: https://github.com/apache/cloudstack/issues/3065


Extend the Direct Download functionality to work with Ceph storage

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

Apache Nemo

Enhance Nemo to support autoscaling for bursty loads

The load of streaming jobs usually fluctuates according to the input rate or operations (e.g., window). Supporting automatic scaling could reduce the operational cost of running streaming applications, while minimizing the performance degradation that can be caused by bursty loads.


We can harness cloud resources such as VMs and serverless frameworks to acquire computing resources on demand. To realize automatic scaling, the following features should be implemented.


1) state migration: scaling jobs require moving tasks (or partitioning a task into multiple ones). In this situation, the internal state of the task should be serialized/deserialized.

2) input/output rerouting: if a task is moved to a new worker, the input and output of the task should be redirected.

3) dynamic Executor or Task creation/deletion: Executors or Tasks can be dynamically created or deleted.

4) scaling policy: a scaling policy that decides when and how to scale out/in should be implemented.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Tae-Geon Um, mail: taegeonum (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Collect task statistics necessary for estimating duration

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hwarim Hyun, mail: hwarim (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Detect skewed task periodically

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hwarim Hyun, mail: hwarim (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

...

Dynamic Task Sizing on Nemo

This is an umbrella issue to keep track of the issues related to the dynamic task sizing feature on Nemo.

Dynamic task sizing needs to consider the workload and decide on the optimal task size based on runtime metrics and characteristics. It should affect the parallelism and the partitions, i.e., how many partitions an intermediate dataset should be divided/shuffled into, while effectively handling skews in the meanwhile.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Wonook, mail: wonook (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Dynamic Work Stealing on Nemo for handling skews

We aim to handle the problem of throttled (heterogeneous) resources and skewed input data. In order to solve this problem, we suggest dynamic work stealing that can dynamically track task statuses and steal workloads among each other. To do this, we have the following action items:

  • Dynamically collecting task statistics during execution
  • Detecting skewed tasks periodically
  • Splitting the data allocated in skewed tasks and reallocating them into new tasks
  • Synchronizing the optimization procedure
  • Evaluation of the resulting implementations

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Wonook, mail: wonook (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Implement an Accurate Simulator based on Functional model

Missing a deadline often has significant consequences for the business, and a simulator can also contribute to other optimization approaches. So we want to implement a simulator for stream processing based on functional models.

There are some requirements:

  • Simulation should be able to execute before or during job execution.
  • When a simulation is executed while the job is running, it must be fast enough not to affect the job.
  • Information about the running environment is received through arguments.
  • At least the network topology should be considered for the WAN environment.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Lee Hae Dong, mail: Lemarais (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org


Implement a model that represents a task level execution time with statistical analysis

The current SimulatedTaskExecutor is hardly usable, because it needs actual metrics to predict execution time. To increase its utility, we need a new model that predicts task level execution time with statistical analysis.

Some of the related TODOs are as follows:

  • Find factors that affect task level execution time, with a loose grid search.
  • Infer the most suitable model with a tight grid search.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Lee Hae Dong, mail: Lemarais (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org
Implement spill mechanism on Nemo

Currently, Nemo doesn't have a spill mechanism. This makes executors prone to memory problems such as OOM (Out Of Memory) or GC pressure when task data is large. For example, handling skewed shuffle data in Nemo results in OOM and executor failure, as all data has to be handled in-memory.

We need to spill in-memory data to secondary storage when there is not enough memory in the executor.
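As a rough illustration of the direction (a hypothetical stand-in, not Nemo's actual block APIs), a buffer could serialize its contents to disk once a threshold is crossed:

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;

    public class SpillBuffer {
        private final int threshold;
        private final List<Serializable> inMemory = new ArrayList<>();
        private final List<File> spillFiles = new ArrayList<>();

        public SpillBuffer(int threshold) {
            this.threshold = threshold;
        }

        public void add(Serializable element) throws IOException {
            inMemory.add(element);
            if (inMemory.size() >= threshold) {
                spill(); // stay under the memory budget
            }
        }

        private void spill() throws IOException {
            File file = File.createTempFile("nemo-spill-", ".bin");
            try (ObjectOutputStream out =
                         new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(new ArrayList<>(inMemory));
            }
            spillFiles.add(file);
            inMemory.clear(); // free the heap copy; data now lives on disk
        }

        public static void main(String[] args) throws IOException {
            SpillBuffer buffer = new SpillBuffer(1000);
            for (int i = 0; i < 5000; i++) {
                buffer.add("record-" + i);
            }
            System.out.println("spill files written: " + buffer.spillFiles.size());
        }
    }

A real implementation would additionally need spill-aware readers that merge the on-disk and in-memory portions back into a single iterator.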

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jeongyoon Eo, mail: jeongyoon (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Approximate the factors that affect the stage group level execution time

There are some factors that can affect the stage group level simulation, such as latency, the rate of skewed data, and the error rate of the executor. It is required to find a reasonable distribution form for these factors, such as the normal distribution or the Landau distribution. In an actual run, this makes it possible to approximate the model with a small amount of data.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Lee Hae Dong, mail: Lemarais (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Efficient Caching and Spilling on Nemo

In-memory caching and spilling are essential features in in-memory big data processing frameworks, and Nemo needs them.

  • Identify and persist frequently used data, and unpersist it when its usage has ended
  • Spill in-memory data to disk upon memory pressure

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jeongyoon Eo, mail: jeongyoon (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Runtime Level Caching Mechanism

If the compile time identifies what data can be cached, the runtime requires logic to make this happen.

Implementation needs:

  • (Driver) Receive and update the status of blocks from various Executors; right now this seems to be best implemented as part of BlockManagerMaster
  • (Driver) Communicate to the Executors the availability, location and status of blocks
  • Possible concurrency issues (see the sketch below):
  1. Concurrency in Driver when multiple Executors update/inquire the same block information
  2. Concurrency in Executor when a single cached block is accessed simultaneously
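A minimal sketch of the driver-side bookkeeping (hypothetical names, not the actual BlockManagerMaster API) that sidesteps the first concurrency issue by making per-block updates atomic:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class BlockStateRegistry {
        public enum BlockState { IN_PROGRESS, AVAILABLE, LOST }

        private final Map<String, BlockState> states = new ConcurrentHashMap<>();

        // Called when an Executor reports a block; merge() is atomic per key,
        // so concurrent reports about the same block cannot interleave.
        public void onExecutorReport(String blockId, BlockState newState) {
            states.merge(blockId, newState,
                    (oldState, next) -> oldState == BlockState.LOST ? oldState : next);
        }

        public BlockState query(String blockId) {
            return states.getOrDefault(blockId, BlockState.IN_PROGRESS);
        }

        public static void main(String[] args) {
            BlockStateRegistry registry = new BlockStateRegistry();
            registry.onExecutorReport("block-1", BlockState.AVAILABLE);
            System.out.println(registry.query("block-1")); // AVAILABLE
        }
    }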

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Dongjoo Lee, mail: codinggosu (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Efficient Dynamic Reconfiguration in Stream Processing

In stream processing, we have many methods, starting from the primitive checkpoint-and-replay to fancier versions of reconfiguration and reinitiation of stream workloads. We aim to find the most effective and efficient way of reconfiguring stream workloads. Sub-issues are to be created later on.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Wonook, mail: wonook (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Evaluate the performance of Work Stealing implementation

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hwarim Hyun, mail: hwarim (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Nemo on Google Dataproc

Issues for making it easy to install and use Nemo on Google Dataproc.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
John Yang, mail: johnyangk (at) apache.org
Project Devs, mail: dev (at) nemo.apache.org

Apache Dubbo

Dubbo GSoC 2023 - Refactor the http layer

Background

Dubbo currently supports the rest protocol based on HTTP/1 and the triple protocol based on HTTP/2, but the two HTTP-based protocols are implemented independently. They cannot share the underlying implementation, and their respective implementation costs are relatively high.

Target

In order to reduce maintenance costs, we hope to abstract the HTTP layer so that the underlying HTTP implementation is independent of the protocol, and different protocols can reuse related implementations.
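A sketch of the kind of protocol-agnostic abstraction this describes (a hypothetical interface, not existing Dubbo code); both rest and triple would program against it while the transport stays swappable:

    import java.util.Map;

    // One exchange = one logical HTTP request/response pair, regardless of
    // whether it is carried over HTTP/1 or an HTTP/2 stream underneath.
    public interface HttpExchange {
        String method();
        String path();
        Map<String, String> headers();
        byte[] body();
        void respond(int status, Map<String, String> headers, byte[] body);
    }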

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Integration suite on Kubernetes

As a development framework that is closely related to users, Dubbo may have a huge impact on users if any problems occur during the iteration process. Therefore, Dubbo needs a complete set of automated regression testing tools.
At present, Dubbo already has a set of testing tools based on docker-compose, but this set of tools cannot test compatibility in the Kubernetes environment. At the same time, we also need a more reliable test case construction system to ensure that the test cases are sufficiently complete.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo usage scanner

As a development framework closely related to users, Dubbo provides many functional features (such as configuring timeouts, retries, etc.). We hope that a tool can be given to users to scan which features are used, which features are deprecated, which ones will be deprecated in the future, and so on. Based on this tool, we can provide users with a better migration solution.
Suggestion: you can consider basing it on static code scanning or a javaagent implementation, as in the sketch below.
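A minimal javaagent skeleton along those lines (illustrative only, not an existing Dubbo tool): it observes which Dubbo classes the application actually loads, which is a natural starting point for a usage report:

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.Instrumentation;
    import java.security.ProtectionDomain;

    public class DubboUsageAgent {
        // Wire up via: java -javaagent:usage-agent.jar -jar app.jar
        public static void premain(String agentArgs, Instrumentation inst) {
            inst.addTransformer(new ClassFileTransformer() {
                @Override
                public byte[] transform(ClassLoader loader, String className, Class<?> cls,
                                        ProtectionDomain domain, byte[] bytes) {
                    if (className != null && className.startsWith("org/apache/dubbo/")) {
                        System.out.println("dubbo class loaded: " + className);
                    }
                    return null; // observation only, no bytecode modification
                }
            });
        }
    }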

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Remove jprotoc in compiler

Dubbo supports the communication mode based on the gRPC protocol through Triple. For this reason, Dubbo has developed a compiling plug-in for proto files based on jprotoc. Due to the inactivity of jprotoc, the Dubbo compiler currently cannot run well on the latest protobuf version. Therefore, we need to consider implementing a new compiler with reference to gRPC.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

...

Dubbo GSoC 2023 - Dubbo i18n log

Dubbo is a development framework that is closely related to users, and many usages by users may cause exceptions handled by Dubbo. Usually, in this case, users can only judge through logs. We hope to provide an i18n localized log output tool to provide users with a more friendly log troubleshooting experience.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor dubbo project to gradle

As more and more projects start to develop based on Gradle and profit from Gradle, Dubbo also hopes to migrate to Gradle. This task requires you to transform the dubbo project [1] into a gradle project.

 [1] https://github.com/apache/dubbo

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Metrics on Dubbo Admin

Dubbo Admin is a console of Dubbo. Today, Dubbo's observability is becoming more and more powerful. We need to directly observe some indicators of Dubbo on Dubbo Admin, and even put forward suggestions for users to fix problems.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Refactor Connection

Background

At present, the abstraction of connection by the client in different protocols in Dubbo is not perfect. For example, there is a big discrepancy between the client abstraction of connection in the dubbo and triple protocols. As a result, enhancing connection-related functions in the client is complicated, and the implementation cannot be reused. At the same time, the client also needs to implement a lot of repetitive code when extending the protocol.

Target

Reduce the complexity of the client part when extending the protocol, and increase the reuse of connection-related modules.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - IDL management

Background

Dubbo currently supports protobuf as a serialization method. Protobuf relies on proto (IDL) for code generation, but Dubbo currently lacks tools for managing IDL files. For example, for Java users, proto files have to be used for each compilation, which is troublesome, and everyone is used to using jar packages for dependencies.

Target

Implement an IDL management and control platform that supports automatically generating dependency packages in various languages from IDL files and pushing them to the relevant dependency repositories.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Service Deployer

For a large number of monolithic applications, problems such as performance will be encountered during large-scale deployment. For interface-oriented programming languages, Dubbo provides the capability of RPC remote calls, and we can help applications decouple through interfaces. Therefore, we can provide a deployer to help users realize the decoupling and splitting of microservices during deployment, and quickly provide performance optimization capabilities.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - API manager

Since Dubbo runs on a distributed architecture, it naturally has the problem of difficult API interface definition management. It is often difficult for us to know which interface is running in the production environment. So we can provide an API-definition reporting platform, and even a management platform. This platform can automatically collect all APIs of the cluster, or they can be directly defined by the user, and then unified distribution management is carried out through a mechanism similar to git and maven package management.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - JSON compatibility check

Dubbo currently supports a large number of Java language features through hessian under the Java SDK, such as generics, interfaces, etc. These capabilities will not be compatible when calling across systems. Therefore, Dubbo needs to provide the ability to inspect an interface definition and determine whether the interface published by the user can be described by native JSON. The snippet below shows one such case.
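One illustrative case (a hypothetical example, not Dubbo code): hessian can serialize a map with non-string keys, carrying full type information, while a JSON object cannot faithfully describe it because JSON keys must be strings:

    import java.util.HashMap;
    import java.util.Map;

    public class JsonCompatDemo {
        public static void main(String[] args) {
            // Legal in Java and serializable by hessian, but a JSON object
            // has no equivalent for an int[]-typed key.
            Map<int[], String> routes = new HashMap<>();
            routes.put(new int[] {1, 2}, "route-a");
            System.out.println("entries = " + routes.size());
        }
    }

A compatibility checker would walk the published interface's method signatures and flag parameter/return types that fall outside what native JSON can express.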

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Automated Performance Testing Mechanism

Dubbo currently provides a very simple performance testing tool. But for such a complex framework as Dubbo, the functional coverage is very low. We urgently need a testing tool that can test multiple complex scenarios. In addition, we also hope that this set of testing tools can be run automatically, so that we can track the current performance of Dubbo in time.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo Client on WASM

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. For web client users, we can provide Dubbo's wasm client, so that front-end developers can simply initiate Dubbo requests in the browser, and realize Dubbo's full-link unification.

This task needs to be implemented on a browser such as Chrome to initiate a request to the Dubbo backend.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Pure Dubbo RPC API

At present, Dubbo provides RPC capabilities and a large number of service governance capabilities. This has led to the fact that Dubbo cannot be used well by some of Dubbo's own components that only need the RPC capabilities, or by users who need extreme lightweight.
Goal: to provide a Dubbo RPC kernel, so users can directly program for service calls and focus on RPC.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - HTTP/3 Rest Support

HTTP/3 was formalized as a standard last year. Dubbo, as a framework that supports publishing and invoking Web services, needs to support the HTTP/3 protocol.

This task needs to expand the implementation of the current rest protocol to support publishing HTTP/3 services and calling HTTP/3 services.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Go Traffic Management

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Go Security

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Improve usability of Dubbo-go project

Including but not limited to programming patterns, configuration, APIs, documentation and demos.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo SPI Extensions on WASM

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Many capabilities of Dubbo support extensions, such as custom interceptors, routing, load balancing, etc. In order to allow the user's implementation to be used on Dubbo's multiple language SDKs, we can implement cross-platform operation based on wasm capabilities.

The implementation of this topic needs to provide a set of mechanisms for Wasm on Dubbo, covering the implementations in Java and Go, and supporting at least Filter, Router and Loadbalance.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Admin Control Plane

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Dubbo3 Node.js HTTP/2 RPC Protocol Implementation

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Go HTTP1&2 RPC Protocol Support

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Go Web Protocol and Programming Support

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Go Observability Improvement

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Development of Dubbo Admin Dashboard UI Pages

In charge of the maintenance and development of the UI pages of the whole Dubbo Admin project.

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Rust Cluster Feature Implementation and Stability Improvement

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Jun Liu, mail: liujun (at) apache.org
Project Devs, mail:

...