This page is auto-generated! Please do NOT edit it, all changes will be lost on next update


Airavata

Local user interface for Airavata MFT

Note: this is an issue in GitHub - https://github.com/apache/airavata-mft/issues/114 - cross-posted in Jira for GSoC purposes.

Currently, Airavata MFT can be accessed through its command line interface and the gRPC API. However, a Docker Desktop-like user interface for a locally running Airavata MFT would make it far easier to use. The functionality of such an interface can be summarized as follows:

  1. Start / Stop MFT Instance
  2. Register/ List/ Remove Storage endpoints
  3. Access data (list, download, delete, upload) in configured storage endpoints
  4. Move data between storage endpoints
  5. Search data across multiple storage endpoints
  6. Analytics - Performance numbers (data transfer rates in each agent)

We can use ElectronJS to develop this cross-platform user interface. The Node.js backend of ElectronJS can use gRPC to connect to Airavata MFT and perform management operations.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Apache Dubbo

Unified IDL control for multiple protocols

Client and Server layer APIs can support both IDL and Non-IDL modes.
For IDL mode (Triple + Protobuf), defining a proto file and using protoc-gen-go-triple to generate the related code is straightforward. The generated code (XXX.triple.go) contains statements that invoke the APIs provided by the Client and Server layers.
For Non-IDL mode, users have to write the invoking code themselves, which is inconvenient. Take Dubbo + Hessian2 as an example:

Client Side

            
            cli, err := client.NewClient(
            	client.WithClientProtocolDubbo(),
            )
            if err != nil {
            	panic(err)
            }
            conn, err := cli.Dial("GreetProvider",
            	client.WithURL("127.0.0.1:20000"),
            )
            if err != nil {
            	panic(err)
            }
            var resp string
            if err := conn.CallUnary(context.Background(),
            	[]interface{}{"hello", "new", "dubbo"}, &resp, "Greet"); err != nil {
            	logger.Errorf("GreetProvider.Greet err: %s", err)
            }
            

Server Side

            type GreetProvider struct {
            }
            
            func (*GreetProvider) Greet(req string, req1 string, req2 string) (string, error) {
            	return req + req1 + req2, nil
            }
            
            srv, err := server.NewServer(
            	server.WithServerProtocol(
            		protocol.WithDubbo(),
            		protocol.WithPort(20000),
            	),
            )
            if err != nil {
            	panic(err)
            }
            if err := srv.Register(&GreetProvider{}, nil, server.WithInterface("GreetProvider")); err != nil {
            	panic(err)
            }
            if err := srv.Serve(); err != nil {
            	panic(err)
            }
            

Proposal

Even in Non-IDL mode, code is generated using Protobuf IDL. This way, whether a schema is needed (Protobuf) or not (Hessian2, Msgpack), everything is expressed uniformly as Protobuf IDL plus generated code.

Details:

1. Generate Dubbo + Hessian2 related code with the help of Protobuf IDL. Compared to XXX.pb.go, XXX.hessian2.go would contain much less content (since Hessian2 is schema-free): only structure definitions and the corresponding registration function (hessian2.Register(POJO)). A sketch follows after this list.
2. Non-IDL serialization (Hessian2) may not map perfectly to Protobuf IDL, so we need to define our own dialect in a way that is compatible with the official semantics of Protobuf IDL.
3. The content of XXX.dubbo.go is basically similar to XXX.triple.go, generating code that uses the APIs of the Client and Server layers.
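
To make item 1 concrete, below is a rough sketch of what a generated XXX.hessian2.go could contain. This is illustrative only: the message name and Java class name are hypothetical, and the exact generator output is part of this project's design. The registration call shown is the existing dubbo-go-hessian2 API (hessian.RegisterPOJO):

            // greet.hessian2.go - hypothetical generator output (sketch only)
            package greet
            
            import (
            	hessian "github.com/apache/dubbo-go-hessian2"
            )
            
            // GreetRequest mirrors a message defined in the Protobuf IDL.
            type GreetRequest struct {
            	Name string
            }
            
            // JavaClassName tells Hessian2 which Java class this struct maps to.
            func (GreetRequest) JavaClassName() string {
            	return "org.apache.dubbo.samples.GreetRequest"
            }
            
            func init() {
            	// Register the struct so Hessian2 can encode/decode it by class name.
            	hessian.RegisterPOJO(&GreetRequest{})
            }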

Prerequisite:

1. Provide tools for Dubbo-side users to automatically convert Dubbo interface definitions into Protobuf IDL.
2. Protobuf IDL can support extensions (adding Hessian2-specific tag extensions and generating Hessian2-specific content).

Results:

Not only Dubbo + Hessian2, but also Triple + Hessian2, Triple + JSON and other Non-IDL combinations can use the interface in a unified way.

Mentor

  • Mentor: Albumen Kevin, Apache Dubbo PMC, albumenj@apache.org
  • Mailing List: dev@dubbo.apache.org

     

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Python integration & AI Traffic Management

Background

Dubbo is an easy-to-use, high-performance remote procedure call framework. Most AI frameworks run on Python and suffer from unbalanced GPU load.

Objectives

  1. Enhance Dubbo on Python [1] and support the brand-new Triple protocol from Dubbo-Java
  2. Introduce a new load-balancing algorithm for AI that gathers metrics from GPUs and selects the most idle one to invoke (a selection sketch follows below)

[1] https://github.com/apache/dubbo-python
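
For objective 2, the core selection step might look like the minimal sketch below (written in Go to match the code samples on this page; the Provider fields and the GPU-utilization metric are assumptions, and the real work is collecting those metrics and wiring the policy into dubbo-python):

            // Provider describes a candidate endpoint; the fields are hypothetical.
            package lb
            
            type Provider struct {
            	Addr    string
            	GPUUtil float64 // last reported GPU utilization, in [0.0, 1.0]
            }
            
            // SelectMostIdle returns the provider with the lowest reported GPU load.
            func SelectMostIdle(providers []Provider) *Provider {
            	if len(providers) == 0 {
            		return nil
            	}
            	best := &providers[0]
            	for i := 1; i < len(providers); i++ {
            		if providers[i].GPUUtil < best.GPUUtil {
            			best = &providers[i]
            		}
            	}
            	return best
            }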

Recommended Skills

  1. Familiar with Python
  2. Have a basic understanding of RPC
  3. Have a basic understanding of traffic management


Mentor

  • Mentor: Albumen Kevin, Apache Dubbo PMC, albumenj@apache.org
  • Mailing List: dev@dubbo.apache.org
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Apache NuttX

NuttX NAND Flash Subsystem

Currently NuttX supports only NOR Flash and eMMC as solid-state storage.

Although NOR Flash is still widely used in low-end embedded systems, NAND Flash is a better option for devices that need bigger storage, because its price per MB is very low.

On the other hand, NAND Flash brings many challenges: you need to map and track all the bad blocks, and you need a filesystem with good wear leveling. Currently SmartFS and LittleFS offer some wear leveling for NOR Flash; they would need to be adapted to NAND Flash.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

Rust integration on NuttX

The Rust language is gaining momentum as an alternative to C and C++ for embedded systems (https://www.rust-lang.org/what/embedded), and it would be very useful to be able to develop NuttX applications in Rust.

Some time ago Yoshiro Sugino ported the Rust standard libraries, but the port was not complete and was not integrated into NuttX. This initial port could still be used as a starting point for a student willing to add official Rust support to NuttX.

The project should also pave the way for developing NuttX drivers in Rust as a complement to C drivers.

Difficulty: Normal
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

Device Tree support for NuttX

Device Tree will simplify the way boards are configured to support NuttX. Currently, for each board, the developer/user needs to manually create an initialization file for each feature or device (except when the device is already in the common board folder).

Matias Nitsche (aka v0id) wrote a very descriptive and informative explanation here: https://github.com/apache/incubator-nuttx/issues/1020

The goal of this project is to add Device Tree support to NuttX and keep it configurable (low-end boards should be able to avoid using Device Tree, for instance).


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

Micro-ROS integration on NuttX

Micro-ROS (https://micro.ros.org) brings ROS 2 support to microcontrollers. Initially the project was developed on top of NuttX by Bosch and other EU organizations. Later they added support for FreeRTOS and Zephyr. After that, the NuttX support started aging, and nobody has been working on fixing it (with few exceptions, like Roberto Bucher's work testing it with pysimCoder).

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

Add X11 graphic support on NuttX using NanoX

NanoX/Microwindows is a small graphics library that allows Unix/Linux X11 applications to run on embedded systems that cannot support an X server because it is too big. Adding it to NuttX will allow many applications to be ported to NuttX. More importantly, it will allow FLTK 1.3 to run on NuttX, and that could bring the Dillo web browser.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

TinyGL support on NuttX

TinyGL is a small 3D graphics library created by Fabrice Bellard (also the creator of QEMU) and designed for embedded systems. Currently NuttX doesn't have a 3D library, and this could enable people to write more 3D programs for NuttX.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Alan Carvalho de Assis, mail: acassis (at) apache.org
Project Devs, mail: dev (at) nuttx.apache.org

SkyWalking

[GSOC] [SkyWalking] Self-Observability of the query subsystem in BanyanDB

Background

SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store metrics, tracing and logging data.

Objectives

  1. Support EXPLAIN[1] for both measure query and stream query
  2. Add self-observability including trace and metrics for query subsystem
  3. Support EXPLAIN in the client SDK & CLI and add query plan visualization in the UI

[1]: EXPLAIN in MySQL

Recommended Skills

  1. Familiar with Go
  2. Have a basic understanding of database query engine
  3. Have experience with Apache SkyWalking or other APMs

Mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jiajing Lu, mail: lujiajing (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

[GSOC] [SkyWalking] Add Overview page in BanyanDB UI

Background

SkyWalking BanyanDB is an observability database that aims to ingest, analyze and store metrics, tracing and logging data.


The BanyanDB UI is a web interface provided by the BanyanDB server. It's developed with Vue3 and Vite3.

Objectives

The UI should have a user-friendly Overview page.
The Overview page must display a list of nodes running in a cluster.
For each node in the list, the following information must be shown:

  • Node ID or name
  • Uptime
  • CPU usage (percentage)
  • Memory usage (percentage)
  • Disk usage (percentage)
  • Ports (gRPC and HTTP)

The web app must automatically refresh the node data at a configurable interval to show the most recent information.

Recommended Skills

  1. Familiar with Vue and Vite
  2. Have a basic understanding of RESTful APIs
  3. Have experience with Apache SkyWalking
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hongtao Gao, mail: hanahmily (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

Doris

[GSoC][Doris] Support UPDATE for Doris Duplicate Key Table

Objectives

Support UPDATE for Doris Duplicate Key Table

Currently, Doris supports three data models: Duplicate Key, Aggregate Key, and Unique Key. Of these, Unique Key has complete data-update support (including the UPDATE statement). With the widespread adoption of Doris, users are placing more demands on it. For example, some users need to perform ETL processing inside Doris, but they use Duplicate Key tables and hope that Duplicate Key can also support UPDATE.

For Duplicate Key, since there is no primary key to locate one specific row, UPDATE is inefficient. The usual practice is to rewrite all the data: even if the user updates only one field of a single row, at least the segment file containing that row must be rewritten. A potentially more efficient solution is to implement Duplicate Key on top of Unique Key's Merge-on-Write (MoW) combined with an auto_increment column. That is, change the underlying implementation of Duplicate Key to Unique Key MoW and add a hidden auto_increment column to the primary key. All keys the user writes to the Unique Key MoW table are then unique, which realizes Duplicate Key semantics, and since each row has a unique primary key, we can reuse Unique Key's UPDATE capability to support UPDATE for Duplicate Key.

We would like participants to help design and implement the solution and perform performance testing for comparison and optimization; the core idea is sketched below.
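
A minimal sketch of the hidden-key idea (in Go purely for illustration; Doris is implemented in C++, and a crash-safe, distributed allocation of the auto_increment value is part of the actual design work):

            // Conceptual sketch: append a hidden auto-increment id to every row so
            // that keys written to a Unique Key Merge-on-Write table never collide.
            // This preserves Duplicate Key semantics while giving each row a unique
            // primary key that UPDATE can target.
            package demo
            
            type rowKey struct {
            	userKey  string // user-visible key columns (may repeat across rows)
            	hiddenID uint64 // hidden auto_increment column, unique per row
            }
            
            var nextID uint64
            
            func newRowKey(userKey string) rowKey {
            	nextID++ // a real implementation needs distributed, crash-safe allocation
            	return rowKey{userKey: userKey, hiddenID: nextID}
            }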

Recommended Skills

Familiar with C++ programming

Familiar with the storage layer of Doris

Mentor

Mentor: Chen Zhang, Apache Doris Committer, chzhang1987@gmail.com

Mentor: Guolei Yi, Apache Doris PMC Member, yiguolei@gmail.com

Mailing List: dev@doris.apache.org

Website: https://doris.apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Calvin Kirs, mail: kirs (at) apache.org
Project Devs, mail: dev (at) doris.apache.org

[GSoC][Doris] Dictionary encoding optimization

Background

Apache Doris is a modern data warehouse for real-time analytics.
It delivers lightning-fast analytics on real-time data at scale.

Objectives

Dictionary encoding optimization
To save storage space, Doris uses dictionary encoding when storing string-type data in the storage layer if the cardinality is relatively low. Dictionary encoding maps string values to integer values through a dictionary: the data is stored directly as integers, and the dictionary information is stored separately. When reading the data, the integers are converted back to their corresponding string values using the dictionary; a minimal illustration follows below.

The storage layer doesn't know whether a column has low or high cardinality when the data comes in. Currently, the implementation encodes the first page using dictionary encoding, and if the dictionary becomes too large, that indicates a high-cardinality column, and subsequent pages will not use dictionary encoding. However, even for high-cardinality columns a dictionary page is still retained, which doesn't save storage space and adds memory overhead during reading as well as extra CPU overhead during decoding. This project is about optimizing away the memory and CPU overhead caused by dictionary encoding.
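
As a minimal illustration of the encoding itself (sketched in Go to match the code samples on this page; Doris's real implementation lives in its C++ storage layer):

            // Encode maps each string to a small integer code, returning the codes
            // plus the dictionary needed to decode them. Low-cardinality columns
            // yield a small dictionary and large savings; high-cardinality columns
            // yield a huge dictionary, which is the overhead this project targets.
            package dictdemo
            
            func Encode(values []string) (codes []int, dict []string) {
            	index := make(map[string]int)
            	for _, v := range values {
            		code, ok := index[v]
            		if !ok {
            			code = len(dict)
            			index[v] = code
            			dict = append(dict, v)
            		}
            		codes = append(codes, code)
            	}
            	return codes, dict
            }
            
            // Decode converts codes back to strings using the dictionary.
            func Decode(codes []int, dict []string) []string {
            	out := make([]string, len(codes))
            	for i, c := range codes {
            		out[i] = dict[c]
            	}
            	return out
            }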

Recommended Skills

Familiar with C++ programming
Familiar with the storage layer of Doris

Mentor

Mentor: Xin Liao, Apache Doris Committer, liaoxinbit@gmail.com
Mentor: YongQiang Yang, Apache Doris PMC Member, dataroaring@gmail.com
Mailing List: dev@doris.apache.org
Website: https://doris.apache.org
Source Code: https://github.com/apache/doris

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Calvin Kirs, mail: kirs (at) apache.org
Project Devs, mail: dev (at) doris.apache.org

Beam

[GSOC][Beam] Build out Beam Use Cases

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. On top of providing lower-level primitives, Beam has also introduced several higher-level transforms used for machine learning and some general data processing use cases. This project focuses on identifying and implementing real-world use cases that use these transforms.

Objectives:
1. Add real-world use cases demonstrating Beam's MLTransform for preprocessing data and generating embeddings
2. Add real-world use cases demonstrating Beam's Enrichment transform for enriching existing data with data from a slowly changing source
3. (Stretch) Implement one or more additional "enrichment handlers" for interacting with currently unsupported sources

Useful links:
Apache Beam repo - https://github.com/apache/beam
MLTransform docs - https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/
Enrichment code - https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py
Enrichment docs (should be published soon) - https://github.com/apache/beam/pull/30187

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Danny McCormick, mail: damccorm (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

[GSOC][Beam] Add connectors to Beam ManagedIO

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. On top of providing lower-level primitives, Beam has also introduced several higher-level transforms used for machine learning and some general data processing use cases. One new transform that is being actively worked on is a unified ManagedIO transform, which gives runners the ability to manage (upgrade, optimize, etc.) an IO (input source or output sink) without upgrading the whole pipeline. This project will be about adding one or more IO integrations to ManagedIO.

Objectives:
1. Add a BigTable integration to ManagedIO
2. Add a Spanner integration to ManagedIO

Useful links:
Apache Beam repo - https://github.com/apache/beam
Docs on ManagedIO are relatively light since this is a new project, but here are some docs on existing IOs in Beam - https://beam.apache.org/documentation/io/connectors/

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Danny McCormick, mail: damccorm (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

[GSOC][Beam] Build out Beam Yaml features

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. Beam recently added support for launching jobs using YAML on top of its other SDKs; this project would focus on adding more features and transforms to the Yaml SDK so that it can be the easiest way to define data pipelines.

Objectives:
1. Add support for existing Beam transforms (IOs, Machine Learning transforms, and others) to the Yaml SDK
2. Add end to end pipeline use cases using the Yaml SDK
3. (stretch) Add Yaml SDK support to the Beam playground

Useful links:
Apache Beam repo - https://github.com/apache/beam
Yaml SDK code + docs - https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
Open issues for the Yaml SDK - https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Ayaml

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Danny McCormick, mail: damccorm (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

Kvrocks

[GSoC] [Kvrocks] Support time series data structure and commands like Redis

RedisTimeSeries is a Redis module used to operate on and query time series data, giving Redis basic time-series database capabilities.

As Apache Kvrocks is characterized by compatibility with the Redis protocol and commands, we also hope to provide time-series data processing capabilities compatible with RedisTimeSeries.

This task is to implement the time series data structure and its commands in Kvrocks. Note: since Kvrocks is an on-disk database built on RocksDB, the implementation will be quite different from Redis's; one assumed on-disk layout is sketched below.
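
To give a taste of the design space, here is one assumed layout for samples on a sorted key-value store such as RocksDB, sketched in Go for brevity (Kvrocks itself is C++, and choosing the actual encoding is the heart of this task):

            package ts
            
            import "encoding/binary"
            
            // sampleKey builds "namespace|series|timestamp" so that a prefix range
            // scan over one series returns samples in time order, which is the
            // access pattern a TS.RANGE-style command needs.
            func sampleKey(ns, series []byte, tsMillis uint64) []byte {
            	key := make([]byte, 0, len(ns)+len(series)+10)
            	key = append(key, ns...)
            	key = append(key, '|')
            	key = append(key, series...)
            	key = append(key, '|')
            	var t [8]byte
            	// Big-endian keeps lexicographic key order equal to numeric time order.
            	binary.BigEndian.PutUint64(t[:], tsMillis)
            	return append(key, t[:]...)
            }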

Recommended Skills

Modern C++, Database Internals (especially for time series databases), Software Engineering and Testing

References

https://redis.io/docs/data-types/timeseries/

https://kvrocks.apache.org/community/data-structure-on-rocksdb

Mentor

Mentor: Mingyang Liu, Apache Kvrocks PMC Member, twice@apache.org

Mailing List: dev@kvrocks.apache.org

Website: https://kvrocks.apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Mingyang Liu, mail: twice (at) apache.org
Project Devs, mail: dev (at) kvrocks.apache.org

[GSoC] [Kvrocks] Support embedded storage for Kvrocks cluster controller

Currently, the Kvrocks controller supports multiple external storages like Apache ZooKeeper / etcd, and plans to support more common databases in the future. However, depending on external components adds operational complexity for users, so it would be great to support embedded storage inside the controller, making the controller service easier to maintain.

We would like participants to help design and implement the solution; one possible direction is sketched below.
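
One possible direction, assuming the hashicorp/raft library (a design choice to validate with mentors, not a prescribed solution), is to replicate the controller's cluster metadata through a Raft state machine backed by local storage:

            package store
            
            import (
            	"net"
            	"os"
            	"time"
            
            	"github.com/hashicorp/raft"
            	raftboltdb "github.com/hashicorp/raft-boltdb"
            )
            
            // openRaft wires up an embedded Raft node. fsm is the state machine that
            // applies cluster-metadata commands; implementing it is the real work.
            func openRaft(id, bind, dir string, fsm raft.FSM) (*raft.Raft, error) {
            	conf := raft.DefaultConfig()
            	conf.LocalID = raft.ServerID(id)
            
            	// A BoltDB file doubles as Raft log store and stable store.
            	logs, err := raftboltdb.NewBoltStore(dir + "/raft.db")
            	if err != nil {
            		return nil, err
            	}
            	snaps, err := raft.NewFileSnapshotStore(dir, 2, os.Stderr)
            	if err != nil {
            		return nil, err
            	}
            	addr, err := net.ResolveTCPAddr("tcp", bind)
            	if err != nil {
            		return nil, err
            	}
            	trans, err := raft.NewTCPTransport(bind, addr, 3, 10*time.Second, os.Stderr)
            	if err != nil {
            		return nil, err
            	}
            	return raft.NewRaft(conf, fsm, logs, logs, snaps, trans)
            }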

Recommended Skills

Familiar with the Go programming language and with how the Raft algorithm works.

Mentor

Mentor: Hulk Lin, Apache Kvrocks PMC Member, hulk.website@gmail.com

Mailing List: dev@kvrocks.apache.org

Website: https://kvrocks.apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hulk Lin, mail: hulk (at) apache.org
Project Devs, mail: dev (at) kvrocks.apache.org

OpenDAL

Apache OpenDAL ovirtiofs, OpenDAL File System via Virtio

cross posted at https://github.com/apache/opendal/issues/4133


Background

OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. ovirtiofs can expose OpenDAL's power via virtio, allowing users to mount storage services in a VM or container directly.

Objectives

Features

Similar to virtiofsd

In Scope:

  • Continuous reading
  • Continuous writing
  • Random reading
  • List dir
  • Stat file

Out of Scope:

  • Random Write
  • Xattrs
  • Permissions

Tasks

  • Implement the features that are in scope
  • Implement the test suite

Recommended Skills

  • Familiar with Rust
  • Familiar with the basic ideas of file systems and virtio
  • Familiar with OpenDAL Rust Core

Mentor

Mentor: Xuanwo, Apache OpenDAL PMC Chair, xuanwo@apache.org
Mailing List: dev@opendal.apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hao Ding, mail: xuanwo (at) apache.org
Project Devs, mail: dev (at) opendal.apache.org

Apache OpenDAL ofs, OpenDAL File System via FUSE

Cross posted at https://github.com/apache/opendal/issues/4130


Background

OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. ofs can expose OpenDAL's power via FUSE, allowing users to mount storage services locally.

Objectives

Implement ofs, allowing users to mount storage services locally for read and write.

Features

In Scope:

  • Continuous reading
  • Continuous writing
  • Random reading
  • List dir
  • Stat file

Out of Scope:

  • Random Write
  • Xattrs
  • Permissions

Tasks

  • Implement the features that are in scope
  • Implement the test suite

Recommended Skills

  • Familiar with Rust
  • Familiar with the basic ideas of file systems and FUSE
  • Familiar with OpenDAL Rust Core

Mentor

Mailing List: dev@opendal.apache.org

Mentor: junouyang, Apache OpenDAL PMC Member, junouyang@apache.org

Please leave comments if you want to be a mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hao Ding, mail: xuanwo (at) apache.org
Project Devs, mail: dev (at) opendal.apache.org

Apache OpenDAL oftp, OpenDAL FTP Server

cross posted at https://github.com/apache/opendal/issues/4132

Background

OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. oftp can expose OpenDAL's power via FTP, allowing users to access storage services through the FTP protocol.

Objectives

Features

  • Implement an FTP server based on OpenDAL

Tasks

  • Implement the features that are in scope
  • Implement the test suite

Recommended Skills

  • Familiar with Rust
  • Familiar with the basic ideas of the FTP protocol
  • Familiar with OpenDAL Rust Core

Mentor

Mentor: PsiACE, Apache Member, psiace@apache.org
Mailing List: dev@opendal.apache.org
Please leave comments if you want to be a mentor

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hao Ding, mail: xuanwo (at) apache.org
Project Devs, mail: dev (at) opendal.apache.org

EventMesh

Apache EventMesh: Enhance the serverless ability for EventMesh

Apache EventMesh
Apache EventMesh is a new generation serverless event middleware for building distributed event-driven applications.

Website: https://eventmesh.apache.org

GitHub: https://github.com/apache/eventmesh

Upstream Issue: https://github.com/apache/eventmesh/issues/4765

Background

EventMesh currently has eventing capabilities in the serverless field, but it should also improve and complement the automatic scaling of EventMesh's own services and of the services that connect to it. This service is the coordinator responsible for automatically scaling services connected to EventMesh, supporting scaling up from 0 to n and scaling down from n to 0 based on event traffic or other user-defined conditions.

Task

1. Discuss with the mentors what you need to do

2. Learn the details of the Apache EventMesh project

3. Implement the auto-scaling service for EventMesh. It should support different auto-scaling strategies by default, and allow Knative or KEDA to be selected as plugin services for automatically scaling services up and down; a toy version of the scaling decision is sketched after this list.
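
As a toy illustration of the traffic-based decision in task 3 (the metric source and per-replica capacity are assumptions; a real coordinator would pull metrics from EventMesh and delegate actuation to Kubernetes, Knative or KEDA):

            package scaler
            
            import "math"
            
            // desiredReplicas maps observed event throughput to a replica count,
            // including scale-to-zero when there is no traffic.
            func desiredReplicas(eventsPerSec, perReplicaCapacity float64, maxReplicas int) int {
            	if eventsPerSec <= 0 {
            		return 0 // no traffic: scale the connected service down to zero
            	}
            	n := int(math.Ceil(eventsPerSec / perReplicaCapacity))
            	if n > maxReplicas {
            		return maxReplicas
            	}
            	return n
            }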

Recommended Skills

1. Familiar with Go and Kubernetes

2. Familiar with Knative / KEDA


Mentor

Eason Chen, PMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org

Mike Xue, PMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Xue Weiming, mail: mikexue (at) apache.org
Project Devs, mail: dev (at) eventmesh.apache.org

ShenYu

Apache ShenYu KitexPlugin

Description
`Apache ShenYu` is a Java native API Gateway for service proxy, protocol conversion and API governance.

`WASM` (WebAssembly) bytecode is designed to be encoded in a size- and load-time-efficient binary format. WebAssembly aims to leverage the common hardware features available on various platforms to execute in browsers at machine-code speed.

`WASI` (WebAssembly System Interface) allows WASM to run in non-browser environments such as Linux.

This plugin should be based on [WasmPlugin](https://github.com/apache/shenyu/issues/4612), which means other languages, as long as their code can be compiled into WASM bytecode (such as Rust/Golang/C++), can be used to write ShenYu plugins.


[kitex](https://github.com/cloudwego/kitex) is a Go RPC framework with high performance and strong extensibility for building micro-services.

You can find useful information [here](https://github.com/cloudwego/kitex/issues/1237).

The usage documentation for WasmPlugin is [here](https://shenyu.apache.org/docs/next/developer/custom-plugin/).


Relevant Skills
Know the use of Apache ShenYu, especially the wasm plugin.
Familiar with Golang and Java.


Task List


Links:

website: https://shenyu.apache.org/

issues: https://github.com/apache/shenyu/issues/5425

 
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
zhangzicheng/mahaitao, mail: zhangzicheng@apache.org , mahaitao@apache.org
Project Devs, mail: dev@shenyu.apache.org

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
ZiCheng Zhang, mail: zhangzicheng (at) apache.org
Project Devs, mail: dev (at) shenyu.apache.org

James Server

Implement RFC-8617 The Authenticated Received Chain (ARC) Protocol

What

https://datatracker.ietf.org/doc/html/rfc8617

https://arc-spec.org/

The Authenticated Received Chain (ARC) protocol provides an
authenticated "chain of custody" for a message, allowing each entity
that handles the message to see what entities handled it before and
what the message's authentication assessment was at each step in the
handling.

In essence: secured and standardized Received headers.

Example:

            ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
            b=S4DQRVgRLMeqank+UkagI9DIPrecaQa+tD+qrvD1XyuYolqGtWYole5yzajb6B71t9ceuFfCWYBmbze89vRt9bCc4KpcjEjzEzuf0xTo4HevTzZ62DEqXKzuXn+nWSGEAdrAcXS3w4RaoyeFC3ypKalcHJggiMStBBKuMG2k1jTk5vxirVqtxLr526AQ3XNGDEewIRMyhbjKDHKinjknJGLucWWli5YOheM4CDVwZXsbNbfhp8TPQitFd411+SDWRduqN2uKE/IqHn1FgqacCKkQaew5MS+GywnbCiNp2BHRgHMJbOt2gIHhFFLiPAow/98PyAdCPAqRmHqvUqSyRQ==
            ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
            s=arcselector9901;
            h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
            bh=FrVWL4P2FSzOMb/KTATCDQLYPJHy7pwVkwAdt3ueFh8=;
            b=E+f/prHAHynoo8GBK4s4Dxsdch6uPcErYd9R9h24Lb9sHlBVycnXby5PjcwqGtnvqEo14+8MEdxv41PYzIGHldjWh8CPgK6YHeWu+Zk8zwy05atOXXRgGkiRdge2bFSgtP4RLvoyV9kwngnR/vCIbSyTchnrZKyQ2IVCyZbEZtpDBgv4YtF9/972A+hZQLvymg4rZai74RDrVxVPJ2hmKOBSfaqTlUIm82HO5D2DMbbN50EmN9cicVOVkFo1d9m7sz7azq5VzybS/52B4nd7uby7ITkM/Enw/tihr9E6NHA31HgqEt8dx9pjTt4VJjVZbjSrv1AyKBl6VSxPerKzeA==
            ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
            smtp.mailfrom=docaposte.fr; dmarc=pass action=none header.from=docaposte.fr;
            dkim=pass header.d=docaposte.fr; arc=none
            

How

Implement a Mailet implementing ARC

Implement a Matcher validating ARC

Documentation (README)

If applicable, parsing ARC records shall be done as a separate maven module.

Definition of done

GSOC notes

Presenting a 1-week POC on the topic (as a separate mailet) would greatly improve the submission.

How to write custom mailet / matcher: https://github.com/apache/james-project/tree/master/examples/custom-mailets

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Benoit Tellier, mail: btellier (at) apache.org
Project Devs, mail: dev (at) james.apache.org

Openmeetings

Add blur background filter options on video sharing - AI-ML

OpenMeetings uses WebRTC and HTML5 video to share audio and video, purely browser based.

One missing feature is the ability to blur your webcam's background.

There are multiple ways to achieve it; Google Meet seems to use https://www.tensorflow.org/

TensorFlow provides AI/ML models precompiled into JS; for detecting the face/body, https://github.com/tensorflow/tfjs-models/tree/master/body-segmentation seems to be the best model.

Since Chrome 14 there is also a Background Blur API (relying on operating system APIs): https://developer.chrome.com/blog/background-blur - but that doesn't seem to be widely or reliably supported by operating systems yet.

The project would be about adding background blur to a simple demo and then integrating it into the OpenMeetings project. Additionally, other types of backgrounds can be added.

TensorFlow TFJS is under the Apache 2.0 License (see LICENSE) and it should be possible to redistribute it with Apache OpenMeetings.

Other live demos and examples:

https://blog.francium.tech/edit-live-video-background-with-webrtc-and-tensorflow-js-c67f92307ac5



Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Sebastian Wagner, mail: sebawagner (at) apache.org
Project Devs, mail: dev (at) openmeetings.apache.org

UIMA

Support Aggregate Engines in Apache UIMACPP

UIMA is a framework for unstructured information management, built around the idea of heavy annotators interoperating using a common exchange format.

It has been in production use for about two decades.

The framework is mostly written in Java. It has a C++ counterpart that implements a subset of the framework.

The challenge for this GSOC is to work together with the mentor to implement the full framework.

More details on GitHub: https://github.com/apache/uima-uimacpp/issues/6


Benefits to the community

This has been discussed as one of the main roadblocks in using the C++ version of the framework by its users: https://lists.apache.org/thread/f1r3sghgn2oqhvzz27y26zg6j3olv8qq


About the mentor

Dr. Duboue has more than 25 years of experience in AI. He has a Ph.D. in Computer Science from Columbia University and was a member of the IBM Watson team that beat the Jeopardy! champions.

Aside from his consulting work, he has taught in three different countries and done joint research with more than fifty co-authors.

He has years of experience mentoring both students and employees.



Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Duboue, mail: drdub (at) apache.org
Project Devs, mail: dev (at) uima.apache.org