...

[GSOC][SkyWalking] Add Terraform provider for Apache SkyWalking

The deployment methods for SkyWalking are currently limited: we only have a Helm Chart for users who deploy on Kubernetes, and users who are not on Kubernetes have to do all the housekeeping themselves to set up SkyWalking on, for example, VMs.


This issue aims to add a Terraform provider so that users can conveniently spin up a cluster for demonstration or testing. We should then evolve the provider to let users customize it to their needs, so that they can finally use it in their production environments.


In this task, we will mainly focus on support for AWS. In the Terraform provider, users need to provide their access key / secret key, and the provider does the rest: creates VMs, creates the database (OpenSearch or RDS), downloads the SkyWalking tarballs, configures SkyWalking, starts the SkyWalking components (OAP/UI), creates public IPs / domain names, etc.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Zhenxu Ke, mail: kezhenxu94 (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

[SkyWalking] Build the OAP into GraalVM native image

Currently SkyWalking OAP is bundled as a tarball when releasing, and the startup time is long. We are looking for a way to distribute the binary executable more conveniently and to speed up bootstrap. We found that GraalVM is a good fit: not only can it solve the two aforementioned points, it also brings the benefit that we could rewrite our LAL or even MAL system in the future with a more secure and isolated method, WASM, which is supported by GraalVM too!

So this task is to adjust the OAP, build it into a GraalVM native image, and make all tests in OAP pass.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Zhenxu Ke, mail: kezhenxu94 (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

[GSOC] [SkyWalking] Add Overview page in BanyanDB UI

Background

SkyWalking BanyanDB is an observability database that aims to ingest, analyze, and store metrics, tracing, and logging data.


The BanyanDB UI is a web interface provided by the BanyanDB server. It's developed with Vue3 and Vite3.


Objectives

The UI should have a user-friendly Overview page.
The Overview page must display a list of nodes running in a cluster.
For each node in the list, the following information must be shown:

  • Node ID or name
  • Uptime
  • CPU usage (percentage)
  • Memory usage (percentage)
  • Disk usage (percentage)
  • Ports (gRPC and HTTP)

The web app must automatically refresh the node data at a configurable interval to show the most recent information.

Recommended Skills

  1. Familiar with Vue and Vite
  2. Have a basic understanding of RESTful APIs
  3. Have experience with Apache SkyWalking
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Hongtao Gao, mail: hanahmily (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

...

[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan

Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud-native and container-based (Kubernetes) architectures. This task is about enhancing Python agent performance; the tracking issue can be seen here: https://github.com/apache/skywalking/issues/10408
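A natural first step for this kind of work is measuring where the agent spends its time. The sketch below uses Python's built-in cProfile; the `traced_call` function is a hypothetical stand-in for an operation wrapped by the agent's tracing plugins, not actual agent code.

```python
import cProfile
import io
import pstats

def traced_call():
    # Stand-in for an instrumented operation; in the real agent this
    # would be a method wrapped by SkyWalking's tracing plugins.
    return sum(i * i for i in range(1000))

# Profile many invocations to surface per-call overhead.
pr = cProfile.Profile()
pr.enable()
for _ in range(100):
    traced_call()
pr.disable()

# Print the five most expensive entries by cumulative time.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

Comparing such profiles with and without the agent enabled is one way to quantify instrumentation overhead before optimizing.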


Mentor


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Yihao Chen, mail: yihaochen (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org


Beam

[GSoC][Beam] Build out Beam Machine Learning Use Cases

Today, you can do all sorts of Machine Learning using Apache Beam (https://beam.apache.org/documentation/ml/overview/).
 
Many of our users, however, have a hard time getting started with ML and understanding how Beam can be applied to their day-to-day work. The goal of this project is to build out a series of Beam pipelines as Jupyter Notebooks demonstrating real-world ML use cases, from NLP to image recognition to using large language models. Along the way, there may also be bugs or friction points, which will provide opportunities to contribute back to Beam's core ML libraries.


Mentor for this will be Danny McCormick

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

...

[GSoC][Teaclave (incubating)] Data Privacy Policy Definition and Function Verification

Background

The Apache Teaclave (incubating) is a cutting-edge solution for confidential computing, providing Function-as-a-Service (FaaS) capabilities that enable the decoupling of data and function providers. Despite its impressive functionality and security features, Teaclave currently lacks a mechanism for data providers to enforce policies on the data they upload. For example, data providers may wish to restrict access to certain columns of data for third-party function providers. Open Policy Agent (OPA) offers flexible control over service behavior and has been widely adopted by the cloud-native community. If Teaclave were to integrate OPA, data providers could apply policies to their data, enhancing Teaclave’s functionality.

Another potential security loophole in Teaclave is the absence of a means to verify the expected behavior of a function. This gap leaves the system vulnerable to exploitation by malicious actors. Fortunately, most of Teaclave’s interfaces can be reused, with the exception of the function uploading phase, which may require an overhaul to address this issue.

Overall, the integration of OPA and the addition of a function verification mechanism would make Teaclave an even more robust and secure solution for confidential computing.

Benefits

If this proposal goes smoothly, new functionality will be added to the Teaclave project that verifies a function's behavior strictly conforms to a prescribed policy.

Deliverables

  • Milestones: Basic policies (e.g., addition, subtraction) of the data can be verified by Teaclave; Complex policies can be verified.
  • Components: Verifier for the function code; Policy language adapters (adapt policy language to verifier); Policy language parser; Function source code converter (append policies to the functions).
  • Documentation: The internal working mechanism of the verification; How to write policies for the data.

Timeline Estimation

  • 0.5 month: Policy language parser and/or policy language design (if Rego is not an ideal choice).
  • 1.5 − 2 months: Verification contracts rewriting on the function source code based on the policy parsed.
  • ~1 month: The function can be properly verified formally (by, e.g., querying the Z3 SMT solver).

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Mingshen Sun, mail: mssun (at) apache.org
Project Devs, mail: dev (at) teaclave.apache.org

Airflow


[GSoC][Airflow] Automation for PMC

This is a project to implement a tool for PMC task automation.

This is a large project.

Mentor will be aizhamal.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) airflow.apache.org

SeaTunnel

Apache SeaTunnel (Incubating) HTTP Client For SeaTunnel Zeta

Apache SeaTunnel(Incubating)

SeaTunnel is a very easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of records stably and efficiently every day, and has been used in production by nearly 100 companies.

SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as the currently supported SeaTunnel Zeta, Flink, and Spark. SeaTunnel already supports more than 100 connectors, and the number is surging.

Website: https://seatunnel.apache.org/

GitHub: https://github.com/apache/incubator-seatunnel

Background

To use SeaTunnel, the user currently needs to first create and write a config file that specifies the engine that runs the job, as well as engine-related parameters, and then define the Source, Transform, and Sink of the job. We hope to provide a client that allows users to define the engine, Source, Transform, and Sink information of a job directly through code, without having to start from a config file. The user can then submit the job definition through the client, and SeaTunnel will run the job. After the job is submitted, the user can obtain the status of the running job through the client. For jobs that are already running, users can use this client to manage them, for example by stopping or suspending them.
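To make the idea concrete, here is a hypothetical sketch of the payload such a client might submit. The section names (env/source/transform/sink) mirror SeaTunnel's config-file structure; the plugin names, field names, and the commented endpoint are illustrative assumptions, not an actual Zeta API.

```python
import json

# Hypothetical job definition a Zeta HTTP client might build in code
# instead of a config file. Structure mirrors SeaTunnel configs.
job = {
    "env": {"job.mode": "BATCH", "parallelism": 1},
    "source": [{"plugin_name": "FakeSource", "result_table_name": "fake"}],
    "transform": [],
    "sink": [{"plugin_name": "Console", "source_table_name": "fake"}],
}

payload = json.dumps(job)
# The client could then POST `payload` to a Zeta REST endpoint, e.g.:
#   requests.post("http://<zeta-master>:<port>/submit-job", data=payload)
# and later poll a status endpoint with the returned job id.
print(payload)
```

The exact endpoints, authentication, and job-management operations (stop, suspend) would be part of the design work discussed with the mentors.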

Task

1. Discuss with the mentors what you need to do

2. Learn the details of the Apache SeaTunnel project

3. Discuss and complete design and development

Relevant Skills

  1. Familiar with Java and HTTP
  2. Familiarity with SeaTunnel is a plus

Mentor

  • Mentor: Jun Gao, Apache SeaTunnel (Incubating) PPMC Member, gaojun2048@apache.org
  • Mentor: Li Liu, Apache SeaTunnel (Incubating) Committer, ic4y@apache.org
  • Mailing List: dev@seatunnel.apache.org
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Jun Gao, mail: gaojun2048 (at) apache.org
Project Devs, mail: dev (at) seatunnel.apache.org


CloudStack

CloudStack GSoC 2023 - Improve ConfigDrive to store network information

Github issue: https://github.com/apache/cloudstack/issues/2872


ConfigDrive / cloud-init supports a network_data.json file which can contain network information for a VM.

By providing the network information using ConfigDrive to a VM we can eliminate the need for DHCP and thus the Virtual Router in some use-cases.

An example JSON file:

            {
              "links": [
                {
                  "ethernet_mac_address": "52:54:00:0d:bf:93",
                  "id": "eth0",
                  "mtu": 1500,
                  "type": "phy"
                }
              ],
              "networks": [
                {
                  "id": "eth0",
                  "ip_address": "192.168.200.200",
                  "link": "eth0",
                  "netmask": "255.255.255.0",
                  "network_id": "dacd568d-5be6-4786-91fe-750c374b78b4",
                  "routes": [
                    {
                      "gateway": "192.168.200.1",
                      "netmask": "0.0.0.0",
                      "network": "0.0.0.0"
                    }
                  ],
                  "type": "ipv4"
                },
                {
                  "id": "eth0",
                  "ip_address": "2001:db8:100::1337",
                  "link": "eth0",
                  "netmask": "64",
                  "network_id": "dacd568d-5be6-4786-91fe-750c374b78b4",
                  "routes": [
                    {
                      "gateway": "2001:db8:100::1",
                      "netmask": "0",
                      "network": "::"
                    }
                  ],
                  "type": "ipv6"
                }
              ],
              "services": [
                {
                  "address": "8.8.8.8",
                  "type": "dns"
                }
              ]
            }

In Basic Networking zones, and in Advanced Networking zones that use a shared network, you wouldn't require a VR anymore.
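CloudStack would assemble this file per VM from its own network records. As a minimal illustration, the sketch below builds the same structure programmatically; the helper name is illustrative, and the values are taken from the example JSON above.

```python
import json

def build_network_data(mac: str, ip: str, netmask: str, gateway: str) -> dict:
    """Assemble a minimal ConfigDrive network_data.json structure:
    one physical link, one IPv4 network with a default route, one DNS server."""
    link_id = "eth0"
    return {
        "links": [
            {"ethernet_mac_address": mac, "id": link_id, "mtu": 1500, "type": "phy"}
        ],
        "networks": [
            {
                "id": link_id,
                "ip_address": ip,
                "link": link_id,          # must reference a link's "id"
                "netmask": netmask,
                "type": "ipv4",
                "routes": [
                    {"gateway": gateway, "netmask": "0.0.0.0", "network": "0.0.0.0"}
                ],
            }
        ],
        "services": [{"address": "8.8.8.8", "type": "dns"}],
    }

data = build_network_data("52:54:00:0d:bf:93", "192.168.200.200",
                          "255.255.255.0", "192.168.200.1")
print(json.dumps(data, indent=2))
```

The IPv6 entry and the `network_id` field from the example would be added the same way for dual-stack networks.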

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Nicolás Vázquez, mail: nvazquez (at) apache.org
Project Devs, mail: dev (at) cloudstack.apache.org

...

Dubbo GSoC 2023 - HTTP/3 Rest Support

HTTP/3 has been formalized as a standard in the last year. Dubbo, as a framework that supports publishing and invoking Web services, needs to support the HTTP/3 protocol.

This task needs to expand the implementation of the current REST protocol to support publishing HTTP/3 services and calling HTTP/3 services.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC - Pixiu supports gRPC/dubbo protocol with WASM plug-in

Pixiu acts as a gateway, forwarding traffic to various services.
Pixiu needs to support communication between different applications in the browser, and WASM needs to be supported in the browser; currently, only the HTTP protocol is supported.
This project needs to implement the communication protocols underlying WASM (gRPC is preferred):
1. Support the gRPC protocol
2. Support the dubbo protocol

For calling gRPC from the front end, see https://github.com/grpc/grpc-web

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:

Dubbo GSoC 2023 - Automatically configure pixiu as istio ingress gateway

In an istio mesh environment, a public dubbo/dubbo-go provider can be exposed outside the cluster through the http/https protocol via the istio ingress gateway. This requires the ingress gateway to convert http to the dubbo protocol, which is the main scenario for pixiu. This project needs to:
1. Customize pixiu so it can be used as an istio ingress gateway, proxying http/https requests and converting them into dubbo requests;
2. Make the gateway support basic user authentication methods.

Basic reference: https://istio.io/latest/blog/2019/custom-ingress-gateway/
https://cloud.ibm.com/docs/containers?topic=containers-istio-custom-gateway

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Albumen Kevin, mail: albumenj (at) apache.org
Project Devs, mail:


...