Synapse

Open Telemetry based Tracing for Apache Synapse

Currently, Apache Synapse lacks sophisticated support for modern, standardized tracing. This new feature is therefore intended to implement OpenTelemetry-based tracing for Apache Synapse.


This feature will include request-response tracing and inbound/outbound tracing at the transport level and the orchestration layer. It also requires a thorough investigation of the OpenTelemetry specification [1] and the Apache Synapse transport component [2].
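
As a rough illustration of inbound/outbound trace propagation, the sketch below builds and continues a W3C Trace Context `traceparent` header, which is the default propagation format used by OpenTelemetry. It is written in Python with the standard library only and is not Synapse code; the function names `new_traceparent` and `child_traceparent` are hypothetical.

```python
import re
import secrets

# traceparent layout: version "00", 32 hex trace id, 16 hex span id, 2 hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace with a random trace id and span id."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(inbound_header):
    """Continue the inbound trace with a fresh span id for the outbound call.

    If the inbound header is missing or malformed, a new trace is started.
    """
    match = TRACEPARENT_RE.match(inbound_header.strip().lower())
    if match is None:
        return new_traceparent()
    trace_id, parent_span_id, flags = match.groups()
    # All-zero ids are invalid in the Trace Context format.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return new_traceparent()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A transport that receives a request would call `child_traceparent` on the inbound header and attach the result to the outbound request, so both hops share one trace id.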


Relevant Skills

  1. Java language
  2. Understanding of observability
  3. Integration and the Synapse configuration language

[1] https://opentelemetry.io/
[2] http://synapse.apache.org/userguide/transports/pass_through.html

Difficulty: Major
Potential mentors:
Vanjikumaran Sivajothy, mail: vanjikumaran@gmail.com (at) apache.org
Project Devs, mail:

ShardingSphere

Apache ShardingSphere: Proofread the SQL definitions for ShardingSphere Parser

Apache ShardingSphere

Apache ShardingSphere is a distributed database middleware ecosystem that currently comprises two independent products, ShardingSphere JDBC and ShardingSphere Proxy. Both provide data sharding, distributed transactions, and database orchestration.
Page: https://shardingsphere.apache.org
GitHub: https://github.com/apache/shardingsphere

Background

The ShardingSphere parser engine helps users parse SQL into an AST (Abstract Syntax Tree) and visit this tree to obtain a SQLStatement (Java object). At present, the parser engine can handle SQL for `MySQL`, `PostgreSQL`, `SQLServer` and `Oracle`, which means we have to understand the SQL dialects of different databases.
More details: https://shardingsphere.apache.org/document/current/en/features/sharding/principle/parse/
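
The parse-then-visit flow described above can be sketched by analogy. The example below is not ShardingSphere or ANTLR code: it uses Python's built-in `ast` module on a Python expression, only to show how a visitor walks a parsed tree and fills a plain result object. The names `NameCollector` and `summarize` are hypothetical.

```python
import ast


class NameCollector(ast.NodeVisitor):
    """Walks the tree and records identifiers and called function names."""

    def __init__(self):
        self.names = []
        self.calls = []

    def visit_Name(self, node):
        self.names.append(node.id)
        self.generic_visit(node)

    def visit_Call(self, node):
        # Record only direct calls such as f(x), not attribute calls.
        if isinstance(node.func, ast.Name):
            self.calls.append(node.func.id)
        self.generic_visit(node)


def summarize(source):
    """Parse an expression into an AST, then visit it to build a summary."""
    tree = ast.parse(source, mode="eval")
    visitor = NameCollector()
    visitor.visit(tree)
    return {"names": visitor.names, "calls": visitor.calls}
```

In the real parser engine the tree comes from the ANTLR grammar for each dialect, which is why the g4 definitions have to match the database's documented syntax.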

Task

This issue is to proofread the DML (SELECT/UPDATE/DELETE/INSERT) SQL definitions for Oracle. We have basic Oracle SQL syntax definitions, but they are not kept in line with the Oracle documentation, so we need you to find the vague SQL grammar definitions and correct them by referring to the Oracle documentation.

Note that when you review these DML (SELECT/UPDATE/DELETE/INSERT) definitions, you will find that they involve some basic elements of Oracle SQL. These elements are included in this task as well.

Relevant Skills

1. Mastery of the Java language
2. A basic understanding of ANTLR g4 files
3. Familiarity with Oracle SQL

Target files

1. DML SQLs g4 file: https://github.com/apache/shardingsphere/blob/master/shardingsphere-sql-parser/shardingsphere-sql-parser-dialect/shardingsphere-sql-parser-oracle/src/main/antlr4/imports/oracle/DMLStatement.g4
2. Basic elements g4 file: https://github.com/apache/shardingsphere/blob/master/shardingsphere-sql-parser/shardingsphere-sql-parser-dialect/shardingsphere-sql-parser-oracle/src/main/antlr4/imports/oracle/BaseRule.g4

References

1. Oracle SQL quick reference: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/SQL-Statements.html#GUID-1FA35EAD-AED2-4619-BFEE-348FF05D1F4A
2. Detailed Oracle SQL info: https://docs.oracle.com/pls/topic/lookup?ctx=en/database/oracle/oracle-database/19/sqlqr&id=SQLRF008

Mentor

Juan Pan, PMC of Apache ShardingSphere, panjuan@apache.org

Difficulty: Major
Potential mentors:
Juan Pan, mail: panjuan (at) apache.org
Project Devs, mail: dev (at) shardingsphere.apache.org

SkyWalking

Apache SkyWalking: Python agent collects and reports PVM metrics to backend

Apache SkyWalking [1] is an application performance monitor (APM) tool for distributed systems, especially designed for microservices, cloud native and container-based (Docker, K8s, Mesos) architectures.

Tracing distributed systems is one of the main features of SkyWalking. From those traces, it can derive service metrics such as CPM, success rate, error rate, and Apdex. SkyWalking also supports receiving metrics directly from the agent side.

In this task, we expect the Python agent to report its Python Virtual Machine (PVM) metrics, including but not limited to CPU usage (%), memory used (MB), (active) thread/coroutine counts, and garbage collection count. Any other useful metrics are also acceptable.
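
A minimal sketch of how such metrics might be gathered with the Python standard library alone is shown below. The metric names and the dict layout are assumptions for illustration and do not reflect SkyWalking's reporting protocol; sending the snapshot to the backend over gRPC is not shown. The memory figure is only meaningful if `tracemalloc.start()` has been called beforehand.

```python
import gc
import threading
import time
import tracemalloc


def collect_pvm_metrics():
    """Return one snapshot of interpreter-level metrics as a plain dict."""
    traced_current, _traced_peak = tracemalloc.get_traced_memory()
    return {
        "wall_clock_seconds": time.monotonic(),
        # CPU time (user + system) consumed by this process so far.
        "cpu_time_seconds": time.process_time(),
        "active_thread_count": threading.active_count(),
        # Number of collections run so far, one entry per GC generation.
        "gc_collections_per_generation": [
            stats["collections"] for stats in gc.get_stats()
        ],
        # Memory allocated by Python code since tracemalloc.start(), in MB.
        "traced_memory_mb": traced_current / (1024 * 1024),
    }


def cpu_usage_percent(previous, current):
    """CPU usage between two snapshots, as a percentage of one core."""
    wall = current["wall_clock_seconds"] - previous["wall_clock_seconds"]
    if wall <= 0:
        return 0.0
    cpu = current["cpu_time_seconds"] - previous["cpu_time_seconds"]
    return 100.0 * cpu / wall
```

An agent could take a snapshot on a fixed interval, compute `cpu_usage_percent` against the previous snapshot, and hand the result to its reporter.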

To complete this task, you must be comfortable with Python and gRPC; otherwise you will have a hard time coming up to speed.

Live demo to play around with: http://122.112.182.72:8080 (under reconstruction and possibly unavailable; the latest demo address can be found on the GitHub index page, http://github.com/apache/skywalking)

[1] http://skywalking.apache.org

Difficulty: Major
Potential mentors:
Zhenxu Ke, mail: kezhenxu94 (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

Apache SkyWalking: Python agent supports profiling

Apache SkyWalking [1] is an application performance monitor (APM) tool for distributed systems, especially designed for microservices, cloud native and container-based (Docker, K8s, Mesos) architectures.

SkyWalking relies on agents to instrument monitored services automatically. We currently have agents for many languages; the Python agent [2] is one of them and supports automatic instrumentation.

The goal of this project is to extend the agent's features by supporting profiling [3] of a function's invocation stack, helping users analyze which method takes the most time in a cross-service call.
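
One common way to profile an invocation stack is periodic stack sampling. The sketch below illustrates that idea using CPython's `sys._current_frames()`. It is an assumption about one possible approach, not the SkyWalking agent's implementation, and `sample_stacks`, `busy_work`, and `profile_busy_work` are hypothetical names.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(thread_id, samples=20, interval=0.005):
    """Snapshot one thread's call stack repeatedly and count function names."""
    hits = collections.Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            for entry in traceback.extract_stack(frame):
                hits[entry.name] += 1
        time.sleep(interval)
    return hits


def busy_work(stop_event):
    """Hypothetical slow function standing in for a traced method."""
    while not stop_event.is_set():
        sum(i * i for i in range(1000))


def profile_busy_work():
    """Run busy_work in a thread and sample its stack while it runs."""
    stop_event = threading.Event()
    worker = threading.Thread(target=busy_work, args=(stop_event,))
    worker.start()
    try:
        return sample_stacks(worker.ident)
    finally:
        stop_event.set()
        worker.join()
```

Functions that appear in many samples are the ones where the thread spends most of its time, which is the information a profiling view needs.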

To complete this task, you must be comfortable with Python and have some knowledge of tracing systems; otherwise you will have a hard time coming up to speed.

[1] http://skywalking.apache.org
[2] http://github.com/apache/skywalking-python
[3] https://thenewstack.io/apache-skywalking-use-profiling-to-fix-the-blind-spot-of-distributed-tracing/

Difficulty: Major
Potential mentors:
Zhenxu Ke, mail: kezhenxu94 (at) apache.org
Project Devs, mail: dev (at) skywalking.apache.org

TrafficControl

GSOC: Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

  • Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
  • Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
  • Testing: Add automated tests for the new code.
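
To make the configuration generation aspect concrete, the sketch below renders a small VCL file with one `backend` block per origin. It is written in Python purely for illustration (the project itself targets Go), and the function names and the input layout are hypothetical rather than part of Traffic Ops.

```python
def render_backend(name, host, port):
    """Render one VCL backend definition."""
    return (
        f"backend {name} {{\n"
        f'    .host = "{host}";\n'
        f'    .port = "{port}";\n'
        f"}}\n"
    )


def render_vcl(backends):
    """Render a VCL 4.1 file from a mapping of name -> (host, port).

    Backends are emitted in sorted order so the output is deterministic,
    which keeps generated files easy to diff and to test.
    """
    parts = ["vcl 4.1;\n"]
    for name, (host, port) in sorted(backends.items()):
        parts.append(render_backend(name, host, port))
    return "\n".join(parts)
```

For example, `render_vcl({"origin_a": ("192.0.2.10", 8080)})` produces a version line followed by a single backend block for `origin_a`.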

Skills:

  • Proficiency in Go is required
  • A basic knowledge of HTTP and caching is preferred, but not required for this project.

Difficulty: Major
Potential mentors:
Eric Friedrich, mail: friede (at) apache.org
Project Devs, mail: dev (at) trafficcontrol.apache.org

Apache Hudi

[UMBRELLA] Checkstyle, formatting, warnings, spotless

Umbrella ticket to track all tickets related to checkstyle, spotless, warnings etc.

Difficulty: Major
Potential mentors:
sivabalan narayanan, mail: shivnarayan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

...

[UMBRELLA] Improve CLI features and usabilities

(More details to be added)

Difficulty: Major
Potential mentors:
Raymond Xu, mail: xushiyan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

[UMBRELLA] Support Apache Calcite for writing/querying Hudi datasets

(More details to be added)

Difficulty: Major
Potential mentors:
Raymond Xu, mail: xushiyan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

[UMBRELLA] Support schema inference for unstructured data

(More details to be added)

Difficulty: Major
Potential mentors:
Raymond Xu, mail: xushiyan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

[UMBRELLA] Improve source ingestion support in DeltaStreamer

(More details to be added)

Difficulty: Major
Potential mentors:
Raymond Xu, mail: rxu (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

[UMBRELLA] Survey indexing technique for better query performance

(More details to be added)

Difficulty: Major
Potential mentors:
Raymond Xu, mail: xushiyan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

Apache Airflow integration w/ Apache Hudi

Difficulty: Major
Potential mentors:
sivabalan narayanan, mail: shivnarayan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

Pandas(python) integration w/ Apache Hudi

Difficulty: Major
Potential mentors:
sivabalan narayanan, mail: shivnarayan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

Pyspark w/ Apache Hudi

Difficulty: Major
Potential mentors:
sivabalan narayanan, mail: shivnarayan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

Snowflake integration w/ Apache Hudi

Difficulty: Major
Potential mentors:
sivabalan narayanan, mail: shivnarayan (at) apache.org
Project Devs, mail: dev (at) hudi.apache.org

...

Apache APISIX: Support Nacos in a native way

Apache APISIX is a dynamic, real-time, high-performance cloud-native API gateway, based on the Nginx library and etcd.

Page: https://apisix.apache.org
GitHub: https://github.com/apache/apisix

Background

To get upstream information dynamically, APISIX needs to integrate with other service discovery systems. We already support Eureka, and many people hope we can support Nacos too.

Nacos is a widely adopted service discovery system: https://nacos.io/en-us/index.html

Previously, we tried to support Nacos via DNS. Nacos provides a CoreDNS plugin to expose the information via DNS: https://github.com/nacos-group/nacos-coredns-plugin

However, this plugin seems to be unmaintained.

Therefore, it would be good if we can support Nacos natively via its API, which is expected to be maintained.


Task

Integrate Nacos with APISIX via Nacos's HTTP API.
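
As a rough sketch of what this integration involves, the Python example below builds the URL for the Nacos instance list endpoint and converts a response body into a map of "ip:port" to weight, the shape APISIX accepts for upstream nodes. The response fields used here (`hosts`, `ip`, `port`, `weight`, `healthy`, `enabled`) follow the Nacos Open API reference but should be treated as assumptions to verify; the function names are hypothetical, and the actual implementation would be written in Lua under apisix/discovery.

```python
import json
from urllib.parse import urlencode


def instance_list_url(base_url, service_name):
    """Build the URL that queries the instances of one service."""
    query = urlencode({"serviceName": service_name})
    return f"{base_url.rstrip('/')}/nacos/v1/ns/instance/list?{query}"


def hosts_to_nodes(response_body):
    """Convert an instance list response into {"ip:port": weight}.

    Instances reported as unhealthy or disabled are skipped; a missing
    flag is treated as healthy/enabled.
    """
    payload = json.loads(response_body)
    nodes = {}
    for host in payload.get("hosts", []):
        if not host.get("healthy", True) or not host.get("enabled", True):
            continue
        nodes[f"{host['ip']}:{host['port']}"] = host.get("weight", 1)
    return nodes
```

A discovery module would poll this endpoint periodically and replace the cached node list whenever the converted result changes.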


Relevant Skills

1. Mastery of the Lua language and the HTTP protocol
2. A basic understanding of APISIX / Nacos


Target files

1. https://github.com/apache/apisix/tree/master/apisix/discovery

References

1. Nacos Open API: https://nacos.io/en-us/docs/open-api.html

Mentor

Zexuan Luo, committer of Apache APISIX, spacewander@apache.org

Difficulty: Major
Potential mentors:
Zexuan Luo, mail: spacewander (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

Apache APISIX Dashboard: Enhancement plugin orchestration

The Apache APISIX Dashboard is designed to make it as easy as possible for users to operate Apache APISIX through a frontend interface.

The Dashboard is the control plane and performs all parameter checks; Apache APISIX mixes data and control planes and will evolve to a pure data plane.

This project includes manager-api, which will gradually replace admin-api in Apache APISIX.

Background

The plugin orchestration feature allows users to define the order of plugins to meet their scenarios. At present, we have implemented the plugin scheduling feature, but there are still many points to be optimized.

Task

1. Develop a new plugin with a conditional judgment card style.

2. Add arrows for connecting lines.

3. Limit plugin orchestration operations. For example, only one connection line is allowed between points.

Relevant Skills

1. Basic use of HTML, CSS, and JavaScript.

2. Basic use of the React framework.

Mentor

Yi Sun, committer of Apache APISIX, sunyi@apache.org
 

Difficulty: Major
Potential mentors:
Yi Sun, mail: sunyi (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

...

Apache APISIX: supports obtaining etcd data information through plugin

Apache APISIX

Apache APISIX is a dynamic, real-time, high-performance API gateway, based on the Nginx library and etcd.

APISIX provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more.

You can use Apache APISIX to handle traditional north-south traffic, as well as east-west traffic between services. It can also be used as a k8s ingress controller.

Background
 
To read data stored in etcd today, we have to issue a request manually for each piece of data, and we cannot monitor changes to the data in etcd. This makes it inconvenient to fetch multiple stored items or to watch etcd for changes. Therefore, we need to design a way to solve this problem.

Related issue: https://github.com/apache/apisix/issues/2453

Task

In the Apache APISIX (https://github.com/apache/apisix) project, implement a plug-in with the following functions:

1. Find a route based on URI;
2. Watch etcd and print out the objects that have recently changed;
3. Query the corresponding data based on ID (route, service, consumer, etc.).

Relevant Skills

1. Mastery of the Lua language;
2. A basic understanding of API gateways or web servers;
3. Familiarity with etcd.

Mentor

Yuelin Zheng, yuelinz99@gmail.com

Difficulty: Major
Potential mentors:
Yuelin Zheng, mail: firstsawyou (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

...

Apache APISIX: support to fetch more useful information of client request

What's Apache APISIX?

Apache APISIX is a dynamic, real-time, high-performance API gateway, based on the Nginx library and etcd.

APISIX provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more.

You can use Apache APISIX to handle traditional north-south traffic, as well as east-west traffic between services. It can also be used as a k8s ingress controller.

Background (route matching and run plugins)

When the client completes a request, there is a lot of useful information inside Apache APISIX. 


Task

We need a way to expose this information so that callers can troubleshoot problems and understand the workflow of Apache APISIX.

The first version should display:
1. Which route is matched.
2. Which plugins are loaded.

In subsequent versions, we will add more information that the caller cares about, such as:

  • Whether the global plugin is executed
  • Time consumption statistics
  • The return value when the plugin is executed

Relevant Skills

1. Mastery of the Lua language
2. A basic understanding of API gateways or web servers

Difficulty: Major
Potential mentors:
YuanSheng Wang, mail: membphis (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org

Apache APISIX: improve the website

Apache APISIX

Apache APISIX is a dynamic, real-time, high-performance API gateway, based on the Nginx library and etcd. We have a standalone website to let more people know about Apache APISIX.

Background

The Apache APISIX website shows people what Apache APISIX is, and it will include up-to-date docs so that developers can find guides more easily.

Task

For the website [1] and its repo [2], we are going to refactor the homepage and improve the docs, including the APISIX docs and others such as the release guide.

Relevant Skills
TypeScript

React.js

Mentor

Zhiyuan, PMC of Apache APISIX, juzhiyuan@apache.org


[1] https://apisix.apache.org/

[2] https://github.com/apache/apisix-website

Difficulty: Major
Potential mentors:
Zhiyuan, mail: juzhiyuan (at) apache.org
Project Devs, mail: dev (at) apisix.apache.org