...
Contents
- James Server
- Commons Statistics
- Commons Numbers
- Commons Math
- Commons Imaging
- RocketMQ
- EventMesh
- StreamPipes
- ShardingSphere
- SkyWalking
- ShenYu
- TrafficControl
- Doris
- Beam
- Airflow
- Teaclave
- CloudStack
- Apache Nemo
- Apache Dubbo
- Dubbo GSoC 2023 - Integration suite on Kubernetes
- Apache Commons All
- Airavata
James Server
Adopt Pulsar as the messaging technology backing the distributed James server
https://www.mail-archive.com/server-dev@james.apache.org/msg71462.html
A good long term objective for the PMC is to drop RabbitMQ in
favor of Pulsar (third parties could package their own components using
RabbitMQ if they wish...)
This means:
- Solve the bugs that were found during the Pulsar MailQueue review
- Pulsar MailQueue needs to allow listing blobs in order to be deduplication friendly.
- Provide an event bus based on Pulsar
- Provide a task manager based on Pulsar
- Package a distributed server backed by Pulsar, deprecate then replace the current one.
- (optionally) support mail queue priorities
While contributions would of course be welcomed on this topic, we could
offer it as part of GSOC 2022, and we could co-mentor it with mentors of
the Pulsar community (see [3])
[3] https://lists.apache.org/thread/y9s7f6hmh51ky30l20yx0dlz458gw259
Would such a plan gain traction around here?
...
Nemo on Google Dataproc
Issues for making it easy to install and use Nemo on Google Dataproc.
Apache Dubbo
Dubbo GSoC 2023 - Integration suite on Kubernetes
Dubbo is a development framework that users depend on directly, so any problem introduced during iteration can have a large impact on them. Dubbo therefore needs a complete set of automated regression testing tools.
Dubbo already has a set of testing tools based on docker-compose, but these tools cannot test compatibility in a Kubernetes environment. We also need a more reliable system for constructing test cases, to ensure that the test cases are sufficiently complete.
Dubbo GSoC 2023 - Dubbo usage scanner
As a development framework that is closely related to users, Dubbo provides many functional features (such as configuring timeouts, retries, etc.). We hope to give users a tool that scans which features they use, which of those are deprecated, which will be deprecated in the future, and so on. Based on this tool, we can provide users with a better migration solution.
Suggestion: consider an implementation based on static code scanning or a javaagent.
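The static-scanning option could be sketched roughly as follows, in Python for brevity. This is only an illustration under assumptions: it supposes that features are configured as attributes of the `@DubboReference` annotation, the `DEPRECATED` set is a made-up placeholder rather than Dubbo's actual deprecation list, and a real tool would more likely parse the Java syntax tree than rely on regular expressions.

```python
import re
from collections import Counter

# Matches @DubboReference(...) and captures the attribute list.
# Regex scanning is a simplification: it ignores nested parentheses.
ANNOTATION = re.compile(r"@DubboReference\s*\(([^)]*)\)")
# Captures attribute names such as "timeout" in "timeout = 3000".
ATTRIBUTE = re.compile(r"(\w+)\s*=")

# Hypothetical deprecation list, for illustration only.
DEPRECATED = {"legacyOption"}


def scan_source(java_source: str) -> Counter:
    """Count how often each annotation attribute is used in one source file."""
    usage = Counter()
    for match in ANNOTATION.finditer(java_source):
        usage.update(ATTRIBUTE.findall(match.group(1)))
    return usage


def report(usage: Counter) -> dict:
    """Split the observed features into supported and deprecated ones."""
    return {
        "used": sorted(name for name in usage if name not in DEPRECATED),
        "deprecated": sorted(name for name in usage if name in DEPRECATED),
    }


# Made-up Java source used as scanner input.
SAMPLE = """
public class OrderClient {
    @DubboReference(timeout = 3000, retries = 2)
    private OrderService orders;

    @DubboReference(timeout = 500, legacyOption = true)
    private StockService stock;
}
"""

if __name__ == "__main__":
    print(report(scan_source(SAMPLE)))
```

A javaagent-based scanner would observe the same information at runtime instead of from source text, which would also catch features configured outside annotations.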
Dubbo GSoC 2023 - Remove jprotoc in compiler
Dubbo supports gRPC-based communication through the Triple protocol. For this purpose, Dubbo has developed a compiler plug-in for proto files based on jprotoc. Because jprotoc is not very actively maintained, the Dubbo compiler currently does not run well on the latest protobuf version. Therefore, we need to consider implementing a new compiler with reference to gRPC.
Dubbo GSoC 2023 - Dubbo i18n log
Dubbo is a development framework that is closely related to users, and many usage patterns can cause exceptions that are handled by Dubbo. In such cases, users can usually only diagnose the problem through logs. We hope to provide an i18n (localized) log output tool to give users a friendlier log troubleshooting experience.
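One possible shape for localized log output is a catalog that maps an error code to one message template per locale, with a fallback to English. The sketch below is an assumption-laden illustration in Python: the error codes, message texts and the `localize` function are all hypothetical and not part of Dubbo.

```python
# Hypothetical message catalog: error code -> locale -> template.
# Every entry is assumed to carry at least an English ("en") template.
CATALOG = {
    "RPC_TIMEOUT": {
        "en": "Call to {service} timed out after {millis} ms.",
        "zh": "调用 {service} 超时（{millis} 毫秒）。",
    },
    "NO_PROVIDER": {
        "en": "No provider available for {service}.",
    },
}

DEFAULT_LOCALE = "en"


def localize(code: str, locale: str, **params) -> str:
    """Render the message for `code` in `locale`, falling back to English."""
    templates = CATALOG.get(code)
    if templates is None:
        # Unknown codes are passed through so that no log line is lost.
        return code
    template = templates.get(locale, templates[DEFAULT_LOCALE])
    # The stable code prefix lets users search documentation in any language.
    return f"[{code}] " + template.format(**params)


if __name__ == "__main__":
    print(localize("RPC_TIMEOUT", "zh", service="OrderService", millis=3000))
    print(localize("NO_PROVIDER", "zh", service="OrderService"))
```

Keeping a language-independent code in every line means a localized log can still be matched against English documentation or issue reports.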
Dubbo GSoC 2023 - Refactor dubbo project to gradle
As more and more projects are developed with Gradle and benefit from it, Dubbo also hopes to migrate to Gradle. This task requires you to transform the dubbo project[1] into a Gradle project.
Dubbo GSoC 2023 - Metrics on Dubbo Admin
Dubbo Admin is a console of Dubbo. Today, Dubbo's observability is becoming more and more powerful. We need to observe some of Dubbo's metrics directly in Dubbo Admin, and even offer users suggestions for resolving problems.
Dubbo GSoC 2023 - Refactor Connection
Background
At present, the client-side abstraction of connections is inconsistent across protocols in Dubbo. For example, there is a big discrepancy between how the client abstracts connections in the dubbo protocol and in the triple protocol. As a result, enhancing connection-related functions in the client is complicated, and implementations cannot be reused. The client also has to implement a lot of repetitive code when a protocol is extended.
Target
Reduce the complexity of the client part when extending a protocol, and increase the reuse of connection-related modules.
Dubbo GSoC 2023 - IDL management
Background
Dubbo currently supports protobuf as a serialization method. Protobuf relies on proto (IDL) files for code generation, but there is currently no tooling for managing IDL files. For example, Java users have to use the proto files in every compilation, which is cumbersome; everyone is used to depending on jar packages instead.
Target
Implement an IDL management and control platform that automatically generates dependency packages in various languages from IDL files and pushes them to the relevant dependency repositories.
Dubbo GSoC 2023 - Service Deployer
Many monolithic applications run into problems such as performance during large-scale deployment. For interface-oriented programming languages, Dubbo provides RPC remote calls, so we can help applications decouple through interfaces. We can therefore provide a deployer that helps users decouple and split an application into microservices during deployment, and quickly provides performance optimization capabilities.
Dubbo GSoC 2023 - API manager
Since Dubbo runs on a distributed architecture, managing API interface definitions is naturally difficult. It is often hard to know which interfaces are running in the production environment. We can therefore provide a platform for reporting API definitions, or even for managing them. This platform can automatically collect all APIs of the cluster, or the APIs can be defined directly by the user, and unified distribution management is then carried out through a mechanism similar to git and maven package management.
Dubbo GSoC 2023 - JSON compatibility check
Dubbo currently supports a large number of Java language features through hessian under the Java SDK, such as generics, interfaces, etc. These features are not compatible when calling across systems. Therefore, Dubbo needs to provide the ability to inspect an interface definition and determine whether the interface published by the user can be described by native JSON.
Dubbo GSoC 2023 - Automated Performance Testing Mechanism
Dubbo currently provides a very simple performance testing tool. But for such a complex framework as Dubbo, its functional coverage is very low. We urgently need a testing tool that can test multiple complex scenarios. In addition, we hope that this set of testing tools can be run automatically, so that we can track the current performance of Dubbo in a timely manner.
Dubbo GSoC 2023 - Dubbo Client on WASM
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. For web client users, we can provide a Wasm client for Dubbo, so that front-end developers can easily initiate Dubbo requests in the browser, achieving full-link unification for Dubbo.
This task requires an implementation that runs in a browser such as Chrome and initiates a request to the Dubbo backend.
Dubbo GSoC 2023 - Pure Dubbo RPC API
At present, Dubbo provides RPC capabilities together with a large number of service governance capabilities. As a result, Dubbo is a poor fit when only RPC is needed, for example by some of Dubbo's own components or by users who need an extremely lightweight framework.
Goal: provide a Dubbo RPC kernel, so that users can program service calls directly and focus on RPC.
Dubbo GSoC 2023 - Go Traffic Management
Dubbo GSoC 2023 - Go Security
Dubbo GSoC 2023 - Improve usability of Dubbo-go project
Including but not limited to programming patterns, configuration, APIs, documentation and demos.
Dubbo GSoC 2023 - Dubbo SPI Extensions on WASM
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Many capabilities of Dubbo support extensions, such as custom interceptors, routing, load balancing, etc. In order to allow the user's implementation to be used on Dubbo's multiple language SDKs, we can implement cross-platform operation based on wasm capabilities.
Implementing this topic requires a set of Wasm mechanisms for Dubbo that covers the Java and Go implementations. It should support at least Filter, Router and LoadBalance.
...
Dubbo GSoC 2023 - Rust Cluster Feature Implementation and Stability Improvement.
Dubbo GSoC 2023 - Refactor the http layer
Background
Dubbo currently supports the rest protocol based on HTTP/1 and the triple protocol based on HTTP/2. However, these two HTTP-based protocols are implemented independently, their underlying implementations cannot be swapped, and each is relatively costly to implement.
Target
To reduce maintenance costs, we hope to abstract the HTTP layer. The underlying HTTP implementation should be independent of the protocol built on it, so that different protocols can reuse the same implementation.
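To make the target concrete, here is a minimal sketch in Python of what "protocol logic that does not depend on the HTTP implementation" could look like. Every name in it (`HttpTransport`, `LoopbackTransport`, `RestProtocol`) is hypothetical and chosen for illustration; it is not Dubbo's design, and the loopback transport stands in for real HTTP/1 or HTTP/2 networking.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class HttpRequest:
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class HttpResponse:
    status: int
    body: bytes = b""


class HttpTransport(Protocol):
    """Version-specific transport (HTTP/1, HTTP/2, ...) behind one interface."""

    def send(self, request: HttpRequest) -> HttpResponse: ...


class LoopbackTransport:
    """Stand-in transport that echoes the request body instead of using sockets."""

    def __init__(self, version: str):
        self.version = version

    def send(self, request: HttpRequest) -> HttpResponse:
        return HttpResponse(status=200, body=request.body)


class RestProtocol:
    """Protocol logic that depends only on HttpTransport, not on an HTTP version."""

    def __init__(self, transport: HttpTransport):
        self.transport = transport

    def invoke(self, path: str, payload: bytes) -> bytes:
        headers = {"content-type": "application/json"}
        request = HttpRequest("POST", path, headers, payload)
        return self.transport.send(request).body


if __name__ == "__main__":
    # The same protocol class runs unchanged over either transport.
    for version in ("HTTP/1", "HTTP/2"):
        protocol = RestProtocol(LoopbackTransport(version))
        print(version, protocol.invoke("/orders", b'{"id": 1}'))
```

With this split, supporting a new HTTP version would mean adding one transport rather than touching each protocol.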
Dubbo GSoC 2023 - HTTP/3 Rest Support
HTTP/3 has been formalized as a standard in the last year. Dubbo, as a framework that supports publishing and invoking Web services, needs to support the HTTP/3 protocol.
This task needs to expand the implementation of the current rest protocol to support publishing HTTP/3 services and calling HTTP/3 services.
Dubbo GSoC 2023 - Dubbo3 Python HTTP/2 RPC Protocol Implementation
...
Apache Commons All
[SKIN] Update Commons Skin Bootstrap
Our Commons components use Commons Skin, a skin, or theme, for Apache Maven Site.
Our skin uses Bootstrap 2.x, but Bootstrap is already at its 5.x release, so we are missing several improvements (UI/UX, accessibility, browser compatibility) as well as JS/CSS bug fixes made over the years.
There is related work happening in Apache Maven Skins. Maybe we could adapt or use that one?
https://issues.apache.org/jira/browse/MSKINS-97
Airavata
[GSoC] Integrate JupyterHub with Airavata Django Portal
The Airavata Django Portal [1] allows users to create, execute and monitor computational experiments. However, when a user wants to then post-process or visualize the output of that computational experiment they must then download the output files and run tools that they may have on their computer or other systems. By integrating with JupyterHub the Django Portal can give users an environment in which they can explore the experiment's output data and gain insights.
The main requirements are:
- from the Django Portal a user can click a button and navigate to a JupyterHub instance that the user is immediately logged into using single sign on
- the user can save the Jupyter notebook and later retrieve it
- the user's files are available within the context of the running Jupyter instance
- ideally users can also generate new outputs in the Jupyter instance and have them saved back in their portal data storage
- users can share their notebooks with other portal users
- (bonus) portal admins can suggest notebooks to use with specific applications so that with one click a user can open an experiment in a provided notebook
- users can manage their notebooks and can, for example, clone a notebook
...
Airavata Jupyter Platform Services
- UI Framework
- To host the Jupyter environment we will need to envelop the notebooks in a user interface and connect it with Apache Airavata services
- Leverage Airavata communications from within the Django Portal - https://github.com/apache/airavata-django-portal
- Explore if the platform is better to be developed as VSCode extensions leveraging jupyter extensions like - https://github.com/Microsoft/vscode-jupyter
- Alternatively, explore developing a standalone native application using ElectronJS
- Draft up a platform architecture - Airavata based infrastructure with functionality similar to collab.
- Authenticate with Airavata Custos Framework - https://github.com/apache/airavata-custos
- Extend the notebook filesystem using the virtual file system approach, integrating with Airavata-based storage and catalog
- Make the notebooks registered with Airavata app catalog and experiment catalog.
Advanced Possibilities:
Explore Multi-tenanted JupyterHub
- Can Kubernetes namespace isolation accomplish this?
- Make deployment of Jupyter support as part of the default core
- Data and user-level tenancy can be assumed; how can we make sure the infrastructure isolates tenants, so that, for example, one gateway cannot crash a hosting environment?
- How to leverage computational resources from JupyterHub
Dashboards to get quick statistics
Gateway admins need periodic reports for various reporting and planning purposes.
Features include:
- Compute resources that had at least one job submitted during the period <start date - end date>
- User groups created within a given period, with the number of users in each, their permission levels, and the number of jobs each user has submitted
- Applications and the number of jobs for each application in a given period, grouped by job status
- Number of users that submitted at least one job during the period <start date - end date>
- Total number of unique users
- User registration trends
- Number of experiments for a given period <start date - end date>, grouped by experiment status
- Total cpu-hours used per user, sorted, quarterly, plotted over a period of time
- Total cpu-hours consumed per application, sorted, quarterly, plotted over a period of time
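A few of these statistics can be sketched as simple aggregations over job records. The Python example below is only an illustration: the `Job` record and its fields are hypothetical and do not reflect Airavata's actual data model, and a real dashboard would query the Airavata catalogs rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Job:
    # Hypothetical record shape; real Airavata job metadata will differ.
    user: str
    application: str
    resource: str
    status: str
    submitted: date


def in_period(job: Job, start: date, end: date) -> bool:
    """True when the job was submitted within [start, end], inclusive."""
    return start <= job.submitted <= end


def period_report(jobs: list[Job], start: date, end: date) -> dict:
    """Compute a few of the listed statistics for one reporting period."""
    selected = [job for job in jobs if in_period(job, start, end)]
    return {
        # Compute resources with at least one job submitted in the period.
        "active_resources": sorted({job.resource for job in selected}),
        # Number of users that submitted at least one job in the period.
        "active_users": len({job.user for job in selected}),
        # Job counts per application, grouped by job status.
        "jobs_by_app_and_status": Counter(
            (job.application, job.status) for job in selected
        ),
    }


if __name__ == "__main__":
    jobs = [
        Job("alice", "gaussian", "cluster-a", "COMPLETED", date(2023, 1, 10)),
        Job("alice", "gaussian", "cluster-a", "FAILED", date(2023, 2, 1)),
        Job("bob", "namd", "cluster-b", "COMPLETED", date(2023, 5, 1)),
    ]
    print(period_report(jobs, date(2023, 1, 1), date(2023, 3, 31)))
```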
Provide meta scheduling capabilities within Airavata
As discussed on the architecture mailing list [1] and summarized at [2], Airavata will need to develop a metascheduler. In the short term, a user request (demeler, gobert) is to have Airavata throttle jobs to resources. In the future, more informed scheduling strategies need to be integrated. Hopefully, the actual scheduling algorithms can be borrowed from third party implementations.
[1] - http://markmail.org/message/tdae5y3togyq4duv
[2] - https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
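The short-term request, throttling jobs to resources, could be approached as a per-resource cap on concurrently running jobs with a waiting queue. The Python sketch below illustrates that idea only; the class name, the FIFO policy and the limits are assumptions, not the design discussed in [1] or [2].

```python
from collections import defaultdict, deque


class ThrottlingScheduler:
    """Caps running jobs per resource; extra jobs wait in FIFO order."""

    def __init__(self, limits: dict[str, int]):
        # resource -> maximum number of concurrently running jobs.
        # Resources without a configured limit get a cap of 0 (everything waits).
        self.limits = limits
        self.running = defaultdict(set)    # resource -> ids of running jobs
        self.waiting = defaultdict(deque)  # resource -> queued job ids

    def _has_capacity(self, resource: str) -> bool:
        return len(self.running[resource]) < self.limits.get(resource, 0)

    def submit(self, job_id: str, resource: str) -> bool:
        """Start the job if the resource has capacity, else queue it.

        Returns True when the job was started immediately.
        """
        if self._has_capacity(resource):
            self.running[resource].add(job_id)
            return True
        self.waiting[resource].append(job_id)
        return False

    def complete(self, job_id: str, resource: str):
        """Mark a job finished and promote the next waiting job, if any.

        Returns the promoted job id, or None when nothing was promoted.
        """
        self.running[resource].discard(job_id)
        if self._has_capacity(resource) and self.waiting[resource]:
            next_job = self.waiting[resource].popleft()
            self.running[resource].add(next_job)
            return next_job
        return None


if __name__ == "__main__":
    scheduler = ThrottlingScheduler({"cluster-a": 1})
    print(scheduler.submit("job-1", "cluster-a"))
    print(scheduler.submit("job-2", "cluster-a"))
    print(scheduler.complete("job-1", "cluster-a"))
```

More informed strategies, such as ordering by priority or by expected queue wait time, could later replace the FIFO queue without changing the submit and complete interface.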
Enhance File Transports in MFT
Complete all transports in MFT
- Currently SCP and S3 are known to work
- Others need effort to optimize, test, and declare readiness
- Develop a complete, fully functional MFT command-line interface
- Have a feature-complete Python SDK
- A minimum implementation will be provided; students need to complete it and test it.
Custos Backup and Restore
Custos does not have the capability to efficiently back up and restore a live instance. This is essential for highly available services.
Airavata Rich Client based on ElectronJS
Using SEAGrid Rich Client as an example, develop a native application based on ElectronJS to mimic the Airavata Django Portal.
Reference example - https://github.com/SciGaP/seagrid-rich-client