This page is auto-generated! Please do NOT edit it, all changes will be lost on next update

James Server

Adopt Pulsar as the messaging technology backing the distributed James server

https://www.mail-archive.com/server-dev@james.apache.org/msg71462.html

A good long-term objective for the PMC is to drop RabbitMQ in
favor of Pulsar (third parties could package their own components using
RabbitMQ if they wish...)

This means:

  • Solve the bugs that were found during the Pulsar MailQueue review
  • Pulsar MailQueue needs to allow listing blobs in order to be
    deduplication friendly.
  • Provide an event bus based on Pulsar
  • Provide a task manager based on Pulsar
  • Package a distributed server backed by Pulsar, deprecate then replace
    the current one.
  • (optionally) support mail queue priorities

While contributions would of course be welcomed on this topic, we could
offer it as part of GSOC 2022, and we could co-mentor it with mentors of
the Pulsar community (see [3])

[3] https://lists.apache.org/thread/y9s7f6hmh51ky30l20yx0dlz458gw259

Would such a plan gain traction around here?

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Benoit Tellier, mail: btellier (at) apache.org
Project Devs, mail: dev (at) james.apache.org

...

GSOC Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

  • Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
  • Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
  • Testing: Add automated tests for new code.

Skills:

  • Proficiency in Go is required
  • A basic knowledge of HTTP and caching is preferred, but not required for this project.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Eric Friedrich, mail: friede (at) apache.org
Project Devs, mail: dev (at) trafficcontrol.apache.org

...

Beam

[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs

Beam library developers and Beam users would appreciate this : )


This project involves prototyping a few different solutions, so it will be large.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

Commons Imaging

Placeholder for 1.0 release

A placeholder ticket, to link other issues and organize tasks related to the 1.0 release of Commons Imaging.

The 1.0 release of Commons Imaging has been postponed several times. We now have a clearer idea of what is necessary for 1.0 (see issues with fixVersion 1.0 and 1.0-alpha3, and other open issues). The tasks are interesting because they involve both basic and advanced programming, such as organizing how test images are loaded, or working on byte-level performance improvements while following image format specifications.

The tasks are not too hard to follow, as normally there are example images that need to work with Imaging, as well as other libraries in C, C++, Rust, PHP, etc., that process these images correctly. Our goal with this issue is to a) improve our docs, b) improve our tests, c) fix possible security issues, d) get the parsers in Commons Imaging ready for the 1.0 release.

This issue is labeled for GSoC 2023 as a full-time project, although it would also be possible to work on a smaller set of 1.0 tasks part time.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Bruno P. Kinoshita, mail: kinow (at) apache.org
Project Devs, mail:

...

[SKIN] Update Commons Skin Bootstrap

Our Commons components use Commons Skin, a skin, or theme, for Apache Maven Site.

Our skin uses Bootstrap 2.x, but Bootstrap is already at the 5.x release, and we are missing several improvements (UI/UX, accessibility, browser compatibility) and JS/CSS bug fixes made over the years.

Work is happening on Apache Maven Skins. Maybe we could adapt or use that one?

https://issues.apache.org/jira/browse/MSKINS-97


Difficulty: Minor
Project size: ~175 hour (medium)
Potential mentors:
Bruno P. Kinoshita, mail: kinow (at) apache.org
Project Devs, mail:

Airavata

[GSoC] Integrate JupyterHub with Airavata Django Portal

The Airavata Django Portal [1] allows users to create, execute and monitor computational experiments. However, when a user then wants to post-process or visualize the output of a computational experiment, they must download the output files and run tools that they may have on their computer or other systems. By integrating with JupyterHub, the Django Portal can give users an environment in which they can explore the experiment's output data and gain insights.

The main requirements are:

  • from the Django Portal a user can click a button and navigate to a JupyterHub instance that the user is immediately logged into using single sign-on
  • the user can save the Jupyter notebook and later retrieve it
  • the user's files are available within the context of the running Jupyter instance
    • ideally users can also generate new outputs in the Jupyter instance and have them saved back in their portal data storage
  • users can share their notebooks with other portal users
  • (bonus) portal admins can suggest notebooks to use with specific applications so that with one click a user can open an experiment in a provided notebook
  • users can manage their notebooks and can, for example, clone a notebook

[1] https://github.com/apache/airavata-django-portal

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Marcus Christie, mail: marcuschristie (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Apache Superset Dashboards to Airavata Catalogs

Integrate Apache Superset (https://superset.apache.org/) to visualize Airavata Catalogs (https://github.com/apache/airavata/tree/master/modules/registry) 

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Airavata Jupyter Platform Services

  1. UI Framework 
    1. To host the Jupyter environment we will need to envelop the notebooks in a user interface and connect it with Apache Airavata services
    2. Leverage Airavata communications from within the Django Portal - https://github.com/apache/airavata-django-portal 
    3. Explore whether the platform would be better developed as VS Code extensions, leveraging Jupyter extensions like - https://github.com/Microsoft/vscode-jupyter
    4. Alternatively, explore developing a standalone native application using ElectronJS
  2. Draft up a platform architecture - Airavata-based infrastructure with functionality similar to Colab.
  3. Authenticate with Airavata Custos Framework - https://github.com/apache/airavata-custos 
  4. Extend the notebook filesystem, using a virtual file system approach, to integrate with Airavata-based storage and catalog
  5. Register the notebooks with the Airavata app catalog and experiment catalog.


Advanced Possibilities:

Explore Multi-tenanted JupyterHub 

  • Can Kubernetes namespace isolation accomplish this?
  • Make deployment of Jupyter support part of the default core
  • Data and user-level tenancy can be assumed; how can we make sure the infrastructure isolates tenants, so that one gateway cannot crash a hosting environment?
  1. How to leverage computational resources from JupyterHub

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Dashboards to get quick statistics

Gateway admins need periodic reports for various reporting and planning purposes.

Features Include:

  • Compute resources that had at least one job submitted during the period <start date - End date>
  • User groups created within a given period, with the number of users in each, their permission levels, and the number of jobs each user has submitted
  • List of applications and the number of jobs for each application in a given period, grouped by job status
  • Number of users who submitted at least one job during the period <start date - End date>
  • Total number of Unique Users
  • User Registration Trends
  • Number of experiments for a given period <Start date - End date> grouped by the experiment status
  • The total cpu-hours used by each user, sorted, quarterly, plotted over a period of time
  • The total cpu-hours consumed by application, sorted, quarterly, plotted over a period of time
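
A few of the statistics listed above can be sketched as simple aggregations over experiment records. This is only an illustrative sketch: the `Experiment` record, its field names, the status strings and the sample data are all hypothetical and do not reflect Airavata's actual catalog schema, and experiments are used here as a stand-in for jobs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Experiment:
    """Hypothetical record; fields are illustrative, not Airavata's schema."""
    user: str
    application: str
    status: str
    created: date


def in_period(exp, start, end):
    """True if the experiment was created within [start, end], inclusive."""
    return start <= exp.created <= end


def experiments_by_status(experiments, start, end):
    """Count the experiments created in the period, grouped by status."""
    return Counter(e.status for e in experiments if in_period(e, start, end))


def active_users(experiments, start, end):
    """Users who submitted at least one experiment in the period."""
    return {e.user for e in experiments if in_period(e, start, end)}


def jobs_per_application(experiments, start, end):
    """Map each application to a Counter of statuses for the period."""
    result = {}
    for e in experiments:
        if in_period(e, start, end):
            result.setdefault(e.application, Counter())[e.status] += 1
    return result


# Made-up sample data, for illustration only.
SAMPLE = [
    Experiment("alice", "Gaussian", "COMPLETED", date(2023, 1, 10)),
    Experiment("alice", "Gaussian", "FAILED", date(2023, 2, 3)),
    Experiment("bob", "NAMD", "COMPLETED", date(2023, 2, 20)),
    Experiment("carol", "NAMD", "COMPLETED", date(2023, 5, 1)),
]
```

In a real dashboard these aggregations would more likely be pushed down into database queries; the in-memory version only shows the grouping logic for a given <start date - End date> period.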


Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Provide meta scheduling capabilities within Airavata

As discussed on the architecture mailing list [1] and summarized at [2], Airavata will need to develop a metascheduler. In the short term, a user request (demeler, gobert) is to have Airavata throttle jobs to resources. In the future, more informed scheduling strategies need to be integrated. Hopefully, the actual scheduling algorithms can be borrowed from third party implementations.

[1] - http://markmail.org/message/tdae5y3togyq4duv
[2] - https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
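
The short-term throttling request could be approached, in its simplest form, as a per-resource cap on concurrently active jobs with a FIFO waiting queue. The sketch below is only one possible reading of "throttle jobs to resources": the `ResourceThrottle` class, its method names and the limits are hypothetical and are not part of Airavata.

```python
from collections import defaultdict, deque


class ResourceThrottle:
    """Caps the number of concurrently active jobs per compute resource.

    Hypothetical sketch: names and limits are assumptions, not Airavata APIs.
    """

    def __init__(self, limits, default_limit=1):
        self.limits = dict(limits)          # resource name -> max active jobs
        self.default_limit = default_limit  # used for resources without a limit
        self.active = defaultdict(set)      # resource name -> running job ids
        self.waiting = defaultdict(deque)   # resource name -> queued job ids

    def _limit(self, resource):
        return self.limits.get(resource, self.default_limit)

    def submit(self, resource, job_id):
        """Start the job if there is capacity, else queue it.

        Returns True if the job was started immediately.
        """
        if len(self.active[resource]) < self._limit(resource):
            self.active[resource].add(job_id)
            return True
        self.waiting[resource].append(job_id)
        return False

    def complete(self, resource, job_id):
        """Mark a job finished and promote the oldest waiting job, if any.

        Returns the promoted job id, or None if nothing was waiting.
        """
        self.active[resource].discard(job_id)
        queue = self.waiting[resource]
        if queue and len(self.active[resource]) < self._limit(resource):
            promoted = queue.popleft()
            self.active[resource].add(promoted)
            return promoted
        return None
```

For example, with `ResourceThrottle({"cluster-a": 2})`, a third job submitted to `cluster-a` is queued until one of the first two completes. More informed strategies (priorities, fair share, queue-time prediction) would replace the FIFO promotion step.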

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Enhance File Transports in MFT

Complete all transports in MFT

  • Currently SCP and S3 are known to work
  • Others need effort to optimize, test, and declare readiness
  • Develop a complete, fully functional MFT command-line interface
  • Have a feature-complete Python SDK
  • A minimum implementation will be provided; students need to complete it and test it.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Custos Backup and Restore

Custos does not have the capability to efficiently back up and restore a live instance. This is essential for highly available services.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org

Airavata Rich Client based on ElectronJS

Using the SEAGrid Rich Client as an example, develop a native application based on ElectronJS to mimic the Airavata Django Portal.

Reference example - https://github.com/SciGaP/seagrid-rich-client 

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Suresh Marru, mail: smarru (at) apache.org
Project Devs, mail: dev (at) airavata.apache.org