[GSOC] James as a (distributed) MX server

Why?

Alternatives like Postfix...

  • Do not offer a unified view of the mail queue across nodes
  • Require stateful persistent storage

Given Apache James's recent push to adopt a distributed mail queue based on Pulsar with support for delays (JAMES-3687), it starts to make sense to develop MX-related tooling.

I volunteer to mentor a GSoC project on this topic.

Benefits for the student

At the end of this GSOC you will...

  • Have a solid understanding of email relaying and associated mechanics
  • Understand James's modular architecture (mailets / matchers / routes)
  • Have hands-on expertise in SQL / NoSQL, working with technologies like Cassandra, Redis, JPA...
  • Identify, fix, and solve architecture problems
  • Conduct performance tests and develop an operational mindset

Inventory...

James ships a couple of MX related tools within smtp-hooks/mailets in default packages. It would make sense to me to move those as an extension.

James supports today...

Checks against DNS blacklists, for instance through the `DNSRBLHandler` or `URIRBLHandler` SMTP hooks. This can be moved into an extension IMO.

We would need a little performance benchmark to document performance implications of activating DNS-RBL.

Finally, as pointed out on Gitter: it would make more sense to do this as a MailHook rather than as a RcptHook, as it would avoid doing the same job over and over again for each recipient. See JAMES-3820.
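For readers unfamiliar with DNS blacklists: a lookup conventionally reverses the octets of the client's IPv4 address, appends the list's zone, and resolves the resulting name. If the name resolves, the address is listed. The sketch below illustrates that convention in plain Java. It is not the code of `DNSRBLHandler`; the class name `DnsblLookup` and the zone `dnsbl.example.org` are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class DnsblLookup {
    /** Builds the DNS name to query: 192.0.2.10 + zone becomes 10.2.0.192.zone. */
    public static String queryName(String ipv4, String zone) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + "." + zone;
    }

    /** Returns true when the query name resolves, i.e. the address is listed. */
    public static boolean isListed(String ipv4, String zone) {
        try {
            InetAddress.getByName(queryName(ipv4, zone));
            return true;
        } catch (UnknownHostException e) {
            // Not listed (or the resolver could not answer)
            return false;
        }
    }
}
```

The result depends only on the client address, not on the recipient, which is why one lookup per mail transaction is enough.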

Grey listing. There's an existing implementation using JDBC as an underlying storage.

Move it as an extension.

Remove the JDBC storage and propose two storage options: in-memory for a single node, Redis for a distributed topology.
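As an illustration of what the in-memory option could look like, here is a minimal sketch of greylisting: the first delivery attempt for a given (client IP, sender, recipient) triplet is temporarily rejected, and a retry is accepted once a minimum delay has elapsed. The class `InMemoryGreylist` and its method are hypothetical and do not reflect the existing James implementation; entries never expire in this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class InMemoryGreylist {
    // First time each triplet was seen
    private final Map<String, Instant> firstSeen = new ConcurrentHashMap<>();
    private final Duration minDelay;

    public InMemoryGreylist(Duration minDelay) {
        this.minDelay = minDelay;
    }

    /** Returns true if the delivery attempt made at 'now' should be accepted. */
    public boolean accept(String clientIp, String sender, String recipient, Instant now) {
        String triplet = clientIp + "|" + sender + "|" + recipient;
        Instant first = firstSeen.putIfAbsent(triplet, now);
        if (first == null) {
            return false; // first attempt: answer with a temporary rejection
        }
        // Accept retries only once the minimum delay has elapsed
        return !now.isBefore(first.plus(minDelay));
    }
}
```

Passing the current time as a parameter keeps the sketch deterministic and easy to test. A distributed variant would keep the same triplet keys in a shared store instead of a local map.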

Some work around whitelist mailets? Move them into an extension and propose JPA, Cassandra, and XML-configured implementations? With a route to manage entries for JPA + Cassandra?

I would expect a student to do their own small audit and come up with extra suggestions!

Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Benoit Tellier, mail: btellier (at) apache.org
Project Devs, mail: dev (at) james.apache.org

TrafficControl

GSOC Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

  • Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
  • Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
  • Testing: Adding automated tests for new code

Skills:

  • Proficiency in Go is required
  • A basic knowledge of HTTP and caching is preferred, but not required for this project.
Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Eric Friedrich, mail: friede (at) apache.org
Project Devs, mail: dev (at) trafficcontrol.apache.org

Beam

[GSoC][Beam] An IntelliJ plugin to develop Apache Beam pipelines and the Apache Beam SDKs

Beam library developers and Beam users would appreciate this : )


This project involves prototyping a few different solutions, so it will be large.

Difficulty: Major
Project size: ~350 hour (large)
Potential mentors:
Pablo Estrada, mail: pabloem (at) apache.org
Project Devs, mail: dev (at) beam.apache.org

Commons Statistics

GSoC

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas:

  • Design an updated summary statistics API for use with Java 8 streams based on the summary statistic implementations in the Commons Math stat.descriptive package including moments, rank and summary sub-packages.
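To make the idea more concrete, the following is a minimal sketch of a statistic that plugs into Java 8 streams through a `Collector`. It uses a generic running-mean update and is not the Commons Math implementation; the class name `MeanSketch` and its methods are hypothetical.

```java
import java.util.stream.Collector;

public final class MeanSketch {
    private long n;
    private double mean;

    /** Adds one value using an incremental mean update. */
    public void accept(double x) {
        n++;
        mean += (x - mean) / n;
    }

    /** Merges another partial result, which is what parallel streams require. */
    public MeanSketch combine(MeanSketch other) {
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        mean += (other.mean - mean) * other.n / total;
        n = total;
        return this;
    }

    public double getAsDouble() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Adapter for Stream<Double>.collect(...). */
    public static Collector<Double, MeanSketch, Double> collector() {
        return Collector.of(MeanSketch::new, MeanSketch::accept,
                            MeanSketch::combine, MeanSketch::getAsDouble);
    }
}
```

A call such as `Stream.of(1.0, 2.0, 3.0, 4.0).collect(MeanSketch.collector())` would then yield the mean. The `combine` method is the part a stream-oriented API has to get right for each statistic.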
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail:

Commons Numbers

Add support for extended precision floating-point numbers

Add implementations of extended precision floating-point numbers.

An extended precision floating-point number is a series of floating-point numbers that are non-overlapping, such that for a double-double (a, b):

            |a| > |b|
            a == a + b

Common representations are double-double and quad-double (see for example David Bailey's paper on a quad-double library: QD).
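As a small illustration of how such a pair can be produced, the sketch below uses the classic branch-free two-sum algorithm attributed to Knuth: it returns the rounded sum of two doubles together with the exact rounding error, which satisfies the double-double property above. The class name `DoubleDoubleSketch` is hypothetical and this is not the API proposed for the library.

```java
public final class DoubleDoubleSketch {
    public final double hi;
    public final double lo;

    private DoubleDoubleSketch(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /**
     * Two-sum: hi is the rounded sum x + y and lo is the rounding error,
     * so that hi + lo equals x + y exactly (barring overflow).
     */
    public static DoubleDoubleSketch twoSum(double x, double y) {
        double hi = x + y;
        double yVirtual = hi - x;
        double xVirtual = hi - yVirtual;
        double lo = (x - xVirtual) + (y - yVirtual);
        return new DoubleDoubleSketch(hi, lo);
    }
}
```

For example, `twoSum(0.1, 0.2)` keeps in `lo` the bits that a plain `0.1 + 0.2` discards.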

Many computations in the Commons Numbers and Statistics libraries use extended precision computations where the accumulated error of a double would lead to complete cancellation of all significant bits; or create intermediate overflow of integer values.

This project would formalise the code underlying these use cases with a generic library applicable for use in the case where the result is expected to be a finite value and using Java's BigDecimal and/or BigInteger negatively impacts performance.

An example would be the average of long values where the intermediate sum overflows or the conversion to a double loses bits:

            import java.math.BigDecimal;
            import java.util.Arrays;
            long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
            // The long sum overflows, giving a wrong average
            System.out.println(Arrays.stream(values).average().getAsDouble());
            System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
                .reduce(BigDecimal.ZERO, BigDecimal::add)
                .divide(BigDecimal.valueOf(values.length)).doubleValue());
            long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
            // Converting each long to a double first loses the low bits
            System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
            System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
                .reduce(BigDecimal.ZERO, BigDecimal::add)
                .divide(BigDecimal.valueOf(values2.length)).doubleValue());

Outputs:

            -1.0
            9.223372036854776E18
            0.0
            -0.5
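For this particular example, both pitfalls can be avoided without BigDecimal by summing the upper and lower 32 bits of each long separately, so that neither partial sum can overflow for a Java array. The sketch below is only an illustration of the kind of code the project would generalise, not the proposed design; the class name `LongMeanSketch` is hypothetical, and the final conversion to double still rounds, so the result is not guaranteed to be correctly rounded for very large arrays.

```java
public final class LongMeanSketch {
    private static final double TWO_POW_32 = 0x1.0p32;

    /** Mean of long values without intermediate overflow. */
    public static double mean(long[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        long sumHigh = 0; // sum of the signed upper 32 bits
        long sumLow = 0;  // sum of the unsigned lower 32 bits
        for (long v : values) {
            // v == (v >> 32) * 2^32 + (v & 0xFFFFFFFFL) for every long v
            sumHigh += v >> 32;
            sumLow += v & 0xFFFFFFFFL;
        }
        // Recombine in double arithmetic, then divide by the count
        return (sumHigh * TWO_POW_32 + sumLow) / values.length;
    }
}
```

Applied to the two arrays of the example, this returns the same values as the BigDecimal computations.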
Difficulty: Major
Project size: ~175 hour (medium)
Potential mentors:
Alex Herbert, mail: aherbert (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

Commons Math

GSoC

Placeholder for tasks that could be undertaken in this year's GSoC.

Ideas (extracted from the "dev" ML):

  1. Redesign and modularize the "ml" package
    -> main goal: enable multi-thread usage.
  2. Abstract the linear algebra utilities
    -> main goal: allow switching to alternative implementations.
  3. Redesign and modularize the "random" package
    -> main goal: general support of low-discrepancy sequences.
  4. Refactor and modularize the "special" package
    -> main goals: ensure accuracy and performance and better API,
    add other functions.
  5. Upgrade the test suite to Junit 5
    -> additional goal: collect a list of "odd" expectations.

Other suggestions welcome, as well as

  • delineating additional and/or intermediate goals,
  • signalling potential pitfalls and/or alternative approaches to the intended goal(s).
Difficulty: Minor
Project size: ~350 hour (large)
Potential mentors:
Gilles Sadowski, mail: erans (at) apache.org
Project Devs, mail: dev (at) commons.apache.org

Commons Imaging

...