Introduction

The platform is now mature enough to support the core banking needs of the community. With client bases that range from several thousand to millions of records, as reported by large organizations, performance and scalability are crucial to the smooth functioning of the platform's multi-tenant architecture.

Three broad strategies can be considered for enhancing performance and scale:

  • Vertical scaling
    Expanding the resources installed on each node the platform is running on.
  • Horizontal scaling
    The demands or requests by users are spread across multiple nodes.
  • Performance tuning
    Refactoring the web application's source code, reviewing platform configuration settings, parallelizing backend logic, implementing caching strategies, detecting hot spots, and so forth.

However, within the scope of this project, only performance tuning will be pursued.


Current context

Scheduler jobs

The platform supports scheduler jobs that run at predefined times on a recurring basis: for instance, loan repayment schedules, deposit schedules for savings accounts, and meeting schedules for customers. These jobs are single threaded, meaning no two jobs can execute in parallel. This is a performance drawback when many jobs are scheduled at regular intervals.

 

Figure 1 Diagrammatic view of single threaded scheduler jobs.

N.B.: A representation of how single-threaded scheduler jobs generally work.

Trial balance report job

For a given period, this report lists GL accounts with their debit and credit records. For a large organization with millions of transactions in the defined period, the backend processing query can take an impractically long time to complete.

Large bandwidth usage to deliver HTTP responses

Field app users often need access to large sets of data. Owing to the size of the HTTP response body, "413 Payload Too Large" errors can occur; even more unsettling, the mobile data usage required to perform such actions is considerably high.

Overloading of resources during job execution

As mentioned above, the scheduler jobs are executed on a single node instance, so serving a huge number of requests can overload that server's resources. Job executions instead need to be distributed across node instances, for example at office level.

For instance, all accruals for Office 1 would be executed on node instance 1, accruals for Office 2 on node instance 2, and so on.
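The per-office distribution above could be driven by a simple deterministic mapping from office to node. The sketch below is only an illustration of the idea, assuming a fixed node count and a modulo scheme; `NodeAwareJobRouter` is a hypothetical name, not an existing platform class:

```java
// Hypothetical sketch: route each office's jobs to a node instance using a
// deterministic modulo mapping, so the same office always lands on the same node.
public class NodeAwareJobRouter {

    private final int nodeCount;

    public NodeAwareJobRouter(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    /** Maps an office id to the node instance (1-based) that should run its jobs. */
    public int nodeFor(long officeId) {
        return (int) ((officeId - 1) % nodeCount) + 1;
    }
}
```

With two nodes, Office 1 maps to node instance 1 and Office 2 to node instance 2, matching the example above; the mapping stays stable as long as the node count does.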


Suggested approach

Trial balance

A consolidated entry would be created in the trial balance table for each day by aggregating all journal entries for that day, grouping by GL account, branch, and transaction date, and updating the closing balance. The results can then be used by any reporting service, for the MIS requirements of MFIs, or merely for reconciliation purposes.
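The grouping described above can be sketched with Java streams. The record names (`JournalEntry`, `GroupKey`) and the cent-based amounts are illustrative assumptions, not the platform's actual entities; in practice the aggregation would run as a database query feeding the trial balance table:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of the proposed daily consolidation: journal entries are
// grouped by (GL account, branch, transaction date) and their amounts summed
// into one consolidated row per group.
public class TrialBalanceConsolidator {

    public record JournalEntry(String glAccount, String branch,
                               LocalDate transactionDate, long amountInCents) {}

    public record GroupKey(String glAccount, String branch, LocalDate transactionDate) {}

    /** One consolidated balance per GL account, branch and day. */
    public static Map<GroupKey, Long> consolidate(List<JournalEntry> entries) {
        return entries.stream().collect(Collectors.groupingBy(
                e -> new GroupKey(e.glAccount(), e.branch(), e.transactionDate()),
                Collectors.summingLong(JournalEntry::amountInCents)));
    }
}
```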

 

Parallel execution of scheduler jobs and prioritization of jobs

This would make use of Java's native thread pool, which holds a fixed number of worker threads that execute tasks in parallel and can be reused many times.

Analogy: updating savings account deposit schedules and adjusting loan repayments at the same time.

 

 Figure 2: Diagrammatic representation of the usage of the Thread pool for the parallel execution of scheduled jobs.

 

In the existing implementation no thread pool is used; scheduled tasks execute sequentially, one after the other. Introducing a thread pool would mitigate the chances of scheduled jobs queuing up.
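A minimal sketch of the idea, using the JDK's `Executors.newFixedThreadPool`: independent jobs are submitted to a fixed pool of worker threads instead of running one after the other. The `ParallelJobRunner` class and the placeholder job bodies are assumptions for illustration only:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run independent scheduler jobs on a fixed-size thread pool
// rather than sequentially on a single thread.
public class ParallelJobRunner {

    /** Submits all jobs to a fixed pool, waits for completion, returns the count run. */
    public static int runAll(List<Runnable> jobs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (Runnable job : jobs) {
            pool.submit(() -> {
                job.run();
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

Only jobs with no mutual dependencies should share a pool; jobs that touch the same accounts would still need ordering or locking, which this sketch does not address.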

A fixed page size should be decided on for paginated results to reduce the number of round trips to the datastore, thus saving each node's resources.
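The paging loop could look like the following sketch, where `fetchPage` stands in for a real repository query taking an offset and a limit (an assumption, not an existing platform API). With a page size of 1,000, a million-row result needs about a thousand round trips instead of one unbounded query or one trip per row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch: fetch work items in fixed-size pages so each node issues a bounded
// number of datastore round trips and never materializes an unbounded result.
public class PagedFetcher {

    public static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                                       int pageSize) {
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break;  // last (possibly partial) page reached
            }
            offset += pageSize;
        }
        return all;
    }
}
```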


Compression of HTTP responses to preserve bandwidth

When field officers visit clients, they may be relying on mobile data, where speed and bandwidth usage are critical. For instance, querying a huge list of customers from an office can be expensive in terms of bandwidth, which in turn incurs significant mobile data charges for the MFIs.

A feasible approach

On Backend

1) Using a writer interceptor, capture the output stream of the response.

2) Set the response header and wrap the stream with a gzip output stream wrapper.

3) Return the response.

On Client API

4) Using a reader interceptor, capture the response from the backend.

5) Inflate the response.
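The gzip round trip in the steps above can be sketched with the JDK's `GZIPOutputStream` and `GZIPInputStream`. In the real interceptors these streams would wrap the HTTP entity stream and the `Content-Encoding: gzip` header would be set; here plain byte arrays stand in for the response body:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of the gzip round trip: compress on the backend, inflate on the client.
public class GzipRoundTrip {

    /** Server side (steps 1-3): wrap the response body in a gzip stream. */
    public static byte[] compress(byte[] body) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Client side (steps 4-5): inflate the compressed response. */
    public static byte[] inflate(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Text-heavy JSON responses typically compress well, which is what makes this attractive for large customer lists over mobile data.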

 

Server side caching

The paginated results can be cached in a cache store for frequent reuse, thereby efficiently eliminating repeated round trips to the datastore.

In the existing implementation, depending on the number of nodes, there are three main types of cache storage: no caching, single-node caching, and multi-node caching.

Multi-node caching is not functioning at the moment; further investigation needs to be carried out to rectify this issue.
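For the single-node case, the idea can be sketched as an in-memory cache keyed by query and page number, so a repeated paginated request skips the datastore. A real deployment would use the platform's cache abstraction with eviction and invalidation; this `ConcurrentHashMap`-based version, with its hypothetical key scheme, only illustrates the principle:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch: single-node, in-memory cache of paginated results keyed by
// (query, page). Only a cache miss invokes the datastore loader.
public class PageCache {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger misses = new AtomicInteger();

    public List<String> get(String queryKey, int page, Supplier<List<String>> loader) {
        return cache.computeIfAbsent(queryKey + "#" + page, k -> {
            misses.incrementAndGet();  // counts round trips to the datastore
            return loader.get();
        });
    }

    public int misses() {
        return misses.get();
    }
}
```

Note that without invalidation a cache like this serves stale pages once the underlying data changes, which is one reason the multi-node replication issue above matters.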

 

Tentative Timeline


Allotted Period | Tasks to be done
Before application deadline | Compression of HTTP responses
April 14th to May 14th (before GSoC officially starts) | Trial balance report job
May 14th to May 21st (1st week) | Accrual transactions (parallelizing and paging)
May 21st to May 28th (2nd week) | Update loan summary (paging); send messages to SMS gateway (parallelizing and paging)
May 28th to June 4th (3rd week) | Post interest for savings; interest recalculation for loans (parallelizing and paging)
June 4th to June 11th (4th week) | JMeter script to show performance improvement
June 11th to June 18th (5th week) | Make jobs more configurable by office and loan product
June 18th to June 25th (6th week) | Node-aware scheduler
June 25th to July 2nd (7th week) | Dirty job scheduler
July 2nd to July 9th (8th week) | Multi-node cache replication
July 9th to July 16th (9th week) | Query optimization, including lazy fetching
July 16th to July 23rd (10th week) | Show collection sheet performance improvement using JMeter script
