...

This is not "fair" scheduling. Tasks that have been submitted via submit(), rejected, and put in the blockingWorkQueue are not, in general, all processed ahead of new tasks arriving via other threads calling ExecutorService.submit(). The latter can cut in line ahead of the thread servicing the blockingWorkQueue. This is not a bug per se, but it is surprising: the framework apparently offers no fairness contract for these thread pools.

...

Why Do We Have This Unusual Arrangement?

It's not clear why, even if the user supplies a non-synchronous queue to the constructor, PooledExecutorWithDMStats insists upon fabricating a synchronous queue for use as the ThreadPoolExecutor's workQueue.
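
For orientation, here is a minimal sketch of the arrangement being described (illustrative only; the class, field, and parameter names are not the actual Geode code): a SynchronousQueue as the executor's workQueue, a rejection handler that diverts rejected tasks to a separate blocking queue, and a dedicated thread that drains that queue back into the executor with a blocking put().

import java.util.concurrent.*;

// Illustrative sketch of the arrangement described above -- not the actual
// PooledExecutorWithDMStats source.
class SynchronousQueueArrangementSketch {

  static ExecutorService create(int maxThreads,
                                BlockingQueue<Runnable> blockingWorkQueue) {
    SynchronousQueue<Runnable> workQueue = new SynchronousQueue<>();

    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        1,                    // core pool size of 1, as of the 2007 commit
        maxThreads,
        30, TimeUnit.SECONDS, // keep-alive for idle non-core threads
        workQueue,
        // If no worker is free to take the task right now, the submission is
        // rejected; divert the task to the separate blocking queue instead.
        (task, pool) -> {
          try {
            blockingWorkQueue.put(task);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });

    // The extra thread: it competes (unfairly) with new submitters, handing
    // backlogged tasks to the pool via a blocking put() on the synchronous queue.
    Thread drainer = new Thread(() -> {
      try {
        for (;;) {
          workQueue.put(blockingWorkQueue.take());
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "blockingWorkQueue-drainer");
    drainer.setDaemon(true);
    drainer.start();

    return executor;
  }
}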

...

But this code in AdoptOpenJDK 8 ThreadPoolExecutor.execute() looks like it would perform just fine if the workQueue were a non-synchronous queue:
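
For reference, here is that method, paraphrased from the OpenJDK 8 sources (comments added; consult the exact AdoptOpenJDK build for the authoritative text):

public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    int c = ctl.get();
    // 1. Fewer than corePoolSize threads running: start a new core thread for this task.
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    // 2. Otherwise try to enqueue the task. For a SynchronousQueue this offer()
    //    succeeds only if a worker is already waiting to take a task.
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    // 3. Queue full (or offer refused): try to start a non-core thread, else reject.
    else if (!addWorker(command, false))
        reject(command);
}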


Update 1:

Well, well. This 2007 commit introduced the arrangement in question: 050a62da28157ba4d146d2acb9521a3a14e557ab

That commit mentions bug 37091 which we were able to dig up:

...

Interim Bug ID:  070706-111344-darrel-zeus
Bug-Number: 37091
Submitted by: darrel
Date: Friday July 6 2007 11:13
Priority: R2
Customer / HR:
Bug Comment:

If BridgeServer.SELECTOR is enabled server connection threads timeout and are constantly recreated if/when system is busy

Product Line:    GemFire
Version: 5.0.2
Build:
Bug Description:

...

Assign to: darrel
Keywords:
Audit/Repair:
Stone Host/OS:
Gem Host/OS:
Dbf Host/OS:
Private cache:
Module:
Script/QC test:
Gemstone login:
Cc List:
Workaround:
Keep the thread pool size <= the number of clients.


It's not entirely clear how or why the synchronous work queue prevents threads from timing out when there are fewer clients than threads in the pool (the problem described in the ticket).

But this article, Java ThreadPoolExecutor BlockingQueue Example from HowToDoInJava, provides insight into the way the thread pool's "core pool size" relates to queueing behavior. From that article:

...

Since our core pool size is always 0 or 1 (as of the 2007 commit), we'd almost always have the executor preferring queueing over running immediately if we were using a non-synchronous work queue. It's hard to see the relationship between this theory and the subject of bug 37091 above.
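
To make the queue-first behavior concrete, here is a small self-contained demonstration (the class name and the numbers are illustrative, not Geode code): with corePoolSize = 1 and an unbounded LinkedBlockingQueue, submitting several tasks never grows the pool beyond one thread; the rest simply wait in the queue.

import java.util.concurrent.*;

// Illustrative demo: with a small core pool size and a non-synchronous
// (here unbounded) work queue, the executor prefers queueing to starting
// new threads, even though maximumPoolSize would allow more.
public class QueueFirstDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 10, 30, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());   // unbounded: offer() always succeeds

        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            });
        }

        Thread.sleep(100);
        // Expected: poolSize=1 queueSize=4 -- the extra tasks queue up
        // rather than causing the pool to grow toward maximumPoolSize.
        System.out.println("poolSize=" + pool.getPoolSize()
                + " queueSize=" + pool.getQueue().size());
        pool.shutdownNow();
    }
}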

It is counterintuitive that the commit in question changed the pool's core size from maxSize to 1 (when a maxSize is specified). Offhand, it seems as if this logic works against the built-in JDK core-size functionality, the very purpose of which is to maintain a minimum number of ready, unused threads when demand drops.


Update 2:

As of the 2007 commit there were no calls to ThreadPoolExecutor.allowCoreThreadTimeOut(boolean). That method was introduced in Java 6 (1.6), late in 2006, and perhaps Geode at the time of the 2007 commit could not rely on that JDK version. Without that method there was no way to time out core threads, so "large" core pool sizes would leave large numbers of potentially idle threads around when demand dropped. Though that issue is not mentioned in ticket 37091, it seems clear that the 2007 commit was addressing it (by capping the core pool size at 1 thread).

Another issue not mentioned in ticket 37091, but which seems (in discussions) to have played a part here, was the desire to prefer immediate execution of a task over queueing, all the way up to maximumPoolSize. But by setting allowCoreThreadTimeOut(true) and setting corePoolSize = maximumPoolSize, we could, with a modern JDK, have our cake and eat it too: eliminate the extra queue and thread, and eliminate the unfairness as well. With those settings and any caller-supplied BlockingQueue as the workQueue (including a non-synchronous one), we should be able to avoid queueing and immediately dispatch tasks to threads, right up to maximumPoolSize (since that's also the corePoolSize). Then the thread keepAliveTime would kick in and start aging out any unused threads, eventually bringing us all the way back down to zero if all demand were removed for long enough.
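
A minimal sketch of that proposed configuration (the class name and parameter values are illustrative): corePoolSize equal to maximumPoolSize, allowCoreThreadTimeOut(true), and a caller-supplied non-synchronous queue. Tasks start threads up to the maximum before anything queues, and idle threads age out after keepAliveTime.

import java.util.concurrent.*;

// Illustrative sketch of the proposed configuration: no synchronous queue
// and no extra drainer thread. Tasks start threads up to maximumPoolSize
// before anything queues, and idle threads time out so the pool can drain to zero.
public class CoreEqualsMaxSketch {
    static ThreadPoolExecutor create(int maxThreads,
                                     BlockingQueue<Runnable> callerSuppliedQueue) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads,              // corePoolSize == maximumPoolSize
                maxThreads,
                30, TimeUnit.SECONDS,    // keepAliveTime must be > 0 for the next line
                callerSuppliedQueue);    // any non-synchronous queue works here
        pool.allowCoreThreadTimeOut(true);  // let "core" threads age out when idle
        return pool;
    }
}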

An experiment with code derived from that Stack Overflow question seems to confirm that Geode could now use corePoolSize = maximumPoolSize and allowCoreThreadTimeOut(true) to get immediate dispatch of tasks (without queueing) right up to the maximumPoolSize, and also get pool draining when demand drops.

Here's that experiment:

https://gist.github.com/Bill/3e0587f43171d2d72718a945deb79fb3

It seems that this "unusual arrangement" (forcing all task submissions through a synchronous queue, shunting rejected tasks to another queue, and having an extra thread compete, unfairly, with new task submissions to drain that queue) isn't needed as of this writing.

Fixing this to eliminate the extra queue and the extra thread will entail running performance tests. Along with this work we should investigate whether the similar unusual arrangement in FunctionExecutionPooledExecutor can also be addressed. The latter has slightly different behavior in the backlog-processing thread: instead of put()ting items into the ThreadPoolExecutor's workQueue, it offer()s them, and if the offer() fails, the thread runs the task directly.
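
For comparison, here is a sketch of that alternate backlog-draining loop (illustrative only, not the actual FunctionExecutionPooledExecutor source): try a non-blocking offer() into the executor's workQueue, and if no worker accepts the task, run it on the draining thread itself.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Illustrative sketch of the alternate drain loop described above --
// not the actual FunctionExecutionPooledExecutor source.
final class OfferOrRunDrainer implements Runnable {
    private final BlockingQueue<Runnable> blockingWorkQueue; // backlog of rejected tasks
    private final SynchronousQueue<Runnable> workQueue;      // the executor's work queue

    OfferOrRunDrainer(BlockingQueue<Runnable> blockingWorkQueue,
                      SynchronousQueue<Runnable> workQueue) {
        this.blockingWorkQueue = blockingWorkQueue;
        this.workQueue = workQueue;
    }

    @Override
    public void run() {
        try {
            for (;;) {
                Runnable task = blockingWorkQueue.take();   // wait for a backlogged task
                // offer() on a SynchronousQueue succeeds only if a worker is waiting;
                // if not, run the task on this thread instead of blocking on put().
                if (!workQueue.offer(task)) {
                    task.run();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}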