Distributed Entity Cache Clear (DCC) Mechanism
Why and when to use it?
The distributed cache clearing mechanism is needed when you have multiple servers in a cluster sharing a single database. When a create, update, or delete operation goes through an entity engine instance, that instance clears its own caches. But it can also send out a message to the other servers in the pool telling them to clear their caches.
This feature runs through the service engine, which operates on top of the entity engine. A distributed cache clear therefore results in service calls. In most cases you will use the Java Message Service (JMS) to send a message to a JMS server, which then distributes it to the other servers in the cluster.
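The flow above can be modelled with a small, self-contained sketch (plain Java, no real JMS involved; all class and method names here are made up for illustration, not OFBiz or JMS APIs): each node clears its own cache on a write and publishes a clear message on a shared topic, and every subscribed peer clears the same cache entry.

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy model of the publish/subscribe broadcast used for distributed
// cache clearing. Hypothetical names, not OFBiz code.
class ClearTopic {
    private final List<Node> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Node n) { subscribers.add(n); }

    // Broadcast a clear message to every node except the sender,
    // as a JMS topic would deliver it to all other subscribers.
    void publish(Node sender, String entityKey) {
        for (Node n : subscribers) {
            if (n != sender) n.onClearMessage(entityKey);
        }
    }
}

class Node {
    final Map<String, Object> entityCache = new ConcurrentHashMap<>();
    private final ClearTopic topic;

    Node(ClearTopic topic) { this.topic = topic; topic.subscribe(this); }

    void cacheRead(String entityKey, Object value) { entityCache.put(entityKey, value); }

    // A create/update/delete clears the local cache entry
    // and notifies the other nodes in the cluster.
    void write(String entityKey) {
        entityCache.remove(entityKey);
        topic.publish(this, entityKey);
    }

    void onClearMessage(String entityKey) { entityCache.remove(entityKey); }
}
```

With two nodes subscribed to the same topic, a write on one node removes the stale entry from both caches; this is the behaviour the JMS topic provides across the cluster.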
How to set it up?
To keep it simple we will only set the mandatory values. There are other options which are covered by defaults.
RMI deactivated since OFBIZ-6942
Because of the infamous Java serialization vulnerability, the RMI container has been disabled in the default configuration of OFBiz, and hence JNDI as well (it relies on RMI in OFBiz). So if you want to use the DCC mechanism you will first need to uncomment the relevant configuration. (Note: I'm not quite sure this is required, because JNDI relies on the RMI registry service provider, but I don't think the RMI loader is required for the DCC, and OFBIZ-6942 is only about disabling the RMI loader. To be checked and updated...)
The Entity Engine
This is the easiest part: for a given delegator you only have to set its distributed-cache-clear-enabled attribute to "true" (it is "false" by default).
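For instance, the attribute goes on the delegator element in framework/entity/config/entityengine.xml. The fragment below is only a sketch based on the usual OOTB default delegator; the delegator name, readers, and datasource may differ in your setup:

```xml
<delegator name="default" entity-model-reader="main" entity-group-reader="main"
        entity-eca-reader="main" distributed-cache-clear-enabled="true">
    <group-map group-name="org.ofbiz" datasource-name="localderby"/>
</delegator>
```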
The Service Engine
The JMS configuration lives in the framework/service/config/serviceengine.xml file. By default you set a jms-service named "serviceMessenger". There you define a JMS server with its name, a JNDI name, and a topic name. To make as few changes as possible we use "default" for the server name and set the values in the jndi.properties file. I could also have set a server name in jndiservers.xml, but my catch phrase is "the fewer changes the better". This is the serviceengine.xml setting, the same on each server:
<!-- JMS Service Active MQ Topic Configuration (set as default in jndi.properties, the less changes the better) -->
<jms-service name="serviceMessenger" send-mode="all">
    <server jndi-server-name="default"
            jndi-name="topicConnectionFactory"
            topic-queue="OFBTopic"
            type="topic"
            listen="true"/>
</jms-service>
I decided to use Apache ActiveMQ as the JMS server and to simply set these properties in the jndi.properties files (commenting out the OOTB defaults):
java.naming.factory.initial=org.apache.activemq.jndi.ActiveMQInitialContextFactory
java.naming.provider.url=tcp://<AMQ-IP-SERVER1>:61616
topic.OFBTopic=OFBTopic
connectionFactoryNames=connectionFactory, queueConnectionFactory, topicConnectionFactory
ActiveMQ also provides a point-to-point model with queues. We rely on topics instead because they use a publish/subscribe model and we need to broadcast messages to all servers in the cluster.
At this stage you need to install an ActiveMQ server somewhere. Initially I decided to install the latest available release of ActiveMQ, 5.5.0, but it turned out that there are some known issues in this release, so I finally took the 5.4.2 release. To test, I installed it on my XP development machine and on the cluster. It could be embedded in OFBiz, but I decided to simply run it as an external broker. I don't think it's interesting to have it embedded in OFBiz: you just install it, run it and forget about it (it sets /etc/ for you). For testing I used the ActiveMQ recommended default settings. For production you will want to run it as a Unix daemon (or Windows service).
You also need to put the corresponding activemq-all-x.x.x.jar in the framework/base/lib OFBiz directory. Then the distributed cache clearing mechanism should be ready to use.
You can then monitor ActiveMQ using the Web Console by pointing your browser at http://localhost:8161/admin/ and opening the Topics page.
Single point of failure
The setting above is sufficient in a staging environment, but it is a single point of failure in a production environment. So we need to create a cluster of ActiveMQ brokers. Since they should not consume many resources (at most 256 MB of memory and few CPU cycles), we can put each instance on the same machines as the OFBiz instances.
There is a simple way to load-balance ActiveMQ queues in an ActiveMQ cluster. But, as explained above, though we use async services for clearing distributed caches, it does not make sense for us to use queues since we need to broadcast messages. There is also the so-called virtual destinations solution, but it's a bit complicated and it seems to still use a queue underneath. After some research, I finally decided to go with the Failover Transport solution.
It's fairly simple to set up through JNDI. For this we only need to replace, in the jndi.properties files,
java.naming.provider.url=tcp://<AMQ-IP-SERVER1>:61616
by
java.naming.provider.url=failover:(tcp://<AMQ-IP-SERVER1>:61616?soTimeout=60000,tcp://<AMQ-IP-SERVER2>:61616?soTimeout=60000)?randomize=false
You may add any number of AMQ instances in the failover/tcp chain. The soTimeout=60000 parameter prevents keeping too many useless connections open.
I tried to add &backup=true&trackMessages=true at the end of the failover chain (i.e. after ?randomize=false), but the connections created are held and there seems to be no way to close them. It's weird, so I asked on the ActiveMQ user mailing list; no answers yet...
See Transport Options for details on these two parameters. There is also a link at the bottom of this page if you ever need a smoother, more dynamic failover setting. But it would need more work in OFBiz...
Then you should not get issues with held connections or too many open connections.
On the broker side (in activemq.xml) you need to:
- set advisorySupport="false" on the broker (except if you want to use advisory messages)
- use transport.soTimeout=60000 and set enableStatusMonitor="true" on the Openwire connector
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost"
        dataDirectory="${activemq.base}/data"
        destroyApplicationContextOnStop="true"
        advisorySupport="false">
    ....
    <transportConnector name="openwire"
            uri="tcp://0.0.0.0:61616?transport.soTimeout=60000"
            enableStatusMonitor="true"/>
</broker>