Running a Qpidd cluster

There are several prerequisites to running a qpidd cluster:

Install and configure openais/corosync

Qpid clustering uses a multicast protocol provided by the corosync (formerly called openais) library. Install whichever is available on your OS.
For example, on Fedora 10: yum install corosync

The configuration file is /etc/ais/openais.conf on openais, /etc/corosync.conf on early corosync versions and /etc/corosync/corosync.conf on recent corosync versions.

Depending on the version, there may already be a config file installed, or you may need to copy corosync.conf.example to corosync.conf.

Here is an example, with a marker on the setting that you must change. How to change it is described below the example.

# Please read the openais.conf.5 manual page

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                ## You must change this address ##
                bindnetaddr: 20.0.100.0
                mcastaddr: 226.94.32.36
                mcastport: 5405
        }
}

logging {
        debug: off
        timestamp: on
        to_file: yes
        logfile: /tmp/aisexec.log
}

amf {
        mode: disabled
}

You must set the bindnetaddr entry in the configuration file to the network address of your network interface. This must be a real network interface, not the loopback address 127.0.0.1.

You can find your network address by running ifconfig. This will list the host address and the mask, e.g.

inet addr:20.0.20.32  Bcast:20.0.20.255  Mask:255.255.255.0

The network address for bindnetaddr is the bitwise AND of the inet addr and mask values. In the example above, that is 20.0.20.0.
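The AND can be done by hand, but a small shell function makes it harder to get wrong. The following is a minimal sketch, not part of qpidd or corosync; the function name network_addr is made up for illustration.

```shell
# network_addr is a hypothetical helper, not part of any package.
# It ANDs each octet of a host address with the matching netmask octet.
# The body runs in a subshell so that changing IFS does not leak out.
network_addr() (
    IFS=.
    # Split both dotted quads on "." into eight positional parameters:
    # $1..$4 are the address octets, $5..$8 are the mask octets.
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

network_addr 20.0.20.32 255.255.255.0
```

With the ifconfig values shown above, this prints 20.0.20.0.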

On more recent systems that do not have ifconfig you can use: ip addr

This will give output like:

3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 24:77:03:42:3a:70 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.104/24 brd 192.168.0.255 scope global dynamic wlp3s0
       valid_lft 60405sec preferred_lft 60405sec
    inet6 fe80::2677:3ff:fe42:3a70/64 scope link
       valid_lft forever preferred_lft forever

The relevant line is inet 192.168.0.104/24. The /24 means that the mask is 24 bits long, i.e. 255.255.255.0, so the bindnetaddr should be 192.168.0.0.
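When the address comes in address/prefix form, the mask has to be built from the prefix length first. The sketch below shows one way to do that in plain shell; cidr_network is a hypothetical helper name, and it assumes well-formed IPv4 input.

```shell
# cidr_network is a hypothetical helper, not part of any package.
# It turns "address/prefix" (as printed by "ip addr") into the network address.
# The body runs in a subshell so that changing IFS does not leak out.
cidr_network() (
    addr=${1%/*}    # host address, e.g. 192.168.0.104
    bits=${1#*/}    # prefix length, e.g. 24
    IFS=.
    set -- $addr    # split the address into its four octets
    out=
    for octet in "$@"; do
        # Number of mask bits that fall inside this octet (0 to 8).
        if [ "$bits" -ge 8 ]; then n=8; else n=$bits; fi
        bits=$((bits - n))
        # Mask for this octet: the top n bits set, the rest clear.
        mask=$(( (255 << (8 - n)) & 255 ))
        out="$out${out:+.}$((octet & mask))"
    done
    echo "$out"
)

cidr_network 192.168.0.104/24
```

For the ip addr output above, this prints 192.168.0.0.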

The network address (masked address) is used so that you can use the same configuration for several hosts on the same network. You can also use the complete host address as the bindnetaddr, but in that case each host will need a different configuration file.

Open your firewall

The example file above uses mcastport 5405, so your firewall must allow UDP traffic on port 5405. Alternatively, disable the firewall.
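As a rough sketch, the helper below reads the mcastport from a configuration file and prints a matching iptables rule so that it can be reviewed before being run as root. It assumes the host uses iptables; systems managed by firewalld or nftables need the equivalent rule in their own tools. The function name mcast_rule is made up for illustration.

```shell
# mcast_rule is a hypothetical helper, not part of any package.
# It prints (but does not run) an iptables rule that accepts UDP traffic
# on the mcastport found in the given openais/corosync configuration file.
mcast_rule() {
    conf=$1
    # Take the value of the first "mcastport:" line in the file.
    port=$(awk '$1 == "mcastport:" { print $2; exit }' "$conf")
    [ -n "$port" ] || { echo "no mcastport found in $conf" >&2; return 1; }
    echo "iptables -I INPUT -p udp --dport $port -j ACCEPT"
}

# Example: mcast_rule /etc/corosync/corosync.conf
```

The rule is only printed, so nothing changes on the host until you run the printed command yourself.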

Use the proper identity

The qpidd process must be started with the correct identity in order to use the corosync/openais library.

For openais and early corosync versions, installing openais/corosync creates a new group called "ais". The user that starts the qpidd processes of the cluster must have "ais" as its effective group ID. You can create a user specifically for this purpose with ais as its primary group, or a user that has ais as a secondary group can use "newgrp" to set the primary group to ais when running qpidd.

For more recent corosync 1 versions you no longer need to set your group to "ais", but you do need to create a file in /etc/corosync/uidgid.d/ to allow access for whatever user/group ID you want to use. For example, create /etc/corosync/uidgid.d/qpid with the contents:

uidgid {
   uid: qpid
   gid: qpid
}

Finally, for corosync 2, you should add the uidgid section shown above directly to your corosync.conf file, not to a separate uidgid file.
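For the separate-file case, the stanza can be generated rather than typed by hand. This is a minimal sketch with a made-up function name, write_uidgid; the target path is passed in so it can be tried on a scratch file first.

```shell
# write_uidgid is a hypothetical helper, not part of any package.
# It writes a uidgid stanza for the given user and group to the given file,
# replacing whatever the file contained before.
write_uidgid() {
    user=$1 group=$2 file=$3
    cat > "$file" <<EOF
uidgid {
   uid: $user
   gid: $group
}
EOF
}

# Example (as root): write_uidgid qpid qpid /etc/corosync/uidgid.d/qpid
```

On corosync 2 the same stanza would go into corosync.conf itself, so there the function is only useful for producing the text to paste.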

Starting a Cluster

Make sure the openais or corosync service has been started.

To be a member of a cluster you must pass the --cluster-name argument to qpidd. This is the only option required to join a cluster; other options can be set as for a normal qpidd.

Here is an example of starting a cluster of 3 members, all on the current host but with different ports and different log files:

qpidd -p5672 --cluster-name=MY_CLUSTER --log-output=cluster0.log -d --no-data-dir --auth=no
qpidd -p5673 --cluster-name=MY_CLUSTER --log-output=cluster1.log -d --no-data-dir --auth=no
qpidd -p5674 --cluster-name=MY_CLUSTER --log-output=cluster2.log -d --no-data-dir --auth=no

In a deployed system, cluster members will normally be on different hosts, but for development it's useful to be able to create a cluster on a single host.
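For a single-host development cluster, the per-member command lines can be generated in a loop so that each member gets its own port and log file. The sketch below only prints the commands; cluster_commands is a hypothetical helper name, and the qpidd options are the ones used in the example above.

```shell
# cluster_commands is a hypothetical helper, not part of qpidd.
# It prints one qpidd command line per cluster member, giving each member
# its own port (first_port, first_port + 1, ...) and its own log file.
cluster_commands() {
    name=$1 first_port=$2 count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "qpidd -p$((first_port + i)) --cluster-name=$name --log-output=cluster$i.log -d --no-data-dir --auth=no"
        i=$((i + 1))
    done
}

cluster_commands MY_CLUSTER 5672 3
```

Printing instead of executing makes it easy to check the ports and log names first; piping the output to sh would then start the brokers.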

SELinux conflicts

Developers will often start openais/corosync as a service like this:

service openais start

But they will then start a cluster broker without using the service script, like this:

/usr/sbin/qpidd --cluster-name my_cluster ...

If SELinux is in enforcing mode, this may cause qpidd to hang because of the different SELinux contexts.
There are 3 ways to resolve this:

  • Run both qpidd and openais/corosync as services.
  • Run both qpidd and openais/corosync as user processes.
  • Make SELinux permissive.

To check which mode SELinux is running in:

# getenforce

To change the mode:

# setenforce permissive

Note that in a deployed system both openais/corosync and qpidd should be started as services, in which case there is no problem with SELinux running in enforcing mode.

Troubleshooting checklist

If you have trouble starting your cluster, make sure that:

  1. You have edited the correct openais/corosync configuration file and set bindnetaddr correctly.
  2. Your firewall allows UDP on the openais/corosync mcastport.
  3. Your effective group is "ais" (openais/old corosync) or you have created an appropriate ID file (new corosync).
  4. Your firewall allows TCP on the ports used by qpidd.
  5. If you're starting openais as a service but running qpidd directly, SELinux is in permissive mode.

Troubleshooting FAQ

  • If you get "Daemon startup failed: Cluster-ID mismatch. stores belong to different clusters." even though you have checked the cluster name:
    1. Clear the store database with --truncate yes.
    2. Delete all files with rm -rf /var/lib/qpidd/* (adjust the path to match your data-dir option).
  • If you get "Starting Qpid AMQP daemon: Daemon startup failed: Queue XXXXX: recoverQueues() failed: jexception 0x0c00 jinf::validate() threw JERR_JINF_CVALIDFAIL: Journal compatibility validation failure. (File "/var/lib/qpidd/rhm/jrnl/NNNN/XXXXX/JournalData.jinf": RHM_JDAT_VERSION mismatch: (MessageStoreImpl.cpp:820)":
    1. Your datastore may have been corrupted by an insufficient disk space error.
    2. Enlarge your filesystem.
    3. Clear the store database with --truncate yes.
    4. Delete all files with rm -rf /var/lib/qpidd/* (adjust the path to match your data-dir option).