Broker Statistics
The following statistics are provided as a basic useful minimum.
- Peak bytes per second
- Current bytes per second
- Total bytes
- Peak messages per second
- Current messages per second
- Total messages
These statistics will be generated for both delivered and received messages, per connection, aggregated per virtualhost, and for the entire broker. Delivered messages are those that have been sent to a client by the broker, and may be counted more than once due to rollbacks and re-delivery. Received messages are those that are published by a client and received by the broker.
Mechanism
The statistics are generated using a sample counter, which is thread-safe and is triggered on incoming message events. The totals for messages and data (bytes) are recorded as type `long`, and the rates are recorded as `double`, since they may be fractional.
The default sample period is one second, during which events are recorded cumulatively. This can be changed for all counters in the broker by setting the system property `qpid.statistics.samplePeriod`. Rates are calculated as an average over this sample period, so if the traffic is very bursty, a small sample period will fail to capture events most of the time and the peak rates will be inflated. Since rates are calculated per second, the default sample period is 1000 ms; if this is increased to, say, 10000 ms then messages will be counted over ten-second periods, and the resulting totals divided by ten to give the rate.
Since `qpid.statistics.samplePeriod` is the cut-off point at which the rates are calculated, stored, and then reset, changing this property should affect performance: increasing the value should improve throughput, and decreasing it should reduce throughput.
When all statistics generation is enabled, there will be 12 counters that are used, since statistics are calculated for the product of the following sets:
{ Broker, Virtualhost, Connection } x { Delivery, Receipt } x { Messages, Data }
Reporting
If statistics are being generated for the broker or for virtualhosts then periodic reporting can be enabled. This is generated at specified intervals, with the option to reset the statistics after outputting the current data.
```
[Statistics-Reporting] BRK-1008 : received : 4.395 kB/s peak : 4500 bytes total
[Statistics-Reporting] BRK-1008 : delivered : 4.395 kB/s peak : 4500 bytes total
[Statistics-Reporting] BRK-1009 : received : 45 msg/s peak : 45 msgs total
[Statistics-Reporting] BRK-1009 : delivered : 45 msg/s peak : 45 msgs total
[Statistics-Reporting] VHT-1003 : localhost : received : 1.465 kB/s peak : 1500 bytes total
[Statistics-Reporting] VHT-1003 : localhost : delivered : 1.465 kB/s peak : 1500 bytes total
[Statistics-Reporting] VHT-1004 : localhost : received : 15 msg/s peak : 15 msgs total
[Statistics-Reporting] VHT-1004 : localhost : delivered : 15 msg/s peak : 15 msgs total
[Statistics-Reporting] VHT-1003 : test : received : 0.977 kB/s peak : 1000 bytes total
[Statistics-Reporting] VHT-1003 : test : delivered : 0.977 kB/s peak : 1000 bytes total
[Statistics-Reporting] VHT-1004 : test : received : 10 msg/s peak : 10 msgs total
[Statistics-Reporting] VHT-1004 : test : delivered : 10 msg/s peak : 10 msgs total
[Statistics-Reporting] VHT-1003 : development : received : 1.953 kB/s peak : 2000 bytes total
[Statistics-Reporting] VHT-1003 : development : delivered : 1.953 kB/s peak : 2000 bytes total
[Statistics-Reporting] VHT-1004 : development : received : 20 msg/s peak : 20 msgs total
[Statistics-Reporting] VHT-1004 : development : delivered : 20 msg/s peak : 20 msgs total
```
Sample statistics report log messages
The reported statistics are output using the operational logging mechanism, and can therefore be extracted from the log files using the same mechanisms as are currently used for other such messages (a parsing sketch follows the list below). The new message types are as follows:
- BRK-1008 - Broker data statistics (peak bytes per second and total bytes)
- BRK-1009 - Broker message statistics (peak messages per second and total messages)
- VHT-1003 - Virtualhost data statistics (peak bytes per second and total bytes)
- VHT-1004 - Virtualhost message statistics (peak messages per second and total messages)
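As an illustration, the following hedged sketch extracts these statistics with a regular expression matched against the sample format shown above. Real log lines may carry extra prefixes such as timestamps; this sketch assumes the bare message format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses statistics report lines of the form shown in the sample above.
// Assumes the bare message format; real log lines may include prefixes.
public class StatisticsLogParser {
    private static final Pattern STATS = Pattern.compile(
            "\\[Statistics-Reporting\\] (BRK-1008|BRK-1009|VHT-1003|VHT-1004)"
            + "(?: : (\\w+))? : (received|delivered) : ([0-9.]+) (kB/s|msg/s)"
            + " peak : ([0-9]+) (bytes|msgs) total");

    public static void main(String[] args) {
        String line = "[Statistics-Reporting] VHT-1004 : localhost"
                + " : received : 15 msg/s peak : 15 msgs total";
        Matcher m = STATS.matcher(line);
        if (m.matches()) {
            // Group 2 (the virtualhost name) is null for BRK-* messages.
            System.out.printf("code=%s vhost=%s direction=%s peak=%s %s total=%s %s%n",
                    m.group(1), m.group(2), m.group(3),
                    m.group(4), m.group(5), m.group(6), m.group(7));
        }
    }
}
```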
Configuration
The statistics generation will be configured via the `config.xml` broker configuration mechanism.
```xml
<statistics>
    <generation> <!-- default false -->
        <broker>true</broker>
        <virtualhosts>true</virtualhosts>
        <connections>true</connections>
    </generation>
    <reporting>
        <period>3600</period> <!-- seconds -->
        <reset>true</reset>
    </reporting>
</statistics>
```
It is not possible to enable statistics generation for a single virtualhost - all virtualhosts will generate statistics if the `//statistics/generation/virtualhosts` element is set to true.
It only makes sense to enable reporting if statistics generation is also enabled for the broker or for virtualhosts; the broker will simply ignore the reporting configuration if no statistics are being generated. If the `//statistics/reporting/reset` element is set to true then, after the statistics are reported in the log, they will be reset to zero for the entire broker.
Even if statistics generation is completely disabled, it is still possible to activate statistics on an individual connection while the broker is running. The JMX attribute statisticsEnabled on a connection MBean can be set to true, which will start statistics generation; totals and rates will be calculated from that point onward, until it is set to false again.
Additionally, the following two system properties can be set to configure the statistics counter:

- `qpid.statistics.samplePeriod` - sample period in ms - default 1000
- `qpid.statistics.disable` - override configuration and disable statistics collection - default false (not set)

These two properties are exposed on the `StatisticsCounter` class as the static fields `DEFAULT_SAMPLE_PERIOD` and `DISABLE_STATISTICS`.
Design
StatisticsCounter Class
This implements the counting of event data and generates the total and rate statistics. There should be one instance of this class per type of statistic, such as messages or bytes. The instance methods that are called to add an event are:
- `void registerEvent()` - Registers a single event at the current system time, such as the arrival of a message.
- `void registerEvent(long value)` - Registers a cumulative value, such as the size of a message, at the current system time.
- `void registerEvent(long value, long timestamp)` - Registers a cumulative value at the specified time. The value could be 1, which allows timestamped single events to be registered if the no-argument version is not sufficient.
There are three constructors:

- `StatisticsCounter()`
- `StatisticsCounter(String name)`
- `StatisticsCounter(String name, long samplePeriod)`

These are chained in that order, using a default name of "counter" and the default sample period of 1000 ms, or the value set in the `qpid.statistics.samplePeriod` property.
To retrieve the data, there are methods to return the current rate, peak rate and total, as well as the start time, sample period and name of the counter, and also a method to reset the counter.
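As a hedged illustration of how such a counter might be used by broker code: the accessor names `getPeak()`, `getTotal()` and `reset()` below are assumptions based on the description above, not confirmed signatures.

```java
// Illustrative use of StatisticsCounter for received data (bytes).
// getPeak(), getTotal() and reset() are assumed names: the text above
// describes what they return, not what they are called.
public class ReceiptStatsExample {
    private final StatisticsCounter dataReceived =
            new StatisticsCounter("data-received");

    // Called for each incoming message with its payload size in bytes.
    public void onMessageReceived(long messageSizeBytes) {
        dataReceived.registerEvent(messageSizeBytes);
    }

    // Called from a periodic reporting task or a JMX attribute getter,
    // mirroring the "received : x kB/s peak : y bytes total" log format.
    public String report() {
        return String.format("received : %.3f kB/s peak : %d bytes total",
                dataReceived.getPeak() / 1024.0, dataReceived.getTotal());
    }

    // Zeroes totals, rates and peaks, as after a reporting interval
    // with reset enabled.
    public void resetStatistics() {
        dataReceived.reset();
    }
}
```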
StatisticsGatherer Interface
This is implemented by the broker business object that is generating statistics. It provides the following methods:
- `void initialiseStatistics()`
- `void registerMessageReceived(long messageSize, long timestamp)`
- `void registerMessageDelivered(long messageSize)`
- `StatisticsCounter getMessageDeliveryStatistics()`
- `StatisticsCounter getDataDeliveryStatistics()`
- `StatisticsCounter getMessageReceiptStatistics()`
- `StatisticsCounter getDataReceiptStatistics()`
- `void resetStatistics()`
- `void setStatisticsEnabled(boolean enabled)`
- `boolean isStatisticsEnabled()`
These statistics are exposed using the separate JMX MBean interfaces detailed below, which call these methods to retrieve the underlying `StatisticsCounter` objects and return their attributes. This interface gives a standard way for parts of the broker to set up and configure statistics generation.
When creating these objects, there should be a parent/child relationship between them, normally from broker to virtualhost to connection. This means that the lowest-level gatherer can record statistics (if enabled) and also pass the notification on to its parent object, allowing higher-level aggregation of statistics. When resetting statistics, this works in the opposite direction, with higher-level gatherers also resetting all of their child objects. Note that this parent/child relationship is never explicitly specified, and is dependent on the implementation of `registerMessageDelivered` and `resetStatistics`, in order to allow more flexibility.
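A minimal sketch of that propagation, assuming a gatherer that holds a reference to its parent and a list of children; these fields, and the `reset()` accessor name on StatisticsCounter, are illustrative assumptions, since the interface deliberately does not expose the relationship.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of upward delivery propagation and downward reset propagation.
// The parent/children fields are illustrative: as noted above, the
// relationship is implicit in the implementation, not in the interface.
public class GathererSketch {
    private final StatisticsCounter messageDelivery = new StatisticsCounter("messages-delivered");
    private final StatisticsCounter dataDelivery = new StatisticsCounter("bytes-delivered");
    private final GathererSketch parent; // connection -> virtualhost -> broker
    private final List<GathererSketch> children = new CopyOnWriteArrayList<>();
    private volatile boolean statisticsEnabled;

    public GathererSketch(GathererSketch parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    public void registerMessageDelivered(long messageSize) {
        if (statisticsEnabled) {
            messageDelivery.registerEvent(1L);
            dataDelivery.registerEvent(messageSize);
        }
        if (parent != null) {
            parent.registerMessageDelivered(messageSize); // aggregate upwards
        }
    }

    public void resetStatistics() {
        messageDelivery.reset(); // reset() name assumed, see above
        dataDelivery.reset();
        for (GathererSketch child : children) {
            child.resetStatistics(); // reset cascades downwards
        }
    }
}
```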
JMX Interface
The Qpid JMX interface level that supports statistics is 1.9. Each object (MBean) that can generate statistics will add the following attributes and operations:
| Operation | Description |
|---|---|
| resetStatistics | Reset the statistics counters for this object, and all children. |
| Attribute | Permissions | Description |
|---|---|---|
| peakMessageDeliveryRate | R | Peak number of messages delivered per second |
| peakDataDeliveryRate | R | Peak number of bytes delivered per second |
| messageDeliveryRate | R | Current number of messages delivered per second |
| dataDeliveryRate | R | Current number of bytes delivered per second |
| totalMessagesDelivered | R | Total number of messages delivered since start or reset |
| totalDataDelivered | R | Total number of bytes delivered since start or reset |
| peakMessageReceiptRate | R | Peak number of messages received per second |
| peakDataReceiptRate | R | Peak number of bytes received per second |
| messageReceiptRate | R | Current number of messages received per second |
| dataReceiptRate | R | Current number of bytes received per second |
| totalMessagesReceived | R | Total number of messages received since start or reset |
| totalDataReceived | R | Total number of bytes received since start or reset |
| statisticsEnabled | R (W Connections only) | Is statistics generation enabled or disabled |
The following MBeans have had statistics attributes added:
- ServerInformation
The entire broker, statistics for all virtualhosts. The totals for the broker will be equal to the sum of the totals for all virtualhosts. Rates for the broker include messages from all virtualhosts, so the broker peak can be higher than individual virtualhosts, and possibly also higher than the sum, due to slight timing differences. This is to be expected.
org.apache.qpid:type=ServerInformation,name=ServerInformation,*
- ManagedBroker
This is just a misleadingly named virtualhost. Note that the sum of all connection statistics will not always equal the statistics for the virtualhost, as connection MBeans are discarded when the connection is closed.
org.apache.qpid:type=VirtualHost.VirtualHostManager,VirtualHost=*,*
- ManagedConnection
Individual connections, which can be selected on a per-virtualhost basis. The name of a connection is usually the Java toString() representation of the address of the remote internet socket making the connection, so it is usually not possible to determine these names in advance. These MBeans allow enabling and disabling of statistics generation.
org.apache.qpid:type=VirtualHost.Connection,VirtualHost=*,name=*
The JMX attributes that record statistics are always present, and will have a value of 0/0.0 if generation is not enabled, and the statisticsEnabled attribute will be set to false.
The `qpid.statistics.disable` system property will override all configuration on the broker. All attributes will be reported as 0/0.0 and JMX will show statistics as being disabled via the statisticsEnabled attribute, although it will still be possible to selectively enable generation on a per-connection basis.
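As an illustration, here is a hedged sketch of reading broker statistics remotely over JMX. The service URL, port (8999) and absence of credentials are assumptions for a typical setup; the ObjectName pattern is the ServerInformation pattern listed above.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads broker-level statistics attributes over JMX. Host, port and the
// lack of credentials are assumptions for illustration only.
public class StatisticsReader {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName pattern = new ObjectName(
                    "org.apache.qpid:type=ServerInformation,name=ServerInformation,*");
            Set<ObjectName> names = mbsc.queryNames(pattern, null);
            for (ObjectName name : names) {
                // Attribute names are those in the table above.
                System.out.println("messageReceiptRate = "
                        + mbsc.getAttribute(name, "messageReceiptRate"));
                System.out.println("totalMessagesReceived = "
                        + mbsc.getAttribute(name, "totalMessagesReceived"));
            }
        }
    }
}
```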
Usage
The JMX JConsole application can be used to view the attributes, both as discrete values and as graphs. Sample output is shown below, illustrating the virtualhost statistics for a broker with a producer and a consumer attached and sending messages.
(Figure: jconsole-statistics-zero.png - Virtualhost statistics attributes at broker startup)
(Figure: jconsole-statistics-graphs.png - Virtualhost statistics graphs and attributes after sending and receiving messages through the broker)
Testing
One caveat related to testing is that these statistics, apart from the totals, are by their nature time-dependent and subject to race conditions: messages sent as part of a test can overlap two sample periods instead of one, causing seemingly incorrect results.
Unit
The `StatisticsCounter` class is suitable for unit testing. Tests checking class creation and operation check the total, rate and peak data, as well as other class properties. It is difficult to write a test that provably shows thread safety when updating the counter with new events, but there are tests to check that out-of-order events will not cause problems with the data generated.
System
System testing covers the generation of statistics using a running broker, as well as the configuration mechanisms, the operation of the JMX interface to the data, and the operational logging output. The following system tests have been written, all using a shared parent test case class, `MessageStatisticsTestCase`, which sets up the JMX connection to the broker.
In order for these JMX-based tests to work correctly with the default test profile, a fix had to be back-ported from trunk which disables the use of a custom `ServerSocketFactory` in the JMX registry. The property that disables this is set only in the `JMXUtils` helper class, and should not be used in normal broker operation.
- `MessageStatisticsTest`
- `MessageStatisticsDeliveryTest`
- `MessageStatisticsReportingTest`
- `MessageConnectionStatisticsTest`
- `MessageStatisticsConfigurationTest`
These tests can be run with the following command, from the systests module directory:
$ ant test -Dtest=Message*Statistics*Test
Performance
To measure performance, a release of 2.6.0.7 was generated, using the standard release mechanism. The data should be compared with the previous [results] for the 2.6.0.6 release. The performance test suite was run on noc-qpiddev02.jpmorganchase.com with the broker running on noc-qpiddev01.jpmorganchase.com. Statistics were enabled on the broker by editing the etc/persistent_config.xml file to set statistics generation to true for the broker, virtualhosts and connections, as well as to report every hour (without resetting the statistics). The Disabled configuration used the provided default configuration file, and the Broker Only configuration used the same settings as the first set of tests, but with the `//statistics/generation/virtualhosts` and `//statistics/generation/connections` elements set to false.
The following command was executed for each of the configurations, and the results are aggregated into the tables below:
$ ./Throughput.sh
Accuracy
The measured standard deviation of these results is 2.2% (about 500) of the mean (about 23000), based on ten sample runs of the TTBT-TX test with no statistics generation enabled. For the TTBT-NA test, a set of 102 test runs (throughput.csv) indicated that the standard deviation was around 0.8% of the mean, with a range between the minimum and maximum result of 3.8% of the mean.
Initial Results
These results use the code from SVN revision r1033077, and were generated on 09 and 10 November 2010.
Throughput Numbers
| Test | Throughput | Throughput (Broker Only) | Throughput (Disabled) |
|---|---|---|---|
| PQBT-AA-Qpid-01 | 2937.3007 | 0 | 3239.986 |
| PQBT-TX-Qpid-01 | 3142.3297 | 0 | 3240.4037 |
| PTBT-AA-Qpid-01 | 3817.1709 | 3897.9127 | 3941.363 |
| PTBT-TX-Qpid-01 | 3971.9589 | 0 | 4087.7404 |
| TQBT-AA-Qpid-01 | 52299.27 | 0 | 53195.957 |
| TQBT-NA-Qpid-01 | 50979.767 | 48090.19 | 51696.26 |
| TQBT-TX-Qpid-01 | 7321.2233 | 0 | 7438.8633 |
| TTBT-AA-Qpid-01 | 86934.654 | 0 | 92742.66 |
| TTBT-NA-Qpid-01 | 103829.22 | 0 | 128478.24 |
| TTBT-TX-Qpid-01 | 21685.606 | 0 | 22985.97 |
There was no obvious difference between broker only and all statistics, so that testing was discontinued.
Comparison with previous release:
| Test | 2.6.0.6 | 2.6.0.7 | Change | 2.6.0.7 (Disabled) | Change (Disabled) |
|---|---|---|---|---|---|
| PQBT-AA-Qpid-01 | 3214.728 | 2937.3007 | -8.63% | 3239.986 | 0.79% |
| PQBT-TX-Qpid-01 | 3224.1871 | 3142.3297 | -2.54% | 3240.4037 | 0.50% |
| PTBT-AA-Qpid-01 | 4069.4785 | 3817.1709 | -6.20% | 3941.363 | -3.15% |
| PTBT-TX-Qpid-01 | 4088.1968 | 3971.9589 | -2.84% | 4087.7404 | -0.1% |
| TQBT-AA-Qpid-01 | 51871.49 | 52299.27 | 0.82% | 53195.957 | 2.55% |
| TQBT-NA-Qpid-01 | 54278.095 | 50979.767 | -6.08% | 51696.26 | -4.76% |
| TQBT-TX-Qpid-01 | 7496.785 | 7321.2233 | -2.34% | 7438.8633 | -0.77% |
| TTBT-AA-Qpid-01 | 94392.1 | 86934.654 | -7.90% | 92742.66 | -1.75% |
| TTBT-NA-Qpid-01 | 129089.54 | 103829.22 | -19.57% | 128478.24 | -0.47% |
| TTBT-TX-Qpid-01 | 23134.289 | 21685.606 | -6.26% | 22985.97 | -0.64% |
| Average | | | -6.84% | | -0.77% |
Analysis
As can be seen, this results in an average reduction in throughput of around 7% with all statistics generation enabled, and less than 1% with it disabled. One anomaly that stands out is the TTBT-NA test (transient, non-acknowledged messages), which shows a nearly 20% decrease in throughput and warrants further investigation. This is obviously too great a penalty, and will require code changes and further investigation and testing.
Further testing with different combinations of broker, virtualhost and connection statistics enabled will need to be carried out to determine where the greatest performance penalty lies. Based on this, there may be changes required to the `StatisticsCounter` class, or to the propagation mechanism that links child and parent counter objects.
It is also important to determine what the maximum acceptable loss of throughput is to a customer. It would seem desirable to have less than a 5% decrease in throughput due to statistics generation; however, it should be noted that a performance penalty is inevitable when adding code to the message delivery and receipt paths in the broker, and the aim is only to minimise this, not to eliminate it.
Further Testing
The 7% performance loss was felt to be unacceptable, so modifications were made to the `StatisticsCounter` code to reduce the time spent carrying out computations. The following changes were made (a sketch follows the list):

- The system property `qpid.statistics.samplePeriod` was changed from 1000 to 2000 and 5000.
- The code in `StatisticsCounter` was modified to defer more calculation to the end of a sample period.
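A sketch of the second change, under the assumption that the counter at this stage still used AtomicLong fields (as the profiling section below found): the hot path does only an atomic add, and the division and peak update are deferred to the sample-period boundary. Field names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of deferring rate computation to the end of the sample period.
// Only the thread that wins the compareAndSet performs the division and
// peak update, once per period; all other events are a single atomic add.
public class DeferredRateCounter {
    private final long samplePeriodMs =
            Long.getLong("qpid.statistics.samplePeriod", 1000L);
    private final AtomicLong periodCount = new AtomicLong();
    private final AtomicLong periodStart =
            new AtomicLong(System.currentTimeMillis());
    private volatile double rate;
    private volatile double peak;

    public void registerEvent(long value, long timestamp) {
        long start = periodStart.get();
        if (timestamp - start >= samplePeriodMs
                && periodStart.compareAndSet(start, timestamp)) {
            long count = periodCount.getAndSet(0);
            rate = count / (samplePeriodMs / 1000.0);
            if (rate > peak) {
                peak = rate; // only the CAS winner updates the peak
            }
        }
        periodCount.addAndGet(value);
    }
}
```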
Throughput Numbers
The same 2.6.0.7 broker was used, with QPID_OPTS set to `-Dqpid.statistics.samplePeriod=N`, where N is 2000 or 5000.
| Test | 2.6.0.6 | 2.6.0.7 (2000) | Change (2000) | 2.6.0.7 (5000) | Change (5000) |
|---|---|---|---|---|---|
| PQBT-AA-Qpid-01 | 3214.728 | 3198.1702 | -0.52% | 2970.78 | -7.59% |
| PQBT-TX-Qpid-01 | 3224.1871 | 3214.087 | -0.31% | 2875.1144 | -10.82% |
| PTBT-AA-Qpid-01 | 4069.4785 | 4030.176 | -0.97% | 3872.7224 | -4.83% |
| PTBT-TX-Qpid-01 | 4088.1968 | 4175.8924 | 2.15% | 3962.1336 | -3.08% |
| TQBT-NA-Qpid-01 | 54278.095 | 52546.314 | -3.19% | 48368.572 | -10.89% |
| TTBT-TX-Qpid-01 | 23134.289 | 22086.33 | -4.53% | 18947.935 | -18.10% |
| Average | | | -1.23% | | -9.22% |
In the following table, Old is an unmodified 2.6.0.7 broker; the New broker has QPID_OPTS set to `-Dqpid.statistics.samplePeriod=5000` and some changes to the `StatisticsCounter` class that move computation to the end of the sample period. All three brokers have QPID_JAVA_MEM set to `-Xmx2g`. Results are the average of three runs.
| TTBT-NA-Qpid-01 | Result | Change |
|---|---|---|
| 2.6.0.6 | 113206 | 0.0% |
| 2.6.0.7 Old | 103077 | -8.9% |
| 2.6.0.7 New 5s | 106202 | -6.1% |
| 2.6.0.7 New 10s | 104196 | -8.0% |
More Tests
Two 5-minute runs of a simple test, producing and consuming messages as fast as possible in a single thread. This was run on an HP DL585 (quad dual-core 2.2 GHz Opteron, 8 GB RAM), hence the difference in reported rates.
- Statistics Configuration Enabled: `$ QPID_OPTS=-Dqpid.statistics.samplePeriod=2500 QPID_JAVA_MEM=-Xmx4g ./qpid-server` - 3792.9 msgs/s
- Statistics Configuration Enabled: `$ QPID_OPTS=-Dqpid.statistics.samplePeriod=500 QPID_JAVA_MEM=-Xmx4g ./qpid-server` - 3961.8 msgs/s
- Statistics Configuration Enabled: `$ QPID_OPTS=-Dqpid.statistics.samplePeriod=30000 QPID_JAVA_MEM=-Xmx4g ./qpid-server` - 3999.3 msgs/s
- Statistics Configuration Enabled: `$ QPID_OPTS=-Dqpid.statistics.disable=true QPID_JAVA_MEM=-Xmx4g ./qpid-server` - 4156.3 msgs/s
- Stock Apache 0.5 Download (http://apache.mirror.anlx.net//qpid/0.5/qpid-java-broker-0.5.tar.gz): `$ QPID_JAVA_MEM=-Xmx4g ./qpid-server` - 4068.3 msgs/s
These results would seem to indicate that the act of generating statistics is causing the slowdown, rather than updating the peak value at the end of a sample period. Additionally, the results for the single-threaded testing are much closer, suggesting that lock contention with many connections is an issue.
Profiling
To properly determine hot-spots in the code, a profiling run was made using the YourKit profiler. The simple test used in the previous section was run with the profiler attached, and CPU statistics were collated. The profiler detected the following hotspots:
- `StatisticsCounter.registerEvent` - 4%
- `AtomicLong.addAndGet` - 3%
- `AtomicLong.compareAndSet` - 2%
Since the `AtomicLong` calls are made inside `registerEvent`, it seems that this is where the CPU is spending the extra time that is causing the slowdown. In order to mitigate this, the `registerEvent` method was changed to be `synchronized` and the internal logic changed to use primitive `long` types and standard arithmetic operations. After this change, the profiler no longer showed `registerEvent` as a hotspot in the code when another test run was made.
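A hedged sketch of the changed method, contrasting with the deferred AtomicLong version shown earlier: plain long fields and ordinary arithmetic guarded by a single monitor. Field names are again illustrative, not the actual implementation.

```java
// Sketch of the synchronized variant of registerEvent: primitive long
// fields and standard arithmetic under one lock, replacing the
// AtomicLong addAndGet/compareAndSet calls found by the profiler.
public class SynchronizedCounter {
    private final long samplePeriodMs =
            Long.getLong("qpid.statistics.samplePeriod", 1000L);
    private long total;
    private long periodCount;
    private long periodStart = System.currentTimeMillis();
    private double rate;
    private double peak;

    public synchronized void registerEvent(long value, long timestamp) {
        if (timestamp - periodStart >= samplePeriodMs) {
            // End of the sample period: compute the rate and peak once.
            rate = periodCount / (samplePeriodMs / 1000.0);
            if (rate > peak) {
                peak = rate;
            }
            periodCount = 0;
            periodStart = timestamp;
        }
        periodCount += value;
        total += value;
    }
}
```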
The updated code was copied to the broker being used for throughput testing, and the TTBT-NA-Qpid-01 test was re-run, with results shown below (again, an average of three runs was taken).
| TTBT-NA-Qpid-01 | Result | Change |
|---|---|---|
| 2.6.0.6 | 113206 | 0.0% |
| 2.6.0.7 Latest | 103698 | -8.3% |
Finally, the single-threaded test was also run against 2.6.0.6 and the latest 2.6.0.7 build, with the following results:
| TTBT-NA-Qpid-01 | Msgs/Second | Change |
|---|---|---|
| 2.6.0.6 | 1714.5 | 0.0% |
| 2.6.0.7 Latest | 1660 | -3.2% |
In order to test this more fully, the script which starts the TTBT-NA test, and the Throughput.sh script itself, were copied and modified. A series of scripts, STAT-XX.sh, was created, consisting of identical calls to the TTBT-NA test apart from the number of consumers, which varied from 01 to 16 and is represented by XX, and a Statistics.sh script was created which calls each of these in turn.
The STAT throughput tests were run against a standard 2.6.0.6 broker and the 2.6.0.7 build with all statistics generation enabled. It is expected that the difference in throughput should increase as the number of consumers increases. Note that these tests were only run once so are not as accurate as the averaged, multi-run datasets.
2.6.0.6 Throughput Numbers
$ QPID_JAVA_MEM=-Xmx2g ./qpid-server -c ../etc/persistent_config.xml
| Test | Throughput |
|---|---|
| STAT-01 | 57415.543 |
| STAT-02 | 92730.82 |
| STAT-03 | 105619.415 |
| STAT-04 | 116192.78 |
| STAT-05 | 119669.87 |
| STAT-06 | 122700.195 |
| STAT-07 | 111541.725 |
| STAT-08 | 126887.405 |
| STAT-09 | 110207.38 |
| STAT-10 | 122272.66 |
| STAT-11 | 123028.33 |
| STAT-12 | 127724.23 |
| STAT-13 | 114467.72 |
| STAT-14 | 128505.86 |
| STAT-15 | 116783.16 |
| STAT-16 | 127264.14 |
2.6.0.7 Throughput Numbers
$ QPID_OPTS=-Dqpid.statistics.samplePeriod=5000 QPID_JAVA_MEM=-Xmx2g ./qpid-server -c ../etc/persistent_config.xml
| Test | Throughput | Change |
|---|---|---|
| STAT-01 | 57101.006 | -0.5% |
| STAT-02 | 83016.304 | -10.5% |
| STAT-03 | 92533.17 | -12.4% |
| STAT-04 | 99060.04 | -14.8% |
| STAT-05 | 103271.49 | -13.7% |
| STAT-06 | 103339.02 | -15.8% |
| STAT-07 | 103619.804 | -7.1% |
| STAT-08 | 105613.2 | -16.8% |
| STAT-09 | 104110.99 | -5.5% |
| STAT-10 | 107108.84 | -12.4% |
| STAT-11 | 106436.25 | -13.5% |
| STAT-12 | 108537.865 | -15.0% |
| STAT-13 | 104056.22 | -9.1% |
| STAT-14 | 108337.49 | -15.7% |
| STAT-15 | 106900.96 | -8.5% |
| STAT-16 | 109222.84 | -14.2% |
Although it is difficult to draw any real or meaningful conclusions from these results, note that single consumer performance degradation is negligible.
A useful further test would be to run the 2.6.0.7 broker with only connection-level statistics being generated. This ought to remove the contention for the lock on the broker and virtualhost counters when many consumers are being delivered to.
2.6.0.7 Throughput Numbers
$ QPID_OPTS=-Dqpid.statistics.samplePeriod=5000 QPID_JAVA_MEM=-Xmx2g ./qpid-server -c ../etc/connection_config.xml
| Test | Throughput | Change |
|---|---|---|
| STAT-01 | 55884.613 | -2.7% |
| STAT-02 | 91857.41 | -0.9% |
| STAT-07 | 114754.22 | 2.9% |
| STAT-08 | 117917.595 | -7.1% |
| STAT-09 | 114506.71 | 3.9% |
| STAT-10 | 122330.33 | 0.0% |
| STAT-11 | 118661.09 | -3.5% |
| STAT-13 | 116632.744 | 1.9% |
| STAT-15 | 118917.79 | 1.8% |
| STAT-16 | 120704.68 | -5.2% |
Conclusion?
It would appear that there is a penalty of a few percent (up to 4%) for a single thread, rising to over 10% when there are many threads. Further testing will be required to determine how the number of connections (i.e. threads) causes the performance to vary. This is, of course, hampered by the fact that the throughput tests exhibit up to 4% variability between the minimum and maximum results for a particular test.
Additionally, it is unclear which of the two mechanisms used in the counter (`AtomicLong` versus a synchronized `registerEvent` method) is preferable. For the many-threaded, multi-consumer/producer case the `AtomicLong` seems the better choice; for a single producer/consumer, the synchronized `registerEvent` method is better.
Another point to note is that the actual per-message latency increase is quite small; the degradation only comes into play when sending large numbers of messages over many connections, and when the messages themselves are small. In other words, the data throughput does not concern us, it is purely the number of messages. This means that for most applications there should not be a noticeable difference in performance, with any small change in per-message latency being lost in the noise.
Another point to note is that in these tests garbage collection
Further Modifications
This data was generated after modifying the broker code to place statistics recording for message delivery after the message is enqueued at the broker (or at least, after an attempt has been made to do so).
| Test | 2.6.0.7 | 2.6.0.6 | 2.6.0.7 | 2.6.0.7 | 2.6.0.7 Disabled |
|---|---|---|---|---|---|
| PQBT-AA-Qpid-01 | 2777.4448 | 2696.0464 | 2702.624 | 2815.1379 | 2730.4008 |
| PQBT-TX-Qpid-01 | 3206.5027 | 3147.2404 | 3179.9607 | 2701.3173 | 3148.375 |
| PTBT-AA-Qpid-01 | 4014.191 | 3977.7153 | 3902.3087 | 3983.5536 | 3996.9091 |
| PTBT-TX-Qpid-01 | 4023.8338 | 3996.622 | 4079.2136 | 3694.2487 | 4006.0062 |
| TQBT-AA-Qpid-01 | 47845.87 | 51156.82 | 52660.16 | 45339.924 | 51316.2 |
| TQBT-NA-Qpid-01 | 48345.562 | 47846.085 | 48281.647 | 49123.188 | 47374.016 |
| TQBT-TX-Qpid-01 | 6301.3253 | 7262.565 | 7209.0416 | 6326.9844 | 7234.8924 |
| TTBT-AA-Qpid-01 | 78419.44 | 89033.62 | 79150.925 | 80203.896 | 90342.84 |
| TTBT-NA-Qpid-01 | 97468.78 | 110630.585 | 97623.314 | 98914.57 | 120317.604 |
| TTBT-TX-Qpid-01 | 18498.884 | 22650.204 | 21379.168 | 18915.474 | 22702.417 |
As a cross-check, the tests were repeated using a stock 2.6.0.6 broker, then again (twice, in fact) with the 2.6.0.7 broker as in the first run, in case any environmental factors were skewing the results, and finally with the 2.6.0.7 broker but with statistics generation disabled.
These results show that the persistent tests produce results that are about the same for both versions, whether statistics are enabled or disabled. This is due to the naturally lower throughput causing less contention and allowing the statistics generation to run unimpeded. The transient tests get worse with statistics generation enabled, due to the much higher message throughput and the fact that many messages are trying to update the counters at the same time.
Latency
As a general recommendation, I would suggest that statistics generation only be enabled in cases of low message traffic (count, not size), where the broker is not being 100% utilised in terms of CPU. Persistent messaging is necessarily I/O bound in terms of disk access, and this is usually the determining factor in message latency, so adding the statistics counters places no large additional burden and will not add much to the latency, certainly less than 5% degradation. It is obvious that any additional processing that takes place during the delivery or receipt of a message will inevitably add latency and decrease throughput on the broker. If a broker is already operating at peak performance, something that is only possible with transient messages, which are dependent only on the amount of CPU and RAM available, then enabling the statistics counters will have a greater effect, possibly degrading performance by over 10%.
Tasks
- Create generic data capture utility for gathering stats on received/delivered messages inc current/peak messages/sec and rolling total messages inc message count and bytes (payload only) stats, inc unit tests. Requires further discussion.
- Configure stats capture (on/off) for broker (config.xml - new section) and per connection (jmx) and data extract dump interval, and code to dump stats data
- Expose stats via JMX MBean attributes for query by users (notification - reqs to be confirmed with client), add reset feature via JMX and tests for these features. JMX changes to expose new stats attributes and reset function, inc tests
- Performance test to measure overhead of stats calculation (see RG email/example class) - separate from perf test harness, small class to measure calc cost only
QPID-2932 Add statistics generation for broker message delivery
QPID-???? Port to trunk