Archive

Archive for the ‘Performance’ Category

Enable JCA resource adapter statistics in JBoss EAP 6

Performance tuning is guesswork without good instrumentation. Knowing the bottleneck is at least half the battle. Thus it is always nice to graph connection pool statistics for databases and resource adapters. With EAP 6, statistics can be enabled in the connection-definition element, for example:


<connection-definition
      class-name="org.apache.activemq.ra.ActiveMQManagedConnectionFactory"
      jndi-name="java:/QCF" enabled="true" pool-name="QCF"
      statistics-enabled="true"
      >

Unfortunately it doesn’t work, most metrics (such as InUseCount) are reported as 0. Why? Apparently this is a bug that has been fixed in EAP 7. However, for 6.4.8+ statistics can be enabled by setting the system property -Dorg.jboss.as.connector.deployers.ra.enablePoolStatistics=true. Do it now…

Categories: Java, Performance

JBoss EAP 6 and CACHE_CONSUMER with Spring DMLC

The documentation for Spring’s DefaultMessageListenerContainer states:

Some Java EE servers only register their JMS resources with an ongoing XA transaction in case of a freshly obtained JMS Connection and Session.

Rumour has it that JBoss EAP is one of the guilty servers, but is it true? We couldn’t find any definitive answers for the recent versions. But yes, caching consumers with JBoss EAP 6 using XA transactions will fail. The application server only enlists with the active transaction when a connection is checked out from the pool. If the consumer is reused for other transactions that will seem to work (no error messages), but in fact the messages are consumed without transactions. If a rollback occurs messages are lost.

In short, be sure to use CACHE_NONE with JBoss EAP 6.

Categories: Java, Performance

Free toolset for JMS tests released

In 1996 I worked with IBM MQ Series using OS/2 and CICS. That was the first time I worked with performance tuning for messaging. Since then I have been involved in many projects with the same basic goal: make sure it works correctly (no lost messages under any circumstances) and make it fast. It has never been my full-time job, but as a consultant I have been there and done that a few times now. The more memorable have been webMethods, AQ, HornetQ, ActiveMQ/AMQ and (again) AQ. Every time I have ended up writing my own tools. Yes, every time. So, I’ll do it again, but this time I’ll release the code and make it work for as many JMS providers as possible. Or at least the ones I need.

You can find the project, now in 1.0, at GitHub. If you find it useful, please let me know. One major benefit of making the code open source is that the vendors always need to reproduce any issues on their own. With an open source tool that is easier. A widely used tool is even better, as it has some credibility from the start, so spread the word (and the code)!

If I have time during the summer I will write a series of posts on how to use the tools. Let me know if there is anything in particular I should cover!

Categories: Java, Performance

Oracle AQ JMS Performance

Introduction

AQ is Oracle’s message queue implementation. Well, one of them. It supports JMS 1.1 and is included in all versions of the Oracle database, even the free version. It has been battle-tested for twenty years and last time I checked (don’t take my word for it) it required no extra license. What’s not to like?

AQ uses normal database constructs such as tables and SQL commands. That has several advantages, not least that the normal database performance tuning tools and methods can be used. Plus it makes it very easy to instrument and manipulate queues programmatically.

For the most part, tuning AQ is nothing special. However, there are a few dark corners. Stay tuned.

Basic configuration

First of all we need a user with AQ privileges and quota on a tablespace:


create user aqtest identified by whatever
  quota unlimited on users default tablespace users;
grant aq_administrator_role to aqtest;
grant create session to aqtest;

With a user in place we can create a queue table and a queue (as aqtest):


begin
  dbms_aqadm.create_queue_table(
    queue_table        => 'test_qtab',
    queue_payload_type => 'sys.aq$_jms_message',
    storage_clause     =>
    'lob (user_data.bytes_lob) store as securefile ' ||
    '(retention none cache) ' ||
    'lob (user_data.text_lob) store as securefile  ' ||
    '(retention none cache) '  ||
    'opaque type user_prop store as securefile ' ||
    'lob (retention none cache)');
  dbms_aqadm.create_queue(
    queue_name             => 'test_queue',
    queue_table            => 'test_qtab',
    max_retries            => 1,
    retry_delay            => 30,
    retention_time         => 0);
  dbms_aqadm.start_queue (queue_name => 'test_queue');
end;
/

Note the storage clause. It uses securefile (after all we’re in 2016 now), retention none as a message is read exactly once most of the time and cache as most messages are read and deleted almost immediately when posted. Keeping the data in memory makes sense.

Block sizes

Oracle can use many different block sizes. The default block size is typically 8k. Write-heavy applications can often benefit from smaller block sizes, as that reduces contention. The smallest reasonable block size for AQ is 4k:


alter system set db_4k_cache_size=100M scope=both;
create tablespace users4k datafile '/oradata/orcl/users4k.dbf'
  size 100M autoextend on next 5M extent management local
  segment space management auto;
alter user aqtest quota unlimited on users4k;

Recreate the queue:


begin
  dbms_aqadm.stop_queue (queue_name => 'test_queue');
  dbms_aqadm.drop_queue (queue_name => 'test_queue');
  dbms_aqadm.drop_queue_table (queue_table => 'test_qtab');

  dbms_aqadm.create_queue_table(
    queue_table        => 'test_qtab',
    queue_payload_type => 'sys.aq$_jms_message',
    storage_clause     => 'tablespace users4k ' ||
    'lob (user_data.bytes_lob) store as securefile ' ||
    '(retention none cache) ' ||
    'lob (user_data.text_lob) store as securefile  ' ||
    '(retention none cache) '  ||
    'opaque type user_prop store as securefile ' ||
    'lob (retention none cache)');
  dbms_aqadm.create_queue(
    queue_name             => 'test_queue',
    queue_table            => 'test_qtab',
    max_retries            => 1,
    retry_delay            => 30,
    retention_time         => 0);
  dbms_aqadm.start_queue (queue_name => 'test_queue');
end;
/

AQ can store the message payload inline (i.e. in the same row as the metadata) or separately. However, it will only be stored inline if the size of the payload is less than about 4000 bytes. All message queue implementations work best with small messages, but here a small difference in size can theoretically have a significant impact. In practice I haven’t seen any major effects though, up to 3% in some benchmarks and none in others. As usual in engineering it all depends – where is the bottleneck?

If the application uses larger messages, the tablespace for the payload can use a block size optimized for the average message size, making it more likely that the whole message fits into a single block.

Block compression

It is possible to compress the payload and the user properties and by all means the metadata. That costs CPU, but on the other hand it reduces the storage requirements and perhaps it makes the difference between using one or two blocks for a message?

It is very easy to add. Simply tack on a compress clause when creating the queue table:


dbms_aqadm.create_queue_table(
  queue_table        => 'test_qtab',
  queue_payload_type => 'sys.aq$_jms_message',
  storage_clause     => 'tablespace users4k '    ||
  'lob (user_data.bytes_lob) store as securefile '  ||
  '  (retention none cache compress low) '          ||
  'lob (user_data.text_lob) store as securefile '   ||
  '  (retention none cache compress low) '          ||
  'opaque type user_prop store as securefile lob '  ||
  '  (retention none cache compress low)');

Note that this almost certainly requires an extra license.

Driver versions

AQ JMS uses aqapi.jar and Oracle’s JDBC driver. While most versions are likely to work there can be a tremendous difference between the ancient versions and the latest ones. Make sure that the drivers are current and preferably matched (i.e. for the same target database)!

The JDBC driver can be downloaded from Oracle, but as far as I know aqapi.jar ships with the database and with Oracle’s application servers. Note that the database ships with two versions, one that can be used standalone and one that is intended for WebLogic or Oracle Application Server! Use the correct one.

Receive timeout

The way AQ JMS handles a receive timeout can be a real killer for applications that need to use many threads processing messages from the same queue. That equates to most Java EE applications.

When a client calls receive a single SELECT is issued in order to check if there are any messages on the queue. If there are no messages the client goes to sleep. If a message arrives the client wakes up, issues the same SELECT again and ideally consumes the message.

Unfortunately if there are 100 clients waiting for a message on a queue Oracle wakes them all when a message arrives. They will compete for it. One will succeed, the other 99 will fail. In this case the database needs to process 100 simultaneous SELECT statements and only one actually returns any data. The other 99 represent wasted resources. This can really kill the database, as it consumes large amounts of CPU.

A good test case for this is to spin up 100 consumer threads and one producer thread; then watch the load on the database and the top SQL. There should be an easy to find query similar to:

select  /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */   tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay,   tab.expiration ,tab.retry_count, tab.exception_qschema,   tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time,   tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no,   tab.sender_name, tab.sender_address, tab.sender_protocol,   tab.dequeue_msgid, tab.user_prop, tab.user_data   from "AQTEST"."TEST_QUEUE_TAB" tab  where q_name = :1 and (state = :2  )  order by q_name, state, enq_time, step_no, chain_no, local_order_no for update skip locked

Check the statistics in Enterprise Manager or with SQL:


select rows_processed / executions rows_per_execution
from v$sqlarea where sql_text like
'select  /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */   tab.rowid,%';

The number of rows returned per execution is very low, less than 0.01. The CPU load on the other hand is substantial.

The simple solution is to keep the number of threads down, but that is seldom possible. A more realistic alternative is to sleep on the client side. Use receiveNoWait instead of receive and if the method returns without a message, sleep for a short time in Java before the next attempt. At peak load all threads will get messages so no time is wasted sleeping and when there is less work available most threads will spend their time sleeping in the application server, not performing DOS attacks on the database. Ideally the number of listeners should ramp up and down based on traffic as well.

Dynamic destinations

In JMS a destination can be looked up with JNDI, or it can be created dynamically. For example:


Destination testQueue = session.createQueue("test_queue");

This is convenient, but there is a price to pay. Every time the method is called it issues a SELECT in order to find the destination. An application that creates a dynamic destination when it posts a message will do it for every single message. That adds up.

See https://www.javacodegeeks.com/2013/04/jms-and-spring-small-things-sometimes-matter.html for a more in-depth discussion.

What to do? Fortunately Destination is thread-safe, so it can be cached. With Spring the JndiDestinationResolver can be used with cache=true and fallbackToDynamicDestination=true. It will fail to make a JNDI lookup, do the dynamic lookup and then cache the result. Without Spring, use an application cache, for example a ConcurrentHashMap.

Ordered delivery

An application server that processes messages from the same queue in parallel using multiple threads really can’t guarantee that the messages are processed in order. Even if they are delivered in the same order as they were sent, they will be processed in parallel and will complete in non-deterministic order. However, by default Oracle ensures that messages on a queue are delivered in order. That can be a bit expensive, in particular if the application is using selectors.

Set the system property oracle.jms.orderWithSelector=false in order to cut corners here. This is unlikely to give a large boost, but unless ordered delivery is required it is a quick hit.

Final words

Tuning AQ is mostly about tuning the database. It is also about tuning the Java side and finding where the bottlenecks are. In other words it is fun! Oracle 12c comes with support for sharded queues – I haven’t had the opportunity to test them in a real-world scenario yet, but I look forward to that as they offer horizontal scaling for a single queue with RAC. I’ll be back.

Categories: Java, Oracle, Performance

Oracle JDBC memory settings

Many applications use ancient versions of Oracle’s JDBC drivers, as they are downloaded manually and seldom upgraded. That is a pity as the newer drivers offer much better performance. However, some of the performance gains are bought with increased memory consumption and that can be a problem.

We ran into an issue with the connection pool in JBoss. It uses connections in round-robin, so as long as there is some load the pool tends to stay at peak size. Each connection normally keeps a buffer cache and it may cache other things as well. The size of the connections would grow over time, eventually consuming most of the heap.

A JBoss-specific workaround is to use the cli and close the idle connections (off-peak):


/subsystem=datasources/xa-data-source=DS/:flush-idle-connection-in-pool

DS is the name of the data source. A better approach may be to set the system property oracle.jdbc.useThreadLocalBufferCache to true. That moves the buffer cache from the connections to thread locals. Depending on how the application behaves that may be better as the memory is reclaimed when a thread dies. Plus a given thread may be more likely to issue the same SQL multiple times and can thus benefit more from the cache.

It may also be useful to limit the maximum buffer size with oracle.jdbc.maxCachedBufferSize. The implicit statement cache is normally off, but if it is enabled the size can be controlled using oracle.jdbc.implicitStatementCacheSize and oracle.jdbc.freeMemoryOnEnterImplicitCache can force buffers allocated for a statement to be released when it is put into the cache. In most cases it is best to leave those options alone.

See Oracle JDBC Memory Management for the whole story!

Categories: Java, Oracle, Performance

JMeter with Graphite

JMeter, probably the most popular open-source tool for load testing, can now report test results to Graphite! The Graphite listener is not part of the latest release (2.12), but it is included in the nightly build and it works quite well. It can report summary statistics for the entire test run or statistics for specific samplers with successes and failure counts, percentiles and so on. Highly recommended, in particular if the system under test also reports statistics to Graphite!

Categories: Performance