Oracle AQ JMS Performance
AQ is Oracle’s message queue implementation. Well, one of them. It supports JMS 1.1 and is included in all versions of the Oracle database, even the free version. It has been battle-tested for twenty years and last time I checked (don’t take my word for it) it required no extra license. What’s not to like?
AQ uses normal database constructs such as tables and SQL commands. That has several advantages, not least that the normal database performance tuning tools and methods can be used. Plus it makes it very easy to instrument and manipulate queues programmatically.
For the most part, tuning AQ is nothing special. However, there are a few dark corners. Stay tuned.
First of all we need a user with AQ privileges and quota on a tablespace:
create user aqtest identified by whatever quota unlimited on users default tablespace users; grant aq_administrator_role to aqtest; grant create session to aqtest;
With a user in place we can create a queue table and a queue (as aqtest):
begin dbms_aqadm.create_queue_table( queue_table => 'test_qtab', queue_payload_type => 'sys.aq$_jms_message', storage_clause => 'lob (user_data.bytes_lob) store as securefile ' || '(retention none cache) ' || 'lob (user_data.text_lob) store as securefile ' || '(retention none cache) ' || 'opaque type user_prop store as securefile ' || 'lob (retention none cache)'); dbms_aqadm.create_queue( queue_name => 'test_queue', queue_table => 'test_qtab', max_retries => 1, retry_delay => 30, retention_time => 0); dbms_aqadm.start_queue (queue_name => 'test_queue'); end; /
Note the storage clause. It uses securefile (after all we’re in 2016 now), retention none as a message is read exactly once most of the time and cache as most messages are read and deleted almost immediately when posted. Keeping the data in memory makes sense.
Oracle can use many different block sizes. The default block size is typically 8k. Write-heavy applications can often benefit from smaller block sizes, as that reduces contention. The smallest reasonable block size for AQ is 4k:
alter system set db_4k_cache_size=100M scope=both; create tablespace users4k datafile '/oradata/orcl/users4k.dbf' size 100M autoextend on next 5M extent management local segment space management auto; alter user aqtest quota unlimited on users4k;
Recreate the queue:
begin dbms_aqadm.stop_queue (queue_name => 'test_queue'); dbms_aqadm.drop_queue (queue_name => 'test_queue'); dbms_aqadm.drop_queue_table (queue_table => 'test_qtab'); dbms_aqadm.create_queue_table( queue_table => 'test_qtab', queue_payload_type => 'sys.aq$_jms_message', storage_clause => 'tablespace users4k ' || 'lob (user_data.bytes_lob) store as securefile ' || '(retention none cache) ' || 'lob (user_data.text_lob) store as securefile ' || '(retention none cache) ' || 'opaque type user_prop store as securefile ' || 'lob (retention none cache)'); dbms_aqadm.create_queue( queue_name => 'test_queue', queue_table => 'test_qtab', max_retries => 1, retry_delay => 30, retention_time => 0); dbms_aqadm.start_queue (queue_name => 'test_queue'); end; /
AQ can store the message payload inline (i.e. in the same row as the metadata) or separately. However, it will only be stored inline if the size of the payload is less than about 4000 bytes. All message queue implementations work best with small messages, but here a small difference in size can theoretically have a significant impact. In practice I haven’t seen any major effects though, up to 3% in some benchmarks and none in others. As usual in engineering it all depends – where is the bottleneck?
If the application uses larger messages, the tablespace for the payload can use a block size optimized for the average message size, making it more likely that the whole message fits into a single block.
It is possible to compress the payload and the user properties and by all means the metadata. That costs CPU, but on the other hand it reduces the storage requirements and perhaps it makes the difference between using one or two blocks for a message?
It is very easy to add. Simply tack on a compress clause when creating the queue table:
dbms_aqadm.create_queue_table( queue_table => 'test_qtab', queue_payload_type => 'sys.aq$_jms_message', storage_clause => 'tablespace users4k ' || 'lob (user_data.bytes_lob) store as securefile ' || ' (retention none cache compress low) ' || 'lob (user_data.text_lob) store as securefile ' || ' (retention none cache compress low) ' || 'opaque type user_prop store as securefile lob ' || ' (retention none cache compress low)');
Note that this almost certainly requires an extra license.
AQ JMS uses aqapi.jar and Oracle’s JDBC driver. While most versions are likely to work there can be a tremendous difference between the ancient versions and the latest ones. Make sure that the drivers are current and preferably matched (i.e. for the same target database)!
The JDBC driver can be downloaded from Oracle, but as far as I know aqapi.jar ships with the database and with Oracle’s application servers. Note that the database ships with two versions, one that can be used standalone and one that is intended for WebLogic or Oracle Application Server! Use the correct one.
The way AQ JMS handles a receive timeout can be a real killer for applications that need to use many threads processing messages from the same queue. That equates to most Java EE applications.
When a client calls receive a single SELECT is issued in order to check if there are any messages on the queue. If there are no messages the client goes to sleep. If a message arrives the client wakes up, issues the same SELECT again and ideally consumes the message.
Unfortunately if there are 100 clients waiting for a message on a queue Oracle wakes them all when a message arrives. They will compete for it. One will succeed, the other 99 will fail. In this case the database needs to process 100 simultaneous SELECT statements and only one actually returns any data. The other 99 represent wasted resources. This can really kill the database, as it consumes large amounts of CPU.
A good test case for this is to spin up 100 consumer threads and one producer thread; then watch the load on the database and the top SQL. There should be an easy to find query similar to:
select /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */ tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay, tab.expiration ,tab.retry_count, tab.exception_qschema, tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no, tab.sender_name, tab.sender_address, tab.sender_protocol, tab.dequeue_msgid, tab.user_prop, tab.user_data from "AQTEST"."TEST_QUEUE_TAB" tab where q_name = :1 and (state = :2 ) order by q_name, state, enq_time, step_no, chain_no, local_order_no for update skip locked
Check the statistics in Enterprise Manager or with SQL:
select rows_processed / executions rows_per_execution from v$sqlarea where sql_text like 'select /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */ tab.rowid,%';
The number of rows returned per execution is very low, less than 0.01. The CPU load on the other hand is substantial.
The simple solution is to keep the number of threads down, but that is seldom possible. A more realistic alternative is to sleep on the client side. Use receiveNoWait instead of receive and if the method returns without a message, sleep for a short time in Java before the next attempt. At peak load all threads will get messages so no time is wasted sleeping and when there is less work available most threads will spend their time sleeping in the application server, not performing DOS attacks on the database. Ideally the number of listeners should ramp up and down based on traffic as well.
In JMS a destination can be looked up with JNDI, or it can be created dynamically. For example:
Destination testQueue = session.createQueue("test_queue");
This is convenient, but there is a price to pay. Every time the method is called it issues a SELECT in order to find the destination. An application that creates a dynamic destination when it posts a message will do it for every single message. That adds up.
See https://www.javacodegeeks.com/2013/04/jms-and-spring-small-things-sometimes-matter.html for a more in-depth discussion.
What to do? Fortunately Destination is thread-safe, so it can be cached. With Spring the JndiDestinationResolver can be used with cache=true and fallbackToDynamicDestination=true. It will fail to make a JNDI lookup, do the dynamic lookup and then cache the result. Without Spring, use an application cache, for example a ConcurrentHashMap.
An application server that processes messages from the same queue in parallel using multiple threads really can’t guarantee that the messages are processed in order. Even if they are delivered in the same order as they were sent, they will be processed in parallel and will complete in non-deterministic order. However, by default Oracle ensures that messages on a queue are delivered in order. That can be a bit expensive, in particular if the application is using selectors.
Set the system property oracle.jms.orderWithSelector=false in order to cut corners here. This is unlikely to give a large boost, but unless ordered delivery is required it is a quick hit.
Tuning AQ is mostly about tuning the database. It is also about tuning the Java side and finding where the bottlenecks are. In other words it is fun! Oracle 12c comes with support for sharded queues – I haven’t had the opportunity to test them in a real-world scenario yet, but I look forward to that as they offer horizontal scaling for a single queue with RAC. I’ll be back.