
Archive for the ‘Database’ Category

Built-in timestamp for row changes in Oracle

I tend to include create and last change timestamps in most tables when I build something, but how can you tell when a record was changed if those columns are missing or cannot be trusted (for example when they are maintained by application code rather than triggers and someone has updated the database directly)? It turns out that Oracle has a nifty feature for this: the ora_rowscn pseudo-column. It reports the system change number for a row or block, and that can be converted into a timestamp. For example:


create table test_row_scn (
  t_id number,
  constraint pk_test_row_scn primary key (t_id)
) rowdependencies;
insert into test_row_scn values (1);
commit;

select scn_to_timestamp(ora_rowscn) from test_row_scn where t_id = 1;

The rowdependencies option makes the table track the system change number for each row, at a cost of 6 bytes per row. Without it the query still works, but it returns the system change number for the most recent change to the block rather than to the individual row, which can be misleading.

Categories: Oracle

Read the Oracle alert log using SQL

In 11g (yes, that was ages ago) Oracle added x$dbgalertext, making it possible to query the alert log using SQL. For people like me who work close to the database, but often without true DBA access and seldom with access to the physical servers, that is a godsend. Being able to see the alerts without asking really helps. For example, to get all the reported ORA errors for the last hour:


select originating_timestamp, message_text
  from x$dbgalertext
  where originating_timestamp > sysdate - 1/24
  and message_text like '%ORA-%'
  order by originating_timestamp desc;

Talking a DBA into granting select is much easier than getting access to the server.
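
X$ tables cannot be granted on directly, so the usual arrangement (a sketch of mine, not from the original post; the view name and grantee are placeholders) is to have the DBA wrap the table in a view owned by SYS and grant select on that:

-- run as SYS
create or replace view alert_log_v as
  select originating_timestamp, message_text
    from x$dbgalertext;

-- some_user is a placeholder for the account that should be able to read the log
grant select on alert_log_v to some_user;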

Categories: Oracle

Oracle AQ JMS Performance

Introduction

AQ is Oracle’s message queue implementation. Well, one of them. It supports JMS 1.1 and is included in all versions of the Oracle database, even the free version. It has been battle-tested for twenty years and last time I checked (don’t take my word for it) it required no extra license. What’s not to like?

AQ uses normal database constructs such as tables and SQL commands. That has several advantages, not least that the normal database performance tuning tools and methods can be used. Plus it makes it very easy to instrument and manipulate queues programmatically.

For the most part, tuning AQ is nothing special. However, there are a few dark corners. Stay tuned.

Basic configuration

First of all we need a user with AQ privileges and quota on a tablespace:


create user aqtest identified by whatever
  quota unlimited on users default tablespace users;
grant aq_administrator_role to aqtest;
grant create session to aqtest;

With a user in place we can create a queue table and a queue (as aqtest):


begin
  dbms_aqadm.create_queue_table(
    queue_table        => 'test_qtab',
    queue_payload_type => 'sys.aq$_jms_message',
    storage_clause     =>
    'lob (user_data.bytes_lob) store as securefile ' ||
    '(retention none cache) ' ||
    'lob (user_data.text_lob) store as securefile  ' ||
    '(retention none cache) '  ||
    'opaque type user_prop store as securefile ' ||
    'lob (retention none cache)');
  dbms_aqadm.create_queue(
    queue_name             => 'test_queue',
    queue_table            => 'test_qtab',
    max_retries            => 1,
    retry_delay            => 30,
    retention_time         => 0);
  dbms_aqadm.start_queue (queue_name => 'test_queue');
end;
/

Note the storage clause. It uses securefile (after all, we're in 2016 now), retention none because a message is normally read exactly once, and cache because most messages are consumed and deleted almost immediately after they are posted. Keeping the data in memory makes sense.
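
With the queue in place it is easy to see what "normal database constructs" buys us. For example, the backlog per state can be checked with a plain query against the aq$ view that Oracle generates for the queue table (a quick sketch):

select queue, msg_state, count(*) as msg_count
  from aq$test_qtab
 group by queue, msg_state
 order by queue, msg_state;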

Block sizes

Oracle can use many different block sizes. The default block size is typically 8k. Write-heavy applications can often benefit from smaller block sizes, as that reduces contention. The smallest reasonable block size for AQ is 4k:


alter system set db_4k_cache_size=100M scope=both;
create tablespace users4k datafile '/oradata/orcl/users4k.dbf'
  size 100M autoextend on next 5M blocksize 4k
  extent management local segment space management auto;
alter user aqtest quota unlimited on users4k;

Recreate the queue:


begin
  dbms_aqadm.stop_queue (queue_name => 'test_queue');
  dbms_aqadm.drop_queue (queue_name => 'test_queue');
  dbms_aqadm.drop_queue_table (queue_table => 'test_qtab');

  dbms_aqadm.create_queue_table(
    queue_table        => 'test_qtab',
    queue_payload_type => 'sys.aq$_jms_message',
    storage_clause     => 'tablespace users4k ' ||
    'lob (user_data.bytes_lob) store as securefile ' ||
    '(retention none cache) ' ||
    'lob (user_data.text_lob) store as securefile  ' ||
    '(retention none cache) '  ||
    'opaque type user_prop store as securefile ' ||
    'lob (retention none cache)');
  dbms_aqadm.create_queue(
    queue_name             => 'test_queue',
    queue_table            => 'test_qtab',
    max_retries            => 1,
    retry_delay            => 30,
    retention_time         => 0);
  dbms_aqadm.start_queue (queue_name => 'test_queue');
end;
/

AQ can store the message payload inline (i.e. in the same row as the metadata) or separately. However, it will only be stored inline if the size of the payload is less than about 4000 bytes. All message queue implementations work best with small messages, but here a small difference in size can theoretically have a significant impact. In practice I haven’t seen any major effects though, up to 3% in some benchmarks and none in others. As usual in engineering it all depends – where is the bottleneck?

If the application uses larger messages, the tablespace for the payload can use a block size optimized for the average message size, making it more likely that the whole message fits into a single block.
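
Choosing a block size is easier with a feel for the actual payload sizes. A rough way to check, against the queue table created above (a sketch; it only measures the two LOB attributes named in the storage clause, and the CLOB length is reported in characters, so treat the numbers as approximate):

-- run while the queue has a backlog
select count(*) as msg_count,
       round(avg(nvl(dbms_lob.getlength(qt.user_data.text_lob), 0)
               + nvl(dbms_lob.getlength(qt.user_data.bytes_lob), 0))) as avg_payload,
       max(nvl(dbms_lob.getlength(qt.user_data.text_lob), 0)
         + nvl(dbms_lob.getlength(qt.user_data.bytes_lob), 0)) as max_payload
  from test_qtab qt;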

Block compression

It is possible to compress the payload and the user properties, and for that matter the metadata as well. That costs CPU, but it reduces the storage requirements and may make the difference between a message using one block or two.

It is very easy to add. Simply tack on a compress clause when creating the queue table:


begin
  dbms_aqadm.create_queue_table(
    queue_table        => 'test_qtab',
    queue_payload_type => 'sys.aq$_jms_message',
    storage_clause     => 'tablespace users4k '    ||
    'lob (user_data.bytes_lob) store as securefile '  ||
    '  (retention none cache compress low) '          ||
    'lob (user_data.text_lob) store as securefile '   ||
    '  (retention none cache compress low) '          ||
    'opaque type user_prop store as securefile lob '  ||
    '  (retention none cache compress low)');
end;
/

Note that securefile compression almost certainly requires an extra license (the Advanced Compression option).

Driver versions

AQ JMS uses aqapi.jar and Oracle’s JDBC driver. While most versions are likely to work there can be a tremendous difference between the ancient versions and the latest ones. Make sure that the drivers are current and preferably matched (i.e. for the same target database)!

The JDBC driver can be downloaded from Oracle, but as far as I know aqapi.jar ships with the database and with Oracle’s application servers. Note that the database ships with two versions, one that can be used standalone and one that is intended for WebLogic or Oracle Application Server! Use the correct one.

Receive timeout

The way AQ JMS handles a receive timeout can be a real killer for applications that need many threads processing messages from the same queue. That equates to most Java EE applications.

When a client calls receive, a single SELECT is issued to check if there are any messages on the queue. If there are no messages the client goes to sleep. If a message arrives the client wakes up, issues the same SELECT again and ideally consumes the message.

Unfortunately, if there are 100 clients waiting for a message on a queue, Oracle wakes them all when a message arrives. They all compete for it: one will succeed, the other 99 will fail. The database then has to process 100 simultaneous SELECT statements of which only one actually returns any data; the other 99 are wasted work. This can really kill the database, as it consumes large amounts of CPU.

A good test case for this is to spin up 100 consumer threads and one producer thread, then watch the load on the database and the top SQL. There should be an easy-to-find query similar to:

select /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */
    tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay,
    tab.expiration, tab.retry_count, tab.exception_qschema,
    tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time,
    tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no,
    tab.sender_name, tab.sender_address, tab.sender_protocol,
    tab.dequeue_msgid, tab.user_prop, tab.user_data
  from "AQTEST"."TEST_QUEUE_TAB" tab
  where q_name = :1 and (state = :2)
  order by q_name, state, enq_time, step_no, chain_no, local_order_no
  for update skip locked

Check the statistics in Enterprise Manager or with SQL:


select rows_processed / executions rows_per_execution
from v$sqlarea where sql_text like
'select  /*+ INDEX(TAB AQ$_TEST_QUEUE_TAB_I) */   tab.rowid,%';

The number of rows returned per execution is very low, less than 0.01. The CPU load on the other hand is substantial.

The simple solution is to keep the number of threads down, but that is seldom possible. A more realistic alternative is to sleep on the client side: use receiveNoWait instead of receive and, if the method returns without a message, sleep for a short time in Java before the next attempt, as sketched below. At peak load all threads will get messages, so no time is wasted sleeping; when there is less work available, most threads spend their time sleeping in the application server instead of performing DOS attacks on the database. Ideally the number of listeners should ramp up and down based on traffic as well.
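
A minimal sketch of the pattern, assuming a plain JMS MessageConsumer (the class name, sleep interval and error handling are placeholders):

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;

// Poll with receiveNoWait() and back off in the JVM instead of leaving a
// blocking receive() parked in the database.
public class PollingConsumer {

    private static final long IDLE_SLEEP_MILLIS = 200; // tuning parameter, pick what fits the latency budget

    private final MessageConsumer consumer;
    private volatile boolean running = true;

    public PollingConsumer(MessageConsumer consumer) {
        this.consumer = consumer;
    }

    public void pollLoop() throws JMSException, InterruptedException {
        while (running) {
            Message message = consumer.receiveNoWait();
            if (message != null) {
                handle(message);                 // busy: keep consuming without sleeping
            } else {
                Thread.sleep(IDLE_SLEEP_MILLIS); // idle: sleep in the application server, not in the database
            }
        }
    }

    public void stop() {
        running = false;
    }

    private void handle(Message message) {
        // application-specific processing goes here
    }
}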

Dynamic destinations

In JMS a destination can be looked up with JNDI, or it can be created dynamically. For example:


Destination testQueue = session.createQueue("test_queue");

This is convenient, but there is a price to pay. Every time the method is called it issues a SELECT in order to find the destination. An application that creates a dynamic destination when it posts a message will do it for every single message. That adds up.

See https://www.javacodegeeks.com/2013/04/jms-and-spring-small-things-sometimes-matter.html for a more in-depth discussion.

What to do? Fortunately Destination is thread-safe, so it can be cached. With Spring, JndiDestinationResolver can be used with cache=true and fallbackToDynamicDestination=true: it will fail the JNDI lookup, fall back to the dynamic lookup and then cache the result. Without Spring, use an application-level cache, for example a ConcurrentHashMap.
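
Without Spring, a small cache along these lines does the job (a sketch; the class name is made up and the session handling is simplified):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Session;

// Resolve each queue name once and reuse the Destination, which is
// thread-safe and can be shared.
public class DestinationCache {

    private final ConcurrentMap<String, Destination> cache = new ConcurrentHashMap<>();

    public Destination lookup(Session session, String queueName) {
        return cache.computeIfAbsent(queueName, name -> {
            try {
                return session.createQueue(name); // hits the database once per queue name
            } catch (JMSException e) {
                throw new IllegalStateException("Could not resolve queue " + name, e);
            }
        });
    }
}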

Ordered delivery

An application server that processes messages from the same queue in parallel using multiple threads really can’t guarantee that the messages are processed in order. Even if they are delivered in the same order as they were sent, they will be processed in parallel and will complete in non-deterministic order. However, by default Oracle ensures that messages on a queue are delivered in order. That can be a bit expensive, in particular if the application is using selectors.

Set the system property oracle.jms.orderWithSelector=false in order to cut corners here. This is unlikely to give a large boost, but unless ordered delivery is required it is a quick win.

Final words

Tuning AQ is mostly about tuning the database. It is also about tuning the Java side and finding where the bottlenecks are. In other words it is fun! Oracle 12c comes with support for sharded queues – I haven’t had the opportunity to test them in a real-world scenario yet, but I look forward to that as they offer horizontal scaling for a single queue with RAC. I’ll be back.

Categories: Java, Oracle, Performance

Oracle JDBC memory settings

Many applications use ancient versions of Oracle’s JDBC drivers, as they are downloaded manually and seldom upgraded. That is a pity as the newer drivers offer much better performance. However, some of the performance gains are bought with increased memory consumption and that can be a problem.

We ran into an issue with the connection pool in JBoss. It hands out connections round-robin, so as long as there is some load the pool tends to stay at peak size. Each connection normally keeps a buffer cache and may cache other things as well. The memory retained by the connections would grow over time, eventually consuming most of the heap.

A JBoss-specific workaround is to use the CLI to flush the idle connections (off-peak):


/subsystem=datasources/xa-data-source=DS/:flush-idle-connection-in-pool

DS is the name of the data source. A better approach may be to set the system property oracle.jdbc.useThreadLocalBufferCache to true. That moves the buffer cache from the connections to thread locals. Depending on how the application behaves that may be better as the memory is reclaimed when a thread dies. Plus a given thread may be more likely to issue the same SQL multiple times and can thus benefit more from the cache.
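
It is an ordinary Java system property, so the least intrusive way to set it is a -D option on the JVM (for JBoss typically in standalone.conf): -Doracle.jdbc.useThreadLocalBufferCache=true. Setting it programmatically should work as well, as long as it happens before the driver is first used; a minimal sketch (the class name is made up):

// Must run early, before the Oracle JDBC driver creates its first connection.
public final class OracleJdbcMemorySettings {

    private OracleJdbcMemorySettings() {
    }

    public static void apply() {
        System.setProperty("oracle.jdbc.useThreadLocalBufferCache", "true");
    }
}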

It may also be useful to limit the maximum buffer size with oracle.jdbc.maxCachedBufferSize. The implicit statement cache is normally off, but if it is enabled the size can be controlled using oracle.jdbc.implicitStatementCacheSize and oracle.jdbc.freeMemoryOnEnterImplicitCache can force buffers allocated for a statement to be released when it is put into the cache. In most cases it is best to leave those options alone.

See Oracle JDBC Memory Management for the whole story!

Categories: Java, Oracle, Performance

Timeouts for Oracle XA datasources in JBoss EAP 6

The documentation for configuring datasources in JBoss EAP 6 is somewhat lacking when it comes to timeouts. Normally this is fine, but what if there are network issues? With the wrong timeout settings the application can hang until it is killed and restarted. With proper timeouts it can handle an outage and recover.

Here is an example:


<xa-datasource jndi-name="java:/AppDS" pool-name="AppDS">
  <xa-datasource-property name="URL">
    ${appdb.url}
  </xa-datasource-property>
  <xa-datasource-property name="nativeXA">true</xa-datasource-property>
  <xa-datasource-property name="ConnectionProperties">
    oracle.jdbc.ReadTimeout=330000
  </xa-datasource-property>
  <xa-datasource-class>
    oracle.jdbc.xa.client.OracleXADataSource
  </xa-datasource-class>
  <driver>oracle</driver>
  <security>
    <user-name>${appdb.user}</user-name>
    <password>${appdb.password}</password>
  </security>
  <xa-pool>
    <min-pool-size>${appdb.min.pool.size}</min-pool-size>
    <max-pool-size>${appdb.max.pool.size}</max-pool-size>
    <prefill>false</prefill>
    <use-strict-min>false</use-strict-min>
    <flush-strategy>FailingConnectionOnly</flush-strategy>
    <is-same-rm-override>false</is-same-rm-override>
    <no-tx-separate-pools/>
    <pad-xid>true</pad-xid>
    <wrap-xa-resource>true</wrap-xa-resource>
  </xa-pool>
  <validation>
    <valid-connection-checker class-name=
      "org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
    <stale-connection-checker class-name=
      "org.jboss.jca.adapters.jdbc.extensions.oracle.OracleStaleConnectionChecker"/>
    <exception-sorter class-name=
      "org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>
  </validation>
  <timeout>
    <blocking-timeout-millis>60000</blocking-timeout-millis>
    <xa-resource-timeout>310</xa-resource-timeout>
    <query-timeout>300</query-timeout>
    <set-tx-query-timeout/>
  </timeout>
  <statement>
    <track-statements>false</track-statements>
  </statement>
  <recovery no-recovery="false">
    <recover-credential>
      <user-name>${appdb.user}</user-name>
      <password>${appdb.password}</password>
    </recover-credential>
  </recovery>
</xa-datasource>

The oracle.jdbc.ReadTimeout property is essential. It sets the read timeout on the socket, so that reads eventually time out when a connection is broken; the default TCP timeout for established connections is very long, so it is important to set a value, and it should be larger than the transaction timeout. The query-timeout and set-tx-query-timeout settings both limit the time individual statements may take: the static query timeout is used when there is no transaction, otherwise the time remaining until the transaction times out is used.

Note that the read timeout is in milliseconds, while the query and XA resource timeouts are in seconds (330000 ms versus 300 and 310 seconds in the example above).

Categories: Java, Networking, Oracle

Materialize hint required for complex SELECT in 12c

During a migration from Oracle 11gR2 to 12c we encountered a weird error. We had a very complex SELECT that used WITH clauses, some of them with GROUP BY. It works in 11g, but in 12c it fails with:


ORA-00979: not a GROUP BY expression

Apparently something has changed. We have not yet found the root cause of the issue, but if we add /*+ materialize */ to the selects in the WITH clauses it works!
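
The real query is far too large to share, but the shape of the workaround is simply a hint in each WITH subquery. A made-up illustration (table and column names are invented):

with totals as (
  select /*+ materialize */ customer_id, sum(amount) as total_amount
    from orders
   group by customer_id
)
select c.name, t.total_amount
  from customers c
  join totals t on t.customer_id = c.customer_id;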

If the query is simplified a bit the hint is not required, even though it still has the same GROUP BY clauses. Smells like a bug.

EDIT: reported to Oracle as Bug 20887136 – ORA-00979 AFTER UPGRADE TO 12.1.0.2.

Categories: Oracle

Beware of OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES

What can make a simple query that normally runs in a split second using the primary key turn into a 19-minute full scan? Well, many things, but in this case none of the usual suspects. The primary key was usable and rebuilding it made no difference. We refreshed the already recent statistics to no avail. There was no SQL profile defined and the query did not use hints. It turned out that at some point the parameter OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES had been set and, as a result, Oracle had happily collected baselines in order to prevent performance regressions, as the sales pitch goes. Deleting the baseline fixed the problem and the optimizer immediately switched to an index range scan on the primary key.

SQL baselines may serve a purpose, but if you don’t trust the optimizer to do the right thing, why would you trust it to collect baselines automatically in order to guide itself in the future? Beware of OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES and either create the baselines under controlled circumstances or let the optimizer do its job and correct it with profiles where necessary!
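
For reference, this is roughly how the offending baseline can be located and removed (a sketch; the sql_text filter and &sql_handle are placeholders, and dbms_spm.drop_sql_plan_baseline is a function, hence the PL/SQL wrapper):

-- Is automatic capture switched on?
show parameter optimizer_capture_sql_plan_baselines

-- Find the baseline for the slow statement
select sql_handle, plan_name, enabled, accepted, created
  from dba_sql_plan_baselines
 where sql_text like '%the slow query%';

-- Drop it; the function returns the number of plans dropped
declare
  dropped pls_integer;
begin
  dropped := dbms_spm.drop_sql_plan_baseline(sql_handle => '&sql_handle');
  dbms_output.put_line('Plans dropped: ' || dropped);
end;
/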

Categories: Oracle