This is the ninth article in the series on DB2 Locking for Application Developers, which provides information about coding your applications not only for data integrity (the principal focus of this series) but also for performance and concurrency, taking into account the transaction isolation levels in effect at run time. Background information about DB2 for z/OS locking semantics and mechanisms, transaction isolation levels, data anomalies and more is discussed in previous articles. This article concentrates on coding techniques, mostly for combining read-only cursors with searched update statements, that will provide protection against data anomalies, most specifically the lost update anomaly.
Let’s start with a restatement of why this is important. In DB2 for z/OS, the recommended programming technique for reading rows via a cursor and then updating some or all of those rows is to specify the FOR UPDATE clause on the cursor declaration and use positioned updates (UPDATE WHERE CURRENT OF). This has the advantage that, when you read a row, DB2 takes a U lock on the row or page. This allows concurrent readers to hold S locks, but any concurrent transactions requesting U or X locks have to wait until the U lock is released. When the transaction issues UPDATE WHERE CURRENT OF <cursor-name>, DB2 attempts to promote the U lock to an X lock. This ensures that no other transaction can have updated the row between the FETCH, which protects the row with a U lock, and the UPDATE.
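As a sketch, the pattern looks like this (the table, column and host variable names are illustrative, not taken from any real schema):

DECLARE CUR1 CURSOR FOR
    SELECT Col1, Col2
    FROM Table1
    WHERE Col3 = :hvar3
    FOR UPDATE OF Col2;

OPEN CUR1;
FETCH CUR1 INTO :hvar1, :hvar2;   -- DB2 takes a U lock on the row or page
UPDATE Table1
    SET Col2 = :newval
    WHERE CURRENT OF CUR1;        -- DB2 promotes the U lock to an X lock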
However, it’s not always possible to use FOR UPDATE cursors. A lot of Java tooling, for example, generates read-only cursors with searched updates (UPDATE … WHERE cola = :hostvar). And regardless of the tooling, a FOR UPDATE cursor is not an option if ordering of the result set is required: even if an index providing the required order is available, it’s unsafe for the application to rely on that index in case the access path changes, so ORDER BY must be coded on the SELECT statement referenced by the cursor, and a cursor with ORDER BY is read-only.
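To illustrate the read-only case (again with illustrative names):

DECLARE CUR2 CURSOR FOR
    SELECT Col1, Col2
    FROM Table1
    WHERE Col3 = :hvar3
    ORDER BY Col1;    -- ORDER BY makes the cursor read-only

-- UPDATE Table1 ... WHERE CURRENT OF CUR2 is not allowed;
-- updates must be coded as searched UPDATE statements instead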
For this kind of application, and for those where there is a requirement to minimise the CPU cost of locking, a technique called optimistic locking can be used. This technique typically involves the use of ISOLATION(CS) and CURRENTDATA(NO). It combines read-only cursors with searched updates, even where a FOR UPDATE cursor is possible. It’s based on the optimistic premise that a lock is not required for the FETCH, but is only needed if the transaction tries to update the row, saving the CPU cost of acquiring the lock. The smaller the ratio of updated rows, the more favourable this technique is, as each update means the row has to be retrieved twice: once for the FETCH and once for the UPDATE.
Bear in mind that even with optimistic locking running with ISOLATION(CS) CURRENTDATA(NO), read locks will still be required if another concurrent uncommitted transaction has updated the row being read. Provided that the result set is not materialised at open cursor time, an optimistic locking transaction trying to read the row via a read-only cursor will enter a lock wait at fetch time: DB2 will delay obtaining an S lock on its behalf until the concurrent update transaction has committed. If the result set is materialised, any lock waits occur at open cursor time instead.
Some applications adopt a different strategy by declaring the cursor using the WITH UR clause. The technique for UPDATE (discussed below) remains the same, but the difference between the two approaches is that the UR reader might see uncommitted updates: it reads the row without acquiring any lock, and only tries to acquire an X lock if and when it issues an UPDATE.
Which of these two methods you choose for reading the data will depend on a number of factors, including the level of contention within the application, the percentage of rows read that are updated, the toleration for reading transactionally inconsistent data, the importance of minimising CPU consumption, and many other considerations.
For isolation CS with CURRENTDATA NO, if DB2 can avoid taking a lock for Transaction A’s read-only cursor, it will. This allows other concurrent transactions to subsequently take U or X locks, potentially even while Transaction A’s cursor is still positioned on the row. There will definitely be no lock on the row if the result set was materialised, e.g. in a workfile. This opens up the possibility of the lost update anomaly if Transaction A updates the row without verifying that the row hasn’t changed since it read it. Very similar considerations apply not only with isolation UR but also with singleton SELECTs (SELECT without a cursor): in both cases, the transaction will definitely not hold a lock on the row when the application reads it.
The techniques for guaranteeing data integrity when using read-only cursors with searched updates are really quite simple in principle, but the performance profile can vary considerably. Therefore, you should factor your performance requirements into the criteria for selecting one of the techniques. Any application using these techniques must handle the “row not found” condition for updates (SQLCODE +100, SQLSTATE 02000), as we’ll see.
The first technique is to select all columns from the table, and when updating a row specify all retrieved column values as predicates on the searched UPDATE statement. If any value has changed since the row was read, the “row not found” condition will be returned when the transaction attempts to perform the update. At least one software vendor application uses this method. The technique is effective but could have a significant CPU cost, depending on the table design. There are several factors to take into account when considering whether or not to use this technique, including:
- Each column returned to the application and each predicate evaluated on the UPDATE increases the CPU cost. The more columns in the table, the less attractive this solution is from a performance/CPU cost perspective; the technique suits tables with a smaller number of columns better than tables with a larger number.
- A high ratio of rows updated to rows read can make this technique more costly in CPU terms if the number of searched UPDATE predicates is large.
- All programs using this technique must be coded accordingly. If any updateable columns are added to the table, the programs must be modified.
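A sketch of the first technique for a hypothetical three-column table (all names are illustrative):

SELECT Col1, Col2, Col3
FROM Table1
WHERE Col1 = :hvar1;

-- later, update using every retrieved value as a predicate
UPDATE Table1
SET Col2 = :newval
WHERE Col1 = :hvar1
  AND Col2 = :hvar2
  AND Col3 = :hvar3;
-- SQLCODE +100 here means another transaction changed or deleted the row
-- since it was read; the application must re-read and retry, or report an error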
One temptation might be to select only the columns that are updated and to include just those columns in the predicate list for the UPDATE, along with the predicates needed to uniquely identify the row. However, that is fraught with danger: what if the list of updated columns changes? What about deletes? If you’re planning to use this technique, it’s better to err on the side of caution and include all columns in the list of predicates than to risk compromising data integrity.
The second technique is to add one or more columns that indicate whether the row has been updated. A typical example is to add a timestamp column which is updated using the current timestamp by every UPDATE or INSERT statement. Other techniques are possible, but in this discussion I’m assuming that the added column is a timestamp column. All searched UPDATE statements must include the timestamp column in the list of predicates, which means that if any value has changed since the row was read, the “row not found” condition will be returned when the transaction attempts to perform the update. Factors to consider for this technique include:
- All programs updating rows in the affected tables must update the timestamp column regardless of whether they use searched updates or positioned updates.
- All programs reading rows via a read-only cursor with the possible intent of updating them must include the timestamp column in the list of columns returned and must include it in the predicate list.
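A sketch of the second technique, assuming an application-maintained timestamp column called Update_TS (all names are illustrative):

SELECT Col1, Col2, Update_TS
FROM Table1
WHERE Col1 = :hvar1;

-- the UPDATE sets a new timestamp and uses the previously read value as a predicate
UPDATE Table1
SET Col2 = :newval,
    Update_TS = CURRENT TIMESTAMP
WHERE Col1 = :hvar1
  AND Update_TS = :old_ts;
-- SQLCODE +100 means the row was changed or deleted since it was read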
The third technique is to use a feature introduced in DB2 9 for z/OS, the ROW CHANGE TIMESTAMP column. DB2 automatically maintains the contents of ROW CHANGE TIMESTAMP columns so there is no need for the application to update them. To be more precise, DB2 generates a value for the column for each row as it is inserted, and whenever it is updated. Factors to consider for this technique include:
- Programs don’t need to maintain the ROW CHANGE TIMESTAMP column.
- However, all programs reading rows via a read-only cursor with the possible intent of updating them must retrieve the ROW CHANGE TIMESTAMP column value as well as the list of columns they are interested in, and must include it in the predicate list when performing the UPDATE. One way to do this, using the ROW CHANGE TIMESTAMP or ROW CHANGE TOKEN expression instead of retrieving the ROW CHANGE TIMESTAMP column itself, is discussed below.
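For reference, a row change timestamp column is defined with the GENERATED attribute, for example (the column name is illustrative):

ALTER TABLE Table1
    ADD COLUMN Rct TIMESTAMP NOT NULL
    GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;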
The syntax for dealing with row change timestamp columns looks a little strange at first, but all you have to remember is that it is, in fact, just an expression in the column list. I’ve simplified the statements below, starting off with the SELECT, where the row change timestamp value is fetched into host variable :rcthvar. In this example, only one column, Col2, is shown being updated.
SELECT Col1, Col2, ... , ROW CHANGE TIMESTAMP FOR Table1 FROM Table1 WHERE Col1 = :hvar1 ... AND ... ORDER BY ...
And the UPDATE statement is very similar:
UPDATE Table1 SET Col2 = :hvar2, ... WHERE Col1 = :hvar1 AND ... AND ROW CHANGE TIMESTAMP FOR Table1 = :rcthvar
You might be tempted to improve the performance of the UPDATE by using the ROWID or RID functions. For example, having retrieved the ROWID for a row into a host variable in the SELECT, you can use direct row access for the UPDATE:
UPDATE Table1 SET Col2 = :hvar2, ... WHERE ROWID_Col = ROWID(:rowidhvar1) AND ROW CHANGE TIMESTAMP FOR Table1 = :rcthvar
However, this can be a risky strategy as both the RID and ROWID for a given row can change if a REORG SHRLEVEL CHANGE utility runs concurrently with the transaction.
The most important point with optimistic locking strategies is that they all require a programming standard to be defined, documented and enforced. That standard must include one, and only one, of the techniques for combining updates with optimistic locking, plus a strategy for handling the “row not found” condition on the UPDATE. One badly written application program that doesn’t follow the standard can undo all the good work of the applications that do comply with it, and to make matters worse these kinds of logical data inconsistencies can be very difficult to detect and correct. In some cases it can take weeks or months for the data inconsistencies to become apparent.
Having unravelled the knotty problems of optimistic locking and combining read-only cursors with searched updates, I conclude the series in the next and final article with some general advice about locking and some miscellaneous comments about the data integrity implications of read-only cursors, together with some RR/RS considerations.
Db2 for z/OS Locking for Application Developers eBook
Gareth's eBook focuses on the role of the application programmer in collaborating with the locking mechanisms provided by Db2 to ensure data integrity.