This is the seventh article in the series on DB2 Locking for Application Developers. So far we’ve covered a lot of topics, including the ACID database transaction properties, lock size, lock mode and lock duration, transaction isolation levels and data anomalies, all with a view to understanding how data integrity and application performance requirements can only be achieved by the application being coded according to the isolation level and data currency options in use.
In this article we’ll start off by looking at which isolation levels are susceptible to which data anomalies before moving on to discuss the related topic of data currency.
The following table shows which transaction isolation levels are susceptible, in theory, to which data anomalies. Note that the table specifies ‘might occur’ and not ‘does occur’ – data anomalies are not guaranteed to occur for these isolation levels, but can occur in the right (or wrong) combination of circumstances.
Note that by design, the RS, CS and UR isolation levels accommodate the possibility of encountering one or more of the anomalies. This means that when you code your application, you must be aware of the potential exposure to the associated data anomalies. If you never need to access a row more than once, then you won’t need to use isolation RR or RS. Be wary of that word ‘access’: it includes update access as well as read access. More of that in a later article.
Also note that for CS and UR, the table indicates that the lost update does not occur. This is not strictly true: with DB2 for z/OS, if you use the wrong coding techniques, then the lost update anomaly could occur. For that reason, I’m going to discuss some data currency options and, first, some data currency issues associated with cursors.
As we saw in article five, the DB2 locking mechanisms, especially lock duration, are used to enforce each level of transaction isolation. However, CS introduces a series of data currency considerations around when a lock is taken and if a lock is taken, related to cursor type and the CURRENTDATA BIND option. It’s vital to emphasise that the following discussion concerns isolation CS only – none of the issues can be encountered with isolation RS or RR1.
The cursor types relevant to this discussion do not include the set of cursors known as scrollable cursors, but rather read-only cursors, ambiguous cursors, and for-update cursors, and I’ll deal with for-update cursors first.
A for-update cursor is essentially a cursor that can be updated via a positioned update statement. A positioned update statement is of the type UPDATE … WHERE CURRENT OF <cursor_name> (or DELETE … WHERE CURRENT OF). Typically, a for-update cursor declared using static SQL requires a FOR UPDATE clause when it is declared2. The declaration of a for-update cursor must not include an ORDER BY clause nor any clause that implies sorting or aggregation. With dynamic SQL, to use a positioned update, the cursor declaration must always include a FOR UPDATE clause. Locking with update cursors running with CS isolation is always the same: when a row is read, a U lock is taken; if the row is actually updated, an X lock is taken and is held until commit time; if the row is not updated then the U lock is released when the next row is read or the cursor is closed, whichever happens first.
Next, I’ll deal with read-only cursors, but as they are the most important cursors in terms of the data currency and integrity discussion, we’ll return to them again later on in this article. If the result set is read-only – that is, it’s not eligible for update – then the cursor is read-only. For example: if the query contains any of ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT or EXCEPT; if the FROM clause represents more than one table or view, or a nested table expression; the query uses FOR FETCH ONLY; or the result set is materialized in a work file.
We’ll come back to read-only cursors once I’ve dealt with the thorny issue of ambiguous cursors. Simply put, a cursor is ambiguous if DB2 can’t tell whether it’s used for update or for read-only purposes. It’s actually a little bit more complicated than simply being cursors without FOR UPDATE or FOR FETCH ONLY (or implicitly read-only cursors). I won’t go into all the ins and outs of ambiguous cursors, other than to highlight some of the issues. Firstly, improper use of an ambiguous cursor can result in SQLCODE -510 being issued, something your application will have to deal with. Secondly, whereas in some circumstances use of an ambiguous cursor can result in DB2 lock avoidance techniques being used (we’ll cover lock avoidance in a later article), in other circumstances it can inhibit the use of DB2 lock avoidance techniques. Because of the difficult nature of ambiguous cursors, you should always code FOR FETCH ONLY or FOR UPDATE on a cursor, to avoid ambiguous cursors and to document the use of the cursor in the program code. Theoretically, you could rely on ORDER BY to implicitly define a cursor as read only, but if you were to remove the ORDER BY when maintaining the application program, then the cursor might move from read-only to ambiguous.
Having outlined the difference between for-update, ambiguous and read-only cursors, it’s time to return to the topic of when locks are acquired and released with read-only cursors. If the result set is not materialised in a work file or there is no in-memory sort, then the S locks on the pages or rows are acquired to fetch time. They are released when the application fetches a row on another page (LOCKSIZE PAGE) or another row (LOCKSIZE ROW). So far, so good. On the other hand, if the rows are sorted or the result set is materialised in a work file, then any necessary locks are acquired at open cursor time, and will have all been released before the first or any subsequent fetch. This means that when a transaction with this sort of access path gets to read each row, that row could be in the process of being updated by another transaction, or could have been updated by another transaction which has committed the update, either by issuing the commit explicitly, or implicitly at program termination.
The time at which locks are taken also affects, of course, when any lock waits are incurred, either at cursor open time or row fetch time. And following from that, if there is lock contention within the application, the time when lock waits are incurred determines when any timeouts are experienced and, depending on the mix of SQL statements used, can have an effect on the likelihood of deadlocks.
To summarise, the exact currency of each row accessed via a read-only cursor is dependent on the access path chosen by DB2. The row seen by the application could be the row as of open cursor, or could be the row as of fetch time. The application cannot tell if it is seeing the latest version of the row, unless it is access-path dependent and knows with certainty that an S lock was acquired at fetch time. I’ll cover the dangers of access path-dependent cursors in a later article, and I’ll say more about the data integrity implications of read-only cursors in a different article.
For now, I’ll move on to discuss the ‘if’ of locking, in conjunction with some more data currency issues. As well as the options we’ve already discussed, there is another BIND option, CURRENTDATA. The default setting for this option has changed over DB2 releases more than once, but since at least DB2 Version 8 it has defaulted to NO.
Let’s start off with CURRENDATA(YES), which means that the data cannot change while the cursor is positioned on it. If a cursor declared by Transaction A is positioned on a base table row or index (for index-only access), then the data returned by the cursor is guaranteed to be current. Be aware that if the cursor moves onto the next row, then the lock on the previous row is released and a lock on the next row is obtained. This means that another transaction could acquire an exclusive lock on the row and update it, even while Transaction A continues to read other rows via the cursor.
However, if the cursor is positioned on data in a work file (including rows sorted in memory), the data returned by the cursor only matches the contents of the work file, not the row in the base table. The row in the base table could remain unchanged, or could have been changed by another transaction. When the application gets to read rows from the result set, there will be no locks on the base table rows – guaranteed. CURRENDATA also disables the lock avoidance feature, which is described below.
The default BIND option of CURRENTDATA(NO) means that the data can actually change while the cursor is positioned on it. Importantly for concurrency and for reduced CPU consumption, CURRENDATA(NO) enables lock avoidance. That is, DB2 can avoid locking the data if it can determine that the row or page contains committed data3. If taking a lock can be avoided, then the data is returned to the application without a lock being taken, and the data can be changed even while the cursor is positioned on the row. Even if you’re not planning to update the row, this might be an issue, but there are techniques described in a later article to allow you to use CURRENTDATA(NO) combined with searched updates safely. If DB2 is unable to determine that the row or page contains committed data, it has to acquire an S lock to make sure that no other transaction has updated the row without committing the update.
Experience shows that effective lock avoidance brings great concurrency benefits, especially in a data sharing environment. It also plays a role in reducing CPU consumption and for update-intensive batch can help control MLC costs. It’s important to emphasise that for performance reasons, CURRENTDATA(NO) is strongly recommended in a data sharing environment, provided the application is coded correctly and does not depend on share locks always being obtained.
There is an important point to make here about singleton select statements – non-cursor SELECT statements that return a single row. There is a clue to the locking implications for singleton selects in the name of the isolation level: cursor stability. With a singleton select, regardless of the CURRENTDATA setting, by the time the row is returned to the application, there will not be a lock on the page or row. If a lock is required, it is obtained when DB2 determines the row qualifies, and then released almost immediately, and before the row is returned to the application. This is because the transaction is not positioned on the row, as position is only possible when the row is accessed via a cursor.
Having dealt with data currency, in the next article I’ll concentrate on various data currency and integrity issues with cursors, including combining read-only cursors with searched updates, and the problem with access-path dependent cursors. I’ll return to data currency later in this series, to cover some additional considerations.
Footnotes
1 There is one data issue for isolation RR and RS concerning cursors and data currency which, again, is discussed in a later article.
2 For a cursor declared in a static SQL statement, the “FOR UPDATE” clause is not required if the STDSQL(YES) or NOFOR options are in effect at the program preparation stage. The default settings for the STQSQL system parameter is NO, but this can be overridden at program preparation time.
3 I won’t go into detail in this article about the lock avoidance mechanism here, but the topic is covered in some detail in the Redbook “DB2 9 for z/OS: Resource Serialization and Concurrency Control”, SG24-4725. Although this book is over 10 years old, it’s still a useful source. Nevertheless, I’ll still discuss lock avoidance in rather less detail than the Redbook in a later article.
Db2 for z/OS-locking for Application Developers eBook
Gareth's eBook focuses on the role of the application programmer in collaborating with the locking mechanisms provided by Db2 to ensure data integrity.