Overview
Apologies in advance if this is a subject you’ve been subjected to before by me; it’s a real bug-bear of mine and one that I spend quite a lot of time discussing with our clients.
Indexes use storage space and if a table is ‘over-indexed’ can use more space than the base table itself. We support databases where some of the tables with highest rate of transactional activity have 10, 20, even 30 indexes. There are cases where there are more indexes on the table than the table has columns. And indexes, whilst a performance benefit (hopefully) are also a performance overhead; every INSERT and DELETE of a base row will incur an operation on all the table indexes, and every UPDATE might provoke an index change too.
The problem is that indexes get created to try and address an issue, and then not ‘cleaned-up’ if they don’t deliver the anticipated benefits. This is compounded by rather more ‘liberal’ access to the Production environments than I’d like. And, of course, as the applications change and code is modified or becomes redundant, indexes that might once have been useful, are no longer.
So, the issue is to find indexes that are really not used? And then, once we find a way of doing that, we can find the biggest ones and recover the most space.
The LASTUSED column
Some of the objects for which meta-data is stored in the system catalogues include a LASTUSED column. This is, ostensibly, pretty handy; as the name suggests, it will indicate when the object was last used. And, as it is in SYSCAT.INDEXES, it should give you a pretty clear indication of whether an index is of any value.
But it’s got a few provisos. From the Knowledge Centre (I almost wrote ‘manual’: I really am showing my age now), the definitions for LASTUSED are
- Date when the index was last used by any DML statement to perform a scan or used to enforce referential integrity constraints.
- This column is not updated when the index is used on an HADR standby database, nor is it updated when rows are inserted into the table on which the index is defined.
- The default value is ‘0001-01-01’.
- This value is updated asynchronously not more than once within a 24-hour period and might not reflect usage within the last 15 minutes.
Let’s just check whether this represents any sort of limitation
- It should be updated by any Index Scan (IXSCAN in your explain / access plan). Not sure this is 100% true: see experiment with Foreign Key below
- Not updated on HADR Standby: that can be an issue. We have customers who operate extensive analytical processes using the Read-Only Standby (which is great; we highly recommend that if possible) but the problem is that some indexes are needed purely for the analytical processing. So, they’re never used on the Primary although that is where they have to be created, of course, so that they ’ripple through’ to the Standby. But there will never be any indication that they’re used, either in the update of LASTUSED, nor any of the other metrics we’re going to look at.
Also, it won’t be updated by INSERT statements. Well, OK, an INSERT doesn’t need an Index and all that will happen to the Index is that its details are updated to match the new base data. But this does have implications for uniqueness and Primary Key checking.
- Can’t argue with that. ‘0001-01-01’ is the value you’ll be searching for to make sure it is unused.
- This can be a bit of an issue. I’m not sure about the 15-minute proviso; the asynchronous update of the LASTUSED column is outside your control and, if that statement is valid, could indicate that it is 23 hours out of date, couldn’t it?
As an example, I have a fairly simple little table with 8 indexes on it. I rebuilt it and then ran a few queries and so on it to exercise those indexes.
I’ve highlighted one Index here; as you can see it still has a LASTUSED value indicating that it has never been used. The workloads I ran that are responsible for the Index Scan figure of 10 (more on this later) were executed over 3 hours ago. So, there can be quite a significant lag in this async activity.
MON_GET_INDEX
Arguably of more use is the output of the table function MON_GET_INDEX. Amongst the other myriad Index metrics that this function returns are the numbers of Index, Index Only and Index Jump scans (I haven’t included the Jump Scans in the examples shown, just to keep the output concise). This gets updated immediately; the only proviso is that the counters are in-memory and get zeroed with an instance restart. So, you need to treat the numbers with a bit of scepticism if there has been a recent restart, as they might report an Index as being unused when it is actually only used infrequently.
So here is an execution of the same Unused Index query, but only showing those indexes with no recorded numbers in Index Scans or Index Only Scans.
We now have 5 indexes that look like they might qualify for the chop. But LOGBOOK_PK is clearly the Primary Index; we surely don’t want to get rid of that? How can it not be used?
It’s there to enforce uniqueness and is based on the LOGBOOK_KEY value. Hence if I try and INSERT a row that specifies a key that is already used, we get this
You might suggest that this having been a failed operation, it shouldn’t update the Index usage metrics, but it must have accessed the index to establish that the unique was actually already being used. And, anyway, if you repeat the operation with a valid key
the Primary Key is still showing as unused:
It appears that the Index Scan metrics are operated in the same way as the LASTUSED column, i.e. that they are not updated when rows are inserted.
So, our query for Unused Indexes needs to exclude any Primary or Unique keys; they are probably fundamental to the operation of the table but will not update any metrics.
Clustering
Another ‘background’ function, if you like, of some indexes is clustering. It might not take any direct part in the IUD operations on the table, but it will allow the data to be placed in a specified order if it is explicitly, or implicitly, used in a REORG.
However, that operation has also not updated any metrics. What it has done, slightly bizarrely, is to set the Primary Key metrics to nulls.
That shouldn’t affect your search for Unused Indexes but strikes me as a bit of an anomaly.
Therefore the Unused Index query needs to exclude any clustering index too.
Foreign Keys
Here’s another bit of functionality where Indexes can be extensively used. Here I attempt to insert a row into the LOGBOOK table where the value for TO_AIRFIELD doesn’t exist in the parent table (AIRFIELD):
Predictably this INSERT fails because the constraint defined within CREATE TABLE is
And here is a display of the Foreign Keys in LOGBOOK. You can see that the index that would have been used to check this Foreign Key is called LOGBOOK_TO_AIRFIELD_IX5
But even after this Foreign Key evaluation that index is still showing up as nominally unused:
I’d therefore recommend excluding any index involved in a Foreign Key constraint from your search too.
Once we’ve incorporated all the exclusions we’ve just looked at, this is what we just have this left from the Unused Index query
Conclusion
So, this exercise started with 8 indexes and we’ve now whittled it down to just one that might qualify. To summarize:
- LASTUSED might not be entirely trustworthy. It’s an indication but I wouldn’t rely on it
- Primary Keys might show up as unused but they’re in use ‘under the covers’ and should be excluded from the search
- Clustering indexes can be fundamental to how the data is stored and retrieved, but REORG won’t update the usage metrics
- Similarly, any index involved in servicing a Foreign Key should be excluded as their use won’t update the metrics either
- Be aware of how long the instance has been active: if it’s not been enough time for all workloads to be exercised, you might identify something as unused that is infrequently used but is still fundamental to the application
- Read-only Standby; if you have an HADR set-up that uses this feature, you might find that Indexes appear unused in both Primary and Standby, but the ROS might need them
- And, finally, ‘cherry-pick’ the largest indexes. I tend to run my Unused Index query sorted by size, in descending order, and just fetch the first 10 or so. That will give you the most bang-for-your-buck if you do get to drop them
Please drop me a line if you have any comments, positive or negative, and let me know if you’d like a copy of the query I use to find Unused Indexes. Always happy to share.
E-mail: Mark.Gillis@triton.co.uk