Linux – Large amount of cursors causing slow response and massive CPU usage – cursor: mutex s

linuxoracleoracle-11g

We have a problem with our Oracle server. It was upgraded to 11g a couple of months ago and is running a third-party system. The system has been running for years (and has had several other issues), but this is new: Once or twice a day, CPU usage increases, and Cursor Mutex S is very visible, having had 30 million wait events since server startup (which was recently).

It seems that all of a sudden, a number of simple INSERT has started giving problems. We have checked that statistics, indexes, etc. are as they should be – up-to-date, properly sized, space on disk, etc… No problems there.

We have isolated a single SQL-expression as being the main culprit. A number of similar statements give similar problems, but I'll focus on the one. The 'middleware' software executing this particular insert is running simultaneously on ~70 servers.

When we started noticing the problems, this statement had 10,000+ entries in v@sql_shared_cursor. We have set up a cron job purging the cursors every five minutes, but this doesn't solve anything and only reduces the problem slightly.

Looking again at v@sql_shared_cursor, it turns out that the reason for the many cursors being created is INST_DRTLD_MISMATCH = Y. This is odd, since the middleware (over which we have little direct control) doesn't insert that many rows.

We turned to the vendor and asked how they did the inserts. They replied that they do a select from the table WHERE 1 = 0 to get the column-structure into their internal ADODB object that they then fill with relevant data. They usually execute between 1 and 20 inserts pr. 'batch'.

I'm guessing that ADODB behind the scene makes it look like a bulk insert when you do a batch-update, which will be the only rational reason for Oracle to look at this as a bulk, but I have been unable to find any hard facts on this.

Can anybody offer insight into:

  • Why this happens to begin with?
  • If related to 11g, then why it happens now, two months after upgrade?
  • Any other parameters I should look at in order to get to the bottom of this?

EDIT: It turns out this may very well be a bug with Oracle on Linux. We're currently testing a patch and I'll post an answer myself in a couple of days, if it turns out to be true.

EDIT2: The patch didn't fix it – and while we probably haven't found the cause, we may have alleviated the problem by increasing the number of redo-logs. I still hope to write an answer to this at some point.

Best Answer

This turned out to be an Oracle bug. After applying the below mentioned patch, no single statement has had above 50 cursors for more than a month and the performance issues described in the question has vanished.


Patch : Bug 10636231 - High version count for INSERT .. RETURNING statements with reason INST_DRTLD_MISMATCH

Bug 10636231 - High version count for INSERT .. RETURNING statements with reason INST_DRTLD_MISMATCH [ID 10636231.8]

Modified 17-SEP-2011 Type PATCH Status PUBLISHED

Bug 10636231 High version count for INSERT .. RETURNING statements with reason INST_DRTLD_MISMATCH This note gives a brief overview of bug 10636231. The content was last updated on: 17-SEP-2011 Click here for details of each of the sections below. Affects: Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 11.2.0.2 but BELOW 12.1
Versions confirmed as being affected •11.2.0.2

Platforms affected Generic (all / most platforms affected)

It is believed to be a regression in default behaviour thus: Regression introduced in 11.2.0.2

Fixed: This issue is fixed in •12.1 (Future Release) •11.2.0.3 •11.2.0.2.3 Patch Set Update •11.2.0.2 Bundle Patch 8 for Exadata Database •11.2.0.1 Bundle Patch 12 for Exadata Database •11.2.0.2 Patch 7 on Windows Platforms

Symptoms: Related To: •Leak (Memory Leak / Growth) •Mutex Contention •Shared Pool Affected •Cursor not shared due to INST_DRTLD_MISMATCH •V$SQLAREA •V$SQL_SHARED_CURSOR

Description This problem is introduced in 11.2.0.2 by the fix for bug 9380377

Insert SQL with a RETURNING clause may not share child cursors leading to a high VERSION_COUNT in V$SQLAREA and associated problems (possible mutex contention etc..). This can occur if the session is involved in a global transaction. eg: If the session is either executing within an externally coordinated transaction, such as XA, or if the session makes use of database links.

Rediscovery Notes: High version_count in V$SQLAREA Insert statement with a RETURNING clause A global transaction is involved The reason in V$SQL_SHARED_CURSOR is INST_DRTLD_MISMATCH