Oracle – Why ORA-03106 in an OCI program when SELECTing more than 1 row? 10g only, not 11g

oracleoracle-call-interfaceoracle10g

My OCI application is returning ORA-03106 ("fatal two-task communication protocol error") when performing a SELECT * FROM mytable where mytable has four columns: INTEGER, VARCHAR(1000), FLOAT and DATE, and two rows. The call succeeds if there is only 1 row. The call also succeeds if there is a BLOB column in the table (that is also being selected).

Various forums have indicated that NLS_LANG-related issues may be the cause – namely, that character set conversion is failing on the client when the data is retrieved, because (possibly) the character set conversion files cannot be found. Note that I have tested many possible combinations of setting NLS_LANG, NLS_CHARACTERSET, ORA_NLS10, and other environment variables. Setting certain of these impacts the observed behavior: sometimes, Oracle throws ORA-01034: ORACLE not available ORA-27101: shared memory realm does not exist; at other times the client application crashes before the first row is retrieved (the crash occurs inside OCIStmtFetch2()). Notably, if the client application executes the following query when the connection is first established: ALTER SESSION SET NLS_NUMERIC_CHARACTERS = '.,', the crash does not occur; instead, the ORA-03106: fatal two-task communication protocol error error is returned.

Note: it is possible that the presence of the DATE column is causing this – but establishing reproducible cases has proven to be very difficult, so I am posting my question as-is.

I have spent in excess of 50 hours attempting to resolve this problem. Any assistance would be greatly appreciated.

NEW: important additional details (see code snippet below): The problem ONLY occurs during PIECEWISE retrieval of data – it does not matter which of the two methods I use (a dynamic callback function, or the loop method using the get/set piece info functions). Stepping through the debugger and looking at the buffers in memory that I have supplied to OCI to store the FETCHED data, all 4 fields in the first row are always retrieved properly; the initial FETCH of the second column (this is the call that returns with a request for info about the VARCHAR column when the looping approach is used) successfully populates all non-dynamic fields up to but not including the VARCHAR column (i.e., the INTEGER column is correctly populated); then the next FETCH successfully populates the VARCHAR field, and should successfully populate the remaining (non-dynamic) columns as well (the FLOAT and DATE fields); however, the FLOAT field is correctly populated, but the DATE field is corrupt, and furthermore, this FETCH call should return success, but instead it returns ORA-03106.

The following code snippet, stripped of irrelevant error-checking and other code, shows what is happening:

void RetrieveRow(...)
{
    // All necessary environment, statement, describe, and define functions
    // have already been called; non-dynamic buffers have already been allocated
    retcode = OCIStmtFetch2(mystmt, fConnection->myerrhp, 1, OCI_DEFAULT, 0, OCI_DEFAULT);
    while (retcode == OCI_NEED_DATA)
    {
        // ...
        // ... Define the necessary arguments to OCIStmtGetPieceInfo() here ...
        // ...
        OCIStmtGetPieceInfo(mystmt, fConnection->myerrhp, (dvoid**)&define, &type, &inout, &iter, &idx, &piecep)

        // ...
        // ... Iteratively allocate and increase the buffer size for the required dynamic column here...
        // ...
        OCIStmtSetPieceInfo(define, OCI_HTYPE_DEFINE, fConnection->myerrhp, buf, alenp, piecep, indp, rcodep)

        // ... Call OCIStmtFetch2() as part of the dynamic loop to fetch the next piece
        retcode = OCIStmtFetch2(mystmt, fConnection->myerrhp, 1, OCI_DEFAULT, 0, OCI_DEFAULT);
    }
    // retcode is OCI_SUCCESS here when RetrieveRow() is called for the first row,
    // ... and all data in the first row is properly populated in the buffers.
    // But, when RetrieveRow() is called the second time, the above loop is entered,
    // ... the INTEGER, VARCHAR (a dynamic field), and FLOAT fields are correctly populated,
    // ... but the DATE field is corrupt, retcode is OCI_ERROR, and the error is ORA-03106.
    // ... Note that even with a dynamic callback used instead of a loop, the error is the same.
    // ... Note that when piecewise (dynamic) fetching is NOT used,
    // ... all rows are retrieved successfully and there is no error.
}

Note: As reflected in my title change, this problem does NOT occur when OCI and Oracle 11g are used. With the code COMPLETELY unchanged, but including the OCI 11g header files and loading the 11g OCI dll, running against an 11g Oracle database server listener/instance, the code succeeds WITHOUT this error. It is only 10g in which the error occurs.

I would be satisfied if a reference can be provided that clearly and reputably identifies this as an OCI/Oracle bug in version 10g. (I am convinced that it is.) However, I cannot find any confirmation that this is an OCI/Oracle bug.

Best Answer

What exact version of Oracle client and server are in use?

There's a possible match on at least one known Oracle bug:

Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.2 - Release: 10.2 to 10.2 Information in this document applies to any platform. Checked for relevance on 21-Jan-2010

Symptoms OCI based application intermittently throws ORA-03106: fatal two-task communication proto error and has error entries in application log.

Important : The error may not bee seen in the alert.log and in the SQL NET trace files

Cause This has been identified as a Bug 4523125

Solution Upgrade the Client and the Server to 10.2.0.3

More generally, this seems like an issue for Oracle Support.

Related Topic