C# – Approach to handling multiple instances of an application that perform large database operations simultaneously

architecture, c#, oracle

So, essentially the flow of my application works like this:

  • A user selects some options to start a process on the server.
  • Once they hit submit, the server launches a console application that processes a large amount of data and then exits when it's finished.
  • The idea is that the user can start multiple processes and have them all run at the same time (due to how the architecture is designed, multi-threading couldn't address this).

There are certain parameters a user can choose that cause a large insert at the start of the process (roughly 1,500,000 records from an external source). If multiple processes are running but only one of them has that option selected, the applications perform reasonably well (the insert portion of the process takes roughly 4 minutes). However, if at least two instances require that bulk insert, performance drastically degrades (e.g. the applications have been running for half an hour, still stuck on the insert).

It might be worth mentioning that I'm doing this through a stored procedure call from the application to an Oracle 11g database.

I don't have many ideas on how to handle this, but one approach would be to check whether the procedure is already running in another instance and wait for it to finish (a sketch of that idea follows). I'm not sure whether this is feasible or even the best approach. Any suggestions or directions to research are greatly appreciated.
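For what it's worth, if all of the console instances run on the same server, one way to implement that "wait for the other instance" idea is a machine-wide named mutex around the insert. This is only a minimal sketch of the concept; the BulkInsertGate type and the mutex name are made up for illustration (instances spread across machines would need a database-side lock such as Oracle's DBMS_LOCK instead):

using System;
using System.Threading;

public static class BulkInsertGate
{
    // Hypothetical name; the "Global\" prefix makes the mutex visible to
    // every process on the machine, not just the current session.
    private const string MutexName = @"Global\SP_CAPTURE_DATA_FOR_PROCESS";

    public static void RunSerialized(Action bulkInsert)
    {
        using (var mutex = new Mutex(initiallyOwned: false, name: MutexName))
        {
            mutex.WaitOne(); // blocks until no other instance holds the mutex
            try
            {
                bulkInsert();
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}

Called as BulkInsertGate.RunSerialized(() => CaptureDataForProcess(processId)), this would let every instance run but force the bulk inserts to happen one at a time.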

Here is the code that performs the bulk insert:

public void CaptureDataForProcess(int processId)
{
    using (var context = new SomeContext())
    {
        var in_processId = new OracleParameter("in_processId", OracleDbType.Int32, processId, ParameterDirection.Input);

        // ExecuteSqlCommand runs the block immediately; SqlQuery<T> is deferred
        // and would not hit the database until its result was enumerated.
        context.Database.ExecuteSqlCommand("BEGIN SP_CAPTURE_DATA_FOR_PROCESS(:in_processId); END;", in_processId);
    }
}

Also, this is more or less the stored procedure (since it's just a regular INSERT, I've limited the columns referenced for the sake of brevity):

CREATE OR REPLACE PROCEDURE SP_CAPTURE_DATA_FOR_PROCESS (in_processid IN NUMBER)
AS
BEGIN
    INSERT INTO CURRENT_PROCESS_SNAPSHOTS
        ("MARKET",
        "COUNTY",
        "STATE",
        "VENDOR",
        "PROCESSID")
    SELECT
        "MARKET",
        "COUNTY",
        "STATE",
        "VENDOR",
        in_processid
    FROM external_source@foo;
END;
/

Best Answer

Four minutes is way too long. You need to split up what you are doing so that the final insert is a single statement, with all the data already on the box.

So in general you can follow the steps below, although each DB has its own special bulk operations.

  1. Collect all the data before you even talk to the database.
  2. Insert that data into a temp/staging table that won't lock any other processes (see the sketch after this list).
  3. Check that there are no constraints that will fail for the rows.
  4. Copy the data over from the temp table to the real table in one go.
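As a rough sketch of steps 1 and 2 in C#, assuming the unmanaged ODP.NET provider (Oracle.DataAccess), which ships an OracleBulkCopy class, and a hypothetical staging table CURRENT_PROCESS_SNAPSHOTS_STG with the same columns as the real one:

using System.Data;
using Oracle.DataAccess.Client; // unmanaged ODP.NET; provides OracleBulkCopy

public void StageDataForProcess(DataTable rows, string connectionString)
{
    // Step 1 has already happened here: 'rows' was filled by the application
    // straight from the external source, not through the database link.
    using (var connection = new OracleConnection(connectionString))
    {
        connection.Open();

        // Step 2: load into a staging table no other process touches,
        // so the load cannot block the other running instances.
        using (var bulkCopy = new OracleBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "CURRENT_PROCESS_SNAPSHOTS_STG"; // hypothetical
            bulkCopy.BatchSize = 50000;
            bulkCopy.WriteToServer(rows);
        }
    }
}

For 1,500,000 rows you would likely read from the external source with a data reader and pass that to WriteToServer rather than materializing a DataTable, but the shape of the approach is the same.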

Edit

The way you are doing it, over a database link, means the database will be running that SELECT while all of the data downloads from the other database.

Bring the data in with the application instead, following the steps above.
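Step 4 then becomes a single INSERT ... SELECT that runs entirely inside the database, with nothing crossing the network during the statement. A sketch, again using the hypothetical staging table from above:

using Oracle.DataAccess.Client;

public void PromoteStagedRows(int processId, string connectionString)
{
    using (var connection = new OracleConnection(connectionString))
    using (var command = connection.CreateCommand())
    {
        connection.Open();

        // Step 4: one statement, all of the data already on the box.
        command.CommandText =
            @"INSERT INTO CURRENT_PROCESS_SNAPSHOTS
                  (MARKET, COUNTY, STATE, VENDOR, PROCESSID)
              SELECT MARKET, COUNTY, STATE, VENDOR, :in_processId
              FROM CURRENT_PROCESS_SNAPSHOTS_STG"; // hypothetical staging table
        command.Parameters.Add("in_processId", OracleDbType.Int32).Value = processId;
        command.ExecuteNonQuery();
    }
}

Remember to clear the staging table between runs (or scope its rows by process id) so each instance only promotes its own data.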
