C# – Performance issue with SQL Server stored procedure

cperformancesqlsql serverstored-procedures

I used the ANTS profiler to identify the remaining bottleneck in my C# application: the SQL Server stored procedure. I am using SQL Server 2008. Can anybody here help me increase performance, or give me pointers as to what I can do to make it better or more performant?

First, here's the procedure:

PROCEDURE [dbo].[readerSimilarity] 
-- Add the parameters for the stored procedure here
@id int,
@type int
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

-- Insert statements for procedure here
IF (@type=1) --by Article
    SELECT id1, id2, similarity_byArticle FROM similarity WHERE (id1 = @id OR id2 = @id) 
AND similarity_byArticle != 0

ELSE IF (@type=2) --by Parent
    SELECT id1, id2, similarity_byParent FROM similarity WHERE (id1 = @id OR id2 = @id) 
AND similarity_byParent != 0

ELSE IF (@type=3) --by Child
    SELECT id1, id2, similarity_byChild FROM similarity WHERE (id1 = @id OR id2 = @id) 
AND similarity_byChild != 0

ELSE IF (@type=4) --combined
    SELECT id1, id2, similarity_combined FROM similarity WHERE (id1 = @id OR id2 = @id) 
AND similarity_combined != 0

END

The table 'similarity' consists of two ids (id1 and id2) and a number of columns that store double values. The constraint is that id1 < id2.

Column     Data
-----      ----
ID1         PK, Indexed
ID2         PK, Indexed

The table contains 28.5 million entries.

Stored Procedure Background

The job of the stored procedure is to get all the rows that have the parameter id in either id1 or id2. Additionally, the column specified by the type-parameter cannot be zero.

The stored procedure is called multiple times for different ids. Although only taking ~1.6 ms per call, it sums up, when calling it 17,000 times.

The processor is running at only 25%, which seems to be because the application is waiting for the procedure call to return.

Do you see any way to speed things up?

Calling the Stored Procedure C# Code Snippet

private HashSet<NodeClustering> AddNeighbourNodes(int id)
    {
        HashSet<NodeClustering> resultSet = new HashSet<NodeClustering>();
        HashSet<nodeConnection> simSet = _graphDataLoader.LoadEdgesOfNode(id);

        foreach (nodeConnection s in simSet)
        {
            int connectedId = s.id1;
            if (connectedId == id)
                connectedId = s.id2;

            // if the corresponding node doesn't exist yet, add it to the graph
            if (!_setNodes.ContainsKey(connectedId))
            {
                NodeClustering nodeToAdd = CreateNode(connectedId);
                GraphAddOuter(nodeToAdd);
                ChangeWeightIntoCluster(nodeToAdd.id, s.weight);
                _bFlowOuter += s.weight;
                resultSet.Add(nodeToAdd);
            }
        }

        // the nodes in the result set have been added 
                   to the outernodes -> add to the outernodes count
        _setNodes[id].countEdges2Outside += resultSet.Count;

        return resultSet;
    }

C# Code Background Information

This method is called each time a new id is added to the cluster. It gets all the connected nodes of that id (they are connected, when there is an entry in the db with id1=id or id2=id) via

_graphDataLoader.LoadEdgesOfNode(id);

Then it checks all the connected ids and if they are not loaded yet:

if (!_setNodes.ContainsKey(connectedId))

It Loads them:

CreateNode(connectedId); 

The Method:

_graphDataLoader.LoadEdgesOfNode(id); 

is called again, this time with the connectedId.

I need this to get all the connections of the new nodes with those nodes that are already in the set.

I probably could collect the ids of all nodes i need to add and call my stored procedure only once with a list of the ids.

Ideas

I could probably load the connected ids connection at once via something like

        SELECT id1, id2, similarity_byArticle FROM similarity WHERE 
                   (id1 = @id OR id2 = @id OR
        id1 IN (SELECT id1 FROM similarity WHERE id2 = @id) OR 
        id2 IN (SELECT id1 FROM similarity WHERE id2 = @id) OR
        id1 IN (SELECT id2 FROM similarity WHERE id1 = @id) OR 
        id2 IN (SELECT id2 FROM similarity WHERE id1 = @id))
                    AND similarity_byArticle != 0

but then I would get more entries than I'd need, because I would get them for already loaded nodes too (which from my tests would make up around 75% of the call).

Questions

  1. How can I speed up the Stored Procedure?
  2. Can I do it differently, is there a more performant way?
  3. Can I use a List<int> as a SP-Parameter?
  4. Any other thoughts?

Best Answer

If it runs that quickly, your problem is probably in the sheer number of repeated calls to the procedure. Is there a way that you could modify the stored procedure and code to return all the results the app needs in a single call?

Optimizing a query that runs in less than 2ms is probably not a fruitful effort. I doubt you will be able to shave more than fractions of a millisecond with query tweaks.