SQL Server – SSIS/ETL jobs fail with network-related errors when the network is OK

logging, sql-server, sql-server-2012, ssis

We have a SQL Server 2012 instance that is our main ETL/DW server. The daily jobs consist of more than 40 ETL processes that query other data sources, update the DataMart, and run standard ETL and BI processing.

For the past several weeks, many jobs have failed with network-related errors. Most of these errors consist of:
"Unable to complete login process due to delay in opening server connection"
"Login timeout expired"
"A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections."

All data source accounts, permissions, and availability have been verified as working, and logging in and running the ETL packages manually works without issue. The only resource issue we have is high CPU utilization, between 90% and 99%, during the daily job processing, which is when these errors occur.

The failures are not consistent, but about once a week we get a large number of them in a single day, while everything else is running fine.

Where else could I look to find the source of these issues? Is high CPU utilization causing long waits that manifest as network errors?

Best Answer

It very well could be the high CPU on the SSIS server. If the CPUs are too busy on the client (in this case the SSIS server), the client may take too long to process the response from SQL Server, and the login attempt times out. You'll need to reduce the workload on the SSIS box, or optimize the work it does so that the CPU load drops, for the connections to succeed without issue.
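One way to test this theory before changing anything is to line up the failure times against CPU readings from the SSIS box. The following is only a rough sketch in Python, not part of the original answer. It assumes you have already exported two things yourself: a time-sorted list of CPU samples (for example from a Performance Monitor log) and the timestamps of the failed connections (for example from the job history or SSIS logging). The function names, the 90% threshold, and the 5-minute window are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def cpu_at(samples, when, max_age=timedelta(minutes=5)):
    """Return the latest CPU % sampled at or before `when`.

    `samples` is a list of (datetime, percent) tuples sorted by time.
    Returns None if no sample exists within `max_age` before `when`.
    """
    times = [t for t, _ in samples]
    i = bisect_right(times, when)
    if i == 0:
        return None  # failure happened before the first sample
    sampled_at, pct = samples[i - 1]
    return pct if when - sampled_at <= max_age else None


def split_failures_by_cpu(samples, failures, threshold=90.0):
    """Group failure timestamps by whether CPU was at or above `threshold`.

    Returns three lists: (busy, quiet, unknown). `unknown` holds failures
    with no sufficiently recent CPU sample.
    """
    busy, quiet, unknown = [], [], []
    for failed_at in failures:
        pct = cpu_at(samples, failed_at)
        if pct is None:
            unknown.append(failed_at)
        elif pct >= threshold:
            busy.append(failed_at)
        else:
            quiet.append(failed_at)
    return busy, quiet, unknown


if __name__ == "__main__":
    # Made-up data purely to show the call shape; substitute real exports.
    day = datetime(2024, 1, 15)
    samples = [
        (day.replace(hour=2, minute=0), 35.0),
        (day.replace(hour=2, minute=5), 96.0),
        (day.replace(hour=2, minute=10), 98.0),
        (day.replace(hour=2, minute=15), 40.0),
    ]
    failures = [
        day.replace(hour=2, minute=7),
        day.replace(hour=2, minute=12),
        day.replace(hour=2, minute=16),
    ]
    busy, quiet, unknown = split_failures_by_cpu(samples, failures)
    print(f"busy={len(busy)} quiet={len(quiet)} unknown={len(unknown)}")
```

If nearly all failures land in the busy group, that supports the CPU explanation. If a fair number land in the quiet group, the cause is probably somewhere else, and it is worth looking at the remote data sources and the network path as well.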