How much data can SQL Server handle without a DBA / other professionals?

Tags: scalability, sql-server

We deploy clusters that use Cassandra, Elasticsearch and similar NoSQL technologies to index and process data. We go to quite some lengths to make sure we can consume and process records quickly.

One of our clients has asked us to export their data so they can cross-reference it in SQL Server. It's been quite some time since I used SQL Server in anger (2008), so I'm a little out of touch with the art of the possible nowadays.

While the client has data centres and a range of skilled people (DBAs, devs, etc.), the department we're dealing with have been given a single server running SQL Server 2014 and have limited technical knowledge. It's a massive organisation with strict regulatory requirements, which usually translates into months of paperwork, process, and sign-off to get resources allocated.

They've asked us to dump ~730M records into their database and then set up a process to push new data as it arrives. That's fairly simple from our end, but I have serious concerns about whether they're going to be able to actually use the data.
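For context, the initial load from our side would be something along these lines (a rough sketch only, with placeholder table, file, and format details, not their actual setup):

```sql
-- Placeholder names throughout. BATCHSIZE commits in chunks so the
-- transaction log never has to hold all ~730M rows in one transaction.
BULK INSERT dbo.ProviderRecords
FROM 'D:\export\provider_records.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000,  -- commit every 100k rows
    TABLOCK                    -- allows minimal logging under SIMPLE/BULK_LOGGED recovery
);
```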

Record length varies, but it's on the order of 4 KB for the information they want.

To make things more interesting, nobody seems to really know what specs the server has. Going on other equipment they use, I'd expect something in the region of 64 GB of RAM, RAIDed spinning disks, and 6-12 cores.
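Back-of-envelope, using my ~4 KB/record assumption, the raw data alone dwarfs that RAM:

```sql
-- Rough estimate only: ~730M records at ~4 KB each, expressed in TiB
SELECT 730000000.0 * 4096 / 1024 / 1024 / 1024 / 1024 AS approx_raw_tib;  -- ≈ 2.7
```

And that's before indexes and row overhead, so very little of the working set would ever fit in memory.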

I've mentioned a couple of times that this might be an issue and have only received vague reassurances that SQL Server can handle that much data.

Now… I know SQL Server can handle that much data when it's partitioned and configured properly, with a skilled DBA to tweak things, but what's a reasonable amount of data to load into a SQL Server instance without someone who knows what they're doing overseeing the process?

Since getting new equipment/staff allocated at their end is going to be a time-consuming process and their project has a tight deadline, I'd prefer not to wait until it goes horribly wrong.

I know that nobody can give me a hard and fast rule with such vague information, but at what point should I be concerned? 10M / 100M / 500M / 1B?

Best Answer

I don't think I can give you a magic "be worried here" number, where anything under that number is "okay" and anything over that number is "bad."

That said, there are a number of red flags in your question, at least from my point of view:

  1. "nobody seems to really know what specs the server has."
  2. "They've asked us to dump ~730M records into their database and then set up a process to push new data as it arrives."
  3. "the department we're dealing with have been given a single server running SQL Server 2014 and have limited technical knowledge."
  4. "It's a massive organisation with strict regulatory requirements which usually translates into months of paperwork, process and sign-off to get resources allocated."
  5. "Record length varies"

Okay, SQL Server can absolutely handle that amount of data. I personally have over 20 TB on four servers.

However, SQL Server is a lot like some other Microsoft products: if you have a couple of tiny databases that are only lightly used, you can shove it in the corner, generally be mean to it, and it'll perk right along and not bite you (at least, not right away). Scaling out, though, requires more thought and effort.

I'm particularly concerned about whether they're planning on doing proper maintenance on the server. "Dumping ~730M records into their database" on a regular basis without transaction log backups will quickly consume their disk.
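At a minimum, someone has to be running something like this on a schedule (database name and backup path here are hypothetical):

```sql
-- Databases in FULL recovery grow their log until it's backed up
SELECT name, recovery_model_desc
FROM sys.databases;

-- Regular log backups are what keep the log file from eating the disk
BACKUP LOG [ClientDb] TO DISK = N'D:\Backups\ClientDb_log.trn';
```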

I'm also not comforted by:

They're attempting to take output from three separate systems, including ours. Records relate to fields on their network and thus have a URI that will be (largely) the same across all three data sets. They want three tables, one from each provider, and then to JOIN them together to answer questions. They're planning to do this all in SSMS with a couple of staff who have a passing knowledge of SQL Server/databases.

I'm not sure this server will be happy if they decide to run terrible queries against it. It sounds to me like the data may not be normalized and/or may not contain a good join key.
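If they do go ahead with three provider tables joined on the URI, that column at least needs an index in each table; a sketch with hypothetical names is below. Note that SQL Server 2014 caps index key columns at 900 bytes, so a record-length URI may need a persisted hash column instead:

```sql
-- Without an index on the join key, every cross-reference query
-- scans hundreds of millions of rows.
CREATE INDEX IX_ProviderA_Uri ON dbo.ProviderA (Uri);

-- If Uri can exceed the 900-byte key limit, index a hash of it instead:
ALTER TABLE dbo.ProviderA
    ADD UriHash AS CAST(HASHBYTES('SHA2_256', Uri) AS binary(32)) PERSISTED;
CREATE INDEX IX_ProviderA_UriHash ON dbo.ProviderA (UriHash);
```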

Last but not least, I've had really unpleasant experiences with "We decided to save money by [letting the user administer his/her own server]/[letting the nice kid in the mail room do it]/[telling them we won't support it but they can do whatever they want]." It's always ended up being expensive and time-consuming to fix.