Database – What exactly does it mean that storing “large blobs in the database reduces performance”

databasedatabase-designfile-systemssharepointsql server

To someone who knows database internals this may be an easy question, but can someone explain in a clear way why storing large blobs (say 400 MB movies) in the database is supposed to decrease performance and what exactly does that mean? This is a claim often found throughout the internet, but I've never seen it really explained.

To be specific, I'm referring to SharePoint/MSSQL performance, i.e. file-upload performance, site browsing, displaying lists, document opening etc. – operations that are said to become slower once a database gets too big. Blob externalization to filesystem (which in SharePoint is called Remote Blob Storage, aka moving files out of the database, leaving only a reference) is supposed to solve this to an extent, but what exactly – at the bottom level – is the difference? It's obvious that backups would take longer with giant files stored in the database … but what operations exactly are impacted and what's the underlying mechanism of it (i.e. in what way are files stored on filesystem outside of the database accessed or stored differently)?

Suppose a simple table containing columns ID(guid, PK), FileName(string), Data(varbinary(max)) – would large Data column really slow down operations such as displaying a list of files on a website (which I assume internally means running SELECT FileName FROM table), or inserting a new row? It's not like the actual binary content columns are indexed.

I know there have been some questions like this asked already, but I've not found an adequate explanation.

Best Answer

This really depends on the DB system, but one major thing you have to consider with BLOBs is transaction processing. By externalization to the filesystem, one takes changes to the binary data out of the transactions. That will typically result in faster write operations, opposed to the situation where the DB assures you ACID compliance with full rollback mechanisms etc.

Slower read operations hypothetically can also occur, when you retrieve data from your db from a BLOB table without actually selecting the BLOB data, since the DB may store the remaining rows more localized on disk, which will allow faster read access (but I guess most modern DB sytems are clever enough to store the actual binary data in a separated disk area or table space, so without testing this with a real world scenario one should not make any general assumptions here).

Related Topic