Why does storage performance change at various queue depths?

hard-drive, iops, performance, ssd

I'm in the market for a storage upgrade for our servers.
I'm looking at benchmarks of various PCIe SSD devices, and in the comparisons I see that IOPS change at various queue depths.
How can that be, and why is that happening?
The way I understand things is: I have a device with a maximum (theoretical) of 100k IOPS. If my workload consistently produces 100,001 IOPS, I'll have a queue depth of 1, am I correct?
However, from what I see in benchmarks, some devices run slower at low queue depths, then speed up at depths of 4-64, and then slow down again at even larger depths.
Isn't queue depth a property of the OS (or perhaps of the storage controller), so why would it affect IOPS?
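
To make my mental model concrete: the way I picture it, "queue depth N" in a benchmark just means N requests are kept in flight at once. Here is a rough Python sketch of that pattern (TEST_FILE is a placeholder path, and buffered reads will mostly be served from the page cache, so this only illustrates the concurrency pattern, not a real measurement):

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

TEST_FILE = "/tmp/testfile"   # placeholder: a large file on the device under test
BLOCK_SIZE = 4096             # 4 KiB random reads
RUNTIME = 5                   # seconds per queue depth

def one_read(fd, file_size):
    # One random, block-aligned 4 KiB read.
    offset = random.randrange(0, file_size // BLOCK_SIZE) * BLOCK_SIZE
    os.pread(fd, BLOCK_SIZE, offset)

def measure(queue_depth):
    fd = os.open(TEST_FILE, os.O_RDONLY)
    file_size = os.fstat(fd).st_size
    completed = 0
    deadline = time.monotonic() + RUNTIME
    # "Queue depth" here = number of reads kept in flight at the same time.
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        in_flight = {pool.submit(one_read, fd, file_size) for _ in range(queue_depth)}
        while time.monotonic() < deadline:
            finished, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            completed += len(finished)
            # Top the queue back up so queue_depth requests stay outstanding.
            for _ in finished:
                in_flight.add(pool.submit(one_read, fd, file_size))
        wait(in_flight)  # drain whatever is still outstanding
    os.close(fd)
    return completed / RUNTIME

for qd in (1, 4, 16, 64):
    print(f"queue depth {qd:3d}: ~{measure(qd):,.0f} IOPS")
```

I assume tools like fio do the same thing more efficiently with an asynchronous IO engine and its iodepth setting, which is presumably what the published benchmark numbers are based on.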

Best Answer

The short answer is that hard drives optimize the retrieval of data when there is more than one IO request outstanding, which usually increases throughput at the expense of latency.

NCQ does this: it reorders the outstanding IO requests to minimize seek and rotational delays, which raises throughput.
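
To see why reordering helps, here is a toy sketch (made-up logical block addresses; real NCQ also takes rotational position into account, not just seek distance):

```python
# Toy illustration: servicing outstanding requests in LBA order makes the
# head travel much less than servicing them in arrival (FIFO) order.
outstanding = [700, 20, 510, 90, 640, 130]   # made-up LBAs of queued requests
head = 0                                     # assumed starting head position

def travel(order, start):
    pos, total = start, 0
    for lba in order:
        total += abs(lba - pos)
        pos = lba
    return total

print("FIFO order travel:  ", travel(outstanding, head))
print("Sorted order travel:", travel(sorted(outstanding), head))
```

With more requests in the queue, the drive has more freedom to pick an efficient order, which is exactly the latency-for-throughput trade mentioned above.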

SSDs work differently from mechanical drives, since they store data on multiple flash chips that can be accessed in parallel. That is, if you issue one IO request at a time, the latency (lookup + read time) of a single request determines IOPS. But if you issue 4 requests at once, the SSD may be able to service them in parallel, or in some other optimized way, and you may get up to 4 times the throughput.

The higher the queue depth, the more opportunities the disk has to optimize, up to a point. Since IOPS is a function of throughput, this increases IOPS at higher queue depths.
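
As a rough back-of-the-envelope (Little's law: throughput ≈ outstanding requests / average latency), with made-up numbers for a hypothetical drive:

```python
# IOPS ~= queue_depth / latency, capped by the drive's internal parallelism.
# The numbers below are assumptions for illustration only.
LATENCY_S = 100e-6        # 100 microseconds per request (assumed)
PARALLEL_UNITS = 8        # independent flash channels (assumed)

for qd in (1, 2, 4, 8, 16, 64):
    effective = min(qd, PARALLEL_UNITS)   # requests actually served in parallel
    iops = effective / LATENCY_S
    print(f"queue depth {qd:2d}: ~{iops:,.0f} IOPS")
```

This gives ~10k IOPS at a queue depth of 1 and ~80k once the depth covers the drive's parallelism, after which it flattens out; the eventual slow-down at very high depths (queuing delay and controller overhead) is not captured by this simple model.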

EDIT:

The true queue resides in the OS, which is issuing all the requests. That said, I would conjecture that the controller driver passes a certain portion of the queue on to the controller and the disk so they can work at their optimal queue depths. The disk must have its own queue to be able to optimize.
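
On Linux you can peek at those layered queues yourself. A minimal sketch, assuming a SATA/SCSI disk exposed as /dev/sda (the device/queue_depth attribute only exists for SCSI/SATA devices; NVMe exposes its queues differently):

```python
from pathlib import Path

DEV = "sda"  # assumed device name; adjust for your system

# Block-layer (OS) queue: how many requests the kernel will keep outstanding.
nr_requests = Path(f"/sys/block/{DEV}/queue/nr_requests").read_text().strip()
print(f"OS block-layer queue (nr_requests): {nr_requests}")

# Device queue: how many commands the kernel will hand the drive at once.
qd_path = Path(f"/sys/block/{DEV}/device/queue_depth")
if qd_path.exists():
    print(f"Device queue depth: {qd_path.read_text().strip()}")
else:
    print("No device/queue_depth attribute for this device")
```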