Linux – Does the “bs” option in “dd” really improve the speed

dddisk-imagekernellinux

Every now and then, I'm told that to increase the speed of a "dd" I should carefully choose a proper "block size".

Even here, on ServerFault, someone else wrote that "…the optimum block size is hardware dependent…" (iain) or "…the perfect size will depend on your system bus, hard drive controller, the particular drive itself, and the drivers for each of those…" (chris-s)

As my feeling was a bit different (BTW: I tought that the time needed to deeply-tune the bs parameter was much higher than the gain received, in terms of time-saved, and that the default was reasonable), today I just went through some quick-and-dirty benchmarks.

In order to lower external influences, I decided to read:

  • from an external MMC card
  • from an internal partition

and:

  • with related filesystems umounted
  • sending the output to /dev/null to avoid issues related to "writing speed";
  • avoiding some basic issues of HDD-caching, at least when involving the HDD.

In the following table, I've reported my findings, reading 1GB of data with different values of "bs" (you can find the raw numbers at the end of this message):

enter image description here

Basically it cames out that:

  • MMC: with a bs=4 (yes! 4 bytes), I reached a throughput of 12MB/s. A not so distant values wrt to the maximum 14.2/14.3 that I got from bs=5 and above;

  • HDD: with a bs=10 I reached 30 MB/s. Surely lower than the 95.3 MB got with the default bs=512 but… significant as well.

Also, it was very clear that the CPU sys-time was inversely proportional to the bs value (but this sounds reasonable, as the lower the bs, the higher the number of sys-calls generated by dd).

Having said all the above, now the question: can someone explain (a kernel hacker?) what are the major component/systems involved in such throughput, and if it really worth the effort in specifying a bs higher than the default?


MMC case – raw numbers

bs=1M

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=1M count=1000
1000+0 record dentro
1000+0 record fuori
1048576000 byte (1,0 GB) copiati, 74,1239 s, 14,1 MB/s

real    1m14.126s
user    0m0.008s
sys     0m1.588s

bs=1k

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=1k count=1000000
1000000+0 record dentro
1000000+0 record fuori
1024000000 byte (1,0 GB) copiati, 72,7795 s, 14,1 MB/s

real    1m12.782s
user    0m0.244s
sys     0m2.092s

bs=512

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=512 count=2000000
2000000+0 record dentro
2000000+0 record fuori
1024000000 byte (1,0 GB) copiati, 72,867 s, 14,1 MB/s

real    1m12.869s
user    0m0.324s
sys     0m2.620s

bs=10

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=10 count=100000000
100000000+0 record dentro
100000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 70,1662 s, 14,3 MB/s

real    1m10.169s
user    0m6.272s
sys     0m28.712s

bs=5

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=5 count=200000000
200000000+0 record dentro
200000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 70,415 s, 14,2 MB/s

real    1m10.417s
user    0m11.604s
sys     0m55.984s

bs=4

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=4 count=250000000
250000000+0 record dentro
250000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 80,9114 s, 12,4 MB/s

real    1m20.914s
user    0m14.436s
sys     1m6.236s

bs=2

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=2 count=500000000
500000000+0 record dentro
500000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 161,974 s, 6,2 MB/s

real    2m41.976s
user    0m28.220s
sys     2m13.292s

bs=1

root@iMac-Chiara:/tmp# time dd if=/dev/sdc of=/dev/null bs=1 count=1000000000
1000000000+0 record dentro
1000000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 325,316 s, 3,1 MB/s

real    5m25.318s
user    0m56.212s
sys     4m28.176s

HDD case – raw numbers

bs=1

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=1 count=1000000000
1000000000+0 record dentro
1000000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 341,461 s, 2,9 MB/s

real    5m41.463s
user    0m56.000s
sys 4m44.340s

bs=2

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=2 count=500000000
500000000+0 record dentro
500000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 164,072 s, 6,1 MB/s

real    2m44.074s
user    0m28.584s
sys 2m14.628s

bs=4

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=4 count=250000000
250000000+0 record dentro
250000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 81,471 s, 12,3 MB/s

real    1m21.473s
user    0m14.824s
sys 1m6.416s

bs=5

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=5 count=200000000
200000000+0 record dentro
200000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 66,0327 s, 15,1 MB/s

real    1m6.035s
user    0m11.176s
sys 0m54.668s

bs=10

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=10 count=100000000
100000000+0 record dentro
100000000+0 record fuori
1000000000 byte (1,0 GB) copiati, 33,4151 s, 29,9 MB/s

real    0m33.417s
user    0m5.692s
sys 0m27.624s

bs=512 (offsetting the read, to avoid caching)

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=512 count=2000000 skip=6000000
2000000+0 record dentro
2000000+0 record fuori
1024000000 byte (1,0 GB) copiati, 10,7437 s, 95,3 MB/s

real    0m10.746s
user    0m0.360s
sys 0m2.428s

bs=1k (offsetting the read, to avoid caching)

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=1k count=1000000 skip=6000000
1000000+0 record dentro
1000000+0 record fuori
1024000000 byte (1,0 GB) copiati, 10,6561 s, 96,1 MB/s

real    0m10.658s
user    0m0.164s
sys 0m1.772s

bs=1k (offsetting the read, to avoid caching)

root@iMac-Chiara:/tmp# time dd if=/dev/sda3 of=/dev/null bs=1M count=1000 skip=7000
1000+0 record dentro
1000+0 record fuori
1048576000 byte (1,0 GB) copiati, 10,7391 s, 97,6 MB/s

real    0m10.792s
user    0m0.008s
sys 0m1.144s

Best Answer

What you have done is only a read speed test. if you are actually copying blocks to another device you have pauses in the reading while the other device is accepting the data you want to write, when this happens you can hit rotational latency issues on the read device (if it's a hard disk) and so it's often significantly faster to read 1M chunks off the HDD as you come up against rotational latency less often that way.

I know when I'm copying hard disks I get a faster rate by specifying bs=1M than by using bs=4k or the default. I'm talking speed improvements of 30 to 300 percent. There's no need to tune it for absolute best unless it's all you do every day. but picking something better than the default can cut hours off the execution time.

When you're using it for real try a few different numbers and send the dd process a SIGUSR1 signal to get it to issue a status report so you can see how it's going.

✗ killall -SIGUSR1 dd
1811+1 records in
1811+1 records out
1899528192 bytes (1.9 GB, 1.8 GiB) copied, 468.633 s, 4.1 MB/s