What is the relation between block size and IO

Tags: filesystems, hard-drive, io, performance

I have been reading about disks recently, which led me to three different doubts that I am not able to link together. The three terms I am confused by are block size, IO and performance.

I was reading about superblock at slashroot when I encountered the statement

Less IOPS will be performed if you have larger block size for your
file system.

From this I understand that if I want to read 1024 KB of data, a disk (say A) with a block size of 4 KB/4096 B would take more IOs than a disk (say B) with a block size of 64 KB.
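As a quick sketch of the arithmetic I have in mind (assuming one request per filesystem block):

```python
# Back-of-the-envelope arithmetic: reading the same 1024 KB of data
# in filesystem-block-sized requests on two hypothetical disks.
data_kb = 1024

for block_kb in (4, 64):
    requests = data_kb // block_kb
    print(f"{block_kb:>2} KB blocks -> {requests} requests")
# 4 KB blocks need 256 requests; 64 KB blocks need 16, i.e. 16x fewer.
```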

Now my question is: how much more IO would disk A need?

As far as I understand, the number of IO requests required to read this data would also depend on the size of each IO request.

  • So who is deciding what the size of the IO request is? Is it equal to the block size? Some people say that your application decides the size of the IO request, which seems fair enough, but how does the OS then divide a single request into multiple IOs? There must be a limit after which the request splits into more than one IO. How to find that limit?
  • Is it possible that on both disks (A and B) the data can be read in the same number of IOs?
  • Does reading each block mean a single IO? If not, how many blocks at most can be read in a single IO?
  • If the data is sequential or randomly spread, does the CPU provide all the block addresses to read at once?

Also

num of IOPS possible = 1 /(average rotational delay + avg seek time)

Throughput = IOPS * IO size

From the above, the IOPS for a disk would always be fixed, but the IO size can vary. So to calculate the maximum possible throughput we would need the maximum IO size. And from this, what I understand is: if I want to increase throughput from a disk, I would issue requests with the maximum amount of data I can send per request. Is this assumption correct?
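As a sketch of the two formulas above, with made-up illustrative numbers for a 7200 RPM disk (not measured values):

```python
# Hypothetical 7200 RPM disk; these latency figures are illustrative only.
avg_rotational_delay = 0.00417  # seconds: half a revolution at 7200 RPM
avg_seek_time = 0.009           # seconds: a typical-looking average seek

# num of IOPS possible = 1 / (average rotational delay + avg seek time)
iops = 1 / (avg_rotational_delay + avg_seek_time)  # roughly 76 IOPS

# Throughput = IOPS * IO size: a fixed IOPS budget means bigger IOs
# give more throughput.
for io_size_kb in (4, 64, 1024):
    throughput_mb_s = iops * io_size_kb / 1024
    print(f"{io_size_kb:>4} KB per IO -> {throughput_mb_s:6.1f} MB/s")
```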

I apologize for asking so many questions, but I have been reading about this for a while and could not get any satisfactory answers; I found different views on the same topics.

Best Answer

I think the Wikipedia article explains it well enough:

Absent simultaneous specifications of response-time and workload, IOPS are essentially meaningless.
...
Like benchmarks, IOPS numbers published by storage device manufacturers do not directly relate to real-world application performance. ...

Now to your questions:

So who is deciding what is the size of the IO request?

That is both an easy and a difficult question to answer for a non-programmer like myself.

As usual the answer is an unsatisfactory "it depends"...

I/O operations with regards to disk storage by an application are usually system calls to the operating system and their size depends on which system call is made...

I'm more familiar with Linux than other operating systems, so I'll use that as reference.

The size of I/O operations such as open(), stat(), chmod() and similar is almost negligible.
On a spinning disk the performance of those calls mainly depends on how far the disk actuator needs to move the arm and read head to reach the correct position on the disk platter.

On the other hand, the size of read() and write() calls is initially set by the application and can vary between 0 and 0x7ffff000 (2,147,479,552) bytes in a single I/O request...

Of course, once such a system call has been made by the application and is received by the OS, the call will get scheduled and queued (depending on whether or not the O_DIRECT flag was used to bypass the page cache and buffers and direct I/O was selected).

The abstract system call will need to be mapped to/from operations on the underlying file-system, which is organized in discrete blocks (the size of which is usually set when the file-system is created), and eventually the disk driver operates on either hard disk sectors of 512 or 4096 bytes or SSD memory pages of 2K, 4K, 8K, or 16K.
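As a rough illustration of that mapping, a byte range can be translated into filesystem block numbers like this (assuming a hypothetical 4 KiB block size; real filesystems add extents, caching and readahead on top):

```python
# Sketch: which filesystem blocks a read at (offset, length) touches,
# assuming a hypothetical 4 KiB filesystem block size.
BLOCK_SIZE = 4096

def blocks_for_read(offset, length):
    first = offset // BLOCK_SIZE                 # block holding the first byte
    last = (offset + length - 1) // BLOCK_SIZE   # block holding the last byte
    return last - first + 1

print(blocks_for_read(0, 4096))         # 1 block: exactly aligned
print(blocks_for_read(4095, 2))         # 2 blocks: straddles a boundary
print(blocks_for_read(0, 1024 * 1024))  # 256 blocks for a 1 MiB read
```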

(For benchmarks, the read and write sizes are typically set to either 512 B or 4 KB, which align well with the underlying disk, resulting in optimal performance.)

There must be a limit after which the request splits into more than one IO. How to find that limit?

Yes, there is a limit: on Linux, as documented in the manual, a single read() or write() system call will transfer at most 0x7ffff000 (2,147,479,552) bytes. To read larger files you will need additional system calls.
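A minimal Python sketch of that idea: here the per-call maximum is shrunk to 16 bytes so the looping is visible, but the principle is the same as with the real 0x7ffff000-byte limit.

```python
import os
import tempfile

# Sketch: a single read() transfers at most a fixed amount, so reading
# more data requires a loop of system calls. A tiny 16-byte chunk stands
# in for the real per-call maximum.
CHUNK = 16

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 100)          # a 100-byte test file
    path = f.name

calls, data = 0, b""
fd = os.open(path, os.O_RDONLY)
while True:
    buf = os.read(fd, CHUNK)     # one system call, at most CHUNK bytes
    calls += 1
    if not buf:                  # empty read means end of file
        break
    data += buf
os.close(fd)
os.remove(path)

print(calls, len(data))  # 8 system calls (7 with data + 1 empty) for 100 bytes
```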

Does reading each block means a single IO ?

As far as I understand typically each occurrence of a system call is what counts as an IO event.

A single read() system call counts as 1 I/O event, not as X or Y IOs, regardless of how that system call gets translated into accessing X blocks from a filesystem or reading Y sectors from a spinning hard disk.