How does the RA setting get passed down the virtual block device chain?
It depends. Let's assume you are inside a Xen domU with RA=256. Your /dev/xvda1 is actually an LV on dom0, visible there as /dev/dm-1. So you have RA(domU(/dev/xvda1)) = 256 and RA(dom0(/dev/dm-1)) = 512. The effect is that the dom0 kernel accesses /dev/dm-1 with a different RA than the domU kernel uses. Simple as that.
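Note that blockdev reports RA in 512-byte sectors, while the kernel's /sys/block/&lt;dev&gt;/queue/read_ahead_kb exposes the same setting in KiB, so the two views are easy to reconcile. For example, domU's RA=256 works out to:

```shell
# blockdev counts RA in 512-byte sectors;
# /sys/block/<dev>/queue/read_ahead_kb shows the same value in KiB.
ra_sectors=256
echo $(( ra_sectors * 512 / 1024 ))   # prints 128 (KiB)
```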
A different situation occurs if we consider a /dev/md0(/dev/sda1,/dev/sda2) setup.
blockdev --report | grep sda
rw **512** 512 4096 0 1500301910016 /dev/sda
rw **512** 512 4096 2048 1072693248 /dev/sda1
rw **512** 512 4096 2097152 1499227750400 /dev/sda2
blockdev --setra 256 /dev/sda1
blockdev --report | grep sda
rw **256** 512 4096 0 1500301910016 /dev/sda
rw **256** 512 4096 2048 1072693248 /dev/sda1
rw **256** 512 4096 2097152 1499227750400 /dev/sda2
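You can pull the RA column out of that report mechanically. A small awk sketch over a captured copy of the output above (the column layout may differ between util-linux versions; RA is assumed to be the second field and the device name the seventh):

```shell
# Captured sample of `blockdev --report` after `--setra 256 /dev/sda1`.
# RA is the 2nd column, counted in 512-byte sectors.
report='rw 256 512 4096 0 1500301910016 /dev/sda
rw 256 512 4096 2048 1072693248 /dev/sda1
rw 256 512 4096 2097152 1499227750400 /dev/sda2'

# Print each device with its RA converted to KiB (sectors * 512 / 1024).
echo "$report" | awk '{ printf "%s RA=%d sectors (%d KiB)\n", $7, $2, $2*512/1024 }'
```

Note how setting the RA on /dev/sda1 shows up on /dev/sda and /dev/sda2 as well.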
Setting the /dev/md0 RA won't affect the /dev/sdX block devices.
rw **256** 512 4096 2048 1072693248 /dev/sda1
rw **256** 512 4096 2097152 1499227750400 /dev/sda2
rw **512** 512 4096 0 1072627712 /dev/md0
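This per-device independence is also visible in sysfs, where each block device has its own queue/read_ahead_kb. A sketch over a simulated /sys tree (the paths mirror the real layout, the values are the ones from the reports above, converted to KiB):

```shell
# Each block device carries its own readahead in
# /sys/block/<name>/queue/read_ahead_kb (simulated tree;
# 128 KiB = 256 sectors for sda, 256 KiB = 512 sectors for md0).
sys=$(mktemp -d)
for dev in sda md0; do mkdir -p "$sys/$dev/queue"; done
echo 128 > "$sys/sda/queue/read_ahead_kb"
echo 256 > "$sys/md0/queue/read_ahead_kb"
md0_ra=$(cat "$sys/md0/queue/read_ahead_kb")
echo "$md0_ra"   # md0 keeps its own value, independent of sda's
rm -rf "$sys"
```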
So generally, in my opinion, the kernel accesses a block device with whatever RA is actually set on it. One logical volume can be accessed via the RAID it is part of, or via a device-mapper device, and each path has its own RA that will be respected.
So the answer is: the RA setting is, IMHO, not passed down the block device chain; rather, whatever RA is set on the top-level device you access is what is used to access the constituent devices.
Does dm-0 trump all because that is the top level block device you are actually accessing?
If by "trump all" you mean deep propagation - as per my previous comment, I think you may have different RAs for different devices in the system.
Would lvchange -r have an impact on the dm-0 device and not show up here?
Yes, but this is a particular case. Let's assume that we have /dev/dm-0, which is LVM's /dev/vg0/blockdevice. If you do:
lvchange -r 512 /dev/vg0/blockdevice
then /dev/dm-0 will also change, because /dev/dm-0 and /dev/vg0/blockdevice are exactly the same block device as far as kernel access is concerned.
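That identity is visible in the filesystem itself: the LV name is normally just a udev-created symlink to the dm node. A sketch that reproduces the layout in a temporary directory (the names vg0/blockdevice and dm-0 are assumed, and the files are plain stand-ins for the real device nodes):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/vg0"
touch "$tmp/dm-0"                      # stand-in for the real dm node
ln -s ../dm-0 "$tmp/vg0/blockdevice"   # what udev does for the LV name
resolved=$(readlink -f "$tmp/vg0/blockdevice")
echo "${resolved##*/}"                 # resolves to dm-0: two names, one device
rm -rf "$tmp"
```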
But let's assume that /dev/vg0/blockdevice is the same as /dev/dm-0 and as /dev/xvda1 in the Xen domU that is using it. Setting the RA of /dev/xvda1 will take effect, but dom0 will still have its own RA.
What do you use, equivalent to the sector size above, to determine the actual readahead value for a virtual device?
I typically discover the best RA by experimenting with different values and testing them with hdparm.
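For example, a dry-run sketch of that trial-and-error loop (it only prints the commands it would run; replace the /dev/sdX placeholder and drop the echo to actually run it, as root):

```shell
DEV=/dev/sdX   # placeholder: substitute your real device
for ra in 128 256 512 1024; do
    # set the candidate RA, then time a sequential read with hdparm
    echo "blockdev --setra $ra $DEV && hdparm -t $DEV"
done
```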
The stripe size of the RAID (for md0)?
Same as above.
Does the FS play a part (I am primarily interested in ext4 and XFS)?
Sure - this is a very big topic. I recommend you start here: http://archives.postgresql.org/pgsql-performance/2008-09/msg00141.php
testing the script on a clean Xen-box
That does not guarantee that the exported disk(s) contain only zeroes in their unwritten areas. Thus the kernel may detect something which isn't really there. You should overwrite the first part of the COW volume (I don't know how much is needed, but the first 4 MiB should be enough). Oh, your COW volume isn't even 4 MiB in size:
dd if=/dev/zero of=/dev/mapper/dm.base.cow bs=4K count=1024
Maybe there is a minimum size for COW volumes and yours is simply too small?
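For reference, the dd invocation above writes bs × count bytes of zeroes, i.e. the full 4 MiB:

```shell
bs=$((4 * 1024))   # bs=4K in bytes
count=1024
echo $(( bs * count / 1024 / 1024 ))   # prints 4 (MiB written)
```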
Best Answer
The character device /dev/nvme0 is the NVMe device controller, and block devices like /dev/nvme0n1 are the NVMe storage namespaces: the devices you use for actual storage, which behave essentially as disks.
In enterprise-grade hardware, there might be support for several namespaces, thin provisioning within namespaces, and other features. For now, you can think of namespaces as a sort of meta-partition with extra features for enterprise use.
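The naming convention (nvme&lt;ctrl&gt; for the controller, nvme&lt;ctrl&gt;n&lt;ns&gt; for a namespace, nvme&lt;ctrl&gt;n&lt;ns&gt;p&lt;part&gt; for a partition) can be sketched as a simple pattern match. This is only an illustration of the names; real tooling should consult sysfs or nvme-cli rather than parse device names:

```shell
# Classify an NVMe device-node name by its pattern (illustrative only).
classify() {
    case "$1" in
        nvme[0-9]*n[0-9]*p[0-9]*) echo "partition" ;;
        nvme[0-9]*n[0-9]*)        echo "namespace (block device)" ;;
        nvme[0-9]*)               echo "controller (character device)" ;;
        *)                        echo "not NVMe" ;;
    esac
}
classify nvme0       # → controller (character device)
classify nvme0n1     # → namespace (block device)
classify nvme0n1p2   # → partition
```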