Is Ceph waiting until the next journal commit when it hits an fsync call?
Yes, mostly. But it acts a little differently depending on the backend.
Under FileStore, there's a small journal buffer that can absorb short write bursts, but it's tiny. And yes, once it fills, writes block while it flushes - stalling the affected PGs (and everything in the cluster that touches them).
Under BlueStore, there's no such buffer. And yes, BlueStore blocks on every write until it is fsynced to the journal - to all journals in the PG's acting set. This is how BlueStore stays so consistent and predictable in IOPS and throughput. Under BlueStore, you want to move at least the DB off to an enterprise SSD - BlueStore will place the write-ahead log (WAL) and the journal on the same fast partition as the DB if there's enough room (you don't even have to specify a separate WAL, just the DB).
Use enterprise SSDs as WAL/DB/Journals, because they can safely ignore fsync
But the real issue in this cluster is that you are using sub-optimal HDDs as journals, which block on very slow fsyncs when they get flushed.
Even consumer-grade SSDs have serious issues with Ceph's fsync frequency when used as journals/WAL, because consumer SSDs only have transactional logs and no real power-loss protection.
It's enterprise SSDs that have large capacitors allowing the drive to keep operating long enough after a power loss to finish in-flight writes. Thus, they can guarantee a successful write even through a power-loss event.
The added benefit is that enterprise SSDs typically ignore fsync commands from the OS! Because they can already guarantee the success of the write, they acknowledge the fsync request from the OS immediately.
Thus, you get major performance gains when using an Enterprise-grade SSD as your WAL/DB/Journal.
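You can check this yourself before buying: measure how long a synced write actually takes on the device you plan to use as a journal/WAL. Here's a rough probe in the spirit of ioping (the name `fsync_latency` and the defaults are mine, not from any Ceph tool); point it at a file on the candidate device:

```python
import os
import time

def fsync_latency(path, iters=100, block=4096):
    """Measure average fsync latency by repeatedly writing one block
    and forcing it to stable storage - the same call Ceph journals block on."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    buf = b"\0" * block
    lats = []
    try:
        for _ in range(iters):
            os.write(fd, buf)
            t0 = time.perf_counter()
            os.fsync(fd)  # blocks until the device reports the data is durable
            lats.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    return sum(lats) / len(lats)
```

On a spinning HDD expect several milliseconds per fsync; on an enterprise SSD with power-loss protection, typically tens of microseconds, because the drive acknowledges immediately.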
Under FileStore, you'll see those delays go away, but performance will burst while the journal buffer absorbs writes and then drop back down as it flushes.
This is where BlueStore comes in, as BlueStore will guarantee consistent IOPS and throughput across the board. But you need the WAL/DB/Journal on an enterprise SSD to ignore those fsyncs.
At the time of writing, Intel S3700s can be had on the used market for about $40 each - a tiny investment for the massive performance gain of unblocked fsyncs.
Some quotes (https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Bluestore_vs_Filestore):
Filestore writes everything to the journal and only starts to flush it to the data device when the journal fills up to the configured percent. This is very convenient because it makes journal act as a «temporary buffer» that absorbs random write bursts.
Bluestore can’t do the same even when you put its WAL+DB on SSD. It also has sort of a «journal» which is called «deferred write queue», but it’s very small (only 64 requests) and it lacks any kind of background flush threads. So you actually can increase the maximum number of deferred requests, but after the queue fills up the performance will drop until OSD restarts.
And: https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/
The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while also allowing additional metadata to be stored there (if it will fit). This means that if a DB device is specified but an explicit WAL device is not, the WAL will be implicitly colocated with the DB on the faster device.
Journal/data separation
If you have just these four drives per OSD host, and all drives have similar performance, then the usual/recommended setup would be to have one OSD per disk (i.e. 4 per server), and each OSD would have its journal file on the same disk as the data.
Another popular (at least historically) setup is to have journals on separate drives that are optimized for write throughput and latency; usually SSDs, ideally SSDs with "power loss protection" so that they can acknowledge "sync" writes quickly without necessarily writing to the flash array (which can be somewhat slow). In this setup it is common to share a journal SSD between multiple OSD (data) drives. For example, our OSD servers have 8 or 10 spinning-rust drives for Ceph OSDs, and the journals are distributed over two SSDs.
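For sizing those journals, the old Ceph FileStore docs give a rule of thumb: the journal should be at least twice the product of the expected throughput and `filestore max sync interval`. A quick sketch of the arithmetic (the drive numbers are example values, not recommendations):

```python
def journal_size_mb(throughput_mb_s, max_sync_interval_s):
    """FileStore rule of thumb from the Ceph docs:
    osd journal size >= 2 * expected throughput * filestore max sync interval."""
    return 2 * throughput_mb_s * max_sync_interval_s

# One spinning OSD pushing ~120 MB/s with the default 5 s sync interval:
per_osd = journal_size_mb(120, 5)   # 1200 MB per journal
# An SSD shared by 5 OSDs must hold 5 such journals:
ssd_partition = 5 * per_osd         # 6000 MB, i.e. ~6 GB minimum
```

This is roughly why shared journal SSDs end up partitioned in the tens of gigabytes, as described below.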
Partitions
If your data and journal are on the same physical disk, I personally would put them on the same partition/file system. Mostly because I would be worried that if they were on separate partitions, then there would be a lot of head movement when the OSD/file system alternates between journal and (background) data writes. I'm not sure this is actually an issue, and on SSDs it certainly isn't. In general, separate partitions give you some optimization opportunities, i.e. different file system parameters or even file system types, or no file system at all for the journal. This comes at the cost of operational complexity; for example, adding a journal or changing its size requires repartitioning the disk.
In our setup with data on spinning disks and journals on (fewer) separate SSDs, we have a single partition per spinning disk (OSD), and a dedicated "journal" partition on each SSD; each partition contains 4–5 journals as files. Our journal files are sized at 6 GiB each, so the journal partitions are 40 GB or so.
Caveat emptor
This setup has evolved based on a few years of experience and considerations of SSD lifetime and file system/SSD efficiency (latency, throughput). It's not necessarily the optimum, but then it's a tricky area... OSD journals have a peculiar access pattern: write only to a circular buffer, with frequent "sync"s. And SSDs can have large variations in (especially write) latency depending on usage (and controller and file system smartness); and latency peaks can be exacerbated by the fact that Ceph only ACKs a write when N (typically 3) writes have been committed to stable storage. In general, I think this is still a little bit of a (dark?) science, and you definitely need to take the expected usage patterns into account, so take all recommendations with a grain of salt, especially these here.
Oh and everything I said is for the "classical" Ceph where the data is stored in a file system such as XFS/ext4/... With the upcoming "BlueStore" these considerations may not (all) apply anymore.
Best Answer
It depends on the type of data access: Ceph can store data as block devices (RBD), as an S3 object store (RGW), or as a filesystem (CephFS). I assume CephFS here as you mentioned it and Gluster, both of which are filesystem abstractions.
In a three-node configuration, Ceph would have one or more OSD daemons running at each site (one per disk drive). The data is striped across the OSDs in the cluster, and your CephFS client (kernel, FUSE, or Windows) will algorithmically compute the right node to store data on; no gateway is needed. The full mechanism would take a while to explain, but essentially it is a distributed-hash-table-style mapping, with additional cluster state kept server-side in the MON daemons.
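The essential property can be sketched in a few lines. This is a deliberately simplified stand-in - Ceph's real algorithm is CRUSH, which also accounts for weights and failure domains - but it shows why no gateway is needed: any client with the cluster map can compute placement locally (function and parameter names here are made up):

```python
import hashlib

def place(obj_name, pg_num, osds, replicas=3):
    """Toy placement: hash the object name to a placement group (PG),
    then map the PG deterministically onto `replicas` distinct OSDs.
    Every client computes the same answer - no lookup service involved."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    pg = h % pg_num
    # pick `replicas` distinct OSDs starting at a PG-derived offset
    return pg, [osds[(pg + i) % len(osds)] for i in range(replicas)]
```

Because the mapping is a pure function of the object name and the cluster map, two independent clients always agree on where an object lives.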
The data path of CephFS is straight, from your client to the OSD, with no gateways interposed.
The filesystem makes use of an additional daemon type, the MDS, which stores your filesystem metadata. If an operation changes filesystem metadata (e.g. creating a directory), the MDS will be accessed instead of the OSDs.
However, specifically to your intended use case, Ceph is a synchronous storage system, and its performance will decline the farther you stretch the distance between the nodes. It is generally recommended you keep a stretched configuration to within 10ms of round-trip latency between nodes. In other words, Ceph clusters like to live in one datacenter, but you can stretch them across a city or some small country if you have very good links.
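The arithmetic behind that recommendation is simple. A replicated Ceph write is acknowledged only after the primary OSD has heard back from every replica, so the slowest inter-node link sets the floor for every synced write (a rough model; the function name and RTT values are illustrative):

```python
def write_latency_ms(client_primary_rtt, primary_replica_rtts):
    """Rough model of a replicated Ceph write: the client contacts the
    primary OSD, which waits for all replicas to commit before ACKing.
    The slowest replica link dominates the total."""
    return client_primary_rtt + max(primary_replica_rtts)

# Everything in one datacenter (~0.2 ms links): each write costs well under 1 ms.
local = write_latency_ms(0.2, [0.2, 0.2])
# Stretched, with one replica 10 ms away: every write now pays >= 10 ms,
# regardless of how fast the local disks are.
stretched = write_latency_ms(0.2, [0.2, 10.0])
```

This is why stretching beyond ~10 ms RTT degrades the whole cluster rather than just the remote site.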