CentOS – Is Software RAID-1 Using mdadm with a Local Hard Disk and GNBD Possible?

I have multiple webservers that use many small files to create dynamic web pages. Caching the web pages isn't an option. The webservers also perform writes, so I need a synchronous filesystem.

I'm looking to maximise performance, as it's my understanding that small files are the weakness (to varying degrees) of cluster filesystems over Ethernet.

I'm currently using CentOS 5.5, 64-bit.

Since it's only about 300MB of data, I'm looking at an mdadm RAID-1 array built from the GNBD device and a local hard disk, using the "--write-mostly" option on the GNBD member so that reads are served from the local hard disk.
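For illustration, here is a small Python sketch that only assembles and prints the mdadm command line for such an array; it does not run it. The device names /dev/md0, /dev/sdb1 and /dev/gnbd/web are hypothetical placeholders, and the script assumes a Python 3 interpreter rather than CentOS 5's stock Python 2.4. The detail worth noticing is ordering: mdadm applies --write-mostly to the devices listed after it, so the local disk goes first and the GNBD device last.

```python
def build_mdadm_create(md_device, local_device, remote_device):
    """Return the argv for creating a two-member RAID-1 array.

    The local disk is listed first. mdadm applies --write-mostly to the
    devices listed *after* the flag, so only the remote (GNBD) member is
    flagged and md will prefer the local disk for reads.
    """
    return [
        "mdadm", "--create", md_device,
        "--level=1",
        "--raid-devices=2",
        local_device,
        "--write-mostly", remote_device,
    ]


if __name__ == "__main__":
    # Hypothetical device names: substitute your local partition and the
    # name under which the GNBD device was imported.
    argv = build_mdadm_create("/dev/md0", "/dev/sdb1", "/dev/gnbd/web")
    # Print rather than execute, so the command can be reviewed first.
    print(" ".join(argv))
```

Review the printed command before running it as root.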

Is this possible?

If so, is there any advantage to making it a tmpfs disk instead of a local hard disk?

Or will the files on the local hard disk just get cached in RAM anyway, so that I won't see a performance gain from tmpfs, assuming there's enough RAM available?
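As a rough way to check the "enough RAM" assumption, the sketch below compares the total size of a directory tree with physical memory as reported by /proc/meminfo. The document root path and the 25% threshold are arbitrary assumptions, and it assumes Python 3. Fitting under the threshold only suggests the files can stay in the page cache; it does not prove they will.

```python
import os


def tree_size_bytes(root):
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def mem_total_bytes(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo, which is reported in kB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def fits_in_page_cache(root, meminfo_text, fraction=0.25):
    """Heuristic: does the tree use at most `fraction` of physical RAM?"""
    return tree_size_bytes(root) <= fraction * mem_total_bytes(meminfo_text)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        docroot = "/var/www/html"  # hypothetical document root
        print(tree_size_bytes(docroot), fits_in_page_cache(docroot, meminfo))
```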

Best Answer

I suggest you look at glusterfs. I use it for 1) transparency: its backing store, if you will, is an ordinary filesystem such as ext3; 2) data availability: glusterfs provides striping, replication, or any combination of the two; 3) performance and reliability; and 4) ease of use.

While you could use it in a client/server mode (web servers as clients, the file server as the server), depending on the speed of your network it may make more sense to run it on each machine. In that arrangement the file server becomes, in a sense, the definitive source. Each web server reads from and writes to its own local glusterfs server (or, at the very least, its own cache) at local I/O speeds, and to the file server at network speeds, which makes the system quite fast.

It can use TCP or InfiniBand, and it also appears to work on Amazon Web Services. It can export volumes over NFS and CIFS as well, so it is fairly portable. It installs via yum under CentOS, and you can be up and running in under 20 minutes. Compared to GNBD, it is much easier to set up and use. glusterfs is configured in a highly modular way, so you can use only what you need.

The beauty of glusterfs is that it's very tolerant of network or host outages. At my business, whcreative.com, I use it on partially mobile laptops to serve home directories as well as the HTML and database filesystems (for the Drupal CMS), in a mixed environment of CentOS 5.5, Fedora 13, and other assorted Linux flavours. Home directories are served from every laptop as well as from the server. When a laptop reconnects after being used off the network, a simple "ls -Rl" on the server syncs everything, because listing each file makes glusterfs look it up and repair any out-of-date replica. If a machine crashes and its ext4 filesystem may hold stale data, that's not a problem: syncing to the crashed machine once it's back up fixes it rather quickly.
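The "ls -Rl" step works because it stats every file through the glusterfs mount, and in glusterfs releases of this era the replicate translator checks and heals copies when a file is looked up. A Python 3 equivalent, sketched under the assumption that the volume is mounted at the hypothetical path /mnt/glusterfs, just walks the tree and stats each entry:

```python
import os


def trigger_self_heal(mount_point):
    """Stat every entry under a glusterfs mount point.

    Each lstat forces a lookup through the glusterfs client, which is what
    gives replication the chance to compare copies and heal stale ones.
    Returns the number of entries visited.
    """
    visited = 0
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Entries can vanish while the walk is running; skip them.
                continue
            visited += 1
    return visited


if __name__ == "__main__":
    # Hypothetical mount point for the glusterfs volume.
    print(trigger_self_heal("/mnt/glusterfs"))
```

Run it against the glusterfs mount rather than the backing ext3/ext4 directory; touching the backing store directly bypasses glusterfs, so nothing would be healed.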

The first drawback is that it is only tested on x86_64 (it is claimed to run on i386), though that's not a big issue for most people. The bigger drawback is its documentation. For example, there is no man page for one of the key commands, glusterfs-volgen, and the man-like page on the website gives examples but no working synopsis. Configuration options are not clearly documented and take a bit of hacking to figure out. The last drawback is that it essentially relies only on user permissions for security. But in the *nix tradition, it is quite easy to run it inside a VPN, so that's not really a big issue.

I cannot vouch for its reliability, as I have only been using it for a few months. However, it seems to handle our home directories just fine when a laptop is disconnected, used, and reconnected. Of course, I don't trust it completely, and I do tar-based backups to an ext3 filesystem on a CentOS machine.
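A tar-based backup can be as simple as a timestamped, gzip-compressed archive. Here is a minimal sketch using Python 3's standard tarfile module; the source and destination paths in the usage line are hypothetical.

```python
import os
import tarfile
import time


def backup_tree(src_dir, dest_dir, label="home"):
    """Write a gzip-compressed tar of src_dir into dest_dir.

    The archive name carries a timestamp, so backups taken at different
    times sit side by side. Returns the path of the archive written.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, "%s-%s.tar.gz" % (label, stamp))
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the source directory's own name,
        # not under its full absolute path.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive


if __name__ == "__main__":
    # Hypothetical paths: the glusterfs-served home directories and a
    # directory on the separate ext3 backup filesystem.
    src, dest = "/mnt/glusterfs/home", "/backup"
    if os.path.isdir(src) and os.path.isdir(dest):
        print(backup_tree(src, dest))
```

Keeping dest_dir on a separate filesystem means the backups don't depend on the glusterfs volume they protect.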

Best of luck, Eric Chowanski