Apache Mesos – Persistent Storage Solutions

distributed computing, postgres

Recently I discovered Apache Mesos.

It all looks amazing in the demos and examples. I can easily imagine how one would run stateless jobs on it; that fits the whole idea naturally.

But how do you deal with long-running jobs that are stateful?

Say I have a cluster of N machines, scheduled via Marathon, and I want to run a PostgreSQL server there.

That's it. At first I don't even want it to be highly available, just a single (Dockerized) job that hosts a PostgreSQL server.

How would one organize it?

Constrain the server to a particular cluster node? Use some distributed FS?

Best Answer

Short answer: Apache Mesos doesn't provide a distributed FS.

So apps have to work with the local FS on the slaves, or you can run a distributed FS alongside Mesos. Mesos is typically deployed together with HDFS, and most of the frameworks that run on top of Mesos (Hadoop, Spark, Storm, etc.) can work with HDFS.

If your app doesn't support any distributed FS, it has to work with the local FS on each slave.

I run ElasticSearch on top of Mesos. In the config file that each Mesos slave picks up when I start the framework, I specified local directories for the ES data. So if I restart the ES framework, each ES node uses the specified directories again, and if they already contain data, that data is reused. I run multiple instances of ES and they replicate data between each other, so I don't need to worry about losing data.

However, there may be a problem at some point. Let's say I have 4 Mesos slaves and I run ES on 2 of them. I stop the ES framework, start some other framework on 2 slaves, and then start the ES framework again on 2 slaves. Mesos doesn't guarantee that these 2 slaves are the same 2 slaves that ES was running on previously, so the previous copy of the data may not be there.

I run ES on the majority of my Mesos slaves and I never stop the framework, so I've never encountered this problem.
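For the single PostgreSQL container from the question, one way to sidestep the placement problem is to pin the task to one node and keep the data in a directory on that node. Below is a minimal sketch in Python that builds a Marathon app definition with a hostname CLUSTER constraint and a Docker host volume. The hostname, host path, app id, image name and resource numbers are all hypothetical placeholders, not values from my setup, and the function name is made up for illustration.

```python
import json


def postgres_app(hostname, host_data_dir):
    """Build a Marathon app definition (as a dict) for one pinned PostgreSQL task.

    hostname and host_data_dir are placeholders supplied by the caller.
    """
    return {
        "id": "/postgres",          # hypothetical app id
        "instances": 1,             # single, non-HA server as in the question
        "cpus": 1.0,                # hypothetical resource numbers
        "mem": 1024,
        # Only accept offers from this one node, so the task always
        # lands on the machine that holds the data directory.
        "constraints": [["hostname", "CLUSTER", hostname]],
        "container": {
            "type": "DOCKER",
            "docker": {"image": "postgres", "network": "HOST"},
            # Mount a directory of the node into the container so the
            # database files survive container restarts.
            "volumes": [
                {
                    "containerPath": "/var/lib/postgresql/data",
                    "hostPath": host_data_dir,
                    "mode": "RW",
                }
            ],
        },
    }


app = postgres_app("node-1.example.com", "/srv/postgres-data")
print(json.dumps(app, indent=2))
```

The printed JSON is what you would POST to Marathon's /v2/apps endpoint. The trade-off is that the task is tied to that node: if the node goes down, Marathon cannot reschedule the server anywhere else, so this gives you persistence but not availability.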
