How to restart rabbitmq after switching machines

rabbitmq

I'm running django/celery on EC2, with rabbitmq as the broker. The machine I was using failed, so I fired up another instance. But since switching to the new machine, I haven't been able to get celery to work.

EDIT: I've included a lot of logs below, just in case I'm misdiagnosing the problem. But I'm 85% sure that the problem is that rabbitmq-server fails to start up in the "starting database" phase.

node          : rabbit@ip-10-212-66-181
app descriptor: /usr/lib/rabbitmq/lib/rabbitmq_server-1.7.2/sbin/../ebin/rabbit.app
home dir      : /var/lib/rabbitmq
cookie hash   : 5+uQ077En5bpvle3HJCQMg==
log           : /var/log/rabbitmq/rabbit.log
sasl log      : /var/log/rabbitmq/rabbit-sasl.log
database dir  : /var/lib/rabbitmq/mnesia/rabbit

starting internal event notification system                           ...done
starting logging server                                               ...done
starting database                                                     ...Erlang has closed

Any ideas on how to further diagnose/solve this problem?

Here's what happens when I try to run celery:

$ python manage.py celeryd -l info
/opt/bitnami/python/lib/python2.6/site-packages/django_celery-2.4.2-py2.6.egg/djcelery/loaders.py:86: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
  warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2011-12-05 19:40:13,545: WARNING/MainProcess]  

 -------------- celery@ip-10-212-66-181 v2.4.3
---- **** -----
--- * ***  * -- [Configuration]
-- * - **** ---   . broker:      amqp://guest@localhost:5672//
- ** ----------   . loader:      djcelery.loaders.DjangoLoader
- ** ----------   . logfile:     [stderr]@INFO
- ** ----------   . concurrency: 1
- ** ----------   . events:      OFF
- *** --- * ---   . beat:        OFF
-- ******* ----
--- ***** ----- [Queues]
 --------------   . celery:      exchange:celery (direct) binding:celery


[Tasks]
  . tbAnalytics.models.processAnalysis
  . tbCollections.models.processCollection

[2011-12-05 19:40:13,558: INFO/PoolWorker-1] child process calling self.run()
[2011-12-05 19:40:13,562: WARNING/MainProcess] celery@ip-10-212-66-181 has started.
[2011-12-05 19:40:13,564: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 2 seconds...
[2011-12-05 19:40:15,574: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 4 seconds...

Tracing it back, it looks like the rabbitmq server is the problem, and the database in particular:

$ sudo rabbitmqctl status
Status of node 'rabbit@ip-10-212-66-181' ...
Error: unable to connect to node 'rabbit@ip-10-212-66-181': nodedown
diagnostics:
- nodes and their ports on ip-10-212-66-181: [{rabbitmqctl14448,38289}]
- current node: 'rabbitmqctl14448@ip-10-212-66-181'
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: 5+uQ077En5bpvle3HJCQMg==

But I haven't been able to figure out how to restart the server:

bitnami@ip-10-212-66-181:/var/log/rabbitmq$ sudo rabbitmq-server start_app

+---+   +---+
|   |   |   |
|   |   |   |
|   |   |   |
|   +---+   +-------+
|                   |
| RabbitMQ  +---+   |
|           |   |   |
|   v1.7.2  +---+   |
|                   |
+-------------------+
AMQP 8-0
Copyright (C) 2007-2010 LShift Ltd., Cohesive Financial Technologies LLC., and Rabbit Technologies Ltd.
Licensed under the MPL.  See http://www.rabbitmq.com/

node          : rabbit@ip-10-212-66-181
app descriptor: /usr/lib/rabbitmq/lib/rabbitmq_server-1.7.2/sbin/../ebin/rabbit.app
home dir      : /var/lib/rabbitmq
cookie hash   : 5+uQ077En5bpvle3HJCQMg==
log           : /var/log/rabbitmq/rabbit.log
sasl log      : /var/log/rabbitmq/rabbit-sasl.log
database dir  : /var/lib/rabbitmq/mnesia/rabbit

starting internal event notification system                           ...done
starting logging server                                               ...done
starting database                                                     ...Erlang has closed
{"init terminating in do_boot",{{nocatch,{error,{cannot_start_application,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_vhost,rabbit_config,rabbit_listener,rabbit_durable_route,rabbit_route,rabbit_reverse_route,rabbit_durable_exchange,rabbit_exchange,rabbit_durable_queue,rabbit_queue]}}},[{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},{rabbit,run_boot_step,1},{rabbit,'-start/2-lc$^0/1-0-',1},{rabbit,start,2},{application_master,start_it_old,4}]}}}}}}},[{init,start_it,1},{init,start_em,1}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

Also, don't know if it's relevant, but this process is running in the background.

$ ps aux | grep rabbit
rabbitmq   714  0.0  0.0   1980   408 ?        S    Dec04   0:00 /usr/lib/erlang/erts-5.7.4/bin/epmd -daemon

I haven't been able to find any documentation for this kind of failure. Any suggestions?

Best Answer

I got some very good help from the rabbitmq-discuss list:

The database RabbitMQ uses is bound to the machine's hostname, so if you copied the database dir to another machine, it won't work. If this is the case, you have to set up a machine with the same hostname as before and transfer any outstanding messages to the new machine. If there's nothing important in rabbit, you could just clear everything by removing the RabbitMQ files in /var/lib/rabbitmq.

I deleted everything in /var/lib/rabbitmq/mnesia/rabbit/ and it started up without trouble. Hooray!

Related Solutions

How to make RabbitMQ listen only to localhost

Putting the following in /etc/rabbitmq/rabbitmq-env.conf will make RabbitMQ and epmd listen on only localhost:

export RABBITMQ_NODENAME=rabbit@localhost
export RABBITMQ_NODE_IP_ADDRESS=127.0.0.1
export ERL_EPMD_ADDRESS=127.0.0.1

It takes a bit more work to configure Erlang to only use localhost for the higher numbered port (which is used for clustering nodes as far as I can tell). If you don't care about clustering and just want Rabbit to be run fully locally then you can pass Erlang a kernel option for it to only use the loopback interface.

To do so, create a new file in /etc/rabbitmq/ - I'll call it rabbit.config. In this file we'll put the Erlang option that we need to load on run time.

[{kernel,[{inet_dist_use_interface,{127,0,0,1}}]}].

If you're using the management plugin and also want to limit that to localhost, you'll need to configure its ports separately, making the rabbit.config include this:

[ {rabbitmq_management, [ {listener, [{port, 15672}, {ip, "127.0.0.1"}]} ]}, {kernel, [ {inet_dist_use_interface,{127,0,0,1}} ]} ].

(Note RabbitMQ leaves epmd running when it shuts down, so if you want to block off Erlang's clustering port, you will need to restart epmd separately from Rabbit.)

Next we need to have RabbitMQ load this at startup. Open up /etc/rabbitmq/rabbitmq.conf again and put the following at the top:

export RABBITMQ_CONFIG_FILE="/etc/rabbitmq/rabbit"

This loads that config file when the rabbit server is started and will pass the options to Erlang.

You should now have all Erlang/RabbitMQ processes listening only on localhost! This can be checked with netstat -ntlap

EDIT : In older versions of RabbitMQ, the configuration file is : /etc/rabbitmq/rabbitmq.conf. However, this file has been replaced by the rabbit-env.conf file.

Unable to start rabbitmq

Your error message is right there in what you pasted: "error,{future_upgrades_found".

Typically this is because you upgraded the version of RabbitMQ on your system and then (a) subsequently downgraded but didn't blow away the database, or (b) tried to run an older version of RabbitMQ against the upgraded database.

Re-Create your database (or upgrade to the appropraite version of RabbitMQ that the DB was created with) and the problem will go away.

Best Answer

Related Solutions

How to make RabbitMQ listen only to localhost

Unable to start rabbitmq

Related Topic