CentOS – RabbitMQ fails to start

centos rabbitmq

I just installed RabbitMQ on a brand-new CentOS 6.6 server (RabbitMQ 3.4.1, Erlang reports version 17), and it fails to start.

Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.

Startup log:

Stack trace:
[{inet_gethost_native,ensure_started,0,
                     [{file,"inet_gethost_native.erl"},{line,548}]},
{inet_gethost_native,getit,2,
                     [{file,"inet_gethost_native.erl"},{line,487}]},
{inet,gethostbyname_tm_native,4,[{file,"inet.erl"},{line,1094}]},
{inet,gethostbyname,3,[{file,"inet.erl"},{line,459}]},
{erl_epmd,port_please1,3,[{file,"erl_epmd.erl"},{line,81}]},
{rabbit_networking,record_distribution_listener,0,[]},
{rabbit_networking,boot,0,[]},
{rabbit,'-run_step/3-lc$^1/1-1-',2,[]}]

BOOT FAILED
===========

Error description:
{could_not_start,rabbit,
   {bad_return,
       {{rabbit,start,[normal,[]]},
        {'EXIT',
            {rabbit,failure_during_boot,
                {boot_step,networking,
                    {could_not_start_server,inet_gethost_native}}}}}}}

Let me know if you need any other info. Any help would be appreciated; I think this server is cursed. That's the last time I get a new server around Halloween.

Here is what the status output looks like. In /etc/hosts I have a line for my host, and there is also one with the IP address, but removing that didn't help.

Status of node rabbit@host4 ...
Error: unable to connect to node rabbit@host4: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@host4]

rabbit@host4:
  * connected to epmd (port 4369) on host4
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on host4
  * suggestion: start the node

current node details:
- node name: 'rabbitmqctl-29678@host4'
- home dir: /var/lib/rabbitmq

I have this process running too, if it helps:

root@host4 [981 19:09:47 ~]# ps aux|grep rabbit
rabbitmq 16068  0.0  0.0  10828   528 ?        S    16:13   0:00 /usr/lib64/erlang/erts-6.2/bin/epmd -daemon

Also, running hostname -f shows me the correct hostname, for example:

  host4.mysite.com
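Since the stack trace in the startup log ends in inet_gethost_native (Erlang's native hostname-lookup service, which as far as I know runs a separate helper program to do the lookups), it is reasonable to confirm that the name hostname -f reports actually resolves. Here is a minimal Python sketch using only the standard socket module; it checks name resolution from outside Erlang and does not exercise Erlang's own helper. resolve_host is a hypothetical helper name.

```python
import socket
from typing import List


def resolve_host(name: str) -> List[str]:
    """Return the distinct addresses `name` resolves to, in resolver order.

    Raises socket.gaierror if the name cannot be resolved at all.
    """
    # getaddrinfo goes through the system resolver, which typically consults
    # /etc/hosts and DNS as configured in /etc/nsswitch.conf.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # IPv4 and IPv6 sockaddr tuples both start with the address
        if address not in addresses:
            addresses.append(address)
    return addresses
```

On the server, resolve_host("host4.mysite.com") should return whatever addresses /etc/hosts (or DNS) maps that name to; a socket.gaierror would point at a resolution problem rather than a resource limit.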

In reply: I ran those commands. RabbitMQ was not running and would not connect at app start.

Here are the results of rpm -qa --queryformat '%{NAME}\n' | grep erlang:

erlang-mnesia
erlang-otp_mibs
erlang-reltool
erlang-snmp
erlang-erl_docgen
erlang-examples
erlang-erts
erlang-cosEvent
erlang-ic
erlang-debugger
erlang-stdlib
erlang-cosProperty
erlang-jinterface
erlang-typer
erlang-compiler
erlang-asn1
erlang-cosNotification
erlang-cosFileTransfer
erlang-parsetools
erlang-wx
erlang-dialyzer
erlang
erlang-solutions
erlang-hipe
erlang-ssl
erlang-ssh
erlang-percept
erlang-odbc
erlang-webtool
erlang-megaco
erlang-syntax_tools
erlang-public_key
erlang-edoc
erlang-cosTransactions
erlang-erl_interface
erlang-observer
erlang-common_test
erlang-kernel
erlang-runtime_tools
erlang-orber
erlang-eldap
erlang-sasl
erlang-os_mon
erlang-inets
erlang-diameter
erlang-tools
erlang-crypto
erlang-cosTime
erlang-eunit
erlang-test_server
erlang-gs
erlang-ose
erlang-xmerl
erlang-cosEventDomain
erlang-et

I do have iptables, so I thought it might be a firewall issue, but I can successfully telnet to the hostname RabbitMQ is using on port 4369.

When I run nmap -p 5672 against the host, I get:

Starting Nmap 5.51 ( http://nmap.org ) at 2014-11-03 20:30 CST
Nmap scan report for host4 (127.0.0.1)
Host is up (0.000049s latency).
Other addresses for host4 (not scanned): xxx.xxx.xxx.xxx
rDNS record for 127.0.0.1: localhost.localdomain
PORT     STATE  SERVICE
5672/tcp closed amqp

Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds
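The telnet and nmap results fit the diagnostics above: epmd (port 4369) runs as its own daemon, as the ps output shows, while the AMQP listener on 5672 is opened by the broker itself, whose boot failed in the networking step (rabbit_networking:boot in the stack trace). A small Python sketch of the same TCP probe telnet performs; port_is_open is a hypothetical helper.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, like a telnet probe."""
    try:
        # create_connection tries each address that `host` resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolvable
        return False
```

With the hostname from this question, port_is_open("host4", 4369) and port_is_open("host4", 5672) would presumably have returned True and False respectively, matching the telnet and nmap output.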

What's the best way to remove this thing? When I run yum remove rabbitmq-server and install an older version, say 3.3.5, I get this error message:

Crash dump was written to: erl_crash.dump
Failed to create aux thread
Aborted (core dumped)

Best Answer

For future reference: I have solved the issue with some help from the RabbitMQ community, which pointed me in the right direction with a simple statement:

This suggests Erlang VM cannot create a thread. Do you have any resource or security restrictions in place?

This was a direct response to two items, one of which was:

 Failed to create aux thread

I'm not sure why it didn't occur to me before, because I did see this in the Erlang crash dump:

 processes: 13064032
 processes_used: 13064032

However, I was not sure how the number of Erlang processes relates to system processes, and regardless, I thought it was a bug or a programming incompatibility. It just didn't make a whole lot of sense, because the installation went smoothly on my virtual development server, as well as on our old CentOS 5.1 server. Also, since this was a new server with more than 3x the speed of our old one, I didn't think hitting resource limits was an issue. I just needed someone to say it to make it click.
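A note on the dump lines quoted above: assuming they come from the crash dump's =memory section, those figures are, as far as I know, memory sizes in bytes (the same keys erlang:memory/0 reports), so 13064032 means roughly 12.5 MiB held by Erlang processes rather than a count of processes. This minimal Python sketch pulls that section out of erl_crash.dump; the exact =memory header format is an assumption and the function names are hypothetical.

```python
import sys
from typing import Dict, Iterable


def parse_memory_section(lines: Iterable[str]) -> Dict[str, int]:
    """Collect 'key: value' pairs found under the '=memory' header of a crash dump."""
    values: Dict[str, int] = {}
    in_memory = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("="):
            # Every section header starts with '='; track whether we are in '=memory'.
            in_memory = line == "=memory"
            continue
        if in_memory:
            key, sep, value = line.partition(":")
            if sep and value.strip().isdigit():
                values[key.strip()] = int(value)
    return values


def read_memory_section(path: str) -> Dict[str, int]:
    # Crash dumps can contain arbitrary binary data, so decode leniently.
    with open(path, encoding="latin-1") as dump:
        return parse_memory_section(dump)


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, size in read_memory_section(sys.argv[1]).items():
        print(f"{key:<16}{size:>14} bytes  ({size / 2**20:.1f} MiB)")
```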

Anyway, after some research I ran this command:

#su rabbitmq
bash-4.1$ ulimit -a
=============================
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128218
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 131072
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 100
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The important thing here is this part:

 max user processes              (-u) 100
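The same limit can be read programmatically: Python's standard resource module exposes it as RLIMIT_NPROC, and ulimit -u prints its soft value. A minimal sketch, meant to be run as the rabbitmq user (for example after su rabbitmq, as above); the 1024 threshold is simply the value this answer ended up using, not an official RabbitMQ requirement, and the helper names are hypothetical.

```python
import resource


def nproc_limits():
    """Return (soft, hard) RLIMIT_NPROC for this process; `ulimit -u` prints the soft value."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def limit_allows(required, limit):
    """True if a single rlimit value permits at least `required` processes/threads."""
    return limit == resource.RLIM_INFINITY or limit >= required


def describe(limit):
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={describe(soft)} hard={describe(hard)}")
    # 1024 mirrors the value used later in this answer; it is not an official minimum.
    if not limit_allows(1024, soft):
        print("warning: soft nproc limit is below 1024")
```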

Checking my development box (which has a functional RabbitMQ installation with the management plugin), I also saw this:

 Erlang processes   206

So it doesn't take a genius to figure out that 206 is more than 100. After some more research, I discovered that the default value for this setting is typically 1024, and that I can change it in /etc/security/limits.conf. In that file I found:

 *               hard    nproc           100

So I raised it to 1024 for the rabbitmq user:

 rabbitmq                 hard    nproc           1024
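Why a per-user line helps even though the * line stays at 100: as I understand pam_limits, an entry for a specific user takes priority over the * wildcard, and when the same domain appears twice the later line wins. The Python sketch below models only that rule for nproc. It is a simplification that ignores @group entries, UID ranges, the special handling of root, and the extra files under /etc/security/limits.d/, and effective_nproc is a hypothetical helper, not part of any PAM tool.

```python
from typing import Dict, Optional

UNLIMITED = float("inf")


def effective_nproc(conf_text: str, user: str, kind: str = "hard") -> Optional[float]:
    """Simplified pam_limits model: a line for `user` beats a '*' wildcard line.

    Returns None if no nproc line applies to `user`.
    """
    found: Dict[str, float] = {}
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) != 4:
            continue
        domain, line_kind, item, value = fields
        # A '-' type sets both the soft and the hard limit.
        if item != "nproc" or line_kind not in (kind, "-"):
            continue
        limit = UNLIMITED if value in ("unlimited", "infinity", "-1") else int(value)
        if domain in (user, "*"):
            found[domain] = limit  # a later line for the same domain overrides an earlier one
    # The user-specific entry takes priority over the wildcard.
    return found.get(user, found.get("*"))
```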

And it fired right up! After starting it and checking the status, I see this:

 {processes,[{limit,1048576},{used,147}]},

I believe the limit here is system-wide? I'm still not really sure how the Erlang process count and these other process numbers relate.
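On that closing question: as far as I know, the {processes,[{limit,1048576},{used,147}]} entry is about the Erlang VM's own lightweight processes. The limit is the VM's maximum process count (what erlang:system_info(process_limit) returns, adjustable with the +P emulator flag), so it is neither an OS limit nor system-wide. Erlang processes are scheduled inside the VM's single OS process on a small pool of OS threads, whereas on Linux the nproc limit (RLIMIT_NPROC, ulimit -u) counts the threads owned by the user's real UID. So the 206 and 147 figures are not directly comparable to the cap of 100; what the cap restricted was the number of OS threads the VM could start, which fits the "Failed to create aux thread" message. A minimal, Linux-only Python sketch that totals the Threads: field of /proc/<pid>/status for one user; the helper names are hypothetical, and the result is a snapshot that only approximates what the kernel checks.

```python
import os
import pwd


def parse_status(text):
    """Return (real_uid, thread_count) parsed from a /proc/<pid>/status file."""
    uid = threads = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Uid":
            uid = int(value.split()[0])  # fields are real, effective, saved, fs UID
        elif key == "Threads":
            threads = int(value)
    return uid, threads


def threads_for_uid(uid, proc_root="/proc"):
    """Sum the thread counts of every process whose real UID is `uid` (Linux only)."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                owner, threads = parse_status(status.read())
        except OSError:
            continue  # the process exited while we were scanning
        if owner == uid and threads is not None:
            total += threads
    return total


if __name__ == "__main__":
    try:
        rabbit_uid = pwd.getpwnam("rabbitmq").pw_uid
    except KeyError:
        print("no 'rabbitmq' user on this machine")
    else:
        print("OS threads running as rabbitmq:", threads_for_uid(rabbit_uid))
```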

So, in conclusion, a limit of 100 processes is not enough for Erlang to work. This is a cloud-hosted SSAE 16 dedicated web server; the hosting company typically sets these up for resellers, i.e. you can farm out parts of the server to host your clients' websites. That is most likely why they set the default limit so low. We use this type of server because we do a lot of database querying and report writing, and it offers a fair amount of power for what we pay. So while the hardware meets our needs, the configuration doesn't fit our use case as well.

Hopefully this can help someone in the future.