Hadoop – HBase master stops with “Connection Refused” Error

cloudera hadoop hbase

This happens in pseudo-distributed as well as fully distributed mode.
When I start HBase, all three services (master, regionserver and quorumpeer) come up initially. Within a minute, however, the master stops. The master log shows this trace:

2013-05-06 20:10:25,525 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 0 time(s).
2013-05-06 20:10:26,528 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 1 time(s).
2013-05-06 20:10:27,530 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 2 time(s).
2013-05-06 20:10:28,533 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 3 time(s).
2013-05-06 20:10:29,535 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 4 time(s).
2013-05-06 20:10:30,538 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 5 time(s).
2013-05-06 20:10:31,540 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 6 time(s).
2013-05-06 20:10:32,543 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 7 time(s).
2013-05-06 20:10:33,544 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 8 time(s).
2013-05-06 20:10:34,547 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: <master/master_ip>:9000. Already tried 9 time(s).
2013-05-06 20:10:34,550 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to <master/master_ip>:9000 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1179)
        at org.apache.hadoop.ipc.Client.call(Client.java:1155)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy9.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:132)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:259)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:220)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1611)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:68)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1645)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1627)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
        at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:363)
        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:86)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:368)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:301)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:519)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:484)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:468)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:575)
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:212)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1292)
        at org.apache.hadoop.ipc.Client.call(Client.java:1121)
        ... 18 more

Steps I have taken to fix this, without any success:
– Downgraded from distributed mode to pseudo-distributed mode. Same issue.
– Tried standalone mode. No luck.
– Used the same user (hadoop) for both Hadoop and HBase, and set up passwordless SSH for that user. Same problem.
– Edited the /etc/hosts file and changed localhost/servername as well as 127.0.0.1 to the actual IP address, following SO and other sources. Still the same issue.
– Rebooted the server.

Here are the conf files.

hbase-site.xml

<configuration>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://<master>:9000/hbase</value>
        <description>The directory shared by regionservers.</description>
</property>

<property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
</property>

<property>
        <name>hbase.zookeeper.quorum</name>
        <value><master></value>
</property>

<property>
        <name>hbase.master</name>
        <value><master>:60000</value>
        <description>The host and port that the HBase master runs at.</description>
</property>

<property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.</description>
</property>

</configuration>

/etc/hosts file

127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
.

What am I doing wrong here?

Hadoop Version – Hadoop 0.20.2-cdh3u5
HBase Version – Version 0.90.6-cdh3u5

Best Answer

Judging by your configuration files, I assume you are using the machine's actual hostname in them. If that is the case, add the hostname together with the machine's IP address to the /etc/hosts file. Also make sure it matches the hostname in Hadoop's core-site.xml. Proper name resolution is vital for HBase to function properly.
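To make the "matches core-site.xml" check concrete, here is a minimal Python sketch that compares the NameNode address in the two files. It assumes the Hadoop side uses the `fs.default.name` property (the name used by Hadoop 0.20 / CDH3). The hostname `master.example.com` and the helper names `read_property` and `same_namenode` are made up for illustration; substitute the contents of your own files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_property(xml_text, name):
    """Return the <value> of the named Hadoop-style <property>, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def same_namenode(core_site_xml, hbase_site_xml):
    """True if hbase.rootdir uses the same scheme and host:port as fs.default.name."""
    fs_uri = urlparse(read_property(core_site_xml, "fs.default.name") or "")
    hbase_uri = urlparse(read_property(hbase_site_xml, "hbase.rootdir") or "")
    if not fs_uri.netloc or not hbase_uri.netloc:
        return False  # a missing property counts as a mismatch
    return (fs_uri.scheme, fs_uri.netloc.lower()) == (
        hbase_uri.scheme,
        hbase_uri.netloc.lower(),
    )


# Hypothetical file contents; the hostname is an assumption for the example.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.com:9000</value>
  </property>
</configuration>"""

HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.example.com:9000/hbase</value>
  </property>
</configuration>"""

print(same_namenode(CORE_SITE, HBASE_SITE))
```

If this reports a mismatch (for example `localhost:9000` on one side and the real hostname on the other, or a different port), the HBase master is probably dialing an address the NameNode is not listening on.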

If you still face problems, please follow the steps mentioned here. I have tried to explain the procedure in detail, and hopefully you'll be able to get it running if you follow all the steps carefully.
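Before working through a longer procedure, a quick check can tell you whether the problem is name resolution or a NameNode that is not listening. The following is only a sketch using Python's standard `socket` module; the function names are my own, and port 9000 comes from the `hbase.rootdir` in the question.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the name does not resolve at all
    return sorted({info[4][0] for info in infos})


def is_loopback_only(addresses):
    """True if every resolved address is in 127.0.0.0/8 (and there is at least one)."""
    return bool(addresses) and all(a.startswith("127.") for a in addresses)


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers "connection refused" and timeouts
        return False
```

Run on the HBase master machine, `resolve("<master>")` should return the machine's real IP. If `is_loopback_only(...)` is true for that result, /etc/hosts is likely mapping the hostname to 127.0.0.1, which is a common cause of this symptom. If the name resolves correctly but `port_open("<master>", 9000)` is false, check that the NameNode is actually running and bound to that address.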

HTH
