I have access to a data node in a Hadoop cluster, and I'd like to find out the identity of the name nodes for the same cluster. Is there a way to do this?
Way to get a list of Hadoop cluster machines from one of the data nodes
cluster, discovery, hadoop
Related Solutions
If you can, I would look at utilizing cloud infrastructure services like Amazon Web Services (AWS) Elastic Compute Cloud (EC2), at least until you determine that it makes sense to invest in your own hardware. It's easy to get caught up in buying the shiny gear (I have to resist daily). By trying before you buy in the cloud, you can learn a lot and answer the question: does my company's software X or map/reduce framework against this data set best match a small, medium, or large set of servers?

I ran a number of combinations on AWS, scaling up, down, in, and out for pennies on the dollar within a few days. We were so happy with our testing that we decided to stay with AWS and forgo buying a large cluster of machines that we would have to cool, power, maintain, and so on. Instance types include:
Standard Instances
- Small Instance (Default): 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
- Large Instance: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
- Extra Large Instance: 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

High-CPU Instances
- High-CPU Medium Instance: 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform
- High-CPU Extra Large Instance: 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
Standard On-Demand Instances

| Instance | Linux/UNIX Usage | Windows Usage |
| --- | --- | --- |
| Small (Default) | $0.10 per hour | $0.125 per hour |
| Large | $0.40 per hour | $0.50 per hour |
| Extra Large | $0.80 per hour | $1.00 per hour |

High-CPU On-Demand Instances

| Instance | Linux/UNIX Usage | Windows Usage |
| --- | --- | --- |
| Medium | $0.20 per hour | $0.30 per hour |
| Extra Large | $0.80 per hour | $1.20 per hour |
Sorry to make this answer sound like a vendor pitch, but if your environment allows you to go this route, I think you'll be happy, and you'll make a much better purchase decision should you buy your own hardware in the future.
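For what it's worth, the scale-up/scale-down experimentation described above is easy to script. A minimal sketch, assuming the modern AWS CLI (which postdates this answer); the AMI and instance IDs are placeholders:

```sh
# Launch a small test fleet for a benchmark run
# (ami-12345678 is a placeholder -- substitute a real AMI ID)
aws ec2 run-instances --image-id ami-12345678 \
    --instance-type m1.large --count 4

# Tear the fleet down when the run is finished so the meter stops
# (instance IDs are placeholders)
aws ec2 terminate-instances --instance-ids i-11111111 i-22222222
```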
I ended up having to delete the files with bad blocks, which, after further investigation, I realized had a very low replication factor (replication = 1, if I recall correctly).
This SO post has more information on finding the files with bad blocks, using something along the lines of:
# drop the progress dots and the "replica"/"Replica" summary lines,
# leaving only the paths of files with problem blocks
hadoop fsck / | egrep -v '^\.+$' | grep -v eplica
So, to answer my own questions:
- Can these files be recovered? Not unless the failed nodes/drives are brought back online with the missing data.
- How do I get out of safe mode? Remove these troublesome files, and then leave safe mode via `dfsadmin` (see the sketch after this list).
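For reference, here is a minimal shell sketch of that recovery sequence. It assumes a reasonably recent Hadoop where the `hdfs` command is preferred; on older releases the same subcommands hang off `hadoop` instead:

```sh
# 1. Identify the files with corrupt or missing blocks
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

# 2. Delete the affected files (destructive -- only do this once you
#    know the missing replicas cannot be brought back online)
hdfs fsck / -delete

# 3. Tell the namenode to leave safe mode
hdfs dfsadmin -safemode leave
```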
Best Answer
You can read the configuration file of the datanode, specifically `hdfs-site.xml`. It will list the namenode that the datanode will try to connect to.
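A couple of one-liners for doing that; the config path is an assumption (distributions vary), and `hdfs getconf` requires a reasonably recent Hadoop:

```sh
# Dump the namenode-related properties from the datanode's config
# (/etc/hadoop/conf is a common default -- adjust for your install)
grep -A2 'dfs.namenode' /etc/hadoop/conf/hdfs-site.xml

# Or ask the loaded configuration directly on newer versions
hdfs getconf -namenodes
```

Note that on older or differently laid-out clusters, the namenode address may instead live in `core-site.xml` under `fs.default.name` (later `fs.defaultFS`).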