We're going to purchase some new hardware to use just for a Hadoop cluster, and we're stuck on what to buy. Say we have a budget of $5k: should we buy two very nice machines at $2,500 each, four at around $1,200 each, or eight at around $600 each? Will Hadoop work better with more, slower machines or with fewer, much faster machines? Or, as with most things, does it depend? 🙂
Hadoop cluster. 2 Fast, 4 Medium, 8 slower machines
Tags: cluster, hadoop, hardware
Best Answer
If you can, I would look at using cloud infrastructure services such as Amazon Web Services (AWS) Elastic Compute Cloud (EC2), at least until you determine that it makes sense to invest in your own hardware. It's easy to get caught up in buying the shiny gear (I have to resist daily). By trying before you buy in the cloud, you can learn a lot and answer the question: does my company's software X or map/reduce framework, run against this data set, best match a small, medium, or large set of servers? I ran a number of combinations on AWS, scaling up, down, in, and out, for pennies on the dollar within a few days. We were so happy with our testing that we decided to stay with AWS and forgo buying a large cluster of machines that we would have to cool, power, maintain, and so on. Instance types include:
Standard Instances
High-CPU Instances
High-CPU Medium Instance: 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform
High-CPU Extra Large Instance: 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
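To make the "many small versus few large" comparison concrete, here is a minimal sketch that totals the resources of two candidate clusters built from the High-CPU instance types listed above. The class and function names (`InstanceType`, `cluster_totals`) are made up for illustration, the specs are copied from this answer and may no longer match what AWS offers, and raw totals say nothing about how a particular Hadoop job will actually behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Specs for one instance type, as quoted in the answer above."""
    name: str
    memory_gb: float
    virtual_cores: int
    ecu_per_core: float
    storage_gb: int

    @property
    def ecu(self) -> float:
        # Total EC2 Compute Units = virtual cores * ECU per core.
        return self.virtual_cores * self.ecu_per_core


# Figures taken from the answer; treat them as historical, not current.
HIGH_CPU_MEDIUM = InstanceType("High-CPU Medium", 1.7, 2, 2.5, 350)
HIGH_CPU_XLARGE = InstanceType("High-CPU Extra Large", 7.0, 8, 2.5, 1690)


def cluster_totals(instance: InstanceType, count: int) -> dict:
    """Aggregate compute, memory, and storage for `count` identical nodes."""
    return {
        "ecu": instance.ecu * count,
        "memory_gb": instance.memory_gb * count,
        "storage_gb": instance.storage_gb * count,
    }


if __name__ == "__main__":
    # Four medium nodes versus one extra-large node.
    for instance, count in [(HIGH_CPU_MEDIUM, 4), (HIGH_CPU_XLARGE, 1)]:
        print(count, "x", instance.name, cluster_totals(instance, count))
```

Four High-CPU Medium nodes and one High-CPU Extra Large node come out at the same 20 ECU with similar total memory, so the difference a test run would reveal lies in how the workload responds to node count, per-node memory, and disk layout.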
| Standard On-Demand Instances | Linux/UNIX Usage | Windows Usage |
| --- | --- | --- |
| Small (Default) | $0.10 per hour | $0.125 per hour |
| Large | $0.40 per hour | $0.50 per hour |
| Extra Large | $0.80 per hour | $1.00 per hour |

| High CPU On-Demand Instances | Linux/UNIX Usage | Windows Usage |
| --- | --- | --- |
| Medium | $0.20 per hour | $0.30 per hour |
| Extra Large | $0.80 per hour | $1.20 per hour |
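As a rough way to weigh these hourly rates against the $5k hardware budget in the question, the sketch below estimates what a trial cluster costs and how long the budget would last. It assumes the Linux/UNIX on-demand prices quoted above (which are likely out of date), billing per instance-hour, and no charges for storage or data transfer; the function names are hypothetical.

```python
# Linux/UNIX on-demand prices from the answer, in cents per instance-hour.
# Integer cents avoid floating-point rounding surprises with money.
LINUX_HOURLY_CENTS = {
    "Standard Small": 10,
    "Standard Large": 40,
    "Standard Extra Large": 80,
    "High-CPU Medium": 20,
    "High-CPU Extra Large": 80,
}


def trial_cost_usd(instance_type: str, nodes: int, hours: int) -> float:
    """Cost in dollars of running `nodes` instances for `hours` hours."""
    return LINUX_HOURLY_CENTS[instance_type] * nodes * hours / 100


def hours_within_budget(instance_type: str, nodes: int, budget_usd: int) -> float:
    """How many hours a cluster of `nodes` instances can run on the budget."""
    return budget_usd * 100 / (LINUX_HOURLY_CENTS[instance_type] * nodes)


if __name__ == "__main__":
    # An eight-node cluster of the cheaper High-CPU type, run for one day.
    print(trial_cost_usd("High-CPU Medium", nodes=8, hours=24))
    # How long the same cluster could run on the question's $5k budget.
    print(hours_within_budget("High-CPU Medium", nodes=8, budget_usd=5000))
```

Under those assumptions, a day-long test on eight High-CPU Medium instances costs $38.40, and $5,000 would keep that cluster running for 3,125 hours, which is about 130 days of continuous use.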
Sorry to make an answer sound like a vendor pitch, but if your environment allows you to go this route, I think you'll be happy and make a much better purchase decision should you buy your own hardware in the future.