What are the best tools for troubleshooting SAN performance bottlenecks?
storage-area-network
Related Solutions
We have an EVA6100 so I've done tests on similar hardware. The 8000 is, IIRC, an older system than the 6100.
First off, your writes to the DB volume and Log volume are very likely contending with each other, as it sounds like they're on the same Disk Group inside the EVA. A disk group is a grouping of disks that you create LUNs on, and each LUN is a disk presented to one (or more) servers. It is not at all uncommon to house many, many LUNs in a single disk group. If you only have one disk group, then the benefits of separating your log I/O from your DB I/O are somewhat reduced, since both LUNs end up on the same physical disks.
Second, if that disk group is running into true I/O contention, performance will degrade. This is very hard to check from a consumer point of view; it's the kind of thing the SAN maintainers need to be aware of. Ask them if they know what the average controller CPU load is, as that will significantly affect your write performance.
Third, know what your I/O mix really is and benchmark for that, not just file-copy speed. If your DB volume I/O is 80/20 read/write, then you really want to check read-path throughput. If it's the other way around, then your write performance is the metric you really need to check.
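To make the third point concrete, here is a minimal sketch of working out the read/write mix from two snapshots of cumulative I/O counters. The `IoSample` type and both function names are made up for illustration, and where the counters come from (perfmon, iostat, the array's own stats) is left open; treat it as a starting point rather than a finished tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IoSample:
    """Cumulative I/O operation counters for one volume (hypothetical type)."""
    read_ops: int
    write_ops: int


def io_mix(before: IoSample, after: IoSample) -> Tuple[float, float]:
    """Return (read_pct, write_pct) for the interval between two samples."""
    reads = after.read_ops - before.read_ops
    writes = after.write_ops - before.write_ops
    if reads < 0 or writes < 0:
        raise ValueError("counters went backwards; were they reset?")
    total = reads + writes
    if total == 0:
        raise ValueError("no I/O observed in the interval")
    return 100.0 * reads / total, 100.0 * writes / total


def path_to_benchmark(read_pct: float) -> str:
    """Pick which path dominates the workload and deserves the benchmark."""
    return "read" if read_pct >= 50.0 else "write"


# Example: 8000 reads and 2000 writes happened between the two snapshots.
read_pct, write_pct = io_mix(IoSample(1000, 500), IoSample(9000, 2500))
focus = path_to_benchmark(read_pct)
```

Sample over a representative busy period, not a quiet one, otherwise the ratio tells you little about the load that actually hurts.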
I guess this depends on your definition of white box. Lefthand is really just an HP server with a custom OS on it.
Using Datacore you could do a similar thing: an HP server with Windows and the Datacore software.
I've seen plenty of examples where both of these outperform EMC systems, and I'm not talking about bottom-of-the-barrel AX systems either.
For a white box setup, the performance issues really come from the RAID cards. If you're putting HP P800 cards in, they cost you close to $1000 and up anyway, but they perform very well. Throw that in a Datacore system with 32 GB of memory and all of a sudden you have twice as much cache as a high-end EMC system. Cache is where your performance lies, but the more data sits in cache, the more is not yet on disk. How often is the cache being written to disk? What sort of problems arise when there is a failure? An individual UPS for that kind of white box is a great start, especially when you can interface it with the server for controlled cache flushing and such.
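As a rough way to reason about the cache-versus-UPS question, the sketch below estimates how long a worst-case flush of dirty cache would take and whether a given UPS runtime covers it. The function names, the 2x safety factor, and the throughput and runtime figures are all assumptions for illustration; plug in numbers measured on your own hardware.

```python
def flush_seconds(dirty_gib: float, write_mib_per_s: float) -> float:
    """Worst-case time to write all dirty cache to disk, in seconds."""
    if dirty_gib < 0:
        raise ValueError("dirty_gib must be non-negative")
    if write_mib_per_s <= 0:
        raise ValueError("write_mib_per_s must be positive")
    return dirty_gib * 1024.0 / write_mib_per_s


def ups_covers_flush(dirty_gib: float, write_mib_per_s: float,
                     ups_runtime_s: float, safety_factor: float = 2.0) -> bool:
    """True if the UPS runtime covers the flush with some headroom.

    safety_factor is an arbitrary illustrative margin, not a vendor figure.
    """
    needed = flush_seconds(dirty_gib, write_mib_per_s) * safety_factor
    return needed <= ups_runtime_s


# Example with assumed numbers: 32 GiB of dirty cache, 200 MiB/s sustained
# writes to the backing disks, and a UPS that holds the box up for 10 minutes.
worst_case = flush_seconds(32, 200)        # a bit under 3 minutes
enough = ups_covers_flush(32, 200, 600)
```

The sustained write rate under power failure is the number to be pessimistic about, since the disks may be busy with other I/O when the flush starts.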
Generally a whitebox SAN won't perform as well as a commercial SAN. That is mostly down to components, though, and to just what you consider a whitebox SAN. I don't consider a Datacore or Lefthand system whitebox, because they are commercially supported software products, despite the fact that they are really just software layers for industry-standard servers.
Edit: Just to mirror what was said below: if the storage is important and mission critical, you need adequate support. That means buying not just the best product, but the one that offers the best support. That's not always the big names, but it's really not likely to be whoever is out back putting that white box together, is it? :)
Best Answer
A lot depends on the hardware you're playing with. Bottlenecks can come from a variety of sources: