This answer has been edited after the question was clarified.
What other reasons cause clouds to prefer DAS?
Where "DAS" means Direct Attached Storage, i.e. SATA or SAS hard disk drives.
Cloud vendors all use DAS because it offers order-of-magnitude improvements in price/performance. It is a case of scaling horizontally.
In short, SATA hard disk drives and SATA controllers are cheap commodities. They are mass-market products, and are priced very low. By building a large cluster of cheap PCs with cheap SATA drives, Google, Amazon and others obtain vast capacity at a very low price point. They then add their own software layer on top. Their software does multi-server replication for performance and reliability, monitoring, re-balancing replication after hardware failure, and other things.
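To make the "software layer on top" concrete, here is a minimal sketch of one piece of it: deciding which cheap machines hold the replicas of a given object. This uses rendezvous hashing, which is one of several placement schemes such systems use; the function and server names are illustrative, not taken from any particular vendor's implementation.

```python
import hashlib

def replica_nodes(key: str, servers: list[str], n_replicas: int = 3) -> list[str]:
    """Pick n_replicas distinct servers for a key by ranking every server
    on a hash of (server, key) -- a simple form of rendezvous hashing.
    Any node can recompute the placement without a central lookup table."""
    ranked = sorted(
        servers,
        key=lambda s: hashlib.sha256(f"{s}:{key}".encode()).hexdigest(),
    )
    return ranked[:n_replicas]

servers = [f"node{i:02d}" for i in range(12)]
print(replica_nodes("photos/cat.jpg", servers))
```

When a server dies, only the keys that ranked it in their top three need re-replicating; everything else stays put, which is what makes re-balancing after hardware failure tractable at this scale.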
You could take a look at MogileFS as a simpler representative of the kind of software that Google, Amazon and others use for storage. It's a different implementation of course, but it shares many of the same design goals and solutions as the large-scale systems. If you want to, here is a jumping-off point for learning more about GoogleFS.
As stated later in the paper: clouds should use SAN or NAS, because DAS is not appropriate when a VM moves to another server.
There are 2 reasons why SANs are not used.
1) Price.
SANs are hugely expensive at large scale. While they may be the technically "best" solution, they are typically not used at very large scale installations due to the cost.
2) The CAP Theorem
Eric Brewer's CAP theorem shows that at very large scale you cannot maintain strong consistency while keeping acceptable reliability, fault tolerance, and performance. SANs are an attempt at implementing strong consistency in hardware. That may work nicely for a 5,000-server installation, but it has never been proved to work for Google's 250,000+ servers.
Result:
So far the cloud computing vendors have chosen to push the complexity of maintaining server state to the application developer. Current cloud offerings do not provide consistent state for each virtual machine. Application servers (virtual machines) may crash and their local data be lost at any time.
Each vendor then has their own implementation of persistent storage, which you're supposed to use for important data. Amazon's offerings are nice examples: MySQL, SimpleDB, and Simple Storage Service (S3). These offerings themselves reflect the CAP theorem -- the MySQL instance has strong consistency, but limited scalability. SimpleDB and S3 scale fantastically, but are only eventually consistent.
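"Eventually consistent" is worth seeing in miniature. Below is a toy model (not any real store's API) of why a read right after a write may return stale data until the replicas sync in the background:

```python
import random

class EventuallyConsistentStore:
    """Toy model: a write lands on one replica; a read hits a random
    replica, so it may return stale data until an anti-entropy pass
    brings all replicas back into agreement."""
    def __init__(self, n_replicas: int = 3):
        self.replicas = [dict() for _ in range(n_replicas)]

    def write(self, key, value):
        random.choice(self.replicas)[key] = value   # accepted by one replica

    def read(self, key):
        return random.choice(self.replicas).get(key)  # may be stale (None)

    def anti_entropy(self):
        merged = {}
        for r in self.replicas:
            merged.update(r)
        for r in self.replicas:
            r.update(merged)   # after the sync, every replica converges

store = EventuallyConsistentStore()
store.write("k", "v1")
# Right after the write, read("k") may return "v1" or None.
store.anti_entropy()
print(store.read("k"))   # after convergence: always "v1"
```

A strongly consistent system (like the MySQL instance) never exposes the stale window, but pays for it in coordination, which is exactly the scalability limit the CAP theorem describes.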
Your questions are non-trivial and there is not enough info to give a good answer.
I can give an answer (clustered filesystem over fibre channel SAN) - but it may well turn out to be more expensive and complex than it needs to be.
So I'll just throw out some comments/thoughts. Really stuff for you to consider.
Perhaps after reading this brain dump, you'll be able to restate your app's intended behaviour and maybe then we can give you a better answer.
NAS devices export file systems (e.g. CIFS, NFS), so you don't really connect them to your servers - your servers mount file systems from them.
That means reads and writes to them need to go over a network connection.
So if you have a 100mbit network connection between your NAS and your server and your read/writes occur at a 1:1 ratio, then the best you'll get is 50mbit reads, because for every byte you read, you also write a byte. If your client and download traffic are on that same network then you can halve it again.
Clearly if you want to use a NAS then you're going to want multiple NICs in your servers and multiple networks/VLANs in your architecture.
Assume there are 4 possible data locations in your app:
- A) Original data source, e.g. the internet.
- B) Your server.
- C) NAS.
- D) Client.
Then there are 4 possible data vectors:
- AB, i.e. the data download from A (the net) to B (your server).
- BC, i.e. writing data from your server to the NAS.
- CB, i.e. reading data from the NAS to your server.
- BD, i.e. writing data from your server to the client.
Depending on how your app works and ignoring protocol overhead you may (worst case) then need 4 100mbit networks to transport 100mbit per second to your clients.
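The worst-case arithmetic above can be checked with a few lines. The assumption (stated in the text) is that every byte served to a client was first downloaded, written to the NAS, and read back, and that protocol overhead is ignored:

```python
# Worst case: every byte delivered to a client (BD) was first downloaded
# from the net (AB), written to the NAS (BC), and read back from it (CB).
client_rate_mbit = 100   # target delivery rate to clients

vectors = {
    "AB (internet -> server)": client_rate_mbit,
    "BC (server -> NAS)":      client_rate_mbit,
    "CB (NAS -> server)":      client_rate_mbit,
    "BD (server -> client)":   client_rate_mbit,
}
total = sum(vectors.values())
print(f"total traffic: {total} mbit/s")   # 400 mbit/s across 4 networks

# If all four vectors instead share one 100 mbit link, they contend for it,
# and effective client throughput drops to 100 / 4 = 25 mbit/s.
print(f"single shared link: {client_rate_mbit / len(vectors):.0f} mbit/s")
```

That is the same halving logic as the 50 mbit example earlier, just with four flows on the wire instead of two.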
So you'll need to consider the read and the write bandwidth to the NAS if you use a NAS.
If you use a FC SAN you can reduce your network needs and you get other advantages.
E.g.
Depending on the OS and the filesystem you end up using, a SAN will allow you to grow your LUNs dynamically and grow your filesystems live, as well as share the LUNs with more hosts, again potentially as a live operation.
You can reduce the cost of the SAN by not using fibre channel, e.g. you could use iSCSI.
In which case you'll again want separate networks for your data, and you'll want dedicated NICs, ideally with TCP/iSCSI offload hardware. That will give you most of the advantages of a SAN at lower cost.
I have not really used iSCSI except for the most basic single LUN to a single host, with simple Linux LVM and ext3, so I am not 100% sure if it is really as good as a FC SAN, but I gather it can be if well implemented.
SAN arrays are probably the better choice if you are going to use a clustered filesystem. The question is do you really need a clustered file system?
That will depend on the characteristics of your app and your architecture.
Now if your app can guarantee that only one node will write to a given file at a given time, then you can probably go with a NAS. But you may have problems if you modify a file with one host while it is being read by another host, so your app would need to detect and deal with that scenario. If that is a scenario you don't want to bother with, then a clustered file system is probably a better choice - they are designed to work with that sort of scenario.
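One common way an app enforces that "only one node writes at a time" rule over a NAS is advisory file locking. The sketch below uses POSIX `flock()`; be warned that its behaviour over NFS depends on the NFS version and server configuration, so you would need to verify it against your actual NAS:

```python
import fcntl
import os
import tempfile

def try_exclusive_write(path: str, data: bytes) -> bool:
    """Take a non-blocking exclusive lock before writing.
    Returns False if another process already holds the lock.
    NOTE: flock() is advisory only, and its semantics over NFS
    vary by NFS version and server -- test on your real NAS."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)          # someone else is writing; back off
        return False
    try:
        os.write(fd, data)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    return True

path = os.path.join(tempfile.gettempdir(), "demo.lockfile")
print(try_exclusive_write(path, b"hello"))   # True when no other writer holds it
```

A clustered filesystem moves exactly this coordination out of your app and into the storage layer, which is why it is the safer default when you cannot make the single-writer guarantee.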
So questions like some of these listed below might make a big difference to your architecture:
- Does the file need to be reused again after it has been downloaded once and sent to the client - i.e. might it be re-read from storage and served to another client?
- Does a file need to be written completely to storage before it is sent to a client?
- Can a file be stored on local disk on the server and served to the client from local disk, and then be written to NAS/SAN after it has been served to the client?
- Are multiple clients likely to be using the same file at once? E.g. is it likely that 50 clients will access one file, or that 50 clients will access 50 different files?
- If 50 clients each request the same file, will it be downloaded once or 50 times?
- If another client comes along 3 hours later and requests the same file will the file be downloaded again or will it come from disk?
- Is the disk a cache or a slow buffer?
- Will there be any other processing performed on the file before it is returned to the clients, e.g. being security scanned, having URLs re-written, etc.?
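To show how much these questions matter, here is one possible answer to a few of them (serve from local disk, fall back to the NAS, download from the origin only once, archive afterwards) sketched as code. Every name and path here is illustrative; it is one design among many, not a recommendation:

```python
import os
import shutil
import tempfile

def serve_file(name, cache_dir, nas_dir, fetch_from_origin):
    """Serve from local DAS if cached; else pull the NAS copy; else
    download once from the origin, then archive to the NAS. This
    treats local disk as a cache and the NAS as the system of record."""
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(nas_dir, exist_ok=True)
    local = os.path.join(cache_dir, name)
    if os.path.exists(local):
        return local                    # cache hit: no NAS traffic at all
    nas = os.path.join(nas_dir, name)
    if os.path.exists(nas):
        shutil.copy(nas, local)         # warm the local cache (vector CB)
        return local
    fetch_from_origin(local)            # one download from A (vector AB)
    shutil.copy(local, nas)             # archive after serving (vector BC)
    return local

# Demo with temp dirs and a fake origin standing in for the internet:
cache, nas = tempfile.mkdtemp(), tempfile.mkdtemp()

def fake_origin(dest):
    with open(dest, "w") as f:
        f.write("payload")

p = serve_file("video.mp4", cache, nas, fake_origin)
print(open(p).read())   # "payload"; a second call now hits the local cache
```

Under this design, popular files never touch the NAS after the first request, so the CB vector shrinks with the cache hit rate - which is why the "is the disk a cache or a slow buffer?" question above changes the bandwidth sums so dramatically.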
Given the limited info we have I'd say the safest architecture is the most expensive and complex architecture as that will deal with most of the worst case problems and be very scalable.
I.e. Fibre channel SAN and clustered file system.
In all cases whatever your storage, DAS, SAN, NAS, all other things being equal more spindles are better.
TL;DR
1) On a cloud, not that many cheap options unless you want to go for an S3-like system. With a centralized system, you can only scale so far before you start running into issues (see scaling up vs. scaling out), so if you are rolling your own solution you'd probably be best starting off with a distributed system that lets you add and remove servers on demand, rather than just getting a big SAN and adding disks to it.
2) They will almost certainly use dedicated hardware, co-located or in private datacenters. If you go to a storage provider and say "hey, I want to buy 2000 disks" they'll give you some pretty decent discounts if you know what you're doing. Storing 100TB of data will always be cheaper (Per GB) than storing 100GB, the more you store the cheaper it gets.
Have a look into a distributed data store like HDFS or Riak. I've never used HDFS, but we're using a Riak cluster on 4 nodes with 10TB of storage. Riak has an HTTP API, so with a little bit of careful configuration you can just point your CDN at your Riak cluster. Alternatively just use S3, Rackspace Cloud Files, Google Storage, etc. and let someone else worry about that for you. Since pre-existing storage providers are already at multi-TB/PB scale, they can most likely do it cheaper than you could rolling your own.
That being said, Backblaze (an online backup company) "open sourced" the designs for their storage "pods", which store ridiculous amounts of data very cheaply. They are more suited to "write once, sit there doing nothing for years" as is the nature of backups, but it's still an interesting read. You could also look into something like the Broadberry storage servers; their top-end model has 36 hot-swap drive bays but costs $5k+ without drives (filling it with 2TB enterprise 7200RPM drives you're looking at more like $25k, or with cheap drives $15k; it entirely depends on your workload). OVH provide some "backup" servers with ~20TB of un-RAIDed storage for around £200/mo if I remember correctly.
You also need to think about tiered storage. Basically, this means you split your data up into "tiers" based on what you need. If some of your objects must be kept at all costs and need to be accessed quickly, they should be on top, or "gold" tier storage, with fast, reliable disks, on servers well equipped to handle the load. This might be the sort of thing you would put on a high-end SAN with lots of lovely SAS or even SSD disks. If you have some objects which are re-generatable and don't need to be accessed quickly (say, thumbnails for images that are normally cached on CDN edges), you can put those on "silver" tier storage: cheaper disks, on slower servers. Then you have your backups: you may never need them, and they might not need to be available immediately, but you want to keep them for as long as possible, as cheaply as possible. You might put those on "bronze" storage, like tapes.
The storage levels I described are for a purely fictional situation, it's entirely possible to have 50 tiers of storage, and you can call them whatever you want. It might be that even your lowest tier of storage requires super-fast access, that all depends on your usage.
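A tiering policy like the gold/silver/bronze example is usually just a small decision function over each object's requirements. Here is a sketch using those same fictional tier names; real policies typically also weigh object size, age, and access frequency:

```python
def choose_tier(must_keep: bool, fast_access: bool, regenerable: bool) -> str:
    """Map an object's requirements to a storage tier. The tier names
    and rules mirror the fictional gold/silver/bronze example above."""
    if must_keep and fast_access:
        return "gold"    # fast SAS/SSD, e.g. on a high-end SAN
    if regenerable and not fast_access:
        return "silver"  # cheaper disks, slower servers
    if must_keep:
        return "bronze"  # backups: tape or other cold storage
    return "silver"      # default for everything else

print(choose_tier(must_keep=True,  fast_access=True,  regenerable=False))  # gold
print(choose_tier(must_keep=False, fast_access=False, regenerable=True))   # silver
print(choose_tier(must_keep=True,  fast_access=False, regenerable=False))  # bronze
```

The point is that the policy is cheap to change; the expensive part is provisioning the hardware behind each tier, so get the requirements right before buying the SAN.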