A SQL Server cluster is not a scalability solution; it is strictly a high-availability solution. In a SQL Server cluster only one node is active and handles the load; all the other nodes are passive stand-bys, waiting in case the active node has a hardware failure. As such, from a SQL Server cluster you obtain, at best, the performance of a single SQL Server instance. You'll sometimes find references to a so-called 'Active-Active' deployment, but that is not a single SQL Server cluster. The so-called 'Active-Active' setup is two separate clusters that use each other's nodes in reverse roles (one cluster is active on node A and passive on node B, the other is active on B and passive on A).
If your application is already partitioned and each customer has a separate database, then you'd be much better off continuing to scale out. The problems of administration and maintenance can be addressed: once you automate the administration of 4 servers (i.e. now), you can automate the administration of 1000 servers. The same goes for the problems of versioning, deployment, and account provisioning.
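To make "automate the administration" concrete, here is a minimal sketch of the idea: run one maintenance script against every server in a list and collect the failures. The server names, file names, and the helper function itself are illustrative assumptions, not anything from the answer; `sqlcmd` is the standard SQL Server command-line tool.

```python
import subprocess

def run_on_all(servers, script_path, runner="sqlcmd"):
    """Run one maintenance script against every server; return the servers
    where it failed. 'runner' defaults to sqlcmd (-S server, -i script,
    -b = exit with an error code on SQL errors); it is parameterized here
    so the loop itself can be exercised without a live server."""
    failures = []
    for server in servers:
        result = subprocess.run([runner, "-S", server, "-i", script_path, "-b"])
        if result.returncode != 0:
            failures.append(server)
    return failures
```

The point is that the loop body is identical whether the list holds 4 entries or 1000; the marginal cost of another server is one more line in the inventory.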
While right now you may be attracted by the perceived ease of administration, troubleshooting, and tuning of a scale-up solution, you are going to run into new problems that are more difficult to solve, culminating in a hard limit on how far you can scale up.
Updated after you edited the OP with an explanation of your concerns about scaling out.
Indeed, scale-up and clustering offer a high-availability solution. There are basically two HA solutions for SQL Server: clustering and database mirroring. Since you plan for thousands of individual databases, that pretty much rules out mirroring. What you have to keep in mind, though, is that SQL Server clusters are supported only on a restricted list of hardware combinations:
Microsoft server clusters are only supported on cluster solutions that are listed in the Windows Server Catalog under Cluster Solutions. To view the Windows Server Catalog, visit the following Microsoft Web site: http://www.windowsservercatalog.com/
That doesn't mean you cannot run a cluster on any hardware pair/group that has a shared SCSI bus. It means you won't be able to request support from Microsoft CSS if you have an issue on hardware that is not approved.
SQL Server clustering offers protection against hardware failures, except media failure (since the media is shared between the nodes). What it doesn't offer protection from is human administration error, which unfortunately is the primary cause of down-time. With one thousand customer databases on a single cluster, you'll have a very large omelette cooking in one basket...
As for your question about how many databases a cluster can run: ultimately a cluster has only one active instance at any time, so the limits of a single instance apply to the cluster as well. Having one thousand databases attached is not a big thing; the real question is the load: how many users will be running queries simultaneously. To get a ballpark figure, consider that the TPC-C benchmarks published for SQL Server are around 1.2 million transactions per minute on a 64-way Superdome, which is roughly 20,000 transactions per second. You anticipate about 5,000 connected clients, which gives you a margin of about 4 queries per client per second, and to reach even that you need pretty beefy hardware and an exceptionally well-tuned application.
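The arithmetic behind that ballpark, spelled out:

```python
# Back-of-the-envelope check of the throughput figures above.
tpm = 1_200_000               # benchmark transactions per minute
tps = tpm / 60                # transactions per second
clients = 5_000               # anticipated connected clients
per_client = tps / clients    # queries per client per second

print(tps, per_client)        # prints: 20000.0 4.0
```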
65 GB really is trivial, but it may take a while to copy over a slow LAN, which matters if you can't afford more than a few minutes of down-time.
The fastest way to cut over (and keep it simple) is to set up log shipping (restoring with NORECOVERY) between your existing database and your new database.
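A minimal sketch of that cut-over, with a placeholder database name and backup path (nothing here is from your environment):

```sql
-- On the new server, restore each shipped log backup WITH NORECOVERY,
-- so the database stays in a restoring state and keeps accepting logs:
RESTORE LOG CustomerDb
    FROM DISK = N'\\share\CustomerDb_log.trn'
    WITH NORECOVERY;

-- At cut-over time, take a tail-log backup on the old server; the
-- NORECOVERY option leaves the old copy in a restoring state so no
-- further writes can sneak in:
BACKUP LOG CustomerDb
    TO DISK = N'\\share\CustomerDb_tail.trn'
    WITH NORECOVERY;

-- Restore the tail on the new server, then bring the database online:
RESTORE LOG CustomerDb
    FROM DISK = N'\\share\CustomerDb_tail.trn'
    WITH NORECOVERY;
RESTORE DATABASE CustomerDb WITH RECOVERY;
```

Until the final `WITH RECOVERY`, the old server remains the live copy, so the actual down-time is just the tail-log backup/restore plus repointing the application.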
You should be able to copy logins and Agent jobs before the cut-over and leave the jobs disabled until you move your live traffic over.
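For the logins, Microsoft's `sp_help_revlogin` script is the usual way to copy them with their SIDs and password hashes intact. Disabling a copied Agent job until cut-over looks like this (the job name is a placeholder):

```sql
-- On the new server: keep the copied job from running until cut-over.
EXEC msdb.dbo.sp_update_job
    @job_name = N'CustomerDb nightly backup',
    @enabled  = 0;

-- After live traffic has moved over, flip @enabled back to 1.
```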
It's not really a mistake to switch servers and SQL Server versions at the same time; it depends on your requirements. Just remember that bringing the database online on SQL Server 2008 R2 is a one-way process: the data files are upgraded the first time the database goes through recovery, and there is no going back!
Your other, not so simple, option is to do the following:
- on your new server, install the same version of SQL Server as your existing server
- set up database mirroring between the existing and new server
- do the same prep of copying logins/agent jobs/other dependencies
- fail over the mirror to the new server, making it the primary (assuming you have support from your app to do this)
- update the existing server to 2008 R2, then fail back the mirror
- upgrade the new server to 2008 R2, then fail over again
- remove the mirror, shut down the old server
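The failover steps above map to two `ALTER DATABASE` commands; the database name is a placeholder:

```sql
-- Run on the current principal to fail over to the mirror
-- (the mirroring session must be synchronized):
ALTER DATABASE CustomerDb SET PARTNER FAILOVER;

-- Once the upgrade dance is finished, remove the mirroring session:
ALTER DATABASE CustomerDb SET PARTNER OFF;
```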
There are probably many variations on the approaches above. The moral of the story is that you have to weigh the cost of the cut-over against the cost of down-time. It's a trade-off.
That indicates it is an issue with that particular web app, since you say other web apps don't have this problem. Unless those other apps use radically different database data, performance should be at least comparable.
What you want to do is run a server-side trace on the database server and capture only the duration and rowset size of each SQL call coming from that web server - and possibly from another server for comparison.
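A minimal server-side trace of that shape can be set up with the `sp_trace_*` procedures; the file path and web server name below are placeholders, and the numeric IDs come from the `sp_trace_setevent` documentation (event 10 = RPC:Completed, 12 = SQL:BatchCompleted; column 13 = Duration, 48 = RowCounts, 8 = HostName):

```sql
DECLARE @TraceID int, @maxsize bigint;
SET @maxsize = 1024;  -- MB per trace file

-- Create the trace, writing to a file on the server:
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\traces\webapp', @maxsize;

-- Capture duration and row counts for completed RPCs and batches:
EXEC sp_trace_setevent @TraceID, 10, 13, 1;  -- RPC:Completed, Duration
EXEC sp_trace_setevent @TraceID, 10, 48, 1;  -- RPC:Completed, RowCounts
EXEC sp_trace_setevent @TraceID, 12, 13, 1;  -- SQL:BatchCompleted, Duration
EXEC sp_trace_setevent @TraceID, 12, 48, 1;  -- SQL:BatchCompleted, RowCounts

-- Only keep calls coming from the web server in question
-- (0 = AND, 6 = LIKE; 'WEBSRV01' is a placeholder host name):
EXEC sp_trace_setfilter @TraceID, 8, 0, 6, N'WEBSRV01';

-- Start the trace:
EXEC sp_trace_setstatus @TraceID, 1;
```

Keeping the event/column list this narrow keeps the trace overhead low enough to run against a production server.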
You can then plot this against Perfmon counter data captured on the SQL Server and IIS machines - yes, SQL Server Profiler fully supports this scenario :)
This will yield zoomable, scrollable, and clickable graphs of all the data you put in; refer to the SQL Server Profiler help in Books Online (BOL) for details on how to set this up.