What you want is called Microsoft Application Request Routing 2 (ARR). (Maybe the clumsy name is part of why so few people know of its existence?)
Microsoft ARR is a free-of-charge HTTP-layer (layer 7) load balancer, implemented as a module for IIS 7 and later. (ARR itself is gratis, but the underlying OS of course still requires a Windows Server license.)
Since ARR is just a thin shim on top of IIS, it is fast and robust. Administering ARR will also feel familiar, since you're already an IIS shop: ARR installs itself right into the IIS Manager GUI.
For a true high-availability setup, you should combine NLB and ARR: NLB keeps the ARR server tier highly available, and ARR keeps the backend web server tier highly available. See Microsoft's docs, and the long list of documentation at the end of the ARR overview page linked at the top.
The only real downside to ARR is that true high availability requires at least two Windows Server licenses and two physical servers. Given that, and given the time it takes to set up, low-end load balancer appliances like Coyote Point or loadbalancer.org can sometimes be a cost-effective alternative (or Kemp, Barracuda Networks, or any of the other low-end vendors).
"ability to seamlessly take a web server out of the load-balanced mix for maintenance without interrupting users."
That will depend on how session state is handled, i.e. whether and how your backend servers share the "this user is logged in" information.
If the webapp tier is stateless (i.e. session state lives in a shared datastore, e.g. a shared RAM cache or MSSQL), then you can simply pull a server out of the pool. If not, you can enable "sticky sessions" on the load balancer, remove the backend server from the pool, and then wait until all users have 'drained off' the server in question.
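The sticky-session-plus-draining flow can be sketched roughly like this (an illustrative toy, not ARR's actual API; the server names and the hashing scheme are made up):

```python
# Illustrative sketch of sticky sessions plus connection draining.
import hashlib

class StickyBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.draining = set()   # servers being removed for maintenance
        self.sessions = {}      # session_id -> pinned server

    def route(self, session_id):
        # Returning users stay pinned to their server, even while it
        # drains, so their in-memory session state survives.
        if session_id in self.sessions:
            return self.sessions[session_id]
        # New sessions only land on non-draining servers.
        active = [s for s in self.servers if s not in self.draining]
        digest = hashlib.sha256(session_id.encode()).hexdigest()
        server = active[int(digest, 16) % len(active)]
        self.sessions[session_id] = server
        return server

    def start_drain(self, server):
        self.draining.add(server)

    def drained(self, server):
        # Safe to take offline once no session is still pinned here.
        return all(s != server for s in self.sessions.values())

lb = StickyBalancer(["web1", "web2", "web3"])
alice = lb.route("alice")   # new user, gets pinned to some server
lb.start_drain("web1")
bob = lb.route("bob")       # new sessions now avoid web1
```

Once `drained()` reports true for the server, it can be taken down with no user-visible interruption.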
Willy Tarreau, the author of HAProxy, has a nice overview of load balancing techniques and issues here.
One of the main issues with resource-based load balancing is that the load information is already stale by the time you make the routing decision. There is an academic paper on this topic that you might want to read, called "Interpreting Stale Load Information". You can get nasty side effects, like sending too much load to a box that merely seems under-utilized and then overwhelming it. In short, load-based balancing looks like the obvious best approach at first, but it turns out simple methods tend to work better in practice.
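The herding effect is easy to reproduce in a toy simulation (my own illustration, not from the paper): if the balancer routes on a load snapshot that is only refreshed every `POLL` requests, every request in between piles onto the same "least loaded" server.

```python
# Toy simulation of routing on stale load readings.
SERVERS, POLL, REQUESTS = 4, 50, 1000
load = [0] * SERVERS
snapshot = list(load)   # the stale view routing decisions are based on
peak_imbalance = 0

for i in range(REQUESTS):
    if i % POLL == 0:
        snapshot = list(load)   # refresh the (soon stale again) reading
    target = snapshot.index(min(snapshot))  # "least loaded" per stale data
    load[target] += 1
    peak_imbalance = max(peak_imbalance, max(load) - min(load))

# With fresh data (or plain round-robin) the imbalance never exceeds 1;
# with a 50-request-stale snapshot it spikes to the full window size.
```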
In most load-balancing setups, simple algorithms are fine, because either the transactions are short-lived or they generate so little load that a round-robin or random distribution comes close enough to a good balance. There also needs to be headroom to absorb the load from failed servers anyway (if you are close to max utilization on all 3, then as soon as one dies the load will cascade and you lose the whole cluster).
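The headroom point can be made concrete with a quick back-of-envelope calculation (a sketch of mine, not from the answer above):

```python
# If N servers each run at utilization u and one fails, the survivors
# absorb its share: u * N / (N - 1).
def utilization_after_failure(n_servers, utilization):
    return utilization * n_servers / (n_servers - 1)

three_at_80 = utilization_after_failure(3, 0.80)  # survivors overloaded
three_at_60 = utilization_after_failure(3, 0.60)  # survivors still ok
```

With three servers at 80% each, a single failure pushes the survivors past 100% and the cascade begins; at 60% each, they land at 90% and survive.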
One solution might be to create two queues, one for the "heavy stuff" and one for the "light stuff". I would call the "light stuff" load balancing and the "heavy stuff" job scheduling; in other words, they are really two different problems. Then cap the number of concurrent heavy jobs per client and feed them through a single universal queue. I don't know of an ideal off-the-shelf tool for that, though.
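A rough sketch of that split (all names and limits here are made up for illustration): light requests are load-balanced immediately with round-robin, while heavy jobs go through one universal queue with a per-client cap.

```python
# Hypothetical two-queue sketch: load balancing for light requests,
# job scheduling with a per-client cap for heavy ones.
from collections import deque

MAX_HEAVY_PER_CLIENT = 2

light_servers = ["web1", "web2", "web3"]
heavy_queue = deque()   # universal FIFO for heavy jobs
pending = {}            # client -> heavy jobs queued or running
rr = 0

def submit(client, kind):
    global rr
    if kind == "light":
        server = light_servers[rr % len(light_servers)]
        rr += 1
        return ("dispatched", server)
    # Heavy work: enforce the per-client cap before queueing.
    if pending.get(client, 0) >= MAX_HEAVY_PER_CLIENT:
        return ("rejected", client)
    pending[client] = pending.get(client, 0) + 1
    heavy_queue.append(client)
    return ("queued", client)

def finish_heavy(client):
    pending[client] -= 1   # frees one heavy slot for this client
```

The cap keeps one greedy client from monopolizing the heavy queue, while light traffic is never delayed behind heavy jobs.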
Best Answer
Usually not. For most web sites, the normal "stateless" behavior of HTTP connections means that connections can be torn down very quickly. In Apache, for example, the default timeout is 15 seconds; in IIS it is two minutes (although that can be lowered).
A worst-case scenario is when you have session affinity enabled, a long connection timeout (15 minutes, 30 minutes, or higher), and a lot of unique visitors. In that scenario, the effective max connections could be orders of magnitude lower. A design with that high a connection load would be rare, though.
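The "orders of magnitude" claim can be estimated with Little's law (my own back-of-envelope, not from the answer above): concurrent connections are roughly the new-connection rate times how long each connection is held open.

```python
# Back-of-envelope: concurrent connections ~= arrival rate * hold time.
# The visitor rate and timeouts below are illustrative numbers.
def concurrent_connections(new_per_second, hold_seconds):
    return new_per_second * hold_seconds

stateless = concurrent_connections(100, 15)    # 15 s keep-alive teardown
sticky = concurrent_connections(100, 30 * 60)  # 30 min affinity timeout
ratio = sticky // stateless                    # two orders of magnitude
```

At the same visitor rate, the long-timeout sticky setup holds 120 times as many simultaneous connections as the quick-teardown one, which is why the connection limit matters so much more there.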