This:
shape average 9000000 36096
actually gives your ASA license to burst above the bandwidth you've allocated, if there's been a suitably quiet period. If you want a guarantee that you never exceed a given bandwidth, policing is a better option (on the other hand, with policing, any packets that exceed the bandwidth will be dropped instead of delayed).
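For example, a policing sketch on the ASA (the class and policy names here are placeholders; the rate is in bits/sec and the burst in bytes, mirroring the shaper's values):

class-map RATE-LIMIT
 match any
policy-map OUTSIDE-POLICY
 class RATE-LIMIT
  police output 9000000 36096
service-policy OUTSIDE-POLICY interface outside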
There is no one answer. For the simplest of TCP services, each client will attempt to grab data as fast as it can, and the server will shovel it to the clients as fast as it is able. Given two clients whose combined bandwidth exceeds the bandwidth of the server, both clients will probably download at roughly half the server's bandwidth.
There are a LOT of variables that make this not quite true in real life. If the TCP/IP stacks of the clients differ in how well they handle high-throughput streams, that alone can affect bandwidth even if the server has infinite bandwidth. Different operating systems and server programs handle streaming speed ramp-up differently. Latency also affects throughput: high-latency connections can be significantly slower than low-latency ones even though both can stream (in absolute terms) the same amount of data.
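As a rough rule of thumb (this assumes a fixed 64 KB receive window and no window scaling, which is a simplification):

max throughput ≈ TCP window / round-trip time
64 KB / 10 ms  ≈ 52 Mbit/s
64 KB / 200 ms ≈ 2.6 Mbit/s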
A case in point: downloading kernel source archives. I've got very fast bandwidth at work; in fact, it exceeds my LAN speed, so I can saturate my local 100Mb connection if I get the right server. Watching my network utilization chart while downloading large files, I can see some servers start small (100Kb/s), slowly ramp up to high values (7Mb/s), and then something happens and it all starts over again. Other servers give me everything immediately when I start downloading.
Anyway, items that can cause actual bandwidth allocation to differ from absolute equality:
- TCP/IP stack capabilities of both the client and the server
- TCP tuning parameters on either side, not just capabilities
- Latency on the line
- The application-level transfer protocols being used
- The existence of hardware specifically designed for load balancing
- Congestion between clients and the server itself
In regard to your test cases, what likely happened is that one client was able to establish a higher datastream rate than the other, perhaps simply by getting there first. When the second stream started, it was not allocated enough resources to reach full speed parity, because the first stream already held most of them. If the first stream ended, the second would likely pick up speed. In this case, the speed experienced by the clients was determined by the server OS, the application doing the streaming, the server's TCP/IP stack, and, if present and enabled, the TCP Offload Engine on the server's network card.
As I said, there are a lot of variables that go into it.
(Chart: slow-ramp bandwidth usage)
Yep
Make an access-list and class to match the traffic you're interested in
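Something like this, where the ACL number, addresses, and ports are placeholders for whatever identifies your voice traffic:

access-list 101 permit udp 10.1.1.0 0.0.0.255 any range 16384 32767
class-map match-all VOICE
 match access-group 101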
Create a queuing structure for your important traffic and give it priority in times of congestion:
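(a sketch with placeholder names; the 512 kbps of priority bandwidth is purely illustrative)

policy-map QOS-QUEUE
 class VOICE
  priority 512
 class class-default
  fair-queue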
Next, create a shaper to limit your upstream to match what you pay for. This will force the queuing to happen on your equipment instead of upstream at the provider, where they'll drop whatever they want (including your voice):
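(again a sketch; the 9000000 bps rate is illustrative, so substitute the upstream rate you actually pay for)

policy-map SHAPE-OUT
 class class-default
  shape average 9000000
  service-policy QOS-QUEUE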
Finally, apply the policy map outbound on the router interface that faces the internet:
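(the interface name is a placeholder for your own internet-facing interface)

interface GigabitEthernet0/0
 service-policy output SHAPE-OUT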