Try disabling Windows' TCP auto-tuning feature.
In a CMD window (run as administrator):
netsh interface tcp set global autotuning=disabled
Re-run your test, and see if you notice a performance improvement. I've had to do this on a couple of laptops running Windows 7 in my house, and it's helped.
If things get worse, or you don't notice any improvement, you can re-enable auto-tuning with:
netsh interface tcp set global autotuning=normal
Yes. Using single cables to "cascade" multiple Ethernet switches together does create bottlenecks. Whether or not those bottlenecks are actually causing poor performance, however, can only be determined by monitoring the traffic on those links. (You really should be monitoring your per-port traffic statistics. This is yet one more reason why that's a good idea.)
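As a rough sketch of what monitoring a link means in practice: most managed switches expose per-port byte counters (for example, over SNMP), and sampling one twice gives the average utilization over that interval. The function name, the sample values, and the 64-bit counter width below are assumptions for illustration, not taken from any particular switch.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Return average link utilization (0.0 to 1.0) between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Byte counters wrap at 2**counter_bits; modulo arithmetic absorbs a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (interval_s * link_bps)


# Hypothetical samples taken 60 seconds apart on a 1 Gbps inter-switch link.
util = link_utilization(1_000_000_000, 7_000_000_000, 60, 1_000_000_000)
print(f"uplink utilization: {util:.0%}")
```

A cascade link that sits near 100% during the slow periods is a likely bottleneck. One that stays mostly idle is probably not the problem.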
An Ethernet switch has a limited, but typically very large, internal bandwidth in which to do its work. This is referred to as the switching fabric bandwidth and can be quite large today, even on very low-end gigabit Ethernet switches (a Dell PowerConnect 6248, for example, has a 184 Gbps switching fabric). Keeping traffic flowing between ports on the same switch typically means (with modern 24 and 48 port Ethernet switches) that the switch itself will not "block" frames flowing at full wire speed between connected devices.
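The "will not block" claim can be checked with simple arithmetic: compare the fabric bandwidth with the worst case of every port sending and receiving at line rate at once. This is a minimal sketch using the 184 Gbps figure above and counting only 48 gigabit ports. The function name is hypothetical, and any additional uplink or stacking ports are ignored here.

```python
def fabric_headroom_gbps(port_count, port_gbps, fabric_gbps):
    """Return fabric bandwidth left over when every port runs full duplex at line rate."""
    # Full duplex: each port can transmit and receive at line rate simultaneously.
    worst_case_gbps = port_count * port_gbps * 2
    return fabric_gbps - worst_case_gbps


headroom = fabric_headroom_gbps(port_count=48, port_gbps=1, fabric_gbps=184)
print(f"{headroom} Gbps of fabric headroom")
```

A positive result means the fabric can carry all ports at wire speed. A negative one would mean the switch can block internally under full load.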
Invariably, though, you'll need more ports than a single switch can provide.
When you cascade (or, as some would say, "heap") switches with crossover cables, you're not extending the switching fabric from the switches into each other. You're certainly connecting the switches, and traffic will flow, but only at the bandwidth provided by the ports connecting the switches. If more traffic needs to flow from one switch to another than the single connecting cable can support, frames will be dropped.
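To put a number on that bottleneck, it can help to compute an oversubscription ratio: the total bandwidth the hosts on one switch could offer, divided by the bandwidth of the link(s) to the other switch. The sketch below assumes a hypothetical 24-port gigabit switch with 23 host ports and one cascade port. The names and port counts are illustrative only.

```python
def oversubscription_ratio(host_ports, host_gbps, uplink_ports, uplink_gbps):
    """Return worst-case offered host bandwidth divided by inter-switch bandwidth."""
    if uplink_ports <= 0 or uplink_gbps <= 0:
        raise ValueError("at least one uplink with positive bandwidth is required")
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)


# 23 gigabit hosts sharing a single gigabit cascade link.
single = oversubscription_ratio(host_ports=23, host_gbps=1, uplink_ports=1, uplink_gbps=1)
# The same switch with four ports given over to the inter-switch connection.
quad = oversubscription_ratio(host_ports=20, host_gbps=1, uplink_ports=4, uplink_gbps=1)
print(f"single cable: {single:.0f}:1, four cables: {quad:.0f}:1")
```

A high ratio is only a worst case. It causes drops only if the hosts really do send that much traffic across the link at the same time, which is why the measured counters matter more than the ratio.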
Stacking connectors are typically used to provide higher speed switch-to-switch interconnects. In this way you can connect multiple switches with a much less restrictive switch-to-switch bandwidth limitation. (Using the Dell PowerConnect 6200 series again as an example, their stack connections are limited in length to under 0.5 meters, but operate at 40 Gbps.) This still doesn't extend the switching fabric, but it typically offers vastly improved performance as compared to a single cascaded connection between switches.
There were some switches (Intel 500 Series 10/100 switches come to mind) that actually extended the switching fabric between switches via stack connectors, but I don't know of any that have such a capability today.
One option that other posters have mentioned is using link aggregation mechanisms to "bond" multiple ports together. This uses more ports on each switch, but can increase switch-to-switch bandwidth. Beware that different link aggregation protocols use different algorithms to "balance" traffic across the links in the aggregation group, and you need to monitor the traffic counters on the individual interfaces in the aggregation group to ensure that balancing is really occurring. (Typically some kind of hash of the source / destination addresses is used to achieve a "balancing" effect. This is done so that Ethernet frames arrive in the same order, since frames between a single source and destination will always move across the same interface. It has the added benefit of not requiring queuing or monitoring of traffic flows on the aggregation group member ports.)
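The hashing idea can be illustrated with a toy example. The sketch below XORs the source and destination MAC addresses and takes the result modulo the number of member links. This is one simple scheme chosen for illustration, not the algorithm of any particular switch or protocol, and the MAC addresses and function names are made up.

```python
from collections import Counter


def mac_to_int(mac):
    """Convert a MAC address such as '00:11:22:33:44:55' to an integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def pick_member_link(src_mac, dst_mac, link_count):
    """Pick an aggregation-group member for a frame (illustrative XOR hash)."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % link_count


# Hypothetical clients all talking to one server across a 2-link group.
server = "00:11:22:33:44:10"
clients = ["00:11:22:33:44:01", "00:11:22:33:44:03", "00:11:22:33:44:05"]

usage = Counter(pick_member_link(client, server, 2) for client in clients)
for link, flows in sorted(usage.items()):
    print(f"link {link}: {flows} flow(s)")
```

Because the result depends only on the address pair, a given pair always lands on the same link, which preserves frame order. It also means a single large transfer never gets more than one link's bandwidth, and an unlucky set of addresses (like the three clients here) can all hash to the same member while the other sits idle.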
All of this concern about port-to-port switching bandwidth is one argument for using chassis-based switches. All the line cards in, for example, a Cisco Catalyst 6513 switch share the same switching fabric (though some line cards may, themselves, have an independent fabric). You can jam a lot of ports into that chassis and get more port-to-port bandwidth than you could in a cascaded or even stacked discrete switch configuration.
Firstly, check the interface counters on each server. There should be 0 or close to 0 errors reported.
Secondly, check the duplex of both servers. If you are mixing 100 Mbit and GbE then you may have a duplex mismatch. Ensure that both ends of each link are set to auto/auto, or manually set the same speed and duplex on all interfaces.
Thirdly, what kind of contention is there on the GbE backbone? Can you confirm that sufficient headroom exists for your transfer?
Finally, is your sending server capable of transmitting fast enough? As the comments below suggest, are you limited by the IO bandwidth of the sender's drives, or by CPU (if you are using scp or similar)?
Btw, 11 to 13 megabytes per second is about the theoretical maximum for 100 Mbit (100 Mbit/s divided by 8 bits per byte is 12.5 MB/s). Are you sure the tool you are using to measure is reporting the correct units?
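The units check is plain arithmetic: link speeds are quoted in bits per second, while most copy tools report bytes per second. A minimal sketch (the function name is made up for illustration):

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert a link speed in megabits per second to megabytes per second."""
    return mbit_per_s / 8  # 8 bits per byte


for name, mbit in (("100 Mbit", 100), ("GbE", 1000)):
    print(f"{name}: {mbit_to_mbyte_per_s(mbit)} MB/s raw line rate")
```

Real transfers come in somewhat below these figures because of Ethernet, IP, and TCP header overhead. A transfer stuck at roughly 11 to 12 MB/s therefore suggests that some link in the path is running at 100 Mbit rather than gigabit.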