Yes. Using single cables to "cascade" multiple Ethernet switches together does create bottlenecks. Whether or not those bottlenecks are actually causing poor performance, however, can only be determined by monitoring the traffic on those links. (You really should be monitoring your per-port traffic statistics. This is yet one more reason why that's a good idea.)
An Ethernet switch has a limited, but typically very large, internal bandwidth available for moving frames between its ports. This is referred to as the switching fabric bandwidth, and it can be quite large today even on very low-end gigabit Ethernet switches (a Dell PowerConnect 6248, for example, has a 184 Gbps switching fabric). With modern 24- and 48-port Ethernet switches, keeping traffic between ports on the same switch typically means that the switch itself will not "block" frames flowing at full wire speed between connected devices.
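A quick back-of-envelope check shows why that fabric figure matters (the 184 Gbps number is the one quoted above; the rest is just arithmetic):

```python
# Is a 48-port gigabit switch with a 184 Gbps fabric non-blocking?
PORT_SPEED_GBPS = 1   # gigabit Ethernet
PORTS = 48
FABRIC_GBPS = 184     # e.g. the Dell PowerConnect 6248 figure above

# Full duplex: each port can transmit AND receive at line rate
# simultaneously, so worst-case demand is ports * speed * 2.
worst_case = PORTS * PORT_SPEED_GBPS * 2
print(worst_case)                  # 96
print(worst_case <= FABRIC_GBPS)   # True -> non-blocking at wire speed
```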
Invariably, though, you'll need more ports than a single switch can provide.
When you cascade (or, as some would say, "heap") switches with crossover cables, you're not extending the switching fabric from one switch into the other. You're certainly connecting the switches, and traffic will flow, but only at the bandwidth provided by the ports connecting the switches. If more traffic needs to flow from one switch to another than the single connecting cable can support, frames will be dropped.
Stacking connectors are typically used to provide higher-speed switch-to-switch interconnects. In this way you can connect multiple switches with a much less restrictive switch-to-switch bandwidth limitation. (Using the Dell PowerConnect 6200 series again as an example, their stack connections are limited in length to under 0.5 meters, but operate at 40 Gbps.) This still doesn't extend the switching fabric, but it typically offers vastly improved performance as compared to a single cascaded connection between switches.
There were some switches (Intel 500 Series 10/100 switches come to mind) that actually extended the switching fabric between switches via stack connectors, but I don't know of any that have such a capability today.
One option that other posters have mentioned is using link aggregation mechanisms to "bond" multiple ports together. This uses more ports on each switch, but can increase switch-to-switch bandwidth. Beware that different link aggregation protocols use different algorithms to "balance" traffic across the links in the aggregation group, and you need to monitor the traffic counters on the individual interfaces in the group to ensure that balancing is really occurring. (Typically some kind of hash of the source and destination addresses is used to achieve a "balancing" effect. Because all frames between a given source and destination always traverse the same interface, Ethernet frames arrive in order, with the added benefit that no queuing or monitoring of traffic flows on the aggregation group member ports is required.)
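On Linux, for example, the server side of such an aggregation might be set up with iproute2 along these lines. This is a sketch only: the interface names are placeholders, and the switch ports facing the server must be configured for LACP to match:

```shell
# Create an 802.3ad (LACP) bond and choose a transmit hash policy.
# eth0/eth1 are assumed names -- substitute your own interfaces.
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer2+3
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up

# Watch per-member counters to verify traffic is really balancing:
cat /proc/net/bonding/bond0
ip -s link show eth0
ip -s link show eth1
```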
All of this concern about port-to-port switching bandwidth is one argument for using chassis-based switches. All the linecards in, for example, a Cisco Catalyst 6513 switch, share the same switching fabric (though some line cards may, themselves, have an independent fabric). You can jam a lot of ports into that chassis and get more port-to-port bandwidth than you could in a cascaded or even stacked discrete switch configuration.
What you're looking for is commonly called a "transmit hash policy" or "transmit hash algorithm". It controls the selection of a port from a group of aggregate ports with which to transmit a frame.
Getting my hands on the 802.3ad standard has proven difficult because I'm not willing to spend money on it. Having said that, I've been able to glean some information from a semi-official source that sheds some light on what you're looking for. Per this presentation from the 2007 Ottawa, ON, CA IEEE High Speed Study Group meeting the 802.3ad standard does not mandate particular algorithms for the "frame distributor":
This standard does not mandate any particular distribution algorithm(s); however, any distribution algorithm shall ensure that, when frames are received by a Frame Collector as specified in 43.2.3, the algorithm shall not cause a) Mis-ordering of frames that are part of any given conversation, or b) Duplication of frames. The above requirement to maintain frame ordering is met by ensuring that all frames that compose a given conversation are transmitted on a single link in the order that they are generated by the MAC Client; hence, this requirement does not involve the addition (or modification) of any information to the MAC frame, nor any buffering or processing on the part of the corresponding Frame Collector in order to re-order frames.
So, whatever algorithm a switch / NIC driver uses to distribute transmitted frames must adhere to the requirements as stated in that presentation (which, presumably, was quoting from the standard). There is no particular algorithm specified, only a compliant behavior defined.
Even though there's no algorithm specified, we can look at a particular implementation to get a feel for how such an algorithm might work. The Linux kernel "bonding" driver, for example, has an 802.3ad-compliant transmit hash policy that applies the function (see bonding.txt in the Documentation/networking directory of the kernel source):
Destination Port = (((<source IP> XOR <dest IP>) AND 0xFFFF)
    XOR (<source MAC> XOR <destination MAC>)) MOD <ports in aggregate group>
This causes both the source and destination IP addresses, as well as the source and destination MAC addresses, to influence the port selection.
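A toy Python sketch of that policy may make the behavior concrete. This is my own illustration, not the kernel's code, and addresses are given as plain integers rather than parsed from real frames:

```python
def select_port(src_ip: int, dst_ip: int,
                src_mac: int, dst_mac: int, n_ports: int) -> int:
    """((src_ip XOR dst_ip) AND 0xFFFF) XOR (src_mac XOR dst_mac), mod n."""
    return (((src_ip ^ dst_ip) & 0xFFFF) ^ (src_mac ^ dst_mac)) % n_ports

# Every frame of a given src/dst conversation hashes to the same port,
# which is what preserves frame ordering:
a = select_port(0x0A000001, 0x0A000002, 0x0015C5AA01, 0x0015C5BB02, 4)
b = select_port(0x0A000001, 0x0A000002, 0x0015C5AA01, 0x0015C5BB02, 4)
assert a == b
# A different conversation may (or may not) land on a different port:
print(select_port(0x0A000003, 0x0A000002, 0x0015C5CC03, 0x0015C5BB02, 4))
```

Note that the hash is purely deterministic: it doesn't consider how busy each link is, which is why monitoring the per-member counters matters.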
The destination IP address used in this type of hashing is the address that's actually present in the frame. Take a second to think about that. In an Ethernet frame headed away from your server toward the Internet, the router's MAC address is present in the frame header, but the router's IP address isn't encapsulated anywhere in the frame. The destination IP address encapsulated in the frame's payload will be the address of the Internet client making the request to your server.
A transmit hash policy that takes into account both source and destination IP addresses, assuming you have a widely varied pool of clients, should do pretty well for you. In general, more widely varied source and/or destination IP addresses in the traffic flowing across such an aggregated infrastructure will result in more efficient aggregation when a layer 3-based transmit hash policy is used.
Your diagrams show requests coming directly to the servers from the Internet, but it's worth pointing out what a proxy might do to the situation. If you're proxying client requests to your servers then, as chris mentions in his answer, you may cause bottlenecks. If that proxy is making the request from its own source IP address, instead of from the Internet client's IP address, you'll have fewer possible "flows" in a strictly layer 3-based transmit hash policy.
A transmit hash policy could also take layer 4 information (TCP / UDP port numbers) into account, so long as it keeps within the requirements of the 802.3ad standard. Such an algorithm is in the Linux kernel, as you reference in your question. Beware that the documentation for that algorithm warns that, due to fragmentation, traffic may not necessarily flow along the same path and, as such, the algorithm isn't strictly 802.3ad-compliant.
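Following the same pattern, here is a toy sketch of a layer 3+4 policy, based on the formula documented in older versions of bonding.txt (again my own illustration, not the kernel's code):

```python
def select_port_l34(src_ip: int, dst_ip: int,
                    src_port: int, dst_port: int, n_links: int) -> int:
    """((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xFFFF)), mod n."""
    return ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xFFFF)) % n_links

# Two TCP connections between the SAME pair of hosts can now take
# different links, because the ephemeral source port differs:
link_a = select_port_l34(0x0A000001, 0x0A000002, 51000, 80, 2)
link_b = select_port_l34(0x0A000001, 0x0A000002, 51001, 80, 2)
print(link_a, link_b)   # with these values, the two connections differ
```

This is exactly why layer 3+4 hashing can spread multiple flows between a single client/server pair, at the cost of the fragmentation caveat mentioned above.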
Best Answer
Failover: Unplug each cable and ensure traffic continues to flow between the switches.
Bandwidth: iperf seems to be the standard tool for testing these kinds of things. If you can get a setup like this (hosts A and B on one switch, host C on the other, joined by two inter-switch links, L1 and L2):
Then set up iperf on all three boxes and try various client/server combinations, or just scp some files around if you're on Linux. Simultaneously transfer files from A to C and from B to C. Try unplugging L1 or L2, or L1 and then L2, and observe whether the bandwidth drops. The reason you need two simultaneous transfers is that each transfer (TCP) can only use one link (L1 or L2 in this case). Depending on your switches, you might be able to split transfers between two boxes (for example, A and C) if the transfers are on different ports, but that's only on newer switches; most hash based on source and destination IP addresses.
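A minimal sketch of that test, assuming the A/B/C topology above (hostnames and the 30-second duration are placeholders of my own; adjust to taste):

```shell
# On host C, run an iperf server:
iperf -s

# On hosts A and B simultaneously, run clients against C:
iperf -c <ip of C> -t 30

# Repeat the client runs while pulling L1, then L2, and compare
# the reported bandwidth for each scenario.
```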