THE PROBLEM:
It turns out we ran into a node limit on the FWSM. Evidently you can be within your rule limit but hit the node ceiling. This doc https://supportforums.cisco.com/docs/DOC-8786 details the compilation memory exhaustion issue that hit us. Quoting:
The command show np 3 acl stats, run in the context in question, will show whether the total node limit has been reached. This limit may be reached even before the ACL limit is reached. Each ACE may take a minimum of 2 nodes up to a maximum of 5 nodes, depending on where the ACL is being called. An ACL tied to MPF (Modular Policy Framework) may take up more nodes than an ACL tied to a NAT or to the access-group. There is no way to calculate the number of nodes in advance; the best way to monitor this is to regularly check the above output to make sure the node count is not exceeded.
RESOLUTION:
1. Power down the secondary FWSM and power on the primary.
2. Revise our configuration, examining which rules could be removed (stale rules, plus recent changes allowing certain Development environments access to Production), to bring down the rule count and therefore the node count.
3. Ensure the new config compiles, then reset the primary FWSM.
4. Verify the FWSM comes up clean with no errors.
5. Power down the primary FWSM and power up the secondary FWSM.
6. Repeat steps 3 and 4 with the secondary FWSM then power it down.
7. Power on the primary, ensure it's clean, power on the secondary.
8. Ensure the primary's config is copied to the secondary and that the primary is Active and sees the secondary as Standby.
MITIGATION:
1. We upgraded our development environment (2 x 6509 with Active/Standby FWSM) to FWSM version 4.1(13) from 3.2(23). The upgrade to the 4.x train increased the maximum node count for us from 28,356 to 38,439. More information: http://www.cisco.com/en/US/prod/collateral/modules/ps2706/product_bulletin_c25-478751.html.
2. We will upgrade our Production FWSM during the next change window.
3. We implemented a Kiwi CatTools job to run the show np 3 acl stats command in our contexts and e-mail us a daily report. Whenever we make a rule change, we also run this command to ensure we're within the node and rule limits.
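As a sketch of what such a daily check might do, here is a minimal Python parser. Note the sample output below is invented for illustration; the real show np 3 acl stats output format will differ, so the regexes would need adjusting for your platform.

```python
import re

# Hypothetical sample of "show np 3 acl stats" output -- the real FWSM
# output format differs; adjust the labels and regexes to match yours.
SAMPLE_OUTPUT = """
ACL Tree Statistics
-------------------
Rules   : 9500   (max 12130)
Nodes   : 27100  (max 28356)
"""

def check_limits(output, warn_pct=80):
    """Parse rule/node usage and flag anything at or above warn_pct percent."""
    warnings = []
    for label in ("Rules", "Nodes"):
        m = re.search(rf"{label}\s*:\s*(\d+)\s*\(max (\d+)\)", output)
        if m:
            used, limit = int(m.group(1)), int(m.group(2))
            pct = 100.0 * used / limit
            if pct >= warn_pct:
                warnings.append(f"{label}: {used}/{limit} ({pct:.0f}%)")
    return warnings

print(check_limits(SAMPLE_OUTPUT))  # -> ['Nodes: 27100/28356 (96%)']
```

The point is to alert well before 100%, since (as we learned) the failure only surfaces at compile time.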
Interestingly, the FWSM let us keep adding rules past the limit; it failed only when it was power-cycled and tried to compile the rule set. Lessons learned!
TCAM is a type of memory that takes 10-12 transistors to store a single bit. By way of comparison, static RAM (SRAM) takes only 6 transistors per bit, and dynamic RAM (DRAM) takes one transistor and a capacitor. Any of these memory types can be either internal or external to an ASIC. One reason to put all memories on-chip is that they can be run at higher clock rates than when external to the chip. Why choose one type of memory over another? It comes down to the characteristics of each: SRAM can be accessed every clock; DRAM requires periodic refresh, so it cannot be accessed every clock; and TCAM gives you ternary capability.
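The ternary capability can be illustrated with a few lines of Python (a toy software model, not how any real ASIC is built): each entry stores a value plus a mask, and masked-out bits match anything, which an ordinary exact-match RAM lookup cannot express.

```python
def tcam_match(key, entries):
    """Return the first entry whose (value, mask) matches key.
    Bits where the mask is 0 are 'don't care' -- the ternary part."""
    for value, mask, result in entries:
        if key & mask == value & mask:
            return result
    return None

# 8-bit example entries, highest priority first.
entries = [
    (0b10100000, 0b11110000, "match upper nibble 1010"),
    (0b00000000, 0b00000000, "wildcard: match anything"),
]

print(tcam_match(0b10101111, entries))  # upper nibble 1010 -> first entry
print(tcam_match(0b01010101, entries))  # no specific match -> wildcard
```

A real TCAM evaluates all entries in parallel in a single cycle; the priority encoder picks the first match, which the loop above emulates sequentially.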
TCAMs are scalable as long as you have space on the chip to instantiate them, or pins on the package to connect to external ones. The drawback is that TCAM takes 2x the space of SRAM and 12x the space of DRAM. It does not always make sense to use TCAM for operations you can do algorithmically (hashes, tries) with other memory types; the choice comes down to a tradeoff between the algorithm's utilization effectiveness and space on the chip. TCAM power consumption also grows linearly with size, which is why the majority of large TCAMs (greater than 2M entries) now use algorithmic techniques to achieve power savings.
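As an example of the algorithmic alternative, a longest-prefix match can be done in plain RAM with a binary trie. This is a minimal one-bit-at-a-time sketch; production implementations use compressed multi-bit tries to cut the number of memory accesses.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # one child per bit
        self.next_hop = None          # set if a prefix ends at this node

def insert(root, prefix, plen, next_hop):
    """Insert an IPv4 prefix (a 32-bit int) of length plen bits."""
    node = root
    for i in range(plen):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Longest-prefix match: remember the deepest next_hop seen."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[(addr >> (31 - i)) & 1]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, 0x0A000000, 8, "hop-A")   # 10.0.0.0/8
insert(root, 0x0A010000, 16, "hop-B")  # 10.1.0.0/16
print(lookup(root, 0x0A010203))        # 10.1.2.3 -> hop-B (longer prefix)
print(lookup(root, 0x0A020304))        # 10.2.3.4 -> hop-A
```

A trie trades one memory access per bit (or per stride) against the TCAM's single parallel lookup, which is exactly the space/power-versus-speed tradeoff described above.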
NAT/PAT is a complex feature that generally needs a CPU or network processor (NPU) to handle fixups. The general packet flow for NAT is: the first packet goes to the CPU/NPU, which installs an entry in the flow table or ACL table describing how to translate subsequent packets in that flow. There are many different forms of NAT/PAT, and just as many ways to optimize each one in a chip. The simplest NAT just rewrites the IP addresses and performs no fixups, even if that breaks addresses embedded in the payload.
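A toy model of that "first packet punts to software, flow entry handles the rest" pattern (the class, addresses, and port range here are invented for illustration):

```python
import itertools

class SimplePat:
    """Sketch of PAT: the first packet of a flow is handled in software,
    which installs a flow-table entry that later packets hit directly."""
    def __init__(self, outside_ip):
        self.outside_ip = outside_ip
        self.flows = {}                      # installed by the "CPU/NPU"
        self.ports = itertools.count(20000)  # next free outside port
        self.punts = 0                       # packets handled in software

    def translate(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.flows:
            # First packet of the flow: software path installs an entry.
            self.punts += 1
            self.flows[key] = (self.outside_ip, next(self.ports))
        # Subsequent packets hit the flow table (the fast path).
        return self.flows[key]

pat = SimplePat("203.0.113.1")
print(pat.translate("10.0.0.5", 1234))  # punt: installs the entry
print(pat.translate("10.0.0.5", 1234))  # same translation, no punt
print(pat.punts)                        # 1
```

Fixups (rewriting addresses embedded in payloads, as FTP or SIP require) are what this toy skips, and they are the part that keeps the CPU/NPU involved beyond the first packet.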
There is another version of BRKARC-3466, presented at Cisco Live 2013 in Melbourne, that covers some of the high-level ideas behind lookups missing from the 2013 Orlando session. A good reference book on this area is Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices by George Varghese.
Best Answer
Since you have used Cisco terminology in your question, I will assume that you are only talking about Cisco equipment. Also I will assume that you are only interested in L2+L3 devices (like the Cisco Catalyst family of switches), not in the pure L3 devices (like the ISR and ASR routers).
As Zac67 points out, some models of Cisco switches have purely Layer 2 capability and zero Layer 3 capability. All switches in the Catalyst series are, however, capable of some layer 3 functionality, although the software may deliberately disable parts of it based on licensing. I will confine my answer to these L2+L3 devices.
Rather than distinguish the behaviour based on "this is what a layer 2 switch contains" and "this is what a layer 3 switch contains", a more useful discussion would be based on how layer 2 switching (bridging) is done and how layer 3 switching (routing) is achieved. I will use the word "forwarding" to mean both bridging and routing.
At a high level, there are two approaches: software forwarding and hardware forwarding. As the name suggests, in hardware forwarding the ASIC forwards the packet. In software forwarding the packet reaches the CPU where the software code will examine the various fields of the packet and determine which interface(s) the packet will have to be sent out on. Hardware forwarding is much faster, but software forwarding is more flexible because it is just based on code that some programmer writes.
It is important to note that hardware and software forwarding co-exist. In the ideal case all packets will be forwarded in hardware, but there are situations where the packet cannot be forwarded in hardware and must be forwarded in software. There are many examples of this. For example, there is no Cisco ASIC that supports Appletalk routing, but there are IOS versions that still support Appletalk. If the switch receives an Appletalk packet, and Appletalk is configured, the packet is sent to the software where the Appletalk routing code will route the packet to the correct interface. Another example is an IPv4 packet with one or more Header Options fields present. Another example is when there are so many routes that the hardware table (i.e. ASIC TCAM) is unable to accommodate more routes.
Cisco IOS uses multiple techniques for L3 routing a packet in software: (1) process switching, (2) fast switching, and (3) CEF switching. These are all different software techniques, with different performance in terms of the maximum number of packets that can be routed per second. Fast switching (#2) is somewhat obsolete. CEF switching uses a software data structure called the FIB (Forwarding Information Base) to determine the output interface to which a packet must be sent.
L2 bridging in software has no specific named technique. It's just called "L2 bridging in software".
Coming now to hardware forwarding: ASICs are designed by the vendor with the requirements of the target market segment in view, chief among them performance and cost. The components that go into an ASIC are basically the cheapest components that still meet the performance criteria. In other words, there is no hard and fast rule that says L2 bridging must always use a CAM. For L3 routing, where the requirement is to match on variable-length CIDR masks, a TCAM is the most efficient component available with today's technology. For, say, an L2 MAC address lookup (a full 48-bit exact match), however, an ASIC designer may be able to get away with a cheaper RAM-like component, especially if the entries can be hashed, sorted, or arranged so that the lookup time still satisfies the performance constraint.
The layer 3 route lookup TCAM in Cisco switches is a hardware representation of the FIB. In other words, the same FIB data structure that is used in software L3 routing technique #3 above is programmed in the hardware TCAM to achieve hardware routing of IPv4 and IPv6 packets.
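A sketch of that FIB-to-TCAM idea, assuming a simple first-match TCAM model: each FIB route becomes a value/mask entry, and the entries are ordered longest-prefix-first so that the first (highest-priority) match is automatically the longest match. This ordering is what lets a TCAM do longest-prefix matching in a single parallel lookup.

```python
import ipaddress

def program_tcam(fib_routes):
    """Turn FIB routes into (value, mask, next_hop) entries, ordered
    longest-prefix-first so that first match == longest match."""
    entries = []
    for prefix, next_hop in fib_routes:
        net = ipaddress.ip_network(prefix)
        entries.append((int(net.network_address), int(net.netmask), next_hop))
    entries.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)
    return entries

def tcam_lookup(entries, addr):
    a = int(ipaddress.ip_address(addr))
    for value, mask, next_hop in entries:  # hardware checks all in parallel
        if a & mask == value:
            return next_hop
    return None

fib = [("0.0.0.0/0", "default-gw"), ("10.0.0.0/8", "core"),
       ("10.1.0.0/16", "dist-1")]
tcam = program_tcam(fib)
print(tcam_lookup(tcam, "10.1.2.3"))  # dist-1   (the /16 wins)
print(tcam_lookup(tcam, "10.9.9.9"))  # core     (falls to the /8)
print(tcam_lookup(tcam, "8.8.8.8"))   # default-gw
```

Real switches add complications this sketch ignores (banks, entry compression, incremental updates as routes churn), but the ordering trick is the essential point.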
Note that TCAMs are also used in ASICs for purposes other than L3 routing, for example to implement security ACLs and to identify packets for QoS treatment. Cisco 4500 and 3850 switches have TCAMs for L3 routing as well as for security/QoS.
Final note on "merchant silicon". In recent years, there is a school of thought among major equipment vendors (Cisco, Juniper, Arista...) that there is only so much "secret sauce" that can be put into ASICs, i.e. there isn't much competitive advantage to designing their own custom ASICs for doing L2/L3 forwarding. The competition is now in software innovation, and for this reason the thought process is "why not just source the ASIC from vendors like Broadcom, and focus the innovation efforts in software?" Having said that, at least Cisco has invested several billions of dollars over several decades to build in-house ASICs, and there is a reluctance to just throw it all away.