TCAM relationships in hardware-switching architecture

Tags: asic, cpu, memory

I am familiar, at a high level, with how Ternary Content Addressable Memory (TCAM) operates, but I remain confused about how TCAM relates to ASICs, and about when these components also combine with the CPU for greater switching performance (particularly since vendors often market the use of merchant or custom silicon, or custom ASICs, for new products and features, which makes it confusing).

I am familiar, in Cisco IOS for example, with the ability to partition TCAM space for features like QoS, ACLs and route lookups. I also understand that features like NAT will still rely on CPU processing. However, I am specifically struggling with:

  1. Is TCAM part of the same hardware architecture, i.e. internal or external to the ASIC itself?
  2. Are TCAMs scalable (for example, can manufacturers just keep adding TCAMs to a platform for greater performance and custom features), or is there a limit based on things like power consumption?
  3. Do TCAM cycles and ASICs work in parallel with the CPU at all for features like NAT, or should I only think of them as independent things?

Best Answer

TCAM is a type of memory that takes 10-12 transistors to store a single bit. By way of comparison, static RAM (SRAM) takes only 6 transistors per bit, and dynamic RAM (DRAM) takes one transistor and a capacitor. All of these memory types can be either internal or external to an ASIC. One reason to put memories on-chip is that they can be run at higher clock rates than external memories. Why choose one type of memory over another? It comes down to the characteristics of each: SRAM can be accessed every clock, DRAM requires periodic refresh and so cannot be accessed every clock, and TCAM gives you ternary ("don't care" bit) matching capability.
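To make the "ternary" part concrete, here is a minimal software model of what a TCAM lookup does: each entry is a value plus a care mask, and the highest-priority matching entry wins. This is purely illustrative (a real TCAM compares every entry in parallel in a single clock, whereas this loop is sequential), and the names are mine, not any vendor's.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask, result). A 0 bit in care_mask means "don't care".
TcamEntry = Tuple[int, int, str]

def tcam_lookup(key: int, entries: List[TcamEntry]) -> Optional[str]:
    """Return the result of the first (highest-priority) entry whose cared-about
    bits match the key. Entry order encodes priority, as in a real TCAM."""
    for value, care_mask, result in entries:
        if (key & care_mask) == (value & care_mask):
            return result
    return None

# Example: a 3-entry table keyed on an 8-bit field.
table = [
    (0b1010_0000, 0b1111_0000, "entry A (top 4 bits must be 1010)"),
    (0b0000_0001, 0b0000_0001, "entry B (lowest bit must be 1)"),
    (0b0000_0000, 0b0000_0000, "entry C (all bits don't-care: default)"),
]

print(tcam_lookup(0b1010_1111, table))  # entry A
print(tcam_lookup(0b0000_0011, table))  # entry B
print(tcam_lookup(0b0000_0010, table))  # entry C (default)
```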

TCAMs are scalable as long as you have space on the chip to instantiate them, or pins on the package to connect to external ones. The issue with TCAM is that it takes roughly 2x the space of SRAM and 12x the space of DRAM. It does not always make sense to use TCAM for operations that can be done algorithmically (hashes, tries) with other memory types; the choice comes down to a tradeoff between how effectively the algorithm uses memory and the space consumed on the chip. TCAM power consumption grows linearly with size, which is why most large TCAMs (greater than 2M entries) now use algorithmic techniques to achieve power savings.
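As a rough sketch of what "algorithmically, with tries" means for a route lookup: longest-prefix match can be done in ordinary SRAM/DRAM with a trie instead of a TCAM. The binary trie below is an assumption-laden toy (real hardware uses multibit or compressed tries), intended only to show the idea.

```python
import ipaddress
from typing import Dict, Optional

class TrieNode:
    def __init__(self):
        self.children: Dict[int, "TrieNode"] = {}
        self.next_hop: Optional[str] = None  # set if a prefix ends here

class PrefixTrie:
    """Toy binary trie doing IPv4 longest-prefix match in ordinary memory."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, self.root.next_hop
        for i in range(32):
            node = node.children.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match seen so far
        return best

fib = PrefixTrie()
fib.insert("10.0.0.0/8", "next-hop A")
fib.insert("10.1.0.0/16", "next-hop B")
print(fib.lookup("10.1.2.3"))    # next-hop B (the longer /16 wins)
print(fib.lookup("10.200.0.1"))  # next-hop A
print(fib.lookup("192.0.2.1"))   # None (no matching prefix)
```

The tradeoff mentioned above shows up directly here: the trie uses cheap memory but needs multiple memory accesses per lookup, while a TCAM answers in one access at a much higher cost per bit and per watt.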

NAT/PAT is a complex feature which generally needs a CPU or network processor (NPU) to handle fixups. The general packet flow for NAT is that the first packet of a flow goes to the CPU/NPU, which installs an entry in a flow table or ACL table with the information needed to translate subsequent packets in that flow. There are multiple forms of NAT/PAT, and just as many ways to optimize each one in a chip. The simplest NAT just rewrites the IP addresses and does not worry about breaking addresses embedded in the payload, i.e. no fixups.
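A hedged sketch of that slow-path/fast-path split: the first packet of a flow "punts" to software, which installs a translation entry, and later packets in the same flow hit that entry directly. All names, structures, and the port-allocation scheme here are illustrative assumptions, not any vendor's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

FlowKey = Tuple[str, int, str, int, str]  # (src_ip, src_port, dst_ip, dst_port, proto)

@dataclass
class Translation:
    new_src_ip: str
    new_src_port: int

flow_table: Dict[FlowKey, Translation] = {}   # stands in for the hardware flow/ACL table
next_public_port = 40000                      # trivial PAT port allocator (assumption)

def slow_path(key: FlowKey) -> Translation:
    """CPU/NPU work: allocate a public ip:port and install the flow entry."""
    global next_public_port
    xlate = Translation("203.0.113.1", next_public_port)
    next_public_port += 1
    flow_table[key] = xlate
    return xlate

def forward(key: FlowKey) -> Translation:
    """Fast path: translate from the flow table if present, else punt to the slow path."""
    xlate = flow_table.get(key)
    if xlate is None:
        xlate = slow_path(key)   # only the first packet of a flow pays this cost
    return xlate

flow = ("192.168.1.10", 51515, "198.51.100.5", 443, "tcp")
print(forward(flow))  # first packet: slow path installs the entry
print(forward(flow))  # subsequent packets: pure table hit
```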

There is another version of BRKARC-3466, presented at Cisco Live 2013 in Melbourne, that covers some of the high-level ideas behind lookups which the 2013 Orlando version omits. A good reference book on this area is Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices by George Varghese.