Networking – How Do VLANs Work?

ethernetnetworkingswitchtrunkvlan

What are VLANs? What problems do they solve?

I'm helping a friend learn basic networking, as he's just become the sole sysadmin at a small company. I've been pointing him at various questions/answers on Serverfault relating to various networking topics, and noticed a gap – there doesn't appear to be an answer which explains from first principles what VLANs are. In the spirit of How does Subnetting Work, I thought it would be useful to have a question with a canonical answer here.

Some potential topics to cover in an answer:

What are VLANs?
What problems were they intended to solve?
How did things work before VLANs?
How do VLANs relate to subnets?
What are SVIs?
What are trunk ports and access ports?
What is VTP?

EDIT: to be clear, I already know how VLANs work – I just think that Serverfault should have an answer that covers these questions. Time permitting, I'll be submitting my own answer as well.

Best Answer

Virtual LANs (VLANs) are an abstraction to permit a single physical network to emulate the functionality of multiple parallel physical networks. This is handy because there may be situations where you need the functionality of multiple parallel physical networks but you'd rather not spend the money on buying parallel hardware. I'll be speaking about Ethernet VLANs in this answer (even though other networking technologies can support VLANs) and I won't be diving deeply into every nuance.

A Contrived Example and a Problem

As a purely contrived example scenario, imagine you own an office building that you lease to tenants. As a benefit of the lease, each tenant will get live Ethernet jacks in each room of the office. You buy a Ethernet switch for each floor, wire them up to jacks in each office on that floor, and wire all the switches together.

Initially, you lease space to two different tenants-- one on the floor 1 and one on 2. Each of these tenants configures their computers w/ static IPv4 addresses. Both tenants use different TCP/IP subnets and everything seems to work just fine.

Later, a new tenant rents half of floor 3 and brings up one of these new-fangled DHCP servers. Time passes and the 1st floor tenant decides to jump on the DHCP bandwagon, too. This is the point when things start to go awry. The floor 3 tenants report that some of their computers are getting "funny" IP addresses from a machine that isn't their DHCP server. Soon, the floor 1 tenants report the same thing.

DHCP is a protocol that takes advantage of the broadcast capability of Ethernet to allow client computers to obtain IP addresses dynamically. Because the tenants are all sharing the same physical Ethernet network they share the same broadcast domain. A broadcast packet sent from any computer in the network will flood out all the switch ports to every other computer. The DHCP servers on floors 1 and 3 will receive all requests for IP address leases and will, effectively, duel to see who can answer first. This is clearly not the behavior you intend your tenants to experience. This is the behavior, though, of a "flat" Ethernet network w/o any VLANs.

Worse still, a tenant on floor 2 acquires this "Wireshark" software and reports that, from time to time, they see traffic coming out of their switch that references computers and IP addresses that they've never heard of. One of their employees has even figured out that he can communicate with these other computers by changing the IP address assigned to his PC from 192.168.1.38 to 192.168.0.38! Presumably, he's just a few short steps away from performing "unauthorized pro-bono system administration services" for one of the other tenants. Not good.

Potential Solutions

You need a solution! You could just pull the plugs between the floors and that would cut off all unwanted communication! Yeah! That's the ticket...

That might work, except that you have a new tenant who will be renting half of the basement and the unoccupied half of floor 3. If there isn't a connection between the floor 3 switch and the basement switch the new tenant won't be able to get communication between their computers that will be spread around both of their floors. Pulling the plugs isn't the answer. Worse still, the new tenant is bringing yet another one of these DHCP servers!

You flirt with the idea of buying physically separate sets of Ethernet switches for each tenant, but seeing as how your building has 30 floors, any of which can be subdivided up to 4 ways, the potential rats nest of floor-to-floor cables between massive numbers of parallel Ethernet switches could be a nightmare, not to mention expensive. If only there was a way to make a single physical Ethernet network act like it was multiple physical Ethernet networks, each with its own broadcast domain.

VLANs to the Rescue

VLANs are an answer to this messy problem. VLANs permit you to subdivide an Ethernet switch into logically disparate virtual Ethernet switches. This allows a single Ethernet switch to act as though it's multiple physical Ethernet switches. In the case of your subdivided floor 3, for example, you could configure your 48 port switch such that the lower 24 ports are in a given VLAN (which we'll call VLAN 12) and the higher 24 ports are in a given VLAN (which we'll call VLAN 13). When you create the VLANs on your switch you'll have to assign them some type of VLAN name or number. The numbers I'm using here are mostly arbitrary, so don't worry about what specific numbers I choose.

Once you've divided the floor 3 switch into VLANs 12 and 13 you find that the new floor 3 tenant can plug in their DHCP server to one of the ports assigned to VLAN 13 and a PC plugged into a port assigned to VLAN 12 doesn't get an IP address from the new DHCP server. Excellent! Problem solved!

Oh, wait... how do we get that VLAN 13 data down to the basement?

VLAN Communication Between Switches

Your half-floor 3 and half-basement tenant would like to connect computers in the basement to their servers on floor 3. You could run a cable directly from one of the ports assigned to their VLAN in the floor 3 switch to the basement and life would be good, right?

In the early days of VLANs (pre-802.1Q standard) you might do just that. The entire basement switch would be, effectively, part of VLAN 13 (the VLAN you've opted to assign to the new tenant on floor 3 and the basement) because that basement switch would be "fed" by a port on floor 3 that's assigned to VLAN 13.

This solution would work until you rent the other half of the basement to your floor 1 tenant who also wants to have communication between their 1st floor and basement computers. You could split the basement switch using VLANs (into, say, VLANS 2 and 13) and run a cable from floor 1 to a port assigned to VLAN 2 in the basement, but you better judgement tells you that this could quickly become a rat's nest of cables (and is only going to get worse). Splitting switches using VLANs is good, but having to run multiple cables from other switches to ports which are members of different VLANs seems messy. Undoubtedly, if you had to divide the basement switch 4 ways between tenants who also had space on higher floors you'd use 4 ports on the basement switch just to terminate "feeder" cables from upstairs VLANs.

It should now be clear that some type of generalized method of moving traffic from multiple VLANs between switches on a single cable is needed. Just adding more cables between switches to support connections between different VLANs isn't a scalable strategy. Eventually, with enough VLANs, you'll be eating up all the ports on your switches with these inter-VLAN / inter-switch connections. What's needed is a way to carry the packets from multiple VLANs along a single connection-- a "trunk" connection between switches.

Up to this point, all the switch ports we've talked about are called "access" ports. That is, these ports are dedicated to accessing a single VLAN. The devices plugged into these ports have no special configuration themselves. These devices don't "know" that any VLANs are present. Frames the client devices send are delivered to the switch which then takes care of making sure that the frame is only sent to ports assigned as members of the VLAN assigned to the port where the frame entered the switch. If a frame enters the switch on a port assigned as a member of VLAN 12 then the switch will only send that frame out ports that are members of VLAN 12. The switch "knows" the VLAN number assigned to a port from which it receives a frame and somehow knows to only deliver this frame out ports of the same VLAN.

If there were some way for a switch to share the VLAN number associated with a given frame to other switches then the other switch could properly handle delivering that frame only to the appropriate destination ports. This is what the 802.1Q VLAN tagging protocol does. (It's worth noting that, prior to 802.1Q, some vendors made up their own standards for VLAN tagging and inter-switch trunking. For the most part these pre-standard methods have all been supplanted by 802.1Q.)

When you have two VLAN-aware switches connected to each other and you want those switches to deliver frames between each other to the proper VLAN you connect those switches using "trunk" ports. This involves changing the configuration of a port on each switch from "access" mode to "trunk" mode (in a very basic configuration).

When a port is configured in trunk mode each frame that the switch sends out that port will have a "VLAN tag" included in the frame. This "VLAN tag" wasn't part of the original frame that the client sent. Rather, this tag is added by the sending switch prior to sending the frame out the trunk port. This tag denotes the VLAN number associated with the port from which the frame originated.

The receiving switch can look at the tag to determine which VLAN the frame originated from and, based on that information, forward the frame out only ports that are assigned to the originating VLAN. Because the devices connected to "access" ports aren't aware that VLANs are being used the "tag" information must be stripped from the frame before it's sent out a port configured in access mode. This stripping of the tag information causes the entire VLAN trunking process to be hidden from client devices since the frame they receive will not bear any VLAN tag information.

Before you configure VLANs in real life I'd recommend configuring a port for trunk mode on a test switch and monitoring the traffic being sent out that port using a sniffer (like Wireshark). You can create some sample traffic from another computer, plugged into an access port, and see that the frames leaving the trunk port will, in fact, be larger than the frames being send by your test computer. You'll see the VLAN tag information in the frames in Wireshark. I find that it's worth actually seeing what happens in a sniffer. Reading up on the 802.1Q tagging standard is also a decent thing to do at this point (especially since I'm not talking about things like "native VLANs" or double-tagging).

VLAN Configuration Nightmares and the Solution

As you rent more and more space in your building the number of VLANs grows. Each time you add a new VLAN you find that you have to logon to increasingly more Ethernet switches and add that VLAN to the list. Wouldn't it be great if there were some method by which you could add that VLAN to a single configuration manifest and have it automatically populate the VLAN configuration of each switch?

Protocols like Cisco's proprietary "VLAN Trunking Protocol" (VTP) or the standards-based "Multiple VLAN Registration Protocol" (MVRP-- previously spelled GVRP) fulfill this function. In a network using these protocols a single VLAN creation or deletion entry results in protocol messages being sent to all switches in the network. That protocol message communicates the change in VLAN configuration to the rest of the switches which, in turn, modify their VLAN configurations. VTP and MVRP aren't concerned with which specific ports are configured as access ports for specific VLANs, but rather are useful in communicating the creation or deletion of VLANs to all the switches.

When you've gotten comfortable with VLANs you'll probably want to go back and read about "VLAN pruning", which is associated with protocols like VTP and MVRP. For now it's nothing to be tremendously concerned with. (The VTP article on Wikipedia has a nice diagram that explains VLAN pruning and the benefits therewith.)

When Do You Use VLANs In Real Life?

Before we go much further it's important to think about real life rather than contrived examples. In lieu of duplicating the text of another answer here I'll refer you to my answer re: when to create VLANs. It's not necessarily "beginner-level", but it's worth taking a look at now since I'm going to make reference to it briefly before moving back to a contrived example.

For the "tl;dr" crowd (who surely have all stopped reading at this point, anyway), the gist of that link above is: Create VLANs to make broadcast domains smaller or when you want to segregate traffic for some particular reason (security, policy, etc). There aren't really any other good reasons to use VLANs.

In our example we're using VLANs to limit broadcast domains (to keep protocols like DHCP working right) and, secondarily, because we want isolation between the various tenants' networks.

An Aside re: IP Subnets and VLANs

Generally speaking there is a typically a one-to-one relationship between VLANs and IP subnets as a matter of convenience, to facilitate isolation, and because of how the ARP protocol works.

As we saw at the beginning of this answer two different IP subnets can be used on the same physical Ethernet without issue. If you're using VLANs to shrink broadcast domains you won't want to share the same VLAN with two different IP subnets since you'll be combining their ARP and other broadcast traffic.

If you're using VLANs to segregate traffic for security or policy reasons then you also probably won't want to combine multiple subnets in the same VLAN since you'll be defeating the purpose of isolation.

IP uses a broadcast-based protocol, Address Resolution Protocol (ARP), to map IP addresses onto physical (Ethernet MAC) addresses. Since ARP is broadcast based, assigning different parts of the same IP subnet to different VLANs would be problematic because hosts in one VLAN wouldn't be able to receive ARP replies from hosts in the other VLAN, since broadcasts aren't forwarded between VLANs. You could solve this "problem" by using proxy-ARP but, ultimately, unless you have a really good reason to need to split an IP subnet across multiple VLANs it's better not to do so.

One Last Aside: VLANs and Security

Finally, it's worth noting that VLANs aren't a great security device. Many Ethernet switches have bugs that permit frames originating from one VLAN to be sent out ports assigned to another VLAN. Ethernet switch manufacturers have worked hard to fix these bugs, but it's doubtful that there will ever be a completely bug free implementation.

In the case of our contrived example the floor 2 employee who is moments away from providing free systems administration "services" to another tenant might be stopped from doing so by isolating his traffic into a VLAN. He might also figure out how to exploit bugs in the switch firmware, though, to allow his traffic to "leak" out onto another tenant's VLAN as well.

Metro Ethernet providers are relying, increasingly, on VLAN tagging functionality and the isolation that switches provide. It's not fair to say that there's no security offered by using VLANs. It is fair to say, though, that in situations with untrusted Internet connections or DMZ networks it's probably better to use physically separate switches to carry this "touchy" traffic rather than VLANs on switches that also carry your trusted "behind the firewall" traffic.

Bringing Layer 3 into the Picture

So far everything this answer has talked about relates to layer 2-- Ethernet frames. What happens if we start bringing layer 3 into this?

Let's go back to the contrived building example. You've embraced VLANs opted to configure each tenant's ports as members of separate VLANs. You've configured trunk ports such that each floor's switch can exchange frames tagged with the originating VLAN number to the switches on the floor above and below. One tenant can have computers spread across multiple floors but, because of your adept VLAN configuring skills, these physically distributed computers can all appear to be part of the same physical LAN.

You're so full of your IT accomplishments that you decide to start offering Internet connectivity to your tenants. You buy a fat Internet pipe and a router. You float the idea to all your tenants and two of them immediately buy-in. Luckily for you your router has three Ethernet ports. You connect one port to your fat Internet pipe, another port to a switch port assigned for access to the first tenant's VLAN, and the other to a port assigned for access to the second tenant's VLAN. You configure your router's ports with IP addresses in each tenant's network and the tenants start accessing the Internet through your service! Revenue increases and you're happy.

Soon, though, another tenant decides to get onto your Internet offering. You're out of ports on your router, though. What to do?

Fortunately you bought a router that supports configuring "virtual sub-interfaces" on its Ethernet ports. In short this functionality allows the router to receive and interpret frames tagged with originating VLAN numbers, and to have virtual (that is, non-physical) interfaces configured with IP addresses appropriate for each VLAN it will communicate with. In effect this permits you to "multiplex" a single Ethernet port on the router such that it appears to function as multiple physical Ethernet ports.

You attach your router to a trunk port on one of your switches and configure virtual sub-interfaces corresponding to each tenant's IP addressing scheme. Each virtual sub-interface is configured with the VLAN number assigned to each Customer. When a frame leaves the trunk port on the switch, bound for the router, it will carry a tag with the originating VLAN number (since it's a trunk port). The router will interpret this tag and treat the packet as though it arrived on a dedicated physical interface corresponding to that VLAN. Likewise, when the router sends a frame to the switch in response to a request it will add a VLAN tag to the frame such that the switch knows to which VLAN the response frame should be delivered. In effect, you've configured the router to "appear" as a physical device in multiple VLANs while only using a single physical connection between the switch and the router.

Routers on Sticks and Layer 3 Switches

Using virtual sub-interfaces you've been able to sell Internet connectivity to all your tenants without having to buy a router that has 25+ Ethernet interfaces. You're fairly happy with your IT accomplishments so you respond positively when two of your tenants come to you with a new request.

These tenants have opted to "partner" on a project and they want to allow access from client computers in one tenant's office (one given VLAN) to a server computer in the other tenant's office (another VLAN). Since they're both Customers of your Internet service it's a fairly simple change of an ACL in your core Internet router (on which there is a virtual sub-interface configured for each of these tenant's VLANs) to allow traffic to flow between their VLANs as well as to the Internet from their VLANs. You make the change and send the tenants on their way.

The next day you receive complaints from both tenants that access between the client computers in one office to the server in the second office is very slow. The server and client computers both have gigabit Ethernet connections to your switches but the files only transfer at around 45Mbps which, coincidentally, is roughly half of the speed with which your core router connects to its switch. Clearly the traffic flowing from the source VLAN to the router and back out from the router to the destination VLAN is being bottlenecked by the router's connection to the switch.

What you've done with your core router, allowing it to route traffic between VLANs, is commonly known as "router on a stick" (an arguably stupidly whimsical euphemism). This strategy can work well, but traffic can only flow between the VLANs up to the capacity of the router's connection to the switch. If, somehow, the router could be conjoined with the "guts" of the Ethernet switch itself it could route traffic even faster (since the Ethernet switch itself, per the manufacturer's spec sheet, is capable of switching over 2Gbps of traffic).

A "layer 3 switch" is an Ethernet switch that, logically speaking, contains a router buried inside itself. I find it tremendously helpful to think of a layer 3 switch as having a tiny and fast router hiding inside the switch. Further, I would advise you to think about the routing functionality as a distinctly separate function from the Ethernet switching function that the layer 3 switch provides. A layer 3 switch is, for all intents and purposes, two distinct devices wrapped up in a single chassis.

The embedded router in a layer 3 switch is connected to the switch's internal switching fabric at a speed that, typically, allows for routing of packets between VLANs at or near wire-speed. Analogously to the virtual sub-interfaces you configured on your "router on a stick" this embedded router inside the layer 3 switch can be configured with virtual interfaces that "appear" to be "access" connections into each VLAN. Rather than being called virtual sub-interfaces these logical connections from the VLANs into the embedded router inside a layer 3 switch are called Switch Virtual Interfaces (SVIs). In effect, the embedded router inside a layer 3 switch has some quantity of "virtual ports" that can be "plugged in" to any of the VLANs on the switch.

The embedded router performs the same way as a physical router except that it typically doesn't have all of the same dynamic routing protocol or access-control list (ACL) features as a physical router (unless you've bought a really nice layer 3 switch). The embedded router has the advantage, however, of being very fast and not having a bottleneck associated with a physical switch port that it's plugged into.

In the case of our example here with the "partnering" tenants you might opt to obtain a layer 3 switch, plug it into trunk ports such that traffic from both Customers VLANs reaches it, then configure SVIs with IP addresses and VLAN memberships such that it "appears" in both Customers VLANs. Once you've done that it's just a matter of tweaking the routing table on your core router and the embedded router in the layer 3 switch such that traffic flowing between the tenants' VLANs is routed by the embedded router inside the layer 3 switch versus the "router on a stick".

Using a layer 3 switch doesn't mean that there still won't be bottlenecks associated with the bandwidth of the trunk ports that interconnect your switches. This is an orthogonal concern to those that VLANs address, though. VLANs have nothing to do with bandwidth problems. Typically bandwidth problems are solved by either obtaining higher-speed inter-switch connections or using link-aggregation protocols to "bond" several lower-speed connections together into a virtual higher-speed connection. Unless all the devices creating frames to be routed by the embedded router inside the later 3 switch are, themselves, plugged into ports directly on the layer 3 switch you still need to worry about the bandwidth of the trunks between the switches. A layer 3 switch isn't a panacea, but it's typically faster than a "router on a stick".

Dynamic VLANs

Lastly, there is a function in some switches to provide dynamic VLAN membership. Rather than assigning a given port to be an access port for a given VLAN the port's configuration (access or trunk, and for which VLANs) can be altered dynamically when a device is connected. Dynamic VLANs are a more advanced topic but knowing that the functionality exists can be helpful.

The functionality varies between vendors but typically you can configure dynamic VLAN membership based on the MAC address of the connected device, 802.1X authentication status of the device, proprietary and standards-based protocols (CDP and LLDP, for example, to allow IP phones to "discover" the VLAN number for voice traffic), IP subnet assigned to the client device, or Ethernet protocol type.

Counting to 1

If you are already fluent in binary (base 2) notation you can skip this section.

For those of you who are left: Shame on you for not being fluent in binary notation!

Yes, that may be a bit harsh. It's really, really easy to learn to count in binary, and to learn shortcuts to convert binary to decimal and back. You really should know how to do it.

Counting in binary is so simple because you only have to know how to count to 1!

Think of a car's "odometer", except that unlike a traditional odometer each digit can only count up to 1 from 0. When the car is fresh from the factory the odometer reads "00000000".

When you've driven your first mile the odometer reads "00000001". So far, so good.

When you've driven your second mile the first digit of the odometer rolls back over to "0" (since its maximum value is "1") and the second digit of the odometer rolls over to "1", making the odometer read "00000010". This looks like the number 10 in decimal notation, but it's actually 2 (the number of miles you've driven the car so far) in binary notation.

When you've driven the third mile the odometer reads "00000011", since the first digit of the odometer turns again. The number "11", in binary notation, is the same as the decimal number 3.

Finally, when you've driven your fourth mile both digits (which were reading "1" at the end of the third mile) roll back over to zero position, and the 3rd digit rolls up to the "1" position, giving us "00000100". That's the binary representation of the decimal number 4.

You can memorize all of that if you want, but you really only need to understand how the little odometer "rolls over" as the number it's counting gets bigger. It's exactly the same as a traditional decimal odometer's operation, except that each digit can only be "0" or "1" on our fictional "binary odometer".

To convert a decimal number to binary you could roll the odometer forward, tick by tick, counting aloud until you've rolled it a number of times equal to the decimal number you want to convert to binary. Whatever is displayed on the odometer after all that counting and rolling would be the binary representation of the decimal number you counted up to.

Since you understand how the odometer rolls forward you'll also understand how it rolls backward, too. To convert a binary number displayed on the odometer back to decimal you could roll the odometer back one tick at a time, counting aloud until the odometer reads "00000000". When all that counting and rolling is done, the last number you say aloud would be the decimal representation of the binary number the odometer started with.

Converting values between binary and decimal this way would be very tedious. You could do it, but it wouldn't be very efficient. It's easier to learn a little algorithm to do it faster.

A quick aside: Each digit in a binary number is known as a "bit". That's "b" from "binary" and "it" from "digit". A bit is a bnary digit.

Converting a binary number like, say, "1101011" to decimal is a simple process with a handy little algorithm.

Start by counting the number of bits in the binary number. In this case, there are 7. Make 7 divisions on a sheet of paper (in your mind, in a text file, etc) and begin filling them in from right to left. In the rightmost slot, enter the number "1", because we'll always start with "1". In the next slot to the left enter double the value in the slot to the right (so, "2" in the next one, "4" in the next one) and continue until all the slots are full. (You'll end up memorizing these numbers, which are the powers of 2, as you do this more and more. I'm alright up to 131,072 in my head but I usually need a calculator or paper after that).

So, you should have the following on your paper in your little slots.

 64    |    32    |    16    |    8    |    4    |    2    |    1    |

Transcribe the bits from the binary number below the slots, like so:

 64    |    32    |    16    |    8    |    4    |    2    |    1    |
  1          1          0         1         0         1         1

Now, add some symbols and compute the answer to the problem:

 64    |    32    |    16    |    8    |    4    |    2    |    1    |
x 1        x 1        x 0       x 1       x 0       x 1       x 1
---        ---        ---       ---       ---       ---       ---
       +          +          +         +         +         +         =

Doing all the math, you should come up with:

 64    |    32    |    16    |    8    |    4    |    2    |    1    |
x 1        x 1        x 0       x 1       x 0       x 1       x 1
---        ---        ---       ---       ---       ---       ---
 64    +    32    +     0    +    8    +    0    +    2    +    1    =   107

That's got it. "1101011" in decimal is 107. It's just simple steps and easy math.

Converting decimal to binary is just as easy and is the same basic algorithm, run in reverse.

Say that we want to convert the number 218 to binary. Starting on the right of a sheet of paper, write the number "1". To the left, double that value (so, "2") and continue moving toward the left of the paper doubling the last value. If the number you are about to write is greater than the number being converted stop writing. otherwise, continue doubling the prior number and writing. (Converting a big number, like 34,157,216,092, to binary using this algorithm can be a bit tedious but it's certainly possible.)

So, you should have on your paper:

 128    |    64    |    32    |    16    |    8    |    4    |    2    |    1    |

You stopped writing numbers at 128 because doubling 128, which would give you 256, would be large than the number being converted (218).

Beginning from the leftmost number, write "218" above it (128) and ask yourself: "Is 218 larger than or equal to 128?" If the answer is yes, scratch a "1" below "128". Above "64", write the result of 218 minus 128 (90).

Looking at "64", ask yourself: "Is 90 larger than or equal to 64?" It is, so you'd write a "1" below "64", then subtract 64 from 90 and write that above "32" (26).

When you get to "32", though, you find that 32 is not greater than or equal to 26. In this case, write a "0" below "32", copy the number (26) from above 32" to above "16" and then continue asking yourself the same question with the rest of the numbers.

When you're all done, you should have:

 218         90         26         26        10         2         2         0
 128    |    64    |    32    |    16    |    8    |    4    |    2    |    1    |
   1          1          0          1         1         0         1         0

The numbers at the top are just notes used in computation and don't mean much to us. At the bottom, though, you see a binary number "11011010". Sure enough, 218, converted to binary, is "11011010".

Following these very simple procedures you can convert binary to decimal and back again w/o a calculator. The math is all very simple and the rules can be memorized with just a bit of practice.

Splitting Up Addresses

Think of IP routing like pizza delivery.

When you're asked to deliver a pizza to "123 Main Street" it's very clear to you, as a human, that you want to go to the building numbered "123" on the street named "Main Street". It's easy to know that you need to go to the 100-block of Main Street because the building number is between 100 and 199 and most city blocks are numbered in hundreds. You "just know" how to split the address up.

Routers deliver packets, not pizza. Their job is the same as a pizza driver: To get the cargo (packets) as close to the destination as possible. A router is connected to two or more IP subnets (to be at all useful). A router must examine destination IP addresses of packets and break those destination addresses up into their "street name" and "building number" components, just like the pizza driver, to make decisions about delivery.

Each computer (or "host") on an IP network is configured with a unique IP address and subnet mask. That IP address can be divided up into a "building number" component (like "123" in the example above) called the "host ID" and a "street name" component (like "Main Street" in the example above) called the "network ID". For our human eyes, it's easy to see where the building number and the street name are in "123 Main Street", but harder to see that division in "10.13.216.41 with a subnet mask of 255.255.192.0".

IP routers "just know" how to split up IP addresses into these component parts to make routing decisions. Since understanding how IP packets are routed hinges on understanding this process, we need to know how to break up IP addresses, too. Fortunately, extracting the host ID and the network ID out of an IP address and subnet mask is actually pretty easy.

Start by writing out the IP address in binary (use a calculator if you haven't learned to do this in your head just yet, but make a note learn how to do it-- it's really, really easy and impresses the opposite sex at parties):

      10.      13.     216.      41
00001010.00001101.11011000.00101001

Write out the subnet mask in binary, too:

     255.     255.     192.       0
11111111.11111111.11000000.00000000

Written side-by-side, you can see that the point in the subnet mask where the "1s" stop "lines up" to a point in the IP address. That's the point that the network ID and the host ID split. So, in this case:

      10.      13.     216.      41
00001010.00001101.11011000.00101001 - IP address
11111111.11111111.11000000.00000000 - subnet mask
00001010.00001101.11000000.00000000 - Portion of IP address covered by 1s in subnet mask, remaining bits set to 0
00000000.00000000.00011000.00101001 - Portion of IP address covered by 0s in subnet mask, remaining bits set to 0

Routers use the subnet mask to "mask out" the bits covered by 1s in the IP address (replacing the bits that are not "masked out" with 0s) to extract the network ID:

      10.      13.     192.       0
00001010.00001101.11000000.00000000 - Network ID

Likewise, by using the subnet mask to "mask out" the bits covered by 0s in the IP address (replacing the bits that are not "masked out" with 0s again) a router can extract the host ID:

       0.       0.      24.      41
00000000.00000000.00011000.00101001 - Portion of IP address covered by 0s in subnet mask, remaining bits set to 0

It's not as easy for our human eyes to see the "break" between the network ID and the host ID as it is between the "building number" and the "street name" in physical addresses during pizza delivery, but the ultimate effect is the same.

Now that you can split up IP addresses and subnet masks into host ID's and network ID's you can route IP just like a router does.

More Terminology

You're going to see subnet masks written all over the Internet and throughout the rest of this answer as (IP/number). This notation is known as "Classless Inter-Domain Routing" (CIDR) notation. "255.255.255.0" is made up of 24 bits of 1s at the beginning, and it's faster to write that as "/24" than as "255.255.255.0". To convert a CIDR number (like "/16") to a dotted-decimal subnet mask just write out that number of 1s, split it into groups of 8 bits, and convert it to decimal. (A "/16" is "255.255.0.0", for instance.)

Back in the "old days", subnet masks weren't specified, but rather were derived by looking at certain bits of the IP address. An IP address starting with 0 - 127, for example, had an implied subnet mask of 255.0.0.0 (called a "class A" IP address).

These implied subnet masks aren't used today and I don't recommend learning about them anymore unless you have the misfortune of dealing with very old equipment or old protocols (like RIPv1) that don't support classless IP addressing. I'm not going to mention these "classes" of addresses further because it's inapplicable today and can be confusing.

Some devices use a notation called "wildcard masks". A "wildcard mask" is nothing more than a subnet mask with all 0s where there would be 1s, and 1s where there would be 0s. The "wildcard mask" of a /26 is:

 11111111.11111111.11111111.11000000 - /26 subnet mask
 00000000.00000000.00000000.00111111 - /26 "wildcard mask"

Typically you see "wildcard masks" used to match host IDs in access-control lists or firewall rules. We won't discuss them any further here.

How a Router Works

As I've said before, IP routers have a similar job to a pizza delivery driver in that they need to get their cargo (packets) to its destination. When presented with a packet bound for address 192.168.10.2, an IP router needs to determine which of its network interfaces will best get that packet closer to its destination.

Let's say that you are an IP router, and you have interfaces connected to you numbered:

Ethernet0 - 192.168.20.1, subnet mask /24
Ethernet1 - 192.168.10.1, subnet mask /24

If you receive a packet to deliver with a destination address of "192.168.10.2", it's pretty easy to tell (with your human eyes) that the packet should be sent out the interface Ethernet1, because the Ethernet1 interface address corresponds to the packet's destination address. All the computers attached to the Ethernet1 interface will have IP addresses starting with "192.168.10.", because the network ID of the IP address assigned to your interface Ethernet1 is "192.168.10.0".

For a router, this route selection process is done by building a routing table and consulting the table each time a packet is to be delivered. A routing table contains network ID and destination interface names. You already know how to obtain a network ID from an IP address and subnet mask, so you're on your way to building a routing table. Here's our routing table for this router:

Network ID: 192.168.20.0 (11000000.10101000.00010100.00000000) - 24 bit subnet mask - Interface Ethernet0
Network ID: 192.168.10.0 (11000000.10101000.00001010.00000000) - 24 bit subnet mask - Interface Ethernet1

For our incoming packet bound for "192.168.10.2", we need only convert that packet's address to binary (as humans -- the router gets it as binary off the wire to begin with) and attempt to match it to each address in our routing table (up to the number of bits in the subnet mask) until we match an entry.

Incoming packet destination: 11000000.10101000.00001010.00000010

Comparing that to the entries in our routing table:

11000000.10101000.00001010.00000010 - Destination address for packet
11000000.10101000.00010100.00000000 - Interface Ethernet0
!!!!!!!!.!!!!!!!!.!!!????!.xxxxxxxx - ! indicates matched digits, ? indicates no match, x indicates not checked (beyond subnet mask)

11000000.10101000.00001010.00000010 - Destination address for packet
11000000.10101000.00001010.00000000 - Interface Ethernet1, 24 bit subnet mask
!!!!!!!!.!!!!!!!!.!!!!!!!!.xxxxxxxx - ! indicates matched digits, ? indicates no match, x indicates not checked (beyond subnet mask)

The entry for Ethernet0 matches the first 19 bits fine, but then stops matching. That means it's not the proper destination interface. You can see that the interface Ethernet1 matches 24 bits of the destination address. Ah, ha! The packet is bound for interface Ethernet1.

In a real-life router, the routing table is sorted in such a manner that the longest subnet masks are checked for matches first (i.e. the most specific routes), and numerically so that as soon as a match is found the packet can be routed and no further matching attempts are necessary (meaning that 192.168.10.0 would be listed first and 192.168.20.0 would never have been checked). Here, we're simplifying that a bit. Fancy data structures and algorithms make faster IP routers, but simple algorithms will produce the same results.

Static Routes

Up to this point, we've talked about our hypothetical router as having networks directly connected to it. That's not, obviously, how the world really works. In the pizza-driving analogy, sometimes the driver isn't allowed any further into the building than the front desk, and has to hand-off the pizza to somebody else for delivery to the final recipient (suspend your disbelief and bear with me while I stretch my analogy, please).

Let's start by calling our router from the earlier examples "Router A". You already know RouterA's routing table as:

Network ID: 192.168.20.0 (11000000.10101000.00010100.00000000) - subnet mask /24 - Interface RouterA-Ethernet0
Network ID: 192.168.10.0 (11000000.10101000.00001010.00000000) - subnet mask /24 - Interface RouterA-Ethernet1

Suppose that there's another router, "Router B", with the IP addresses 192.168.10.254/24 and 192.168.30.1/24 assigned to its Ethernet0 and Ethernet1 interfaces. It has the following routing table:

Network ID: 192.168.10.0 (11000000.10101000.00001010.00000000) - subnet mask /24 - Interface RouterB-Ethernet0
Network ID: 192.168.30.0 (11000000.10101000.00011110.00000000) - subnet mask /24 - Interface RouterB-Ethernet1

In pretty ASCII art, the network looks like this:

               Interface                      Interface
               Ethernet1                      Ethernet1
               192.168.10.1/24                192.168.30.254/24
     __________  V                  __________  V
    |          | V                 |          | V
----| ROUTER A |------- /// -------| ROUTER B |----
  ^ |__________|                 ^ |__________|
  ^                              ^
Interface                      Interface
Ethernet0                      Ethernet0
192.168.20.1/24                192.168.10.254/24

You can see that Router B knows how to "get to" a network, 192.168.30.0/24, that Router A knows nothing about.

Suppose that a PC with the IP address 192.168.20.13 attached to the network connected to router A's Ethernet0 interface sends a packet to Router A for delivery. Our hypothetical packet is destined for the IP address 192.168.30.46, which is a device attached to the network connected to the Ethernet1 interface of Router B.

With the routing table shown above, neither entry in Router A's routing table matches the destination 192.168.30.46, so Router A will return the packet to the sending PC with the message "Destination network unreachable".

To make Router A "aware" of the existence of the 192.168.30.0/24 network, we add the following entry to the routing table on Router A:

Network ID: 192.168.30.0 (11000000.10101000.00011110.00000000) - subnet mask /24 - Accessible via 192.168.10.254

In this way, Router A has a routing table entry that matches the 192.168.30.46 destination of our example packet. This routing table entry effectively says "If you get a packet bound for 192.168.30.0/24, send it on to 192.168.10.254 because he knows how to deal with it." This is the analogous "hand-off the pizza at the front desk" action that I mentioned earlier-- passing the packet on to somebody else who knows how to get it closer to its destination.

Adding an entry to a routing table "by hand" is known as adding a "static route".

If Router B wants to deliver packets to the 192.168.20.0 subnet mask 255.255.255.0 network, it will need an entry in its routing table, too:

Network ID: 192.168.20.0 (11000000.10101000.00010100.00000000) - subnet mask /24 - Accessible via: 192.168.10.1 (Router A's IP address in the 192.168.10.0 network)

This would create a path for delivery between the 192.168.30.0/24 network and the 192.168.20.0/24 network across the 192.168.10.0/24 network between these routers.

You always want to be sure that routers on both sides of such an "interstitial network" have a routing table entry for the "far end" network. If router B in our example didn't have a routing table entry for "far end" network 192.168.20.0/24 attached to router A our hypothetical packet from the PC at 192.168.20.13 would get to the destination device at 192.168.30.46, but any reply that 192.168.30.46 tried to send back would be returned by router B as "Destination network unreachable." One-way communication is generally not desirable. Always be sure you think about traffic flowing in both directions when you think about communication in computer networks.

You can get a lot of mileage out of static routes. Dynamic routing protocols like EIGRP, RIP, etc, are really nothing more than a way for routers to exchange routing information between each other that could, in fact, be configured with static routes. One large advantage to using dynamic routing protocols over static routes, though, is that dynamic routing protocols can dynamically change the routing table based on network conditions (bandwidth utilization, an interface "going down", etc) and, as such, using a dynamic routing protocol can result in a configuration that "routes around" failures or bottlenecks in the network infrastructure. (Dynamic routing protocols are WAY outside the scope of this answer, though.)

You Can't Get There From Here

In the case of our example Router A, what happens when a packet bound for "172.16.31.92" comes in?

Looking at the Router A routing table, neither destination interface or static route matches the first 24 bits of 172.18.31.92 (which is 10101100.00010010.00011111.01011100, by the way).

As we already know, Router A would return the packet to the sender via a "Destination network unreachable" message.

Say that there's another router (Router C) sitting at the address "192.168.20.254". Router C has a connection to the Internet!

                              Interface                      Interface                      Interface
                              Ethernet1                      Ethernet1                      Ethernet1
                              192.168.20.254/24              192.168.10.1/24                192.168.30.254/24
                    __________  V                  __________  V                  __________  V
((  heap o  ))     |          | V                 |          | V                 |          | V
(( internet )) ----| ROUTER C |------- /// -------| ROUTER A |------- /// -------| ROUTER B |----
((   w00t!  ))   ^ |__________|                 ^ |__________|                 ^ |__________|
                 ^                              ^                              ^
               Interface                      Interface                      Interface
               Ethernet0                      Ethernet0                      Ethernet0
               10.35.1.1/30                   192.168.20.1/24                192.168.10.254/24

It would be nice if Router A could route packets that do not match any local interface up to Router C such that Router C can send them on to the Internet. Enter the "default gateway" route.

Add an entry at the end of our routing table like this:

Network ID: 0.0.0.0 (00000000.00000000.00000000.00000000) - subnet mask /0 - Destination router: 192.168.20.254

When we attempt to match "172.16.31.92" to each entry in the routing table we end up hitting this new entry. It's a bit perplexing, at first. We're looking to match zero bits of the destination address with... wait... what? Matching zero bits? So, we're not looking for a match at all. This routing table entry is saying, basically, "If you get here, rather than giving up on delivery, send the packet on to the router at 192.168.20.254 and let him handle it".

192.168.20.254 is a destination we DO know how to deliver a packet to. When confronted with a packet bound for a destination for which we have no specific routing table entry this "default gateway" entry will always match (since it matches zero bits of the destination address) and gives us a "last resort" place that we can send packets for delivery. You'll sometimes hear the default gateway called the "gateway of last resort."

In order for a default gateway route to be effective it must refer to a router that is reachable using the other entries in the routing table. If you tried to specify a default gateway of 192.168.50.254 in Router A, for example, delivery to such a default gateway would fail. 192.168.50.254 isn't an address that Router A knows how to deliver packets to using any of the other routes in its routing table, so such an address would be ineffective as a default gateway. This can be stated concisely: The default gateway must be set to an address already reachable by using another route in the routing table.

Real routers typically store the default gateway as the last route in their routing table such that it matches packets after they've failed to match all other entries in the table.

Urban Planning and IP Routing

Breaking up a IP subnet into smaller IP subnets is like urban planning. In urban planning, zoning is used to adapt to natural features of the landscape (rivers, lakes, etc), to influence traffic flows between different parts of the city, and to segregate different types of land-use (industrial, residential, etc). IP subnetting is really much the same.

There are three main reasons why you would subnet a network:

You may want to communicate across different unlike communication media. If you have a T1 WAN connection between two buildings IP routers could be placed on the ends of these connections to facilitate communication across the T1. The networks on each end (and possibly the "interstitial" network on the T1 itself) would be assigned to unique IP subnets so that the routers can make decisions about which traffic should be sent across the T1 line.
In an Ethernet network, you might use subnetting to limit the amount of broadcast traffic in a given portion of the network. Application-layer protocols use the broadcast capability of Ethernet for very useful purposes. As you get more and more hosts packed into the same Ethernet network, though, the percentage of broadcast traffic on the wire (or air, in wireless Ethernet) can increase to such a point as to create problems for delivery of non-broadcast traffic. (In the olden days, broadcast traffic could overwhelm the CPU of hosts by forcing them to examine each broadcast packet. That's less likely today.) Excessive traffic on switched Ethernet can also come in form of "flooding of frames to unknown destinations". This condition is caused by an Ethernet switch being unable to keep track of every destination on the network and is the reason why switched Ethernet networks can't scale to an infinite number of hosts. The effect of flooding of frames to unknown destinations is similar to the the effect of excess broadcast traffic, for the purposes of subnetting.
You may want to "police" the types of traffic flowing between different groups of hosts. Perhaps you have print server devices and you only want authorized print queuing server computers to send jobs to them. By limiting the traffic allowed to flow to the print server device subnet users can't configure their PCs to talk directly to the print server devices to bypass print accounting. You might put the print server devices into a subnet all to themselves and create a rule in the router or firewall attached to that subnet to control the list of hosts permitted to send traffic to the print server devices. (Both routers and firewalls can typically make decisions about how or whether to deliver a packet based on the source and destination addresses of the packet. Firewalls are typically a sub-species of router with an obsessive personality. They can be very, very concerned about the payload of packets, whereas routers typically disregard payloads and just deliver the packets.)

In planning a city, you can plan how streets intersect with each other, and can use turn-only, one-way, and dead-end streets to influence traffic flows. You might want Main Street to be 30 blocks long, with each block having up to 99 buildings each. It's pretty easy to plan your street numbering such that each block in Main Street has a range of street numbers increasing by 100 for each block. It's very easy to know what the "starting number" in each subsequent block should be.

In planning IP subnets, you're concerned with building the right number of subnets (streets) with the right number of available host ID's (building numbers), and using routers to connect the subnets to each other (intersections). Rules about allowed source and destination addresses specified in the routers can further control the flow of traffic. Firewalls can act like obsessive traffic cops.

For the purposes of this answer, building our subnets is our only major concern. Instead of working in decimal, as you would with urban planning, you work in binary to describe the bounds of each subnet.

Continued on: How does IPv4 Subnetting Work?

(Yes ... we reached the maximum size of an answer (30000 characters).)

Network Hardware – Should Speeds Be Set to Autonegotiate or Fixed?

I have yet to see a problem with auto-negotiation of network speeds that isn't caused by either (a) a mismatch of manual on one end of the link and auto on the other or (b) a failing component of the link (cable, port, etc).
This depends on the admin, but my experience has shown me that if you manually specify the link speeds and duplex settings, than you are bound to run into speed mismatches. Why? Because it is nearly impossible to document the various connections between switches and servers and then follow that documentation when making changes. Most failures I have seen are because of 1(a) and you only get in to that situation when you start manually setting speed/duplex settings.

As mention in the Cisco documentation:

If you disable autonegotiation, it hides link drops and other physical layer problems. Only disable autonegotiation to end-devices, such as older Gigabit NICs that do not support Gigabit autonegotiation. Do not disable autonegotiation between switches unless absolutely required, as physical layer problems can go undetected and result in spanning tree loops.

Unless you are prepared to setup a change management system for network changes that requires the verification of speed/duplex (and don't forget flow control) or are willing to deal with occasional mismatches that come from manually specifying these settings on all network devices, then stick with the default configuration of auto/auto.

In the future, consider monitoring the errors on the switch ports with MRTG so you can spot these issues before you have a problem.

Edit: I do see a lot of people referencing negotiation failures on old equipment. Yes this was an issue a long time ago when the standards were being created and not all devices followed them. Are your NICs and switches less than 10 years old? If so, then this won't be an issue.