Not sure this is a question that can get a really good answer, but it's important to know OSPF's route preference order:
intra area routes
inter area routes
external type 1
NSSA type 1
external type 2
NSSA type 2
So in some cases adjusting the metric has no effect on traffic engineering, because a less preferred route type never wins over a more preferred one no matter how low its metric is. It's also important to change the auto-cost reference-bandwidth on Cisco devices so that there is a difference in cost between higher speed interfaces. By default it is 100 Mbit/s, which means that 100 Mbit, Gigabit and 10 Gigabit interfaces all end up with the same cost of 1.
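As a rough sketch of what that looks like on Cisco IOS (exact syntax varies by platform and version, and the process ID here is arbitrary):

```
router ospf 1
 ! Raise the reference bandwidth so faster links get distinct costs.
 ! 100000 Mbit/s = 100 Gbit/s reference.
 auto-cost reference-bandwidth 100000
```

With this, a 1 Gbit/s interface gets cost 100 and a 10 Gbit/s interface gets cost 10. Note that the reference bandwidth should be set consistently on every router in the domain, or costs become inconsistent.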
For faster convergence it's good to have equal cost routes (ECMP): if you are peering iBGP between loopbacks you have two paths, so if one goes down the impact should not be that big.
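A minimal sketch of that pattern on Cisco IOS (addresses, AS number and interface names are assumptions, not from the original):

```
interface Loopback0
 ip address 10.0.0.1 255.255.255.255
!
router ospf 1
 network 10.0.0.1 0.0.0.0 area 0
 ! Allow multiple equal-cost paths to the remote loopback
 maximum-paths 4
!
router bgp 65000
 neighbor 10.0.0.2 remote-as 65000
 neighbor 10.0.0.2 update-source Loopback0
```

As long as OSPF still has at least one path to the peer's loopback, the iBGP session survives a single link failure and only the IGP has to reconverge.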
I used to work for a large ISP and they set all the costs manually on the transit interfaces. The basic design was to choose the path that had the fewest core hops. The network was designed with access, distribution and core levels with dual links between each level. If there are multiple paths with same number of core hops then choose the path with the lowest delay.
So based on this they designed the paths and set costs on the interfaces. This might require a lot of planning. The drawback of link state protocols is that by default they only use the bandwidth of the interface to calculate the metric. So you could have a case where the metric is the same but the distance is significantly longer via one of the paths.
We are talking quite large networks before this would have an impact though.
So I would design it based on bandwidth, number of devices to go through, physical distance (delay) and of course money could be a factor if one path is more expensive than the other.
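To make that concrete, here is a small Python sketch (illustrative only; the link speeds and the reference bandwidth are assumptions) showing why default bandwidth-based costs cannot tell a short path from a much longer one at the same speeds:

```python
REF_BW_MBPS = 100_000  # assumed reference bandwidth: 100 Gbit/s


def link_cost(bw_mbps: int) -> int:
    # Default OSPF cost = reference bandwidth / interface bandwidth (min 1)
    return max(1, REF_BW_MBPS // bw_mbps)


def path_cost(links_mbps) -> int:
    # Path cost is the sum of the link costs along the path
    return sum(link_cost(bw) for bw in links_mbps)


# Two hypothetical paths between the same pair of routers:
short_path = [10_000, 10_000]  # two 10G hops, short fiber, low delay
long_path = [10_000, 10_000]   # two 10G hops, but much longer fiber

# Bandwidth-only metrics see no difference, even though delay differs:
print(path_cost(short_path), path_cost(long_path))  # 20 20
```

This is why the ISP design above set costs manually, folding in core-hop count and delay rather than relying on the default formula.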
You have an elegant physical setup, but logically it's deficient in a lot of ways. You're relying on layer 2 for all of your connections, which doesn't provide the real failover a setup like yours should. Luckily, you aren't wasting any bandwidth with STP because you don't have any redundant links; VLAN 100 and VLAN 200 only have a single way out, if I'm reading this correctly.
I can't implement multiple area OSPF because the CPE public subnets
are too difficult to summarize and the public range is split over
several POPs without hierarchy, so this would be inefficient.
I don't even think you need to go as far as setting up multiple areas; the segment you're concerned with is fairly small. Cisco recommends you stay under 50 routers per area, though most will agree that you can exceed that number by quite a significant margin and still be considered healthy. I have about 80 routers per area with no issues; Ron has even more than that without any hiccups.
- Implementation of EtherChannel between the access switches and CMTS.
This all hinges on what happens higher up in your network. If you're significantly oversubscribing your distribution-to-core links, then you may not see any real benefit from increasing bandwidth at the access layer. The same holds true for your access-to-distribution links.
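If you do go the EtherChannel route, a minimal Cisco IOS sketch looks like this (interface names, channel number and VLAN list are assumptions for illustration):

```
interface range GigabitEthernet0/1 - 2
 ! Bundle both uplinks into one logical link using LACP
 channel-group 1 mode active
!
interface Port-channel1
 switchport mode trunk
 switchport trunk allowed vlan 100,200
```

Both physical ports then carry traffic as one logical link, so you get added bandwidth and link-level redundancy between the access switch and the CMTS, but only if the upstream layers aren't already the bottleneck.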
Again, as most have mentioned in the comments, implementing OSPF would be pretty easy. If you were running a layer 3 protocol, you could share that load across equal/unequal cost links, too. Perhaps a redundant link between your distribution switches would give each access switch a redundant path out. As it stands, if one of your access links goes down, you lose an entire subnet (VLAN 100 or VLAN 200).
I would strongly recommend moving to an OSPF/iBGP design for something of this scale, with the core switches acting as BGP route reflectors. BGP has many more administrative handles for tinkering with routes than OSPF does, allowing better scale and filtering.
If you scale to the point that you have more networks than your ToR switches can program into TCAM (unlikely if each is a different stub), you run into issues. Each additional area also adds CPU load on your ABRs (core switches).
Have one OSPF area 0 with all loopbacks and router-to-router links in it. Then set up iBGP sessions between your cores and ToRs, advertise default-only to the ToRs, and redistribute connected/static routes on the ToRs into BGP.
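A rough Cisco IOS sketch of that design (AS number, neighbor addresses and names are assumptions; adapt to your platform):

```
! On a core switch acting as route reflector:
router bgp 65000
 neighbor 10.0.0.11 remote-as 65000
 neighbor 10.0.0.11 update-source Loopback0
 neighbor 10.0.0.11 route-reflector-client
 ! Hand the ToR a default route and filter everything else outbound
 neighbor 10.0.0.11 default-originate
 neighbor 10.0.0.11 route-map DEFAULT-ONLY out
!
ip prefix-list DEFAULT seq 5 permit 0.0.0.0/0
route-map DEFAULT-ONLY permit 10
 match ip address prefix-list DEFAULT
!
! On each ToR switch:
router bgp 65000
 neighbor 10.0.0.1 remote-as 65000
 neighbor 10.0.0.1 update-source Loopback0
 redistribute connected
 redistribute static
```

The ToRs carry only a default plus their own locally sourced routes, while the cores hold the full internal table and reflect it between ToR clients, which keeps ToR TCAM usage flat as the network grows.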