Fast Convergence Techniques

In reading the SRND on the routed access layer, there are a few key takeaways.
On page 5 it says: “Cisco recommends a routed network core in all cases.”

Let’s examine the reasoning behind this statement. First of all, what’s the alternative? You can have a layer-2 core where your devices simply switch frames at layer-2. Think about the problems with this design.

  1. Troubleshooting is hard
  2. Traffic is not as deterministic
  3. Engineering based on metrics is difficult

Why do I think troubleshooting is harder than in layer-3 networks? Think about the tools you have available to troubleshoot layer-2 issues. You’ve got CAM tables, spanning-tree outputs, and CDP/LLDP messages. That’s about it. Layer-2 ping exists, but what’s the next step if that fails? Layer-3 networks have a lot more management and troubleshooting tools available.

Traffic is not necessarily deterministic either. You rely on the spanning-tree algorithms to determine a loop-free path through the network. This means manipulation of root bridges and link costs. It’s difficult to ensure a particular path in the event of a failure. You need to ensure you know which ports are blocking/forwarding and so on. Layer-3 cores don’t have this problem – all links can be forwarding.

I would argue that traffic engineering is difficult as well. With MST you can select particular VLANs to remain forwarding on one set of links while the other VLANs forward on a different set of links, but that’s difficult to scale. Traffic engineering is easier in layer-3 networks, which have a standard set of tools to implement and troubleshoot it.

With that said, let’s examine some of the other items in the SRND with respect to the CCDE written outline.

  1. Layer 2 Down Detection – use point-to-point fiber connections where possible because failure detection is very quick. You should also examine the topics of debounce and carrier-delay. Along with this, you should implement IP event dampening on all interfaces to minimize disruption to your network when an interface fails repeatedly. The dampening feature is very similar to BGP’s route dampening feature.
  2. For all media types – SONET and point-to-point fiber both detect failures very quickly; other media types are not quite as quick. If you can, use BFD on those interfaces, as this will decrease failure detection time. In multipoint networks this may be the only way to get subsecond failure detection.
  3. Fast hello timers – OSPF and EIGRP provide the following:
    1. OSPF: ip ospf dead-interval minimal hello-multiplier # (typically 4)
    2. EIGRP:
      1. ip hello-interval eigrp <AS> 1
      2. ip hold-time eigrp <AS> 3
  4. OSPF, EIGRP, IS-IS, BGP – IS-IS will need further research, but I guess it has mechanisms similar to OSPF’s. BGP has several things that can be tweaked to decrease convergence time:
    1. BGP path MTU discovery
  5. Fast SPF Timers – OSPF has several in the SRND:
    1. SPF throttle tuning
      1. timers throttle spf
      2. Best practice: 10 100 500
    2. LSA throttle tuning
      1. timers throttle lsa
      2. Best practice: 10 100 5000
  6. OSPF, IS-IS – this may be a typo?
  7. Recursion and Convergence – the issue they’re talking about here is that OSPF’s convergence time increases as more routers are added to the network. You can increase the number of links/routes within the OSPF area without taking as big a hit as you would from increasing the number of routers. The SPF calculation recurses on each type 1 LSA created by every router in the network, which increases convergence time.
  8. Impact of Third Party Next Hop & BGP recursion – have a look at this diagram.
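The fast hello and SPF throttle items in the list above come down to simple arithmetic, which the following Python sketch works through. It assumes the commonly documented behavior that `dead-interval minimal` pins the dead interval at 1 second, and that the SPF hold time doubles after each consecutive run until it reaches the maximum wait. The function names are hypothetical, and the 10/100/500 values are simply the ones quoted in item 5.

```python
from typing import List


def fast_hello_interval_ms(hello_multiplier: int) -> float:
    """Spacing between hellos when the dead interval is pinned to 1 second."""
    if hello_multiplier < 1:
        raise ValueError("hello_multiplier must be positive")
    return 1000 / hello_multiplier


def spf_throttle_waits(start_ms: int, hold_ms: int, max_ms: int, runs: int) -> List[int]:
    """Delay before each of `runs` back-to-back SPF runs while the network keeps changing."""
    waits = []
    wait = start_ms       # the first change is handled after the initial delay
    next_hold = hold_ms   # later changes wait for the (growing) hold time
    for _ in range(runs):
        waits.append(wait)
        wait = min(next_hold, max_ms)  # never wait longer than the maximum
        next_hold = wait * 2           # assumed exponential backoff: double each time
    return waits


hello_ms = fast_hello_interval_ms(4)
spf_waits = spf_throttle_waits(10, 100, 500, runs=6)
print("hello every", hello_ms, "ms")
print("SPF waits (ms):", spf_waits)
```

With a multiplier of 4 the hellos go out every 250 ms, and with 10 100 500 the waits grow as 10, 100, 200, 400, 500, 500 ms. A stable network reacts almost immediately, while a flapping one is rate-limited to one SPF run per maximum wait.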


IP Routing: Aggregation Concepts and Techniques

  • Purpose of route aggregation
  • Scalability and fault isolation
  • How to Aggregate

What is the purpose of route aggregation? We aggregate routes to decrease the size of our routing tables, hide information, and decrease convergence times. The boundary device – ASBR / ABR / whatever – will use a route summarization technique to aggregate prefixes in the direction of advertisement. For example, if you own the network, and you’re advertising this network to a customer, you may wish to simply advertise the classful network rather than advertising more specific prefixes (like, etc.) even if you have your network subnetted. By only advertising the aggregate, the size of your customer’s routing tables is minimized, which decreases the amount of calculation needed during reconvergence and hides information about how your AS is subnetted. The advertisement of a default route towards the access layer is also an example of aggregation. How does one accomplish the task of aggregation with various routing protocols?
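To make the table-size argument concrete, here is a minimal Python sketch using the standard `ipaddress` module. The prefixes are hypothetical private-range examples chosen for illustration, not the networks from the paragraph above.

```python
import ipaddress

# Hypothetical more-specific prefixes: 256 contiguous /24 subnets.
specifics = [ipaddress.ip_network(f"172.16.{i}.0/24") for i in range(256)]

# collapse_addresses merges contiguous prefixes into the fewest covering networks.
aggregates = list(ipaddress.collapse_addresses(specifics))

print(len(specifics), "routes before aggregation")
print(len(aggregates), "route after aggregation:", aggregates[0])
```

The customer carries one route instead of 256, and a flap of any single /24 inside the block is invisible to them because the aggregate itself does not change.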

  • BGP: the use of the aggregate-address <NET> <MASK> [summary-only] command
    • at least one more-specific prefix within that NET needs to be present in the routing table and brought into BGP, either by redistribution or by a network statement
    • you can use a static route to Null0 to create this route
  • OSPF: stub and the area <Area #> range <NET> <MASK> command
    • best practice is to create a route to Null0 to avoid routing loops
    • you may also need to look at the no compatible rfc1583 command
  • EIGRP: stub and the ip summary-address eigrp <AS> <NET> <MASK> interface command
    • Remember that when you do this, a route to Null0 is automatically created
    • EIGRP can also summarize automatically at classful network boundaries, so you may want to disable this with the no auto-summary command
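The <NET> <MASK> arguments in the commands above, and the reason for the Null0 route, can be worked out with a short Python sketch. The helper names and the 10.1.x.0/24 prefixes are hypothetical. The sketch finds the tightest single summary for a set of prefixes, prints it in network-and-mask form, and lists the address space the summary covers but no component prefix actually reaches.

```python
import ipaddress
from typing import List


def tightest_summary(prefixes: List[str]) -> ipaddress.IPv4Network:
    """Smallest single prefix that contains every prefix in the list."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    summary = nets[0]
    # Widen the candidate one bit at a time until it contains all inputs.
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()
    return summary


def uncovered(prefixes: List[str], summary: ipaddress.IPv4Network) -> List[ipaddress.IPv4Network]:
    """Address space inside the summary that no component prefix covers."""
    remaining = [summary]
    for p in prefixes:
        net = ipaddress.ip_network(p)
        kept = []
        for block in remaining:
            if net == block or block.subnet_of(net):
                continue  # this block is fully covered by the component
            if net.subnet_of(block):
                kept.extend(block.address_exclude(net))  # carve the component out
            else:
                kept.append(block)
        remaining = kept
    return sorted(remaining)


components = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
summary = tightest_summary(components)
holes = uncovered(components, summary)
print("summary:", summary.network_address, summary.netmask)
print("covered by the summary but not reachable:", [str(h) for h in holes])
```

Here the three /24s summarize to 10.1.0.0 255.255.252.0, which also covers 10.1.3.0/24. Traffic for that unused block is attracted by the summary, and the route to Null0 makes the summarizing router drop it instead of following a default route back out and looping.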
