RIPv2 advantages & disadvantages

RIPv2 gets a bad rap, undeservedly in my opinion, because there are still network topologies it works well for. It’s a bit like the visceral reaction some people have toward static routing. For most design decisions like these there’s no ‘right’ answer, only ‘better’ answers. As for RIPv2, let’s look at some of the advantages:

  • It’s a standardized protocol
  • It’s VLSM-capable (every RIPv2 route entry carries a subnet mask)
  • Converges reasonably quickly in small networks, and sends triggered updates when the network changes
  • Works with snapshot routing, which makes it well suited to dial-on-demand networks

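To make the VLSM point concrete, here’s a minimal Python sketch that packs a RIPv2 response message using the RFC 2453 layout: a 4-byte header followed by 20-byte route entries, each with its own subnet mask field. The helper name `build_rip_response` is hypothetical, and the sketch only builds the bytes; it doesn’t send them (RIPv2 normally multicasts updates to 224.0.0.9 on UDP port 520).

```python
import ipaddress
import struct

RIP_RESPONSE = 2   # command field: 2 = response (1 = request)
RIP_VERSION = 2    # RIPv2
AFI_IP = 2         # address family identifier for IPv4
MAX_ENTRIES = 25   # RFC 2453 limits one message to 25 route entries
INFINITY = 16      # a metric of 16 means "unreachable"


def build_rip_response(routes):
    """Pack (prefix, metric) pairs into a RIPv2 response (hypothetical helper)."""
    if len(routes) > MAX_ENTRIES:
        raise ValueError("a RIPv2 message carries at most 25 route entries")
    # Header: command (1 byte), version (1 byte), must-be-zero (2 bytes)
    message = struct.pack("!BBH", RIP_RESPONSE, RIP_VERSION, 0)
    for prefix, metric in routes:
        if not 1 <= metric <= INFINITY:
            raise ValueError(f"metric {metric} outside 1..16")
        net = ipaddress.IPv4Network(prefix)
        message += struct.pack(
            "!HH4s4s4sI",
            AFI_IP,
            0,                           # route tag (unused here)
            net.network_address.packed,  # destination network
            net.netmask.packed,          # subnet mask: what makes RIPv2 VLSM-capable
            bytes(4),                    # next hop 0.0.0.0 means "use the sender"
            metric,
        )
    return message


msg = build_rip_response([("10.1.2.0/26", 1), ("10.1.2.64/27", 2)])
print(len(msg), msg[12:16].hex())  # prints: 44 ffffffc0
```

The two prefixes above are different lengths carved from the same /24, which RIPv1 (no mask field) couldn’t advertise.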
Of course, it’s not all sweetness & light; there are some disadvantages:

  • Maximum hop count of 15 (a metric of 16 means unreachable), a limit that exists to bound the ‘count-to-infinity’ problem
  • No concept of neighbors (RIP doesn’t form adjacencies)
  • Sends its entire routing table out every RIP-enabled interface every 30 seconds (triggered updates aside)
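Here’s a small Python simulation of why the hop limit matters. It assumes two routers, A and B, with split horizon and poison reverse disabled (RIP normally enables split horizon, which prevents this two-router case, but loops through three or more routers can still count up). A was directly connected to network N and B learned N via A; then N fails. The function name and the strict turn-taking between updates are illustrative assumptions.

```python
INFINITY = 16  # RIP treats a metric of 16 as unreachable


def count_to_infinity():
    """Simulate A and B counting to infinity for a failed network N."""
    metric_a = INFINITY  # A's connected route to N has just failed
    metric_b = 2         # B still holds the stale route via A
    history = []
    while metric_a < INFINITY or metric_b < INFINITY:
        # A hears B advertise N and, having no route of its own, installs it via B.
        metric_a = min(metric_b + 1, INFINITY)
        # B's next hop for N is A, so B must accept A's new (worse) metric.
        metric_b = min(metric_a + 1, INFINITY)
        history.append((metric_a, metric_b))
    return history


for round_no, (a, b) in enumerate(count_to_infinity(), start=1):
    print(f"round {round_no}: A={a} B={b}")
```

The metrics climb by two each round until they hit 16; without that ceiling the loop would never end, and a larger ceiling would only make it last longer.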

I think the biggest advantages of RIP are its simplicity, standards compliance, and snapshot routing capability. I recently had to use RIPv2 on a customer network to solve a very specific problem, and the reality was that no other protocol would have worked as effectively.

I work a lot with Linkway TDMA satellite modems, and while these modems support OSPF and BGP, they also support RIP. We found that RIP solved the problem elegantly and with minimal configuration overhead. It’s also much easier to teach to operators than something like OSPF or (gasp) BGP.


Fast convergence & carrier-delay

I’m reading Definitive MPLS Network Designs and came across an interesting detail regarding carrier delay. Carrier delay is how long an interface waits after detecting a link failure before signaling it to the control plane. By default, the interface waits 2 seconds. A common best practice is to set carrier-delay to 0 so failures are signaled immediately. However, if the underlying transport has its own recovery mechanism, it may be better to wait before signaling the loss of carrier. For example, if you have a protected SDH link with a recovery time of 80ms, you should wait at least that long so SDH can restore the transport before the routing protocol reacts.

The same reasoning applies to a transport where the circuit frequently drops and then recovers on its own. A good example is a dirty serial link that drops for a few seconds at a time: a carrier-delay longer than those drops keeps the routing protocol from reconverging on every blip.
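A rough way to see the trade-off is a toy Python model: outages no longer than carrier-delay are absorbed, longer ones are reported, and every reported failure reaches the control plane carrier-delay later. The function name, the outage durations, and the boundary rule (an outage exactly equal to carrier-delay counts as hidden) are all illustrative assumptions, not IOS behavior taken from the book.

```python
def routing_events(outages_ms, carrier_delay_ms):
    """Classify link outages under a given carrier-delay (simplified model).

    An outage no longer than carrier-delay is absorbed: the link is back
    before the interface reports it down. A longer outage is reported, but
    only after carrier-delay has elapsed.
    """
    hidden = [d for d in outages_ms if d <= carrier_delay_ms]
    reported = [d for d in outages_ms if d > carrier_delay_ms]
    return {"hidden": hidden, "reported": reported,
            "extra_detection_delay_ms": carrier_delay_ms}


# Hypothetical outages on a protected SDH link: two protection switches
# (50 and 80 ms) and one real fiber cut lasting 5 seconds.
outages = [50, 80, 5000]
print(routing_events(outages, 0))    # every blip triggers reconvergence
print(routing_events(outages, 100))  # only the fiber cut is reported, 100 ms late
```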



Fast Convergence Techniques: IS-IS

We would be irresponsible as network designers if we did not study and appreciate IS-IS for the problems it can solve. IS-IS is a link-state protocol similar to OSPF. It encodes its information in TLVs (similar to BGP), so it is easily extended. IS-IS and OSPF are the two IGPs you can use when deploying MPLS TE, so you should know how IS-IS compares with OSPF when your design requires fast convergence. Cisco has several resources on its site, which I’ve distilled into a few rules of thumb:

  1. Increase the LSP refresh timer to a high value
  2. Increase the maximum LSP lifetime to a high value (it must stay above the refresh timer)
  3. Tune the PRC interval
  4. Tune the SPF interval
  5. Tune the LSP generation interval
  6. Use BFD in lieu of fast hellos (on multiaccess networks)
  7. Tune the IS-IS retransmit interval
  8. Set the overload bit on startup
  9. Disable hello padding
  10. Use a single IS type where possible (note that the Cisco default is L1/L2)
  11. Use metric-style wide (not strictly fast-convergence related, but required for TE)

Also related: use isis mesh-group, and configure the point-to-point network type on multiaccess interfaces that are really point-to-point (think two-member Ethernet segments).
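Rules 1 and 2 are easy to reason about with a little arithmetic. Here’s a Python sketch that checks the refresh/lifetime relationship and estimates how many LSP refreshes an area generates per hour. The 900 s / 1200 s defaults are my assumption of the Cisco IOS values (check your platform), and the one-LSP-per-router model ignores LSP fragments and pseudonode LSPs.

```python
DEFAULT_REFRESH_S = 900    # assumed Cisco IOS default lsp-refresh-interval
DEFAULT_LIFETIME_S = 1200  # assumed Cisco IOS default max-lsp-lifetime
MAX_TIMER_S = 65535        # largest value the 16-bit Remaining Lifetime field holds


def check_lsp_timers(refresh_s, lifetime_s):
    """An LSP must be refreshed before it ages out, or it gets purged."""
    if not 0 < refresh_s < lifetime_s <= MAX_TIMER_S:
        raise ValueError("need 0 < refresh interval < max lifetime <= 65535 s")


def refreshes_per_hour(routers, refresh_s):
    """LSP refreshes originated per hour in an area, one LSP per router.

    Each refreshed LSP is flooded to every router in the area, so this is
    also roughly how many refreshes each router must receive and process.
    """
    return routers * 3600 / refresh_s


check_lsp_timers(DEFAULT_REFRESH_S, DEFAULT_LIFETIME_S)
check_lsp_timers(65000, 65535)
print(refreshes_per_hour(200, DEFAULT_REFRESH_S))  # 800.0
print(round(refreshes_per_hour(200, 65000), 1))    # 11.1
```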

Cisco Technology Support Page
Cisco IOS 12.4 IS-IS Fast Convergence
Cisco Configuration Example


Fast Hellos

One thing to think about from a design perspective is the scalability of fast hellos. If you have a router with several hundred interfaces, each sending subsecond hellos, the route processor (RP) has to process a lot of hello packets. If you can design your network with point-to-point links, loss of carrier will most likely detect a failure faster, and it is a more scalable way of detecting failures. If you can’t, look into Bidirectional Forwarding Detection (BFD). BFD is protocol independent, so a single BFD session can notify multiple client protocols (OSPF, EIGRP, BGP, and so on) of a failure.
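A back-of-the-envelope Python calculation shows how quickly hello load grows. It assumes every interface runs the same hello interval and every neighbor sends at that same rate; the interface count and the function name are hypothetical.

```python
def hello_packets_per_second(interfaces, hello_interval_ms, neighbors_per_interface=1):
    """Hello packets the route processor sends plus receives each second."""
    hellos_per_interface = 1000 / hello_interval_ms
    sent = interfaces * hellos_per_interface
    received = interfaces * neighbors_per_interface * hellos_per_interface
    return sent + received


# 300 point-to-point interfaces: OSPF default 10 s hellos vs. fast hellos
# with "dead-interval minimal hello-multiplier 4" (250 ms hellos).
print(hello_packets_per_second(300, 10_000))  # about 60 packets/s
print(hello_packets_per_second(300, 250))     # 2400.0
```

That’s a 40x increase in control-plane packets just to detect failures, before any neighbor on a multiaccess segment multiplies it further.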



Fast Convergence Techniques

Reading the SRND on the routed access layer, I found a few key takeaways.
On page 5 it says: “Cisco recommends a routed network core in all cases.”

Let’s examine the reasoning behind this statement. First of all, what’s the alternative? You can have a layer-2 core where your devices simply switch frames at layer-2. Think about the problems with this design.

  1. Troubleshooting is hard
  2. Traffic is not as deterministic
  3. Engineering based on metrics is difficult

Why do I think troubleshooting a layer-2 core is harder than a layer-3 one? Think about the tools available to troubleshoot layer-2 issues: CAM tables, spanning-tree output, and CDP/LLDP messages. That’s about it. Layer-2 ping exists, but what’s the next step if it fails? Layer-3 networks have far more management and troubleshooting tools available.

Traffic is not necessarily deterministic either. You rely on the spanning-tree algorithm to determine a loop-free path through the network, which means manipulating root bridge placement and link costs. It’s difficult to guarantee a particular path after a failure, and you need to know which ports are blocking or forwarding. Layer-3 cores don’t have this problem: all links can be forwarding.
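To show how much the forwarding path depends on root placement and port costs, here’s a simplified Python model of the 802.1D root election and root-port selection. It assumes point-to-point links with no parallel links, a connected topology, and ties broken only by upstream bridge ID (real STP also compares port IDs). Costs of 4 and 19 are the short-method values for 1 Gbps and 100 Mbps; the bridge names and MACs are made up.

```python
import heapq


def spanning_tree(bridges, links):
    """Return (root, forwarding links, blocked links) for a simplified STP.

    bridges: name -> bridge ID as (priority, MAC); the lowest ID becomes root.
    links:   frozenset({a, b}) -> port cost.
    """
    root = min(bridges, key=bridges.get)
    neighbors = {b: [] for b in bridges}
    for link, cost in links.items():
        a, b = tuple(link)
        neighbors[a].append((b, cost))
        neighbors[b].append((a, cost))

    # Root path cost for every bridge (Dijkstra from the root).
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, bridge = heapq.heappop(queue)
        if d > dist[bridge]:
            continue
        for nbr, cost in neighbors[bridge]:
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(queue, (d + cost, nbr))

    # Each non-root bridge keeps one root port: lowest root path cost,
    # ties broken by the lowest upstream bridge ID. Every other link blocks.
    forwarding = set()
    for bridge in bridges:
        if bridge == root:
            continue
        upstream, _ = min(neighbors[bridge],
                          key=lambda nc: (dist[nc[0]] + nc[1], bridges[nc[0]]))
        forwarding.add(frozenset((bridge, upstream)))
    return root, forwarding, set(links) - forwarding


bridges = {
    "A": (4096, "00:00:00:00:00:0a"),  # lowered priority: intended root
    "B": (32768, "00:00:00:00:00:0b"),
    "C": (32768, "00:00:00:00:00:0c"),
    "D": (32768, "00:00:00:00:00:0d"),
}
ring = {frozenset(("A", "B")): 4, frozenset(("A", "C")): 4,
        frozenset(("B", "D")): 4, frozenset(("C", "D")): 4}
root, fwd, blocked = spanning_tree(bridges, ring)
print(root, sorted("".join(sorted(link)) for link in blocked))  # A ['CD']

ring[frozenset(("B", "D"))] = 19  # slow down one link...
root, fwd, blocked = spanning_tree(bridges, ring)
print(root, sorted("".join(sorted(link)) for link in blocked))  # A ['BD']
```

Changing a single port cost moves the blocked link, and with it D’s path to everything upstream; in a layer-2 core that kind of side effect is exactly what you have to track by hand.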

I would argue that traffic engineering is difficult as well. MST lets you map groups of VLANs to separate instances, so some VLANs forward on one set of links while the rest forward on a different set, but that’s difficult to scale. Layer-3 networks make traffic engineering easier, with a standard set of tools to implement and troubleshoot it.

With that said, let’s examine some of the other items in the SRND with respect to the CCDE written outline.

  1. Layer 2 Down Detection – use point-to-point fiber connections where possible, because failure detection is very quick. You should also examine debounce and carrier-delay. Along with these, implement IP event dampening on all interfaces to minimize disruption to your network when links flap repeatedly. The dampening feature is very similar to BGP’s route dampening: each flap adds a penalty that decays over time, and the interface is suppressed once the penalty crosses a threshold, until it decays back below a reuse threshold.
  2. For all media types – SONET and point-to-point fiber both detect failures very quickly; other media types are not as quick. If you can, run BFD on your interfaces to decrease failure detection time. In multipoint networks this may be the only way to get subsecond failure detection.
  3. Fast hello timers – OSPF and EIGRP provide the following:
    1. OSPF: ip ospf dead-interval minimal hello-multiplier <multiplier> (typically 4, which sends hellos every 250 ms with a 1-second dead interval)
    2. EIGRP:
      1. ip hello-interval eigrp <as-number> 1
      2. ip hold-time eigrp <as-number> 3
  4. OSPF, EIGRP, IS-IS, BGP – IS-IS will need further research, but I expect it has mechanisms similar to OSPF’s. BGP has several settings that can be tuned to decrease convergence time:
    1. BGP path MTU discovery
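  5. Fast SPF Timers – the SRND covers several OSPF settings:
    1. SPF throttle tuning
      1. timers throttle spf <spf-start> <spf-hold> <spf-max-wait> (milliseconds)
      2. Best practice: 10 100 500
    2. LSA throttle tuning
      1. timers throttle lsa all <start-interval> <hold-interval> <max-interval> (milliseconds)
      2. Best practice: 10 100 5000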
  6. OSPF, IS-IS – this may be a typo?
  7. Recursion and Convergence – the issue here is that OSPF convergence time increases as more routers are added to the area. You can add links/routes to an OSPF area with less of a penalty than adding routers, because the SPF calculation iterates over the type 1 (router) LSA originated by every router in the area, and each additional router increases convergence time.
  8. Impact of Third Party Next Hop & BGP recursion – have a look at this diagram.
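To see what the throttle values in item 5 actually do, here’s a simplified Python model of the exponential backoff for back-to-back triggers: the first run waits the start value, the next waits the hold value, and the wait doubles for each further run until it’s capped at the maximum. The function name is hypothetical, and the reset to the start value after a quiet period isn’t modelled; IOS’s exact scheduling may differ in detail.

```python
def throttle_delays(start_ms, hold_ms, max_ms, events):
    """Wait (ms) before each of `events` back-to-back SPF runs or LSA generations."""
    delays = []
    hold = hold_ms
    for i in range(events):
        if i == 0:
            delays.append(start_ms)          # first trigger: react almost immediately
        else:
            delays.append(min(hold, max_ms))  # later triggers: back off
            hold *= 2                         # double the wait, capped at max_ms
    return delays


spf = throttle_delays(10, 100, 500, 7)   # timers throttle spf 10 100 500
lsa = throttle_delays(10, 100, 5000, 8)  # timers throttle lsa all 10 100 5000
print(spf, sum(spf))  # [10, 100, 200, 400, 500, 500, 500] 2210
print(lsa)            # [10, 100, 200, 400, 800, 1600, 3200, 5000]
```

The small start value gives a fast reaction to a single failure, while the doubling hold protects the CPU when the network keeps churning.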