Term: Fate Sharing

“Fate sharing,” as it appears in the CCDE written exam outline, was unfamiliar to me. I’ve done a bit of research, and here’s what I found.

David D. Clark defines the term fate-sharing in his paper “The Design Philosophy of the DARPA Internet Protocols.” It’s a short paper that is well worth the effort of reading, as it explains why we have a datagram-based Internet today.

Clark describes fate-sharing as the property of a system in which state information is kept in the communicating entities themselves, so that if an entity is lost, all of its state is acceptably lost with it. Think of it like this: if you have a TCP connection open to a web server, and that web server dies, your TCP connection dies with it.
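The TCP example above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions (loopback interface, ephemeral port, the server "dying" modeled as simply closing its end of the connection), not a faithful model of a real crash:

```python
import socket
import threading

# Toy illustration of fate sharing: all TCP connection state lives in the
# two endpoints, so when the "server" dies (here: closes its socket), the
# client's side of the connection dies with it.

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # ephemeral port on loopback
server.listen(1)
port = server.getsockname()[1]

def accept_then_die():
    conn, _ = server.accept()
    conn.close()                # the server "dies", and its state dies with it

t = threading.Thread(target=accept_then_die)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
t.join()

# With the server gone, nobody holds the other half of the shared state:
# the client reads EOF, and its connection is effectively dead too.
data = client.recv(1024)
print(data == b"")              # True: the connection shared the server's fate
client.close()
server.close()
```

No router or middlebox in the path had to know anything about this connection for it to exist, or to clean anything up when it died.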

Think of the alternative to fate-sharing: if intermediate entities had to maintain state about the communication, you wind up with the difficult problem of replicating that state, and replication doesn’t protect against all failures anyway – only against n-1 failures, where n is the number of replicated copies of that state information.
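The n-1 limit can be made concrete with a toy model. This is a sketch under my own assumptions (replicas as plain dicts, a failure as losing a whole replica); none of these names come from Clark's paper:

```python
# Toy model of replicated connection state held in the network rather than
# in the endpoints. Replication only masks up to n - 1 replica failures:
# lose all n copies and the state is gone anyway.

def surviving_state(replicas, failed):
    """Return the state from any surviving replica, or None if all failed."""
    for name, state in replicas.items():
        if name not in failed:
            return state
    return None

# Three copies of the same connection state (n = 3).
replicas = {"r1": {"seq": 42}, "r2": {"seq": 42}, "r3": {"seq": 42}}

print(surviving_state(replicas, failed={"r1", "r2"}))        # {'seq': 42}
print(surviving_state(replicas, failed={"r1", "r2", "r3"}))  # None
```

And this ignores the hard part entirely: keeping the n copies consistent with each other while the communication is in flight.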

This model pushes resiliency further up the stack. Think of how your web browser handles failures at the TCP layer: you can simply refresh the page, and your transaction will still (sometimes) complete on a different server. The fate-sharing philosophy also leads naturally to the datagram-based communication model we have today.
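That "refresh the page" behavior is just retry logic at the application layer. A minimal sketch, with a hypothetical `fetch` function standing in for a real HTTP request:

```python
# Sketch of resiliency pushed up the stack: when one server dies, the
# application retries against another, recreating its own state instead of
# expecting the network to preserve it. `fetch` is a hypothetical stand-in
# for a real HTTP request.

def fetch_with_retry(servers, fetch):
    for server in servers:
        try:
            return fetch(server)        # first server that answers wins
        except ConnectionError:
            continue                    # that server's fate is sealed; try the next
    raise ConnectionError("all servers failed")

def fetch(server):
    if server == "web1":
        raise ConnectionError("web1 died")
    return "page from " + server

print(fetch_with_retry(["web1", "web2"], fetch))   # page from web2
```

The network never had to remember the first, failed attempt; the application simply started over.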

The alternative to a fate-sharing system would be one in which the state is part of the infrastructure. An example of this could be a Permanent Virtual Circuit (PVC) that (1) contains the state of the communication and (2) maintains that state in the event of a failure. This model, if applied to a protocol transported over the Internet, would make some details of networking easier. For example, if a permanent circuit reroutes around a failure, the network can still guarantee a particular quality of service, and the communicating systems don’t need to worry about data arriving out of order, and so on.

Now, I don’t claim to understand all of this, but that’s what I’ve been able to gather. If there are smarter folks out there, please comment.

The other way I’ve heard fate-sharing described is in Russ White’s 2007 Networkers presentation, Deploying IGPs. He describes fate sharing as the coupling of hardware and software states. A good example for hosts would be virtualization: if the hardware fails, all of the virtual machines share the fate of the hardware. In networking terms, if a dot1q trunk or DWDM link fails, all of the devices that rely exclusively on that path fail with it.
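The virtualization example can be stated as a tiny model. The host and VM names here are hypothetical, purely for illustration:

```python
# Toy model of hardware/software fate sharing: every VM that runs only on
# a given host shares that host's fate. Host and VM names are made up.

hosts = {"host1": ["vm1", "vm2"], "host2": ["vm3"]}

def failed_vms(hosts, failed_host):
    """All VMs that fail when `failed_host` goes down."""
    return hosts.get(failed_host, [])

print(failed_vms(hosts, "host1"))   # ['vm1', 'vm2']
```

The same shape applies to the dot1q-trunk or DWDM case: replace hosts with links and VMs with the circuits riding them exclusively.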

Now the key question: How does this term apply to good network design?

A good network designer has to consider that in the datagram model of TCP/IP, reliable data transfer is the responsibility of the end-to-end protocol, not the network. This may seem obvious, but it leaves us with some problems. QoS is not necessarily enforced at all points along a path, so RSVP was developed. RSVP, however, starts to push state information into the network, which becomes complicated when traffic is rerouted. This runs counter to the datagram model of TCP/IP, so designers must balance the engineering simplicity of fate-sharing systems (the network doesn’t need to guarantee delivery of datagrams) against the relative complexity (and the continuing lack of good tools to manage it) of systems that embed state in the network.