random musings about networks and everything: To kill a spanning tree

There is a lot of hate out there for STP. One of the main problems is its lack of robustness against poor planning and poor operations, which cause broadcast and unknown-unicast storms. When these happen they typically kill all traffic through a given switch, unlike L3 issues, which typically affect only a portion of the traffic. And unlike with L3 problems, you typically cannot reach the switch to troubleshoot the issue. So it is rather easy to cause downtime and quite hard to fix it.

Then there are the more inherent problems: unusable links in extreme capacity environments, and convergence time in high availability networks.

What should you replace your STP with?

STP

Why replace it at all? STP has a proven track record. It is used ubiquitously, anywhere from home LANs to complex enterprise LANs to service provider metro networks, so clearly it has to work.

Virtually all of the robustness problems are caused by the assumption that Ethernet is plug-and-play. If you deploy STP, you have to design it.

It would really need its own article to explain the basics of deploying STP successfully, but a few key issues are that you decide beforehand which port will block in any given situation and which ports will participate in STP, and that you add BPDU guard/filter and broadcast + unknown-unicast storm control.
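
As one small illustration of deciding things beforehand, the sketch below shows why root bridge placement should be set explicitly rather than left to chance. It assumes the standard rule that the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. The switch names, MAC addresses and priority values are made up for the example.

```python
from typing import NamedTuple

DEFAULT_PRIORITY = 32768  # the usual out-of-the-box bridge priority


class Bridge(NamedTuple):
    name: str       # hypothetical switch name
    priority: int   # configured bridge priority
    mac: str        # base MAC address, colon-separated hex


def bridge_id(bridge: Bridge) -> tuple:
    """Bridge ID used for comparison: priority first, MAC as tie-breaker."""
    return (bridge.priority, bytes.fromhex(bridge.mac.replace(":", "")))


def elect_root(bridges: list) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


# Unplanned: every switch runs the default priority, so the lowest MAC wins.
unplanned = [
    Bridge("core-1", DEFAULT_PRIORITY, "00:1e:aa:00:00:01"),
    Bridge("core-2", DEFAULT_PRIORITY, "00:1e:aa:00:00:02"),
    Bridge("old-access", DEFAULT_PRIORITY, "00:01:bb:00:00:01"),
]

# Planned: the intended root and backup root get explicit lower priorities.
planned = [
    Bridge("core-1", 4096, "00:1e:aa:00:00:01"),
    Bridge("core-2", 8192, "00:1e:aa:00:00:02"),
    Bridge("old-access", DEFAULT_PRIORITY, "00:01:bb:00:00:01"),
]

print(elect_root(unplanned).name)  # old-access
print(elect_root(planned).name)    # core-1
```

In the unplanned case the oldest box in the closet ends up as root simply because it has the lowest MAC, and every blocking decision follows from that accident.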

But of course, for complex networks which are being reconfigured multiple times per day, it is a rather tall order to wish for planning and operations that won't break things.

With RSTP and MST you can get convergence in the 1 s range, which might satisfy your convergence budget. The only thing you cannot really fix is the unused blocking links, especially if you need high capacity in a single VLAN, which is something the largest IXPs worry about. When you notice you can't add more members to a port-channel, you'd really want more flexibility in how to drive your traffic.
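
To put a number on the unused links: a spanning tree over n switches forwards on exactly n - 1 links, so everything beyond that is blocked. Below is a rough sketch of that arithmetic, assuming a connected topology, a single VLAN (or a single MST instance), and each port-channel counted as one logical link. The function name and the example topology are mine.

```python
def blocked_links(num_switches: int, num_links: int) -> int:
    """Links left blocking by a single spanning tree in a connected topology."""
    if num_links < num_switches - 1:
        raise ValueError("topology cannot be connected with so few links")
    return num_links - (num_switches - 1)


# Hypothetical full mesh of four switches: 4 * 3 / 2 = 6 links.
switches, links = 4, 6
idle = blocked_links(switches, links)
print(f"{idle} of {links} links carry no traffic")  # 3 of 6 links carry no traffic
```

A plain ring always loses exactly one link this way. The denser you build for capacity, the larger the share of links that sits idle.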

REP, EAPS, MRP, FRRP, RPR

These all typically require a ring topology (well, REP strictly does not) and guarantee lower convergence times, down to 50 ms. There is also some inherent operational robustness, as they are not configured by default on ports, so you are forced to do some planning when deploying them. Usually you'll at least avoid running them with your customers.

However, you are still blocking a link, and the topology restrictions make them unacceptable in many scenarios. They are mostly usable in service provider metro networks.

STACKING

When Cisco introduced the 3750 StackWise years ago I was very much a proponent of stacking technology. I was so desperate to get rid of STP that I assumed anything and everything must be better, without reviewing the option.

While I've been quite happy with the level of reliability of StackWise in the 3750, it has had more software issues than STP, which is quite expected as it is more complex and less mature. But even assuming that both implementations were perfect, for my requirements STP would remain superior.

STP is vendor agnostic: in your high availability L2 setup you can pull one vendor's switch out and replace it with another vendor's, and you are causing maybe 1 s of downtime. With 3750 StackWise not only can you obviously not do that, you can't even upgrade the software gracefully, so you're causing very long downtime whenever a software upgrade needs to be done, as you are killing the whole stack for a moment. Certainly ISSU could be implemented in stacks, but as long as you can't stack CSCO with JNPR, you can't get the same level of robustness you get from STP.

A large selling point of stacks for many is the ability to connect a server redundantly to two switches, not because they need the capacity, but because they need the redundancy. This is actually not needed: every OS you'd possibly want to run can do a sort-of 802.1AX (which is not 802.1AX at all) where you connect your server to two independent switches and use ARP towards the default gateway to decide whether the primary link should still be used. It is a very simple and very robust solution for the redundancy requirement. If you need higher capacity, just use 2+2 links with real 802.1AX.
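
The ARP-based failover decision is simple enough to sketch in a few lines. This is only an illustration of the logic, not any particular OS's implementation: it assumes the host periodically sends ARP requests for the default gateway on each link and records whether a reply came back. The interface names and the `select_active_link` helper are hypothetical.

```python
def select_active_link(links, gateway_reachable, current):
    """Pick the link to transmit on.

    links: interface names in order of preference, primary first.
    gateway_reachable: dict of interface name -> True if the default
        gateway answered ARP on that link during the last probe interval.
    current: the link in use right now.
    """
    for link in links:
        if gateway_reachable.get(link, False):
            return link  # most preferred link that still reaches the gateway
    return current  # nothing answers: stay put rather than flap


links = ["eth0", "eth1"]  # eth0 and eth1 go to two independent switches

print(select_active_link(links, {"eth0": True, "eth1": True}, "eth0"))    # eth0
print(select_active_link(links, {"eth0": False, "eth1": True}, "eth0"))   # eth1
print(select_active_link(links, {"eth0": False, "eth1": False}, "eth1"))  # eth1
```

Because the probe goes all the way to the gateway, it also catches failures beyond the directly attached switch, which plain link-state monitoring would miss. Linux bonding in active-backup mode with ARP monitoring behaves roughly along these lines.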

JNPR seems to be attempting to put the whole DC under a single stack. I know I won't be listening to that sales pitch.

VPLS and TRILL

This is actually why I wrote the article: yesterday I once again participated in a chat about VPLS vs TRILL. At a high level they are remarkably similar; both do best-path, loop-free forwarding based on an IGP.

TRILL is a bit of a flashback to the 90s, when you carried user routes in the IGP, while in VPLS you'll typically use BGP, which fits today's view of best practices better.

I've been lurking on the rbridge mailing list for years and I really like the idea of it. But wish as I might, I don't believe in the commercial success of TRILL. Cheaper and cheaper chips are getting MPLS/VPLS capability, which is further driving its adoption and further driving the costs down. Due to economies of scale I fully expect the same thing to happen to entry-level switches with MPLS as happened with L3: high volume will drive the premium you pay for MPLS/VPLS down to entry/consumer level.

I can't see how TRILL could overtake this momentum. You can do much more with MPLS than just VPLS, so why would you buy TRILL, which is pretty much guaranteed to be more expensive due to low volume?

I was informed that TRILL was designed to be implementable in any device which can stack MPLS labels, but looking at the draft I can say this is not true: the EoS bit of an MPLS label lands on the rbridge nickname in the TRILL header, and I'm sure most current MPLS chips couldn't keep it set when they add another label, as within TRILL they would have to. And generally I do not expect this level of programmability in most ASIC implementations, say EARL7/PFC3.
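
The bit collision can be shown with a little bit arithmetic. The sketch below assumes the MPLS label stack entry of RFC 3032 (20-bit label, 3-bit TC, 1-bit bottom-of-stack flag, 8-bit TTL) and a TRILL header whose first 32 bits are V, R, M, Op-Length and Hop Count followed by the 16-bit egress nickname (the layout published in RFC 6325). The helper names and default field values are hypothetical.

```python
MPLS_S_BIT = 1 << 8          # bottom-of-stack flag: bit 23 of the 32-bit label entry
TRILL_EGRESS_MASK = 0xFFFF   # egress nickname: low 16 bits of the first TRILL word


def trill_first_word(egress_nickname, hop_count=32, multi_dest=0,
                     op_length=0, version=0):
    """Pack the first 32 bits of a TRILL header (assumed layout, see above)."""
    return (
        (version & 0x3) << 30          # V: 2 bits
        | 0 << 28                      # R: 2 reserved bits, sent as zero
        | (multi_dest & 0x1) << 27     # M: 1 bit
        | (op_length & 0x1F) << 22     # Op-Length: 5 bits
        | (hop_count & 0x3F) << 16     # Hop Count: 6 bits
        | (egress_nickname & 0xFFFF)   # Egress RBridge Nickname: 16 bits
    )


def looks_like_bottom_of_stack(word):
    """How a label-stacking chip would read the S bit of the same 32 bits."""
    return bool(word & MPLS_S_BIT)


# The S bit position lies inside the egress nickname field.
print(bool(MPLS_S_BIT & TRILL_EGRESS_MASK))  # True

# So the nickname value alone decides what an MPLS chip sees as the S bit.
print(looks_like_bottom_of_stack(trill_first_word(0x0001)))  # False
print(looks_like_bottom_of_stack(trill_first_word(0x0100)))  # True
```

A chip that treats this word as a label and sets or clears the S bit according to its own stack logic would be rewriting one bit of the egress nickname.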

TRILL really should have happened 10 years ago and should today be available in pure L2 switches which cost under 2 kEUR for 48x1GE ports. Seeing devices like the Alcatel SAS and Cisco ME3800X makes me quite pessimistic about the future of TRILL. I do hope I'm wrong.

2010-07-06
