2012-10-24

Sorry state of JunOS control plane protection

I've been looking into how to protect an MX80 running 11.4R5 from accidental and intentional attempts to congest the control plane, and I'm pretty much drawing a blank.

Main discoveries so far.

  1. ISIS is always leaked to the control plane, even with no 'family iso' or 'protocol isis' on the interface
  2. PVST is always leaked to the control plane, even when only 'family inet' is configured on the interface
  3. LLDP is not matched by the ddos-protection feature
  4. It is essentially impossible to protect against an attack from eBGP
  5. The ddos-protection feature is mis-dimensioned

ISIS

This is pretty bad for anyone running ISIS: you cannot use ddos-protection to limit it, because the policer won't distinguish between good and bad ISIS. If you don't use ISIS, just set the ddos-protection limit low and you're good to go.

ISIS is punted with a different code than IP packets, but if you resolve the punt path, it ends up on the same path. That path still sees full wire rate, i.e. there is no magic 10kpps limit in front of it:

    HCFPC2(le_ruuter vty)# show jnh 0 exceptions
    control pkt punt via nh    PUNT(34)    9134818    1065269880

    HCFPC2(le_ruuter vty)# show jnh 0 exceptions nh 34 punt
    Nexthop Chain: CallNH:desc_ptr:0xc02bbc, mode=0, rst_stk=0x0, count=0x3
     0xc02bb8 0 : 0x127fffffe00003f0
     0xc02bb9 1 : 0x2ffffffe07924a00
     0xc02bba 2 : 0xda00601499000a04
     0xc02bbb 3 : 0x3af46014fcd08810

    HCFPC2(le_ruuter vty)# show jnh 0 decode 0xda00601499000a04
    IndexNH:key_ptr:0x80/0, desc_ptr=0xc02932, max=10, nbits=4

    HCFPC2(le_ruuter vty)# show jnh 0 vread 0xc02932 4
    Addr:0xc02932, Data = 0x42f47fffff8b0010
    Addr:0xc02933, Data = 0xda026014b6801004
    Addr:0xc02934, Data = 0x60040740000e822f
    Addr:0xc02935, Data = 0x60041bc0000e828a

    HCFPC2(le_ruuter vty)# show jnh 0 decode 0x60040740000e822f
    JNH_FW_START: opcode = 0x0000000c desc_ptr = 0x000080e8 base_ptr = 0x000e822f

    HCFPC2(le_ruuter vty)# show jnh 0 decode 0x60041bc0000e828a
    JNH_FW_START: opcode = 0x0000000c desc_ptr = 0x00008378 base_ptr = 0x000e828a

    HCFPC2(le_ruuter vty)# show filter
    Index     Semantic   Name
    --------  ---------- ------
    46137345  Classic    HOSTBOUND_IPv4_FILTER
    46137346  Classic    HOSTBOUND_IPv6_FILTER

    HCFPC2(le_ruuter vty)# show filter index 46137345 detail
    JNH_FW_START: opcode = 0x0000000c desc_ptr = 0x000080e8 base_ptr = 0x000e822f

    HCFPC2(le_ruuter vty)# show filter index 46137346 detail
    JNH_FW_START: opcode = 0x0000000c desc_ptr = 0x00008378 base_ptr = 0x000e828a
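The last hop of the trace above can be cross-checked mechanically. What follows is a minimal Python sketch, assuming a bit layout for JNH_FW_START words (opcode in the top 5 bits, desc_ptr in the next 24, base_ptr in the low 24). That layout is inferred purely from the two decoded words shown in the trace, not from any Juniper documentation, and the names decode_fw_start and filter_for are made up for the example.

```python
# Field positions are an ASSUMPTION inferred from the two decoded words in the
# trace; they are not taken from vendor documentation.
OPCODE_SHIFT = 59      # top 5 bits of the 64-bit word
DESC_PTR_SHIFT = 35    # next 24 bits
PTR_MASK = 0xFFFFFF    # both pointers treated as 24-bit fields


def decode_fw_start(word: int) -> dict:
    """Split a 64-bit JNH_FW_START word into the three fields the PFE prints."""
    return {
        "opcode": word >> OPCODE_SHIFT,
        "desc_ptr": (word >> DESC_PTR_SHIFT) & PTR_MASK,
        "base_ptr": word & PTR_MASK,
    }


# (desc_ptr, base_ptr) pairs as reported by 'show filter index ... detail'.
HOSTBOUND_FILTERS = {
    (0x80E8, 0xE822F): "HOSTBOUND_IPv4_FILTER",
    (0x8378, 0xE828A): "HOSTBOUND_IPv6_FILTER",
}


def filter_for(word: int):
    """Return the host-bound filter a FW_START word points at, or None."""
    fields = decode_fw_start(word)
    return HOSTBOUND_FILTERS.get((fields["desc_ptr"], fields["base_ptr"]))


# The two words read from the IndexNH table with 'vread'.
for w in (0x60040740000E822F, 0x60041BC0000E828A):
    print(hex(w), "->", filter_for(w))
```

Both words read from the IndexNH table resolve to the HOSTBOUND filters, which is consistent with the punted ISIS landing on the same host-bound path as IP.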

PVST

If you don't need PVST, you can just limit it in ddos-protection. Still, it's pretty annoying that it's leaked to the control plane, especially as Trio already supports a per-physical-interface 'punt mask' for LACP, STP, LLDP etc. Even with STP punting turned off, PVST is still punted:

    HCFPC2(le_ruuter vty)# show ifd brief
    Index  Name                  Type         Flags               Slot   State
    -----  --------------------  -----------  ------              -----  ------
    190    xe-2/0/6              Ethernet     0x0000000000008000  2      Up

    HCFPC2(le_ruuter vty)# show jnh ifd 190 stream
    lacp:-, stp:-/0, esmc:-, lfm:-, erp:-, lldp:-, mvrp:-/-, smac_mcast_clear:-, vc:-, natVlan:-/4095, native tpid 0, tpidMask:0x0001

BGP

The problem with protecting against an eBGP attack is that policers work in bps. (DDoS policers are the exception. You can almost certainly turn any policer in the PFE from bps to pps by changing its application, poking directly at memory, but that would be cleared by the next reboot or 'commit full'.) And you can only cope with maybe 4Mbps of traffic, so either you accept convergence issues in BGP or you accept that eBGP can bring you down. If you absolutely, positively must fix this, one way to get closer is to police <1400B BGP packets at a very low rate and >1400B BGP packets at a rate high enough for convergence. But you'd need separate policers per BGP neighbor, so that one neighbor cannot bring another down by killing its keepalive packets.
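To see why a bps policer is such a blunt tool here, it helps to convert the 4Mbps budget into packets per second at a few packet sizes. The snippet below is only back-of-the-envelope arithmetic in Python. The packet sizes are illustrative assumptions, framing overhead is ignored, and pps_at is a name made up for the example.

```python
def pps_at(rate_bps: float, packet_bytes: int) -> float:
    """Packets per second that pass a bps policer when every packet is packet_bytes long."""
    return rate_bps / (packet_bytes * 8)


BUDGET_BPS = 4_000_000  # roughly what the post says the control plane copes with

# Illustrative sizes only: tiny packets, mid-size packets, near-full-size BGP updates.
for size in (50, 100, 500, 1400):
    print(f"{size:>5} B -> {pps_at(BUDGET_BPS, size):8.0f} pps")
```

The same 4Mbps that admits 10kpps of 50-byte packets admits only a few hundred packets per second at 1400 bytes, which is why the size-split policer idea gets closer to a pps limit without actually being one.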

One quick and dirty fix, which protects iBGP from eBGP but not eBGP from other eBGP, would be to run all your eBGP sessions as 'passive' and run your route reflectors as 'passive'. Then your PE would open the connection to the RR and your customers would open the connection to your PE. These two cases are already classified into different terms in the ddos-protection filter:

    HCFPC2(le_ruuter vty)# show filter
    Index     Semantic   Name
    --------  ---------- ------
    46137345  Classic    HOSTBOUND_IPv4_FILTER

    HCFPC2(le_ruuter vty)# show filter index 46137345 program
    term HOSTBOUND_BGP_TERM1
    term priority 0
        payload-protocol
            6
        destination-port
            179
        then
            accept
            queue 0
            policer template __ddos_BGP_aggregate_policer__
            policer __ddos_BGP_aggregate_policer__-HOSTBOUND_BGP_TERM1
                app_type 23
                bandwidth-limit 34359738360 bits/sec
                burst-size-limit 16777215 bytes
                discard
            count __ddos_BGP_aggregate_pass__
            ddos proto 5120
    term HOSTBOUND_BGP_TERM2
    term priority 0
        payload-protocol
            6
        source-port
            179
        then
            accept
            queue 0
            policer template __ddos_BGP_aggregate_policer__
            policer __ddos_BGP_aggregate_policer__-HOSTBOUND_BGP_TERM2
                app_type 23
                bandwidth-limit 34359738360 bits/sec
                burst-size-limit 16777215 bytes
                discard
            count __ddos_BGP_aggregate_pass__
            ddos proto 5120

The only change needed would be to put these two terms under different BGP policers. Then your customers would be policed separately from your iBGP, and an attack wouldn't bring the core down.
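How the 'passive' trick maps sessions onto the two terms can be sketched in a few lines of Python. This only mirrors the port matches visible in the filter program above. It assumes TERM1 is evaluated before TERM2, and the function name and the ephemeral port numbers are made up for the example.

```python
BGP_PORT = 179


def hostbound_bgp_term(src_port: int, dst_port: int):
    """Return the HOSTBOUND term an incoming TCP packet would hit, or None."""
    # TERM1 matches destination-port 179: the remote end opened the session to us.
    if dst_port == BGP_PORT:
        return "HOSTBOUND_BGP_TERM1"
    # TERM2 matches source-port 179: we opened the session to the remote end.
    if src_port == BGP_PORT:
        return "HOSTBOUND_BGP_TERM2"
    return None


# Customer connects to a passive PE: packets arrive at the PE with dst port 179.
print(hostbound_bgp_term(src_port=50000, dst_port=179))
# PE connects to a passive RR: replies arrive at the PE with src port 179.
print(hostbound_bgp_term(src_port=179, dst_port=50001))
```

With every eBGP session passive on the PE and every RR passive, customer traffic lands only in TERM1 and RR traffic only in TERM2, so separate policers on the two terms would separate edge from core.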

Dimensioning

It's really strange how Juniper has dimensioned their boxes. The MX80 goes down on a 4Mbps/10kpps flood. The RE CPU (PQ3, 8572) and LC CPU (PQ3, 8544) are both 90% idle during the event, yet all of ISIS, LDP and BGP remain down until the attack stops.

The MX960 (RP CPU 4xXEON, MPC2 LC CPU PQ3 8548) isn't faring significantly better than the MX80. If the attack and the protected service are on the same MPC, it cannot handle anywhere near the stock ddos-protection 20kpps; the flood will bring core BGP down. Maybe the MX960 can do 15kpps.

The T4k (RP CPU 2xXEON, FPC5 LC CPU QorIQ P2020) can actually handle the stock ddos-protection 20kpps rate, but not 30kpps. So if you can push two protocols up to their ddos-protection limits, it's still going to be down.
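The arithmetic behind "two protocols and it's down" is trivial, but worth spelling out. The Python sketch below uses only the rough rates quoted above. The function names are made up, and the assumption is that each attacked protocol is allowed its full stock rate independently.

```python
STOCK_PPS_PER_PROTOCOL = 20_000  # stock ddos-protection rate quoted in the post


def worst_case_pps(n_protocols: int, per_protocol_pps: int = STOCK_PPS_PER_PROTOCOL) -> int:
    """Aggregate punt rate if an attacker fills n protocol policers at once."""
    return n_protocols * per_protocol_pps


def max_protocols_within(known_good_pps: int, per_protocol_pps: int = STOCK_PPS_PER_PROTOCOL) -> int:
    """How many fully loaded protocol policers fit inside a rate the box is known to survive."""
    return known_good_pps // per_protocol_pps


print(worst_case_pps(2))             # two protocols at the stock rate
print(max_protocols_within(20_000))  # T4k: handles 20kpps, not 30kpps
print(max_protocols_within(15_000))  # MX960: the post's "maybe 15kpps" guess
```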

There clearly is some non-configurable per-linecard policer which limits JunOS control-plane performance to a much lower rate than it can realistically handle. This is as stupid as 'mls rate-limit unicast cef receive' on the 7600: essentially you're underclocking your control plane, making it die under lower than maximum load. If we could control traffic at the pps level, it wouldn't matter, as 5kpps is plenty for BGP convergence. But as we must limit in bps and prepare for the worst-case scenario, policer values need to be ridiculously small. Maybe you allow BGP, VRRP, DHCP, PIM and BFD from customers. If you want a VRRP flood to kill only VRRP and no other service, you need a separate policer for each, but in aggregate they can't exceed 4Mbps. Shared equally, that leaves you 800kbps per protocol, even though BGP is the only capacity-hungry protocol of the lot.
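The budget split can be written down as a small Python helper. The equal split reproduces the 800kbps figure. The weighted variant is only an illustration of giving BGP a larger share, and the weights in it are invented for the example, not a recommendation.

```python
TOTAL_BPS = 4_000_000
PROTOCOLS = ["BGP", "VRRP", "DHCP", "PIM", "BFD"]


def equal_split(total_bps: int, protocols: list) -> dict:
    """Give every protocol the same slice of the aggregate budget."""
    share = total_bps // len(protocols)
    return {name: share for name in protocols}


def weighted_split(total_bps: int, weights: dict) -> dict:
    """Split the aggregate budget in proportion to per-protocol weights."""
    total_weight = sum(weights.values())
    return {name: total_bps * w // total_weight for name, w in weights.items()}


print(equal_split(TOTAL_BPS, PROTOCOLS))

# Hypothetical weights: BGP gets six shares, everything else one share each.
weights = {"BGP": 6, "VRRP": 1, "DHCP": 1, "PIM": 1, "BFD": 1}
print(weighted_split(TOTAL_BPS, weights))
```

Either way the sum stays at 4Mbps, so anything given to BGP for convergence comes straight out of the other protocols' already tiny allowances.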

Solution

It's confusing why control plane protection is even a user-configurable feature. It could be strictly restricted and pps-limited per session, dynamically as services are turned on. This is perfectly doable in Trio hardware, with no user input needed.
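As a rough illustration of what "pps-limited per session, dynamically" could mean, here is a minimal Python sketch of a per-session token bucket that counts packets rather than bits. Everything in it is hypothetical: the class names, the session key, and the rate and burst values are made up for the example and say nothing about how Trio would actually implement it.

```python
from dataclasses import dataclass


@dataclass
class PpsBucket:
    """Token bucket that counts packets, not bits."""
    rate_pps: float      # tokens (packets) added per second
    burst: float         # bucket depth in packets
    tokens: float = 0.0
    last: float = 0.0    # timestamp of the previous packet, in seconds

    def __post_init__(self):
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SessionPolicers:
    """One bucket per configured session; unknown sessions are dropped."""

    def __init__(self):
        self._buckets = {}

    def session_up(self, key, rate_pps: float, burst: float) -> None:
        self._buckets[key] = PpsBucket(rate_pps, burst)

    def session_down(self, key) -> None:
        self._buckets.pop(key, None)

    def allow(self, key, now: float) -> bool:
        bucket = self._buckets.get(key)
        return bucket is not None and bucket.allow(now)


policers = SessionPolicers()
policers.session_up(("bgp", "192.0.2.1"), rate_pps=10, burst=2)
print([policers.allow(("bgp", "192.0.2.1"), 0.0) for _ in range(3)])
print(policers.allow(("bgp", "198.51.100.7"), 0.0))  # no such session configured
```

The point of the design is that the table is driven by configured services: a neighbor being configured creates its bucket, and traffic with no matching session never reaches the control plane at all.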

And at the very least you should be able to apply L2 filters on L3 interfaces, so you could drop everything except the IPv4, IPv6 and ARP ethertypes and remove most of the hard-to-protect attack vectors.
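The logic of such an L2 filter is simple enough to sketch in Python. This is an illustration of the matching only, not Junos configuration. It assumes untagged Ethernet frames (a VLAN tag would need extra handling), and the function name is made up for the example.

```python
import struct

# The only ethertypes the post wants to let through on an L3 interface.
ALLOWED_ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6", 0x0806: "ARP"}


def l2_verdict(frame: bytes) -> str:
    """Return 'accept' or 'drop' for an untagged Ethernet frame."""
    if len(frame) < 14:
        return "drop"  # too short to carry an Ethernet header
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    # Values below 0x0600 are 802.3 length fields, i.e. LLC/SNAP frames such as
    # IS-IS PDUs and PVST+ BPDUs; they are not in the allow list, so they drop.
    return "accept" if type_or_len in ALLOWED_ETHERTYPES else "drop"


def frame(type_or_len: int, payload: bytes = b"") -> bytes:
    """Build a dummy frame with zeroed MAC addresses for the examples below."""
    return bytes(12) + struct.pack("!H", type_or_len) + payload


print(l2_verdict(frame(0x0800)))                   # IPv4
print(l2_verdict(frame(0x88CC)))                   # LLDP
print(l2_verdict(frame(0x0030, b"\xfe\xfe\x03")))  # 802.3 + LLC, as used by IS-IS
```

A filter like this would take care of ISIS, PVST and LLDP in one go, since none of them arrives with an allowed ethertype.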

If you need something useful today, put core and edge on different MPCs and use the ddos-protection feature so that edge cannot congest core. For a single-linecard system like the MX80 there unfortunately isn't any really practical way today.

1 comment:

  1. It could be worse, try the control-plane filters on an EX (even the 8200, which is supposed to be capable of being a real router). They have no policer capabilities on lo0 filters at all, but even if you explicitly reject the traffic, it doesn't actually drop the packets until they've filled up the internal links and killed your control plane.

    As far as I can tell there is no point in configuring an lo0 filter at all, they're purely cosmetic. Even worse, they actually prevent you from seeing the offending traffic in tcpdump if you ARE under attack, while still allowing the attack to succeed. If you want to protect your control plane on those boxes, the only solution is to deploy per-interface ingress filters on EVERY interface, which has its own brand of issues. :)
