2016-04-22

Junos and DHCP relay

There are two different ways to configure DHCP in Junos, bootp helper and dhcp relay. These work in very different manner, bootp helper is being phased out and is not supported for example in QFX10k. Behaviour of bootp helper is obvious, it works like it works in every other sensible platform. Behaviour of dhcp-relay is very confusing and it's not documented at all anywhere.

If it's possible in your platform to configure bootp helper, do it. If not, complain to Junos about dhcp-relay implementation and ask them to fix it. The main problem with dhcp-relay implementation is that once you've configured it, you're punting all dhcp traffic in all interfaces. Normal transit traffic crossing your router is subject to this punt, so transit customers will experience larger jitter and delay of packets being punted and almost certainly reordering, because the non-dhcp packet that came after but was not subject to punt will be forwarded first. Technically reordering does not matter, as long as it does not happen inside a flow, but it's not desirable.

How the sequence of operation works in Junos for dhcp-relay:

  1. Transit packet touches ingress NPU
  2. After L2 lookup, before L3 lookup ingress NPU punts the transit packet to PFE CPU
  3. PFE CPU, for reasons obvious to people on drugs encapsulates the transit DHCP packet with new IP, UDP and 28magic bytes header. [ip_new, udp_new, 28_bytes, ip_old, udp_old, payload]
  4. PFE CPU sends the encapsulated DHCP packet to RE, so that JDHCPD can inspect it
  5. JDHCPD determines if it's relevant to it or not, in our case it's not, in normal configuration it proceeds to drop the transit packet as it was not interesting to us!
I don't disagree that on some instances it may be desirable to snoop transit DHCP messages, to see what server unicasts to our client, but that is small percentage of the traffic. We know the DHCP servers we have, why can't we decide what is punted and what is not?

If operator is doing her due diligence and has exact lo0 filter which only allows things to enter control-plane which are actually needed there, all of this is broken, but in a very confusing way. Firstly these transit DHCP packets are NOT subject to hardware level lo0 filter, you cannot drop them there, which to me makes sense, they are transit, lo0 filter shouldn't affect them. However after punt and encapsulation they magically become subject to lo0 filter on the software side! In your typical lo0 filter with ultimate 'discard all' term, you'll see these x.y.z.k => 255.255.255.255 packets being discarded and you might be bit confused how on earth we're getting DADDR 255.255.255.255 from our neighbouring core routers! So perhaps you'll run 'monitor traffic interface X matching "host 255.255.55.255 detail"' to understand better what is going on, well, you won't see any of these 255.255.255.255 packets you were dropping, because Junos has added support in their own tcpdump for this DHCP encapsulation, so you're actually seeing the original embedded headers, not the new top headers (where the host 255.255.255.255 is matching). If you add 'write-file dhcp.pcap' and open that file in wireshark, you'll see the injected new headers and packet being interpreted as DHCP, including the original header portion, which makes out for a VERY confusing looking DHCP packet. If you manually pop from the packet ip_new, udp_new and 28 bytes, you'll see the expected transit packet.

Strangeness does not end here, you can easily discard/log/count these 'destination-address 255.255.255.255' in lo0 filter (in software, not in hardware), but when you change that 'discard' to 'accept' you won't see anything in counter or logs anymore! Yet it is crucial that you do accept them, because otherwise they are dropped before jdhcpd has chance to process them, and you're killing all your transit DHCP. Even after you add this confusing rule to permit transit traffic to enter jdhcpd you're still going to be dropping all transit dhcp until you configure 'set forwarding-options dhcp-relay forward-snooped-clients all-interfaces'.

Problem here of course is, now we're not discriminating in lo0 filter the transit packets hitting SW processed lo0, and all real DHCP discover packets coming from local interfaces. I usually specifically only allow DHCP discover from interfaces where I've enabled DHCP. But with this dhcp-relay configuration, I have to allow it everywhere! I'm no longer protected from customers having L2 loops and injecting wire-rate of DHCP Discover to my control-plane, I now have to accept those, because I cannot discriminate in lo0 filter if they are transit or discovers.

What should JNPR do? Continue to support bootp helper style operation, where no transit traffic is ever punted. Make dhcp-relay work like that out-of-the-box, and people who need to snoop transit, must enable it and give them tools to enable it based on various keys, saddr/daddr, interface, npu. I'm pretty sure there are now bunch of JNPR boxes which silently drop transit DHCP, because there is no documentation anywhere on how this works.

I'm not as convinced as JTAC that this isn't simply a bug, it feels odd that all this really would be the intended behaviour. The telling problem here is, that JNPR is somehow avoiding lo0 evaluation in HW, I suspect it is, because the packet is not classified as IPv4 protocol but DHCP-snooping protocol (yes, Junos has ipv4, ipv6, mpls, bridge, fibrechannel and dhcp-snooping protocol route tables!) and as it's not IPv4 it's not subject to HW lo0 filter. However they seem to drop the ball after punt, making the embedded packet subject to SW lo0 filter, I think it really should not behave like this.

I wish I could say this is only situation when transit traffic can hit lo0 filter, but that's not true. Some JNPR platforms punt transit IP options and transit IPv6 HBH through lo0 filter. So in those cases you need to match on all local addresses and ip-options and drop, then second rule to allow all ip-options, unless you want to drop also all transit ip-options (which is probably just fine). Pretty much no one knows this, so people likely don't know what is their network's policy regarding ip-options and actual policy is just determined by your network upgrade cycle.