There is quite often chatter about L3 incompletes, and it seems there are a lot of opinions about what they are. Maybe some of these opinions are based on a particular counter bug in some release. Juniper has also introduced a toggle that stops the counter from working. It seems very silly to use this toggle, as the counter is really one of the few ways you can gather information about broken packets via SNMP.
What they (at least) are not
- Unknown unicast
- CDP
- BPDU
- Packet from connected host which does not ARP
- Packet from unconfigured VLAN
What they (at least) are
- IP header checksum error
- IP header error (impossibly small IHL, IP version 3, etc)
- IP header size does not match packet size
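The three cases above can be sketched as checks on a raw IPv4 packet. This is only an illustration of the kinds of errors involved, not how Trio implements them: the function names, the return labels and the order of the checks are my own assumptions.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the 16-bit words.

    Run over a header that still contains its checksum field, a valid
    header gives 0.
    """
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def classify(packet: bytes) -> str:
    """Return 'ok', 'bad-hdr', 'bad-len' or 'checksum' (hypothetical labels)."""
    if len(packet) < 20:                    # too short to hold any IPv4 header
        return "bad-hdr"
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:             # e.g. IP version 3, impossibly small IHL
        return "bad-hdr"
    hdr_len = ihl * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    # Header size and total length must agree with what actually arrived.
    if len(packet) < hdr_len or total_len < hdr_len or total_len > len(packet):
        return "bad-len"
    if ipv4_checksum(packet[:hdr_len]) != 0:
        return "checksum"
    return "ok"
```

Feeding it a valid packet with one header bit flipped should give 'checksum', which is the case the rest of this post is about.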
Troubleshooting
So if you are seeing them, what can you do? As it is an aggregate counter for many different issues, how do you actually know which one it is, and is there a way to figure out who is sending them? Luckily, for Trio-based platforms the answers are highly encouraging: we have very good tools to troubleshoot the issue.
To figure out exactly what they are, you first need to find your internal IFD index (not the SNMP ifIndex):
im@ruuter> show interfaces xe-7/0/0 |match index:
Interface index: 224, SNMP ifIndex: 586
After figuring out the index, we can log in to the PFE and check the stream counters for that IFD:
im@ruuter> start shell pfe network fpc7
NPC platform (1067Mhz MPC 8548 processor, 2048MB memory, 512KB flash)
NPC7(ruuter vty)# show jnh ifd 224 stream
ifd = 224, Stream = 33
Stream ID: 33 (inst = 0)
Cntr : 0x00c0f102
Encap : Ether
Encap = 0, StartNH = 0xc040e1
lacp:+, stp:-/0, esmc:-, lfm:-, erp:-, lldp:-, mvrp:-/-, smac_mcast_clear:-,
vc:-, dc:-, natVlan:-/4095, native tpid 0, tpidMask:0x0001
Input Statistics: 0003126353191368 pkts, 3351074223070319 bytes
Detail Statistics:
rx0: 0000000000000000 pkts, 0000000000000000 bytes
rx1: 0000007792865413 pkts, 0000923636240746 bytes
rx2: 0003118560325955 pkts, 3350150586829573 bytes
drop0: 0000000000000000 pkts, 0000000000000000 bytes
drop1: 0000000000000000 pkts, 0000000000000000 bytes
drop2: 0000000000000000 pkts, 0000000000000000 bytes
unknown-iif: 0000000000000000 pkts, 0000000000000000 bytes
checksum: 0000000000625225 pkts, 0000000268883747 bytes
unknown-proto: 0000000000024793 pkts, 0000000006398918 bytes
bad-ucastmac: 0000000218713670 pkts, 0000034352327467 bytes
bad-ucastmac-IPv6: 0000000002160892 pkts, 0000000172764339 bytes
bad-smac: 0000000000000000 pkts, 0000000000000000 bytes
in-stp: 0000000000000000 pkts, 0000000000000000 bytes
out-stp: 0000000000000000 pkts, 0000000000000000 bytes
vlan-check: 0000000000000000 pkts, 0000000000000000 bytes
frame-errors: 0000000000000108 pkts, 0000000000014451 bytes
bad-IPv4-hdr: 0000000000033339 pkts, 0000000012708126 bytes
bad-IPv4-len: 0000000000070901 pkts, 0000000025836710 bytes
bad-IPv6-hdr: 0000000000000133 pkts, 0000000000009508 bytes
bad-IPv6-len: 0000000000000993 pkts, 0000000000071269 bytes
out-mtu-errors: 0000000000003391 pkts, 0000000005122005 bytes
L4-len: 0000000000038084 pkts, 0000000001765247 bytes
Stream Features:
Topology: stream-(33)
Flavor: i-root (1), Refcount 0, Flags 0x1
Addr: 0x4513f3c8, Next: 0x4fdd3c78, Context 0x4513f3c0
Link 0: da40602e:32000303, Offset 12, Next: da40602e:32000303
Link 1: 00000000:00000000, Offset 12, Next: 00000000:00000000
Link 2: 00000000:00000000, Offset 12, Next: 00000000:00000000
Link 3: 00000000:00000000, Offset 12, Next: 00000000:00000000
Topology Neighbors:
[none]-> stream-(33)-> flist-master(stream)
Feature List: stream
[pfe-0]: 0xda40602e32000303;
f_mask:0x80000000000000; c_mask:0x8000000000000000; f_num:9; c_num:1, inst:0
Idx#8 iif-lookup:
[pfe-0]: 0xda40602e32000303
Here we can see 'checksum', 'bad-IPvX-hdr' and 'bad-IPvX-len'. At least all of these count as 'L3 incompletes'; there may be other reasons, but that's the absolute minimum. We can also see aggregate exception counters for all the interfaces on a given Trio; we'll need some of this information later:
NPC7(ruuter vty)# show jnh 0 exceptions terse
Reason Type Packets Bytes
==================================================================
PFE State Invalid
----------------------
sw error DISC(64) 197636729 13174899216
invalid fabric token DISC(75) 68 4311
unknown family DISC(73) 24793 6398918
iif down DISC(87) 4516 337076
egress pfe unspecified DISC(19) 5857595 1900968530
Packet Exceptions
----------------------
bad ipv4 hdr checksum DISC( 2) 660667 289608849
bad IPv6 options pkt DISC( 9) 3 216
bad IPv4 hdr DISC(11) 33339 12708126
bad IPv6 hdr DISC(56) 133 9508
bad IPv4 pkt len DISC(12) 108203 33978274
bad IPv6 pkt len DISC(57) 1009 72421
L4 len too short DISC(13) 143678 6622571
frag needed but DF set DISC(22) 21915 33137575
ttl expired PUNT( 1) 51770371 3124910479
IP options PUNT( 2) 777 108006
frame format error DISC( 0) 108 14451
my-mac check failed DISC(28) 218721556 34352693971
my-mac check failed IPv6 DISC(58) 2161073 172779483
DDOS policer violation notifs PUNT(15) 2438770 326360620
Firewall
----------------------
firewall discard DISC(67) 1284437202 544450843517
firewall discard V6 DISC(101) 34130853 4143870718
Routing
----------------------
discard route DISC(66) 3740477632 552767152133
discard route IPv6 DISC(102) 3894436247 281512808097
hold route DISC(70) 471 35151
resolve route PUNT(33) 10 776
resolve route V6 PUNT(69) 818 63097
control pkt punt via nh PUNT(34) 993912636 45817932600
host route PUNT(32) 228855708 19337408523
mcast host copy PUNT( 6) 2591 422909
reject route PUNT(40) 2855554 402183663
reject route V6 PUNT(68) 9277 1901206
The counters for L3 incompletes are 'bad ipv4 hdr checksum', 'bad IPvX hdr' and 'bad IPvX pkt len'. Notice that there is no IPv6 header checksum counter, obviously because IPv6 has no header checksum. It was deemed unnecessary, but we'll shortly see this may have been a bad decision.
Now why could we possibly see L3 incompletes increasing? If a frame is mangled on the wire, the ethernet CRC (which is much stronger than the IP checksum) fails and we drop the frame much earlier, without ever checking any of these. So clearly we received a packet which had a correct ethernet CRC and yet was broken.
We recently had an issue where pretty much all egress PE boxes started logging 'L3 incompletes' because the IPv4 header checksum was failing. The counters incremented maybe 20 times per hour, so very moderately. But how is this possible? If someone had generated a broken IP packet and sent it to us, we'd have dropped it at the ingress PE box and incremented these counters there, yet the packet traversed the MPLS core all the way to the egress PE. So clearly we were mangling the packets ourselves. Obviously the core is like Jon Snow: it's just an MPLS frame to it, it does not need to know it's IP nor should it verify it for correctness, so the core will happily pass broken packets around. Figuring out who is mangling the packets seems like a complex problem. Luckily Trio gives us the ability to capture exception packets; here we need to use the exception number we saw above in the exception counters:
NPC7(ruuter vty)# debug jnh exceptions 2 discard
NPC7(ruuter vty)# debug jnh exceptions-trace
NPC7(ruuter vty)# show jnh exceptions-trace
[1768975] jnh_exception_packet_trace: ###############
[1768976] jnh_exception_packet_trace:
[iif:344,code/info:130/0x0,score:tcp|(0x40),ptype:2/0,orig_ptype:2,offset:18,orig_offset:18,len:60]
[1768977] jnh_exception_packet_trace: 0x00: 20 40 82 00 00 00 01 58 00 12 00 3c 80 00 00 20
[1768978] jnh_exception_packet_trace: 0x10: 12 00 00 3c 00 00 00 00 00 28 c0 da 07 c0 00 00
[1768979] jnh_exception_packet_trace: 0x20: 12 1e d5 97 f8 88 47 00 00 03 3d 45 00 00 28 1e
[1768980] jnh_exception_packet_trace: 0x30: 4f 40 00 87 06 44 d4 XX XX 39 58 XX XX 3f 6a d0
[1768981] jnh_exception_packet_trace: 0x40: c0 e2 82 4f 0d a2 2d cc ec aa 6b fd 78 0f 10 22
[1768982] jnh_exception_packet_trace: 0x50: 60 cd 42 00 00
It should be a lot simpler for us to troubleshoot the issue now. We just figure out where that SADDR (XX XX 39 58) is entering the network; with any luck, the mangling node is somewhere at the edge of the network and we will find some common theme in the source addresses. If it's in the core, you're pretty much out of luck: you'll replace the whole network or accept that you mangle something.
I wrote a little script to which you give the IP header, the incorrect checksum and the correct checksum, and it shows what each 16-bit field would have to be to result in a correct checksum. With luck you can use it to figure out which part of the packet is being mangled: some of those 16-bit fields are bound to need impossible or invalid values to make the checksum correct, which allows you to exclude them and concentrate your efforts on the rest of the fields. But a field like ID can be anything, so it is impossible to exclude. SADDR is also problematic, DADDR usually not (if it had been the needed value, the packet would not have been routed to my network).
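The script itself is not reproduced here, so the following is only a minimal sketch of the same idea, with a simpler interface of my own choosing: it takes just the header as captured (stored checksum included) and, for every 16-bit word except the checksum, prints the value that word would need for the stored checksum to verify. The names fold and candidates and the sample header are hypothetical.

```python
import struct

CHECKSUM_WORD = 5  # the IPv4 header checksum is the sixth 16-bit word

def fold(total: int) -> int:
    """Fold carries back in, giving a 16-bit ones' complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def candidates(header: bytes) -> list[tuple[int, int, int]]:
    """Return (word_index, value_seen, value_needed) for each non-checksum word.

    value_needed is what that word alone would have to be so that the
    header, with its stored checksum untouched, sums to 0xFFFF.
    """
    words = list(struct.unpack(f"!{len(header) // 2}H", header))
    result = []
    for i, seen in enumerate(words):
        if i == CHECKSUM_WORD:
            continue
        rest = fold(sum(words) - seen)      # sum of every other word
        result.append((i, seen, ~rest & 0xFFFF))
    return result

# Made-up sample: a valid 20-byte header whose TTL was changed from 0x40 to 0x41.
header = bytes.fromhex("4500007300004000" "4111b861c0a80001c0a800c7")
for index, seen, needed in candidates(header):
    print(f"word {index}: seen 0x{seen:04x}, needed 0x{needed:04x}")
```

For this sample, word 4 (TTL and protocol) would need 0x4011, which is the original value, while word 0 would need 0x4400, an IHL of 4, which is impossible and so excludes that field.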
Why does this happen? How often does it happen? I really would like to know. Obviously, as the CRC is correct, it's not happening because of errors in links, optics and so on. My guess is that it happens mostly because of bad memory in the forwarding logic. A packet can touch many memories on its path, ring => sram => dram => sram => ring. Do all of these have ECC? Does a PHY ring ever have ECC? I don't know, but clearly there can be problems, as L3 incompletes exist. How common are they? Probably a lot more common than we think, as we only know about mangling when it happens to hit the IPv4 header (IPv6 header mangling would usually pass unnoticed). The IPv4 header is 20B and packet length is typically 1500B, so are we only seeing the 1.3% tip of the iceberg?
I guess the key takeaway here is: don't use 'ignore-l3-incompletes', monitor your 'L3 incompletes' via SNMP, figure out why they are happening and fix them. Especially check right now whether your egress PE has L3 incompletes from the core.
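The iceberg estimate can be worked through as a back-of-the-envelope sketch. It assumes that mangling hits a uniformly random byte of the packet and that every packet is 1500B with a 20B header, neither of which is established; the function name is made up for illustration.

```python
def estimated_mangled_per_hour(observed_per_hour: float,
                               header_len: int = 20,
                               packet_len: int = 1500) -> float:
    """Scale detected header corruption up to whole-packet corruption.

    Assumes corruption lands uniformly anywhere in the packet, so only
    header_len / packet_len of all mangled packets are ever detected.
    """
    return observed_per_hour * packet_len / header_len

# Share of a 1500B packet covered by the IPv4 header checksum.
print(f"{20 / 1500:.1%}")
# Roughly 20 checksum failures per hour were seen on the egress PEs.
print(estimated_mangled_per_hour(20))
```

Under those assumptions, 20 detected failures per hour would correspond to about 1500 mangled packets per hour.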