2011-11-04

junos vrf-import funnies

Consider this configuration:

> show configuration routing-instances VRF1 instance-type vrf; route-distinguisher 42:1; vrf-import [ VRF1-IMPORT VRF-DEFAULT-IMPORT ]; vrf-export [ VRF1-EXPORT VRF-DEFAULT-EXPORT ]; vrf-table-label; > show configuration policy-options policy-statement VRF1-IMPORT from community [ VRF1 VRF2 ]; > show configuration policy-options policy-statement VRF-DEFAULT-IMPORT term cust_routes { from protocol bgp; then default-action accept; } > show configuration policy-options community VRF1 members target:42:1; > show configuration policy-options community VRF2 members target:42:2;

If you configure this on any router on your network, it'll work, VRF will import correct and only correct routes. This will give you assumption, that VRF import in JunOS works like this:

  1. start with empty array of routes to evaluate policy against
  2. when you hit 'match community' push matching routes from bgp.l3vpn.0 to the list
  3. evaluate rules normally against the list

If you create multiple of these to single router, and you only have single 'from community [ X ]' in each, it also works perfectly. However, if you have more than one community in 'from community' AND you have more than one VRF using the 'VRF-DEFAULT-IMPORT' things go wrong. If we have three routes:

  1. 10.10.1.0/24 RT:42:1
  2. 10.10.2.0/24 RT:42:1 RT:42:2 RT:42:3
  3. 10.10.3.0/24 RT:42:1 RT:42:3

VRF1 will correctly import all of these, but it will also leak #2 to other VRFs in same PE having 'VRF-DEFAULT-IMPORT', it won't leak #1 or #3. It's not actually bug, but the fact that it works at all, is side-effect of optimization when route hits exactly 1 'show bgp targets' entry. And evaluation is not done, how the results in the simple test might indicate.

2011-11-03

no usage scenario for ssh-agent forwarding

Many people, especially those in consulting business have need to access multiple different organization 'jump boxes' from which they can ssh towards the organization servers. And due to security it makes sense to have different ssh key being allowed for different organization servers. For convenience people often allow ssh-agent towards the 'jump boxes'.

Problem with ssh-agent is, that it has no idea who is requesting the key signing, it could very well be organization1 evil admin asking for organization2 key, when sshing into organization2 jump-box, and your agent would simply allow this.

One solution to the problem could be that when ever signing is requested, user gets prompt 'localhost < organization2-jump < organization2 requests sign of organization1 identity, allow yes/no, [ ] always'. Now you'd have idea if sign request is legit or not. However this would require protocol changes to ssh, as ssh-agent has no idea who is requesting signing much less of the full path, which would be absolutely needed to make this feature work.

So I asked openssh dev mailing list, how this problem should be solved. Turns out there is recently added feature in openssh, which could potentially remove need for agent forwarding completely, to access organization1-server through organization1-jump you'd do ssh -oProxyCommand='ssh -W %h:%p organization1-jump' organization1-server, now obviously this is inconvenient, especially if there are more than 1 box through which you need to jump. .ssh/config can help somewhat:

# cat >> ~/.ssh/config Host org1-ultimate ProxyCommand ssh -W %h:%p org1-secondjump Host org1-secondjump ProxyCommand ssh -W %h:%p org1-firstjump ^d

Now you'd ssh 'ssh org1-ultimate', which would really go to org1-firstjump -> org1-secondjump -> org1-ultimate. ssh key would work without forwarding it, and transit nodes wouldn't see unencrypted data. However, still seems like large overhead, what if there would be syntactic sugar do do this:

# cat >> .ssh/config Host org1-ultimate path org1-firstjump, org1-secondjump ^d # ssh org2-firstjump,org2-secondjump,org2-ultimate # ssh org1-ultimate

2011-10-02

Playing hide and seek with JunOS

JunOS has some commands which either are unsupported, do not work in platform you're using, undocumented or unnecessary for vast majority of operators, these commands are hidden in the UI so they are only accessible if you know what (and more importantly why) you want (them).

Today I was searching for a way to quiet my SRX210HE-POE as it makes annoyingly lot noise, I failed to find configuration way to force it to normal spinning speed, but I did notice that CLI exposes hidden commands. I've actually found same in IOS several years back and wrote little perl script to search for them (exec only), it proved bad idea as several of them purposely crash your system. If you want to dig deeper, in IOS difference is incomplete and invalid command, however actually some commands are truly hidden in IOS, particular example is the toggle for unsupported transceivers.

Neither the JunOS nor IOS issue are something you can blame vendor at, vendor isn't trying to stop you from using them, they just want to be very clear that if you use them TAC ain't go your back.

The code is quick 2h hack (running it takes longer, but I'm certain the search/walk can be optimized) and it depends on ssh/telnet library I've done. This library was meant for optimal way to do exec commands, not configuration commands. And best way to do exec commands in JunOS is to open new ssh channels with exec('command') per command, this way you never ever need to do screen scraping for prompt, as when ssh channel closes, command has finished. Unfortunately this approach does not work for config, and I didn't bother disabling forcing this behavior in the library, so right now it only supports telnet (if you really want ssh, hack it to assume remote is 'cisco' then it'll open shell, instead of exec, since IOS does not support multiple channels over existing ssh connection).

2011-08-15

When should you advertise default route?

Never

There are two typical scenarios when people carry default route in dynamic routing protocol, I'll address these separately and explain why you shouldn't do it, and what you should do instead.

CE (eBGP) PE

This is probably the most common scenario, maybe you're giving your customer default route, maybe it's your own firewall or really any situation where neighbor won't carry full routing table and neighbor isn't strictly same administrative domain.

Problem with default route here is, that if your PE gets disconnected from core, you're still originating the default route and CE is unaware of this and you're blackholing customer traffic until BGP is manually shutdown. You could conditionally advertise default, but that is just useless overhead, instead of default you should advertise to CE any aggregate route which is originated from multiple core boxes, such as your PA aggregate, or really any stable route originated from multiple places, but not local PE.

Customer would just add this to their router:

# ios ip route 0.0.0.0 0.0.0.0 192.0.2.0 name floating_default # junos route 0.0.0.0/0 { qualified-next-hop 192.0.2.0 { interface xe-0/0/0.0; } resolve; }
Now if your PE gets disconnected from core, you'll stop originating 192.0.2.0/24 and this ip route no longer will recurse to CE<->PE interface. If there is no more 192.0.2.0/24 route available anywhere, static route is invalid, and next available default route can be used. If there still is 192.0.2.0/24 available via alternative provider that will be automatically used.

Slight cosmetic complain is that if you add interface to the static route, IOS disables recursion, so you cannot enforce that the static route will disappear if next hop does not recurse behind that one interface. But it is purely cosmetic, as functionality will remain regardless if 192.0.2.0/24 will continue to exist or completely disappear. If it will continue to exist, customer will just need to local-pref/med 192.0.2.0/24 to have expected backup default selection.

PE router without full table

Typical solution is to have two RR iBGP peers to originate default route. This has the problem that RR probably aren't always in optimal forwarding path, especially in single fault, but in many cases never. So you'd stop iBGP from originating default, and you'd instead add this to every router having full bgp view:

interface Loopback1 description Anycast default ip address 192.0.2.0 255.255.255.255 no ip redirects no ip proxy-arp ! router isis passive-interface Loopback1
Obviously PE box would just have static default towards 192.0.2.0, this way PE would always forward packet towards nearest core box which is up and has full bgp table, so you always get best path egress forwarding, without having full bgp view and without having best path RR. Effectively it is as if every router has iBGP session to you and is originating default

Exception that proves the rule

If the end device does not support recursing routes, then obviously this won't work. And there still are such devices, though it's unsure if you want to be routing in such devices to begin with

2011-08-11

IPv6 ACL bypass

IPv6 designers recognized that IPv4 header has several faults, these were addressed to a different degree. Particularly annoying was IPv4 options which caused TCP/UDP/ICMP data to shift, as it made IPv4 header length variable. IPv6 header is fixed length, there is 'next-header' option, which will instruct how to parse data after IP header. Typically 'next-header' would be TCP, UDP or ICMP, and rest of packet would be exactly like in IPv4 (apart from mandatory checksum in UDP).

Where the complexity (some might say design fault) is that 'next-header' could be any large number of more exotic extension header, each of which have 'next-header' field themselves. Standard does not specify any limitation how many headers you could have, so you need to be able to parse packet up-to MTU length. The final extension header typically would contain TCP/UDP/ICMP and normal IPv4 style packet would follow.

Unfortunately no practical router has MTU wide view to the packet, you have 64B, 128B or 256B view, after which you are completely unaware of the packet content, it's just bits in memory which you cannot process in any meaningful way. Your PC won't have same problem, it does not have specialized hardware to quickly forward large amount of packets, so your PC will happily parse packet up-to the MTU length.

What this translates to is, that you can craft IPv6 packet where TCP port information is after view of router, so router will not know it is TCP packet nor what ports it is using, but the receiving PC will understand it normally. So if you have ACL rule where you are dropping some tcp/udp/icmp packets then allowing rest, those rules can be by-passed in very typical router. Example could be:

term my_smtp { from { destination-address 2001:db8::42/128; } then accept; term no_spam { from { next-header tcp; destination-port 25; } then discard; } term accept { then accept; }

Now this will be bypassed, because our 'next-header' is not tcp, but contains extension-header. But far end unmodified PC with unmodified software will treat it normally. Or maybe it is server where you allow ssh from management net, drop all packet to tcp/22 and permit rest. As long as you permit rest, instead of discard rest, bypass will work

How this should be fixed? Well IPv6 should have modified ICMP/TCP/UDP/etc to contain 'next-header' field, and mandated that they appear before any extension header, forcing non-extension headers to live in fixed bit places. Obviously ship has sailed for this fix. Now it is heavily platform dependent what will happen, cisco.com claims that they punt packets which they fail to parse correctly, this is sane, just be sure to police the punts and you have pretty good solution. Juniper before trio is pretty much lost cause.

Juniper trio is behaving remarkably well, but CLI is lagging behind. Trio will actually find TCP/UDP headers as long as there are fewer than 29 'destination-option' headers before TCP/UDP. If there are 30 'destination-option' headers before TCP/UDP packet is dropped in hardware by 'bad IPv6 options pkt DISC(9)' exception. Problem is CLI is unaware of this capability and you don't have 'protocol tcp' to define you want TCP, you only have 'next-header TCP' which only monitors the first next-header field in IP packet. If you omit 'next-header' and just match 'destination-port' and you have 29 or fewer 'destination-option' headers, JNPR will match correctly, you just lose ability to differentiate between tcp and udp. This is true for 10.4R4 and 11.2R1.

How trio should be fixed is by adding 'protocol' match in CLI (trio already classifies packet correctly) and 'bad IPv6 options pkt DISC(9)' exception should punt (via policer) instead of discard, so that RE can parse the packet correctly. You could ask that what /realistic/ packet would be dropped by trio parser, but I think that is beside the point, IPv6 standard allows for it, so you should parse it, even via punt with poor performance.

You can see packets failing trio parser via PFE:

# show jnh 0 exceptions terse Reason Type Packets Bytes ================================================================== Packet Exceptions ---------------------- bad IPv6 options pkt DISC( 9) 24808567 26495549556