[vpn-help] little surprises on /dev/tap0
cgull at glup.org
Fri Jan 2 22:10:20 CST 2015
I'm using ike version 2.2.1 on a FreeBSD 10.1 machine. I ran into a problem that was rather challenging to debug. The primary bug turns out to be in the FreeBSD kernel, but it's something that I think iked could easily handle more robustly.
The stage: A FreeBSD 10.1 system, using a kernel configuration that adds NAT-T to GENERIC (options IPSEC; options IPSEC_NAT_T; device crypto). I also had to build ike from source to enable NAT-T there, since FreeBSD's binary package is built without it. The machine is a desktop with a somewhat complicated network configuration: it runs one client VPN connection with iked to a Cisco concentrator, another client VPN with OpenVPN to another location, VirtualBox's netgraph hooks into the kernel for bridging VMs, a jail with an independent IP address (note that this kernel does not have VIMAGE enabled), and probably one or two more things I've missed.
When I first got the Shrewsoft daemons running, everything appeared to run fine, except that traffic over the VPN would wedge after some indeterminate amount of time, typically 10-60 minutes. iked would keep running fine, kept on negotiating new keys with no problems, and logged nothing unusual. After some extended debugging, I finally found the problem: somehow, the kernel is sending ARPs from the jail's IP address to the tap0 interface iked is using. I don't know how yet; there are no routes that would explain it. After a while, these packets fill the interface's input queue and further packets are dropped, since nobody is reading from /dev/tap0 and consuming them. Since the kernel's IPsec processing is on the read side of that queue, IPsec traffic stops getting through. Adding a little shell script that does "cat /dev/tap0 > logfile" makes the problem go away.
So there's clearly a bug on the kernel's side, but I do think that there is one thing iked could do: it could consume input from the tun/tap interfaces that it opens, and probably log that traffic, since it is in general unexpected. iked shouldn't *have* to do this, because those packets shouldn't appear in the first place. On the other hand, since it opened /dev/tap0, it is the program responsible for consuming input from that device. Doing this would make iked more robust in the face of kernel forwarding errors. The fix looks pretty simple to my eyes: spawn a thread that loops forever reading that device, optionally logging when input is actually seen. On a kernel that isn't offering up unexpected packets, that thread will just sit waiting forever in read() and won't have any significant cost.
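To make the suggestion concrete, here is a minimal sketch of such a drain loop. It is written in Python purely for illustration and is not iked code; the names drain_device and start_drain_thread are hypothetical, and it assumes the daemon already holds an open file descriptor for the tap device.

```python
import os
import threading


def drain_device(fd, log=None, bufsize=65536):
    """Read and discard input on fd until EOF or a read error.

    Each successful read is counted, and reported through the optional
    `log` callable (hypothetical hook; a daemon would use its own logger).
    Returns the number of reads that produced data.
    """
    frames = 0
    while True:
        try:
            data = os.read(fd, bufsize)  # blocks while nothing is queued
        except OSError:
            break  # descriptor closed or device went away
        if not data:
            break  # EOF
        frames += 1
        if log is not None:
            log(f"unexpected input on tap device: {len(data)} bytes")
    return frames


def start_drain_thread(fd, log=None):
    """Run drain_device in a background thread and return the thread."""
    t = threading.Thread(target=drain_device, args=(fd, log), daemon=True)
    t.start()
    return t
```

With a descriptor obtained from something like os.open("/dev/tap0", os.O_RDONLY), calling start_drain_thread(fd, print) would behave much like the "cat /dev/tap0 > logfile" workaround, except that it records only the size of each read rather than the raw data.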