[Vpn-help] DPD error ( was: no dhcp response from gateway )

Matthew Grooms mgrooms at shrew.net
Fri Jun 6 02:55:45 CDT 2008


Dietmar Papperitz wrote:
> Hallo Matthew,
> 
> today the vpn tunnel doesn't work stable with candidate release. I get a lot
> of disconnects with above error. I configured Shrew with an ip address which
> is every time the same. No automatic DHCP.
> 

Dietmar,

Thanks for providing the debug output. There are two issues. The first 
is that the IKE daemon was reporting an incorrect error ( this is not 
a DHCP problem ). The other issue is that the client is terminating the 
tunnel voluntarily due to a DPD timeout. If you disable DPD, I'm certain 
the issue would disappear. However, you would lose functionality, which 
is less than ideal. Let's try to correct the problem so you can keep DPD 
enabled.

Looking at your log, it appears that the DPD functionality of the client 
is working exactly as designed. The last ACK is received at 00:58:06. A 
request is sent at 00:58:16 and again at 00:58:26. No further ACK is 
received by 00:58:36 so the client declares the tunnel dead.

08/06/06 00:58:06 ii : DPD ARE-YOU-THERE sequence 16440733 requested
08/06/06 00:58:06 ii : DPD ARE-YOU-THERE-ACK sequence 16440733 accepted
08/06/06 00:58:16 ii : DPD ARE-YOU-THERE sequence 16440734 requested
08/06/06 00:58:26 ii : DPD ARE-YOU-THERE sequence 16440735 requested
08/06/06 00:58:36 !! : tunnel DPD timeout for peer 87.167.220.9:4500
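
To make the timing concrete, here is a rough Python sketch of the cycle 
shown in the excerpt above. It is not the client's actual code. The 
10 second interval and the limit of two unanswered requests are 
assumptions read off the log, and the DpdMonitor class and its method 
names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DpdMonitor:
    """Toy model of the DPD timing seen in the log (not the real client code)."""
    interval: int = 10        # assumed seconds between ARE-YOU-THERE requests
    max_unanswered: int = 2   # assumed number of requests allowed to go unanswered
    sequence: int = 16440733  # first sequence number in the log excerpt
    unanswered: int = 0
    events: list = field(default_factory=list)

    def tick(self, now: int, ack_received: bool) -> bool:
        """Run one DPD cycle at time `now`; return False once the peer is declared dead."""
        if self.unanswered >= self.max_unanswered:
            self.events.append((now, "timeout"))
            return False
        self.events.append((now, f"request {self.sequence}"))
        if ack_received:
            self.events.append((now, f"ack {self.sequence}"))
            self.unanswered = 0
        else:
            self.unanswered += 1
        self.sequence += 1    # the sequence is incremented even on a retry
        return True


monitor = DpdMonitor()
# Seconds after 00:58:00, one cycle per assumed interval.
# Only the first request is answered.
for now, acked in [(6, True), (16, False), (26, False), (36, False)]:
    if not monitor.tick(now, acked):
        break

for when, what in monitor.events:
    print(f"00:58:{when:02d} {what}")
```

Under those assumptions the sketch produces the same sequence of events 
as the log: one answered request, two unanswered requests, then the 
timeout one interval later.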

There is also this odd-looking output where we get a response from the 
Lancom router 12 seconds after we submit the request. Hmmmmmmm.

08/06/05 21:59:57 ii : DPD ARE-YOU-THERE sequence 16440307 requested
08/06/05 22:00:07 ii : DPD ARE-YOU-THERE sequence 16440308 requested
08/06/05 22:00:09 ii : DPD ARE-YOU-THERE-ACK sequence 16440307 accepted
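
If you want to check the rest of your log for other late responses, 
something like the following Python sketch would do it. It assumes the 
timestamps are in yy/mm/dd hh:mm:ss form and that every DPD line looks 
like the ones above. The ack_delays helper is just a name I made up for 
this example.

```python
import re
from datetime import datetime

LOG = """\
08/06/05 21:59:57 ii : DPD ARE-YOU-THERE sequence 16440307 requested
08/06/05 22:00:07 ii : DPD ARE-YOU-THERE sequence 16440308 requested
08/06/05 22:00:09 ii : DPD ARE-YOU-THERE-ACK sequence 16440307 accepted
"""

# Captures the timestamp, the message type and the sequence number.
LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ii : "
    r"DPD (ARE-YOU-THERE(?:-ACK)?) sequence (\d+)"
)


def ack_delays(log_text):
    """Return {sequence: seconds between request and ACK} for answered requests."""
    requested = {}
    delays = {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        # Assumed timestamp layout: yy/mm/dd hh:mm:ss
        stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        kind, seq = match.group(2), int(match.group(3))
        if kind == "ARE-YOU-THERE":
            requested[seq] = stamp
        elif seq in requested:
            delays[seq] = (stamp - requested[seq]).total_seconds()
    return delays


print(ack_delays(LOG))  # {16440307: 12.0}
```

Sequence 16440308 does not show up in the result because no ACK for it 
appears in this excerpt.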

I read RFC3706 again. It states that a DPD message should be 
retransmitted if an ACK is not received. It also states that a host 
should keep track of its peer's DPD sequence number and reject any 
message that contains an unexpected sequence number. How vague :|

6.2.  Selection and Maintenance of Sequence Numbers
    ...
    Each entity SHOULD also maintain its peer's R-U-THERE sequence
    number, and an entity SHOULD reject the R-U-THERE message if it
    fails to match the expected sequence number.

So there is a possibility that the Lancom is rejecting the sequence 
number. More about that below. Here is another part that references 
re-transmission.

5.4.  Impetus for DPD Exchange
    ...
    After some number of retransmitted messages, an implementation
    SHOULD assume its peer to be unreachable and delete IPSec and
    IKE SAs to the peer.

Retransmission could easily be interpreted as "use the same sequence 
number again". The Shrew Soft client increments the sequence number 
regardless, to avoid a situation where a peer would refuse to respond if 
we request the same sequence number more than once. This can easily 
occur if a DPD ACK gets lost in transmission. However, the Lancom could 
be rejecting the notification because our retry sequence number is more 
than +1 from the last value it saw. There were several other instances 
where we retried DPD and did get a response. It's impossible to know for 
sure what is really going on unless we examine log output from the 
Lancom side of the connection as well.
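
To illustrate the sequence number concern, here is a small Python sketch 
of a peer that follows the strictest reading of section 6.2. This is 
purely hypothetical. I don't know that the Lancom behaves this way, and 
the StrictPeer class, the run helper and the sequence numbers are 
invented for the example.

```python
class StrictPeer:
    """Hypothetical peer that accepts only last-seen-sequence + 1."""

    def __init__(self):
        self.expected = None  # unknown until the first R-U-THERE arrives

    def receive(self, sequence):
        """Return True if an ACK would be sent for this R-U-THERE."""
        if self.expected is not None and sequence != self.expected:
            return False      # unexpected sequence: message is rejected
        self.expected = sequence + 1
        return True


def run(sent, lost):
    """Send each sequence in order; those in `lost` never reach the peer."""
    peer = StrictPeer()
    return [(seq, seq not in lost and peer.receive(seq)) for seq in sent]


# The client increments the sequence on every request, including retries.
# Here request 2 is lost on the wire on its way to the peer.
results = run(sent=[1, 2, 3, 4], lost={2})
for seq, acked in results:
    print(seq, "ACK" if acked else "no ACK")
```

With a peer like this, a single lost request is enough: the peer keeps 
waiting for sequence 2, so every later request goes unanswered and the 
client eventually declares the tunnel dead.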

So, here are 3 possible explanations for the problem ...

1) One ARE-YOU-THERE ( client to server ) message was lost on the wire 
and the Lancom is rejecting any future sequences because they are not 
sequential.

2) Two DPD messages were lost on the wire ( client to server or server 
to client ) during two consecutive DPD cycles. In this case, our DPD 
algorithm returned a false positive and declared the tunnel dead.

3) The Lancom was busy and queued the DPD response for later processing 
due to some internal event prioritization. This possibility is based on 
the extremely delayed response shown in the log output.

Problem (1) is very easy to fix but may make our DPD implementation 
incompatible with others unless we introduce a tunable.

Problem (2) is easy to fix by making our DPD algorithm more aggressive 
during the retransmission cycle.

Problem (3) is unlikely, in my opinion, and would be the most difficult 
to work around.

Do you have access to your Lancom log output? If you do, we can identify 
the exact cause of the problem. If not, or if it's too much of a hassle, 
try this build out and let me know if the problem is still there.

http://www.shrew.net/vpn/download.php?name=vpn-client&vers=2.1.0-dpd-1-x86
http://www.shrew.net/vpn/download.php?name=vpn-client&vers=2.1.0-dpd-1-a64

It implements the fix for (1). If you get disconnected again ( "gateway 
is not responding" ), I will implement a fix for (2) and we can try that 
instead.

Thanks,

-Matthew


