[Vpn-help] DPD error ( was: no dhcp response from gateway )
Matthew Grooms
mgrooms at shrew.net
Fri Jun 6 02:55:45 CDT 2008
Dietmar Papperitz wrote:
> Hello Matthew,
>
> today the vpn tunnel doesn't work stable with candidate release. I get a lot
> of disconnects with above error. I configured Shrew with an ip address which
> is every time the same. No automatic DHCP.
>
Dietmar,
Thanks for providing the debug output. There are two issues. The first
is that the IKE daemon was reporting an incorrect error ( this is not
a DHCP problem ). The other issue is that the client is terminating the
tunnel voluntarily due to a DPD timeout. If you disable DPD, I'm certain
the issue would disappear. However, you would lose functionality, which
is less than ideal. Let's try to correct the problem so you can keep DPD
enabled.
Looking at your log, it appears that the DPD functionality of the client
is working exactly as designed. The last ACK is received at 00:58:06. A
request is sent at 00:58:16 and again at 00:58:26. No further ACK is
received by 00:58:36 so the client declares the tunnel dead.
08/06/06 00:58:06 ii : DPD ARE-YOU-THERE sequence 16440733 requested
08/06/06 00:58:06 ii : DPD ARE-YOU-THERE-ACK sequence 16440733 accepted
08/06/06 00:58:16 ii : DPD ARE-YOU-THERE sequence 16440734 requested
08/06/06 00:58:26 ii : DPD ARE-YOU-THERE sequence 16440735 requested
08/06/06 00:58:36 !! : tunnel DPD timeout for peer 87.167.220.9:4500
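As a rough illustration of that timeline, here is a minimal Python
sketch of a client-side DPD timer. It is not the Shrew Soft code. The
class name, the 10 second interval and the limit of two unanswered
requests are assumptions inferred only from the timestamps in the log
above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DpdMonitor:
    """Hypothetical model of the DPD timer behaviour seen in the log."""
    seq: int = 0              # last sequence number sent
    interval: int = 10        # seconds between ticks (assumed from the log)
    max_unanswered: int = 2   # requests tolerated without an ACK (assumed)
    unanswered: int = 0
    dead: bool = False

    def tick(self) -> Optional[int]:
        """Run once per interval. Returns the sequence sent, or None once dead."""
        if self.unanswered >= self.max_unanswered:
            self.dead = True          # "tunnel DPD timeout for peer ..."
            return None
        self.seq += 1                 # a new sequence number for every request
        self.unanswered += 1
        return self.seq

    def ack(self, seq: int) -> None:
        """Accept an ACK for any request that is still outstanding."""
        if self.seq - self.unanswered < seq <= self.seq:
            self.unanswered = 0


# Replay of the excerpt above: one ACK, then two requests with no reply.
monitor = DpdMonitor(seq=16440732)
monitor.ack(monitor.tick())           # 00:58:06 request and ACK
monitor.tick()                        # 00:58:16 request, no ACK
monitor.tick()                        # 00:58:26 request, no ACK
monitor.tick()                        # 00:58:36 limit reached
print(monitor.dead)
```

With these assumed values the tunnel is declared dead on the fourth
tick, which matches the 30 seconds between the last ACK and the timeout
in the log.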
There is also this odd-looking output where we get a response from the
Lancom router 12 seconds after we submit the request. Hmmmmmmm.
08/06/05 21:59:57 ii : DPD ARE-YOU-THERE sequence 16440307 requested
08/06/05 22:00:07 ii : DPD ARE-YOU-THERE sequence 16440308 requested
08/06/05 22:00:09 ii : DPD ARE-YOU-THERE-ACK sequence 16440307 accepted
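To check delays like this across a whole log, each ACK can be matched
to its request by sequence number. The following is only a sketch: the
function name is made up, and it assumes every DPD line has exactly the
layout shown above with a yy/mm/dd date.

```python
from datetime import datetime

LOG = """\
08/06/05 21:59:57 ii : DPD ARE-YOU-THERE sequence 16440307 requested
08/06/05 22:00:07 ii : DPD ARE-YOU-THERE sequence 16440308 requested
08/06/05 22:00:09 ii : DPD ARE-YOU-THERE-ACK sequence 16440307 accepted
"""


def ack_delays(log: str) -> dict:
    """Map each acknowledged sequence number to its delay in seconds."""
    sent = {}
    delays = {}
    for line in log.splitlines():
        parts = line.split()
        # Expected fields: date, time, level, ':', 'DPD', kind, 'sequence', seq, verb
        if len(parts) < 9 or parts[4] != "DPD":
            continue
        when = datetime.strptime(parts[0] + " " + parts[1], "%y/%m/%d %H:%M:%S")
        seq = int(parts[7])
        if parts[5] == "ARE-YOU-THERE":
            sent[seq] = when
        elif parts[5] == "ARE-YOU-THERE-ACK" and seq in sent:
            delays[seq] = (when - sent[seq]).total_seconds()
    return delays


print(ack_delays(LOG))  # {16440307: 12.0}
```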
After reading RFC3706 again, I see that it states a DPD message should
be retransmitted if an ACK is not received. It also states that a host
should keep track of its peer's DPD sequence number and reject any
message that contains an unexpected sequence number. How vague :|
6.2. Selection and Maintenance of Sequence Numbers
...
Each entity SHOULD also maintain its peer's R-U-THERE sequence
number, and an entity SHOULD reject the R-U-THERE message if it
fails to match the expected sequence number.
So there is a possibility that the Lancom is rejecting the sequence
number. More about that below. Here is another part that references
re-transmission.
5.4. Impetus for DPD Exchange
...
After some number of retransmitted messages, an implementation
SHOULD assume its peer to be unreachable and delete IPSec and
IKE SAs to the peer.
Retransmission could easily be interpreted as "use the same sequence
number again". The Shrew Soft client increments the sequence number
regardless to avoid a situation where a peer would refuse to respond if
we request the same sequence number more than once. This can easily
occur if a DPD ACK gets lost in transmission. However, the Lancom could
be rejecting the notification because our retry sequence number is more
than one higher than the last value it saw. There were several other
instances where we retried DPD and did get a response. It's impossible
to know for sure what is really going on unless we examine log output
from the Lancom side of the connection as well.
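To make the trade-off concrete, here is a small Python sketch of a
hypothetical peer that enforces the RFC text literally and only accepts
the next sequence number it expects. How the Lancom actually behaves is
unknown; the class and helper names are invented for illustration.

```python
class StrictResponder:
    """Hypothetical peer that only accepts exactly the expected sequence."""

    def __init__(self) -> None:
        self.expected = None   # unknown until the first request arrives

    def receive(self, seq: int) -> bool:
        """Return True if an ACK would be sent, False if the request is rejected."""
        if self.expected is not None and seq != self.expected:
            return False
        self.expected = seq + 1
        return True


def deliver(sequence_numbers) -> list:
    """Feed the requests that actually reach the peer, in order."""
    peer = StrictResponder()
    return [peer.receive(seq) for seq in sequence_numbers]


# Request 101 is lost and the client retries with an incremented number.
# The peer never sees 101, so 102 and everything after it is rejected.
print(deliver([100, 102, 103]))   # [True, False, False]

# Request 101 is lost and the client retries with the same number.
print(deliver([100, 101]))        # [True, True]

# Request 101 arrives but its ACK is lost, and the client retries with
# the same number. The peer already expects 102, so the retry is rejected.
print(deliver([100, 101, 101]))   # [True, True, False]
```

Against a peer this strict, neither retry strategy is safe in every
case, which is why the choice probably has to be a tunable.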
So, here are 3 possible explanations for the problem ...
1) One ARE-YOU-THERE ( client to server ) message was lost on the wire
and the Lancom is rejecting any future sequences because they are not
sequential.
2) Two DPD messages were lost on the wire ( client to server or server
to client ) during two consecutive DPD cycles. In this case, our DPD
algorithm returned a false positive and declared the tunnel dead.
3) The Lancom was busy and queued the DPD response for later processing
due to some internal event prioritization. This possibility is based on
the extremely delayed response shown in the log output.
Problem (1) is very easy to fix but may make our DPD implementation
incompatible with others unless we introduce a tunable.
Problem (2) is easy to fix by making our DPD algorithm more aggressive
during the retransmission cycle.
Problem (3) is unlikely, in my opinion, and would be the most difficult
to work around.
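For a feel of how likely (2) is, here is a back-of-the-envelope Python
sketch. It assumes each packet is lost independently with the same
probability, which real links do not guarantee, and the function name
and loss rate are purely illustrative.

```python
def false_positive_probability(loss: float, cycles: int) -> float:
    """Chance that `cycles` consecutive DPD cycles all fail on a live tunnel.

    A cycle fails if either the request or the ACK is lost. Losses are
    assumed to be independent, which is a simplification.
    """
    cycle_fails = 1.0 - (1.0 - loss) ** 2
    return cycle_fails ** cycles


# Compare giving up after two failed cycles with giving up after four.
for cycles in (2, 4):
    print(cycles, false_positive_probability(0.05, cycles))
```

Under that assumption every extra retry multiplies the chance of a
false positive by the per-cycle failure rate, so a few more retries
sent at a shorter interval would reduce it sharply without making the
detection time any longer.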
Do you have access to your Lancom log output? If you do, we can identify
the exact cause of the problem. If not, or if it's too much of a hassle,
try this build out and let me know if the problem is still there.
http://www.shrew.net/vpn/download.php?name=vpn-client&vers=2.1.0-dpd-1-x86
http://www.shrew.net/vpn/download.php?name=vpn-client&vers=2.1.0-dpd-1-a64
It implements the fix for (1). If you get disconnected again ( "gateway
is not responding" ), I will implement a fix for (2) and we can try that
instead.
Thanks,
-Matthew