dns: Fix mDNS server never answering before upstream timeout#4203
Open
moonfruit wants to merge 1 commit into
Open
dns: Fix mDNS server never answering before upstream timeout#4203moonfruit wants to merge 1 commit into
moonfruit wants to merge 1 commit into
Conversation
The collection window was taken directly from the caller's deadline, so the merged response was only assembled at the exact moment the DNS query context (and the inbound connection canceler) expired, and the client never received it. - Clamp the collection window to 1s and keep 500ms headroom from the caller's deadline. - Return 250ms after the first answer arrives (except for PTR/ANY questions, which expect shared records from multiple responders). - Honor context cancellation in the read loop via context.AfterFunc. - Return NODATA instead of an error when the query was sent but nobody answered, so the inbound connection is not torn down while sibling queries are still in flight. - Skip point-to-point interfaces: mDNS is link-local and multicast writes on tunnels fail with ENOBUFS on some platforms.
06a33e4 to
7f7ae8e
Compare
Author
|
Hi @nekohasekai, Just a gentle follow-up on this PR. When you have a chance, could you please take a look at it? It would be greatly appreciated if the review could be completed and the PR merged when appropriate. Thank you! |
8ea55ff to
a9fe3f3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The
mdnsDNS server added in 1.13 cannot deliver answers to clients in practice. Testing on macOS with a minimal config (adirectinbound withhijack-dns, a rule routingdomain_suffix: localto anmdnsserver), every query times out, even though the log occasionally shows the resolved addresses — always at exactly 10 seconds.Root cause
The response collection window is taken directly from the caller's context deadline:
dns.Client.exchangeToTransportalways wraps the exchange incontext.WithTimeout(ctx, C.DNSTimeout)(10s), so the 1-secondmdnsTimeoutfallback never takes effect. Since the read loop inexchangeTargetonly exits when the read deadline fires, the merged response is assembled at the exact moment the query context expires — answers that arrived within milliseconds are held for the full 10 seconds.By then it is always too late: the inbound UDP connection's idle canceler (
canceler.New(..., C.DNSTimeout)inprotocol/dns/handle.go) was started slightly earlier (when the query packet was read), so it always wins the race and closes the connection beforeconn.WritePacketruns. The response is logged and then discarded; the client never receives anything, regardless of its own timeout.Two smaller issues compound this:
exchangeTargetonly honors the read deadline, sotask.Group's cancellation has no effect.qquery NS/MX/TXT alongside A/AAAA by default) makeExchangereturn an error, which tears down the whole inbound connection viacancel(err)and kills sibling in-flight queries.Changes
mdnsTimeout(1s), and keep 500ms of headroom from the caller's deadline so the response can always be delivered before upstream timers fire.ALL_FOR_NOWheuristic): mDNS answers from multiple responders/interfaces arrive in a burst within tens of milliseconds, so a short settle window preserves the multi-responder aggregation semantics without burning the full window. PTR and ANY questions are excluded — they expect shared records from many responders and continue to use the full window.context.AfterFuncresetting the read deadline.ENOBUFSon Apple platforms.Results (macOS 26.5, Wi-Fi)
q <this-host>.local Aq <this-host>.local A AAAAq <iphone>.local A AAAA