dns: Fix mDNS server never answering before upstream timeout #4203

Open
moonfruit wants to merge 1 commit into SagerNet:testing from moonfruit:fix-mdns-timeout

@moonfruit

Problem

The mdns DNS server added in 1.13 cannot deliver answers to clients in practice. Testing on macOS with a minimal config (a direct inbound with hijack-dns, a rule routing domain_suffix: local to an mdns server), every query times out, even though the log occasionally shows the resolved addresses — always at exactly 10 seconds.

Root cause

The response collection window is taken directly from the caller's context deadline:

deadline, loaded := ctx.Deadline()
if !loaded || deadline.IsZero() {
    deadline = time.Now().Add(mdnsTimeout)
}

dns.Client.exchangeToTransport always wraps the exchange in context.WithTimeout(ctx, C.DNSTimeout) (10s), so the 1-second mdnsTimeout fallback never takes effect. Since the read loop in exchangeTarget only exits when the read deadline fires, the merged response is assembled at the exact moment the query context expires — answers that arrived within milliseconds are held for the full 10 seconds.

By then it is always too late: the inbound UDP connection's idle canceler (canceler.New(..., C.DNSTimeout) in protocol/dns/handle.go) was started slightly earlier (when the query packet was read), so it always wins the race and closes the connection before conn.WritePacket runs. The response is logged and then discarded; the client never receives anything, regardless of its own timeout.

Two smaller issues compound this:

  • Context cancellation cannot unblock the read loop — exchangeTarget only honors the read deadline, so task.Group's cancellation has no effect.
  • Record types that nobody on the link answers (clients like q query NS/MX/TXT alongside A/AAAA by default) make Exchange return an error, which tears down the whole inbound connection via cancel(err) and kills sibling in-flight queries.

Changes

  • Clamp the collection window to mdnsTimeout (1s), and keep 500ms of headroom from the caller's deadline so the response can always be delivered before upstream timers fire.
  • Return early once responses settle: after the first answer arrives, keep collecting for another 250ms, then return. This follows the same quiet-period idea as Avahi (cf. its ALL_FOR_NOW heuristic): mDNS answers from multiple responders/interfaces arrive in a burst within tens of milliseconds, so a short settle window preserves the multi-responder aggregation semantics without burning the full window. PTR and ANY questions are excluded — they expect shared records from many responders and continue to use the full window.
  • Honor context cancellation in the read loop via context.AfterFunc resetting the read deadline.
  • Return NODATA instead of an error when the query was sent but nobody answered — normal for mDNS — so the inbound connection survives and sibling queries are unaffected.
  • Skip point-to-point interfaces: mDNS is link-local, and multicast writes on tunnel interfaces fail with ENOBUFS on Apple platforms.

Results (macOS 26.5, Wi-Fi)

| Query | Before | After |
| --- | --- | --- |
| q <this-host>.local A | timeout (resolved at exactly 10s, never delivered) | 293ms, addresses from all interfaces merged |
| q <this-host>.local A AAAA | timeout | 541ms, 3×A + 4×AAAA |
| q <iphone>.local A AAAA | timeout | 812ms |
| nonexistent name | timeout + connection torn down | 1.04s NODATA, connection intact |

The collection window was taken directly from the caller's deadline, so
the merged response was only assembled at the exact moment the DNS
query context (and the inbound connection canceler) expired, and the
client never received it.

- Clamp the collection window to 1s and keep 500ms headroom from the
  caller's deadline.
- Return 250ms after the first answer arrives (except for PTR/ANY
  questions, which expect shared records from multiple responders).
- Honor context cancellation in the read loop via context.AfterFunc.
- Return NODATA instead of an error when the query was sent but nobody
  answered, so the inbound connection is not torn down while sibling
  queries are still in flight.
- Skip point-to-point interfaces: mDNS is link-local and multicast
  writes on tunnels fail with ENOBUFS on some platforms.
@nekohasekai force-pushed the testing branch 2 times, most recently from 06a33e4 to 7f7ae8e (June 12, 2026 01:18)
@moonfruit (Author)

Hi @nekohasekai,

Just a gentle follow-up on this PR. When you have a chance, could you please take a look at it?

It would be greatly appreciated if the review could be completed and the PR merged when appropriate.

Thank you!

@nekohasekai force-pushed the testing branch 10 times, most recently from 8ea55ff to a9fe3f3 (June 17, 2026 06:56)