feat(agent): implement certificate renewal#1775
Conversation
Let maintainers know that an action is required on their side
|
There was a problem hiding this comment.
Pull request overview
Implements end-to-end support for QUIC agent-tunnel certificate renewal so agents can refresh their mTLS client certificates (via CSR signing) before expiration, without full re-enrollment.
Changes:
- Agent: adds per-reconnect “near expiry” renewal flow that requests a renewed cert/CA bundle and triggers an immediate reconnect using the new material.
- Agent: introduces enrollment helpers to detect impending certificate expiry, read CN from the existing cert, and generate a CSR from the existing private key.
- Gateway: wires
CertRenewalRequesthandling toCaManager::sign_agent_csr, signing the renewal CSR while binding identity to the already-authenticated mTLS peer.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| devolutions-agent/src/tunnel.rs | Adds a renewal outcome to the reconnect loop and performs a renewal request post-handshake/pre-traffic. |
| devolutions-agent/src/enrollment.rs | Adds certificate parsing/CN extraction/CSR generation helpers for renewal. |
| crates/agent-tunnel/src/listener.rs | Handles CertRenewalRequest by signing CSRs with the gateway CA and returning renewed cert/CA PEMs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// must reuse the agent's name across renewals — the gateway looks the agent | ||
| /// up in its registry by that name, and the most authoritative source for it | ||
| /// is the cert the gateway itself signed last time. |
| /// Check whether the PEM-encoded certificate at `cert_path` expires within | ||
| /// `threshold_days`. The agent uses this on every reconnect to decide whether | ||
| /// to ask the gateway for a new certificate before opening real traffic. | ||
| pub fn is_cert_expiring(cert_path: &Utf8Path, threshold_days: u32) -> Result<bool> { | ||
| use std::io::BufReader; | ||
|
|
||
| let pem_str = std::fs::read_to_string(cert_path).with_context(|| format!("read certificate from {cert_path}"))?; | ||
| let der = rustls_pemfile::certs(&mut BufReader::new(pem_str.as_bytes())) | ||
| .next() | ||
| .context("empty PEM input")? | ||
| .context("parse certificate PEM")?; | ||
| let (_, cert) = | ||
| x509_parser::parse_x509_certificate(&der).map_err(|e| anyhow::anyhow!("parse X.509 certificate: {e}"))?; | ||
|
|
||
| let not_after_epoch = cert.validity().not_after.timestamp(); | ||
| let now_epoch = i64::try_from( | ||
| std::time::SystemTime::now() | ||
| .duration_since(std::time::UNIX_EPOCH) | ||
| .context("system clock before UNIX epoch")? | ||
| .as_secs(), | ||
| ) | ||
| .context("unix timestamp exceeds i64::MAX")?; | ||
|
|
||
| let threshold_secs = i64::from(threshold_days) * 86_400; | ||
| Ok(not_after_epoch - now_epoch <= threshold_secs) | ||
| } | ||
|
|
||
| /// Extract the `CommonName` from an existing PEM certificate. The renewal CSR | ||
| /// must reuse the agent's name across renewals — the gateway looks the agent | ||
| /// up in its registry by that name, and the most authoritative source for it | ||
| /// is the cert the gateway itself signed last time. | ||
| pub fn read_agent_name_from_cert(cert_path: &Utf8Path) -> Result<String> { | ||
| use std::io::BufReader; | ||
|
|
||
| let pem_str = std::fs::read_to_string(cert_path).with_context(|| format!("read certificate from {cert_path}"))?; | ||
| let der = rustls_pemfile::certs(&mut BufReader::new(pem_str.as_bytes())) | ||
| .next() | ||
| .context("empty PEM input")? | ||
| .context("parse certificate PEM")?; | ||
| let (_, cert) = | ||
| x509_parser::parse_x509_certificate(&der).map_err(|e| anyhow::anyhow!("parse X.509 certificate: {e}"))?; | ||
|
|
||
| let cn = cert | ||
| .subject() | ||
| .iter_common_name() | ||
| .next() | ||
| .context("certificate subject has no Common Name")? | ||
| .as_str() | ||
| .context("certificate Common Name is not valid UTF-8")?; | ||
|
|
||
| Ok(cn.to_owned()) | ||
| } | ||
|
|
||
| /// Build a renewal CSR using the agent's existing private key. Reusing the key | ||
| /// across renewals matches the design that says the private key never leaves | ||
| /// the agent — the gateway only ever sees CSRs. | ||
| pub fn generate_csr_from_existing_key(key_path: &Utf8Path, agent_name: &str) -> Result<String> { | ||
| let key_pem = std::fs::read_to_string(key_path).with_context(|| format!("read private key from {key_path}"))?; | ||
| let key_pair = rcgen::KeyPair::from_pem(&key_pem).context("parse private key PEM")?; | ||
|
|
||
| let mut params = rcgen::CertificateParams::default(); | ||
| params.distinguished_name.push(rcgen::DnType::CommonName, agent_name); | ||
|
|
||
| let csr = params.serialize_request(&key_pair).context("serialize renewal CSR")?; | ||
|
|
||
| csr.pem().context("encode CSR to PEM") | ||
| } |
| let agent_name = crate::enrollment::read_agent_name_from_cert(cert_path) | ||
| .context("read agent name from existing certificate for renewal")?; | ||
| let csr_pem = | ||
| crate::enrollment::generate_csr_from_existing_key(key_path, &agent_name).context("generate renewal CSR")?; |
| std::fs::write(cert_path.as_str(), &client_cert_pem).context("write renewed certificate")?; | ||
| std::fs::write(ca_path.as_str(), &gateway_ca_cert_pem).context("write renewed CA certificate")?; | ||
| info!("Certificate renewed; closing connection so new cert takes effect on reconnect"); | ||
| connection.close(0u32.into(), b"cert-renewed"); |
f8ab899 to
7c1cb34
Compare
Wires up the previously-stubbed CertRenewalRequest / CertRenewalResponse
control messages on both ends.
Gateway (crates/agent-tunnel/src/listener.rs):
Plumb `Arc<CaManager>` and the mTLS-authenticated `agent_name`
through `run_agent_connection` → `run_control_loop` →
`handle_control_message`. The renewal handler signs the agent's CSR
with the existing `sign_agent_csr` API, returning the new cert + CA
bundle, or an `Error { reason }` on failure. Identity (agent_id +
agent_name) is taken from the existing peer cert, never from the
CSR's subject — re-signing only swaps the public key the agent
already proved it owns.
Agent (devolutions-agent):
- enrollment.rs: add `is_cert_expiring`, `read_agent_name_from_cert`,
`generate_csr_from_existing_key` helpers.
- tunnel.rs: post-connect renewal check in `run_single_connection`.
If the cert is within 15 days of expiry, send a CertRenewalRequest
on the control stream, wait up to 30s for the response, persist
the new cert + CA, close the QUIC connection, and surface
`ConnectionOutcome::CertRenewed` so the outer reconnect loop can
re-handshake immediately with the new cert.
This is intentionally a per-reconnect check, not a periodic one. The
QUIC session has a 120s idle timeout and 15s keep-alive, so any blip /
VPN reconnect / host sleep / gateway restart drops the connection
within minutes; with a 1-year cert and a 15-day threshold the renewal
window is hit on the first reconnect after T-15d in any real
deployment.
7c1cb34 to
8ff29c6
Compare
When an agent advertises the KDC's subnet or DNS domain, route Kerberos traffic through the QUIC tunnel just like every other proxy path. This closes the last gap left after the transparent routing PR (#1741). Two paths now use the same routing pipeline as connection forwarding: - `/jet/KdcProxy` HTTP endpoint — `send_krb_message` consults the routing pipeline before falling back to direct TCP. - RDP CredSSP/NLA — `rdp_proxy.rs::send_network_request` previously hard-coded `None` for the agent handle. Plumb `agent_tunnel_handle`, `session_id`, and `explicit_agent_id` from `RdpProxy` down through `perform_credssp_as_*` -> `resolve_*_generator` -> `send_network_request`. The same change reaches the credential-injection clean path (`rd_clean_path.rs`). Session correlation: - RDP CredSSP callers pass the parent association's `jet_aid` so KDC sub-traffic ties back to its parent RDP session in agent-side logs. - The HTTP `/jet/KdcProxy` handler passes the KDC token's own `jti` (the most persistent identifier available without a parent association). `KdcToken` now carries `jti` alongside the claims for this purpose. Explicit-agent routing (matches every other proxy path): - `send_krb_message` takes `explicit_agent_id: Option<Uuid>` and forwards it to `agent_tunnel::routing::try_route`. When the parent association pins `jet_agent_id`, the KDC sub-traffic is routed via that agent or fails -- never silently falls back to a different agent or to direct connect. The HTTP handler passes `None`. Hardening (came along since they live in the same file): - 64 KiB `MAX_KDC_REPLY_MESSAGE_LEN` DoS cap on the announced TCP-framed KDC reply length, with overflow-safe length math. - UDP scheme guard: KDC over UDP keeps going direct because the agent tunnel only carries TCP today. Drive-by: `crates/agent-tunnel/src/listener.rs` move-after-move on `ca_manager` introduced by #1775 -- fixed with `Arc::clone` to keep master building on `--no-default-features` configurations. Stack: based on #1741. Picks up `agent_tunnel::routing::try_route`. Issue: DGW-384
When an agent advertises the KDC's subnet or DNS domain, route Kerberos traffic through the QUIC tunnel just like every other proxy path. This closes the last gap left after the transparent routing PR (#1741). Two paths now use the same routing pipeline as connection forwarding: - `/jet/KdcProxy` HTTP endpoint — `send_krb_message` consults the routing pipeline before falling back to direct TCP. - RDP CredSSP/NLA — `rdp_proxy.rs::send_network_request` previously hard-coded `None` for the agent handle. Plumb `agent_tunnel_handle`, `session_id`, and `explicit_agent_id` from `RdpProxy` down through `perform_credssp_as_*` -> `resolve_*_generator` -> `send_network_request`. The same change reaches the credential-injection clean path (`rd_clean_path.rs`). Session correlation: - RDP CredSSP callers pass the parent association's `jet_aid` so KDC sub-traffic ties back to its parent RDP session in agent-side logs. - The HTTP `/jet/KdcProxy` handler passes the KDC token's own `jti` (the most persistent identifier available without a parent association). `KdcToken` now carries `jti` alongside the claims for this purpose. Explicit-agent routing (matches every other proxy path): - `send_krb_message` takes `explicit_agent_id: Option<Uuid>` and forwards it to `agent_tunnel::routing::try_route`. When the parent association pins `jet_agent_id`, the KDC sub-traffic is routed via that agent or fails -- never silently falls back to a different agent or to direct connect. The HTTP handler passes `None`. Hardening (came along since they live in the same file): - 64 KiB `MAX_KDC_REPLY_MESSAGE_LEN` DoS cap on the announced TCP-framed KDC reply length, with overflow-safe length math. - UDP scheme guard: KDC over UDP keeps going direct because the agent tunnel only carries TCP today. Drive-by: `crates/agent-tunnel/src/listener.rs` move-after-move on `ca_manager` introduced by #1775 -- fixed with `Arc::clone` to keep master building on `--no-default-features` configurations. Stack: based on #1741. Picks up `agent_tunnel::routing::try_route`. Issue: DGW-384
When an agent advertises the KDC's subnet or DNS domain, route Kerberos traffic through the QUIC tunnel just like every other proxy path. This closes the last gap left after the transparent routing PR (#1741). Two paths now use the same routing pipeline as connection forwarding: - `/jet/KdcProxy` HTTP endpoint — `send_krb_message` consults the routing pipeline before falling back to direct TCP. - RDP CredSSP/NLA — `rdp_proxy.rs::send_network_request` previously hard-coded `None` for the agent handle. Plumb `agent_tunnel_handle`, `session_id`, and `explicit_agent_id` from `RdpProxy` down through `perform_credssp_as_*` -> `resolve_*_generator` -> `send_network_request`. The same change reaches the credential-injection clean path (`rd_clean_path.rs`). Session correlation: - RDP CredSSP callers pass the parent association's `jet_aid` so KDC sub-traffic ties back to its parent RDP session in agent-side logs. - The HTTP `/jet/KdcProxy` handler passes the KDC token's own `jti` (the most persistent identifier available without a parent association). `KdcToken` now carries `jti` alongside the claims for this purpose. Explicit-agent routing (matches every other proxy path): - `send_krb_message` takes `explicit_agent_id: Option<Uuid>` and forwards it to `agent_tunnel::routing::try_route`. When the parent association pins `jet_agent_id`, the KDC sub-traffic is routed via that agent or fails -- never silently falls back to a different agent or to direct connect. The HTTP handler passes `None`. Hardening (came along since they live in the same file): - 64 KiB `MAX_KDC_REPLY_MESSAGE_LEN` DoS cap on the announced TCP-framed KDC reply length, with overflow-safe length math. - UDP scheme guard: KDC over UDP keeps going direct because the agent tunnel only carries TCP today. Drive-by: `crates/agent-tunnel/src/listener.rs` move-after-move on `ca_manager` introduced by #1775 -- fixed with `Arc::clone` to keep master building on `--no-default-features` configurations. Stack: based on #1741. Picks up `agent_tunnel::routing::try_route`. Issue: DGW-384
When an agent advertises the KDC's subnet or DNS domain, route Kerberos traffic through the QUIC tunnel just like every other proxy path. This closes the last gap left after the transparent routing PR (#1741). Two paths now use the same routing pipeline as connection forwarding: - `/jet/KdcProxy` HTTP endpoint — `send_krb_message` consults the routing pipeline before falling back to direct TCP. - RDP CredSSP/NLA — `rdp_proxy.rs::send_network_request` previously hard-coded `None` for the agent handle. Plumb `agent_tunnel_handle`, `session_id`, and `explicit_agent_id` from `RdpProxy` down through `perform_credssp_as_*` -> `resolve_*_generator` -> `send_network_request`. The same change reaches the credential-injection clean path (`rd_clean_path.rs`). Session correlation: - RDP CredSSP callers pass the parent association's `jet_aid` so KDC sub-traffic ties back to its parent RDP session in agent-side logs. - The HTTP `/jet/KdcProxy` handler passes the KDC token's own `jti` (the most persistent identifier available without a parent association). `KdcToken` now carries `jti` alongside the claims for this purpose. Explicit-agent routing (matches every other proxy path): - `send_krb_message` takes `explicit_agent_id: Option<Uuid>` and forwards it to `agent_tunnel::routing::try_route`. When the parent association pins `jet_agent_id`, the KDC sub-traffic is routed via that agent or fails -- never silently falls back to a different agent or to direct connect. The HTTP handler passes `None`. Hardening (came along since they live in the same file): - 64 KiB `MAX_KDC_REPLY_MESSAGE_LEN` DoS cap on the announced TCP-framed KDC reply length, with overflow-safe length math. - UDP scheme guard: KDC over UDP keeps going direct because the agent tunnel only carries TCP today. Drive-by: `crates/agent-tunnel/src/listener.rs` move-after-move on `ca_manager` introduced by #1775 -- fixed with `Arc::clone` to keep master building on `--no-default-features` configurations. Stack: based on #1741. Picks up `agent_tunnel::routing::try_route`. Issue: DGW-384
`crates/agent-tunnel/src/listener.rs:152-163` moves the same
`Arc<CaManager>` into both `AgentTunnelHandle` and `AgentTunnelListener`,
which fails to compile:
error[E0382]: use of moved value: `ca_manager`
--> crates/agent-tunnel/src/listener.rs:162:13
Clone the Arc for the handle so the move into `Self` still type-checks.
Root cause: PR #1773 added the `ca_manager` field to
`AgentTunnelListener`; PR #1775 used `ca_manager` in the same `bind()`
body for the handle initializer. Each PR's CI was green against its own
base, but the merge of #1775 on top of #1773 produced a semantic
conflict that wasn't textually conflicting, so GitHub merged it without
re-running CI.
This is the same fix originally proposed in #1790 (closed because #1781
incidentally carried it). Reapplied here because the revert removes
that incidental fix.
`crates/agent-tunnel/src/listener.rs:152-163` moves the same
`Arc<CaManager>` into both `AgentTunnelHandle` and `AgentTunnelListener`,
which fails to compile:
error[E0382]: use of moved value: `ca_manager`
--> crates/agent-tunnel/src/listener.rs:162:13
Clone the Arc for the handle so the move into `Self` still type-checks.
Root cause: PR #1773 added the `ca_manager` field to
`AgentTunnelListener`; PR #1775 used `ca_manager` in the same `bind()`
body for the handle initializer. Each PR's CI was green against its own
base, but the merge of #1775 on top of #1773 produced a semantic
conflict that wasn't textually conflicting, so GitHub merged it without
re-running CI.
This is the same fix originally proposed in #1790 (closed because #1781
incidentally carried it). Reapplied here because the revert removes
that incidental fix.
`crates/agent-tunnel/src/listener.rs:152-163` moves the same
`Arc<CaManager>` into both `AgentTunnelHandle` and `AgentTunnelListener`,
which fails to compile:
error[E0382]: use of moved value: `ca_manager`
--> crates/agent-tunnel/src/listener.rs:162:13
Clone the Arc for the handle so the move into `Self` still type-checks.
Root cause: PR #1773 added the `ca_manager` field to
`AgentTunnelListener`; PR #1775 used `ca_manager` in the same `bind()`
body for the handle initializer. Each PR's CI was green against its own
base, but the merge of #1775 on top of #1773 produced a semantic
conflict that wasn't textually conflicting, so GitHub merged it without
re-running CI.
This is the same fix originally proposed in #1790 (closed because #1781
incidentally carried it). Reapplied here because the revert removes
that incidental fix.
## Summary Adds an optional **Agent Tunnel** wizard step to the Devolutions Agent MSI installer so admins can enrol the agent in a Gateway QUIC tunnel during install — UI or unattended. Three text fields in the dialog: enrolment string, advertise subnets, advertise domains. The feature is opt-in (`isEnabled: false, allowChange: true`); the dialog is skipped when the feature isn't selected, and an empty enrolment string skips tunnel setup even if the feature is on, so the installer remains usable for non-tunnel deployments. Smoke-tested end-to-end against a local DVLS + Gateway + Agent stack: MSI built clean, agent enrolled successfully, RDP TCP traffic routed through the agent tunnel to `IT-HELP-DC:3389`. ## MSI public properties (unattended install) ``` msiexec /i DevolutionsAgent.msi /qn ADDLOCAL="...,AgentTunnel" \ AGENT_TUNNEL_ENROLLMENT_STRING="dgw-enroll:v1:<base64>" \ AGENT_TUNNEL_ADVERTISE_SUBNETS="10.10.0.0/24" \ AGENT_TUNNEL_ADVERTISE_DOMAINS="corp.example.com" ``` ## Custom action `EnrollAgentTunnel` is deferred + elevated + `Impersonate=false`, runs `Before StartServices`, gated by `Features.AGENT_TUNNEL_FEATURE.BeingInstall()`. It: 1. Validates the `dgw-enroll:v1:` prefix, strips whitespace, pads base64url, decodes JSON payload (`api_base_url`, `enrollment_token`, optional `name`). 2. Shells out to `devolutions-agent.exe enroll <url> <token> <name> [subnets]` with a 60s timeout; on timeout the child is killed. 3. Token is redacted (`***`) symmetrically across the command-line log and child stdout/stderr. 4. After enrolment, advertise domains are patched into `Tunnel.AdvertiseDomains` of `%ProgramData%\Devolutions\Agent\agent.json`. This matches the design that advertise domains live in config, not on the CLI (replaces the closed #1774). ## Depends on - #1773 — DVLS-signed JWT enrolment flow (`/jet/tunnel/enroll`) - #1775 — agent cert renewal (referenced by the bundled agent binary) Both merged to master 2026-05-20. ## Test plan - [x] Cold MSI build clean, no errors (8 preexisting `CNDL1138`/`CNDL1006` warnings from existing code, not introduced here) - [x] Wizard order: Welcome → Features → AgentTunnel → InstallDir → VerifyReady - [x] AgentTunnel dialog skipped when feature not selected - [x] Empty enrolment string skips enrolment (no failure) - [x] Invalid `dgw-enroll:v1:` prefix → validation error in dialog - [x] Invalid base64 → validation error in dialog (caught client-side, not at CA time) - [x] Smoke test against local DVLS-signed enrolment JWT — agent registered, cert issued - [x] Smoke test routing: RDP TCP through agent tunnel to `IT-HELP-DC:3389` succeeds ## Review history Pre-review and codex-review passes folded into the single commit. Notable points addressed: - Robust JSON payload parsing (`JObject`, `Value<string>()`, `IsNullOrWhiteSpace` validation) - Symmetric token redaction (cmdline + stdout + stderr) - `WaitForExit(60_000)` return-value check with child kill on timeout - Whitespace-stripped + padded base64url decoding on both dialog validation and CA - ADDLOCAL CSV value trimming for admin-supplied lists with whitespace - Dropped 2 stray `InstallDirDlg*` strings leaked into en-us - Added 5 missing fr-fr translations
…iants Backfills coverage that the original test suite (#1741-era) predates: - New `common.rs` consolidates the QUIC + mTLS test scaffolding so all E2E tests share one `bind_test_listener` helper instead of duplicating ~80 lines per test. - New `cert.rs` pins three invariants of `agent-tunnel/src/cert.rs`: - `sign_agent_csr` must encode the caller-passed agent_id in the URN SAN, *ignoring* the CSR's Common Name (#1775 review feedback — prevents a compromised agent from impersonating via CSR subject). - `extract_agent_id_from_pem` round-trips a freshly signed cert. - `extract_agent_id_from_pem` rejects a cert with no urn:uuid SAN (e.g. the CA root cert). - Extra `routing.rs` cases against a live `AgentTunnelHandle`: - `route_and_connect` errors on empty candidate slice. - `try_route` with handle present and no match returns `Ok(None)`. - `try_route` with handle present and missing explicit agent_id errors rather than silently falling back to direct. - New `cert_renewal_preserves_mtls_identity_e2e` runs the full QUIC `CertRenewalRequest` → `CertRenewalResponse::Success` round-trip with a renewal CSR filed under `CN=evil-impersonator`. Asserts the renewed cert still encodes the mTLS-authenticated `agent_id` — guards the #1775 handler's "never trust the CSR subject" comment with a test. 32 tests now pass; 3 parallel runs verified stable (~0.3s each).
Summary
The
CertRenewalRequest/CertRenewalResponsecontrol messages were defined back in #1738 but both ends only had stubs ("not supported in this build"). This PR wires up the real handlers so an agent's certificate can be renewed without a full re-enrollment, before its 1-year cert expires and locks the agent out.On the gateway side, the listener now signs the agent's renewal CSR with the existing
CaManager::sign_agent_csrand returns the new cert + CA bundle. Identity is taken from the mTLS-authenticated peer cert, never from the CSR subject — only the public key gets re-bound. On the agent side,run_single_connectionruns a one-shot renewal check right after the QUIC handshake: if the current cert is within 15 days of expiry, it sends aCertRenewalRequest, persists the new cert + CA, closes the connection, and surfaces aConnectionOutcome::CertRenewedthat tells the outer reconnect loop to re-handshake immediately with the new cert.This is deliberately a per-reconnect check rather than a periodic in-session timer. With a 120s idle timeout, 15s keep-alive, and the variety of network blips / VPN drops / host sleep / gateway restarts that hit any real deployment, the connection comes back through this path far more often than once every 15 days — so the periodic timer would only fire in scenarios that don't exist in practice.