Skip to content

fix(frost): refuse scaffold FFI signing path without operator opt-in#3960

Merged
mswilkison merged 1 commit into
feat/frost-schnorr-migration-scaffoldfrom
fix/frost-readiness-dkg-placeholder-pubkeys-2026-05-22
May 22, 2026
Merged

fix(frost): refuse scaffold FFI signing path without operator opt-in#3960
mswilkison merged 1 commit into
feat/frost-schnorr-migration-scaffoldfrom
fix/frost-readiness-dkg-placeholder-pubkeys-2026-05-22

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

Summary

Closes the persistence-vs-execution gap that #3959 left open. Stacked on #3959.

The gap

#3959 fenced creation (resolver refuses to BUILD `legacy-wallet-pubkey` signer material without `KEEP_CORE_FROST_TBTC_SIGNER_ACCEPT_SCAFFOLD_KEY_GROUP=1`) and result acceptance (FFI primitive refuses to substitute the Rust-returned key group for the placeholder). What it left open: scaffold material persisted from a previous opted-in session can still drive the FFI signing path on later runs after the operator has unset the flag. The signing path feeds `buildTaggedTBTCSignerDKGPlaceholderPublicKeyHex(identifier)` — a non-curve-point 3-byte string like `"020001"` — into the Rust signer's `RunDKG` for every included participant.

Fix

Refuse to enter `signWithTBTCSignerCoarseEngine` when the payload's `KeyGroupSource == "legacy-wallet-pubkey"` and the env opt-in is not set. The fence sits immediately after the signer material is decoded, before the engine availability check, before member-set determination, before `buildTaggedTBTCSignerRunDKGInputsForIncludedMembers` runs — so placeholder participant pubkeys are never built when the flag is unset.

  • Per-call (not cached) — matches the contract on `AcceptScaffoldKeyGroupEnabled`.
  • Wraps `ErrNativeCryptographyUnavailable` so callers that already handle that condition treat this the same way.
  • Error message references both the env var and the placeholder source for actionable operator diagnostics.

Tests

  • New regression: `TestBuildTaggedLegacyCompatibleNativeExecutionFFISigningPrimitive_Sign_TBTCSignerPath_RefusesScaffoldMaterialWithoutOptIn` registers a working mock engine, omits the env var, calls `Sign`, asserts the refusal references both the env var and the placeholder source, and confirms `RunDKG`/`StartSignRound`/`FinalizeSignRound` were never called.
  • Opt-in additions to existing scaffold-path tests so they continue to exercise the path past the fence: `..._BootstrapVersion_InvalidCoarseSignatureFallsBack`, `..._NoEngineNoLegacyShare`, `..._AttemptVariationRunDKGConflictFallsBack`, `..._BootstrapVersion_AttemptVariationStartSignRoundConflictFallsBack`, `..._InvalidAttemptPolicy_DoesNotFallback`, `..._ConsumedAttemptReplay_DoesNotFallback`.

Verification

Local (GOCACHE under `/private/tmp`):

  • `go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/... ./pkg/bitcoin` — PASS
  • `go test -tags 'frost_native frost_tbtc_signer' ./pkg/tbtc -run 'TestConfigureFrostSigningBackend|TestNewNode_ConfiguresFrostSigningBackend|TestSigningExecutor_Sign|TestRegisterSignerMaterialResolverForBuild'` — PASS

Out of scope

L5 (FFI status-code semantics) and M4 / M7 (ROAST bounded transition evidence + ROAST-aware retry) are tracked as separate follow-ups.

#3959 fenced creation of scaffold-era signer material (resolver refuses
to build `legacy-wallet-pubkey` material without the env opt-in) and
result acceptance (FFI primitive refuses to substitute the Rust-returned
key group for the placeholder when source is scaffold). What neither
fix covered: scaffold material persisted from a previous opted-in
session can still drive the FFI signing path on later runs after the
operator has unset the flag. The signing path feeds
`buildTaggedTBTCSignerDKGPlaceholderPublicKeyHex(identifier)` — a
non-curve-point 3-byte string like `"020001"` — into the Rust signer's
`RunDKG` for every included participant. The bootstrap signer is
permissive about this shape; a future production signer would not be,
and the per-cycle leak of placeholder identifiers into telemetry and
blame attribution is its own concern.

Close the persistence-vs-execution gap by refusing to enter the FFI
signing path at all when the payload is scaffold-era and the operator
has not actively opted in for this process. The check sits at the top
of `signWithTBTCSignerCoarseEngine` immediately after the signer
material is decoded — before the engine availability check, before
member-set determination, before `buildTaggedTBTCSignerRunDKGInputsForIncludedMembers`
runs — so placeholder participant pubkeys are never built when the
flag is unset.

The fence is per-call (not cached), matching the contract on
`AcceptScaffoldKeyGroupEnabled`: flipping the env back unset recovers
fail-closed behavior without a restart. The error wraps
`ErrNativeCryptographyUnavailable` so any caller that already handles
the "native cryptography is unavailable" condition treats this the
same way; downstream callers that want to surface a more specific
"scaffold refused" diagnostic can match on `AcceptScaffoldKeyGroupEnvVar`
in the error message.

New regression test pins the fence behavior:
`..._Sign_TBTCSignerPath_RefusesScaffoldMaterialWithoutOptIn` registers
a working mock engine, omits the env var, calls `Sign`, and asserts
the refusal references both the env var and the placeholder source,
plus that `RunDKG`, `StartSignRound`, and `FinalizeSignRound` were
all not called.

The existing scaffold-path tests
(`..._BootstrapVersion_LegacyKeyGroupSourceUsesRunDKGResult`,
`..._BootstrapVersion_InvalidCoarseSignatureFallsBack`,
`..._NoEngineNoLegacyShare`,
`..._AttemptVariationRunDKGConflictFallsBack`,
`..._BootstrapVersion_AttemptVariationStartSignRoundConflictFallsBack`,
`..._InvalidAttemptPolicy_DoesNotFallback`,
`..._ConsumedAttemptReplay_DoesNotFallback`) opt in via `t.Setenv`
so they continue to exercise the scaffold path beyond the fence.

Verification (local, GOCACHE under /private/tmp):

  go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...
  go test -tags 'frost_native frost_tbtc_signer' ./pkg/tbtc -run \
    'TestConfigureFrostSigningBackend|TestNewNode_ConfiguresFrostSigningBackend|TestSigningExecutor_Sign|TestRegisterSignerMaterialResolverForBuild'

All pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from fix/frost-readiness-scaffold-opt-in-2026-05-22 to feat/frost-schnorr-migration-scaffold May 22, 2026 22:48
@mswilkison mswilkison merged commit 8546179 into feat/frost-schnorr-migration-scaffold May 22, 2026
15 checks passed
@mswilkison mswilkison deleted the fix/frost-readiness-dkg-placeholder-pubkeys-2026-05-22 branch May 22, 2026 22:48
mswilkison added a commit that referenced this pull request May 22, 2026
…#3961)

## Summary

\`isBuildTaggedTBTCSignerConsumedAttemptReplayError\` previously did a
case-insensitive substring match on \`err.Error()\` looking for
\`attempt_id\` plus \`already consumed for sign attempt\`. Any cosmetic
rewording of the Rust-side format string on either side would silently
disable replay detection. This PR makes the detector prefer the
structured \`code\` field on the FFI error envelope when it is
reachable, falling back to substring matching only for older signer
builds.

Stacked on #3960.

The matching Rust-side change is a separate PR on \`tlabs-xyz/tbtc\`
that introduces the dedicated \`EngineError::ConsumedAttemptReplay\`
variant with stable code \`consumed_attempt_replay\`.

## Changes

- New \`buildTaggedTBTCSignerStructuredError\` type carries \`Code\` and
\`Message\` from the FFI envelope.
\`buildTaggedTBTCSignerResultStatusError\` wraps it via \`%w\` so
callers can extract via \`errors.As\`.
- \`buildTaggedTBTCSignerErrorMessage\` refactored into
\`buildTaggedTBTCSignerErrorPayload\` returning the struct. Rendered
chain string still formats as \`\"code: message\"\` so log readers see
no change.
- Detector implements a small policy ladder:
  1. Structured envelope with \`consumed_attempt_replay\` code → replay.
2. Structured envelope with legacy \`validation_error\` code →
substring-match only the \`Message\` field (pre-dedicated-variant
signers route the replay path through Validation).
  3. Any other recognized code → not a replay (authoritative).
4. No structured envelope reachable → substring-match the whole rendered
string (pre-envelope signers).
- Two new tests:
- \`TestIsBuildTaggedTBTCSignerConsumedAttemptReplayError\` — eight
cases covering all four ladder branches.
- \`TestBuildTaggedTBTCSignerErrorPayload\` — structured / message-only
/ empty / malformed payloads.

## Verification

- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...
./pkg/bitcoin\` — PASS
- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/tbtc -run
'TestConfigureFrostSigningBackend|TestNewNode_ConfiguresFrostSigningBackend|TestSigningExecutor_Sign|TestRegisterSignerMaterialResolverForBuild'\`
— PASS
mswilkison added a commit that referenced this pull request May 23, 2026
…#3979)

## Summary

**Closes Phase 5 of RFC-21.** Adds the explicit-operator-opt-in
gate that protects production builds with the
\`frost_roast_retry\` tag from entering the orchestration path
without an explicit operator decision.

Production builds with the build tag enabled now refuse to wire
orchestration unless the operator sets
\`KEEP_CORE_FROST_ROAST_RETRY_ENABLED=true\`. The pattern matches
the existing
\`KEEP_CORE_FROST_TBTC_SIGNER_ACCEPT_SCAFFOLD_KEY_GROUP\` env var
from PR #3960: a build tag enables the code path, an env var
enables the wiring, both must agree for the feature to be live.

Stacked on #3978 (Phase 5.2).

## What lands

### New file: \`pkg/frost/signing/roast_retry_readiness.go\` (untagged)

| Surface | Notes |
|---|---|
| \`RoastRetryReadinessOptInEnvVar\` constant |
\`KEEP_CORE_FROST_ROAST_RETRY_ENABLED\` |
| \`ErrRoastRetryReadinessOptOut\` sentinel | for \`errors.Is\`
detection |
| \`EnsureRoastRetryReadinessOptIn() error\` | returns nil if env var is
\`"true"\` (case-insensitive, whitespace-trimmed); error otherwise |
| \`RoastRetryReadinessOptInEnabled() bool\` | boolean helper |
| Per-call (not cached) | operators can flip without restart |

### \`roast_retry_orchestration.go\` extended

\`BeginOrchestrationForSession\` now calls
\`EnsureRoastRetryReadinessOptIn\` **before** consulting the
registry. The env var is the load-bearing gate; missing opt-in
short-circuits orchestration even when the registry has a real
coordinator.

## Why per-call rather than init-time

The scaffold-opt-in env var from PR #3960 is also evaluated per
call. Operators can flip it during a debugging session without
restart. Same reasoning here -- the readiness env var is a kill
switch, not a build constant.

## Test coverage

| File | Cases |
|---|---|
| \`roast_retry_readiness_test.go\` (untagged, 7 cases) | accepts
\`"true"\`; case-insensitive; whitespace-trimmed; rejects unset (with
sentinel + env-var-name in error); rejects other values; bool helper
mirrors error; **env var name matches RFC-21 spec (locks the name)** |
| \`roast_retry_orchestration_frost_roast_retry_test.go\` (extended) |
New \`TestBeginOrchestrationForSession_ErrorsWhenReadinessOptInUnset\`
-- asserts a registered coordinator alone is not enough; missing env var
still short-circuits orchestration |

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/signing/...\` | pass (default build) |
| \`go test -tags 'frost_roast_retry' ./pkg/frost/signing/...\` | pass |
| \`go test -race -tags 'frost_native frost_tbtc_signer
frost_roast_retry' ./pkg/frost/...\` | pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` | clean |
| \`gofmt -l ./pkg/frost/signing/\` | silent |

## Phase 5 complete

| PR | Scope | State |
|---|---|---|
| 5.1 (#3977) | Retry adapter scaffolding | open |
| 5.2 (#3978) | Orchestration helpers + TTL sweep | open |
| **5.3 (this)** | **Readiness-gate env-var guard** | **open** |

Phase 6 will wire the production call sites and migrate the three
FROST/tbtc-signer receive loops onto the adapter in a single
coordinated change.

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the case-insensitive / whitespace-trimmed
parsing is acceptable. (Alternatives: strict \`"true"\` only.)
- [ ] Reviewer confirms the per-call evaluation is acceptable.
(Alternatives: read once at init.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant