fix(tbtc): monitor the ECDSA and FROST sortition pools independently#4120
Merged
mswilkison merged 2 commits intoJun 27, 2026
Merged
Conversation
During the ECDSA->FROST migration an operator is a member of BOTH sortition pools at once: existing ECDSA wallets keep draining via redemptions while FROST is live. The seven sortition.Chain methods on TbtcChain switch on hasFrostAuthorization() (true whenever FROST is configured), and a single MonitorPool loop consumed them -- so the loop labeled "legacy ECDSA sortition pool monitoring" actually maintained the FROST pool during overlap, and post-cutover (DisableLegacyECDSA=true) the only loop stopped, leaving the FROST pool -- the one new FROST wallet DKG selects from -- unmonitored. Bind monitoring explicitly per pool. Two sortition.Chain views (ecdsaSortitionChain, frostSortitionChain) route directly to their own registry/pool with no hasFrostAuthorization() switch, and the node runs one MonitorPool loop per pool: - ECDSA loop: existing flags + policy, now correctly ECDSA-bound. - FROST loop: new DisableFrostSortitionPoolMonitoring flag (default-on), beta policy only (the ECDSA pre-params gate does not apply to FROST DKG), gated on FROST being configured and INDEPENDENT of DisableLegacyECDSA so the FROST pool stays monitored during the drain and after the legacy pool is retired. The operator is not necessarily registered in both pools, so the FROST loop treats sortition.ErrOperatorUnknown (now exported) as non-fatal: it warns and leaves FROST monitoring inactive rather than aborting node startup. The legacy loop keeps its existing fail-fast (the operator is ECDSA-registered during the drain). TbtcChain's own sortition.Chain methods are left unchanged, so heartbeat and other callers are unaffected; GetOperatorID stays ECDSA-bound. Both loops' join/update txs already share tc.transactionMutex + tc.nonceManager, so they cannot race on the operator account nonce. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…lags #4120 added Config.Tbtc.DisableFrostSortitionPoolMonitoring (default-on FROST monitoring) but initTbtcFlags did not bind a CLI flag for it, unlike the legacy opt-out -- so operators starting the node via CLI flags could not exercise the advertised opt-out. Bind --tbtc.disableFrostSortitionPoolMonitoring and cover it in the flags test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8f85add
into
feat/frost-schnorr-migration-scaffold
17 checks passed
mswilkison
added a commit
that referenced
this pull request
Jun 27, 2026
…ror type (#4122) ## Summary Addresses a Codex finding (relayed via the #4119 review): `TbtcChain.GetWallet` derived a **legacy** wallet ID on **any** error from the canonical `walletID` accessor. For a FROST wallet on a canonical Bridge, a transient call failure would silently yield the left-padded legacy ID — and callers use `WalletChainData.WalletID` to choose **P2TR (FROST)** vs **P2WPKH (legacy)** scripts, so the node would build or search the **wrong wallet script**. ## Why route by scheme (revised after Codex P1) The first revision distinguished by error type (a sentinel for the missing accessor, surface everything else). Codex correctly flagged a **P1 regression**: a *legacy on-chain Bridge* built with the *current* generated binding still satisfies the accessor interface, so its missing `walletID` function returns a normal RPC/ABI error — not the sentinel — and that revision would surface it and **break `GetWallet` on exactly the legacy deployments the fallback exists for**. Error type cannot reliably separate "function absent on-chain" from "transient." So this routes by **scheme**, using the wallet's `EcdsaWalletID` (which `GetWallet` already reads, and which the codebase already uses to infer scheme — zero ⇒ FROST): - **Legacy ECDSA wallet** (`EcdsaWalletID != 0`): its canonical wallet ID *equals* its legacy derivation, so fall back on **any** accessor error — and it's the only option on a legacy Bridge lacking the accessor. - **FROST wallet** (`EcdsaWalletID == 0`): requires the canonical ID; **surface** the error rather than return a wrong legacy ID. A FROST wallet only exists on a canonical-ID Bridge, so such an error is genuinely transient. Logic is extracted into `resolveWalletID(bridge, walletPublicKeyHash, ecdsaWalletID)`. ## Tests `TestResolveWalletID` covers all four cases: accessor success → canonical; FROST + accessor error → surfaced; **legacy + accessor error → legacy fallback** (the P1 regression guard — verified to fail if the routing surfaces errors for legacy wallets); legacy + missing-accessor binding → legacy fallback. gofmt + `go vet` clean; full `pkg/chain/ethereum` suite passes. _Found during the Codex review batch on #4115–#4120; revised per the Codex P1 re-review._ Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #3866 (review finding #3). Fixes sortition-pool monitoring for the dual-pool window of the ECDSA→FROST migration.
During the migration an operator is a member of both sortition pools at once: existing ECDSA wallets keep draining via redemptions while FROST is live. The seven
sortition.Chainmethods onTbtcChainswitch onhasFrostAuthorization()(true whenever FROST is configured), and a singleMonitorPoolloop consumed them — so:DisableLegacyECDSA=false): the loop labeled "legacy ECDSA sortition pool monitoring" actually maintained the FROST pool; the ECDSA pool got no maintenance.DisableLegacyECDSA=true): the only loop stopped entirely, leaving the FROST pool — the one new FROST wallet DKG selects from — unmonitored.Change
Bind monitoring explicitly per pool. Two
sortition.Chainviews (ecdsaSortitionChain,frostSortitionChain) route directly to their own registry/pool with nohasFrostAuthorization()switch, and the node runs oneMonitorPoolloop per pool:DisableFrostSortitionPoolMonitoringflag (default-on), beta policy only (the ECDSA pre-params gate doesn't apply to FROST DKG), gated on FROST being configured and independent ofDisableLegacyECDSAso the FROST pool stays monitored during the drain and after the legacy pool is retired.The operator isn't necessarily registered in both pools, so the FROST loop treats
sortition.ErrOperatorUnknown(now exported) as non-fatal — it warns and leaves FROST monitoring inactive rather than aborting startup. The legacy loop keeps its existing fail-fast (the operator is ECDSA-registered during the drain).TbtcChain's ownsortition.Chainmethods are left unchanged, so heartbeat and other callers are unaffected;GetOperatorIDstays ECDSA-bound. Both loops' join/update txs already sharetc.transactionMutex+tc.nonceManager, so they cannot race on the operator account nonce.Design input baked in (per maintainer + Codex)
GetOperatorIDasymmetry preserved (separateGetFrostOperatorIDexists).Known limitation
MonitorPool(shared with the beacon) hard-returns at startup without starting its ticker, so an operator that registers for FROST after node start needs a restart to begin FROST monitoring. Logged clearly; deliberately not changing the sharedMonitorPoolhere.Tests / verification
tbtc_sortition_chain_views_test.gopins each view to its intended pool (legacy→ECDSA, FROST→FROST, unconfigured→nil) — negative-checked (mis-binding the legacy view to the FROST pool makes it fail).go vetclean; untaggedgo build ./...clean; builds under-tags 'frost_native frost_tbtc_signer cgo frost_roast_retry';pkg/sortition+pkg/chain/ethereumtests pass; fullpkg/tbtcsuite passes.Found during review of #3866.
🤖 Generated with Claude Code