Skip to content

fix(tbtc): monitor the ECDSA and FROST sortition pools independently#4120

Merged
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
fix/dual-pool-sortition-monitoring
Jun 27, 2026
Merged

fix(tbtc): monitor the ECDSA and FROST sortition pools independently#4120
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
fix/dual-pool-sortition-monitoring

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #3866 (review finding #3). Fixes sortition-pool monitoring for the dual-pool window of the ECDSA→FROST migration.

During the migration an operator is a member of both sortition pools at once: existing ECDSA wallets keep draining via redemptions while FROST is live. The seven sortition.Chain methods on TbtcChain switch on hasFrostAuthorization() (true whenever FROST is configured), and a single MonitorPool loop consumed them — so:

  • During overlap (DisableLegacyECDSA=false): the loop labeled "legacy ECDSA sortition pool monitoring" actually maintained the FROST pool; the ECDSA pool got no maintenance.
  • Post-cutover (DisableLegacyECDSA=true): the only loop stopped entirely, leaving the FROST pool — the one new FROST wallet DKG selects from — unmonitored.

The ECDSA drain itself is unaffected either way: signing an existing wallet uses the locally-held key share + the wallet's fixed roster, never sortition-pool state. This is a pool-membership/selection-eligibility fix, not a fund-availability one.

Change

Bind monitoring explicitly per pool. Two sortition.Chain views (ecdsaSortitionChain, frostSortitionChain) route directly to their own registry/pool with no hasFrostAuthorization() switch, and the node runs one MonitorPool loop per pool:

  • ECDSA loop — existing flags + policy, now correctly ECDSA-bound (data path and beta policy read the ECDSA pool).
  • FROST loop — new DisableFrostSortitionPoolMonitoring flag (default-on), beta policy only (the ECDSA pre-params gate doesn't apply to FROST DKG), gated on FROST being configured and independent of DisableLegacyECDSA so the FROST pool stays monitored during the drain and after the legacy pool is retired.

The operator isn't necessarily registered in both pools, so the FROST loop treats sortition.ErrOperatorUnknown (now exported) as non-fatal — it warns and leaves FROST monitoring inactive rather than aborting startup. The legacy loop keeps its existing fail-fast (the operator is ECDSA-registered during the drain).

TbtcChain's own sortition.Chain methods are left unchanged, so heartbeat and other callers are unaffected; GetOperatorID stays ECDSA-bound. Both loops' join/update txs already share tc.transactionMutex + tc.nonceManager, so they cannot race on the operator account nonce.

Design input baked in (per maintainer + Codex)

  • FROST loop join policy = beta-operator only (no FROST equivalent of the ECDSA pre-params gate; FROST DKG readiness is announced separately).
  • Operator compensation is out-of-band, so ECDSA-pool staleness during the drain is benign; the FROST loop is the load-bearing one.
  • GetOperatorID asymmetry preserved (separate GetFrostOperatorID exists).

Known limitation

MonitorPool (shared with the beacon) hard-returns at startup without starting its ticker, so an operator that registers for FROST after node start needs a restart to begin FROST monitoring. Logged clearly; deliberately not changing the shared MonitorPool here.

Tests / verification

  • New tbtc_sortition_chain_views_test.go pins each view to its intended pool (legacy→ECDSA, FROST→FROST, unconfigured→nil) — negative-checked (mis-binding the legacy view to the FROST pool makes it fail).
  • gofmt + go vet clean; untagged go build ./... clean; builds under -tags 'frost_native frost_tbtc_signer cgo frost_roast_retry'; pkg/sortition + pkg/chain/ethereum tests pass; full pkg/tbtc suite passes.
  • Adversarial review (3 angles → verify): 0 confirmed findings; adapter fidelity confirmed line-for-line across all 13 methods; cross-loop nonce safety confirmed.

Found during review of #3866.

🤖 Generated with Claude Code

During the ECDSA->FROST migration an operator is a member of BOTH sortition
pools at once: existing ECDSA wallets keep draining via redemptions while FROST
is live. The seven sortition.Chain methods on TbtcChain switch on
hasFrostAuthorization() (true whenever FROST is configured), and a single
MonitorPool loop consumed them -- so the loop labeled "legacy ECDSA sortition
pool monitoring" actually maintained the FROST pool during overlap, and
post-cutover (DisableLegacyECDSA=true) the only loop stopped, leaving the FROST
pool -- the one new FROST wallet DKG selects from -- unmonitored.

Bind monitoring explicitly per pool. Two sortition.Chain views
(ecdsaSortitionChain, frostSortitionChain) route directly to their own
registry/pool with no hasFrostAuthorization() switch, and the node runs one
MonitorPool loop per pool:
- ECDSA loop: existing flags + policy, now correctly ECDSA-bound.
- FROST loop: new DisableFrostSortitionPoolMonitoring flag (default-on), beta
  policy only (the ECDSA pre-params gate does not apply to FROST DKG), gated on
  FROST being configured and INDEPENDENT of DisableLegacyECDSA so the FROST pool
  stays monitored during the drain and after the legacy pool is retired.

The operator is not necessarily registered in both pools, so the FROST loop
treats sortition.ErrOperatorUnknown (now exported) as non-fatal: it warns and
leaves FROST monitoring inactive rather than aborting node startup. The legacy
loop keeps its existing fail-fast (the operator is ECDSA-registered during the
drain). TbtcChain's own sortition.Chain methods are left unchanged, so heartbeat
and other callers are unaffected; GetOperatorID stays ECDSA-bound. Both loops'
join/update txs already share tc.transactionMutex + tc.nonceManager, so they
cannot race on the operator account nonce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9eabbf7d-5ce6-4479-91ee-335063e00237

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/dual-pool-sortition-monitoring

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

…lags

#4120 added Config.Tbtc.DisableFrostSortitionPoolMonitoring (default-on FROST
monitoring) but initTbtcFlags did not bind a CLI flag for it, unlike the legacy
opt-out -- so operators starting the node via CLI flags could not exercise the
advertised opt-out. Bind --tbtc.disableFrostSortitionPoolMonitoring and cover it
in the flags test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mswilkison mswilkison merged commit 8f85add into feat/frost-schnorr-migration-scaffold Jun 27, 2026
17 checks passed
@mswilkison mswilkison deleted the fix/dual-pool-sortition-monitoring branch June 27, 2026 22:06
mswilkison added a commit that referenced this pull request Jun 27, 2026
…ror type (#4122)

## Summary

Addresses a Codex finding (relayed via the #4119 review):
`TbtcChain.GetWallet` derived a **legacy** wallet ID on **any** error
from the canonical `walletID` accessor. For a FROST wallet on a
canonical Bridge, a transient call failure would silently yield the
left-padded legacy ID — and callers use `WalletChainData.WalletID` to
choose **P2TR (FROST)** vs **P2WPKH (legacy)** scripts, so the node
would build or search the **wrong wallet script**.

## Why route by scheme (revised after Codex P1)

The first revision distinguished by error type (a sentinel for the
missing accessor, surface everything else). Codex correctly flagged a
**P1 regression**: a *legacy on-chain Bridge* built with the *current*
generated binding still satisfies the accessor interface, so its missing
`walletID` function returns a normal RPC/ABI error — not the sentinel —
and that revision would surface it and **break `GetWallet` on exactly
the legacy deployments the fallback exists for**. Error type cannot
reliably separate "function absent on-chain" from "transient."

So this routes by **scheme**, using the wallet's `EcdsaWalletID` (which
`GetWallet` already reads, and which the codebase already uses to infer
scheme — zero ⇒ FROST):

- **Legacy ECDSA wallet** (`EcdsaWalletID != 0`): its canonical wallet
ID *equals* its legacy derivation, so fall back on **any** accessor
error — and it's the only option on a legacy Bridge lacking the
accessor.
- **FROST wallet** (`EcdsaWalletID == 0`): requires the canonical ID;
**surface** the error rather than return a wrong legacy ID. A FROST
wallet only exists on a canonical-ID Bridge, so such an error is
genuinely transient.

Logic is extracted into `resolveWalletID(bridge, walletPublicKeyHash,
ecdsaWalletID)`.

## Tests

`TestResolveWalletID` covers all four cases: accessor success →
canonical; FROST + accessor error → surfaced; **legacy + accessor error
→ legacy fallback** (the P1 regression guard — verified to fail if the
routing surfaces errors for legacy wallets); legacy + missing-accessor
binding → legacy fallback. gofmt + `go vet` clean; full
`pkg/chain/ethereum` suite passes.

_Found during the Codex review batch on #4115#4120; revised per the
Codex P1 re-review._

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant