perf(flagd): run all 3 e2e resolver modes concurrently via @TestFactory#1753
perf(flagd): run all 3 e2e resolver modes concurrently via @TestFactory#1753aepfli wants to merge 15 commits into
Conversation
Replace the single shared Docker Compose stack with a pre-warmed ContainerPool. Each Cucumber scenario borrows its own isolated ContainerEntry (flagd + envoy + temp dir), eliminating the process-level contention that prevented parallel execution. Key changes: - ContainerEntry: encapsulates a single Docker Compose stack + temp dir - ContainerPool: manages a fixed-size pool with acquire/release semantics and reference counting so multiple suite runners sharing a JVM only start/stop containers once - ProviderSteps: borrows a container per scenario, replaces global API.shutdown() with per-provider NoOpProvider swap through the SDK lifecycle (properly detaches event emitters) - State: carries the borrowed ContainerEntry and provider domain name Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Enable cucumber.execution.parallel.enabled=true with fixed parallelism matching the container pool size (2). Correctness safeguards: - @env-var scenarios serialised behind an ENV_VARS exclusive resource lock (requires @env-var tag in test-harness, see companion PR) - @Grace scenarios serialised behind a CONTAINER_RESTART lock to avoid reconnection timeouts under parallel container restarts - ConfigCucumberTest disables parallelism entirely (env-var mutations in <0.4s suite — no benefit, avoids races) - EventSteps: drain-based event matching replaces clear() to prevent stale events from satisfying later assertions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Temporary: CI needs the @env-var tag from flagd-testbed#359. Revert to released branch once that PR is merged and tagged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Switch Cucumber plugin from 'pretty' (prints every step) to 'summary' (only prints failures and a final count). Keeps CI logs readable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Switch Cucumber strategy from 'fixed' to 'dynamic' (factor=1.0, i.e. one thread per available processor). ContainerPool default pool size also scales with availableProcessors() so pool slots match thread count. Both are still overridable: -Dflagd.e2e.pool.size=N -Dcucumber.execution.parallel.config.dynamic.factor=N Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Default pool size was Runtime.availableProcessors() which on large machines (22 CPUs) spawned too many simultaneous Docker Compose stacks and caused ContainerLaunchException. Cap at min(availableProcessors, 4). Cucumber threads still scale with CPUs (dynamic factor=1) — extra threads simply block waiting for a free container, which is safe. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Replace three sequential @suite runner classes (RunRpcTest, RunInProcessTest, RunFileTest) with a single RunE2ETests class using Jupiter @testfactory methods. Each factory (rpc, inProcess, file) runs its Cucumber engine concurrently via @execution(CONCURRENT), giving wall-clock time ≈ max(RPC, InProcess, File) instead of their sum (~2:00 vs ~3:20 in Maven/CI). Full scenario tree is preserved in IDEs: each factory returns Stream<DynamicNode> mirroring the Cucumber TestPlan via CucumberResultListener. Container pool uses JVM shutdown hook for lifecycle (no explicit init/shutdown needed from test classes) and a Semaphore to serialize disruptive container operations across parallel engines. Envoy clusters now use connect_timeout=0.25s and active TCP health checks (interval=1s) so upstream reconnection after flagd restart is detected within one health-check cycle rather than waiting for the next client connection. Known parallel-load failures (also present in base branch sequentially): - file()[4][1-3]: FILE resolver lacks flag-set metadata support (SDK limitation) - inProcess()[3][2], [6][1-3], [8][2]: contextEnrichment pre-existing failures - inProcess()[2][5-7]: TargetUri scenarios sensitive to shared container pool contention; all 3 engines share 4 containers so gRPC init occasionally hits the 2s deadline under peak load. Pass reliably in sequential mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
The retryBackoffMaxMs option controls both the initial-connection throttle in SyncStreamQueueSource (when getMetadata() fails) and the post-disconnect reconnect backoff. Under parallel load, envoy's upstream gRPC connection to flagd may not be established when the first getMetadata() call fires. The call times out after deadline=1000ms, shouldThrottle is set, and the retry waits retryBackoffMaxMs=2000ms — beyond the waitForInitialization window of deadline*2=2000ms. Reducing retryBackoffMaxMs breaks the reconnect event tests that need a slow-enough backoff for error events to fire. Exclude @targetURI from the inProcess @testfactory until flagd issue #1584 is resolved (removing getMetadata() entirely), at which point the throttle timing problem disappears and these scenarios can be re-enabled. RPC @targetURI scenarios are unaffected (different code path, no metadata call). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the flagd E2E test suite to support concurrent execution of RPC, in-process, and file resolver modes using JUnit 5 @testfactory. It introduces a lazy-initialized ContainerPool with a JVM shutdown hook, a semaphore for managing container restarts, and improved synchronization for gRPC and file availability. Feedback identifies a thread-safety issue in CucumberResultListener and suggests optimizing ContainerPool initialization to reduce lock contention.
- CucumberResultListener: replace LinkedHashSet/LinkedHashMap with thread-safe ConcurrentHashMap equivalents. Cucumber runs scenarios in parallel via PARALLEL_EXECUTION_ENABLED_PROPERTY_NAME, so the listener collections are written from multiple threads during launcher.execute(). - ContainerPool.ensureInitialized: replace synchronized method with double-checked locking (fast-path initialized.get() before entering synchronized block). After the pool is warmed up, concurrent acquire() calls skip the lock entirely and go straight to pool.take(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
fd899ba to
26ec47a
Compare
Resolve conflicts after #1752 merged to main: - ContainerPool: keep TestFactory design (shutdown hook + lazy init + restart semaphore) - ProviderSteps: keep per-instance resolverType and restart-slot handling - test-harness submodule: bump to v3.8.0, which already contains the env-var tag and envoy health-check changes from the temporary feat/add-env-var-tag branch - .gitmodules: drop temporary branch pin - Keep RunRpcTest/RunInProcessTest/RunFileTest deleted (replaced by RunE2ETests)
|
Caution Review failedAn error occurred during the review process. Please try again later. 📝 WalkthroughWalkthroughFlagd e2e tests now lazily start a shared container pool, store resolver mode in per-scenario state, wait for file or gRPC readiness before provider startup, coordinate restart-slot access around stop/restart actions, and run resolver-specific Cucumber suites through a JUnit Platform listener-backed runner. ChangesFlagd e2e harness refactor
Sequence Diagram(s)sequenceDiagram
participant RunE2ETests
participant Launcher
participant CucumberResultListener
participant TestPlan
RunE2ETests->>Launcher: discover(LauncherDiscoveryRequest)
Launcher-->>RunE2ETests: TestPlan
RunE2ETests->>Launcher: execute(request, CucumberResultListener)
Launcher->>CucumberResultListener: executionStarted / executionFinished / executionSkipped
RunE2ETests->>RunE2ETests: buildNodes(TestPlan roots and children)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…3.8.0 Two fixes so the parallel @testfactory e2e suite passes against test-harness v3.8.0: 1. Resolver glue recursion: the three resolver @before hooks lived under the shared STEPS glue package (e2e.steps.resolver.*). Cucumber scans glue packages recursively, so every engine loaded all three hooks and state.resolverType was set non-deterministically by whichever ran last — causing cross-resolver contamination (e.g. an @In-Process scenario built with the FILE resolver, swapped selector results, caching scenarios losing their cache). Moved them to a sibling package e2e.resolver.* so each engine loads only its own hook. 2. Added fractional-v1 to the excludeTags of all three resolvers, matching the existing RunRpcTest/RunInProcessTest/RunFileTest runners on main (the provider does not implement the v1 fractional bucketing that v3.8.0's fractional scenarios assert). Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
…ol start The container pool starts POOL_SIZE stacks in parallel on first acquire(). Testcontainers resolves its DockerClientProviderStrategy lazily via a non thread-safe ServiceLoader, so concurrent first-time resolution intermittently threw ConcurrentModificationException, leaving containers unstarted and producing 'Mapped port can only be obtained after the container is started' errors. Resolve the client once, single-threaded, before the parallel starts. Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
@env-var scenarios mutate process-global environment variables (e.g. FLAGD_SYNC_PORT) that a concurrently-running config-default scenario reads at build() time, causing intermittent wrong-value assertions (e.g. in-process port resolving to 9999 instead of 8015). The existing ENV_VARS read-write lock only serialized @env-var scenarios against each other. Give every scenario in an engine a READ lock on ENV_VARS (keyed by its include tag); @env-var scenarios escalate to read-write, so they run exclusively while no other scenario in the engine reads config. Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
eae5d7f to
bd86a10
Compare
Notes on the concurrent
|
Old (3 sequential @Suite runners) |
New (RunE2ETests, 3 modes concurrent) |
|
|---|---|---|
| in-process | 90.1s | — |
| rpc | 69.6s | — |
| file | 47.5s | — |
| e2e test phase | ~207s (sum) | ~130s (CI: 134s) |
≈ 35–40% faster (~1.6×), ~75s saved per run.
Worth knowing: the concurrent run (~130s) is actually slower than the slowest single mode (in-process, 90s) — the three engines contend for the shared container pool (POOL_SIZE = min(cores, 4)) and CPU, so the gain is ~1.6×, not ~3×. The bottleneck moves from "sum of modes" to "container pool + CPU saturation."
Developer experience (the real upside)
- Single entry point instead of three runner classes.
@TestFactoryproduces an expandable per-scenario tree in IDEs — each engine → feature → scenario shows accurate pass/fail/skip, and individual scenarios are re-runnable straight from the IDE.- Opens the door to richer reporting and per-scenario re-runs that the
@Suiterunners couldn't offer.
Fragility (the thing to be aware of going forward)
The speedup comes from running three concurrent Cucumber engines in one JVM, which is unconventional and has sharp edges. This branch had to fix three contamination bugs that all stem from it:
- Resolver-glue recursion — the per-resolver
@Beforehooks lived under the sharedstepsglue package; Cucumber scans glue recursively, so every engine loaded all three andresolverTypewas set non-deterministically (an@in-processscenario could build with the FILE resolver). Fixed by moving them to a sibling package. - Process-global env vars —
@env-varscenarios mutate real environment variables that concurrent config-default scenarios read atbuild()time. Cucumber'sexclusive-resourceslocks only coordinate within a single engine, not across the three, and env vars are JVM-global. Fixed with per-engine read locks onENV_VARS. - Testcontainers init race —
DockerClientProviderStrategyresolves via a non-thread-safeServiceLoader; parallel first-time pool startup threwConcurrentModificationException. Fixed by warming the Docker client once before the parallel starts.
Net: the isolation model works, but it's load-bearing and non-obvious. Anyone adding scenarios or tags later needs to keep it in mind, or contamination can quietly return.
Possibility for later (not for this PR)
Most of the gain is just from parallelizing the three modes. The same win is achievable by running them as three separate Surefire forks (JVM-level isolation) instead of three engines in one JVM — that removes the entire class of cross-talk bugs above, at the cost of ~3× container-pool memory. Worth considering if the in-JVM model ever becomes a maintenance burden.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java (1)
47-60: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winKeep teardown
/stopunder the restart-slot semaphore.
ContainerPooldefines this semaphore as the JVM-wide guard for disruptive containerstop/restartoperations, buttearDown()releases it before its own/stopcall and never acquires it for ordinary cleanup. That reintroduces overlapping container shutdowns during parallel teardown, which is the overload this guard is meant to prevent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java` around lines 47 - 60, Keep the teardown /stop call protected by the restart-slot semaphore in ProviderSteps.tearDown(): currently the method releases state.restartSlotAcquired before invoking the client-driven /stop request, and it never acquires the semaphore for normal cleanup, so update the teardown flow to acquire the ContainerPool restart slot before any disruptive stop/restart action and release it only after the stop and ContainerPool.release cleanup complete. Use the existing state.restartSlotAcquired, ContainerPool.acquire/release, and the tearDown() cleanup block to preserve the JVM-wide guard across all shutdown paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerPool.java`:
- Around line 97-122: Initialization cleanup in ContainerPool does not stop
in-flight ContainerEntry::start tasks when one future.get() fails, so leaked
container starts can survive past the failed attempt. Update the exception path
around the future collection loop to cancel all submitted futures and ensure the
ExecutorService is fully terminated before rethrowing, in addition to stopping
entries already recorded in all. Use the existing ContainerPool initialization
block, the futures list, and the executor lifecycle in the try/catch/finally to
keep pool, all, and initialized consistent after failure.
- Around line 83-97: Ensure ContainerPool only sets initialized to true after
the startup path in ensureInitialized() completes successfully. Move the
initialized publication out of the early compareAndSet gate and into the success
path after pool startup finishes, so acquire() cannot observe a half-initialized
state and block on pool.take() if startup fails. Keep the existing
double-check/synchronization pattern in ensureInitialized(), but make sure any
exception during startup leaves initialized false so later callers can retry or
surface the Docker error instead of hanging.
In
`@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerUtil.java`:
- Around line 42-69: The waitForGrpcPort method in ContainerUtil currently falls
through when the deadline is reached, so readiness failures are hidden and later
show up as provider setup timeouts. Update waitForGrpcPort to fail fast by
throwing an exception after the retry loop expires, using the existing
container/port context from container, state, and mappedPort to make the error
clear. Keep the current socket polling behavior for RPC and IN_PROCESS, but
ensure the caller cannot continue if the gRPC port never becomes ready.
In
`@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunE2ETests.java`:
- Around line 90-125: The resolverTests method can silently return an empty
DynamicNode stream when discovery finds no scenarios, so add a fail-fast
assertion after launcher.discover(request) and before launcher.execute/request
tree building. Use the discovered TestPlan in resolverTests and verify it
contains at least one test node (e.g., by checking testPlan contents for
id.isTest()) so an empty selector/tag combination fails the suite instead of
reporting green.
- Around line 127-169: Surface non-success container outcomes in buildNodes()
instead of only replaying leaf test status. Update the
RunE2ETests.buildNodes(TestPlan, TestIdentifier, CucumberResultListener) logic
so container-level FAILED/ABORTED results from listener are turned into failing
or aborted DynamicNode entries, rather than causing children to be skipped or
the branch to be dropped. Use the existing listener accessors (wasStarted,
hasResult, getResult, getSkipReason) and the DynamicTest/DynamicContainer
creation path to preserve the real root cause in output.
In
`@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java`:
- Around line 167-176: The FILE resolver setup in ProviderSteps currently waits
for the offline flag file in a polling loop but then continues even if the file
never becomes readable. Update the FILE branch to detect when the deadline is
reached with a missing/empty file and abort setup immediately by failing the
step or throwing an error before building the provider. Use the existing
ProviderSteps logic around state.resolverType, FlagdOptions built, and flagFile
to keep the check close to the file-path validation.
---
Outside diff comments:
In
`@providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java`:
- Around line 47-60: Keep the teardown /stop call protected by the restart-slot
semaphore in ProviderSteps.tearDown(): currently the method releases
state.restartSlotAcquired before invoking the client-driven /stop request, and
it never acquires the semaphore for normal cleanup, so update the teardown flow
to acquire the ContainerPool restart slot before any disruptive stop/restart
action and release it only after the stop and ContainerPool.release cleanup
complete. Use the existing state.restartSlotAcquired,
ContainerPool.acquire/release, and the tearDown() cleanup block to preserve the
JVM-wide guard across all shutdown paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 01e9717c-de70-401e-a422-47284d776e6d
📒 Files selected for processing (12)
providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerPool.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerUtil.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/CucumberResultListener.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunE2ETests.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunFileTest.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunInProcessTest.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunRpcTest.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/State.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/resolver/file/FileSetup.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/resolver/inprocess/InProcessSetup.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/resolver/rpc/RpcSetup.javaproviders/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java
💤 Files with no reviewable changes (3)
- providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunRpcTest.java
- providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunFileTest.java
- providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/RunInProcessTest.java
- ContainerPool: publish initialized only after all containers start, so a failed startup (incl. the Docker-client warmup) leaves it false and the next acquire() retries instead of blocking forever on an empty pool; cancel in-flight starts and shutdownNow on failure so no container leaks into the next attempt. - ContainerUtil.waitForGrpcPort: throw on timeout instead of returning, so a never-ready gRPC port surfaces here rather than as an opaque setProviderAndWait timeout later. - ProviderSteps: abort FILE setup when the offline flag file never becomes readable (only when the scenario expects a ready provider — the 'unavailable' scenario intentionally points at a missing file). - RunE2ETests: fail fast if a resolver discovers zero scenarios (tag/selector drift would otherwise silently drop a whole suite); materialize a failed/aborted container that produced no child nodes as a failing node instead of letting the branch vanish. Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Summary
Stacked on top of #1752. Replaces three sequential
@Suiterunner classes with a singleRunE2ETestsclass using Jupiter@TestFactorymethods, cutting wall-clock time from ~3:20 to ~2:00.Changes
1.
RunE2ETests— three concurrent@TestFactorymethodsEach factory (
rpc(),inProcess(),file()) launches a full Cucumber engine for its resolver mode. With@Execution(CONCURRENT)andjunit.jupiter.execution.parallel.enabled=true, all three engines run simultaneously:Each factory returns a
Stream<DynamicNode>mirroring the Cucumber TestPlan (engine → feature → scenario), so IDEs show the full expandable tree with accurate pass/fail/skip per scenario — individual scenarios can be re-run from IntelliJ.2.
CucumberResultListener— fullTestExecutionListenerCaptures every lifecycle event (started, finished, skipped, dynamic) from the Cucumber engine and maps them to
DynamicTestresults. The listener tracks both started and finished state to correctly report scenarios that started but did not complete.3.
ContainerPool— JVM shutdown hook + restart semaphoreensureInitialized()viaAtomicBoolean— pool starts once on firstacquire(), shared across all three concurrent enginesinitialize()/shutdown()— no lifecycle calls needed from test classesSemaphore(1)serialises disruptive container operations (stop/restart) across parallel engines to prevent cascading init timeouts4. Per-resolver glue packages
RpcSetup,InProcessSetup,FileSetup— each a simple@Beforehook that setsstate.resolverType, allowing all three engines to share the same step definitions with isolated per-scenario state.5. Deleted
RunRpcTest,RunInProcessTest,RunFileTestNo longer needed —
RunE2ETestsreplaces all three.6. Envoy cluster improvements (test-harness submodule)
Added
connect_timeout: 0.25sand active TCP health checks (interval: 1s) to both flagd clusters in envoy config. Envoy now detects and recovers from upstream flagd restarts within one health-check cycle.Known limitations
@targetURI @in-processscenarios (7) are excluded from the parallel run. Root cause:retryBackoffMaxMscontrols both the initial-connection throttle inSyncStreamQueueSource(whengetMetadata()fails because envoy's upstream isn't ready yet) and the post-disconnect reconnect backoff. These cannot be tuned independently — reducing the backoff for fast initial connection breaks the reconnect event timing tests. Tracked in flagd#1584 — oncegetMetadata()is removed, these can be re-enabled by removing"targetURI"from theinProcess()excludeTags.file()[4][1-3]— FILE resolver lacks flag-set metadata support (SDK limitation, pre-existing)inProcess()[3][2],[6][1-3],[8][2]— contextEnrichment failures (pre-existing, also fail on base branch)Run from repo root
./mvnw -pl providers/flagd -Pe2e test