Harden flaky merge-architectures generative test#289
Conversation
The merge-architectures fdef intermittently crashed in CI with "Couldn't
satisfy such-that predicate after 100 tries."
clojure.spec's gensub wraps every generator — including custom :gen ones —
with (such-that #(valid? spec %) g 100). Across the large composite generator
behind merge-architectures (variants x architectures), several sub-generators
emitted spec-invalid values or couldn't produce one, starving that filter:
- ::variant's ::core/all branch dissoc'd :distro / :base-image / :base-image-tag
/ :docker-tag, all required by ::variant-base, so every ::all variant it built
was invalid and got filtered (starving when they clustered). It was also wrong
— real ::all/latest variants from ->map keep those keys. Keep them (so ::all
variants are valid and actually exercised) and source :build-tool-versions
from its spec.
- :build-tool-versions was generated by gen/map over the two-element
#{"lein" "tools-deps"} key domain, which occasionally targets >2 distinct keys
and starves. Build it by construction (zip the tool names to versions).
- ::non-blank-string had no :gen, so gensub generated from string? and filtered
with such-that; at size 0 that yields "", and short strings collide in
:distinct collections (e.g. ::architectures). Generate 8-32 char alphanumerics
by construction.
- ::jdk-version likewise had no :gen; pos-int? yields values < 8. Generate in
[8,30].
- Bound the generated base-variant collection: a handful exercises the merge
fully and avoids amplifying rare residual starvation across a large
collection.
Separately, merge-architectures' :fn derived its key set from the first variant
and projected the rest onto it, miscounting heterogeneous variants (e.g. ::all
variants carrying :build-tool-versions). Count distinct (dissoc v :architecture)
instead, mirroring equal-except-architecture?.
This cuts the crash rate from ~50% to under 1% across 200k+ generative trials.
The remaining rare starvation is inherent to spec's gensub such-that filtering
over this composite generator; fully eliminating it would mean hand-rolling the
variant generators from primitives, left as a follow-up.
| :fn #(let [ret-count (-> % :ret count) | ||
| arg-variants (-> % :args :variants) | ||
| ;; Count the variants that are unique once architecture is | ||
| ;; ignored. This mirrors `equal-except-architecture?`, which |
There was a problem hiding this comment.
Hardening 👌
Just thinking out loud, non-blocking on this PR. This comment got me to checking out equal-except-architecture?, unique-variants, and the old variant-keys.
the domain concept here is "two builds are the same image except for CPU architecture." This equivalence relation is what merge-architectures builds on. But we seem to apply this concept ad hoc, and in two different ways:
- production:
equal-except-architecture?->some - test logic: previously with
select-keysand now withdissoc
They give the same count but they're not the same definition. Saying they 'mirror' each other isn't quite true.
Might be worth a follow up... a single source of truth for "same image except arch".
There was a problem hiding this comment.
Great call — followed it up in 70d4abe.
There were genuinely two definitions of "same image except architecture": production's equal-except-architecture? (compare-based equality on the variant minus :architecture) and the test (dissoc :architecture + set). Same count, different definitions, as you noted.
Both now collapse onto docker/full-tag as the single source of truth — it already encodes JDK, distro, and build tool but never the architecture, so it's the natural identity for "same image." equal-except-architecture? is now just (and (not= arch) (= full-tag full-tag)), and the test counts distinct full-tags. Dropped the now-unused equal? helper too.
The production merge (equal-except-architecture?) and the merge-architectures generative test each had their own definition of when two builds are the same image ignoring CPU architecture -- compare-based equality on the variant minus :architecture vs. a dissoc/set in the test. They agreed in count but weren't the same definition. Collapse both onto docker/full-tag, which already encodes JDK, distro, and build tool but never architecture, as the single source of truth. Drop the now -unused equal? helper.
Problem
The
docker-clojure.variant/merge-architecturesfdef generative test intermittently crashed CI withCouldn't satisfy such-that predicate after 100 tries(it flaked once on #286 and re-ran green). It was failing roughly half the time at higher trial counts.Root cause
clojure.spec'sgensubwraps every generator — including custom:gens — with(such-that #(valid? spec %) g 100). Anything in the spec tree feeding::variantthat emits a spec-invalid value, or can't produce a valid one at some generator size, starves that filter and throws. Several did:::variant's::core/allbranchdissoc'd:distro/:base-image/:base-image-tag/:docker-tag— all required by::variant-base— so every::allvariant it built was invalid and silently filtered (starving when they clustered). It was also just wrong: real::all/latest variants from->mapkeep those keys. Now kept (so::allvariants are valid and actually exercised).:build-tool-versionswas generated bygen/mapover the two-element#{"lein" "tools-deps"}key domain, which occasionally targets >2 distinct keys and starves. Now built by construction.::non-blank-string/::jdk-versionhad no:gen, sogensubgenerated fromstring?/pos-int?and filtered withsuch-that; at size 0 those yield""/ values< 8. (""also fails, and short strings collide in:distinctsets like::architectures.) Now generated valid by construction (8–32 char alphanumerics; ints in[8,30]).Also fixed
merge-architectures':fn, which derived its key set from the first variant and projected the rest onto it — miscounting heterogeneous variants (e.g.::allones carrying:build-tool-versions). It now counts distinct(dissoc v :architecture), mirroringequal-except-architecture?.Result
Cuts the crash rate from ~50% to under 1% across 200k+ generative trials (
bb testandcljfmtpass). The property itself is correct — 0 counterexamples in 12k+ direct samples.Known residual
A rare (<1%) starvation remains: it's inherent to spec's
gensubsuch-that(valid?)filtering over this large composite generator — each fixed source exposes another at similar rarity. Fully eliminating it would mean replacing spec-driven generation of::variant/::argswith hand-rolled primitive generators (nogensub). Left as a follow-up; happy to do it if preferred.