Harden CybORG agent validation#106

Merged
Muhtasham merged 1 commit into
CodeClash-ai:mainfrom
Muhtasham:feat/cyborg-validation-hardening
Jun 23, 2026

Muhtasham commented Jun 22, 2026

Contributor

Summary

  • instantiate the submitted CybORG MyAgent during validation, using the same constructor fallbacks as the runtime
  • reject submissions that import and subclass correctly but would crash immediately during episode setup
  • add regression coverage for runtime-style construction and for constructor failures
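The constructor check described in the summary can be sketched roughly as below. This is an illustration only, not the code in codeclash/arenas/cyborg/cyborg.py: the helper names (`construction_attempts`, `validate_agent_constructor`), the default agent name, and the specific fallback order (keyword `name`, positional name, no arguments) are all assumptions, and the real runtime may try different call forms.

```python
from __future__ import annotations

from typing import Any, Callable


def construction_attempts(agent_cls: type, name: str) -> list[Callable[[], Any]]:
    """Hypothetical fallback order; the real runtime's attempts may differ."""
    return [
        lambda: agent_cls(name=name),  # keyword form
        lambda: agent_cls(name),       # positional form
        lambda: agent_cls(),           # no-argument form
    ]


def validate_agent_constructor(
    agent_cls: type, name: str = "blue_agent_0"
) -> tuple[bool, str]:
    """Return (ok, message) after trying each constructor form in order."""
    errors: list[str] = []
    for attempt in construction_attempts(agent_cls, name):
        try:
            attempt()
            return True, ""
        except TypeError as exc:
            # Usually a signature mismatch, so fall through to the next form.
            errors.append(repr(exc))
        except Exception as exc:
            # Any other error means the constructor itself is broken.
            return False, f"constructor raised {exc!r}"
    return False, "no constructor form succeeded: " + "; ".join(errors)
```

The point of sharing the attempt list between validation and runtime is that a submission is accepted only if at least one call form the runtime would actually use succeeds.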

Why

A real OpenAI smoke run generated a CybORG agent that passed validation but crashed during evaluation because its constructor called RandomAgent.__init__(seed=...). The arena isolated the crash and scored it correctly, but validation should catch this before the round starts.
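A minimal reproduction of this failure mode, assuming a stand-in base class: `BaseRandomAgent` below is hypothetical and only mimics a base constructor that accepts no `seed` argument; it is not CybORG's actual RandomAgent, and `shallow_check` and `deep_check` are illustrative names. An import-and-subclass check passes, while actually constructing the agent fails.

```python
class BaseRandomAgent:
    """Hypothetical stand-in for a library base agent with no `seed` parameter."""

    def __init__(self, name=None):
        self.name = name


class MyAgent(BaseRandomAgent):
    def __init__(self, name=None):
        # Bug: the base constructor does not accept `seed`, so this raises TypeError.
        BaseRandomAgent.__init__(self, seed=0)


def shallow_check(cls) -> bool:
    """Import/subclass-only validation: never calls the constructor."""
    return isinstance(cls, type) and issubclass(cls, BaseRandomAgent)


def deep_check(cls) -> bool:
    """Validation that also instantiates the agent, as episode setup would."""
    if not shallow_check(cls):
        return False
    try:
        cls(name="blue_agent_0")
    except Exception:
        return False
    return True


print(shallow_check(MyAgent), deep_check(MyAgent))
```

The shallow check accepts `MyAgent` because the class object is well-formed; only the deep check exercises the constructor and surfaces the error before a round starts.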

Verification

  • uv run pytest -q tests/arenas/test_cyborg.py tests/arenas/test_scml.py -> 14 passed
  • uv run ruff check codeclash/arenas/cyborg/cyborg.py tests/arenas/test_cyborg.py -> passed
  • uv run ruff check . -> passed
  • uv run pytest -q -> 187 passed
  • uv run python main.py configs/examples/CybORG__dummy__r1__s2.yaml -o /tmp/codeclash-cyborg-dummy.pUV7lx -> passed full Docker smoke, per-episode results had status: ok
  • uv run python main.py configs/examples/SCML__dummy__r1__s2.yaml -o /tmp/codeclash-scml-dummy.WT6xqD -> passed full Docker smoke, per-sim score details written
  • long local SCML smoke: tournament.rounds=5, sims_per_round=5, n_steps=8, output /tmp/codeclash-scml-long.CZnbSS -> passed 6 competition executions with 30 total SCML worlds
  • long local CybORG smoke: tournament.rounds=5, episodes_per_round=5, steps_per_episode=10, output /tmp/codeclash-cyborg-long.ZgGyM7 -> passed 6 competition executions with 30 total CybORG episodes, all visible episode records had status: ok

Note: I also ran short real OpenAI-backed smoke loops for SCML and CybORG to verify the coding-agent path. Those are integration smoke evidence only, not benchmark results.

CI note: pre-commit and test pass on this PR. markdown-link-check currently fails on an unrelated pre-existing 404 in codeclash/viewer/_STATIC_README.md.

Muhtasham force-pushed the feat/cyborg-validation-hardening branch 2 times, most recently from 2494fc4 to 6dafcbf, on June 23, 2026 at 00:33
Muhtasham merged commit 4263ffe into CodeClash-ai:main on Jun 23, 2026
4 checks passed