Feat/bot leaderboard/v2.3 #4435
Metac bots that carry metac_bot metadata but are used as internal agents (metac-azimuth, metac-agent) should not be included in leaderboard calculations.
Co-authored-by: Cursor <cursoragent@cursor.com>
lsabor left a comment
Nothing looked out of place to me in what @colesussmeier added. I'm the initial author, so I can't approve, but feel free to merge as is or address the tiny nit I added.
bb7f574 to 842c814
* Make estimate_variances_from_head_to_head return a uniform tuple
Always return tuple[float, float | None] instead of conditionally
returning either a bare float or a tuple, so callers have a single
shape to unpack. The second element stays None unless
include_discrimination is set.
Co-authored-by: Cursor <cursoragent@cursor.com>
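A minimal sketch of the uniform return shape described in this commit. Only the function name, the include_discrimination flag, and the return type come from the commit message. The residuals argument, the variance estimate, and the discrimination value are placeholders assumed for illustration, not the project's estimator.

```python
from __future__ import annotations

from statistics import pvariance


def estimate_variances_from_head_to_head(
    residuals: list[float],  # hypothetical input, assumed for this sketch
    include_discrimination: bool = False,
) -> tuple[float, float | None]:
    """Always return (variance, discrimination_or_None)."""
    variance = pvariance(residuals)  # placeholder estimate (assumption)
    discrimination: float | None = None
    if include_discrimination and variance > 0:
        # Placeholder value (assumption): inverse standard deviation.
        discrimination = 1.0 / variance**0.5
    return variance, discrimination


# Callers unpack the same two-element shape in both modes.
variance, discrimination = estimate_variances_from_head_to_head(
    [1.0, -1.0, 2.0, -2.0]
)
variance_d, discrimination_d = estimate_variances_from_head_to_head(
    [1.0, -1.0, 2.0, -2.0], include_discrimination=True
)
```

With a single return shape, call sites no longer need to branch on whether they received a bare float or a tuple.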
* Extract AIB_PROJECT_IDS module constant
Replace the AIB project-id list duplicated inside gather_data with a
single module-level AIB_PROJECT_IDS constant.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Add aib_minibench_only question-selection mode
Factor the question project filter into a project_filter Q object and
add an aib_minibench_only flag that restricts the leaderboard to AIB
and Minibench questions. Tag CSV output with the _AIBMiniB suffix.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Drop the community aggregate on low-human and minibench questions
Add a min_human_forecasters threshold: on community questions with
fewer than that many distinct human forecasters, keep the question but
drop the Community Aggregate head-to-head matches. Do the same for
minibench questions, which have no real human crowd (also skip building
the aggregate for them in gather_data). Tag CSV output with _MinHF.
Co-authored-by: Cursor <cursoragent@cursor.com>
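The filtering rule in this commit can be sketched roughly as follows. The data layout (match tuples, a dict of human forecaster ids per question) and the helper name drop_aggregate_matches are assumptions for illustration; only min_human_forecasters and the "Community Aggregate" label come from the commit message.

```python
COMMUNITY_AGGREGATE = "Community Aggregate"


def drop_aggregate_matches(
    matches,                 # assumed shape: (question_id, player_a, player_b)
    human_forecasters,       # assumed shape: {question_id: set of human user ids}
    min_human_forecasters,
    minibench_question_ids=frozenset(),
):
    """Keep every question, but drop aggregate matches where the crowd is thin."""
    kept = []
    for question_id, player_a, player_b in matches:
        involves_aggregate = COMMUNITY_AGGREGATE in (player_a, player_b)
        humans = human_forecasters.get(question_id, ())
        low_human = len(humans) < min_human_forecasters
        is_minibench = question_id in minibench_question_ids
        if involves_aggregate and (low_human or is_minibench):
            continue  # drop only the aggregate head-to-head match
        kept.append((question_id, player_a, player_b))
    return kept


matches = [
    (1, COMMUNITY_AGGREGATE, "bot-a"),  # question 1 has 2 humans: dropped
    (1, "bot-a", "bot-b"),              # bot-vs-bot match on question 1: kept
    (2, COMMUNITY_AGGREGATE, "bot-a"),  # question 2 has 5 humans: kept
    (3, COMMUNITY_AGGREGATE, "bot-b"),  # question 3 is minibench: dropped
]
human_forecasters = {
    1: {10, 11},
    2: {10, 11, 12, 13, 14},
    3: {10, 11, 12, 13, 14, 15},
}
kept = drop_aggregate_matches(
    matches,
    human_forecasters,
    min_human_forecasters=5,
    minibench_question_ids={3},
)
```

Bot-versus-bot matches on the same question survive, so the question still contributes to the fit.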
* Implement non_metac_bots_by_year
Replace the NotImplementedError with the per-year split for third-party
bots: rewrite their head-to-head ids to year-tagged strings ("name
(YYYY)"), parallel to the cp/pro aggregate split. This also drops them
from non_metac_bot_ids membership so the per-year history bypasses the
recency filter. Guard with an assert that include_non_metac_bots is set.
Co-authored-by: Cursor <cursoragent@cursor.com>
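A rough sketch of the id rewrite described in this commit. The match tuple layout, the helper names, and the way the year is attached to a match are assumptions; only the "name (YYYY)" format and the include_non_metac_bots guard come from the commit message.

```python
def year_tagged_id(name, year):
    """Build the year-tagged string id, e.g. 'acme-bot (2024)'."""
    return f"{name} ({year})"


def split_non_metac_bots_by_year(
    matches,                 # assumed shape: (year, player_a, player_b)
    non_metac_bot_names,     # hypothetical set of third-party bot names
    include_non_metac_bots=True,
):
    assert include_non_metac_bots, (
        "non_metac_bots_by_year requires include_non_metac_bots"
    )

    def tag(player, year):
        # Only third-party bots are rewritten; other players keep their ids.
        if player in non_metac_bot_names:
            return year_tagged_id(player, year)
        return player

    return [(year, tag(a, year), tag(b, year)) for year, a, b in matches]


tagged = split_non_metac_bots_by_year(
    [
        (2024, "acme-bot", "metac-gpt"),
        (2025, "acme-bot", "Community Aggregate"),
    ],
    non_metac_bot_names={"acme-bot"},
)
```

Because the rewritten ids are new strings, they no longer match the original bot ids, which is how the per-year history ends up outside the recency filter.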
* Apply the participation threshold per parent across year splits
Add participation_parent_key to map year-split player ids
("... (YYYY)") to their parent, and apply min_participation_count to
the parent's combined question set. This keeps an established
aggregate/bot from being dropped just because individual per-year
slices are sparse.
Co-authored-by: Cursor <cursoragent@cursor.com>
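One way the parent mapping and the combined threshold could look, as a sketch. The regular expression, the questions_by_player layout, and the helper players_meeting_threshold are assumptions; participation_parent_key and min_participation_count are the names used in the commit message.

```python
import re
from collections import defaultdict

# Assumed id format: "<parent> (YYYY)" for year-split players.
_YEAR_SUFFIX = re.compile(r"^(?P<parent>.+) \(\d{4}\)$")


def participation_parent_key(player_id):
    """Map 'name (YYYY)' to 'name'; leave every other id unchanged."""
    if isinstance(player_id, str):
        match = _YEAR_SUFFIX.match(player_id)
        if match:
            return match.group("parent")
    return player_id


def players_meeting_threshold(questions_by_player, min_participation_count):
    """Apply the threshold to each parent's combined question set."""
    combined = defaultdict(set)
    for player_id, question_ids in questions_by_player.items():
        combined[participation_parent_key(player_id)] |= set(question_ids)
    return {
        player_id
        for player_id in questions_by_player
        if len(combined[participation_parent_key(player_id)])
        >= min_participation_count
    }


questions_by_player = {
    "acme-bot (2024)": {1, 2},  # sparse slice, parent has 4 questions overall
    "acme-bot (2025)": {3, 4},
    "tiny-bot (2025)": {5},     # parent has only 1 question
    42: {1, 2, 3},              # non-string id, treated as its own parent
}
eligible = players_meeting_threshold(questions_by_player, min_participation_count=3)
```

Both acme-bot slices pass because their parent covers four questions, even though each slice alone would fall below the threshold.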
* Combine year-split players into one leaderboard entry per model
Add combine_year_split_players, which collapses per-year community/pro
aggregates and non_metac_bots_by_year bots into a single combined entry
(contribution-count-weighted mean skill, CI via SE propagation, summed
counts), mirroring the front-end re-aggregation. Apply it to the
leaderboard DB save and CSV output while keeping the per-year fit
intact for the discrimination and distribution diagnostics.
Co-authored-by: Cursor <cursoragent@cursor.com>
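The combination step can be sketched as below. This is an illustration under stated assumptions, not the project's code: the Entry fields are hypothetical, per-year estimates are treated as independent, and the confidence intervals are assumed to be symmetric 95% normal intervals so that a standard error can be recovered as (upper - lower) / (2 * 1.96).

```python
import math
import re
from dataclasses import dataclass

_YEAR_SUFFIX = re.compile(r"^(?P<parent>.+) \(\d{4}\)$")
Z_95 = 1.96  # assumed CI level


@dataclass
class Entry:  # hypothetical leaderboard row
    player: str
    skill: float
    ci_lower: float
    ci_upper: float
    contribution_count: int


def combine_year_split_players(entries):
    """Collapse 'name (YYYY)' entries into one entry per parent name."""
    groups = {}
    for entry in entries:
        match = _YEAR_SUFFIX.match(entry.player)
        parent = match.group("parent") if match else entry.player
        groups.setdefault(parent, []).append(entry)

    combined = []
    for parent, members in groups.items():
        total = sum(m.contribution_count for m in members)
        # Fall back to equal weights if no contributions were recorded.
        weights = [
            m.contribution_count / total if total else 1 / len(members)
            for m in members
        ]
        skill = sum(w * m.skill for w, m in zip(weights, members))
        # SE of a weighted mean of independent estimates.
        ses = [(m.ci_upper - m.ci_lower) / (2 * Z_95) for m in members]
        se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, ses)))
        combined.append(
            Entry(parent, skill, skill - Z_95 * se, skill + Z_95 * se, total)
        )
    return combined


entries = [
    Entry("acme-bot (2024)", 10.0, 10.0 - 1.96, 10.0 + 1.96, 100),
    Entry("acme-bot (2025)", 20.0, 20.0 - 1.96, 20.0 + 1.96, 300),
    Entry("solo-bot", 5.0, 4.0, 6.0, 50),
]
result = {e.player: e for e in combine_year_split_players(entries)}
```

In this example the two acme-bot slices get weights 0.25 and 0.75, so the combined skill is 17.5 and the combined count is 400.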
* Update default run config and tidy parameter comments
Set the Command.handle() run configuration to the current v2.3 defaults
(include_minibench, min_human_forecasters, non_metac_bots_by_year, bot
recency/score windows, ALS off, etc.) and wire the new
aib_minibench_only / min_human_forecasters kwargs through the call.
Move the explanatory comments off the function signature.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Remove low-human participation questions prior to gather_data step
* Add a docstring note for combine_year_split_players
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
No description provided.