Feat/bot leaderboard/v2.3 by lsabor · Pull Request #4435 · Metaculus/metaculus

lsabor · 2026-02-26T19:00:45Z

No description provided.

coderabbitai · 2026-02-26T19:00:55Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61c05057-700c-45ec-8cf9-95b6088710ec


github-actions · 2026-02-26T19:16:21Z

🚀 Preview Environment

Your preview environment is ready!

| Resource | Details |
| --- | --- |
| 🌐 Preview URL | https://metaculus-pr-4435-feat-bot-leaderboard-v2-3-preview.mtcl.cc |
| 📦 Docker Image | `ghcr.io/metaculus/metaculus:feat-bot-leaderboard-v2.3-72a65ac` |
| 🗄️ PostgreSQL | NeonDB branch `preview/pr-4435-feat-bot-leaderboard-v2-3` |
| ⚡ Redis | Fly Redis `mtc-redis-pr-4435-feat-bot-leaderboard-v2-3` |

Details

Commit: 3fd3ebcb237f35a258e5d5671b198b2c25537a31
Branch: feat/bot-leaderboard/v2.3
Fly App: metaculus-pr-4435-feat-bot-leaderboard-v2-3

ℹ️ Preview Environment Info

Isolation:

PostgreSQL and Redis are fully isolated from production
Each PR gets its own database branch and Redis instance
Changes pushed to this PR will trigger a new deployment

Limitations:

Background workers and cron jobs are not deployed in preview environments
If you need to test background jobs, use Heroku staging environments

Cleanup:

This preview will be automatically destroyed when the PR is closed

lsabor

Nothing looked out of place to me in what @colesussmeier added. I'm the initial author, so I can't approve, but feel free to merge as is or address the tiny nit I added.

* Make estimate_variances_from_head_to_head return a uniform tuple
  Always return tuple[float, float | None] instead of conditionally returning either a bare float or a tuple, so callers have a single shape to unpack. The second element stays None unless include_discrimination is set.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Extract AIB_PROJECT_IDS module constant
  Replace the AIB project-id list duplicated inside gather_data with a single module-level AIB_PROJECT_IDS constant.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Add aib_minibench_only question-selection mode
  Factor the question project filter into a project_filter Q object and add an aib_minibench_only flag that restricts the leaderboard to AIB and Minibench questions. Tag CSV output with the _AIBMiniB suffix.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Drop the community aggregate on low-human and minibench questions
  Add a min_human_forecasters threshold: on community questions with fewer than that many distinct human forecasters, keep the question but drop the Community Aggregate head-to-head matches. Do the same for minibench questions, which have no real human crowd (also skip building the aggregate for them in gather_data). Tag CSV output with _MinHF.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Implement non_metac_bots_by_year
  Replace the NotImplementedError with the per-year split for third-party bots: rewrite their head-to-head ids to year-tagged strings ("name (YYYY)"), parallel to the cp/pro aggregate split. This also drops them from non_metac_bot_ids membership so the per-year history bypasses the recency filter. Guard with an assert that include_non_metac_bots is set.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Apply the participation threshold per parent across year splits
  Add participation_parent_key to map year-split player ids ("... (YYYY)") to their parent, and apply min_participation_count to the parent's combined question set. This keeps an established aggregate/bot from being dropped just because individual per-year slices are sparse.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Combine year-split players into one leaderboard entry per model
  Add combine_year_split_players, which collapses per-year community/pro aggregates and non_metac_bots_by_year bots into a single combined entry (contribution-count-weighted mean skill, CI via SE propagation, summed counts), mirroring the front-end re-aggregation. Apply it to the leaderboard DB save and CSV output while keeping the per-year fit intact for the discrimination and distribution diagnostics.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Update default run config and tidy parameter comments
  Set the Command.handle() run configuration to the current v2.3 defaults (include_minibench, min_human_forecasters, non_metac_bots_by_year, bot recency/score windows, ALS off, etc.) and wire the new aib_minibench_only / min_human_forecasters kwargs through the call. Move the explanatory comments off the function signature.
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Remove low-human participation questions prior to gather_data step

* docstring note for combine_year_split_players

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
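The year-split handling described in the commit notes above can be illustrated with a small standalone sketch. It is not the code from this PR: the `Entry` dataclass, the function names ending in `_sketch`, the regular expression, and the choice of a symmetric 95% normal interval are all assumptions made for illustration, and the per-year estimates are treated as independent when propagating standard errors.

```python
import math
import re
from dataclasses import dataclass

# Assumption: year-split ids look like "name (YYYY)", as the commit notes describe.
YEAR_SUFFIX = re.compile(r"^(?P<parent>.+) \((?P<year>\d{4})\)$")
Z_95 = 1.96  # assumption: stored intervals are symmetric 95% normal intervals


@dataclass
class Entry:
    """Hypothetical leaderboard row; the real model's fields may differ."""

    skill: float
    ci_lower: float
    ci_upper: float
    contribution_count: int


def parent_key_sketch(player_id):
    """Map a year-tagged id such as "bot (2025)" to "bot"; pass others through."""
    if isinstance(player_id, str):
        match = YEAR_SUFFIX.match(player_id)
        if match:
            return match.group("parent")
    return player_id


def combine_entries(entries):
    """Collapse per-year entries into one, weighting by contribution count."""
    total = sum(e.contribution_count for e in entries)
    if total <= 0:
        raise ValueError("cannot weight entries with no contributions")
    weights = [e.contribution_count / total for e in entries]
    skill = sum(w * e.skill for w, e in zip(weights, entries))
    # Recover each slice's standard error from its interval half-width.
    std_errors = [(e.ci_upper - e.ci_lower) / (2 * Z_95) for e in entries]
    # Variance of a weighted sum of independent estimates: sum of (w * se)^2.
    combined_se = math.sqrt(sum((w * se) ** 2 for w, se in zip(weights, std_errors)))
    return Entry(skill, skill - Z_95 * combined_se, skill + Z_95 * combined_se, total)


def combine_year_split_sketch(entries_by_id):
    """Group entries by parent id and combine each group into a single entry."""
    grouped = {}
    for player_id, entry in entries_by_id.items():
        grouped.setdefault(parent_key_sketch(player_id), []).append(entry)
    return {parent: combine_entries(group) for parent, group in grouped.items()}
```

With this sketch, entries keyed `"bot-a (2024)"` and `"bot-a (2025)"` collapse into one `"bot-a"` entry whose count is the sum of both slices, while ids without a year suffix (for example integer user ids) pass through unchanged. The same parent mapping is what would let a participation threshold be checked against the parent's combined question set rather than each sparse yearly slice.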

lsabor added 12 commits February 14, 2026 08:08

save work

2c4aa4b

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

65b12ed

…eaderboard/v2.3

save work

c4d4a9d

save work

bfd9e50

save work

5ce53e8

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

db79fa0

…eaderboard/v2.3

verbose flag

d46cec0

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

c8b305b

…eaderboard/v2.3

save work

895ebc3

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

4a7ff5a

…eaderboard/v2.3

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

45b87b9

…eaderboard/v2.3

add displaying coverage

fc4f649

lsabor had a problem deploying to testing_env February 26, 2026 19:01 — with GitHub Actions Failure

lsabor temporarily deployed to testing_env February 26, 2026 19:01 — with GitHub Actions Inactive

lsabor temporarily deployed to Preview February 26, 2026 19:11 — with GitHub Actions Inactive

save work

81dccc6

lsabor had a problem deploying to testing_env February 28, 2026 01:56 — with GitHub Actions Failure

lsabor temporarily deployed to testing_env February 28, 2026 01:56 — with GitHub Actions Inactive

lsabor temporarily deployed to Preview February 28, 2026 02:01 — with GitHub Actions Inactive

github-actions Bot temporarily deployed to Preview March 1, 2026 07:15 Inactive

github-actions Bot temporarily deployed to Preview March 2, 2026 15:24 Inactive

github-actions Bot temporarily deployed to Preview March 8, 2026 07:22 Inactive

github-actions Bot temporarily deployed to Preview March 15, 2026 07:20 Inactive

Merge branch 'main' of github.com:Metaculus/metaculus into feat/bot-l…

f0d7f52

…eaderboard/v2.3

lsabor temporarily deployed to testing_env March 21, 2026 15:09 — with GitHub Actions Inactive

lsabor had a problem deploying to testing_env March 21, 2026 15:09 — with GitHub Actions Failure

lsabor temporarily deployed to Preview March 21, 2026 15:17 — with GitHub Actions Inactive

github-actions Bot temporarily deployed to Preview March 22, 2026 07:17 Inactive

lsabor temporarily deployed to Preview May 20, 2026 19:28 — with GitHub Actions Inactive

lambda tweak added to v2.3

c389717

colesussmeier temporarily deployed to testing_env May 21, 2026 20:39 — with GitHub Actions Inactive

colesussmeier had a problem deploying to testing_env May 21, 2026 20:39 — with GitHub Actions Failure

colesussmeier linked an issue May 21, 2026 that may be closed by this pull request

FutureEval Leaderboard Additions #4752

Open

4 tasks

colesussmeier temporarily deployed to Preview May 21, 2026 20:47 — with GitHub Actions Inactive

github-actions Bot had a problem deploying to Preview May 24, 2026 07:56 Failure

github-actions Bot had a problem deploying to Preview May 25, 2026 09:36 Failure

Exclude agent bots from global bot leaderboard scoring.

765d01d

Metac bots with metac_bot metadata but used as internal agents (metac-azimuth, metac-agent) should not be included in leaderboard calculations. Co-authored-by: Cursor <cursoragent@cursor.com>
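As a rough illustration of the exclusion this commit describes, the sketch below filters a list of user records. The two usernames come from the commit message; the constant name, the function name, and the dict-based records are hypothetical, and the actual command presumably operates on database rows rather than dicts.

```python
# Usernames taken from the commit message; the constant name is hypothetical.
AGENT_BOT_USERNAMES = frozenset({"metac-azimuth", "metac-agent"})


def scoreable_metac_bots(users):
    """Return users flagged as metac bots, minus internal agent accounts.

    `users` is a hypothetical iterable of dicts with "username" and
    "metac_bot" keys, used here only to keep the example self-contained.
    """
    return [
        user
        for user in users
        if user.get("metac_bot") and user["username"] not in AGENT_BOT_USERNAMES
    ]
```

In a Django queryset the same idea could plausibly be written with `.exclude(username__in=...)`, though whether the PR does it that way is not shown in this conversation.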

colesussmeier had a problem deploying to testing_env June 5, 2026 20:29 — with GitHub Actions Failure

colesussmeier had a problem deploying to testing_env June 5, 2026 20:29 — with GitHub Actions Error

colesussmeier had a problem deploying to Preview June 5, 2026 20:32 — with GitHub Actions Error

add ALS

842c814

colesussmeier had a problem deploying to testing_env June 5, 2026 20:33 — with GitHub Actions Failure

colesussmeier temporarily deployed to testing_env June 5, 2026 20:33 — with GitHub Actions Inactive

colesussmeier temporarily deployed to Preview June 5, 2026 20:36 — with GitHub Actions Inactive

lsabor commented Jun 7, 2026

Comment thread scoring/management/commands/update_global_bot_leaderboard.py Outdated

github-actions Bot had a problem deploying to Preview June 14, 2026 08:32 Failure

colesussmeier had a problem deploying to testing_env June 19, 2026 20:07 — with GitHub Actions Failure

colesussmeier temporarily deployed to testing_env June 19, 2026 20:07 — with GitHub Actions Inactive

colesussmeier temporarily deployed to Preview June 19, 2026 20:16 — with GitHub Actions Inactive

colesussmeier force-pushed the feat/bot-leaderboard/v2.3 branch from bb7f574 to 842c814 Compare June 19, 2026 20:48

colesussmeier had a problem deploying to testing_env June 19, 2026 20:48 — with GitHub Actions Failure

colesussmeier temporarily deployed to testing_env June 19, 2026 20:48 — with GitHub Actions Inactive

colesussmeier temporarily deployed to Preview June 19, 2026 20:51 — with GitHub Actions Inactive

github-actions Bot had a problem deploying to Preview June 21, 2026 08:31 Failure

colesussmeier temporarily deployed to testing_env June 22, 2026 19:32 — with GitHub Actions Inactive

colesussmeier had a problem deploying to testing_env June 22, 2026 19:32 — with GitHub Actions Failure

colesussmeier deployed to Preview June 22, 2026 19:35 — with GitHub Actions View deployment

lsabor wants to merge 20 commits into main from feat/bot-leaderboard/v2.3

2 participants
