Skip to content

REFACTOR rename SeedDatasetProvider.fetch_dataset to fetch_dataset_async#1735

Merged
romanlutz merged 4 commits into
microsoft:mainfrom
romanlutz:romanlutz/fetch-dataset-async-rename
May 15, 2026
Merged

REFACTOR rename SeedDatasetProvider.fetch_dataset to fetch_dataset_async#1735
romanlutz merged 4 commits into
microsoft:mainfrom
romanlutz:romanlutz/fetch-dataset-async-rename

Conversation

@romanlutz
Copy link
Copy Markdown
Contributor

What

Renames SeedDatasetProvider.fetch_dataset (and its 36 subclass overrides) to fetch_dataset_async, with a non-breaking deprecation bridge for both callers and external implementers.

Why

The project style guide mandates that all async functions and methods end in _async. SeedDatasetProvider.fetch_dataset was a long-standing systemic violation — surfaced during the review of #1715 (review comment).

How

  • Base class (pyrit/datasets/seed_datasets/seed_dataset_provider.py):
    • fetch_dataset_async is the new primary contract.
    • fetch_dataset is kept as a non-abstract concrete shim that emits DeprecationWarning and delegates to fetch_dataset_async. Preserves backward compatibility for external callers of the public API.
    • Bidirectional dispatch: external implementers who override only the legacy fetch_dataset continue to work; __init_subclass__ emits DeprecationWarning at class-definition time pointing them to the new name.
    • If neither is overridden, NotImplementedError is raised on call.
  • All 36 internal subclasses (1 local + 35 remote loaders): renamed async def fetch_datasetasync def fetch_dataset_async.
  • All internal call sites (tests/unit/datasets/, tests/integration/datasets/, tests/end_to_end/, tests/unit/executor/attack/single_turn/test_many_shot_jailbreak.py, doc/code/datasets/4_dataset_coding.{py,ipynb}): renamed .fetch_dataset(...).fetch_dataset_async(...).
  • 5 new unit tests in TestFetchDatasetDeprecation cover both directions of the deprecation bridge plus the no-override NotImplementedError path.

Out of scope

Verification

  • uv run pytest tests/unit/datasets tests/unit/setup tests/unit/executor/attack/single_turn/test_many_shot_jailbreak.py -n 4655 passed
  • uv run pytest tests/end_to_end/test_all_datasets.py -k "XSTest or SimpleSafety"2 passed (e2e discovery still works)
  • uv run ruff check and uv run ruff format --check — clean
  • uv run pre-commit run --all-files — all hooks pass

romanlutz and others added 2 commits May 14, 2026 21:48
…sync

Per the project style guide, all async methods must end in '_async'.
`SeedDatasetProvider.fetch_dataset` (and its 36 subclass overrides)
violated this convention.

This commit:

- Adds new `fetch_dataset_async` on the base class.
- Keeps the legacy `fetch_dataset` as a non-abstract concrete shim that
  emits `DeprecationWarning` and delegates to `fetch_dataset_async`.
  This preserves backward compatibility for external CALLERS of the
  public API.
- Adds bidirectional dispatch in the base so external IMPLEMENTERS who
  override only the legacy `fetch_dataset` continue to work, but
  `__init_subclass__` emits `DeprecationWarning` at class-definition
  time pointing them to the new name.
- Renames `async def fetch_dataset` to `async def fetch_dataset_async`
  in all 36 internal subclasses (1 local + 35 remote loaders).
- Renames `.fetch_dataset(...)` call sites to `.fetch_dataset_async(...)`
  in all internal callers (tests, end-to-end test, integration smoke
  test, doc notebook pair).
- Adds 5 new unit tests covering the deprecation bridge in both
  directions.

All 655 dataset/setup/executor unit tests pass; ruff lint and format
are clean. End-to-end discovery via
`SeedDatasetProvider.get_all_providers()` still works.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread pyrit/datasets/seed_datasets/seed_dataset_provider.py
romanlutz and others added 2 commits May 15, 2026 15:11
Per @hannahwestra25's review feedback on microsoft#1735, specify the planned removal version in all three DeprecationWarning messages, matching the convention used elsewhere in the codebase (e.g., prompt_target.py, audio_transcript_scorer.py).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-rename' into romanlutz/fetch-dataset-async-rename
@romanlutz romanlutz added this pull request to the merge queue May 15, 2026
Merged via the queue into microsoft:main with commit 9bc03c0 May 15, 2026
48 checks passed
@romanlutz romanlutz deleted the romanlutz/fetch-dataset-async-rename branch May 15, 2026 22:46
eeee2345 added a commit to eeee2345/PyRIT that referenced this pull request May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants