Skip to content

Cap IFD chain length in parse_all_ifds to prevent DoS#1530

Merged
brendancol merged 3 commits intoxarray-contrib:mainfrom
brendancol:security/ifd-chain-length-cap
May 9, 2026
Merged

Cap IFD chain length in parse_all_ifds to prevent DoS#1530
brendancol merged 3 commits intoxarray-contrib:mainfrom
brendancol:security/ifd-chain-length-cap

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

parse_all_ifds in xrspatial/geotiff/_header.py walks the IFD chain via next_ifd_offset and deduplicates with a seen set of offsets. That catches cycles but not long acyclic chains. A crafted BigTIFF can chain millions of distinct IFD offsets, each pointing at a small valid IFD scattered through a sparse multi-GB file, forcing O(N) memory in attacker-controlled N. Finding S3 in a recent security audit of xrspatial.geotiff.

This PR caps the chain at MAX_IFDS = 256 and raises ValueError once that ceiling is hit. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection seen set is preserved untouched.

This is the chain-length counterpart to S2 (#1527), which bounded per-IFD entry counts via MAX_IFD_ENTRY_COUNT. Threat model in both cases is untrusted TIFF input: web download, fsspec source, third-party catalog, user upload.

Changes

  • xrspatial/geotiff/_header.py: define module-level MAX_IFDS = 256 near the existing constants, and raise ValueError in parse_all_ifds when the chain reaches that length.
  • xrspatial/geotiff/tests/test_ifd_chain_cap.py: new test module.

Test plan

  • pytest xrspatial/geotiff/tests/test_ifd_chain_cap.py -x -q (6 passed)
  • pytest xrspatial/geotiff/tests/test_header.py xrspatial/geotiff/tests/test_overview_filter.py xrspatial/geotiff/tests/test_polish_1488.py -x -q (63 passed)
  • Real COG with several overview levels still parses (covered by test_legitimate_cog_with_overviews_passes)
  • Boundary check: a chain of MAX_IFDS - 1 parses; MAX_IFDS raises
  • Big-endian path covered

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol brendancol requested a review from Copilot May 9, 2026 01:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a hard upper bound on the number of TIFF Image File Directories (IFDs) that parse_all_ifds will traverse, preventing memory/time DoS via extremely long (but acyclic) next_ifd_offset chains in untrusted TIFF/BigTIFF inputs.

Changes:

  • Introduce module-level MAX_IFDS = 256 and raise ValueError when the parsed IFD chain reaches that limit.
  • Add a new test module exercising the cap (over-limit rejection, boundary behavior, message contents, and big-endian coverage).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_header.py Defines MAX_IFDS and enforces it in parse_all_ifds with a ValueError to cap chain traversal.
xrspatial/geotiff/tests/test_ifd_chain_cap.py Adds tests validating the new IFD-chain length cap behavior, including boundary and big-endian scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/tests/test_ifd_chain_cap.py Outdated
Comment thread xrspatial/geotiff/tests/test_ifd_chain_cap.py Outdated
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request May 9, 2026
- Remove unused TAG_IMAGE_LENGTH import (F401).
- Fix test_chain_at_boundary_passes docstring: the summary said "Exactly
  MAX_IFDS is allowed" but the body and code show MAX_IFDS - 1 is the
  largest accepted chain. Reword the summary to match.
brendancol added 3 commits May 9, 2026 06:08
`parse_all_ifds` walks the IFD chain via `next_ifd_offset` and
deduplicates with a `seen` set of offsets. That catches cycles, but
not long acyclic chains: a crafted BigTIFF can chain millions of
distinct IFD offsets that each point at small valid IFDs scattered
through a sparse multi-GB file, forcing O(N) memory in attacker-
controlled N.

This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError`
once the chain hits it. 256 is generous: real-world COGs carry the
full-resolution IFD plus a handful of overview levels and (optionally)
per-band masks, so they sit well under 64 even for deep pyramids. The
cycle-detection `seen` set is preserved untouched.

Threat model is untrusted TIFF input (web download, fsspec source,
third-party catalog, user upload). Counterpart to S2 (xarray-contrib#1527), which
bounded per-IFD entry counts.

Tests in test_ifd_chain_cap.py:
- chain past the cap is rejected with a clear `ValueError`
- chain at MAX_IFDS - 1 still parses
- chain at MAX_IFDS hits the cap (off-by-one boundary)
- error message names MAX_IFDS and the threat
- big-endian chains hit the same cap
- a real COG with overview levels parses unaffected
- Remove unused TAG_IMAGE_LENGTH import (F401).
- Fix test_chain_at_boundary_passes docstring: the summary said "Exactly
  MAX_IFDS is allowed" but the body and code show MAX_IFDS - 1 is the
  largest accepted chain. Reword the summary to match.
Switch parse_all_ifds from `len(ifds) >= MAX_IFDS` to
`len(ifds) > MAX_IFDS` so the cap matches the `> MAX_IFD_ENTRY_COUNT`
convention used elsewhere in this module: "MAX = N" reads as "up to and
including N is allowed". Update the boundary test to assert MAX_IFDS
parses cleanly and MAX_IFDS + 1 raises.
@brendancol brendancol force-pushed the security/ifd-chain-length-cap branch from 20de139 to 0e71676 Compare May 9, 2026 13:08
@brendancol brendancol merged commit b00ed45 into xarray-contrib:main May 9, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants