gh-107398: Fix tarfile stream mode exception when process the file with the gzip extra field#126304

Merged
serhiy-storchaka merged 11 commits into python:main from Zheaoli:manjusaka/gh107398
May 21, 2026

Conversation

Contributor

@Zheaoli Zheaoli commented Nov 1, 2024

Comment thread Lib/test/test_tarfile.py Outdated
Comment thread Misc/NEWS.d/next/Library/2024-11-02-02-02-31.gh-issue-107398.uUtA6Q.rst Outdated
Contributor Author

Zheaoli commented Nov 4, 2024

@serhiy-storchaka PTAL when you get time.

Contributor Author

Zheaoli commented Nov 22, 2024

@serhiy-storchaka ping

@serhiy-storchaka serhiy-storchaka self-requested a review November 22, 2024 14:30
Member

@serhiy-storchaka serhiy-storchaka left a comment

I see some refactoring, but except for a possible struct.error the new code looks equivalent to the old one. Could you please point out what changed in the behavior?

Comment thread Lib/tarfile.py Outdated
if self.__read(2) != b"\037\213":
raise ReadError("not a gzip file")
if self.__read(1) != b"\010":
(method, flag, _) = struct.unpack("<BBIxx", self.__read(8))
Member

What will happen if it reads fewer than 8 bytes?

Why use I and then ignore the result?
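
For readers following the thread, both questions can be checked with a small sketch of how `struct.unpack` treats the `"<BBIxx"` format from the diff. The sample header bytes below are made up for illustration and are not taken from the PR.

```python
import struct

# Format from the diff: method (B), flags (B), a 4-byte little-endian
# integer (I, the MTIME field), then two pad bytes (xx) for XFL and OS.
FMT = "<BBIxx"

# Hypothetical 8 bytes that follow the 2-byte gzip magic:
# CM=8 (deflate), FLG=4 (FEXTRA), MTIME=0, XFL=0, OS=255 (unknown).
sample = bytes([8, 4, 0, 0, 0, 0, 0, 255])

assert struct.calcsize(FMT) == 8
method, flag, mtime = struct.unpack(FMT, sample)
print(method, flag, mtime)

# A short read does not yield partial values; unpack raises struct.error.
short_read_error = None
try:
    struct.unpack(FMT, sample[:5])
except struct.error as exc:
    short_read_error = exc
    print("short read:", exc)
```

If the MTIME value is never used, one option is a format such as `"<BB6x"`, which is also 8 bytes long but returns only the two values that are needed. Either way, a truncated header has to be handled explicitly, for example by checking the length of the read or by catching `struct.error`.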

Comment thread Lib/tarfile.py Outdated
xlen = ord(self.__read(1)) + 256 * ord(self.__read(1))
self.read(xlen)
extra_len, = struct.unpack("<H", self.__read(2))
self.__read(extra_len)
Member

Perhaps the only relevant change is in this line.
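
To illustrate why that line matters, here is a standalone sketch, not the tarfile code itself. It assumes that the extra field has to be consumed from the same raw (still compressed) byte stream as the rest of the header, which is what `self.__read` does in the diff, whereas the old `self.read(xlen)` call went through the public read path. The helper names `gzip_with_extra` and `skip_header` are hypothetical.

```python
import gzip
import io
import struct
import zlib


def gzip_with_extra(payload: bytes, extra: bytes) -> bytes:
    """Build one gzip member whose header carries an FEXTRA field (RFC 1952)."""
    # Magic, then CM=8 (deflate), FLG=4 (FEXTRA), MTIME=0, XFL=0, OS=255.
    header = b"\x1f\x8b" + struct.pack("<BBIBB", 8, 4, 0, 0, 255)
    header += struct.pack("<H", len(extra)) + extra
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate
    body = comp.compress(payload) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def skip_header(stream) -> None:
    """Advance `stream` past the fixed header and the optional extra field."""
    if stream.read(2) != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    method, flag = struct.unpack("<BB6x", stream.read(8))
    if method != 8:
        raise ValueError("unsupported compression method")
    if flag & 4:  # FEXTRA is set
        extra_len, = struct.unpack("<H", stream.read(2))
        stream.read(extra_len)  # skip the field on the raw stream


# One made-up subfield: SI1='A', SI2='B', LEN=2, data=b'xy'.
data = gzip_with_extra(b"hello tar", b"AB\x02\x00xy")
stream = io.BytesIO(data)
skip_header(stream)

# After the header is skipped, the stream is positioned at the deflate data.
raw = zlib.decompressobj(-zlib.MAX_WBITS).decompress(stream.read())
print(raw)
print(gzip.decompress(data))
```

If the extra bytes are not skipped on the raw stream, the decompressor is handed header bytes instead of deflate data, which would be consistent with the failure reported in the linked issue for `mode='r|*'`.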

Contributor Author

Zheaoli commented Nov 26, 2024

Hi @serhiy-storchaka, thanks for the review!

  1. You are right, the core change is self.__read(extra_len), so that we can skip the header correctly.
  2. I prefer to use struct.unpack instead of computing the values from the raw bytes manually. I think this makes the code more readable and keeps the same style as gzip.py.
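
As a quick check of the second point, the manual arithmetic from the old code and the `struct`-based version decode the 2-byte length the same way. The two sample bytes here are hypothetical.

```python
import struct

raw = b"\x06\x01"  # made-up little-endian XLEN bytes

# Old style: combine the two bytes by hand (low byte first).
manual = raw[0] + 256 * raw[1]

# New style: let struct decode an unsigned little-endian 16-bit value.
unpacked, = struct.unpack("<H", raw)

print(manual, unpacked)
```

Both forms give the same number; the difference is readability and what happens on a short read, where the `struct` version raises `struct.error`.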

Contributor Author

Zheaoli commented May 22, 2025

@serhiy-storchaka PTAL when you get time

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale label Apr 19, 2026
@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) May 21, 2026 15:50
@serhiy-storchaka serhiy-storchaka added the needs backport to 3.13, needs backport to 3.14, and needs backport to 3.15 labels May 21, 2026
@serhiy-storchaka serhiy-storchaka changed the title from "gh-107398: Fix tarfile stream mode exception when process the file with extra header data" to "gh-107398: Fix tarfile stream mode exception when process the file with the gzip extra field" May 21, 2026
@serhiy-storchaka serhiy-storchaka merged commit 65f9932 into python:main May 21, 2026
144 of 149 checks passed
@miss-islington-app

Thanks @Zheaoli for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
🐍🍒⛏🤖

bedevere-app Bot commented May 21, 2026

GH-150199 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 label May 21, 2026

bedevere-app Bot commented May 21, 2026

GH-150200 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.14 label May 21, 2026

bedevere-app Bot commented May 21, 2026

GH-150201 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.13 label May 21, 2026
serhiy-storchaka added a commit that referenced this pull request May 21, 2026
…file with the gzip extra field (GH-126304) (GH-150201)

(cherry picked from commit 65f9932)

Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this pull request May 21, 2026
…file with the gzip extra field (GH-126304) (GH-150200)

(cherry picked from commit 65f9932)

Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this pull request May 21, 2026
…file with the gzip extra field (GH-126304) (GH-150199)

(cherry picked from commit 65f9932)

Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Labels

stale Stale PR or inactive for long period of time.

Development

Successfully merging this pull request may close these issues.

tarfiles can't open tgz files with gzip features like FEXTRA & FCOMMENT when mode='r|*'

3 participants