Skip to content

gh-150075: tar.addfile() doesn't set member offsets#150278

Open
grantlouisherman wants to merge 1 commit into
python:mainfrom
grantlouisherman:fix-tar-addfile-no-offsets
Open

gh-150075: tar.addfile() doesn't set member offsets#150278
grantlouisherman wants to merge 1 commit into
python:mainfrom
grantlouisherman:fix-tar-addfile-no-offsets

Conversation

@grantlouisherman
Copy link
Copy Markdown

Original bug report:

Bug report

Bug description:

It appears that tarfile.TarFile.addfile() sets neither offset nor offset_data attributes in the TarInfo added to self.members, when it's done adding the file. Members in a reopened TAR have correct values.

Reproducer

from io import BytesIO
from tarfile import TarFile, TarInfo

with TarFile("test.tar", "w") as tar:
    data = b"data"

    tarinfo = TarInfo("test1.txt")
    tarinfo.size = len(data)
    tar.addfile(tarinfo, BytesIO(data))

    tarinfo = TarInfo("test2.txt")
    tarinfo.size = len(data)
    tar.addfile(tarinfo, BytesIO(data))

    for member in tar.getmembers():
        print(member.name, member.offset, member.offset_data)

with TarFile("test.tar", "r") as tar:
    for member in tar.getmembers():
        print(member.name, member.offset, member.offset_data)

Expected output

test1.txt 0 512
test2.txt 1024 1536
test1.txt 0 512
test2.txt 1024 1536

Actual output

test1.txt 0 0
test2.txt 0 0
test1.txt 0 512
test2.txt 1024 1536

CPython versions tested on:

3.15

Operating systems tested on:

Linux


Fix
I added the offset info for a given file based on the current offset of the file and then after the block size is calculated. I also included a test to validate the behavior

…file

Signed-off-by: grantlouisherman <grantlouisherman041@gmail.com>
Comment thread Lib/tarfile.py
self.offset += len(buf)
# add original offset to block size
bufsize=self.copybufsize
tarinfo.offset_data = self.offset
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi there, this seems wrong. you are replacing information about where the file data starts with information about where the file header starts.

Copy link
Copy Markdown
Author

@grantlouisherman grantlouisherman May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I could be incorrect in my understanding but the offset is showing where the current offset pointer is. Then we set the pointer for offset_data because now we have added the Len of the buff size which now points to the start of the data and then the offset on line 2378 adds the block size. So then when we add another file we start pointing at the end of the previous block.

@openly-retro
Copy link
Copy Markdown

Hi, I'm a Python dev with some free time on their hands.

@grantlouisherman looking through the code for this file, tarfile.offset is available "When TarFile is opened for reading" https://github.com/python/cpython/blob/main/Lib/tarfile.py#L2832

it makes sense that inside the context of writing TarInfo objects to the archive, that offset will not be accessible on the objects themselves at that time. it's not necessarily a bug just because you can't access it, even if your LLM told you it thinks there is a bug.

i see in your PR that you potentially copied some wrong attributes (copying offset into offset_data). think about the ramifications of that, and what that means for your test..

other ways to improve your tests are to assert against fixed, known values. otherwise your test can be reporting all value comparisons are correct but if they are all zeroes then it may be a false positive.

i also noticed that this is your third attempt to open a pull request with these changes, the previous two being closed because you were not sure of the correct git operations to do to update your branch. learning git is fun and having the skills to make sure you know what commands to run is a great help, as opening and closing pull requests, in a public repo with over 2k open pull requests, makes a lot of noise and slows people down from getting your changes properly reviewed and merged.

Please reply with:

  1. what is the git command that you will use to rebase this branch on main before this PR can get merged (so you can avoid making this PR for a fourth time)
  2. how did you discover the behavior here with addfile -- see Incorrect parsing of TarInfo header when GNU long name and type AREGTYPE are combined #141707 for a good example of the author sharing their discovery process
  3. what is the use case for your changes, and why is the solution going to help the library overall
  4. links to any open, reported bugs that this code change solves -- I see 101 open issues mentioning tarfile and 47 mentioning tarfile and offset
  5. is your issue different than the solution provided in https://github.com/python/cpython/pull/133919/changes , and if not, this issue may be a duplicate of tarfile: TarFile.offset attribute is not updated with the remainder when closing a TarFile #129255 -- what do you think?

@grantlouisherman
Copy link
Copy Markdown
Author

grantlouisherman commented May 24, 2026

@openly-retro
“it's not necessarily a bug just because you can't access it, even if your LLM told you it thinks there is a bug.” > an LLM didn’t report this, as you saw, it was a bug reported by a user. And the solution was something I figured out from just playing around with the code.

I respect that you probably know more than me but the mere fact that that is wrong I can’t take an LLM comment seriously when it starts with a delusion. I’d rather for @ethanfurman to confirm or deny these assertions that you made.

@openly-retro
Copy link
Copy Markdown

it was a bug reported by a user

sorry I missed that part, you can ignore points 2-4 in my original comment, I should direct those questions to the person who started the original issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants