Skip to content

feat(storage): full object checksum: integrate full-object checksum in AsyncMultiRangeDownloader#17263

Open
chandra-siri wants to merge 2 commits into
feat/full-object-rolling-checksumfrom
feat/downloader-checksum-integration
Open

feat(storage): full object checksum: integrate full-object checksum in AsyncMultiRangeDownloader#17263
chandra-siri wants to merge 2 commits into
feat/full-object-rolling-checksumfrom
feat/downloader-checksum-integration

Conversation

@chandra-siri
Copy link
Copy Markdown
Contributor

PR 3 of 3: Connect full-object checksum in AsyncMultiRangeDownloader, close stream on corruption, and add end-to-end system tests.

@chandra-siri chandra-siri requested a review from a team as a code owner May 26, 2026 20:54
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces checksum validation for full object downloads in the AsyncMultiRangeDownloader by tracking rolling checksums and handling DataCorruption exceptions. It also defers the CRC32C C-extension check to the download phase. The review feedback points out an optimization opportunity in the is_full_object_read heuristic to prevent unnecessary CRC32C computations when checksums are disabled or when the object is unfinalized.

Comment on lines +440 to +443
is_full_object_read = (
(offset == 0 and length == 0) or
(self.persisted_size is not None and offset == 0 and length == self.persisted_size)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current heuristic for is_full_object_read does not take into account whether checksum validation is enabled (enable_checksum) or if the object is finalized (self.is_finalized).

  1. If enable_checksum is False: The downloader will still set is_full_object_read to True and initialize rolling_checksum in _DownloadState, incurring unnecessary CRC32C computation overhead on every chunk downloaded.
  2. If self.is_finalized is False: For unfinalized appendable objects, there is no finalized server-side CRC32C to validate against. Computing the rolling checksum in this case is a waste of CPU cycles.

We should restrict is_full_object_read to only be True when both enable_checksum and self.is_finalized are True.

            is_full_object_read = (
                enable_checksum and
                self.is_finalized and
                (
                    (offset == 0 and length == 0) or
                    (self.persisted_size is not None and offset == 0 and length == self.persisted_size)
                )
            )

@chandra-siri chandra-siri changed the title feat(storage): integrate full-object checksum in AsyncMultiRangeDownloader feat(storage): full object checksum: integrate full-object checksum in AsyncMultiRangeDownloader May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant