feat(storage): full object checksum: integrate full-object checksum in AsyncMultiRangeDownloader#17263
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces checksum validation for full object downloads in the AsyncMultiRangeDownloader by tracking rolling checksums and handling DataCorruption exceptions. It also defers the CRC32C C-extension check to the download phase. The review feedback points out an optimization opportunity in the is_full_object_read heuristic to prevent unnecessary CRC32C computations when checksums are disabled or when the object is unfinalized.
| is_full_object_read = ( | ||
| (offset == 0 and length == 0) or | ||
| (self.persisted_size is not None and offset == 0 and length == self.persisted_size) | ||
| ) |
There was a problem hiding this comment.
The current heuristic for is_full_object_read does not take into account whether checksum validation is enabled (enable_checksum) or if the object is finalized (self.is_finalized).
- If
enable_checksumisFalse: The downloader will still setis_full_object_readtoTrueand initializerolling_checksumin_DownloadState, incurring unnecessary CRC32C computation overhead on every chunk downloaded. - If
self.is_finalizedisFalse: For unfinalized appendable objects, there is no finalized server-side CRC32C to validate against. Computing the rolling checksum in this case is a waste of CPU cycles.
We should restrict is_full_object_read to only be True when both enable_checksum and self.is_finalized are True.
is_full_object_read = (
enable_checksum and
self.is_finalized and
(
(offset == 0 and length == 0) or
(self.persisted_size is not None and offset == 0 and length == self.persisted_size)
)
)
PR 3 of 3: Connect full-object checksum in AsyncMultiRangeDownloader, close stream on corruption, and add end-to-end system tests.