feat(storage): full object checksum: implement rolling checksum and verification in reads resumption strategy#17262
Conversation
…esumption strategy
There was a problem hiding this comment.
Code Review
This pull request introduces support for full-object rolling checksum verification during resumed reads, allowing verification once the stream completes. It also adds a flag to conditionally disable checksum validation. The review feedback suggests optimizing performance by bypassing rolling checksum updates when checksums are disabled, and recommends adding a unit test to verify that full-object checksum mismatches are ignored when checksum validation is disabled.
| if read_state.rolling_checksum is not None: | ||
| read_state.rolling_checksum.update(data) |
There was a problem hiding this comment.
If checksum_enabled is False, we should avoid updating the rolling checksum to prevent unnecessary CPU overhead, especially when downloading large objects where checksums have been explicitly disabled for performance.
| if read_state.rolling_checksum is not None: | |
| read_state.rolling_checksum.update(data) | |
| if checksum_enabled and read_state.rolling_checksum is not None: | |
| read_state.rolling_checksum.update(data) |
| def test_update_state_checksum_mismatch_ignored_when_disabled(self): | ||
| """Test that a CRC32C mismatch is ignored when enable_checksum is False.""" | ||
| self._add_download(_READ_ID) | ||
| self.state["enable_checksum"] = False | ||
| response = self._create_response(b"data", _READ_ID, offset=0, crc=999999) | ||
|
|
||
| # Should NOT raise DataCorruption! | ||
| self.strategy.update_state_from_response(response, self.state) |
There was a problem hiding this comment.
To ensure comprehensive test coverage, we should also verify that the full-object checksum verification is bypassed when enable_checksum is set to False.
def test_update_state_checksum_mismatch_ignored_when_disabled(self):
"""Test that a CRC32C mismatch is ignored when enable_checksum is False."""
self._add_download(_READ_ID)
self.state["enable_checksum"] = False
response = self._create_response(b"data", _READ_ID, offset=0, crc=999999)
# Should NOT raise DataCorruption!
self.strategy.update_state_from_response(response, self.state)
def test_update_state_full_object_checksum_ignored_when_disabled(self):
"""Test that full object checksum mismatch is ignored when enable_checksum is False."""
self._add_download(_READ_ID, offset=0, length=9, is_full_object_read=True)
self.state["enable_checksum"] = False
self.state["full_obj_server_crc32c"] = 111111 # Wrong server checksum!
resp1 = self._create_response(b"test", _READ_ID, offset=0)
self.strategy.update_state_from_response(resp1, self.state)
resp2 = self._create_response(b"data1", _READ_ID, offset=4, range_end=True)
# Should NOT raise DataCorruption!
self.strategy.update_state_from_response(resp2, self.state)
PR 2 of 3: Upgrade _DownloadState to support rolling checksums, and verify full-object crc32c at completion in _ReadResumptionStrategy.