Add timeout to standard pilot fetch #1255

Draft
peter941221 wants to merge 2 commits into riverqueue:master from peter941221:fix/standard-pilot-fetch-timeout

Summary

Fix StandardPilot.JobGetAvailable so a stalled fetch does not hang a producer indefinitely.

Problem

producer.dispatchWork intentionally strips cancellation from the work context before fetching jobs so an in-flight fetch is allowed to complete during shutdown:

  • producer.go:744-766

That is reasonable, but StandardPilot.JobGetAvailable forwarded directly to exec.JobGetAvailable with no timeout at all:

  • rivershared/riverpilot/standard_pilot.go:18-22

This meant a stalled driver call could block a standard-pilot producer forever. The pro pilot already applies per-attempt fetch timeouts, so the standard pilot was the outlier.

Change

Add a 10-second timeout inside StandardPilot.JobGetAvailable before calling the driver.

This keeps the existing shutdown semantics intact:

  • fetches still ignore parent cancellation from dispatchWork
  • but they are now bounded, so a wedged DB call eventually returns instead of freezing the producer forever

The timeout is local to the standard pilot so there is no driver SQL change and no producer state-machine change.

Testing

  • added rivershared/riverpilot/standard_pilot_test.go
  • covered MaxToLock <= 0 no-op behavior
  • covered a hung JobGetAvailable call timing out with context.DeadlineExceeded
  • covered parent cancellation still winning when the incoming context is already canceled

Verification

Locally verified with:

  • GOPROXY=https://goproxy.cn,direct GOSUMDB=off go test ./rivershared/riverpilot -count=1

Closes #1026 (JobGetAvailable call has no timeout).
