fix: live pre-merge build-status revalidation (BERTE-602)#268
fix: live pre-merge build-status revalidation (BERTE-602)#268charlesprost wants to merge 2 commits into
Conversation
Documents pytest binary location, GitHub release process, and the devdocs release checklist reference. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cd4b948 to
61b5e95
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #268 +/- ##
==========================================
+ Coverage 89.28% 89.51% +0.22%
==========================================
Files 77 78 +1
Lines 10287 10493 +206
==========================================
+ Hits 9185 9393 +208
+ Misses 1102 1100 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Introduces revalidate_build_status(), called immediately before the merge/queue decision in _handle_pull_request. Unlike check_build_status, it bypasses BUILD_STATUS_CACHE and fetches live state from the GitHub Actions API, then filters workflow runs strictly to each wbranch's own head_branch before aggregating. Root cause (PR artesca#5155): a push run on the PR branch was still in_progress when bert-e polled. The run shared a SHA with development/4, which had already completed successfully. AggregatedWorkflowRuns.state returned SUCCESSFUL because it grouped by branch then returned SUCCESSFUL if *any* branch succeeded, poisoning the cache entry and hiding the in-progress run on the w-branch. New AggregatedWorkflowRuns.state_for_branch(name) filters and deduplicates within a single branch before calling branch_state(), preventing cross- branch SHA poisoning. revalidate_build_status uses this method. Hosts without get_commit_status (e.g. Bitbucket) are skipped gracefully. Tests added in test_github_build_status.py document the poisoning scenario and verify the per-branch fix. test_revalidate_build_status.py covers the full decision matrix: PR 5155 shape → BuildInProgress, clean PR → passes, bypass/no-key/no-method → skipped, failed/no-runs → appropriate exception. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
61b5e95 to
93143c7
Compare
matthiasL-scality
left a comment
There was a problem hiding this comment.
Code review — non-blocking observations
| ) | ||
|
|
||
|
|
||
| def _make_run(*, run_id, sha, branch, status, conclusion, |
There was a problem hiding this comment.
Duplicate helper — _make_run is defined identically in test_github_build_status.py:232. Consider moving it to a shared conftest.py so both test modules can import it. Not a blocker, but avoids drift if the run schema changes.
| that shares the same SHA cannot cause a false SUCCESSFUL result. | ||
| Unlike .state, this never returns SUCCESSFUL for a different branch. | ||
| """ | ||
| conclusion_ranking = { |
There was a problem hiding this comment.
Duplicate dedup logic — conclusion_ranking and the best-by-workflow_id loop replicate the same logic already in remove_unwanted_workflows (lines 660–671). Consider extracting a private _best_runs_for(runs) helper that both methods can call. Not a blocker.
| assert worst_status == 'SUCCESSFUL' | ||
|
|
||
|
|
||
| def revalidate_build_status(job, wbranches): |
There was a problem hiding this comment.
Non-blocker — long-term fix should be tracked as an improvement.
This function is a sound safety net for the immediate incident. However, the root cause is that BUILD_STATUS_CACHE is keyed only by SHA (BUILD_STATUS_CACHE[build_key][sha]), so two branches sharing the same SHA compete for the same cache slot. The "never overwrite SUCCESSFUL" guard then permanently locks the slot once any branch on that SHA succeeds.
The durable fix would be to include the branch name in the cache key:
# today
BUILD_STATUS_CACHE[build_key][sha]
# proposed
BUILD_STATUS_CACHE[build_key][(sha, branch_name)]Each w-branch would then have its own independent cache slot; a renamed branch can never inherit a SUCCESSFUL result from its predecessor. This would also allow removing revalidate_build_status (or reducing it to a lightweight assertion) once the cache is correct.
This is not a blocker for this PR. Please open a follow-up Jira improvement ticket to track the cache key refactor.
Summary
revalidate_build_status(), called immediately before the merge/queue decision in_handle_pull_request. It bypassesBUILD_STATUS_CACHEand fetches live state from the GitHub Actions API, filtering workflow runs strictly to the current w-branch before aggregating.AggregatedWorkflowRuns.state_for_branch(name): filters and deduplicates within a single branch, preventing a sibling branch sharing the same SHA from producing a falseSUCCESSFUL.BuildInProgress).Root cause (artesca#5155, 2026-05-20)
The source branch was renamed; the new w-branch shared its commit SHA with the old w-branch. GitHub's API returned workflow runs for both names.
AggregatedWorkflowRuns.statepicked the best status across branches, returningSUCCESSFULbecause the old w-branch completed. That result was latched intoBUILD_STATUS_CACHE(which never overwritesSUCCESSFUL). When the peer approval landed 15 s later, Bert-E readSUCCESSFULfrom cache and merged — while the actual push CI on the new w-branch was still running.What this PR does
bert_e/git_host/github/__init__.pystate_for_branch(branch_name): filters_workflow_runstobranch_namebefore deduplicating byworkflow_id, so a sibling branch cannot mask the result. Logs the filtered run snapshot (branch + runs + conclusion) for post-mortem diagnostics.bert_e/workflow/gitwaterflow/__init__.pyrevalidate_build_status(job, wbranches): live, cache-bypassing final check at the merge decision boundary. Callsget_commit_status()directly (always hits the API), usesstate_for_branchfor GitHub Actions keys, skips gracefully for hosts withoutget_commit_status(e.g. Bitbucket)._handle_pull_requestafter the interactive confirm and beforebuild_queue_collection.Test plan
test_artesca5155_renamed_branch_inprogress_blocked— exact incident shape, assertsBuildInProgresstest_artesca5155_old_aggregator_would_have_passed— documents the pre-fix bug:AggregatedWorkflowRuns.statereturnsSUCCESSFULfor the same datatest_pr5155_shape_raises_build_in_progress— same SHA on w-branch (in_progress) and destination (success)state_for_branchunit tests: cross-branch poisoning, dedup within branch, workflow_dispatch exclusion, no-runs case🤖 Generated with Claude Code