Type: Bug
Resolution: Unresolved
Priority: Minor
Component/s: gerrit-trigger-plugin
Environment: Jenkins 2.277.1, gerrit-trigger 2.33.0
Prerequisites:
- Gerrit is set up with replication.
- All git operations in Jenkins jobs run against replicas.
- The Gerrit Trigger plugin is configured to block builds until replication has completed.
Scenario:
1. A change is pushed to refs/for/master in Gerrit.
2. Jenkins has a pre-submit job configured for this repository, triggered by the patchset-created event in the Gerrit Trigger plugin.
   2.1. The job is enqueued.
   2.2. It waits in the queue for replication.
   2.3. The job starts.
   2.4. The job completes successfully.
3. The same change is submitted/merged in Gerrit.
4. Another post-submit job for the same change should run in Jenkins, triggered by the change-merged event. Nothing else has been merged in between, so the merged revision is exactly the one that went through the pre-submit job above.
   4.1. The job is enqueued.
   4.2. ROOT CAUSE: The job does NOT wait for replication, because ReplicationQueueTaskDispatcher.updateFromReplicationCache hits a cache entry for the same ref recorded during step 2.2 above. (This cache only uses server, host, ref and project as its key.)
   4.3. The job starts.
   4.4. Code in the job assumes that the change is merged to master and that it can simply run git fetch origin master.
   4.5. ERROR SYMPTOM: The replica has the ref, but it has not yet received the push for master, so the fetch may serve an old head of master!
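The false cache hit in step 4.2 can be modelled in a few lines. This is a toy sketch, not the plugin's actual code: it only assumes what the report states, namely that the cache key is (server, host, ref, project) and that the change-merged event carries the same ref as the earlier patchset-created event. The server, host, project and ref values are hypothetical.

```python
from typing import NamedTuple


class CacheKey(NamedTuple):
    """The four fields the report says the replication cache is keyed on."""
    server: str
    host: str
    ref: str
    project: str


def lookup(cache: set[CacheKey], server: str, host: str, ref: str, project: str) -> bool:
    """True means 'already replicated': the build is released without waiting."""
    return CacheKey(server, host, ref, project) in cache


# Hypothetical values for the scenario above.
PATCHSET_REF = "refs/changes/45/12345/1"
cache: set[CacheKey] = set()

# Pre-submit job (patchset-created): it waits for replication, and the result is cached.
cache.add(CacheKey("gerrit", "replica-1", PATCHSET_REF, "my/project"))

# Post-submit job (change-merged) for the same change carries the same ref, so the
# lookup hits, even though nothing in the key says refs/heads/master was replicated.
merged_hit = lookup(cache, "gerrit", "replica-1", PATCHSET_REF, "my/project")
print(merged_hit)  # True
```

Because none of the key fields distinguishes "the patchset ref is on the replica" from "the merge has reached the target branch on the replica", the second lookup cannot tell the two situations apart.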
The scenario above has been confirmed by logs for the merge event stating "processed a replication event from the cache" even before the replica had received the push.
The thesis here is that the change-merged event should never use the replication cache, or else the cache has to be extended to also take branch pushes into account, not just the change ref. Post-submit jobs often assume that the master branch is in a consistent state and contains the most recent changes; unlike a pre-submit job, they do not operate directly on GERRIT_REFSPEC.
Alternatively, each job should have an option to use the replication cache or not, since only the job author knows whether the job uses "master" or "GERRIT_REFSPEC".
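The two proposals above (also waiting for the branch push on change-merged, and a per-job switch for the cache) can be sketched as a small decision rule. Everything here is hypothetical: `refs_to_await`, `can_skip_wait` and the `job_uses_cache` flag are illustrative names, not existing plugin settings. `cached_refs` stands for the refs already known to be replicated for one (server, host, project). The event type strings follow Gerrit's stream-event names.

```python
def refs_to_await(event_type: str, patchset_ref: str, branch: str) -> list[str]:
    """Refs that must be on the replica before the build may start (hypothetical rule)."""
    refs = [patchset_ref]
    if event_type == "change-merged":
        # A merge also updates the target branch; post-submit jobs that run
        # 'git fetch origin master' depend on that branch push, not on the change ref.
        refs.append(f"refs/heads/{branch}")
    return refs


def can_skip_wait(required_refs: list[str], cached_refs: set[str],
                  job_uses_cache: bool = True) -> bool:
    """True if the build may start immediately based on cached replication results."""
    if not job_uses_cache:
        # Per-job opt-out: always wait for a fresh replication event.
        return False
    return all(ref in cached_refs for ref in required_refs)


PATCHSET_REF = "refs/changes/45/12345/1"  # hypothetical change ref
cached = {PATCHSET_REF}  # entry left behind by the pre-submit job

print(can_skip_wait(refs_to_await("patchset-created", PATCHSET_REF, "master"), cached))  # True
print(can_skip_wait(refs_to_await("change-merged", PATCHSET_REF, "master"), cached))     # False
print(can_skip_wait([PATCHSET_REF], cached, job_uses_cache=False))                       # False
```

One caveat for the "extend the cache" option: refs/heads/master is replicated on every merge, so an entry keyed only on the branch name could still match an older push. A real fix would probably need to tie the branch entry to the merged revision or to the time of the merge event.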
As of now there is no clean alternative except sleeping or polling in the job. There is also a setting for cache expiration, which I guess could be set to 1 second or so to effectively disable the cache, but that would have a global impact.
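A job-side polling workaround might look like the following sketch. It assumes the job knows the expected merged revision (for example from GERRIT_PATCHSET_REVISION, since in this scenario the merged revision equals the pre-submit revision) and that the `origin` remote points at the replica. It also assumes nothing else is merged in between, so master's head should match that revision exactly. `wait_for_head` and its parameters are illustrative names. The clock and sleep are injectable so the loop can be tested without a real replica.

```python
import subprocess
import time
from typing import Callable


def remote_head(remote: str = "origin", branch: str = "master") -> str:
    """Return the SHA the remote currently advertises for refs/heads/<branch>."""
    out = subprocess.run(
        ["git", "ls-remote", remote, f"refs/heads/{branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # ls-remote prints "<sha>\t<refname>"; empty output means the ref is missing.
    return out.split()[0] if out.strip() else ""


def wait_for_head(expected_sha: str,
                  get_head: Callable[[], str] = remote_head,
                  timeout_s: float = 300.0,
                  interval_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep,
                  now: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the replica's branch head equals expected_sha; False on timeout."""
    deadline = now() + timeout_s
    while True:
        if get_head() == expected_sha:
            return True
        if now() >= deadline:
            return False
        sleep(interval_s)
```

In a job this could be called as `wait_for_head(os.environ["GERRIT_PATCHSET_REVISION"])` before `git fetch origin master`, failing the build if it returns False. Unlike shortening the cache expiration, this only affects the jobs that actually depend on the branch head.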