- Bug
- Resolution: Unresolved
- Major
- None
- Jenkins 2.252, Gerrit 3.2.3, plugin 0.4.4, AWS EC2 slaves
I haven't been able to pinpoint what triggers the issue, but seemingly at random, when my multibranch pipeline job is triggered by a new patchset on a change, or by the "Re-run" link in the Gerrit Checks panel, the build sometimes skips the SCM checkout step. Nothing is logged to the console and nothing appears in jenkins.log; it's as if the checkout is ignored completely. I tried adding an explicit checkout scm step to the pipeline, but that did not help.
When scanning for changes (which runs on the master), it is able to communicate properly with the Gerrit server, finds the change refs, and triggers the build as expected. But after selecting the slave agent to run on, the SCM checkout simply does nothing. The console shows:
12:59:57 Branch indexing
12:59:57 > git rev-parse --is-inside-work-tree # timeout=10
12:59:57 Setting origin to https://gerrit.local/a/project
12:59:57 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
12:59:57 Fetching origin...
12:59:57 Fetching upstream changes from origin
12:59:57 > git --version # timeout=10
12:59:57 > git --version # 'git version 2.14.5'
12:59:57 > git config --get remote.origin.url # timeout=10
12:59:57 using GIT_ASKPASS to set credentials Gerrit REST API
12:59:57 > git fetch --tags --progress -- origin +refs/heads/dev:refs/remotes/origin/dev +refs/heads/master:refs/remotes/origin/master +refs/heads/release*:refs/remotes/origin/release* # timeout=10
12:59:58 Seen branch in repository origin/04/106504/3
12:59:58 Seen branch in repository origin/26/97726/2
12:59:58 Seen branch in repository origin/27/97727/2
...
12:59:58 Seen 177 remote branches
13:00:04 Obtained gerrit-checks.jenkins from 3addffcd266a2117b2394e458f6b6dd5f70ec89e
13:00:04 Running in Durability level: MAX_SURVIVABILITY
13:00:05 [Pipeline] Start of Pipeline
13:00:06 [Pipeline] node
13:00:06 Running on jenkins-macmini-06 in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_35
13:00:06 [Pipeline] {
13:00:07 [Pipeline] stage
13:00:07 [Pipeline] { (Declarative: Checkout SCM)
13:00:07 [Pipeline] checkout
13:00:07 [Pipeline] }
13:00:07 [Pipeline] // stage
...
13:00:10 [Pipeline] sh
13:00:11 + ls -a
13:00:11 .
13:00:11 ..
13:00:11 [Pipeline] sh
13:00:11 + git show --stat
13:00:11 fatal: not a git repository (or any of the parent directories): .git
Note that the Declarative: Checkout SCM stage, which is where the repository should be cloned, is empty: no git commands are executed, and no errors or status are reported. The last two sh commands are ones I added to my pipeline to verify that the repo was cloned correctly, and both show that it was not. At the same time, nothing is written to jenkins.log on the master.
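The manual ls -a / git show --stat check could be turned into a fail-fast guard, so that a build with a missing checkout stops immediately instead of running later steps in an empty workspace. This is only a sketch: the function name verify_checkout is hypothetical, and it assumes a git client that supports -C (the 2.14.5 shown in the log does).

```shell
#!/bin/sh
# Hypothetical helper (not part of Jenkins or any plugin): report whether a
# directory is inside a git work tree, mirroring the manual check above.
# Returns 0 if it is, 1 if it is not, 2 if the directory does not exist.
verify_checkout() {
    dir="${1:-.}"

    if [ ! -d "$dir" ]; then
        echo "verify_checkout: no such directory: $dir" >&2
        return 2
    fi

    # Same probe that appears in the branch indexing log.
    if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "checkout OK: $dir"
        return 0
    fi

    # Show what is actually in the workspace to help diagnosis.
    echo "checkout MISSING: $dir is not a git work tree; contents:" >&2
    ls -a "$dir" >&2
    return 1
}
```

From a pipeline sh step this might be called as verify_checkout "$WORKSPACE" || exit 1, which marks the build failed as soon as the checkout is found to be missing.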
However, sometimes it does clone the repository as expected, and I don't yet know what determines whether it works. Is there any way to increase logging so that I can identify the cause?
When the clone does work correctly, the same "Declarative: Checkout SCM" stage shows the expected git commands being executed:
18:41:34 Running on EC2 (TMG MGMT) - jenkins-mobile-spot (sir-cm69gqwj) in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28
18:41:34 [Pipeline] {
18:41:34 [Pipeline] stage
18:41:34 [Pipeline] { (Declarative: Checkout SCM)
18:41:34 [Pipeline] checkout
18:41:34 using credential gerrit-http-rest-as-jenkins
18:41:35 Cloning the remote Git repository
18:41:35 Cloning repository https://gerrit.local/a/project
18:41:35 > git init /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28 # timeout=10
18:41:35 Using reference repository: /var/lib/jenkins/git-reference/project.git/
18:41:35 Fetching upstream changes from https://gerrit.local/a/project
18:41:35 > git --version # timeout=10
18:41:35 using GIT_ASKPASS to set credentials Gerrit REST API
18:41:35 > git fetch --tags --progress -- https://gerrit.local/a/project +refs/heads/*:refs/remotes/origin/* # timeout=10
18:41:42 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
18:41:42 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
18:41:42 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
18:41:42 Fetching upstream changes from https://gerrit.local/a/project
18:41:42 using GIT_ASKPASS to set credentials Gerrit REST API
18:41:42 > git fetch --tags --progress -- https://gerrit.local/a/project refs/changes/53/106353/28:refs/remotes/origin/53/106353/28 # timeout=10
18:41:44 Checking out Revision fabfae86429b68d13fa8a814683a144e36502160 (53/106353/28)
18:41:45 Commit message: "Add gerrit checks jenkinsfile"
18:41:45 First time build. Skipping changelog.
18:41:45 [Pipeline] }
18:41:45 [Pipeline] // stage
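The difference between the two logs is whether anything appears between the "[Pipeline] checkout" line and the next "[Pipeline] }" line. To measure how often the empty case occurs across many builds, saved console logs could be classified with a small script. This is a rough sketch: the function name checkout_stage_is_empty is hypothetical, it only looks at the first checkout step in a log, and it assumes the console text has been saved to a file with one log entry per line.

```shell
#!/bin/sh
# Hypothetical helper: inspect a saved console log and report whether the
# first "[Pipeline] checkout" step produced any output before its closing
# "[Pipeline] }" line.
# Returns 0 if the step is empty (the failure symptom), 1 if it produced
# output, 2 if no checkout step was found in the log.
checkout_stage_is_empty() {
    awk '
        # Start counting after the first checkout step.
        /\[Pipeline\] checkout/ { seen = 1; next }
        # Stop at the line that closes the enclosing block.
        seen && /\[Pipeline\] [}]/ { exit }
        # Any line in between is output from the checkout step.
        seen { n++ }
        END {
            if (!seen) exit 2
            exit (n > 0 ? 1 : 0)
        }
    ' "$1"
}
```

Run over a directory of saved logs, the return codes give a count of empty versus normal checkouts, which may be more reliable than estimating the failure rate by eye.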
I don't think the AWS slave is the problem, because I tried switching the agent directive to select a macOS slave and saw the same problem. I also ruled out the specific Jenkins slave that executes the job: when there is only a single slave to choose from, so that it is picked every time, I still see the same random failures.
Early on the success rate seemed to be about 50%, but as I've continued trying to diagnose the issue, it now looks closer to 10%.