Type: Bug
Resolution: Unresolved
Priority: Major
Component/s: gerrit-code-review-plugin
Environment:
Jenkins 2.252
Gerrit 3.2.3
plugin 0.4.4
AWS EC2 slaves
I haven't been able to pinpoint what triggers the issue, but seemingly at random, when my multibranch pipeline job is triggered by a new patchset on a change, or by the "Re-run" link in the Gerrit Checks panel, the build skips the SCM checkout step. Nothing is logged to the console and nothing appears in jenkins.log; it's as if the checkout is ignored completely. I tried adding an explicit checkout scm step to the pipeline, but that did not help.
When scanning for changes (which runs on the master), Jenkins communicates properly with the Gerrit server, finds the change refs, and triggers the build as expected. But once the slave agent to run on has been selected, the SCM checkout simply does nothing. The console shows:
12:59:57 Branch indexing
12:59:57 > git rev-parse --is-inside-work-tree # timeout=10
12:59:57 Setting origin to https://gerrit.local/a/project
12:59:57 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
12:59:57 Fetching origin...
12:59:57 Fetching upstream changes from origin
12:59:57 > git --version # timeout=10
12:59:57 > git --version # 'git version 2.14.5'
12:59:57 > git config --get remote.origin.url # timeout=10
12:59:57 using GIT_ASKPASS to set credentials Gerrit REST API
12:59:57 > git fetch --tags --progress -- origin +refs/heads/dev:refs/remotes/origin/dev +refs/heads/master:refs/remotes/origin/master +refs/heads/release*:refs/remotes/origin/release* # timeout=10
12:59:58 Seen branch in repository origin/04/106504/3
12:59:58 Seen branch in repository origin/26/97726/2
12:59:58 Seen branch in repository origin/27/97727/2
...
12:59:58 Seen 177 remote branches
13:00:04 Obtained gerrit-checks.jenkins from 3addffcd266a2117b2394e458f6b6dd5f70ec89e
13:00:04 Running in Durability level: MAX_SURVIVABILITY
13:00:05 [Pipeline] Start of Pipeline
13:00:06 [Pipeline] node
13:00:06 Running on jenkins-macmini-06 in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_35
13:00:06 [Pipeline] {
13:00:07 [Pipeline] stage
13:00:07 [Pipeline] { (Declarative: Checkout SCM)
13:00:07 [Pipeline] checkout
13:00:07 [Pipeline] }
13:00:07 [Pipeline] // stage
...
13:00:10 [Pipeline] sh
13:00:11 + ls -a
13:00:11 .
13:00:11 ..
13:00:11 [Pipeline] sh
13:00:11 + git show --stat
13:00:11 fatal: not a git repository (or any of the parent directories): .git
Note that the Declarative: Checkout SCM stage, which is where the repository should be cloned, is empty: no git commands are executed and no errors or status are reported. The last two sh commands are ones I added to my pipeline to verify that the repo was cloned correctly, and both show that it was not. At the same time, nothing is written to jenkins.log on the master.
However, sometimes it does clone the repository as expected, and I don't yet know what determines whether it works. Is there any way to increase logging so that I can identify the cause?
When the clone does work correctly, the same "Declarative: Checkout SCM" stage shows the expected git commands being executed:
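The verification described above only prints output and lets the build continue. A stricter variant could fail the build as soon as the workspace turns out not to be a git checkout, which makes the affected runs easier to spot. The following is a minimal sketch under that assumption; the function name verify_checkout is hypothetical and is not part of the original pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not from the original pipeline): succeeds only when
# the given directory is inside a git work tree, otherwise reports and fails.
verify_checkout() {
    dir="${1:-.}"
    # rev-parse prints "true" inside a work tree and exits non-zero
    # when the directory is not part of any git repository.
    inside=$(git -C "$dir" rev-parse --is-inside-work-tree 2>/dev/null) || inside=false
    if [ "$inside" = "true" ]; then
        echo "git checkout present in $dir"
        return 0
    fi
    echo "no git checkout in $dir" >&2
    return 1
}
```

Called from an sh step as `verify_checkout "$WORKSPACE" || exit 1`, this would stop the build at the point where the checkout was skipped instead of at a later git command.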
18:41:34 Running on EC2 (TMG MGMT) - jenkins-mobile-spot (sir-cm69gqwj) in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28
18:41:34 [Pipeline] {
18:41:34 [Pipeline] stage
18:41:34 [Pipeline] { (Declarative: Checkout SCM)
18:41:34 [Pipeline] checkout
18:41:34 using credential gerrit-http-rest-as-jenkins
18:41:35 Cloning the remote Git repository
18:41:35 Cloning repository https://gerrit.local/a/project
18:41:35 > git init /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28 # timeout=10
18:41:35 Using reference repository: /var/lib/jenkins/git-reference/project.git/
18:41:35 Fetching upstream changes from https://gerrit.local/a/project
18:41:35 > git --version # timeout=10
18:41:35 using GIT_ASKPASS to set credentials Gerrit REST API
18:41:35 > git fetch --tags --progress -- https://gerrit.local/a/project +refs/heads/*:refs/remotes/origin/* # timeout=10
18:41:42 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
18:41:42 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
18:41:42 > git config remote.origin.url https://gerrit.local/a/project # timeout=10
18:41:42 Fetching upstream changes from https://gerrit.local/a/project
18:41:42 using GIT_ASKPASS to set credentials Gerrit REST API
18:41:42 > git fetch --tags --progress -- https://gerrit.local/a/project refs/changes/53/106353/28:refs/remotes/origin/53/106353/28 # timeout=10
18:41:44 Checking out Revision fabfae86429b68d13fa8a814683a144e36502160 (53/106353/28)
18:41:45 Commit message: "Add gerrit checks jenkinsfile"
18:41:45 First time build. Skipping changelog.
18:41:45 [Pipeline] }
18:41:45 [Pipeline] // stage
I don't think the AWS slave is the problem, because when I switched the agent directive to select a macOS slave I saw the same problem. I also ruled out the specific Jenkins slave that executes the job: even when there is only a single slave to choose from, so that it is picked every time, I still see the same random failures.
Early on the success rate seemed to be about 50%, but as I've kept trying to diagnose the issue it now looks closer to 10%.