Jenkins / JENKINS-63398

Pipeline randomly skips SCM Checkout step

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Jenkins 2.252
      Gerrit 3.2.3
      plugin 0.4.4
      AWS EC2 slaves

      I haven't been able to pinpoint what triggers the issue, but seemingly at random, when my multibranch pipeline job is triggered by a new patchset on a change, or by the "Re-run" link in the Gerrit Checks panel, the build sometimes skips the SCM checkout step. Nothing is logged to the console and nothing appears in jenkins.log; it's as if the checkout is ignored completely. I tried adding an explicit checkout scm step to the pipeline, but that did not help.

      Scanning for changes (which runs on the master) works: it communicates properly with the Gerrit server, finds the change refs, and triggers the build as expected. But once a slave agent has been selected to run the build, the SCM checkout simply does nothing. The console shows:

      12:59:57  Branch indexing
      12:59:57   > git rev-parse --is-inside-work-tree # timeout=10
      12:59:57  Setting origin to https://gerrit.local/a/project
      12:59:57   > git config remote.origin.url https://gerrit.local/a/project # timeout=10
      12:59:57  Fetching origin...
      12:59:57  Fetching upstream changes from origin
      12:59:57   > git --version # timeout=10
      12:59:57   > git --version # 'git version 2.14.5'
      12:59:57   > git config --get remote.origin.url # timeout=10
      12:59:57  using GIT_ASKPASS to set credentials Gerrit REST API
      12:59:57   > git fetch --tags --progress -- origin +refs/heads/dev:refs/remotes/origin/dev +refs/heads/master:refs/remotes/origin/master +refs/heads/release*:refs/remotes/origin/release* # timeout=10
      12:59:58  Seen branch in repository origin/04/106504/3
      12:59:58  Seen branch in repository origin/26/97726/2
      12:59:58  Seen branch in repository origin/27/97727/2
      ...
      12:59:58  Seen 177 remote branches
      13:00:04  Obtained gerrit-checks.jenkins from 3addffcd266a2117b2394e458f6b6dd5f70ec89e
      13:00:04  Running in Durability level: MAX_SURVIVABILITY
      13:00:05  [Pipeline] Start of Pipeline
      13:00:06  [Pipeline] node
      13:00:06  Running on jenkins-macmini-06 in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_35
      13:00:06  [Pipeline] {
      13:00:07  [Pipeline] stage
      13:00:07  [Pipeline] { (Declarative: Checkout SCM)
      13:00:07  [Pipeline] checkout
      13:00:07  [Pipeline] }
      13:00:07  [Pipeline] // stage
      ...
      13:00:10  [Pipeline] sh
      13:00:11  + ls -a
      13:00:11  .
      13:00:11  ..
      13:00:11  [Pipeline] sh
      13:00:11  + git show --stat
      13:00:11  fatal: not a git repository (or any of the parent directories): .git

      Note that the Declarative: Checkout SCM stage, where the repository should be cloned, is empty: no git commands are executed and no errors or status are reported. The last two sh steps are ones I added to my pipeline to verify that the repo was cloned correctly, and both show that it was not. At the same time, nothing is reported to jenkins.log on the master.
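The manual verification above (ls -a followed by git show --stat) could be folded into a single fail-fast guard. The following is only a sketch: the function name `require_git_worktree` is hypothetical, and it assumes git is on the agent's PATH. It relies on `git rev-parse --is-inside-work-tree`, the same command that appears in the branch indexing log.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jenkins or git): fail early when the
# workspace was never checked out, instead of failing later in the build.
require_git_worktree() {
    dir="${1:-.}"
    # Inside a work tree, git prints "true"; outside any repository it
    # prints an error and exits non-zero, so the output is empty here.
    if [ "$(git -C "$dir" rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]; then
        return 0
    fi
    echo "ERROR: $dir is not a git work tree; SCM checkout appears to have been skipped" >&2
    return 1
}
```

Called from an sh step right after the checkout stage (for example `require_git_worktree "$WORKSPACE"`), this would turn a silently skipped checkout into an explicit build failure with a recognizable message. It does not explain why the checkout was skipped.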

      Sometimes it does clone the repository as expected, but I don't yet know what makes it work or fail. Is there any way to increase logging so that I can identify the cause?

      When the clone does work correctly, the same "Declarative: Checkout SCM" stage shows the expected git commands being executed:

      18:41:34  Running on EC2 (TMG MGMT) - jenkins-mobile-spot (sir-cm69gqwj) in /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28
      18:41:34  [Pipeline] {
      18:41:34  [Pipeline] stage
      18:41:34  [Pipeline] { (Declarative: Checkout SCM)
      18:41:34  [Pipeline] checkout
      18:41:34  using credential gerrit-http-rest-as-jenkins
      18:41:35  Cloning the remote Git repository
      18:41:35  Cloning repository https://gerrit.local/a/project
      18:41:35   > git init /var/lib/jenkins/workspace/rrit-pipeline_53_106353_28 # timeout=10
      18:41:35  Using reference repository: /var/lib/jenkins/git-reference/project.git/
      18:41:35  Fetching upstream changes from https://gerrit.local/a/project
      18:41:35   > git --version # timeout=10
      18:41:35  using GIT_ASKPASS to set credentials Gerrit REST API
      18:41:35   > git fetch --tags --progress -- https://gerrit.local/a/project +refs/heads/*:refs/remotes/origin/* # timeout=10
      18:41:42   > git config remote.origin.url https://gerrit.local/a/project # timeout=10
      18:41:42   > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
      18:41:42   > git config remote.origin.url https://gerrit.local/a/project # timeout=10
      18:41:42  Fetching upstream changes from https://gerrit.local/a/project
      18:41:42  using GIT_ASKPASS to set credentials Gerrit REST API
      18:41:42   > git fetch --tags --progress -- https://gerrit.local/a/project refs/changes/53/106353/28:refs/remotes/origin/53/106353/28 # timeout=10
      18:41:44  Checking out Revision fabfae86429b68d13fa8a814683a144e36502160 (53/106353/28)
      18:41:45  Commit message: "Add gerrit checks jenkinsfile"
      18:41:45  First time build. Skipping changelog.
      18:41:45  [Pipeline] }
      18:41:45  [Pipeline] // stage
      

      I don't think the AWS slave is the problem: I tried switching the agent directive to select a macOS slave and saw the same behavior. I also ruled out the specific Jenkins slave that executes the job, since even when there is only a single slave to choose from and it is picked every time, I still see the same random failures.

      Early on the success rate seemed to be about 50%, but as I've kept trying to diagnose the issue, it now looks closer to 10%.
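To replace the rough estimate with a count, saved console logs could be scanned for the failure signature shown earlier: a `[Pipeline] checkout` line followed immediately by the stage-closing `[Pipeline] }` line. This is a minimal sketch that assumes each build's console output has been saved to a file. The function names `checkout_was_skipped` and `skipped_rate` are hypothetical.

```shell
#!/bin/sh
# checkout_was_skipped LOGFILE
# Exit 0 when a "[Pipeline] checkout" line is immediately followed by
# "[Pipeline] }", meaning the checkout step produced no output at all.
checkout_was_skipped() {
    awk '
        prev_checkout && /\[Pipeline\] [}][ \t\r]*$/ { found = 1 }
        { prev_checkout = ($0 ~ /\[Pipeline\] checkout[ \t\r]*$/) }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# skipped_rate LOGFILE...
# Print "skipped/total" for the given console logs.
skipped_rate() {
    total=0
    skipped=0
    for log in "$@"; do
        total=$((total + 1))
        if checkout_was_skipped "$log"; then
            skipped=$((skipped + 1))
        fi
    done
    echo "$skipped/$total"
}
```

Running `skipped_rate` over a directory of saved logs would give a concrete ratio, and grouping the logs by agent or trigger type might show whether the failures correlate with either.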

            lucamilanesio Luca Domenico Milanesio
            jhansche Joe Hansche
            Votes: 1
            Watchers: 6
