
Pipeline hanging after updating Jenkinsfile but before starting to build

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Component: jira-plugin
    • Environment: Debian 8.8. Jenkins 2.60. Tomcat 8.0.14.
      Pipeline 2.5, Pipeline: Groovy 2.38, Git plugin 3.5.1, Lockable Resources 2.0.

      We have a linux master (4 executors, never full) running a number of pipeline jobs that all pull Jenkinsfile from git before triggering the actual build on one of a pool of a half dozen workers (osx, linux, and windows). Most builds (dozens a day) work fine, but occasionally (1-3 times a week) things will hang before actually starting to run any Groovy.

      This has been happening for months now, and various packages have been upgraded in response to no avail.

      Initially, I thought that the issue was network connectivity to the git server - but I have since pulled the pipeline script onto a local file:// repo and experience the same issue.

      When affected, builds look like this:

      Started by upstream project "Project/Project-Sync" build number 1682
      originally caused by:
       Started by an SCM change
      Checking out git file:///var/lib/jenkins/git-cache into /var/lib/jenkins/workspace/Project/Project-WIN@script to read Jenkinsfile
       > git rev-parse --is-inside-work-tree # timeout=10
      Fetching changes from the remote Git repository
       > git config remote.origin.url file:///var/lib/jenkins/git-cache # timeout=10
      Fetching upstream changes from file:///var/lib/jenkins/git-cache
       > git --version # timeout=10
       > git fetch --tags --progress file:///var/lib/jenkins/git-cache +refs/heads/*:refs/remotes/origin/*
       > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
       > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
      Checking out Revision fd354d30a6b141d6fc81267d4708de10de5b5966 (refs/remotes/origin/master)
      Commit message: "Typo."
       > git config core.sparsecheckout # timeout=10
       > git checkout -f fd354d30a6b141d6fc81267d4708de10de5b5966
       > git rev-list fd354d30a6b141d6fc81267d4708de10de5b5966 # timeout=10

      And then they just sit there like that forever.

      A good build goes more like this:

      Started by upstream project "Project/Project-Sync" build number 1684
      originally caused by:
      Started by user Ammon Lauritzen
      ...
      Checking out Revision fd354d30a6b141d6fc81267d4708de10de5b5966 (refs/remotes/origin/master)
      Commit message: "Typo."
      > git config core.sparsecheckout # timeout=10
      > git checkout -f fd354d30a6b141d6fc81267d4708de10de5b5966
      > git rev-list fd354d30a6b141d6fc81267d4708de10de5b5966 # timeout=10
      [Pipeline] node
      Running on master in /var/lib/jenkins/workspace/Project/Project-WIN
      [Pipeline] {
      [Pipeline] echo
      prepping master environment
      [Pipeline] sh
      [Project-WIN] Running shell script
      ...

      The only difference in the log is that it stops before the [Pipeline] lines start logging.

      The build cannot be cancelled normally via the UI, by restarting Tomcat, or by rebooting the server - instead I have to abort it via the script console.
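
      For reference, the script console abort looks roughly like this; the job path and build number are placeholders for whichever run is stuck, and this assumes a Pipeline (WorkflowRun) build:

      // Run from Manage Jenkins > Script Console; placeholder job path and build number.
      import jenkins.model.Jenkins
      def job = Jenkins.instance.getItemByFullName("Project/Project-WIN")
      def build = job.getBuildByNumber(1683)   // placeholder build number
      build.doTerm()    // polite Pipeline abort first
      // build.doKill() // hard-kill the flow if doTerm() has no effect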

      It happens more often with nightly builds than SCM-triggered CI builds, and more often on the weekends... but it happens plenty of times in the middle of the day as well. It is not always the same projects that hang, and not always projects destined for Windows slaves that hang (that was just the most recent instance of the problem).

          [JENKINS-46503] Pipeline hanging after updating Jenkinsfile but before starting to build

          Mark Waite added a comment -

          I don't see a clear indication that this is related to the git plugin.

          Some tests to check more deeply:

          • use jgit instead of command line git, does the behavior change in a relevant way? (see the sketch after this list)
          • use a different server for SCM, does the behavior change?
          • use different plugin versions, does the behaviour change?
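
          A minimal sketch of forcing JGit for a Pipeline-level checkout, assuming a JGit installation named 'jgit' has been added under Global Tool Configuration (the repository URL is the local cache from the report):

          // Sketch only - forces the JGit implementation for an explicit checkout step.
          node {
              checkout([$class: 'GitSCM',
                        gitTool: 'jgit',
                        branches: [[name: '*/master']],
                        userRemoteConfigs: [[url: 'file:///var/lib/jenkins/git-cache']]])
          }

          Note that the checkout of the Jenkinsfile itself is governed by the job's SCM configuration, so the git implementation also has to be switched there.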

          Ammon Lauritzen added a comment -

          I don't see any indication that it is related to git either - but the issue occurs at the handoff point between the SCM system that pulled the script and actually invoking the script, so I thought it worth mentioning.

          I have tried a different server, as mentioned above (switching from remote git server to local file url).

          I have tried different plugin versions, as mentioned above (upgrading plugins over the course of time that this issue has affected us).

          I have NOT, however, tried switching from the default CLI git to jgit - will do so.

          Andrew Bayer added a comment -

          Can you get a thread dump (i.e., from JENKINS_URL/threadDump) when you've got a hung job?
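
          Roughly the same JVM-level information can also be printed from the script console with a sketch like this (this only covers the master JVM's threads):

          // Dump every JVM thread's stack trace to the script console output.
          Thread.getAllStackTraces().each { thread, frames ->
              println "${thread.name} (${thread.state})"
              frames.each { println "    at ${it}" }
          }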

          Ammon Lauritzen added a comment -

          I will grab one the next time this happens.

          Ammon Lauritzen added a comment -

          The switch to jgit did not change anything; we have a hang and I am parsing the thread dump right now. I'm not familiar with the underlying code, but just reading the stack trace, it looks to me like we are deadlocking as a pair of jobs (both reading the same Jenkinsfile but with different configs) are launched near simultaneously by an SCM poll job.

          No other jobs were running at the time, and I was unable to kill either task normally or via the script console; I had to restart Tomcat this time in order to clean them up.

          Attached is the thread dump from the moment I identified the problem.

          jenkins-thread-dump-20170829.txt

          Ammon Lauritzen added a comment -

          We had another mysterious hang from a pair of jobs a few hours ago, which I did not catch immediately - so the thread dump is from about 3 hours after the pipelines got stuck. This time, there are no obvious stack traces pointing at the individual jobs and one miscellaneous build was running at the time of the snapshot. Additionally, I was able to issue a kill via script console on these (unlike the most recent incident).

          I am not seeing anything particularly interesting in this dump. I don't know which worker was running the one build, but win04 was definitely running one of them this time as well as last time.

          jenkins-thread-dump-20170905.txt

          Ammon Lauritzen added a comment -

          And another hang this morning. This time it was a different pair of jobs, running on OSX workers instead of Windows. A third job from the same codebase was running happily at the time of this thread dump. Nothing terribly interesting here as far as I can tell at a cursory glance.

          jenkins-thread-dump-20170907.txt

          I am wondering if the problem is with the Parameterized Trigger plugin. The jobs that lock up are (always?) being launched as a post-build trigger to a sync job. I have updated these parent jobs to use two triggers instead of one in the hope that splitting them up slightly more will help avoid any possible race condition.

          Mostyn Bramley-Moore added a comment -

          I'm also seeing this on some builds today.

          Jenkins 2.89.3, all plugins up to date. The git-client plugin is at 2.7.1 - downgrading to 2.7.0 seems to have helped.

          Thread dump:

          Thread #8
          	at DSL.git(running in thread: org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#114])
          	at WorkflowScript.run(WorkflowScript:38)
          	at DSL.ws(Native Method)
          	at WorkflowScript.run(WorkflowScript:37)
          	at DSL.node(running on buildnode3-2)
          	at WorkflowScript.run(WorkflowScript:36)
          	at DSL.stage(Native Method)
          	at WorkflowScript.run(WorkflowScript:35)
          	at DSL.lock(Native Method)
          	at WorkflowScript.run(WorkflowScript:32)

          Mostyn Bramley-Moore added a comment -

          The git-client plugin downgrade was not sufficient; I had several builds hang in the same way in the meantime.

          Sverre Moe added a comment - edited

          We also began to experience this same problem recently, just after a Jenkins upgrade from 2.100 to 2.103 and an upgrade of several plugins (too many to list or remember).

          Ammon Lauritzen added a comment -

          For the record, we do still observe this issue intermittently - but builds over the new year were less frequent, so it is possible that we just missed the right sets of circumstances due to decreased activity. A week ago, I upgraded the master to 2.101 (and all plugins to current), and have not observed the issue since this particular upgrade.

          Mostyn Bramley-Moore added a comment -

          My current workaround is to avoid using the checkout step (it doesn't scale well enough for my needs anyway) and instead do the checkout manually with an sh step. I will report back here if this doesn't avoid the issue.
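
          A rough sketch of that workaround; the repository URL and branch are placeholders:

          // Replace the checkout/git step with plain command-line git inside an sh step.
          node {
              sh '''
                  git init .
                  git fetch --tags https://git.example.com/project.git +refs/heads/master:refs/remotes/origin/master
                  git checkout -f origin/master
              '''
          }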

          Hung Vo added a comment -

          I also have the same issue after upgrading to 2.89.3 LTS along with a bunch of plugins. Now most of the pipeline jobs are stuck at the checkout step.

          Shai Azulay added a comment -

          I have the same problem after upgrading all the plugins and Core to 2.89.3 LTS.

          Sverre Moe added a comment -

          markewaite wrote:

          "The JENKINS-45447 and JENKINS-47169 fix in git client plugin 2.7.0 and in git plugin 3.7.0 might be visible as a hang in `git rev-list`."

          I was thinking about reverting back to git client plugin 2.6.0 and git plugin 3.6.4.

          We have even seen this issue on projects that have no git tags.

          Mark Waite added a comment -

          djviking if you're seeing the problem on a git repository which has no tags or only a few tags, then the changes in git client plugin 2.7.0 and git plugin 3.7.0 are not relevant to the problem. Reverting to an earlier version of the git plugin and the git client plugin is unlikely to resolve the issue.

          Igor Kolomiyets added a comment -

          We have the same issue with Jenkins 2.105 (first time observed with 2.103), Git plugin 3.7.0 and Git Client Plugin 2.7.1.

          The build is running on an OpenShift slave. The code seems to get checked out correctly, but the job is stuck.

          However, in our case the issue is not intermittent. It always occurs when a build is triggered after some changes are made to the repository, whether it is triggered manually or by the GitHub webhook.

          Also, what I noticed is that when the checkout is stuck, the last git command in the log has an incorrect commit ID:

          Checking out Revision 2d4ce8b023949526bdf7a6dc0658cd1a276f5e10 (master)
          > git config core.sparsecheckout # timeout=10
          > git checkout -f 2d4ce8b023949526bdf7a6dc0658cd1a276f5e10
          Commit message: "JT-1: Forcing checkout"
          > git rev-list --no-walk 8f5141bdcb00abddd5961a0dff6589f2654b5a87 # timeout=10

          Note that the ID passed to git rev-list is different from the one in the previous git checkout.

          If I abort the stuck build, a subsequent manual build works just fine and the commit ID is consistently the same throughout the log.

          There is nothing interesting in the thread dump.

          Sverre Moe added a comment -

          We have the same observations as ikolomiyets mentions.

          I have attached some generated jstack output from both the Jenkins master and the slave.

          • jstack -F <slave-pid>
          • jstack -F <jenkins-pid>

          Igor Kolomiyets added a comment -

          Just confirmed that the behaviour is consistent regardless of whether the job is executed on an OpenShift-hosted dynamic slave, the master node, or a bare metal slave.

          Also, it is observed both when the git repository is hosted on GitHub and when it is hosted on a Linux server accessed over SSH.

          I will test it on another Jenkins installation that was running perfectly well just two weeks ago.

          Sverre Moe added a comment -

          Running with the downgraded Git plugin 3.6.4 and Git Client plugin 2.6.0 we are hardly experiencing the problem; it is far less frequent.

          Here is a build that didn't hang, where the git commit hash in rev-list was different from the checkout:

           > git checkout -f bc4dd5d1939f16e0124251327b0490a0059b353d
          Commit message: "Fix something"
           > git rev-list 1ced60c576f7dcd1e030b92fc08a405e4c8a5800 # timeout=10
          

          Hung Vo added a comment -

          In my case I was able to fix it by downgrading the jira-plugin to 2.5, as there was a bug with the JiraChangeLogAnnotator. Anyone who upgraded to 2.5.1 should downgrade to 2.5 to fix the issue: SEVERE: ChangeLogAnnotator hudson.plugins.jira.JiraChangeLogAnnotator@xxx failed to annotate message.

          Igor Kolomiyets added a comment -

          You're a legend! Downgrading the jira-plugin to 2.5 did the trick.

          Sverre Moe added a comment -

          Why would the JIRA plugin affect this? We are not even using it in our pipeline.

          Mark Waite added a comment - edited

          djviking refer to JENKINS-48357 for details of the impact of Jira plugin 2.5.1. This bug was reported before the release of Jira plugin 2.5.1. I suspect that Jira plugin 2.5.1 related bugs are a "side path" from this bug. The real bug is likely still in the system as reported originally.

          Michael Neale added a comment -

          markewaite I hit this too - I am trying a downgrade to 2.5.0 and will report back if it is all OK after that. This may be a dupe of https://issues.jenkins-ci.org/browse/JENKINS-43106 but I put a support bundle there. In what log did people see the JIRA error?

          I swear that the JIRA plugin (and the Atlassian dependencies) is a disaster.

          Sverre Moe added a comment -

          We have also downgraded JIRA plugin to 2.5.0 and it seems to have resolved the problem.

          Sam Van Oort added a comment -

          markewaite I'm downgrading the priority to reflect the fix via JIRA... not clear if this should stay as a pipeline bug though or if we can close it out.

          Carlton Brown added a comment -

          Seeing the same issue here... no luck with the workaround of downgrading the JIRA plugin.

            Assignee: Unassigned
            Reporter: Ammon Lauritzen (allaryin)
            Votes: 7
            Watchers: 21