JENKINS-43587

Pipeline fails to resume after master restart/plugin upgrade

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: pipeline
    • Labels: None
    • Environment: Jenkins 2.46.1, latest versions of the Pipeline plugins (pipeline-build-step 2.5, pipeline-rest-api 2.6, pipeline-stage-step 2.2, etc.)
    • Released As: durable-task 1.18

      During a recent Jenkins plugin upgrade and master restart, it seems that Jenkins failed to resume at least two Pipeline jobs. Both pipelines were in the middle of a sh() step when the master was restarted, and both jobs have output similar to the following in the console:

      Resuming build at Thu Apr 13 15:01:50 EDT 2017 after Jenkins restart
      Waiting to resume part of <job name...>: ???
      Ready to run at Thu Apr 13 15:01:51 EDT 2017

       

      However, this text has been displayed for several minutes now with no obvious indication of what the job is waiting for. We can see that the pipeline is still running on the same executor it was running on pre-restart; however, if we log into the server, there is no durable task or process of the script that the sh() step was running. From the logging of the script we were running, we can tell that the command did finish successfully, but we can't understand how Jenkins lost track of it. From that logging, the command finished at around the same time the master was restarting (it is difficult to pinpoint exactly).


          Erik Lattimore added a comment -

          These were the plugins that were being upgraded at the time:

          • blueocean-commons.jpi
          • blueocean-jwt.jpi
          • blueocean-web.jpi
          • blueocean-rest.jpi
          • blueocean-rest-impl.jpi
          • blueocean-pipeline-api-impl.jpi
          • blueocean-github-pipeline.jpi
          • blueocean-git-pipeline.jpi
          • blueocean-config.jpi
          • blueocean-events.jpi
          • blueocean-personalization.jpi
          • blueocean-i18n.jpi
          • blueocean-dashboard.jpi
          • blueocean.jpi
          • hashicorp-vault-plugin.jpi
          • analysis-core.jpi
          • pipeline-maven.jpi
          • workflow-api.jpi
          • warnings.jpi
          • ssh-slaves.jpi
          • mask-passwords.jpi
          • violation-comments-to-stash.jpi


          Erik Lattimore added a comment -

          And here is the thread dump from the job:

          Thread #34
          at DSL.sh(completed process (code 0) in /home/jenkins/workspace/<jobname>@2@tmp/durable-039a0a47 on <hostname> (pid: 6808); recurrence period: 0ms)
          at WorkflowScript.deployStep(WorkflowScript:401)
          at DSL.timeout(killer task nowhere to be found)
          at WorkflowScript.deployStep(WorkflowScript:400)
          at DSL.ws(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:366)
          at DSL.sshagent(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:308)
          at DSL.lock(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:307)
          at DSL.node(running on voltron.coalition.local)
          at WorkflowScript.deployStep(WorkflowScript:306)
          at DSL.stage(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:305)
          at WorkflowScript.run(WorkflowScript:419)
          at DSL.timestamps(Native Method)
          at WorkflowScript.run(WorkflowScript:416)
          


          Erik Lattimore added a comment -

          The pipeline is roughly:

          stage('Deploy') {
            node(getNode(tenant, vpc)) {
              lock(getLockableResource(tenant, vpc)) {
                sshagent([GIT_AUTH]) {
                  ws {
                    try {
                      for (int i = 0; i < products.size(); i++) {
                        timeout(time: 2, unit: 'HOURS') {
                          sh("deploy.py ${products[i]}")
                        }
                      }
                    } finally {
                      deleteDir()
                    }
                  }
                }
              }
            }
          }

           


          Erik Lattimore added a comment -

          Finally, the node that this was running on has 5 executors.


          Erik Lattimore added a comment -

          Hmm, in the second case it seems the process actually died when the master was restarted, because this one did not run to completion but terminated abruptly, based on the logs.


          Jon B added a comment -

          I am also getting stranded at 'Ready to run at'.

          In my case, I run Jenkins within a Docker container. If one of my pipelines is running and someone does a Docker restart on the container, it strands with "Ready to run at".

          Side note: it seems like the Jenkins pipeline features are really clunky. If a slave server goes away, I'm seeing similar hanging problems. Any advice/guidance would be appreciated.


          Jon B added a comment - - edited

          It may be worth noting that the pipeline I'm getting stranded on calls another pipeline with:

          build job: 'mysubpipeline', parameters: [
          [$class: 'StringParameterValue', name: 'BRANCH_NAME', value: "$BRANCH_NAME"]
          ]

           


          Jon B added a comment -

          Still a problem months later. Below is an example of a Pipeline-from-SCM script that evidently started at the moment a Jenkins restart took place. It usually takes only 5 seconds or so to run, but as you can see below, it stranded with a message that says:

          Resuming build at Sat Sep 09 20:26:40 PDT 2017 after Jenkins restart

          but it's been stuck there like that for hours.

          Here's the full content of the console:

          Checking out git git@github.com:myghaccount/oneoff-pipelines.git into /var/jenkins_home/workspace/update_jenkins_slaves_list@script to read update_jenkins_slaves_list/Jenkinsfile.groovy
          > git rev-parse --is-inside-work-tree # timeout=10
          Fetching changes from the remote Git repository
          > git config remote.origin.url git@github.com:myghaccount/oneoff-pipelines.git # timeout=10
          Fetching upstream changes from git@github.com:myghaccount/oneoff-pipelines.git
          > git --version # timeout=10
          using GIT_SSH to set credentials Private key used for pulling from GitHub.
          > git fetch --tags --progress git@github.com:myghaccount/oneoff-pipelines.git +refs/heads/*:refs/remotes/origin/*
          > git rev-parse origin/master^{commit} # timeout=10
          Checking out Revision 9054109618b393c75aca0c7bc6b0cf82607ba76e (origin/master)
          Commit message: "Use 6.0.36 of cp"
          > git config core.sparsecheckout # timeout=10
          > git checkout -f 9054109618b393c75aca0c7bc6b0cf82607ba76e
          > git rev-list 9054109618b393c75aca0c7bc6b0cf82607ba76e # timeout=10
          Loading library common-pipelines@v6.0.36
          > git rev-parse --is-inside-work-tree # timeout=10
          Setting origin to git@github.com:myghaccount/common-pipelines.git
          > git config remote.origin.url git@github.com:myghaccount/common-pipelines.git # timeout=10
          Fetching origin...
          Fetching upstream changes from origin
          > git --version # timeout=10
          using GIT_SSH to set credentials 
          > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/*
          > git rev-parse v6.0.36^{commit} # timeout=10
          > git rev-parse --is-inside-work-tree # timeout=10
          Fetching changes from the remote Git repository
          > git config remote.origin.url git@github.com:myghaccount/common-pipelines.git # timeout=10
          Fetching without tags
          Fetching upstream changes from git@github.com:myghaccount/common-pipelines.git
          > git --version # timeout=10
          using GIT_SSH to set credentials 
          > git fetch --no-tags --progress git@github.com:myghaccount/common-pipelines.git +refs/heads/*:refs/remotes/origin/*
          Checking out Revision 639244106a2defe4944a70600ca93b93d0cff9b9 (v6.0.36)
          Commit message: "remove stopJobsMakingUseOfNode (#57)"
          > git config core.sparsecheckout # timeout=10
          > git checkout -f 639244106a2defe4944a70600ca93b93d0cff9b9
          > git rev-list 639244106a2defe4944a70600ca93b93d0cff9b9 # timeout=10
          Resuming build at Sat Sep 09 20:26:40 PDT 2017 after Jenkins restart


          Kevin Phillips added a comment -

          This problem is affecting us as well. The issue is debilitating, considering the affected builds essentially hang indefinitely. For smaller Jenkins build farms, admins may be able to eyeball the running builds to see if any are hung, but at scale this is impractical. Sometimes builds will stay locked for hours or days before they get noticed ... particularly builds that typically take many hours or days to complete when they run successfully!

          Any input, suggestions, workarounds or fixes would be greatly appreciated.


          Alex Taylor added a comment -

          elatt leedega

           

          I think this issue is caused by a defect in the Durable Task plugin, fixed here: https://issues.jenkins-ci.org/browse/JENKINS-47791. It seems that when you put Jenkins into shutdown mode, it no longer tracks whether the process is alive. Could I get you both to update the Durable Task plugin to the latest version and try the restart again?


          Erik Lattimore added a comment -

          Unfortunately, I no longer have access to this Jenkins environment, so I won't be able to confirm. I looked at JENKINS-47791 and it looks like a nice simplification; however, I can't say for sure whether that was the root cause. I'll defer to leedega; otherwise, feel free to close this as cannot-reproduce.


          Jon B added a comment - - edited

          I'm running the latest Jenkins with Durable Task plugin 1.7, and I just got the following after restarting all of my slaves in my AWS autoscaling group:

          Cannot contact ip-172-31-248-165.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/genericpipeline-deploy at hudson.remoting.Channel@415f5ed4:ip-172-31-248-165.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Remote call on ip-172-31-248-165.us-west-2.compute.internal failed. The channel is closing down or has closed down

          I'm not 100% sure whether that directly relates to this JIRA ticket. All of the slave boxes have relaunched, but this pipeline just appears to be hung now.

          The desired behavior would of course be that if the slave machine vaporizes, another slave with the same label gets the job instead.


          Alex Taylor added a comment -

          piratejohnny

          For that issue, the pipeline does not have the ability to resume on an agent with the same label, because it needs access to the same workspace it was building in before the restart in order to resume properly. In this case (if it did not have the same workspace) it would try to reconnect to the same agent (which is destroyed) and would eventually time out when it cannot find the workspace. In your case, if you want it to resume on an agent with the same label, you would need to persist that workspace somehow (see the sketch below).
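          As a rough, purely illustrative sketch of "persist that workspace somehow" (the labels, script names, and stash name below are hypothetical): one option is to capture the expensive results as files and move them through stash/unstash, which stores them with the build on the master rather than in any single agent workspace.

          // Hypothetical sketch: results are stashed with the build on the master,
          // so a later node block does not depend on the original agent or its workspace.
          node('linux') {
            checkout scm
            sh './build.sh'                                    // produces files under build/
            stash name: 'build-output', includes: 'build/**'   // copied to the master with the build
          }
          node('linux') {                                      // may be a different agent with the same label
            unstash 'build-output'                             // restores build/ into this workspace
            sh './deploy.sh build/'
          }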

           

          Either way, not related to this ticket in particular (also, I assume you meant durable-task 1.17 rather than 1.7).


          Mircea-Andrei Albu added a comment -

          +1
          Same behaviour also on Jenkins 2.164.2.
          Our master doesn't have any executors, and after a master restart all the agents stay like this for a while without resuming:


          Alex Taylor added a comment -

          mirceaalbu I think this may be a different issue, since this is a much later version of Jenkins you are updating to. I would open a new JIRA with the Jenkins logs included, since there may be an error there about why the build did not resume.


          papanito added a comment - - edited

          I face the same (or similar) issue. I actually get this:

          Resuming build at Mon May 11 01:26:46 CEST 2020 after Jenkins restart
          Waiting to resume part of Delivery Pipelines » mdp-delivery-pipeline » master mdp-release-1.5.52#23: In the quiet period. Expires in 0 ms
          [Pipeline] End of Pipeline
          [Bitbucket] Notifying commit build result
          [Bitbucket] Build result notified
          

          jenkins.log

          We are using Jenkins ver. 2.222.1


          Alex Taylor added a comment -

          papanito This issue is for a pipeline which is hung waiting to resume after a restart, not a build which failed immediately after the restart. If you feel there is an error after the build resumed, please create a new issue, as your listed problem has nothing to do with the current JIRA.

          Additionally, if you want help diagnosing the problem, you will need to attach a full build folder to that new JIRA case, as that is where the information about why the build stopped will be located. But just based on that very short log, it seems to be operating correctly, so I am not clear on why you believe it to be a failure.


          Alex Taylor added a comment -

          This issue is being marked as fixed, as it was originally reported for a Durable Task plugin issue which has since been fixed and released.

          If people are seeing similar issues in later versions of Jenkins, please open a new case, and perhaps mention that it is similar to this one.

          Additionally, if you are experiencing this issue on a particular build, please attach the full build folder zipped up, as that will contain all the relevant data.


          papanito added a comment -

          JENKINS-62248

          Frédéric Meyrou added a comment - - edited

          Dear,

          I have a very similar issue, but my Jenkins LTS version and plugins are now all up to date.

          After a difficult restart, I have many jobs pending with the following kind of message:

          00:00:00.008 Started by timer
          00:00:00.219 Opening connection to http://jirasvnprod.agfahealthcare.com/svn/idrg/diagnosis-coding/
          00:00:37.968 Obtained Jenkinsfile_PROPERTIES from 119148
          00:00:37.968 Running in Durability level: MAX_SURVIVABILITY
          00:00:47.292 [Pipeline] Start of Pipeline
          00:01:53.178 [Pipeline] node
          00:02:08.424 Still waiting to schedule task
          00:02:08.425 All nodes of label ‘SHARED&&BORDEAUX&&WINDOWS64’ are offline (>>> ACTUALLY they are online!)
          00:52:09.681 Ready to run at Sun Nov 15 17:54:08 CET 2020
          00:52:09.681 Resuming build at Sun Nov 15 17:54:08 CET 2020 after Jenkins restart
          18:54:07.898 Ready to run at Mon Nov 16 11:56:06 CET 2020
          18:54:07.898 Resuming build at Mon Nov 16 11:56:06 CET 2020 after Jenkins restart

          >>> We are now the 18th! 

          Do you guys have a console Groovy script to end all those jobs? (I have more than 500 of them on a platform with 10K jobs.)
          I need to scan all jobs in this situation and kill them.

          Any help appreciated.

          ./Fred

           


          Tomas Hartmann added a comment -

          If anyone else has a lot of zombie jobs, this is a script I came up with to kill them without killing any non-zombie job:

          def x = 0
          for (job in Hudson.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob)) {
            try {
              // treat a build whose dump contains "state=null" as a zombie
              def isZombie = job.getLastBuild().dump() ==~ /.*state=null.*/
              def isCompleted = job.getLastBuild().completed
              if (!isCompleted && isZombie) {
                x = x + 1
                println "Candidate for Zombie: ${job}"
                job.getLastBuild().doKill()
              }
            } catch (e) {
              // jobs without builds (or with unreadable state) are skipped
            }
          }
          println "Number of zombies killed: ${x}"

          It times out in jenkinsurl/script at around ~600 zombies, so it's possible you'll have to run it more than once.
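          A possible variant of the script above (the per-run cap of 200 is an arbitrary assumption) kills at most a fixed number of zombies per invocation, so that each run stays under the script-console timeout:

          import org.jenkinsci.plugins.workflow.job.WorkflowJob

          int killed = 0
          int maxPerRun = 200   // arbitrary cap; rerun until the script reports 0
          for (job in Hudson.instance.getAllItems(WorkflowJob)) {
            if (killed >= maxPerRun) { break }
            try {
              def build = job.getLastBuild()
              def isZombie = build.dump() ==~ /.*state=null.*/
              if (!build.completed && isZombie) {
                println "Killing zombie: ${build}"
                build.doKill()
                killed++
              }
            } catch (e) {
              // jobs with no builds, or with unreadable state, are skipped
            }
          }
          println "Zombies killed this run: ${killed}"

          Run it repeatedly from the script console until it reports zero kills.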

           


          Patrick Riegler added a comment - - edited

          We ran into a similar problem and it took us a while to figure out the solution.

          In our case the issue was that the name of the "package" in the included library wasn't properly defined.
          It is essential that the package name matches the folder path and file name, as described in this example:
          https://www.jenkins.io/doc/book/pipeline/shared-libraries/#writing-libraries

          // src/org/foo/Zot.groovy 
          package org.foo 
          
          def checkOutFrom(repo) { 
            git url: "git@github.com:jenkinsci/${repo}" 
          } 
          
          return this 

          and in the pipeline script use:

          def z = new org.foo.Zot()
          z.checkOutFrom(repo) 

          The build would still appear to work if the package were instead called:

          package org.something.foo

          but then the class cannot be found after (de)serialization, i.e. when the build tries to resume.

          I hope this is of help 


            Assignee: Alex Taylor (ataylor)
            Reporter: Erik Lattimore (elatt)
            Votes: 13
            Watchers: 29

              Created:
              Updated:
              Resolved: