JENKINS-43587

Pipeline fails to resume after master restart/plugin upgrade

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: pipeline
    • Labels: None
    • Environment: Jenkins 2.46.1, latest versions of the Pipeline plugins (pipeline-build-step 2.5, pipeline-rest-api 2.6, pipeline-stage-step 2.2, etc.)
    • Released As: durable-task 1.18

      During a recent Jenkins plugin upgrade and master restart, it seems that Jenkins failed to resume at least two Pipeline jobs. Both pipelines were in the middle of a sh() step when the master was restarted, and both jobs have output similar to the following in the console:

      Resuming build at Thu Apr 13 15:01:50 EDT 2017 after Jenkins restart
      Waiting to resume part of <job name...>: ???
      Ready to run at Thu Apr 13 15:01:51 EDT 2017

       

      However, this text has been displayed for several minutes now with no obvious indication of what the job is waiting for. We can see that the pipeline is still running on the same executor it was running on pre-restart; however, if we log into the server, there is no durable task or process of the script that the sh() step was running. From the logging of the script we were running, we can tell that the command did finish successfully, but we can't understand how Jenkins lost track of it. From that logging, the command finished at around the same time the master was restarting (it is difficult to pinpoint exactly).


          Erik Lattimore added a comment -

          These were the plugins that were being upgraded at the time:

          • blueocean-commons.jpi
          • blueocean-jwt.jpi
          • blueocean-web.jpi
          • blueocean-rest.jpi
          • blueocean-rest-impl.jpi
          • blueocean-pipeline-api-impl.jpi
          • blueocean-github-pipeline.jpi
          • blueocean-git-pipeline.jpi
          • blueocean-config.jpi
          • blueocean-events.jpi
          • blueocean-personalization.jpi
          • blueocean-i18n.jpi
          • blueocean-dashboard.jpi
          • blueocean.jpi
          • hashicorp-vault-plugin.jpi
          • analysis-core.jpi
          • pipeline-maven.jpi
          • workflow-api.jpi
          • warnings.jpi
          • ssh-slaves.jpi
          • mask-passwords.jpi
          • violation-comments-to-stash.jpi


          Erik Lattimore added a comment -

          And here is the thread dump from the job:

          Thread #34
          at DSL.sh(completed process (code 0) in /home/jenkins/workspace/<jobname>@2@tmp/durable-039a0a47 on <hostname> (pid: 6808); recurrence period: 0ms)
          at WorkflowScript.deployStep(WorkflowScript:401)
          at DSL.timeout(killer task nowhere to be found)
          at WorkflowScript.deployStep(WorkflowScript:400)
          at DSL.ws(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:366)
          at DSL.sshagent(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:308)
          at DSL.lock(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:307)
          at DSL.node(running on voltron.coalition.local)
          at WorkflowScript.deployStep(WorkflowScript:306)
          at DSL.stage(Native Method)
          at WorkflowScript.deployStep(WorkflowScript:305)
          at WorkflowScript.run(WorkflowScript:419)
          at DSL.timestamps(Native Method)
          at WorkflowScript.run(WorkflowScript:416)
          


          Erik Lattimore added a comment -

          The pipeline is roughly:

          stage('Deploy') {
            node(getNode(tenant, vpc)) {
              lock(getLockableResource(tenant, vpc)) {
                sshagent([GIT_AUTH]) {
                  ws {
                    try {
                      for (int i = 0; i < products.size(); i++) {
                        timeout(time: 2, unit: 'HOURS') {
                          sh("deploy.py ${products[i]}")
                        }
                      }
                    } finally {
                      deleteDir()
                    }
                  }
                }
              }
            }
          }

           


          Erik Lattimore added a comment -

          Finally, the node that this was running on has 5 executors.


          Erik Lattimore added a comment -

          Hmm, in the second case it seems the process actually died when the master was restarted, because this one did not run to completion but terminated abruptly, based on the logs.


          Jon B added a comment -

          I am also getting stranded at 'Ready to run at'.

          In my case, I run Jenkins within a Docker container. If one of my pipelines is running and someone does a Docker restart on the container, it strands with "Ready to run at".

          Side note: it seems like the Jenkins pipeline features are really clunky. If a slave server goes away, I'm seeing similar hanging problems. Any advice/guidance would be appreciated.


          Jon B added a comment - - edited

          It may be worth noting that the pipeline I'm getting stranded on calls another pipeline with:

          build job: 'mysubpipeline', parameters: [
          [$class: 'StringParameterValue', name: 'BRANCH_NAME', value: "$BRANCH_NAME"]
          ]

           


          Jon B added a comment -

          Still a problem months later. Below is an example of a Pipeline-from-SCM script that evidently started at the moment a Jenkins restart took place. It usually takes only 5 seconds or so to run, but as you can see below, it stranded with a message that says:

          Resuming build at Sat Sep 09 20:26:40 PDT 2017 after Jenkins restart

          but it's been stuck there like that for hours.

          Here's the full content of the console:

          Checking out git git@github.com:myghaccount/oneoff-pipelines.git into /var/jenkins_home/workspace/update_jenkins_slaves_list@script to read update_jenkins_slaves_list/Jenkinsfile.groovy
          > git rev-parse --is-inside-work-tree # timeout=10
          Fetching changes from the remote Git repository
          > git config remote.origin.url git@github.com:myghaccount/oneoff-pipelines.git # timeout=10
          Fetching upstream changes from git@github.com:myghaccount/oneoff-pipelines.git
          > git --version # timeout=10
          using GIT_SSH to set credentials Private key used for pulling from GitHub.
          > git fetch --tags --progress git@github.com:myghaccount/oneoff-pipelines.git +refs/heads/*:refs/remotes/origin/*
          > git rev-parse origin/master^{commit} # timeout=10
          Checking out Revision 9054109618b393c75aca0c7bc6b0cf82607ba76e (origin/master)
          Commit message: "Use 6.0.36 of cp"
          > git config core.sparsecheckout # timeout=10
          > git checkout -f 9054109618b393c75aca0c7bc6b0cf82607ba76e
          > git rev-list 9054109618b393c75aca0c7bc6b0cf82607ba76e # timeout=10
          Loading library common-pipelines@v6.0.36
          > git rev-parse --is-inside-work-tree # timeout=10
          Setting origin to git@github.com:myghaccount/common-pipelines.git
          > git config remote.origin.url git@github.com:myghaccount/common-pipelines.git # timeout=10
          Fetching origin...
          Fetching upstream changes from origin
          > git --version # timeout=10
          using GIT_SSH to set credentials 
          > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/*
          > git rev-parse v6.0.36^{commit} # timeout=10
          > git rev-parse --is-inside-work-tree # timeout=10
          Fetching changes from the remote Git repository
          > git config remote.origin.url git@github.com:myghaccount/common-pipelines.git # timeout=10
          Fetching without tags
          Fetching upstream changes from git@github.com:myghaccount/common-pipelines.git
          > git --version # timeout=10
          using GIT_SSH to set credentials 
          > git fetch --no-tags --progress git@github.com:myghaccount/common-pipelines.git +refs/heads/*:refs/remotes/origin/*
          Checking out Revision 639244106a2defe4944a70600ca93b93d0cff9b9 (v6.0.36)
          Commit message: "remove stopJobsMakingUseOfNode (#57)"
          > git config core.sparsecheckout # timeout=10
          > git checkout -f 639244106a2defe4944a70600ca93b93d0cff9b9
          > git rev-list 639244106a2defe4944a70600ca93b93d0cff9b9 # timeout=10
          Resuming build at Sat Sep 09 20:26:40 PDT 2017 after Jenkins restart


          Kevin Phillips added a comment -

          This problem is affecting us as well. The issue is debilitating, considering the affected builds essentially hang indefinitely. For smaller Jenkins build farms, admins may be able to eyeball the running builds to see if any are hung, but at scale this is impractical. Sometimes builds will stay locked for hours or days before they get noticed ... particularly builds that typically take many hours or days to complete when they run successfully!

          Any input, suggestions, workarounds or fixes would be greatly appreciated.


          Alex Taylor added a comment -

          elatt leedega

           

          I think this issue is caused by a defect in the Durable Task plugin, fixed here: https://issues.jenkins-ci.org/browse/JENKINS-47791. It seems that when you put Jenkins into shutdown mode, it no longer tracks whether the process is alive. Could I get you both to update the Durable Task plugin to the latest version and try the restart again?


          Erik Lattimore added a comment -

          Unfortunately, I no longer have access to this Jenkins environment, so I won't be able to confirm. I looked at JENKINS-47791 and it looks like a nice simplification; however, I can't say for sure whether that was the root cause. I'll defer to leedega; otherwise, feel free to close this as cannot-reproduce.


          Jon B added a comment - - edited

          I'm running the latest Jenkins with Durable Task plugin 1.7, and I just got the following after restarting all of my slaves in my AWS autoscaling group:

          Cannot contact ip-172-31-248-165.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/genericpipeline-deploy at hudson.remoting.Channel@415f5ed4:ip-172-31-248-165.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Remote call on ip-172-31-248-165.us-west-2.compute.internal failed. The channel is closing down or has closed down

          I'm not 100% sure whether that directly relates to this JIRA ticket. All of the slave boxes have relaunched, but this pipeline just appears to be hung now.

          The desired behavior would of course be that if the slave machine vaporizes, another slave with the same label gets the job instead.


          Alex Taylor added a comment -

          piratejohnny

          For that issue, the pipeline does not have the ability to resume on an agent with the same label, because it needs access to the same workspace it was building in before the restart in order to resume properly. In this case (if it did not have the same workspace) it would try to reconnect to the same agent (which is destroyed) and would eventually time out when it cannot find the workspace. In your case, if you want it to resume on an agent with the same label, you would need to persist that workspace somehow (see the sketch below).
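          As a rough, purely illustrative sketch of "persist that workspace somehow" (the labels, script names, and stash name below are hypothetical): one option is to capture the expensive results as files and move them through stash/unstash, which stores them with the build on the master rather than in any single agent workspace.

          // Hypothetical sketch: results are stashed with the build on the master,
          // so a later node block does not depend on the original agent or its workspace.
          node('linux') {
            checkout scm
            sh './build.sh'                                    // produces files under build/
            stash name: 'build-output', includes: 'build/**'   // copied to the master with the build
          }
          node('linux') {                                      // may be a different agent with the same label
            unstash 'build-output'                             // restores build/ into this workspace
            sh './deploy.sh build/'
          }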

           

          Either way, not related to this ticket in particular (also, I assume you meant durable-task 1.17 rather than 1.7).


          Mircea-Andrei Albu added a comment -

          +1
          Same behaviour also on Jenkins 2.164.2.
          Our master doesn't have any executors, and after a master restart all the agents stay like this for a while without resuming:


          Alex Taylor added a comment -

          mirceaalbu I think this may be a different issue, since this is a much later version of Jenkins you are updating to. I would open a new JIRA with the Jenkins logs included, since there may be an error there about why the build did not resume.


          papanito added a comment - - edited

          I face the same (or similar) issue. I actually get this:

          Resuming build at Mon May 11 01:26:46 CEST 2020 after Jenkins restart
          Waiting to resume part of Delivery Pipelines » mdp-delivery-pipeline » master mdp-release-1.5.52#23: In the quiet period. Expires in 0 ms
          [Pipeline] End of Pipeline
          [Bitbucket] Notifying commit build result
          [Bitbucket] Build result notified
          

          jenkins.log

          We are using Jenkins ver. 2.222.1


          Alex Taylor added a comment -

          papanito This issue is for a pipeline which is hung waiting to resume after a restart, not a build which failed immediately after the restart. If you feel there is an error after the build resumed, please create a new issue, as your listed problem has nothing to do with the current JIRA.

          Additionally, if you want help diagnosing the problem, you will need to attach a full build folder to that new JIRA case, as that is where the information about why the build stopped will be located. But just based on that very short log, it seems to be operating correctly, so I am not clear on why you believe it to be a failure.


          Alex Taylor added a comment -

          This issue is being marked as fixed, as it was originally reported for a Durable Task plugin issue which has since been fixed and released.

          If people are seeing similar issues in later versions of Jenkins, please open a new case, and perhaps mention that it is similar to this one.

          Additionally, if you are experiencing this issue on a particular build, please attach the full build folder zipped up, as that will contain all the relevant data.


          papanito added a comment -

          JENKINS-62248

          Frédéric Meyrou added a comment - - edited

          Dear,

          I have a very similar issue, but my Jenkins LTS version and plugins are now all up to date.

          After a difficult restart, I have many jobs pending with the following kind of message:

          00:00:00.008 Started by timer
          00:00:00.219 Opening connection to http://jirasvnprod.agfahealthcare.com/svn/idrg/diagnosis-coding/
          00:00:37.968 Obtained Jenkinsfile_PROPERTIES from 119148
          00:00:37.968 Running in Durability level: MAX_SURVIVABILITY
          00:00:47.292 [Pipeline] Start of Pipeline
          00:01:53.178 [Pipeline] node
          00:02:08.424 Still waiting to schedule task
          00:02:08.425 All nodes of label ‘SHARED&&BORDEAUX&&WINDOWS64’ are offline (>>> ACTUALLY they are online!)
          00:52:09.681 Ready to run at Sun Nov 15 17:54:08 CET 2020
          00:52:09.681 Resuming build at Sun Nov 15 17:54:08 CET 2020 after Jenkins restart
          18:54:07.898 Ready to run at Mon Nov 16 11:56:06 CET 2020
          18:54:07.898 Resuming build at Mon Nov 16 11:56:06 CET 2020 after Jenkins restart

          >>> We are now the 18th! 

          Do you guys have a console Groovy script to end all those jobs? (I have more than 500 of them on a platform with 10K jobs.)
          I need to scan all jobs in this situation and kill them.

          Any help appreciated.

          ./Fred

           


          Tomas Hartmann added a comment -

          If anyone else has a lot of zombie jobs, this is a script I came up with to kill them without killing any non-zombie job:

          def x = 0
          for (job in Hudson.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob)) {
            try {
              // treat a build whose dump contains "state=null" as a zombie
              def isZombie = job.getLastBuild().dump() ==~ /.*state=null.*/
              def isCompleted = job.getLastBuild().completed
              if (!isCompleted && isZombie) {
                x = x + 1
                println "Candidate for Zombie: ${job}"
                job.getLastBuild().doKill()
              }
            } catch (e) {
              // jobs without builds (or with unreadable state) are skipped
            }
          }
          println "Number of zombies killed: ${x}"

          It times out in jenkinsurl/script at around ~600 zombies, so it's possible you'll have to run it more than once.
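          A possible variant of the script above (the per-run cap of 200 is an arbitrary assumption) kills at most a fixed number of zombies per invocation, so that each run stays under the script-console timeout:

          import org.jenkinsci.plugins.workflow.job.WorkflowJob

          int killed = 0
          int maxPerRun = 200   // arbitrary cap; rerun until the script reports 0
          for (job in Hudson.instance.getAllItems(WorkflowJob)) {
            if (killed >= maxPerRun) { break }
            try {
              def build = job.getLastBuild()
              def isZombie = build.dump() ==~ /.*state=null.*/
              if (!build.completed && isZombie) {
                println "Killing zombie: ${build}"
                build.doKill()
                killed++
              }
            } catch (e) {
              // jobs with no builds, or with unreadable state, are skipped
            }
          }
          println "Zombies killed this run: ${killed}"

          Run it repeatedly from the script console until it reports zero kills.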

           


          Patrick Riegler added a comment - - edited

          We ran into a similar problem and it took us a while to figure out the solution.

          In our case the issue was that the name of the "package" in the included library wasn't properly defined.
          It is essential that the package name matches the folder path and file name, as described in this example:
          https://www.jenkins.io/doc/book/pipeline/shared-libraries/#writing-libraries

          // src/org/foo/Zot.groovy 
          package org.foo 
          
          def checkOutFrom(repo) { 
            git url: "git@github.com:jenkinsci/${repo}" 
          } 
          
          return this 

          and in the pipeline script use:

          def z = new org.foo.Zot()
          z.checkOutFrom(repo) 

          The build would still appear to work if the package were instead called:

          package org.something.foo

          but then the class cannot be found after (de)serialization, i.e. when the build tries to resume.

          I hope this is of help 


            Assignee: Alex Taylor (ataylor)
            Reporter: Erik Lattimore (elatt)
            Votes: 13
            Watchers: 29

              Created:
              Updated:
              Resolved: