JENKINS-43587

Pipeline fails to resume after master restart/plugin upgrade

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: pipeline
    • Labels:
      None
    • Environment:
      Jenkins 2.46.1, Latest version of pipeline plugins (pipeline-build-step: 2.5, pipeline-rest-api: 2.6, pipeline-stage-step: 2.2, etc)
    • Released As:
      durable-task 1.18

      Description

      During a recent Jenkins plugin upgrade and master restart, Jenkins failed to resume at least two Pipeline jobs. Both pipelines were in the middle of a sh() step when the master was restarted, and both show output similar to the following in the console:

      Resuming build at Thu Apr 13 15:01:50 EDT 2017 after Jenkins restart
      Waiting to resume part of <job name...>: ???
      Ready to run at Thu Apr 13 15:01:51 EDT 2017

       

      However, this text has been displayed for several minutes now with no obvious indication of what the job is waiting for. We can see that the pipeline is still occupying the same executor it was running on before the restart; however, if we log into the server, there is no durable task or process for the script that the sh() step was running. From the script's own logging, we can tell that the command finished successfully, but we can't understand how Jenkins lost track of it. According to that logging, the command finished at around the same time the master was restarting (it is difficult to pinpoint exactly).

            Activity

            papanito papanito added a comment - - edited

            I face the same (or similar) issue. I actually get this:

            Resuming build at Mon May 11 01:26:46 CEST 2020 after Jenkins restart
            Waiting to resume part of Delivery Pipelines » mdp-delivery-pipeline » master mdp-release-1.5.52#23: In the quiet period. Expires in 0 ms
            [Pipeline] End of Pipeline
            [Bitbucket] Notifying commit build result
            [Bitbucket] Build result notified
            

            jenkins.log

            We are using Jenkins ver. 2.222.1

            ataylor Alex Taylor added a comment -

            papanito This issue is for a pipeline which is hung waiting to restart and not a build which failed immediately after restart. If you feel there is an error after the build resumed then please create a new issue as your listed problem has nothing to do with the current jira.

            Additionally, if you want help diagnosing the problem, you will need to attach a full build folder to that new Jira case, as that is where the information about why the build stopped will be located. But based on that very short log, it seems to be operating correctly, so I am not clear on why you believe it to be a failure.

            ataylor Alex Taylor added a comment -

            This issue is being marked fixed as it was originally reported for a durable task plugin issue which has since been fixed and released.

            If people are seeing similar issues in later versions of Jenkins, please open a new case and maybe mention it is similar to this one.

            Additionally, if you are experiencing this issue on a particular build, please attach the full build folder zipped up, as that will contain all the relevant data.

            papanito papanito added a comment -

            JENKINS-62248

            fredericmeyrou Frédéric Meyrou added a comment - - edited

            Dear,

            I have a very similar issue, but my Jenkins LTS version and plugins are now all up-to-date.

            After a difficult restart, I have many jobs pending with the following kind of message:

            00:00:00.008 Started by timer
            00:00:00.219 Opening connection to http://jirasvnprod.agfahealthcare.com/svn/idrg/diagnosis-coding/
            00:00:37.968 Obtained Jenkinsfile_PROPERTIES from 119148
            00:00:37.968 Running in Durability level: MAX_SURVIVABILITY
            00:00:47.292 [Pipeline] Start of Pipeline
            00:01:53.178 [Pipeline] node
            00:02:08.424 Still waiting to schedule task
            00:02:08.425 All nodes of label ‘SHARED&&BORDEAUX&&WINDOWS64’ are offline (>>> ACTUALLY they are online!)
            00:52:09.681 Ready to run at Sun Nov 15 17:54:08 CET 2020
            00:52:09.681 Resuming build at Sun Nov 15 17:54:08 CET 2020 after Jenkins restart
            18:54:07.898 Ready to run at Mon Nov 16 11:56:06 CET 2020
            18:54:07.898 Resuming build at Mon Nov 16 11:56:06 CET 2020 after Jenkins restart

            >>> It is now the 18th!

            Do you have a Groovy script for the script console that ends all those jobs? (I have more than 500 of them on a platform with 10K jobs.)
            I need to scan for all jobs in this situation and kill them.

            Any help appreciated.

            ./Fred

             


              People

              Assignee:
              ataylor Alex Taylor
              Reporter:
              elatt Erik Lattimore
              Votes:
              13
              Watchers:
              27
