
Pipeline jobs hanging in Build Executor even if it is finished

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • jenkins 2.89.4
      EL7

      We have a huge Jenkins instance that runs about 20k builds a day.

      After a couple of days without a restart of the Jenkins master, Pipeline jobs start to hang executors after the build finishes. Freestyle and Maven jobs work fine.
      A busy executor looks like this:

      But the build status is "finished":

      jenkins.log contains records of the build starting and finishing, but the executor was not released when the build finished at May 28, 2018 4:35:14 PM:

      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 5,770ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275137998231310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      2018/05/28 05:07:798 - job/KKA/job/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/ #4533 Started by timer
      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 6,783ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275138109721310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      
      ...
      
      WARNING: Owner[PPRBTEAM/archive/Deimos/deimos-module-list-wf/23333:PPRBTEAM/archive/Deimos/deimos-module-list-wf #23333] was not in the list to begin with: [Owner[GringoTesting/Try/1:GringoTesting/Try #1], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/211:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #211], Owner[AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated/275:AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated #275], Owner[DataFactory/AHD/Pipeline_dev/49:DataFactory/AHD/Pipeline_dev #49], Owner[CSUO/PipelineBadCode/19:CSUO/PipelineBadCode #19], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/616:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #616], Owner[KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0/7:KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0 #7], Owner[ESB/FS/FS_CI_RFC_PR_8/3060:ESB/FS/FS_CI_RFC_PR_8 #3060], Owner[TFS/TestJobs/integrationOnly/809:TFS/TestJobs/integrationOnly #809], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/265:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #265], Owner[MBP/mbp-ci/123:MBP/mbp-ci #123], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/338:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev #338], Owner[Kalita/tmp/test_pr/46:Kalita/tmp/test_pr #46], Owner[Kalita/tmp/test_pr/47:Kalita/tmp/test_pr #47], Owner[ASKOO/koo-release-build/feature%2F513/8:ASKOO/koo-release-build/feature%2F513 #8], Owner[ASKOO/koo-release-build/develop/117:ASKOO/koo-release-build/develop #117], Owner[ASKOO/koo-release-build/feature%2F513/9:ASKOO/koo-release-build/feature%2F513 #9], Owner[ASKOO/koo-release-build/develop/118:ASKOO/koo-release-build/develop #118], Owner[GBK/QG_check_minor250518/205:GBK/QG_check_minor250518 #205], Owner[GBK/QG_Check_dev/345:GBK/QG_Check_dev #345], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/339:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev 
#339], Owner[GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE/363:GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE #363], Owner[DataFactory/stork/Build_PullRequest/524:DataFactory/stork/Build_PullRequest #524], Owner[KKA/KKA_PIPE_CI/352:KKA/KKA_PIPE_CI #352], Owner[KKA/KKA_PIPE_DEPLOY/154:KKA/KKA_PIPE_DEPLOY #154], Owner[GBK/QG_check_minor250518/206:GBK/QG_check_minor250518 #206], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/57:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #57], Owner[PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY/33:PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY #33], Owner[Kalita/Regular_clt_dev_builds/295:Kalita/Regular_clt_dev_builds #295], Owner[DataFactory/stork/Build_PullRequest/525:DataFactory/stork/Build_PullRequest #525], Owner[Kalita/tmp/test_pr/48:Kalita/tmp/test_pr #48], Owner[AEP/AEP_QG/806:AEP/AEP_QG #806], Owner[SNUiL/DevOps/CI-Builds/CI-Build-PIR-29/549:SNUiL/DevOps/CI-Builds/CI-Build-PIR-29 #549], Owner[GBK/QG_Check_dev/346:GBK/QG_Check_dev #346], Owner[GBK/QG_check_minor250518/207:GBK/QG_check_minor250518 #207], Owner[ECOD/DEVELOP/NexusArtifactFlag/1:ECOD/DEVELOP/NexusArtifactFlag #1], Owner[Kalita/Parallel_Pipeline_2/1003:Kalita/Parallel_Pipeline_2 #1003], Owner[HPSM/HPSM_pipeline/403:HPSM/HPSM_pipeline #403], Owner[ESB_CF/IIB9_PIPELINE/3751:ESB_CF/IIB9_PIPELINE #3751], Owner[DataFactory/stork/Build_PullRequest/526:DataFactory/stork/Build_PullRequest #526], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/58:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #58], Owner[DevOps/AHD/sleep-test/2:DevOps/AHD/sleep-test #2], Owner[ESB_SMP/FullBuild/484:ESB_SMP/FullBuild #484], Owner[DataFactory/stork/Build_PullRequest/527:DataFactory/stork/Build_PullRequest #527], Owner[SBK/SMARTREGRESS_TEST_IFT1/4:SBK/SMARTREGRESS_TEST_IFT1 #4], Owner[impprb/checkQG/249:impprb/checkQG #249], Owner[DevOps/AHD/sleep-test/4:DevOps/AHD/sleep-test #4], 
Owner[TFS/PRBuilders/BuilderPRForCore/903:TFS/PRBuilders/BuilderPRForCore #903], Owner[HPSM/HPSM_pipeline/404:HPSM/HPSM_pipeline #404], Owner[FCCM8/regular_release/41:FCCM8/regular_release #41], Owner[ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT/380:ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT #380], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/268:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #268], Owner[DataFactory/stork/Build_PullRequest/528:DataFactory/stork/Build_PullRequest #528], Owner[SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git/389:SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git #389], Owner[ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job/41:ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job #41], Owner[ASKOO/koo-release-build/feature%2F253/29:ASKOO/koo-release-build/feature%2F253 #29], Owner[GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE/334:GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE #334], Owner[DEPOZITORY/PB/69:DEPOZITORY/PB #69], Owner[DataFactory/stork/Auto_Test_DEV/2115:DataFactory/stork/Auto_Test_DEV #2115], Owner[GBK/QG_check_minor250518/208:GBK/QG_check_minor250518 #208], Owner[TDS/GREEN_AN_GREEN/87:TDS/GREEN_AN_GREEN #87], Owner[PPRB_DepositCashOperations/common/Publish_to_IFT_universal/320:PPRB_DepositCashOperations/common/Publish_to_IFT_universal #320], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9156:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9156], Owner[ASCC/server1/02.013.00/1496:ASCC/server1/02.013.00 #1496], Owner[ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job/37:ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job #37], Owner[ESB/FS/Meshkov/FS_CI_RFC_tst/763:ESB/FS/Meshkov/FS_CI_RFC_tst #763], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9157:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9157], Owner[mmt/DEV/1051:mmt/DEV #1051], 
Owner[ASKOO/koo-release-build/support%2F02.021/18:ASKOO/koo-release-build/support%2F02.021 #18], Owner[Kalita/Parallel_Pipeline_2/1005:Kalita/Parallel_Pipeline_2 #1005], Owner[DataFactory/stork/Build_PullRequest/529:DataFactory/stork/Build_PullRequest #529], Owner[SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER/249:SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER #249], Owner[ASBS/buildByPipeline/532:ASBS/buildByPipeline #532], Owner[ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders/14:ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders #14], Owner[adpSWIFT/adpSWIFT_PIPELINE/3102:adpSWIFT/adpSWIFT_PIPELINE #3102], Owner[ASCC/server1/02.014.00/407:ASCC/server1/02.014.00 #407], Owner[ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall/33:ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall #33], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9160:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9160], Owner[ESB/FS/FS_PR_INIT/24747:ESB/FS/FS_PR_INIT #24747], Owner[PPRB.OIP/kbt-scripts/ucp-corp/359:PPRB.OIP/kbt-scripts/ucp-corp #359], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/733:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #733], Owner[edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2/38:edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2 #38], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/734:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #734], Owner[MBP/mbp-ci/140:MBP/mbp-ci #140], Owner[PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order/50:PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order #50], Owner[ESB/FS/FS_CI_RFC/3478:ESB/FS/FS_CI_RFC #3478], Owner[DataFactory/stork/Build_required_distrib/122:DataFactory/stork/Build_required_distrib #122], Owner[TDS/Update_stand_by_url/920:TDS/Update_stand_by_url #920], Owner[CBDBO/Pipeline/2871:CBDBO/Pipeline #2871], 
Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27/315:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27 #315], Owner[edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release/193:edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release #193], Owner[DepositPfETL/deposit-client-validation-pipeline-parameters/19:DepositPfETL/deposit-client-validation-pipeline-parameters #19], Owner[mmt/TEST/1:mmt/TEST #1], Owner[edosgo/elgo-remote-starter/323:edosgo/elgo-remote-starter #323], Owner[ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup/302:ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup #302], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/735:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #735], Owner[PPRBTEAM/Gera/gera-autodeploy-wf/116:PPRBTEAM/Gera/gera-autodeploy-wf #116], Owner[EKPiT/deploy-db-dev1/100:EKPiT/deploy-db-dev1 #100], Owner[TDS/PR_pipeline/903:TDS/PR_pipeline #903], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline/14:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline #14], Owner[ESB/FS/FS_CI_INIT/3648:ESB/FS/FS_CI_INIT #3648], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS/4907:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS #4907], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy/21:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy #21], Owner[Tengri/HDPLocalCiInsallation/151:Tengri/HDPLocalCiInsallation #151], Owner[mgr/API/PUBLISH_ALL/68435:mgr/API/PUBLISH_ALL #68435], Owner[EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline/6405:EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline #6405], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/4533:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533], Owner[PPRBCPRB/Pipeline_Server_Build_For_Dev_Server/10969:PPRBCPRB/Pipeline_Server_Build_For_Dev_Server #10969], Owner[PPRB_CEP/PSI_TEST_Pipe/7763:PPRB_CEP/PSI_TEST_Pipe #7763]]
      ...
      
      May 28, 2018 4:35:14 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
      INFO: KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533 completed: SUCCESS
      
      ....

      As a result, the build queue grows because all available executors are busy.

      A workaround to defer the restart is to periodically run a script that frees the stuck executors:

      import hudson.model.*
      import jenkins.model.Jenkins

      // `manager` is assumed to be supplied by the environment that runs this script.
      // Fetch the node list and drop any null entries.
      def nodes = Jenkins.instance.nodes
      nodes.removeAll(Collections.singleton(null))

      nodes.each { node ->
          manager.listener.logger.println("-------PROCESSING NODE: $node.displayName -------------------")
          def computer = node.toComputer()
          if (computer == null) {
              // The node has no computer attached, so remove it.
              manager.listener.logger.println("------- WARNING: $node.displayName: NULL. Removing! -------------------")
              Jenkins.instance.removeNode(node)
              return
          }

          computer.executors.each { executor ->
              // An executor that is busy but reports progress -1 is treated as stuck.
              if (executor.busy && executor.progress == -1) {
                  manager.listener.logger.println("JOB $executor.name LOOKS LIKE STUCK. KILLING.")
                  def owner = executor.owner
                  owner.removeExecutor((hudson.model.Executor) executor)
              }
          }
      }

      return null
      
       

        1. thread_dump_avaneesh.txt (543 kB)
        2. NormalCase.PNG (28 kB)
        3. BusyTimers.png (37 kB)
        4. image-2018-11-27-11-32-58-457.png (230 kB)
        5. image-2018-11-27-11-31-06-881.png (462 kB)
        6. thread_dump.html (490 kB)
        7. sleep-test-script.PNG (12 kB)
        8. build-pipeline-steps.PNG (44 kB)
        9. finished-build.PNG (66 kB)
        10. stuck-executor.png (4 kB)

          [JENKINS-51568] Pipeline jobs hanging in Build Executor even if it is finished

          Alexander Moiseenko added a comment - - edited

          At the same time, I see that the sleep step is not working properly either.

          A simple job:

          node('Linux_Default') {
              sleep time: 5, unit: 'SECONDS'
              echo "Well done!"
          }
          

          runs for hours, and works properly again only after a restart.


          Alexander Moiseenko added a comment -

          Possibly related to JENKINS-46283.

          Sam Van Oort added a comment -

          dnusbaum Would you be able to take a peek at this please?


          Devin Nusbaum added a comment - - edited

          brainsam I thought this might be a dupe of JENKINS-45571, but in your case it looks like the stuck executors are full Executors running on build agents rather than the flyweight executors that run on the master, so it seems like it might be something else.

          What versions of the Pipeline Groovy Plugin, Pipeline Job Plugin, Durable Task Plugin, and the Pipeline Nodes and Processes Plugin are you running?

          EDIT: Also, how is the Linux_Default agent configured inside of Jenkins (e.g. SSH, EC2, JNLP)?


          Alexander Moiseenko added a comment -

          Hello.

          The same problem has occurred again in our new environment:

          Jenkins 2.121.1
          Pipeline Groovy 2.55
          Pipeline: Job 2.26
          Durable Task Plugin 1.26
          Pipeline Nodes and Processes 2.22

          The workaround script doesn't help anymore, probably since the Durable Task Plugin update.

          We've created a simple Pipeline job with a `sleep` step to check whether the problem has reappeared:

          pipeline {

              options {
                  timeout(time: 60, unit: 'SECONDS')
              }

              stages {

                  stage('sleep') {
                      steps {
                          sleep 5
                      }
                  }
              }
          }

          It finishes successfully in the normal case and fails when executors get stuck.
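A rough illustration of why such a canary can detect the problem: the self-contained Java sketch below schedules a short timer-driven sleep and waits for it under a timeout. It assumes that the sleep depends on a shared, fixed-size scheduled thread pool. The class `SleepCanary`, its methods, the pool size of 2, and all durations are hypothetical and are not taken from Jenkins or any plugin.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the canary job: a short sleep guarded by a timeout. */
class SleepCanary {

    /** Returns true if a task scheduled sleepMillis ahead fires within timeoutMillis. */
    static boolean sleepCompletesInTime(ScheduledExecutorService timer,
                                        long sleepMillis, long timeoutMillis) {
        CountDownLatch woke = new CountDownLatch(1);
        Runnable wakeUp = woke::countDown;
        timer.schedule(wakeUp, sleepMillis, TimeUnit.MILLISECONDS);
        try {
            return woke.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Runs the canary on a fresh pool; if saturate is set, every thread is kept busy first. */
    static boolean runCanary(int threads, boolean saturate, long busyMillis) {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(threads);
        try {
            if (saturate) {
                // Thread.sleep is only a stand-in for a long-running timer task.
                Runnable hog = () -> pause(busyMillis);
                for (int i = 0; i < threads; i++) {
                    timer.schedule(hog, 0, TimeUnit.MILLISECONDS);
                }
            }
            // Assumed values: a 50 ms sleep that must finish within 500 ms.
            return sleepCompletesInTime(timer, 50, 500);
        } finally {
            timer.shutdownNow();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy pool:   " + runCanary(2, false, 0));
        System.out.println("saturated pool: " + runCanary(2, true, 2000));
    }
}
```

On an otherwise idle machine the first call should report true and the second false: the short sleep cannot fire while every pool thread is occupied, so the timeout expires first.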

          Sam Van Oort added a comment - - edited

          jglick Could you please take a look? Appears that the latest comment may reflect a regression due to controller.watch API.

          Edit: though it's not entirely clear from context


          Jesse Glick added a comment -

          I am confused by the relationship between your screenshots and the thread dump. build-pipeline-steps.PNG and finished-build.PNG display build #4533. stuck-executor.PNG displays build #4908 running on a one-executor agent jenkins-agent-linux-008, and thread-dump.txt indicates that this agent is currently processing a (Git) checkout. They are not even builds of the same job: one is of KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG, the other KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS. Nothing about this seems improper—you have some running builds, and some completed builds. Maybe I am missing something, or maybe you chose the wrong attachments.

          I do see one anomalous thing in the thread dump: most of the pool threads for DurableTaskStep retain a Thread.name set in this block even after the block has completed and the thread is parked; WithThreadName ought to be resetting the name reliably, even before the re-schedule call. The code which adds the waiting for JNLP4-connect connection from … suffix to the thread name is here, which also looks like it should be cleaning up properly. I have no explanation for this bug, though I also see no reason to think it is related to your problem.


          David van Laatum added a comment -

          I seem to have the same problem, so I created a job with the simple `sleep 5` pipeline above and set up a cron job that checks whether the job has recently failed to finish and, if so, grabs a thread dump and restarts Jenkins. I also noticed that the test job takes longer and longer in the sleep step.
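The decision at the heart of such a watchdog can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `CanaryWatchdog`, the `Action` enum, and the 15-minute threshold are hypothetical, and how the time of the last successful canary run is obtained, how the thread dump is taken, and how Jenkins is restarted are deliberately left out.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical decision logic for a cron-driven watchdog around the canary job. */
class CanaryWatchdog {

    enum Action { NONE, DUMP_THREADS_AND_RESTART }

    /** Requests a dump and restart when the last successful canary run is older than maxAge. */
    static Action decide(Instant lastSuccess, Instant now, Duration maxAge) {
        Duration age = Duration.between(lastSuccess, now);
        return age.compareTo(maxAge) > 0 ? Action.DUMP_THREADS_AND_RESTART : Action.NONE;
    }

    public static void main(String[] args) {
        // Assumed inputs: a fixed "now" and a 15-minute staleness threshold.
        Instant now = Instant.parse("2018-11-27T11:30:00Z");
        Duration maxAge = Duration.ofMinutes(15);
        System.out.println(decide(now.minusSeconds(120), now, maxAge));
        System.out.println(decide(now.minusSeconds(3600), now, maxAge));
    }
}
```

Keeping the decision separate from the side effects makes the threshold easy to test before the script is allowed to restart anything.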

          Alexander Moiseenko added a comment - - edited

          I think we've found the root cause of the problem. The broken `sleep` and `timeout` steps and the hanging executors are all related to the jenkins.util.Timer threads.

          During the hang state the threads look like this:

          And in the normal case:

          In thread-dump.txt we can see the jenkins.util.Timer stack traces. In our case the root cause was the logfilesizechecker plugin, https://github.com/jenkinsci/logfilesizechecker-plugin/blob/master/src/main/java/hudson/plugins/logfilesizechecker/LogfilesizecheckerWrapper.java#L78

          which uses the timer every second to check the log size, and this operation creates additional CPU load:

          "jenkins.util.Timer [#9]" - Thread t@267
             java.lang.Thread.State: RUNNABLE
                  at java.util.concurrent.ConcurrentSkipListMap.cpr(ConcurrentSkipListMap.java:655)
                  at java.util.concurrent.ConcurrentSkipListMap.findPredecessor(ConcurrentSkipListMap.java:682)
                  at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:781)
                  at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
                  at jenkins.model.Nodes.getNode(Nodes.java:295)
                  at jenkins.model.Jenkins.getNode(Jenkins.java:2058)
                  at hudson.model.Computer.getNode(Computer.java:590)
                  at hudson.slaves.SlaveComputer.getNode(SlaveComputer.java:199)
                  at hudson.slaves.SlaveComputer.getNode(SlaveComputer.java:96)
                  at jenkins.model.Jenkins$7.compare(Jenkins.java:1927)
                  at jenkins.model.Jenkins$7.compare(Jenkins.java:1925)
                  at java.util.TimSort.countRunAndMakeAscending(TimSort.java:360)
                  at java.util.TimSort.sort(TimSort.java:234)
                  at java.util.Arrays.sort(Arrays.java:1438)
                  at jenkins.model.Jenkins.getComputers(Jenkins.java:1925)
                  at hudson.model.Executor.of(Executor.java:941)
                  at hudson.model.Run.getExecutor(Run.java:530)
                  at hudson.plugins.logfilesizechecker.LogfilesizecheckerWrapper$LogSizeTimerTask.doRun(LogfilesizecheckerWrapper.java:107)
                  at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
                  at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                  at java.lang.Thread.run(Thread.java:748)

             Locked ownable synchronizers:
                  - locked <4dd03582> (a java.util.concurrent.ThreadPoolExecutor$Worker)

           

          Since we changed the DELAY value from 1 second to 10 in https://github.com/jenkinsci/logfilesizechecker-plugin/blob/master/src/main/java/hudson/plugins/logfilesizechecker/LogfilesizecheckerWrapper.java#L46, our jenkins.util.Timer threads are mostly in the parked state and sleep works fine.
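A back-of-the-envelope estimate shows why a longer DELAY relieves the timer threads. The Java sketch below assumes that each running build with the check enabled schedules one check per DELAY period on a shared pool. The class `TimerLoadEstimate`, the 200 running builds, the 40 ms cost per check, and the pool size of 10 are made-up inputs for illustration, not measurements from this instance.

```java
/** Hypothetical estimate of how much of a timer pool periodic checks consume. */
class TimerLoadEstimate {

    /** Fraction of the pool's capacity spent on checks (1.0 means every thread is always busy). */
    static double utilization(int runningBuilds, double checkCostMillis,
                              double periodSeconds, int poolThreads) {
        // Milliseconds of check work that arrive per second of wall-clock time.
        double busyMillisPerSecond = runningBuilds * checkCostMillis / periodSeconds;
        // Each pool thread offers 1000 ms of capacity per second.
        return busyMillisPerSecond / (poolThreads * 1000.0);
    }

    public static void main(String[] args) {
        // Assumed inputs: 200 running builds, 40 ms per check, 10 pool threads.
        System.out.println("DELAY = 1 s:  " + utilization(200, 40.0, 1.0, 10));
        System.out.println("DELAY = 10 s: " + utilization(200, 40.0, 10.0, 10));
    }
}
```

With these made-up inputs the estimate drops from 0.8 to 0.08 of the pool's capacity. The stack trace above suggests that each check sorts the list of computers inside Jenkins.getComputers(), so the real cost per check probably grows with the number of agents.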

           

           


          Jesse Glick added a comment -

          Reassigning acc. to diagnosis by original reporter. Other comments may well have completely unrelated issues.


          David van Laatum added a comment -

          We also have a heap of jobs with the log file size check enabled. I have disabled it to see whether the problem stops happening.

          David van Laatum added a comment -

          That seems to have fixed it for us too: Jenkins has been stable since I disabled the log file size check on all builds.

          Raphael Greger added a comment -

          Hi David,

          I have exactly the same problem. What was your workaround? Where did you set the log size check?

          David van Laatum added a comment -

          In the job config there is an option "Abort the build if its log file size is too big". From memory, I used the configuration slicer plugin to remove it from all jobs.

          Raphael Greger added a comment -

          Thank you, David. I didn't find this configuration option, but I restricted the log of job config history. Now the problem seems to have vanished.

          li added a comment -

          It looks like the issue still exists. Is there any update? Version: 2.289.1


          Avaneesh added a comment - - edited

          Issue found on our Jenkins instance as well.

          We schedule more than 1k jobs per day. Jobs triggered by a user complete normally and free up the executor, but jobs triggered by the timer or by an upstream Jenkins project continue to hold the executor, which causes problems. The jobs don't go away even if I kill them manually.

          Java Version:

          openjdk version "1.8.0_322"
          OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-linux64) (build 1.8.0_322-b06)
          OpenJDK 64-Bit Server VM (Zulu 8.60.0.21-CA-linux64) (build 25.322-b06, mixed mode)

          Thread Dump: thread_dump_avaneesh.txt

          System Information:

          Jenkins: 2.332.1
          OS: Linux - 5.13.0-1019-aws
          ---
          Parameterized-Remote-Trigger:3.1.5.1
          PrioritySorter:4.1.0
          ace-editor:1.1
          ant:1.13
          antisamy-markup-formatter:2.7
          apache-httpcomponents-client-4-api:4.5.13-1.0
          artifact-manager-s3:617.vd98e61689f41
          artifactdeployer:1.2
          audit-trail:3.10
          authentication-tokens:1.4
          authorize-project:1.4.0
          aws-credentials:191.vcb_f183ce58b_9
          aws-global-configuration:1.7
          aws-java-sdk:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-cloudformation:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-codebuild:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ec2:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ecr:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ecs:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-elasticbeanstalk:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-iam:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-logs:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-minimal:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ssm:1.12.163-315.v2b_716ec8e4df
          blueocean:1.25.3
          blueocean-autofavorite:1.2.5
          blueocean-bitbucket-pipeline:1.25.3
          blueocean-commons:1.25.3
          blueocean-config:1.25.3
          blueocean-core-js:1.25.3
          blueocean-dashboard:1.25.3
          blueocean-display-url:2.4.1
          blueocean-events:1.25.3
          blueocean-git-pipeline:1.25.3
          blueocean-github-pipeline:1.25.3
          blueocean-i18n:1.25.3
          blueocean-jira:1.25.3
          blueocean-jwt:1.25.3
          blueocean-personalization:1.25.3
          blueocean-pipeline-api-impl:1.25.3
          blueocean-pipeline-editor:1.25.3
          blueocean-pipeline-scm-api:1.25.3
          blueocean-rest:1.25.3
          blueocean-rest-impl:1.25.3
          blueocean-web:1.25.3
          bootstrap4-api:4.6.0-3
          bootstrap5-api:5.1.3-6
          bouncycastle-api:2.25
          branch-api:2.7.0
          build-environment:1.7
          build-name-setter:2.2.0
          build-timeout:1.20
          caffeine-api:2.9.2-29.v717aac953ff3
          categorized-view:1.12
          checks-api:1.7.2
          cloud-stats:0.27
          cloudbees-bitbucket-branch-source:757.vddedc5f2589a_
          cloudbees-folder:6.714.v79e858ef76a_2
          command-launcher:1.6
          conditional-buildstep:1.4.2
          config-autorefresh-plugin:1.0
          config-file-provider:3.9.0
          configurationslicing:430.v966357576543
          copyartifact:1.46.3
          credentials:1074.v60e6c29b_b_44b_
          credentials-binding:1.27.1
          custom-view-tabs:1.3
          cvs:2.19
          display-url-api:2.3.6
          docker-commons:1.19
          docker-java-api:3.1.5.2
          docker-workflow:1.28
          durable-task:495.v29cd95ec10f2
          dynamic-axis:1.0.3
          ec2:1.68
          ec2-fleet:2.5.1
          echarts-api:5.3.0-2
          email-ext:2.87
          envinject-api:1.180.v98d833b_27470
          extended-read-permission:3.2
          extensible-choice-parameter:1.8.0
          external-monitor-job:191.v363d0d1efdf8
          extra-columns:1.25
          favorite:2.4.1
          favorite-view:1.0
          font-awesome-api:6.0.0-1
          fstrigger:1.00
          generic-webhook-trigger:1.83
          git:4.10.3
          git-client:3.11.0
          git-parameter:0.9.15
          git-server:1.10
          github:1.34.3
          github-api:1.301-378.v9807bd746da5
          github-branch-source:1583.v18d333ef7379
          gitlab-api:1.0.6
          gitlab-oauth:1.13
          gitlab-plugin:1.5.29
          gradle:1.38
          greenballs:1.15.1
          handlebars:3.0.8
          handy-uri-templates-2-api:2.1.8-1.0
          htmlpublisher:1.29
          jackson2-api:2.13.2-260.v43d711474c77
          javadoc:217.v905b_86277a_2a_
          javax-activation-api:1.2.0-2
          javax-mail-api:1.6.2-5
          jaxb:2.3.0.1
          jdk-tool:1.5
          jenkins-design-language:1.25.3
          jersey2-api:2.35-4
          jira:3.7
          jjwt-api:0.11.2-9.c8b45b8bb173
          jnr-posix-api:3.1.7-3
          jquery:1.12.4-1
          jquery-detached:1.2.1
          jquery-ui:1.0.2
          jquery3-api:3.6.0-2
          jsch:0.1.55.2
          junit:1.56
          ldap:2.8
          locale:144.v1a_998824ddb_3
          lockable-resources:2.14
          log-parser:2.2
          mailer:408.vd726a_1130320
          mapdb-api:1.0.9.0
          matrix-auth:2.6.7
          matrix-project:758.v7a_ea_491852f3
          maven-plugin:3.18
          mercurial:2.16
          metrics:4.1.6.1
          momentjs:1.1.1
          monitoring:1.90.0
          naginator:1.18.1
          nested-view:1.24
          next-build-number:1.8
          node-iterator-api:1.5.1
          nodelabelparameter:1.10.3
          okhttp-api:4.9.3-105.vb96869f8ac3a
          openstack-cloud:2.61
          pam-auth:1.7
          parameterized-trigger:2.44
          permissive-script-security:0.7
          pipeline-aws:1.43
          pipeline-build-step:2.16
          pipeline-github-lib:36.v4c01db_ca_ed16
          pipeline-graph-analysis:188.v3a01e7973f2c
          pipeline-input-step:446.vf27b_0b_83500e
          pipeline-milestone-step:100.v60a_03cd446e1
          pipeline-model-api:2.2064.v5eef7d0982b_e
          pipeline-model-declarative-agent:1.1.1
          pipeline-model-definition:2.2064.v5eef7d0982b_e
          pipeline-model-extensions:2.2064.v5eef7d0982b_e
          pipeline-rest-api:2.23
          pipeline-stage-step:291.vf0a8a7aeeb50
          pipeline-stage-tags-metadata:2.2064.v5eef7d0982b_e
          pipeline-stage-view:2.23
          pipeline-utility-steps:2.12.0
          plain-credentials:1.8
          plugin-util-api:2.15.0
          popper-api:1.16.1-2
          popper2-api:2.11.4-1
          pubsub-light:1.16
          purge-build-queue-plugin:24.v3e0e709b_f62e
          rebuild:1.33
          resource-disposer:0.17
          run-condition:1.5
          s3:0.12.1
          scm-api:595.vd5a_df5eb_0e39
          script-security:1145.vb_cf6cf6ed960
          secondary-timestamper-plugin:1.1
          slack:608.v19e3b_44b_b_9ff
          snakeyaml-api:1.29.1
          sse-gateway:1.25
          ssh:2.6.1
          ssh-agent:1.24.1
          ssh-credentials:1.19
          ssh-slaves:1.806.v2253cedd3295
          ssh-steps:2.0.0
          sshd:3.1.0
          structs:308.v852b473a2b8c
          subversion:2.15.3
          text-finder:1.18
          timestamper:1.17
          token-macro:285.vff7645a_56ff0
          trilead-api:1.0.13
          urltrigger:1.02
          variant:1.4
          view-job-filters:2.3
          windows-slaves:1.8
          workflow-aggregator:2.7
          workflow-api:1143.v2d42f1e9dea_5
          workflow-basic-steps:941.vdfe1b_a_132c64
          workflow-cps:2682.va_473dcddc941
          workflow-cps-global-lib:564.ve62a_4eb_b_e039
          workflow-durable-task-step:1128.v8c259d125340
          workflow-job:1174.vdcb_d054cf74a_
          workflow-multibranch:711.vdfef37cda_816
          workflow-scm-step:2.13
          workflow-step-api:622.vb_8e7c15b_c95a_
          workflow-support:815.vd60466279fc8
          ws-cleanup:0.40
          xtrigger-api:0.4 

          mor lajb added a comment -

          We have the same problem on LTS 2.346.1: after a pipeline completes, the executor stays busy for 30-60 minutes.
          We use the ec2-fleet and k8s plugins, and it happens on both pods and EC2 instances.
          Is there any workaround besides deleting the instances and starting again?

          Jesse Glick added a comment -

          Whatever the root cause may be in particular cases, https://github.com/jenkinsci/workflow-durable-task-step-plugin/releases/tag/1146.v1a_d2e603f929 should clean up automatically.

          Allan BURDAJEWICZ added a comment - - edited

          I was able to reproduce this (leaked queue items even though the pipeline has completed) while troubleshooting a user's scenario. The simplest reproduction I could find is the following, which involves both the timeout and node steps:

              timeout(time: 20, unit: 'SECONDS') {    
                  try {
                      // Label that does not exist
                      node('doesnotexists') {
                          sh "sleep 999999"
                      }
                  } finally {  
                      // Label that does not exist
                      node('doesnotexists') {
                          sh "sleep 999999"
                      }
                  }
              }
          

          This would leak a queue item every time.

          And indeed https://github.com/jenkinsci/workflow-durable-task-step-plugin/releases/tag/1146.v1a_d2e603f929 fixes the problem in that particular case.

          Jesse Glick added a comment -

          allan_burdajewicz in that case I think the issue is that timeout only delivers one interruption and waits for a grace period before escalating to a hard kill

          Body did not finish within grace period; terminating with extreme prejudice
          

          which bypasses cleanup code. If you put something liable to hang in a finally block you are triggering this scenario. Maybe timeout could escalate more smoothly but it is hard for it to tell whether its body “paid attention” to the interrupt and is actually going to process it soon or not.
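
          One way to stay out of this scenario, sketched under assumptions rather than taken from the plugin documentation, is to keep the cleanup outside the timeout that guards the main body and give the cleanup its own bounded timeout. The label names `build-label` and `cleanup-label` and the shell commands are hypothetical placeholders.

```groovy
// Sketch only: labels and shell commands are hypothetical placeholders.
try {
    // The timeout guards only the main work, so the finally block below
    // is not part of the body that receives the interruption.
    timeout(time: 20, unit: 'SECONDS') {
        node('build-label') {
            sh 'sleep 999999'
        }
    }
} finally {
    // The cleanup gets its own, separate limit. If no matching agent ever
    // becomes available, this timeout interrupts the waiting node step
    // instead of leaving it queued indefinitely.
    timeout(time: 1, unit: 'MINUTES') {
        node('cleanup-label') {
            sh 'echo cleaning up'
        }
    }
}
```

          With this layout the first interruption is delivered to a body that has no blocking cleanup inside it, so the escalation to a hard kill should not be needed in the common case.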

          Allan BURDAJEWICZ added a comment -

          The fix in workflow-durable-task-step 1146.v1a_d2e603f929 definitely seems to solve the problem for that scenario. I can't reproduce it anymore.

          Devin Nusbaum added a comment -

          Pipeline: Groovy plugin version 3785.vee73da_b_9544e fixes one class of issues that could cause executor slots to leak (cases where the CPS VM thread handled an uncaught exception and stopped the build, which correspond to the following log message: "WARNING o.j.p.w.cps.CpsVmExecutorService#reportProblem: Unexpected exception in CPS VM thread"). See JENKINS-71692.
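
          To see whether a controller already has the two fixes mentioned in this thread, a script like the following could be run in the script console (Manage Jenkins > Script Console). It is a minimal sketch: it assumes `workflow-cps` is the plugin ID of the Pipeline: Groovy plugin and relies on `hudson.util.VersionNumber` ordering to compare versions. The plugin list posted earlier in the thread shows workflow-durable-task-step 1128.v8c259d125340 and workflow-cps 2682.va_473dcddc941, both of which predate these releases.

```groovy
import hudson.util.VersionNumber
import jenkins.model.Jenkins

// Minimum versions named in the comments of this issue.
def minimums = [
    'workflow-durable-task-step': '1146.v1a_d2e603f929',
    'workflow-cps'              : '3785.vee73da_b_9544e',
]

minimums.each { id, minimum ->
    // getPlugin returns null when the plugin is not installed.
    def plugin = Jenkins.get().pluginManager.getPlugin(id)
    if (plugin == null) {
        println "${id}: not installed"
    } else if (plugin.versionNumber.isOlderThan(new VersionNumber(minimum))) {
        println "${id}: ${plugin.version} is older than ${minimum}"
    } else {
        println "${id}: ${plugin.version} is at or above ${minimum}"
    }
}
```

          The script only reads plugin metadata and changes nothing on the controller.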

            Unassigned Unassigned
            brainsam Alexander Moiseenko