[JENKINS-51568] Pipeline jobs hanging in Build Executor even if it is finished

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Environment: Jenkins 2.89.4, EL7

      We have a huge Jenkins instance that runs about 20k builds a day.

      After a couple of days without a restart of the Jenkins master, Pipeline jobs start holding their executors after the build finishes. Freestyle and Maven jobs work fine.
      A busy executor looks like this:

      But the build status is "finished":

      jenkins.log contains records of both the start and the finish of the build, but the executor was still not released at May 28, 2018 4:35:14 PM:

      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 5,770ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275137998231310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      2018/05/28 05:07:798 - job/KKA/job/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/ #4533 Started by timer
      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 6,783ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275138109721310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      
      ...
      
      WARNING: Owner[PPRBTEAM/archive/Deimos/deimos-module-list-wf/23333:PPRBTEAM/archive/Deimos/deimos-module-list-wf #23333] was not in the list to begin with: [Owner[GringoTesting/Try/1:GringoTesting/Try #1], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/211:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #211], Owner[AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated/275:AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated #275], Owner[DataFactory/AHD/Pipeline_dev/49:DataFactory/AHD/Pipeline_dev #49], Owner[CSUO/PipelineBadCode/19:CSUO/PipelineBadCode #19], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/616:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #616], Owner[KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0/7:KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0 #7], Owner[ESB/FS/FS_CI_RFC_PR_8/3060:ESB/FS/FS_CI_RFC_PR_8 #3060], Owner[TFS/TestJobs/integrationOnly/809:TFS/TestJobs/integrationOnly #809], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/265:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #265], Owner[MBP/mbp-ci/123:MBP/mbp-ci #123], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/338:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev #338], Owner[Kalita/tmp/test_pr/46:Kalita/tmp/test_pr #46], Owner[Kalita/tmp/test_pr/47:Kalita/tmp/test_pr #47], Owner[ASKOO/koo-release-build/feature%2F513/8:ASKOO/koo-release-build/feature%2F513 #8], Owner[ASKOO/koo-release-build/develop/117:ASKOO/koo-release-build/develop #117], Owner[ASKOO/koo-release-build/feature%2F513/9:ASKOO/koo-release-build/feature%2F513 #9], Owner[ASKOO/koo-release-build/develop/118:ASKOO/koo-release-build/develop #118], Owner[GBK/QG_check_minor250518/205:GBK/QG_check_minor250518 #205], Owner[GBK/QG_Check_dev/345:GBK/QG_Check_dev #345], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/339:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev 
#339], Owner[GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE/363:GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE #363], Owner[DataFactory/stork/Build_PullRequest/524:DataFactory/stork/Build_PullRequest #524], Owner[KKA/KKA_PIPE_CI/352:KKA/KKA_PIPE_CI #352], Owner[KKA/KKA_PIPE_DEPLOY/154:KKA/KKA_PIPE_DEPLOY #154], Owner[GBK/QG_check_minor250518/206:GBK/QG_check_minor250518 #206], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/57:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #57], Owner[PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY/33:PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY #33], Owner[Kalita/Regular_clt_dev_builds/295:Kalita/Regular_clt_dev_builds #295], Owner[DataFactory/stork/Build_PullRequest/525:DataFactory/stork/Build_PullRequest #525], Owner[Kalita/tmp/test_pr/48:Kalita/tmp/test_pr #48], Owner[AEP/AEP_QG/806:AEP/AEP_QG #806], Owner[SNUiL/DevOps/CI-Builds/CI-Build-PIR-29/549:SNUiL/DevOps/CI-Builds/CI-Build-PIR-29 #549], Owner[GBK/QG_Check_dev/346:GBK/QG_Check_dev #346], Owner[GBK/QG_check_minor250518/207:GBK/QG_check_minor250518 #207], Owner[ECOD/DEVELOP/NexusArtifactFlag/1:ECOD/DEVELOP/NexusArtifactFlag #1], Owner[Kalita/Parallel_Pipeline_2/1003:Kalita/Parallel_Pipeline_2 #1003], Owner[HPSM/HPSM_pipeline/403:HPSM/HPSM_pipeline #403], Owner[ESB_CF/IIB9_PIPELINE/3751:ESB_CF/IIB9_PIPELINE #3751], Owner[DataFactory/stork/Build_PullRequest/526:DataFactory/stork/Build_PullRequest #526], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/58:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #58], Owner[DevOps/AHD/sleep-test/2:DevOps/AHD/sleep-test #2], Owner[ESB_SMP/FullBuild/484:ESB_SMP/FullBuild #484], Owner[DataFactory/stork/Build_PullRequest/527:DataFactory/stork/Build_PullRequest #527], Owner[SBK/SMARTREGRESS_TEST_IFT1/4:SBK/SMARTREGRESS_TEST_IFT1 #4], Owner[impprb/checkQG/249:impprb/checkQG #249], Owner[DevOps/AHD/sleep-test/4:DevOps/AHD/sleep-test #4], 
Owner[TFS/PRBuilders/BuilderPRForCore/903:TFS/PRBuilders/BuilderPRForCore #903], Owner[HPSM/HPSM_pipeline/404:HPSM/HPSM_pipeline #404], Owner[FCCM8/regular_release/41:FCCM8/regular_release #41], Owner[ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT/380:ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT #380], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/268:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #268], Owner[DataFactory/stork/Build_PullRequest/528:DataFactory/stork/Build_PullRequest #528], Owner[SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git/389:SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git #389], Owner[ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job/41:ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job #41], Owner[ASKOO/koo-release-build/feature%2F253/29:ASKOO/koo-release-build/feature%2F253 #29], Owner[GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE/334:GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE #334], Owner[DEPOZITORY/PB/69:DEPOZITORY/PB #69], Owner[DataFactory/stork/Auto_Test_DEV/2115:DataFactory/stork/Auto_Test_DEV #2115], Owner[GBK/QG_check_minor250518/208:GBK/QG_check_minor250518 #208], Owner[TDS/GREEN_AN_GREEN/87:TDS/GREEN_AN_GREEN #87], Owner[PPRB_DepositCashOperations/common/Publish_to_IFT_universal/320:PPRB_DepositCashOperations/common/Publish_to_IFT_universal #320], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9156:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9156], Owner[ASCC/server1/02.013.00/1496:ASCC/server1/02.013.00 #1496], Owner[ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job/37:ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job #37], Owner[ESB/FS/Meshkov/FS_CI_RFC_tst/763:ESB/FS/Meshkov/FS_CI_RFC_tst #763], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9157:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9157], Owner[mmt/DEV/1051:mmt/DEV #1051], 
Owner[ASKOO/koo-release-build/support%2F02.021/18:ASKOO/koo-release-build/support%2F02.021 #18], Owner[Kalita/Parallel_Pipeline_2/1005:Kalita/Parallel_Pipeline_2 #1005], Owner[DataFactory/stork/Build_PullRequest/529:DataFactory/stork/Build_PullRequest #529], Owner[SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER/249:SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER #249], Owner[ASBS/buildByPipeline/532:ASBS/buildByPipeline #532], Owner[ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders/14:ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders #14], Owner[adpSWIFT/adpSWIFT_PIPELINE/3102:adpSWIFT/adpSWIFT_PIPELINE #3102], Owner[ASCC/server1/02.014.00/407:ASCC/server1/02.014.00 #407], Owner[ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall/33:ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall #33], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9160:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9160], Owner[ESB/FS/FS_PR_INIT/24747:ESB/FS/FS_PR_INIT #24747], Owner[PPRB.OIP/kbt-scripts/ucp-corp/359:PPRB.OIP/kbt-scripts/ucp-corp #359], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/733:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #733], Owner[edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2/38:edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2 #38], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/734:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #734], Owner[MBP/mbp-ci/140:MBP/mbp-ci #140], Owner[PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order/50:PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order #50], Owner[ESB/FS/FS_CI_RFC/3478:ESB/FS/FS_CI_RFC #3478], Owner[DataFactory/stork/Build_required_distrib/122:DataFactory/stork/Build_required_distrib #122], Owner[TDS/Update_stand_by_url/920:TDS/Update_stand_by_url #920], Owner[CBDBO/Pipeline/2871:CBDBO/Pipeline #2871], 
Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27/315:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27 #315], Owner[edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release/193:edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release #193], Owner[DepositPfETL/deposit-client-validation-pipeline-parameters/19:DepositPfETL/deposit-client-validation-pipeline-parameters #19], Owner[mmt/TEST/1:mmt/TEST #1], Owner[edosgo/elgo-remote-starter/323:edosgo/elgo-remote-starter #323], Owner[ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup/302:ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup #302], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/735:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #735], Owner[PPRBTEAM/Gera/gera-autodeploy-wf/116:PPRBTEAM/Gera/gera-autodeploy-wf #116], Owner[EKPiT/deploy-db-dev1/100:EKPiT/deploy-db-dev1 #100], Owner[TDS/PR_pipeline/903:TDS/PR_pipeline #903], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline/14:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline #14], Owner[ESB/FS/FS_CI_INIT/3648:ESB/FS/FS_CI_INIT #3648], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS/4907:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS #4907], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy/21:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy #21], Owner[Tengri/HDPLocalCiInsallation/151:Tengri/HDPLocalCiInsallation #151], Owner[mgr/API/PUBLISH_ALL/68435:mgr/API/PUBLISH_ALL #68435], Owner[EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline/6405:EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline #6405], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/4533:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533], Owner[PPRBCPRB/Pipeline_Server_Build_For_Dev_Server/10969:PPRBCPRB/Pipeline_Server_Build_For_Dev_Server #10969], Owner[PPRB_CEP/PSI_TEST_Pipe/7763:PPRB_CEP/PSI_TEST_Pipe #7763]]
      ...
      
      May 28, 2018 4:35:14 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
      INFO: KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533 completed: SUCCESS
      
      ....

      As a result, the build queue grows, since all available executors are busy.

      As a workaround to defer a restart, we periodically run a script that frees stuck executors:

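      import hudson.model.*
      import jenkins.model.Jenkins
      
      // Written for the Groovy Postbuild plugin, which provides the `manager` binding.
      def log = manager.listener.logger
      
      // Work on a copy of the node list, dropping null entries.
      def nodes = Jenkins.instance.nodes.findAll { it != null }
      
      nodes.each { node ->
          log.println("------- PROCESSING NODE: ${node.displayName} -------")
          def computer = node.toComputer()
          if (computer == null) {
              // No Computer is attached to this node; the workaround removes the node (drastic).
              log.println("------- WARNING: ${node.displayName}: NULL. Removing! -------")
              Jenkins.instance.removeNode(node)
              return // inside each {}, this acts like `continue`
          }
      
          computer.executors.each { executor ->
              // Busy with unknown progress (-1) is treated as stuck. getProgress() may also
              // return -1 when no duration estimate exists, so healthy builds can match.
              if (executor.busy && executor.progress == -1) {
                  log.println("JOB ${executor.name} LOOKS STUCK. KILLING.")
                  computer.removeExecutor(executor)
              }
          }
      }
      
      return null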
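      A narrower check is sketched below for the script console. It only reports and does not interrupt or remove anything. It assumes the leaked slot is the one-off (flyweight) executor that hosts the top-level Pipeline task, whose current executable is the `Run` itself. Executors held by `node` blocks carry a different executable type and are not covered. `Jenkins.instance` is used instead of `Jenkins.get()` so the sketch also applies to older cores such as 2.89.4.

```groovy
import hudson.model.Run
import jenkins.model.Jenkins

// Report-only: list one-off (flyweight) executors that still hold a Run
// which Jenkins already considers finished. Nothing is interrupted or removed.
Jenkins.instance.computers.each { computer ->
    computer.oneOffExecutors.each { executor ->
        def executable = executor.currentExecutable
        // instanceof is false for null, so idle executors are skipped.
        if (executable instanceof Run && !executable.building) {
            println "${computer.displayName}: ${executable.fullDisplayName} " +
                    "finished with ${executable.result} but still holds an executor"
        }
    }
}
return null
```

Listing before killing lets you confirm that the held executable really is a completed build. The `progress == -1` heuristic cannot confirm this.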
      
       

        Attachments:
        1. build-pipeline-steps.PNG (44 kB)
        2. BusyTimers.png (37 kB)
        3. finished-build.PNG (66 kB)
        4. image-2018-11-27-11-31-06-881.png (462 kB)
        5. image-2018-11-27-11-32-58-457.png (230 kB)
        6. NormalCase.PNG (28 kB)
        7. sleep-test-script.PNG (12 kB)
        8. stuck-executor.png (4 kB)
        9. thread_dump_avaneesh.txt (543 kB)
        10. thread_dump.html (490 kB)


          David van Laatum added a comment -

          In the job config there is an option "Abort the build if its log file size is too big". From memory, I used the Configuration Slicing plugin to remove it from all jobs.

          Raphael Greger added a comment -

          Thank you, David. I didn't find that option, but I restricted the job config history logging. Now the problem seems to have vanished.

          li added a comment -

          It looks like the issue still exists in 2.289.1. Is there any update?

          Avaneesh added a comment - edited

          We see this issue on our Jenkins instance as well.

          We schedule more than 1k jobs per day. Jobs triggered by a user complete normally and free their executor, but jobs triggered by a timer or by an upstream Jenkins project keep holding the executor, which causes problems. These jobs do not go away even when I kill them manually.

          Java Version:

          openjdk version "1.8.0_322"
          OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-linux64) (build 1.8.0_322-b06)
          OpenJDK 64-Bit Server VM (Zulu 8.60.0.21-CA-linux64) (build 25.322-b06, mixed mode)

          Thread Dump: thread_dump_avaneesh.txt

          System Information:

          Jenkins: 2.332.1
          OS: Linux - 5.13.0-1019-aws
          ---
          Parameterized-Remote-Trigger:3.1.5.1
          PrioritySorter:4.1.0
          ace-editor:1.1
          ant:1.13
          antisamy-markup-formatter:2.7
          apache-httpcomponents-client-4-api:4.5.13-1.0
          artifact-manager-s3:617.vd98e61689f41
          artifactdeployer:1.2
          audit-trail:3.10
          authentication-tokens:1.4
          authorize-project:1.4.0
          aws-credentials:191.vcb_f183ce58b_9
          aws-global-configuration:1.7
          aws-java-sdk:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-cloudformation:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-codebuild:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ec2:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ecr:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ecs:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-elasticbeanstalk:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-iam:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-logs:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-minimal:1.12.163-315.v2b_716ec8e4df
          aws-java-sdk-ssm:1.12.163-315.v2b_716ec8e4df
          blueocean:1.25.3
          blueocean-autofavorite:1.2.5
          blueocean-bitbucket-pipeline:1.25.3
          blueocean-commons:1.25.3
          blueocean-config:1.25.3
          blueocean-core-js:1.25.3
          blueocean-dashboard:1.25.3
          blueocean-display-url:2.4.1
          blueocean-events:1.25.3
          blueocean-git-pipeline:1.25.3
          blueocean-github-pipeline:1.25.3
          blueocean-i18n:1.25.3
          blueocean-jira:1.25.3
          blueocean-jwt:1.25.3
          blueocean-personalization:1.25.3
          blueocean-pipeline-api-impl:1.25.3
          blueocean-pipeline-editor:1.25.3
          blueocean-pipeline-scm-api:1.25.3
          blueocean-rest:1.25.3
          blueocean-rest-impl:1.25.3
          blueocean-web:1.25.3
          bootstrap4-api:4.6.0-3
          bootstrap5-api:5.1.3-6
          bouncycastle-api:2.25
          branch-api:2.7.0
          build-environment:1.7
          build-name-setter:2.2.0
          build-timeout:1.20
          caffeine-api:2.9.2-29.v717aac953ff3
          categorized-view:1.12
          checks-api:1.7.2
          cloud-stats:0.27
          cloudbees-bitbucket-branch-source:757.vddedc5f2589a_
          cloudbees-folder:6.714.v79e858ef76a_2
          command-launcher:1.6
          conditional-buildstep:1.4.2
          config-autorefresh-plugin:1.0
          config-file-provider:3.9.0
          configurationslicing:430.v966357576543
          copyartifact:1.46.3
          credentials:1074.v60e6c29b_b_44b_
          credentials-binding:1.27.1
          custom-view-tabs:1.3
          cvs:2.19
          display-url-api:2.3.6
          docker-commons:1.19
          docker-java-api:3.1.5.2
          docker-workflow:1.28
          durable-task:495.v29cd95ec10f2
          dynamic-axis:1.0.3
          ec2:1.68
          ec2-fleet:2.5.1
          echarts-api:5.3.0-2
          email-ext:2.87
          envinject-api:1.180.v98d833b_27470
          extended-read-permission:3.2
          extensible-choice-parameter:1.8.0
          external-monitor-job:191.v363d0d1efdf8
          extra-columns:1.25
          favorite:2.4.1
          favorite-view:1.0
          font-awesome-api:6.0.0-1
          fstrigger:1.00
          generic-webhook-trigger:1.83
          git:4.10.3
          git-client:3.11.0
          git-parameter:0.9.15
          git-server:1.10
          github:1.34.3
          github-api:1.301-378.v9807bd746da5
          github-branch-source:1583.v18d333ef7379
          gitlab-api:1.0.6
          gitlab-oauth:1.13
          gitlab-plugin:1.5.29
          gradle:1.38
          greenballs:1.15.1
          handlebars:3.0.8
          handy-uri-templates-2-api:2.1.8-1.0
          htmlpublisher:1.29
          jackson2-api:2.13.2-260.v43d711474c77
          javadoc:217.v905b_86277a_2a_
          javax-activation-api:1.2.0-2
          javax-mail-api:1.6.2-5
          jaxb:2.3.0.1
          jdk-tool:1.5
          jenkins-design-language:1.25.3
          jersey2-api:2.35-4
          jira:3.7
          jjwt-api:0.11.2-9.c8b45b8bb173
          jnr-posix-api:3.1.7-3
          jquery:1.12.4-1
          jquery-detached:1.2.1
          jquery-ui:1.0.2
          jquery3-api:3.6.0-2
          jsch:0.1.55.2
          junit:1.56
          ldap:2.8
          locale:144.v1a_998824ddb_3
          lockable-resources:2.14
          log-parser:2.2
          mailer:408.vd726a_1130320
          mapdb-api:1.0.9.0
          matrix-auth:2.6.7
          matrix-project:758.v7a_ea_491852f3
          maven-plugin:3.18
          mercurial:2.16
          metrics:4.1.6.1
          momentjs:1.1.1
          monitoring:1.90.0
          naginator:1.18.1
          nested-view:1.24
          next-build-number:1.8
          node-iterator-api:1.5.1
          nodelabelparameter:1.10.3
          okhttp-api:4.9.3-105.vb96869f8ac3a
          openstack-cloud:2.61
          pam-auth:1.7
          parameterized-trigger:2.44
          permissive-script-security:0.7
          pipeline-aws:1.43
          pipeline-build-step:2.16
          pipeline-github-lib:36.v4c01db_ca_ed16
          pipeline-graph-analysis:188.v3a01e7973f2c
          pipeline-input-step:446.vf27b_0b_83500e
          pipeline-milestone-step:100.v60a_03cd446e1
          pipeline-model-api:2.2064.v5eef7d0982b_e
          pipeline-model-declarative-agent:1.1.1
          pipeline-model-definition:2.2064.v5eef7d0982b_e
          pipeline-model-extensions:2.2064.v5eef7d0982b_e
          pipeline-rest-api:2.23
          pipeline-stage-step:291.vf0a8a7aeeb50
          pipeline-stage-tags-metadata:2.2064.v5eef7d0982b_e
          pipeline-stage-view:2.23
          pipeline-utility-steps:2.12.0
          plain-credentials:1.8
          plugin-util-api:2.15.0
          popper-api:1.16.1-2
          popper2-api:2.11.4-1
          pubsub-light:1.16
          purge-build-queue-plugin:24.v3e0e709b_f62e
          rebuild:1.33
          resource-disposer:0.17
          run-condition:1.5
          s3:0.12.1
          scm-api:595.vd5a_df5eb_0e39
          script-security:1145.vb_cf6cf6ed960
          secondary-timestamper-plugin:1.1
          slack:608.v19e3b_44b_b_9ff
          snakeyaml-api:1.29.1
          sse-gateway:1.25
          ssh:2.6.1
          ssh-agent:1.24.1
          ssh-credentials:1.19
          ssh-slaves:1.806.v2253cedd3295
          ssh-steps:2.0.0
          sshd:3.1.0
          structs:308.v852b473a2b8c
          subversion:2.15.3
          text-finder:1.18
          timestamper:1.17
          token-macro:285.vff7645a_56ff0
          trilead-api:1.0.13
          urltrigger:1.02
          variant:1.4
          view-job-filters:2.3
          windows-slaves:1.8
          workflow-aggregator:2.7
          workflow-api:1143.v2d42f1e9dea_5
          workflow-basic-steps:941.vdfe1b_a_132c64
          workflow-cps:2682.va_473dcddc941
          workflow-cps-global-lib:564.ve62a_4eb_b_e039
          workflow-durable-task-step:1128.v8c259d125340
          workflow-job:1174.vdcb_d054cf74a_
          workflow-multibranch:711.vdfef37cda_816
          workflow-scm-step:2.13
          workflow-step-api:622.vb_8e7c15b_c95a_
          workflow-support:815.vd60466279fc8
          ws-cleanup:0.40
          xtrigger-api:0.4 


          mor lajb added a comment -

          We have the same problem on LTS 2.346.1: after a Pipeline completes, its executor keeps hanging for 30 to 60 minutes.
          We use the ec2-fleet and k8s plugins, and it happens on both pods and EC2 instances.
          Is there any workaround besides deleting the instances and starting again?

          Jesse Glick added a comment -

          Whatever the root cause may be in particular cases, https://github.com/jenkinsci/workflow-durable-task-step-plugin/releases/tag/1146.v1a_d2e603f929 should clean up automatically.
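          A script-console sketch for checking whether a controller already runs that release is shown below. It assumes that `hudson.util.VersionNumber` orders `<number>.v<hash>` versions by their leading number, so any installed version below 1146 is reported as older.

```groovy
import hudson.util.VersionNumber
import jenkins.model.Jenkins

// Release referenced in the comment above as cleaning up leaked executors.
def fixedVersion = '1146.v1a_d2e603f929'
def plugin = Jenkins.instance.pluginManager.getPlugin('workflow-durable-task-step')

if (plugin == null) {
    println 'workflow-durable-task-step is not installed'
} else if (plugin.versionNumber.isOlderThan(new VersionNumber(fixedVersion))) {
    println "workflow-durable-task-step ${plugin.version} is older than ${fixedVersion}"
} else {
    println "workflow-durable-task-step ${plugin.version} is at or after ${fixedVersion}"
}
return null
```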


          Allan BURDAJEWICZ added a comment - edited

          I was able to reproduce this (leaked queue items even though the Pipeline had completed) while troubleshooting a user's scenario. The simplest reproduction I found involves both the timeout and node steps:

              timeout(time: 20, unit: 'SECONDS') {    
                  try {
                      // Label that does not exist
                      node('doesnotexists') {
                          sh "sleep 999999"
                      }
                  } finally {  
                      // Label that does not exist
                      node('doesnotexists') {
                          sh "sleep 999999"
                      }
                  }
              }
          

          This would leak a queue item every time.

          And indeed https://github.com/jenkinsci/workflow-durable-task-step-plugin/releases/tag/1146.v1a_d2e603f929 fixes the problem in that particular case.


          Jesse Glick added a comment -

          allan_burdajewicz, in that case I think the issue is that timeout delivers only one interruption and then waits for a grace period before escalating to a hard kill:

          Body did not finish within grace period; terminating with extreme prejudice
          

          which bypasses cleanup code. If you put something liable to hang in a finally block, you are triggering this scenario. Maybe timeout could escalate more smoothly, but it is hard for it to tell whether its body “paid attention” to the interrupt and is actually going to process it soon or not.
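          One way to keep cleanup out of that escalation is sketched below. It assumes the cleanup does not have to run within the outer time limit. The `finally` block moves outside `timeout`, and the cleanup gets its own `timeout`. Each timer then interrupts only the work it wraps. A cleanup `node` still waiting in the queue should be cancelled by its own, single interruption instead of being hard-killed. The `linux` label and the `make test` / `./cleanup.sh` commands are placeholders.

```groovy
// Scripted Pipeline sketch; the label and shell commands are hypothetical placeholders.
try {
    timeout(time: 20, unit: 'MINUTES') {
        node('linux') {
            sh 'make test'
        }
    }
} finally {
    // Cleanup sits outside the outer timeout, so that timeout never has to
    // escalate past it; the cleanup is bounded by its own, separate timeout.
    timeout(time: 5, unit: 'MINUTES') {
        node('linux') {
            sh './cleanup.sh'
        }
    }
}
```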


          Allan BURDAJEWICZ added a comment -

          The fix in workflow-durable-task-step 1146.v1a_d2e603f929 definitely seems to solve the problem for that scenario. I can't reproduce it anymore.

          Devin Nusbaum added a comment -

          Pipeline: Groovy plugin version 3785.vee73da_b_9544e fixes one class of issues that could cause executor slots to leak (cases where the CPS VM thread handled an uncaught exception and stopped the build, which correspond to the following log message: "WARNING o.j.p.w.cps.CpsVmExecutorService#reportProblem: Unexpected exception in CPS VM thread"). See JENKINS-71692.
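          To check whether a controller has hit this class of leak, you can search its log for that message. The sketch below is plain Groovy, run with the `groovy` command and using no Jenkins APIs. The default log path is an assumption, so pass the real location as the first argument. Matching only the message text covers both the single-line format quoted above and the older two-line java.util.logging format.

```groovy
// Usage (hypothetical script name): groovy scan-cps-log.groovy /path/to/jenkins.log
def logFile = new File(args.length > 0 ? args[0] : '/var/log/jenkins/jenkins.log')
def marker = 'Unexpected exception in CPS VM thread'
int hits = 0

// With a two-argument closure, eachLine passes the line and its 1-based number.
logFile.eachLine { line, lineNumber ->
    if (line.contains(marker)) {
        println "${lineNumber}: ${line}"
        hits++
    }
}
println "${hits} matching line(s) in ${logFile}"
```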


            Assignee: Unassigned
            Reporter: Alexander Moiseenko (brainsam)
            Votes: 3
            Watchers: 23