JENKINS-51568

Pipeline jobs keep occupying a build executor even after the build has finished

      Description

      We have a huge Jenkins instance that runs about 20k builds a day.

      After a couple of days without restarting the Jenkins master, Pipeline jobs start to keep their executors occupied after the build has finished. Freestyle and Maven jobs work fine.
      A busy executor looks like this (see the attachment stuck-executor.png):

      But the build status is "finished" (see the attachment finished-build.PNG):

      jenkins.log contains records of the build starting and finishing, but the executor was not released when the build finished at May 28, 2018 4:35:14 PM:

      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 5,770ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275137998231310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      2018/05/28 05:07:798 - job/KKA/job/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/ #4533 Started by timer
      May 28, 2018 4:28:07 PM org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$4$1 load
      WARNING: took 6,783ms to load/not load groovy.lang.GroovyObject$groovy$util$script15275138109721310805619$SSH_LOGIN from classLoader hudson.PluginManager$UberClassLoader
      
      ...
      
      WARNING: Owner[PPRBTEAM/archive/Deimos/deimos-module-list-wf/23333:PPRBTEAM/archive/Deimos/deimos-module-list-wf #23333] was not in the list to begin with: [Owner[GringoTesting/Try/1:GringoTesting/Try #1], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/211:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #211], Owner[AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated/275:AutoTransaction/AutoTransaction_release_major-2018-04-30_deprecated #275], Owner[DataFactory/AHD/Pipeline_dev/49:DataFactory/AHD/Pipeline_dev #49], Owner[CSUO/PipelineBadCode/19:CSUO/PipelineBadCode #19], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/616:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #616], Owner[KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0/7:KKMB/bundle-barriers/barrier-sbof-commondealinf-r1.28.0 #7], Owner[ESB/FS/FS_CI_RFC_PR_8/3060:ESB/FS/FS_CI_RFC_PR_8 #3060], Owner[TFS/TestJobs/integrationOnly/809:TFS/TestJobs/integrationOnly #809], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/265:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #265], Owner[MBP/mbp-ci/123:MBP/mbp-ci #123], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/338:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev #338], Owner[Kalita/tmp/test_pr/46:Kalita/tmp/test_pr #46], Owner[Kalita/tmp/test_pr/47:Kalita/tmp/test_pr #47], Owner[ASKOO/koo-release-build/feature%2F513/8:ASKOO/koo-release-build/feature%2F513 #8], Owner[ASKOO/koo-release-build/develop/117:ASKOO/koo-release-build/develop #117], Owner[ASKOO/koo-release-build/feature%2F513/9:ASKOO/koo-release-build/feature%2F513 #9], Owner[ASKOO/koo-release-build/develop/118:ASKOO/koo-release-build/develop #118], Owner[GBK/QG_check_minor250518/205:GBK/QG_check_minor250518 #205], Owner[GBK/QG_Check_dev/345:GBK/QG_Check_dev #345], Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev/339:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend1_dev 
#339], Owner[GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE/363:GateWayDP/Gateways/Gateway_ESBGW_CI_PIPELINE #363], Owner[DataFactory/stork/Build_PullRequest/524:DataFactory/stork/Build_PullRequest #524], Owner[KKA/KKA_PIPE_CI/352:KKA/KKA_PIPE_CI #352], Owner[KKA/KKA_PIPE_DEPLOY/154:KKA/KKA_PIPE_DEPLOY #154], Owner[GBK/QG_check_minor250518/206:GBK/QG_check_minor250518 #206], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/57:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #57], Owner[PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY/33:PRCRED/CBIR/AUTODEPLOY_PIPE_KENNY #33], Owner[Kalita/Regular_clt_dev_builds/295:Kalita/Regular_clt_dev_builds #295], Owner[DataFactory/stork/Build_PullRequest/525:DataFactory/stork/Build_PullRequest #525], Owner[Kalita/tmp/test_pr/48:Kalita/tmp/test_pr #48], Owner[AEP/AEP_QG/806:AEP/AEP_QG #806], Owner[SNUiL/DevOps/CI-Builds/CI-Build-PIR-29/549:SNUiL/DevOps/CI-Builds/CI-Build-PIR-29 #549], Owner[GBK/QG_Check_dev/346:GBK/QG_Check_dev #346], Owner[GBK/QG_check_minor250518/207:GBK/QG_check_minor250518 #207], Owner[ECOD/DEVELOP/NexusArtifactFlag/1:ECOD/DEVELOP/NexusArtifactFlag #1], Owner[Kalita/Parallel_Pipeline_2/1003:Kalita/Parallel_Pipeline_2 #1003], Owner[HPSM/HPSM_pipeline/403:HPSM/HPSM_pipeline #403], Owner[ESB_CF/IIB9_PIPELINE/3751:ESB_CF/IIB9_PIPELINE #3751], Owner[DataFactory/stork/Build_PullRequest/526:DataFactory/stork/Build_PullRequest #526], Owner[PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG/58:PPRB_DepositCashOperations/card/BuildDistr_Develop_Nexus_Prod_QG #58], Owner[DevOps/AHD/sleep-test/2:DevOps/AHD/sleep-test #2], Owner[ESB_SMP/FullBuild/484:ESB_SMP/FullBuild #484], Owner[DataFactory/stork/Build_PullRequest/527:DataFactory/stork/Build_PullRequest #527], Owner[SBK/SMARTREGRESS_TEST_IFT1/4:SBK/SMARTREGRESS_TEST_IFT1 #4], Owner[impprb/checkQG/249:impprb/checkQG #249], Owner[DevOps/AHD/sleep-test/4:DevOps/AHD/sleep-test #4], 
Owner[TFS/PRBuilders/BuilderPRForCore/903:TFS/PRBuilders/BuilderPRForCore #903], Owner[HPSM/HPSM_pipeline/404:HPSM/HPSM_pipeline #404], Owner[FCCM8/regular_release/41:FCCM8/regular_release #41], Owner[ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT/380:ESB/DevOps/Other/AUTOTESTS/DO_PR_INIT #380], Owner[PPRB_DevOps/KBT/Install_KBT(DEV)_ear/268:PPRB_DevOps/KBT/Install_KBT(DEV)_ear #268], Owner[DataFactory/stork/Build_PullRequest/528:DataFactory/stork/Build_PullRequest #528], Owner[SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git/389:SNUiL/DevOps/CI-Deploy/CI-Deploy-to-testing-from-git #389], Owner[ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job/41:ASCC/ascc_full_RELEASE/02.013.00_STG-19937_jenkins_release_job #41], Owner[ASKOO/koo-release-build/feature%2F253/29:ASKOO/koo-release-build/feature%2F253 #29], Owner[GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE/334:GateWayDP/Gateways/Gateway_EDOGO_CI_PIPELINE #334], Owner[DEPOZITORY/PB/69:DEPOZITORY/PB #69], Owner[DataFactory/stork/Auto_Test_DEV/2115:DataFactory/stork/Auto_Test_DEV #2115], Owner[GBK/QG_check_minor250518/208:GBK/QG_check_minor250518 #208], Owner[TDS/GREEN_AN_GREEN/87:TDS/GREEN_AN_GREEN #87], Owner[PPRB_DepositCashOperations/common/Publish_to_IFT_universal/320:PPRB_DepositCashOperations/common/Publish_to_IFT_universal #320], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9156:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9156], Owner[ASCC/server1/02.013.00/1496:ASCC/server1/02.013.00 #1496], Owner[ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job/37:ASCC/ascc_server_branch_build/02.013.00_STG-19937_jenkins_release_job #37], Owner[ESB/FS/Meshkov/FS_CI_RFC_tst/763:ESB/FS/Meshkov/FS_CI_RFC_tst #763], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9157:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9157], Owner[mmt/DEV/1051:mmt/DEV #1051], 
Owner[ASKOO/koo-release-build/support%2F02.021/18:ASKOO/koo-release-build/support%2F02.021 #18], Owner[Kalita/Parallel_Pipeline_2/1005:Kalita/Parallel_Pipeline_2 #1005], Owner[DataFactory/stork/Build_PullRequest/529:DataFactory/stork/Build_PullRequest #529], Owner[SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER/249:SNUiL/DevOps/CI-Builds/CI-Build-SNUILDEV-3794-COURIER #249], Owner[ASBS/buildByPipeline/532:ASBS/buildByPipeline #532], Owner[ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders/14:ASCC/ascc_server_branch_build/02.014.00_STG-18499_CompositeOutCashOrders #14], Owner[adpSWIFT/adpSWIFT_PIPELINE/3102:adpSWIFT/adpSWIFT_PIPELINE #3102], Owner[ASCC/server1/02.014.00/407:ASCC/server1/02.014.00 #407], Owner[ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall/33:ESB/DevOps/Dev/ESB_KF_CI00223537/PartialESBInstall #33], Owner[PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe/9160:PPRB_DevOps/Quality_Gate_pipes/Universal_Quality_Gate_pipe #9160], Owner[ESB/FS/FS_PR_INIT/24747:ESB/FS/FS_PR_INIT #24747], Owner[PPRB.OIP/kbt-scripts/ucp-corp/359:PPRB.OIP/kbt-scripts/ucp-corp #359], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/733:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #733], Owner[edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2/38:edosgo/elgo-mvd-clientverify/elgo-mvd-clientverify2 #38], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/734:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #734], Owner[MBP/mbp-ci/140:MBP/mbp-ci #140], Owner[PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order/50:PPRBDOC/upload_to_pipe/deprecated/QualityGateOnOurPipe/QualityGate-Order #50], Owner[ESB/FS/FS_CI_RFC/3478:ESB/FS/FS_CI_RFC #3478], Owner[DataFactory/stork/Build_required_distrib/122:DataFactory/stork/Build_required_distrib #122], Owner[TDS/Update_stand_by_url/920:TDS/Update_stand_by_url #920], Owner[CBDBO/Pipeline/2871:CBDBO/Pipeline #2871], 
Owner[PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27/315:PRCRED/EKB3/loans-for-persons-ekb3-pipeline___force-WF-and-envelope_Stend2_major-2-2018-05-27 #315], Owner[edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release/193:edosgo/elgo-fns-clientverify/elgo-fns-clientverify-release #193], Owner[DepositPfETL/deposit-client-validation-pipeline-parameters/19:DepositPfETL/deposit-client-validation-pipeline-parameters #19], Owner[mmt/TEST/1:mmt/TEST #1], Owner[edosgo/elgo-remote-starter/323:edosgo/elgo-remote-starter #323], Owner[ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup/302:ESB/DevOps/Other/AUTOTESTS/PartialESBRestoreExGroup #302], Owner[PPRB_DevOps/Install_EIP/Install_EIP_2_cli/735:PPRB_DevOps/Install_EIP/Install_EIP_2_cli #735], Owner[PPRBTEAM/Gera/gera-autodeploy-wf/116:PPRBTEAM/Gera/gera-autodeploy-wf #116], Owner[EKPiT/deploy-db-dev1/100:EKPiT/deploy-db-dev1 #100], Owner[TDS/PR_pipeline/903:TDS/PR_pipeline #903], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline/14:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-pipeline #14], Owner[ESB/FS/FS_CI_INIT/3648:ESB/FS/FS_CI_INIT #3648], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS/4907:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS #4907], Owner[edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy/21:edosgo/elgo-msh-reestrcontract/elgo-msh-reestrcontract-dev-barrier-deploy #21], Owner[Tengri/HDPLocalCiInsallation/151:Tengri/HDPLocalCiInsallation #151], Owner[mgr/API/PUBLISH_ALL/68435:mgr/API/PUBLISH_ALL #68435], Owner[EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline/6405:EPS/Main_Pre_Build_Distr_DevOps2018_Pipeline #6405], Owner[KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG/4533:KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533], Owner[PPRBCPRB/Pipeline_Server_Build_For_Dev_Server/10969:PPRBCPRB/Pipeline_Server_Build_For_Dev_Server #10969], Owner[PPRB_CEP/PSI_TEST_Pipe/7763:PPRB_CEP/PSI_TEST_Pipe #7763]]
      ...
      
      May 28, 2018 4:35:14 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
      INFO: KKA/TRIGGER_JOB_NEW_BUILD_IN_NEXUS_FLAG #4533 completed: SUCCESS
      
      ....

      As a result, the build queue grows because all available executors are busy.

      A workaround that postpones the restart: periodically run a script that frees the stuck executors:

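      import hudson.model.*
      import jenkins.model.Jenkins

      // `manager` is not defined in this script; it is presumably the binding provided by
      // the Groovy Postbuild plugin, so the script has to run where that binding exists.
      def log = manager.listener.logger

      // Copy the node list without null entries (the list returned by Jenkins may be read-only).
      def nodes = Jenkins.instance.nodes.findAll { it != null }

      nodes.each { node ->
          log.println("-------PROCESSING NODE: $node.displayName -------------------")
          def computer = node.toComputer()
          if (computer == null) {
              // A node without a computer is treated as broken and removed.
              log.println("------- WARNING: $node.displayName: NULL. Removing! -------------------")
              Jenkins.instance.removeNode(node)
              return  // continue with the next node
          }

          computer.getExecutors().each { executor ->
              // Busy with unknown progress (-1) is the heuristic for a stuck executor.
              if (executor.busy && executor.progress == -1) {
                  log.println("JOB $executor.name LOOKS LIKE STUCK. KILLING.")
                  // Detach the executor from its computer to free the slot.
                  executor.owner.removeExecutor(executor)
              }
          }
      }

      return null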
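
Before removing anything, it may be worth doing a dry run. The following is a minimal sketch, not part of the original report. It assumes it is run from the Jenkins Script Console, where `println` output is shown on the page. It applies the same busy-with-unknown-progress heuristic as the script above but only reports candidates. Including the one-off executors is an addition of this sketch, on the assumption that Pipeline runs may also hold those.

```groovy
import jenkins.model.Jenkins

// Read-only check: list executors that match the "stuck" heuristic, remove nothing.
Jenkins.instance.computers.each { computer ->
    // Regular executors plus one-off (flyweight) executors.
    // Assumption of this sketch: the one-off ones are worth inspecting too.
    def all = computer.executors + computer.oneOffExecutors
    all.each { executor ->
        // Same condition as the workaround: busy, but progress is unknown (-1).
        if (executor.busy && executor.progress == -1) {
            println "${computer.displayName} / ${executor.name}: " +
                    "${executor.currentExecutable} (elapsed ${executor.elapsedTime} ms)"
        }
    }
}
return null
```

A progress of -1 can also occur for a healthy build that has no duration estimate yet, so the output is best treated as a list of candidates to compare against the build status, not as proof that an executor is stuck.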
      
       

        Attachments

        1. stuck-executor.png (4 kB)
        2. finished-build.PNG (66 kB)
        3. build-pipeline-steps.PNG (44 kB)
        4. sleep-test-script.PNG (12 kB)
        5. thread_dump.html (490 kB)
        6. image-2018-11-27-11-31-06-881.png (462 kB)
        7. image-2018-11-27-11-32-58-457.png (230 kB)
        8. BusyTimers.png (37 kB)
        9. NormalCase.PNG (28 kB)

            Activity

            davidvanlaatum David van Laatum added a comment -

            That seems to have fixed it for us too. Jenkins has been stable since I disabled the log file size check on all builds.

            rag1 Raphael Greger added a comment -

            Hi David

            I have exactly the same problem. What was your workaround? Where did you configure the log size check?

            davidvanlaatum David van Laatum added a comment -

            In the job config there is an option "Abort the build if its log file size is too big". From memory, I used the configuration slicer plugin to remove it from all jobs.

            rag1 Raphael Greger added a comment -

            Thank you, David. I couldn't find this option, but I put some restrictions on the log of job config history. Now the problem seems to have vanished.

            mengfeil li added a comment -

            It looks like the issue still exists in 2.289.1. Is there any update?


              People

              Assignee: Unassigned
              Reporter: brainsam Alexander Moiseenko
              Votes: 2
              Watchers: 20
