JENKINS-30271

Jenkins job hangs and cannot be killed after aborting DSL job

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component: job-dsl-plugin
    • None

      Running job-dsl-plugin off the master branch (78ae0c67c3d0b3ce630cca0987c5fba869c8dfb1) plus 3 of my outstanding pull requests folded in. I don't think this issue is related to my changes (which add new DSL configuration options for plugins).

      Steps to reproduce:

      1. Run a job with a Process DSL Script task (ideally long-running)
      2. Abort job while it's executing. Job will not stop, but will finish normally.
      3. Run the job again. Job will hang and cannot be aborted.

      Interestingly, Thread.interrupt() from the script console will not kill it. Using Monitoring plugin's kill feature does kill the job, but the next run will still hang. Only a restart of the Jenkins master fixes the problem.

      UPDATE: also happens without aborting jobs, making this more than minor. BTW, I am running this on two masters and it happened only once on one and a lot on the other.

      Stacktrace for the stuck job:

      Executor #1 for master : executing DSL Job Builder #108
      sun.misc.Unsafe.park(Native Method)
      java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      org.codehaus.groovy.util.LockableObject.lock(LockableObject.java:34)
      org.codehaus.groovy.reflection.ClassInfo.lock(ClassInfo.java:268)
      org.codehaus.groovy.reflection.ClassInfo.getMetaClass(ClassInfo.java:193)
      org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.getMetaClass(MetaClassRegistryImpl.java:231)
      org.codehaus.groovy.runtime.InvokerHelper.getMetaClass(InvokerHelper.java:747)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.createPojoSite(CallSiteArray.java:109)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallSite(CallSiteArray.java:150)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.checkParameterName(BuildParametersContext.groovy:270)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.simpleParam(BuildParametersContext.groovy:187)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.this$3$simpleParam(BuildParametersContext.groovy)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext$this$3$simpleParam.callCurrent(Unknown Source)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext$this$3$simpleParam.callCurrent(Unknown Source)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.stringParam(BuildParametersContext.groovy:179)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.stringParam(BuildParametersContext.groovy)
      sun.reflect.GeneratedMethodAccessor459.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11_closure20.doCall(PipelineJobsBuilder.groovy:634)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.invoke(Unknown Source)
      org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:271)
      org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callSafe(AbstractCallSite.java:82)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11.doCall(PipelineJobsBuilder.groovy:633)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11.doCall(PipelineJobsBuilder.groovy)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:54)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
      javaposse.jobdsl.dsl.ContextHelper.executeInContext(ContextHelper.groovy:14)
      javaposse.jobdsl.dsl.ContextHelper$executeInContext.call(Unknown Source)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      javaposse.jobdsl.dsl.ContextHelper$executeInContext.call(Unknown Source)
      javaposse.jobdsl.dsl.Job.parameters(Job.groovy:468)
      sun.reflect.GeneratedMethodAccessor626.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10.doCall(PipelineJobsBuilder.groovy:632)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.with(DefaultGroovyMethods.java:196)
      org.codehaus.groovy.runtime.dgm$926.invoke(Unknown Source)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoMetaMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:313)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:64)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:69)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      javaposse.jobdsl.dsl.JobParent.processJob(JobParent.groovy:108)
      sun.reflect.GeneratedMethodAccessor559.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:149)
      javaposse.jobdsl.dsl.JobParent.freeStyleJob(JobParent.groovy:42)
      sun.reflect.GeneratedMethodAccessor637.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnCurrentN(ScriptBytecodeAdapter.java:78)
      PipelineJobsBuilder$_run_closure1_closure9.doCall(PipelineJobsBuilder.groovy:615)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.doMethodInvoke(Unknown Source)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      PipelineJobsBuilder$_run_closure1.doCall(PipelineJobsBuilder.groovy:589)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.doMethodInvoke(Unknown Source)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      PipelineJobsBuilder.run(PipelineJobsBuilder.groovy:586)
      javaposse.jobdsl.dsl.DslScriptLoader.runDslEngineForParent(DslScriptLoader.java:80)
      javaposse.jobdsl.dsl.DslScriptLoader.runDslEngine(DslScriptLoader.java:123)
      javaposse.jobdsl.plugin.ExecuteDslScripts.perform(ExecuteDslScripts.java:216)
      hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
      hudson.model.Build$BuildExecution.build(Build.java:203)
      hudson.model.Build$BuildExecution.doRun(Build.java:160)
      hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:536)
      hudson.model.Run.execute(Run.java:1741)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      hudson.model.ResourceController.execute(ResourceController.java:98)
      hudson.model.Executor.run(Executor.java:374)
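
A note on why Thread.interrupt() has no visible effect here: the top frames show the executor parked inside AbstractQueuedSynchronizer.acquire(), which is the uninterruptible acquire path (an interrupt is recorded, but the thread goes back to waiting for the lock). The sketch below illustrates that behaviour with a plain java.util.concurrent ReentrantLock standing in for Groovy's LockableObject. The class and method names are hypothetical, and this is not code from Groovy or from the plugin.

```java
import java.util.concurrent.locks.ReentrantLock;

class UninterruptibleLockDemo {

    /**
     * Blocks a helper thread on a lock held by the caller, interrupts it, and
     * reports whether it is still waiting half a second later.
     */
    static boolean stillWaitingAfterInterrupt(boolean interruptibly) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the caller plays the owner that never releases the lock
        Thread waiter = new Thread(() -> {
            try {
                if (interruptibly) {
                    lock.lockInterruptibly(); // throws InterruptedException when interrupted
                } else {
                    lock.lock();              // AbstractQueuedSynchronizer.acquire(): keeps waiting
                }
                lock.unlock();
            } catch (InterruptedException e) {
                // interruptible path: give up waiting and let the thread end
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();
        try {
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until the helper is parked on the lock
            }
            waiter.interrupt();
            waiter.join(500);
            return waiter.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlock(); // release so the helper can finish either way
        }
    }

    public static void main(String[] args) {
        System.out.println("lock(): still waiting = " + stillWaitingAfterInterrupt(false));
        System.out.println("lockInterruptibly(): still waiting = " + stillWaitingAfterInterrupt(true));
    }
}
```

With lock() the helper should still be waiting after the interrupt, while with lockInterruptibly() it should have ended. That matches the report: aborting the build interrupts the executor thread, but a thread parked in acquire() only proceeds once the lock is actually released.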

        1. Thread Dump.txt (51 kB)
        2. config.xml (5 kB)
        3. Stuck_seedjob_stacktrace.txt (20 kB)
        4. job-dsl.hpi (2.33 MB)

          Alexander Komarov added a comment - - edited

          This may or may not be the plugin's fault.

          Once this happens, the following code is enough to hang a job:

          def testvar = [:]
          def itemname = 'something'
          println "I am trying to print a nonexistent item: ${testvar[itemname]}"
          

          This code runs fine on one of my Jenkins masters, but hangs on the other. In fact, the hash element doesn't even need to be non-existent. However, since this only occurs after a run of the DSL plugin, it may still be an issue with job-dsl-plugin.

          I also made a job with two steps:

          1. Execute Groovy (using Groovy 2.4.3)
          2. Process Job DSL
          I used the code above in both steps. Step 1 worked and step 2 hung.

          Alexander Komarov added a comment - - edited

          For the record, this is a very ugly way to "fix" this particular issue without restarting Jenkins.
          I'm still not sure if this is caused by this plugin, by Jenkins, or by Groovy itself.

          (Jenkins script console)

          import org.codehaus.groovy.reflection.*
          import org.codehaus.groovy.util.*
          import java.util.concurrent.locks.*
            
          ClassInfo.class.getDeclaredField("lock").setAccessible(true);
          LockableObject.class.getDeclaredField('owner').setAccessible(true);
          AbstractQueuedSynchronizer.class.getDeclaredField('state').setAccessible(true);
          
          LockableObject lock = ClassInfo.getClassInfo(LinkedHashMap.class).lock;
          
          println "Unlocking lock held by ${lock.owner} in state ${lock.state}"
          
          lock.owner = null
          lock.state = 0
          

          Since I don't know exactly what's causing this, I'm not sure what the long-term effects of this hack will be.

          Daniel Spilker added a comment -

          The attached HPI file contains a patch build from this pull request: https://github.com/jenkinsci/job-dsl-plugin/pull/604

          Can you test the HPI and report if it fixed your problem?

          Adé Mochtar added a comment -

          I'm running into the same problem. It started when upgrading from 1.34 to 1.38.

          I've also tried the patched plugin, but that doesn't fix the problem.

          Daniel Spilker added a comment -

          Can you take a thread dump from Jenkins (e.g. http://localhost:8080/threadDump) directly after hitting the abort button and post it here? That would help to see where the thread is stuck.
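
Besides showing where a thread is stuck, a thread dump can sometimes also say which thread holds the lock it is waiting for. The following is a minimal sketch using the standard java.lang.management API, again with a ReentrantLock as a stand-in. The class and method names are hypothetical. One assumption to flag: the unlock script earlier in this thread sets an `owner` field on Groovy's LockableObject, which suggests that class tracks its owner itself, so the JVM may report no owner for that particular lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantLock;

class LockOwnerReport {

    /** Returns the name of the thread owning the lock that {@code thread} waits for, or null if unknown. */
    static String lockOwnerOf(Thread thread) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        ThreadInfo info = bean.getThreadInfo(thread.getId());
        return info == null ? null : info.getLockOwnerName();
    }

    /** Hypothetical demo: block a helper on a lock held by the caller, then ask the JVM who owns it. */
    static String demo() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        Thread waiter = new Thread(() -> {
            lock.lock();
            lock.unlock();
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();
        try {
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until the helper is parked on the lock
            }
            return lockOwnerOf(waiter);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlock(); // let the helper finish
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter is blocked by: " + demo());
    }
}
```

If the owner comes back as null for a stuck executor, the stack traces of the other threads in the dump are the remaining place to look for whoever last took the lock.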

          Adé Mochtar added a comment - - edited

          I have attached the stacktrace for the stuck job. I didn't abort the job though, it got stuck on its own.

          Adé Mochtar added a comment -

          It looks like it has something to do with the downstreamParameterized configuration, which changed in 1.38. I haven't seen any stuck jobs since reverting the config to the pre-1.38 format.

          Alexander Komarov added a comment - - edited

          I can confirm that interrupting a job is not necessary. After a restart of Jenkins, I have not interrupted any jobs, and the issue occurred on almost every run of the DSL job. Additionally, it seems to affect unrelated Build Flow jobs in the same way (the LinkedHashMap ClassInfo is locked by a DSL job run, and Build Flow wants to lock it), although not as frequently. For now, I have the unlock code I mentioned above as part of my DSL script.

          Daniel Spilker added a comment -

          amochtar: Please post a DSL script that reproduces the problem.

          akom: You should remove that unlock code and try again. It's not a good idea to mess with the runtime internals.

          Alexander Komarov added a comment -

          I have had to run the unlock code at least 20 times today (prior to giving up and sticking it into the Groovy script). What makes you think that the 21st time will be different?

          Adé Mochtar added a comment -

          daspilker: This is the diff going back from 1.38 to 1.37:

                 publishers {
                   downstreamParameterized{
          -          trigger("${repository}_${branchSanitized}_deploy") {
          -            condition('SUCCESS')
          -            parameters {
          -              predefinedProp('GIT_URL', "https://bitbucket.org/${organisation}/${repository}")
          -              predefinedProp('GIT_COMMIT', '${GIT_COMMIT}')
          -              currentBuild()
          -            }
          +          trigger("${repository}_${branchSanitized}_deploy", 'SUCCESS') {
          +            predefinedProp('GIT_URL', "https://bitbucket.org/${organisation}/${repository}")
          +            predefinedProp('GIT_COMMIT', '${GIT_COMMIT}')
          +            currentBuild()
                     }
                   }
                 }
          

          Daniel Spilker added a comment -

          akom: The unlock code should not be necessary. The DSL build step running with the patched HPI should abort immediately when the job is aborted, so nothing should hang. If the job does not abort immediately, either there is a problem with the patch or the thread is stuck in script code. In that case I need a thread dump taken after aborting the job to see where the job hangs. But the patched HPI must run in an untouched runtime.

          amochtar: Please post a complete, runnable script that reproduces the problem.

          Alexander Komarov added a comment - - edited

          daspilker, I cannot use the hpi you attached - I'm only seeing this problem in our production env where I rely on a version that includes my three pull requests. I can use it in my test environment but I've only seen this happen there once in a few weeks, so it won't be much of a test.

          Additionally, I'm not aborting DSL builds. As I mentioned, they hang just fine without my help.

          Adé Mochtar added a comment - - edited

          daspilker: I've attached the config.xml for the seed job, containing the script that gets stuck. I'm using the latest Jenkins Docker container to run this: https://hub.docker.com/_/jenkins/

          It got stuck on its own on the 3rd run.

          Adé Mochtar added a comment -

          Attached the corresponding thread dump as well

          Daniel Spilker added a comment -

          OK, so the primary problem is that the job hangs. That the job can't be aborted is just a downstream problem.

          I ran the script provided by amochtar more than 3000 times and it hung three times in my test environment. I never experienced the problem in production, although we have quite complex scripts.

          I assume that it's somehow related to GROOVY-5249. Unfortunately, Jenkins uses an ancient version of Groovy, so we have to wait for JENKINS-21249 to get the fix.

          It could also be a problem in the java.util.concurrent package in certain JDKs. We are running Oracle JDK 1.7.0_85-b15 in production, which never showed the problem. In my test environment I used OpenJDK 1.7.0_79-b14, and it reproduced the problem in 0.1% of all runs.

          Haris Shahid added a comment -

          I came here and created an account just to log this exact same bug. I can attest this is definitely a bug and it occurs on 2 different Jenkins environments (very beefy multi-node environments provisioned through Amazon Cloud) that my DevOps team manages on the project that I am currently working on.

          Every time, I have to restart Jenkins to fix the issue, and then it occurs randomly after a few successful executions. I have used the Job DSL plugin very extensively on my past two projects, have been an avid user since its inception, and am very familiar with the syntax and how the plugin works. So I assure you it has nothing to do with how I am using it or the code that is being executed.

          This is definitely a bug and I would love for someone to fix it as it is hindering our ability to use an otherwise AWESOME! plugin.

          Thanks a lot for creating this plugin and continuing to maintain and enhance it at such a rapid pace. Also thank you in advance for fixing this bug ASAP.

          Kenny Moens added a comment -

          We faced the same issues on our Jenkins instance. We had been using the JobDSL plugin for 2 years and generate 200+ jobs with it, all that time without trouble.

          Since our latest upgrade from 1.34 to 1.38 we face the same problems. Initially we applied the migration guide, and we ran into the problem on almost every build of the JobDSL. After we reverted the change to the parameterizedTrigger definition, as suggested by Adé, we almost never run into the problem anymore. It now occurs only once every 20-30 builds.

          Currently we have to resort to some dirty hacks to avoid restarting our Jenkins server, such as stopping the thread and clearing the lock as outlined by Alexander.
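
          A note on why a thread stuck on that lock may not respond to an abort: the stack trace in the issue description ends in AbstractQueuedSynchronizer.acquire, which is the non-interruptible acquire path. The sketch below is only an illustration of that behaviour using java.util.concurrent.locks.ReentrantLock as a stand-in; it is not Groovy's LockableObject or any Jenkins code, and the class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

public class UninterruptibleLockDemo {

    /**
     * Returns true if a thread waiting in lock() is still alive
     * shortly after it has been interrupted.
     */
    static boolean staysBlockedAfterInterrupt() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the calling thread plays the role of the lock holder

        Thread waiter = new Thread(() -> {
            lock.lock();   // plain lock(): waits without reacting to interrupts
            lock.unlock();
        }, "waiter");
        waiter.start();

        try {
            waiter.interrupt();   // comparable to Thread.interrupt() from a console
            waiter.join(200);     // give the waiter time to react, if it were going to
            return waiter.isAlive();
        } finally {
            lock.unlock();        // only releasing the lock lets the waiter proceed
            waiter.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("still blocked after interrupt: " + staysBlockedAfterInterrupt());
    }
}
```

          In this sketch the waiter only finishes once the lock is released, which is consistent with reports in this thread that interrupting the executor thread does not help while the lock stays held.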

          Daniel Spilker added a comment -

          Please report the JDK that you are using to run Jenkins master. My guess is that the issue is somehow related to the JDK version.
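
          For anyone unsure how to collect that information, here is a minimal sketch that prints the version details of the JVM it runs in, using only standard system properties. The class name JvmInfo is hypothetical, and it has to be executed by the same JVM that runs the Jenkins master for the output to be meaningful.

```java
public class JvmInfo {

    /** Builds a one-line summary of the running JVM from standard system properties. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + ", java.vendor=" + System.getProperty("java.vendor")
            + ", java.vm.name=" + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

          The same System.getProperty calls can also be evaluated one at a time wherever code can be run inside the master JVM.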

          Kenny Moens added a comment -

          I'm using JDK8u60 for my master node.


          Daniel Spilker added a comment -

          Jenkins uses Groovy 1.8, but only Groovy 2.3 and later officially support JDK8, see Release notes for Groovy 2.3. Can you try JDK7?

          Kenny Moens added a comment -

          Sure I can, but I will have to try it in our production environment, so it will take some time to evaluate whether it works.

          Haris Shahid added a comment -

          Both Jenkins instances my team has set up use Java 7, so I believe the version of Java is not the culprit. Any more suggestions?

          Sebastian Hoß added a comment -

          We are running into the same problem with latest Jenkins (2.10) and latest job-dsl plugin on JDK8 (U92). The threadDump shows the following:

          "Executor #5 for master : executing generated-jobs/generate-jobs #85" Id=95 Group=main TIMED_WAITING on org.apache.tools.ant.taskdefs.PumpStreamHandler$ThreadWithPumper@1182799
              at java.lang.Object.wait(Native Method)
              -  waiting on org.apache.tools.ant.taskdefs.PumpStreamHandler$ThreadWithPumper@1182799
              at java.lang.Thread.join(Thread.java:1253)
              at org.apache.tools.ant.taskdefs.PumpStreamHandler.finish(PumpStreamHandler.java:188)
              at org.apache.tools.ant.taskdefs.PumpStreamHandler.stop(PumpStreamHandler.java:158)
              at org.apache.tools.ant.taskdefs.Execute.execute(Execute.java:521)
              at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:631)
              at org.apache.tools.ant.taskdefs.ExecuteOn.runParallel(ExecuteOn.java:717)
              at org.apache.tools.ant.taskdefs.ExecuteOn.runExec(ExecuteOn.java:480)
              at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:498)
              at org.apache.tools.ant.taskdefs.Chmod.execute(Chmod.java:181)
              at hudson.Util.makeWritable(Util.java:323)
              at hudson.Util.tryOnceDeleteFile(Util.java:277)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:373)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
              at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
              at hudson.Util.deleteRecursive(Util.java:350)
              at hudson.model.AbstractItem.performDelete(AbstractItem.java:600)
              at hudson.model.Job.performDelete(Job.java:278)
              at org.jenkinsci.plugins.workflow.job.WorkflowJob.performDelete(WorkflowJob.java:580)
              at hudson.model.AbstractItem.delete(AbstractItem.java:589)
              -  locked org.jenkinsci.plugins.workflow.job.WorkflowJob@6c3a1409
              at hudson.model.Job.delete(Job.java:688)
              at javaposse.jobdsl.plugin.ExecuteDslScripts.updateGeneratedJobs(ExecuteDslScripts.java:329)
              at javaposse.jobdsl.plugin.ExecuteDslScripts.perform(ExecuteDslScripts.java:222)
              at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
              at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
              at hudson.model.Build$BuildExecution.build(Build.java:205)
              at hudson.model.Build$BuildExecution.doRun(Build.java:162)
              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
              at hudson.model.Run.execute(Run.java:1720)
              at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)

          Once we see that, the job cannot be aborted anymore; restarting Jenkins is the only way to stop it.
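
          For comparing dumps like the one above across reports, a rough sketch of how waiting threads can be listed from inside a JVM with the standard java.lang.management API follows. The class name WaitingThreads and the output format are made up for illustration; this is not how Jenkins produces its threadDump page.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

public class WaitingThreads {

    /** Returns one summary line per live thread that is WAITING or TIMED_WAITING. */
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        // false, false: skip monitor and synchronizer details, which keeps the call cheap
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack)";
            String lock = info.getLockName(); // null when the thread is not waiting on an object
            lines.add("\"" + info.getThreadName() + "\" " + state
                + (lock != null ? " on " + lock : "") + " at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize()) {
            System.out.println(line);
        }
    }
}
```

          The thread name, state, and topmost frame are usually enough to tell whether two reports show the same hang or different ones.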

          Daniel Spilker added a comment -

          sebhoss, that is a different problem with a different stack trace.

          Can someone reproduce the problem on Jenkins 2.0? If not, I'm going to close this ticket.

          Roy Tinker added a comment -

          I'm having the exact problem and stack trace Sebastian pasted. Freestyle builds are hanging. This is Jenkins 2.53 on JDK7.


          Alexander Komarov added a comment -

          daspilker, I have not reproduced the problem in 2.x, but mostly because I'm too scared to try to kill a DSL run; I even have a red notice against that in the description. It's quite possible that it's fixed, because I haven't had it happen without aborting anymore.

          Daniel Spilker added a comment -

          Groovy has been updated from 2.4.8 to 2.4.11 in Jenkins 2.61. The issue should be fixed, see https://issues.apache.org/jira/browse/GROOVY-8067. Can anyone reproduce the problem in 2.61 or later? If not, I'm going to close this ticket.

          Daniel Spilker added a comment -

          I'm closing the issue because the problem does not occur with newer versions of Jenkins and Job DSL. Please re-open if you can reproduce the problem.

            daspilker Daniel Spilker
            akom Alexander Komarov
            Votes: 6
            Watchers: 13