Jenkins / JENKINS-30271

Jenkins job hangs and cannot be killed after aborting DSL job


    Description

      Running job-dsl-plugin off the master branch (78ae0c67c3d0b3ce630cca0987c5fba869c8dfb1) with 3 of my outstanding pull requests folded in. I don't think this issue is related to my changes, which add new DSL configuration options for plugins.

      Steps to reproduce:

      1. Run a job with a Process DSL Script task (ideally long-running).
      2. Abort the job while it is executing. The job does not stop, but finishes normally.
      3. Run the job again. The job hangs and cannot be aborted.

      Interestingly, Thread.interrupt() from the script console does not kill it. The Monitoring plugin's kill feature does kill the job, but the next run still hangs. Only a restart of the Jenkins master fixes the problem.

      UPDATE: this also happens without aborting jobs, which makes it more than minor. BTW, I am running this on two masters; it happened only once on one and frequently on the other.

      Stacktrace for the stuck job:

      Executor #1 for master : executing DSL Job Builder #108
      sun.misc.Unsafe.park(Native Method)
      java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      org.codehaus.groovy.util.LockableObject.lock(LockableObject.java:34)
      org.codehaus.groovy.reflection.ClassInfo.lock(ClassInfo.java:268)
      org.codehaus.groovy.reflection.ClassInfo.getMetaClass(ClassInfo.java:193)
      org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.getMetaClass(MetaClassRegistryImpl.java:231)
      org.codehaus.groovy.runtime.InvokerHelper.getMetaClass(InvokerHelper.java:747)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.createPojoSite(CallSiteArray.java:109)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallSite(CallSiteArray.java:150)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.checkParameterName(BuildParametersContext.groovy:270)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.simpleParam(BuildParametersContext.groovy:187)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.this$3$simpleParam(BuildParametersContext.groovy)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext$this$3$simpleParam.callCurrent(Unknown Source)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext$this$3$simpleParam.callCurrent(Unknown Source)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.stringParam(BuildParametersContext.groovy:179)
      javaposse.jobdsl.dsl.helpers.BuildParametersContext.stringParam(BuildParametersContext.groovy)
      sun.reflect.GeneratedMethodAccessor459.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11_closure20.doCall(PipelineJobsBuilder.groovy:634)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.invoke(Unknown Source)
      org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:271)
      org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callSafe(AbstractCallSite.java:82)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11.doCall(PipelineJobsBuilder.groovy:633)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10_closure11.doCall(PipelineJobsBuilder.groovy)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:54)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
      javaposse.jobdsl.dsl.ContextHelper.executeInContext(ContextHelper.groovy:14)
      javaposse.jobdsl.dsl.ContextHelper$executeInContext.call(Unknown Source)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      javaposse.jobdsl.dsl.ContextHelper$executeInContext.call(Unknown Source)
      javaposse.jobdsl.dsl.Job.parameters(Job.groovy:468)
      sun.reflect.GeneratedMethodAccessor626.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
      PipelineJobsBuilder$_run_closure1_closure9_closure10.doCall(PipelineJobsBuilder.groovy:632)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.with(DefaultGroovyMethods.java:196)
      org.codehaus.groovy.runtime.dgm$926.invoke(Unknown Source)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoMetaMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:313)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:64)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:69)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      javaposse.jobdsl.dsl.JobParent.processJob(JobParent.groovy:108)
      sun.reflect.GeneratedMethodAccessor559.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
      org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:149)
      javaposse.jobdsl.dsl.JobParent.freeStyleJob(JobParent.groovy:42)
      sun.reflect.GeneratedMethodAccessor637.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361)
      org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnCurrentN(ScriptBytecodeAdapter.java:78)
      PipelineJobsBuilder$_run_closure1_closure9.doCall(PipelineJobsBuilder.groovy:615)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.doMethodInvoke(Unknown Source)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      PipelineJobsBuilder$_run_closure1.doCall(PipelineJobsBuilder.groovy:589)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(Method.java:497)
      org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
      groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
      org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.Closure.call(Closure.java:415)
      groovy.lang.Closure.call(Closure.java:428)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1379)
      org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1351)
      org.codehaus.groovy.runtime.dgm$170.doMethodInvoke(Unknown Source)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
      groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:903)
      groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149)
      org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
      org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
      org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      PipelineJobsBuilder.run(PipelineJobsBuilder.groovy:586)
      javaposse.jobdsl.dsl.DslScriptLoader.runDslEngineForParent(DslScriptLoader.java:80)
      javaposse.jobdsl.dsl.DslScriptLoader.runDslEngine(DslScriptLoader.java:123)
      javaposse.jobdsl.plugin.ExecuteDslScripts.perform(ExecuteDslScripts.java:216)
      hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
      hudson.model.Build$BuildExecution.build(Build.java:203)
      hudson.model.Build$BuildExecution.doRun(Build.java:160)
      hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:536)
      hudson.model.Run.execute(Run.java:1741)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      hudson.model.ResourceController.execute(ResourceController.java:98)
      hudson.model.Executor.run(Executor.java:374)
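The top frames show the executor parked in AbstractQueuedSynchronizer.acquire, called from org.codehaus.groovy.util.LockableObject.lock. That acquire path ignores interrupts: it records the interrupt (parkAndCheckInterrupt) and keeps waiting. This would be consistent with Thread.interrupt() from the script console having no effect. Below is a minimal Java sketch of that behaviour. SimpleLock is a hypothetical stand-in built on the same AQS call, not Groovy's LockableObject source.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class UninterruptibleAcquireDemo {

    // Hypothetical minimal non-reentrant lock on AQS. NOT Groovy's LockableObject;
    // it only reproduces the uninterruptible acquire(1) call seen in the trace.
    static final class SimpleLock extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            return compareAndSetState(0, 1);
        }

        @Override
        protected boolean tryRelease(int ignored) {
            setState(0);
            return true;
        }

        void lock() { acquire(1); }   // ignores interrupts; they are only recorded
        void unlock() { release(1); }
    }

    /** Returns {blockedAfterInterrupt, acquiredAfterRelease, interruptFlagSetAfterAcquire}. */
    static boolean[] run() throws InterruptedException {
        SimpleLock lock = new SimpleLock();
        boolean[] waiterResult = new boolean[2];
        lock.lock(); // the main thread plays the role of the owner that never lets go

        Thread waiter = new Thread(() -> {
            lock.lock();
            waiterResult[0] = true;
            waiterResult[1] = Thread.currentThread().isInterrupted();
            lock.unlock();
        }, "waiter");
        waiter.start();
        Thread.sleep(200);
        waiter.interrupt();          // comparable to Thread.interrupt() from the console
        waiter.join(500);
        boolean blocked = waiter.isAlive();

        lock.unlock();               // only releasing the lock lets the waiter proceed
        waiter.join();               // join() makes waiterResult writes visible here
        return new boolean[] {blocked, waiterResult[0], waiterResult[1]};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("still blocked after interrupt: " + r[0]);
        System.out.println("acquired after release: " + r[1]);
        System.out.println("interrupt flag set once acquired: " + r[2]);
    }
}
```

By contrast, AbstractQueuedSynchronizer.acquireInterruptibly would throw InterruptedException in the waiter. Whether Groovy could use that path here is not something this thread establishes.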

      Attachments

        1. config.xml
          5 kB
        2. job-dsl.hpi
          2.33 MB
        3. Stuck_seedjob_stacktrace.txt
          20 kB
        4. Thread Dump.txt
          51 kB

          Activity

            akom Alexander Komarov added a comment - - edited

            This may or may not be the plugin's fault.

            Once this happens, the following code is enough to hang a job:

            def testvar = [:]
            def itemname = 'something'
            println "I am trying to print a nonexistent item: ${testvar[itemname]}"
            

            This code runs fine on one of my Jenkins masters, but hangs on the other. In fact, the hash element doesn't even need to be non-existent. However, since this only occurs after a run of the DSL plugin, it may still be an issue with job-dsl-plugin.

            I also made a job with two steps:

            1. Execute Groovy (using Groovy 2.4.3)
            2. Process Job DSL

            ...and used the code above in both steps. Step 1 worked and step 2 hung.
            akom Alexander Komarov added a comment - - edited

            For the record, here is a very ugly way to "fix" this particular issue without restarting Jenkins. I'm still not sure whether this is caused by this plugin, by Jenkins, or by Groovy itself.

            (Jenkins script console)

            import org.codehaus.groovy.reflection.*
            import org.codehaus.groovy.util.*
            import java.util.concurrent.locks.*
              
            ClassInfo.class.getDeclaredField("lock").setAccessible(true);
            LockableObject.class.getDeclaredField('owner').setAccessible(true);
            AbstractQueuedSynchronizer.class.getDeclaredField('state').setAccessible(true);
            
            LockableObject lock = ClassInfo.getClassInfo(LinkedHashMap.class).lock;
            
            println "Unlocking lock held by ${lock.owner} in state ${lock.state}"
            
            lock.owner = null
            lock.state = 0
            

            Since I don't know exactly what's causing this, I'm not sure what the long-term effects of this hack will be.
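The script above prints the lock's owner because an AQS-based lock is released only by an explicit unlock. One scenario consistent with "only a restart fixes it" is a lock whose owner thread left without releasing it. In that case nothing in the JVM releases the lock on the owner's behalf. Killing a different, waiting thread (as the Monitoring plugin does) would not free it either. This thread does not confirm that scenario. Below is a minimal Java sketch of it using ReentrantLock; AbandonedLockDemo and InspectableLock are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class AbandonedLockDemo {

    // ReentrantLock.getOwner() is protected, so expose it through a subclass.
    static final class InspectableLock extends ReentrantLock {
        Thread owner() { return getOwner(); }
    }

    /** Returns {stillLocked, ownerIsTerminatedWorker, anotherThreadAcquired}. */
    static boolean[] run() throws InterruptedException {
        InspectableLock lock = new InspectableLock();
        Thread worker = new Thread(() -> {
            lock.lock();
            // Simulated bug: the thread finishes without calling lock.unlock().
        }, "abandoning-worker");
        worker.start();
        worker.join();

        boolean stillLocked = lock.isLocked();
        boolean ownerIsDead = lock.owner() == worker && !worker.isAlive();
        boolean othersCanAcquire = lock.tryLock(100, TimeUnit.MILLISECONDS);
        return new boolean[] {stillLocked, ownerIsDead, othersCanAcquire};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("locked after owner exited: " + r[0]);
        System.out.println("owner is a terminated thread: " + r[1]);
        System.out.println("another thread could acquire: " + r[2]);
    }
}
```

This is also why forcing owner and state back to zero "works" in the short term: it bypasses the only release path the lock has. It does not explain how the lock was left held in the first place.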


            daspilker Daniel Spilker added a comment -

            The attached HPI file contains a patch build from this pull request: https://github.com/jenkinsci/job-dsl-plugin/pull/604

            Can you test the HPI and report whether it fixes your problem?
            amochtar Adé Mochtar added a comment -

            I'm running into the same problem. It started when upgrading from 1.34 to 1.38.

            I've also tried the patched plugin, but that doesn't fix the problem.


            daspilker Daniel Spilker added a comment -

            Can you take a thread dump from Jenkins (e.g. http://localhost:8080/threadDump) directly after hitting the abort button and post it here? That would help to see where the thread is stuck.
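The full page at /threadDump can be narrowed down to the threads that are waiting inside one particular frame. For this issue that frame is org.codehaus.groovy.util.LockableObject.lock from the stack trace above. Below is a minimal Java sketch that does this with Thread.getAllStackTraces(). The names StuckThreadFinder, findWaitingIn and parkForever are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.LockSupport;

public class StuckThreadFinder {

    /**
     * Returns "name [state]" for each live thread that is WAITING, TIMED_WAITING
     * or BLOCKED and has a stack frame matching className.methodName.
     */
    static List<String> findWaitingIn(String className, String methodName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            Thread.State s = t.getState();
            if (s != Thread.State.WAITING && s != Thread.State.TIMED_WAITING
                    && s != Thread.State.BLOCKED) {
                continue; // skip running, new and terminated threads
            }
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    hits.add(t.getName() + " [" + s + "]");
                    break;
                }
            }
        }
        return hits;
    }

    // Demo helper: parks the calling thread indefinitely.
    static void parkForever() {
        while (true) {
            LockSupport.park();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread parked = new Thread(StuckThreadFinder::parkForever, "demo-parked");
        parked.setDaemon(true); // do not keep the JVM alive
        parked.start();
        Thread.sleep(200);      // give the thread time to reach park()
        // For this issue, pass "org.codehaus.groovy.util.LockableObject" and "lock".
        System.out.println(findWaitingIn(StuckThreadFinder.class.getName(), "parkForever"));
    }
}
```

A plain thread dump remains the more complete artifact to attach, since it also shows threads that are not waiting.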
            amochtar Adé Mochtar added a comment - - edited

            I have attached the stacktrace for the stuck job. I didn't abort the job though, it got stuck on its own.

            amochtar Adé Mochtar added a comment -

            It looks like it has something to do with the downstreamParameterized configuration, which changed in 1.38. I haven't seen any stuck jobs since reverting the config to the pre-1.38 format.
            akom Alexander Komarov added a comment - - edited

            I can confirm that interrupting a job is not necessary. After restart of Jenkins, I have not interrupted any jobs, and the issue occurred almost every run of the DSL job. Additionally, it seems to affect unrelated Build Flow jobs in the same way (LinkedHashMap ClassInfo is locked from a DSL job run, and Build Flow wants to lock it), although not as frequently. For now, I have the unlock code I mentioned above as part of my DSL script.


            daspilker Daniel Spilker added a comment -

            amochtar: please post a DSL script that reproduces the problem.

            akom: you should remove that unlock code and try again. It's not a good idea to mess with the runtime internals.

            akom Alexander Komarov added a comment -

            I have had to run the unlock code at least 20 times today (before giving up and putting it into the Groovy script). What makes you think the 21st time will be different?
            amochtar Adé Mochtar added a comment -

            daspilker: This is the diff going back from 1.38 to 1.37:

                   publishers {
                     downstreamParameterized{
            -          trigger("${repository}_${branchSanitized}_deploy") {
            -            condition('SUCCESS')
            -            parameters {
            -              predefinedProp('GIT_URL', "https://bitbucket.org/${organisation}/${repository}")
            -              predefinedProp('GIT_COMMIT', '${GIT_COMMIT}')
            -              currentBuild()
            -            }
            +          trigger("${repository}_${branchSanitized}_deploy", 'SUCCESS') {
            +            predefinedProp('GIT_URL', "https://bitbucket.org/${organisation}/${repository}")
            +            predefinedProp('GIT_COMMIT', '${GIT_COMMIT}')
            +            currentBuild()
                       }
                     }
                   }
            

            daspilker Daniel Spilker added a comment -

            akom: the unlock code should not be necessary. With the patched HPI, the DSL build step should abort immediately when the job is aborted, so nothing should hang. If the job does not abort immediately, either there is a problem with the patch or the thread is stuck in script code. In that case I need a thread dump taken after aborting the job to see where it hangs. But the patched HPI must run in an untouched runtime.

            amochtar: please post a complete, runnable script that reproduces the problem.
            akom Alexander Komarov added a comment - - edited

            daspilker, I cannot use the HPI you attached. I'm only seeing this problem in our production environment, where I rely on a version that includes my three pull requests. I can use it in my test environment, but I've only seen this happen there once in a few weeks, so it won't be much of a test.

            Additionally, I'm not aborting DSL builds. As I mentioned, they hang just fine without my help.
            amochtar Adé Mochtar added a comment - - edited

            daspilker: I've attached the config.xml for the seed job, containing the script that gets stuck. I'm using the latest Jenkins Docker container to run this: https://hub.docker.com/_/jenkins/

            It got stuck on its own on the 3rd run.
            amochtar Adé Mochtar added a comment -

            Attached the corresponding thread dump as well


            daspilker Daniel Spilker added a comment -

            OK, so the primary problem is that the job hangs. That the job can't be aborted is just a downstream problem.

            I ran the script provided by amochtar more than 3000 times and it hung three times in my test environment. I never experienced the problem in production, although we have quite complex scripts.

            I assume that it's somehow related to GROOVY-5249. Unfortunately Jenkins uses an ancient version of Groovy, so we have to wait for JENKINS-21249 to get the fix.

            It could also be a problem in the java.util.concurrent package in certain JDKs. We are running Oracle JDK 1.7.0_85-b15 in production, which never showed the problem. In my test environment I used OpenJDK 1.7.0_79-b14, and it reproduced the problem in 0.1% of all runs.
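Take the measured rate at face value: 3 hangs in roughly 3000 runs, so p ≈ 0.001. Also assume runs fail independently, though the much higher rates reported elsewhere in this thread suggest that does not hold on every master. Under those assumptions, a short calculation shows how many runs a test environment needs before seeing even one hang becomes likely:

```latex
\[
p \approx \frac{3}{3000} = 10^{-3},
\qquad
P(\text{at least one hang in } n \text{ runs}) = 1 - (1 - p)^{n}
\]
\[
1 - (1 - p)^{n} \ge 0.95
\iff
n \ge \frac{\ln 0.05}{\ln(1 - p)} \approx \frac{-2.996}{-0.0010005} \approx 2995
\]
```

At this rate, 30 runs would show a hang only about 3% of the time (1 - 0.999^30 ≈ 0.030), and 100 runs about 9.5% of the time. This is in line with a test environment going weeks without reproducing it.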
            harisshahidmalik Haris Shahid added a comment -

            I came here and created an account just to log this exact same bug. I can attest this is definitely a bug. It occurs on 2 different Jenkins environments (very beefy multi-node environments provisioned through Amazon Cloud) that my DevOps team manages on the project I am currently working on.

            Every time, I have to restart Jenkins to fix the issue, and then it occurs again randomly after a few successful executions. I have used the Job DSL plugin very extensively on my past two projects, have been an avid user since its inception, and am very familiar with the syntax and how the plugin works. So I assure you it has nothing to do with how I am using it or with the code being executed.

            This is definitely a bug and I would love for someone to fix it, as it is hindering our ability to use an otherwise AWESOME! plugin.

            Thanks a lot for creating this plugin and continuing to maintain and enhance it at such a rapid pace. Also, thank you in advance for fixing this bug ASAP.
            kmoens Kenny Moens added a comment -

            We faced the same issue on our Jenkins instance. We have been using the Job DSL plugin for 2 years to generate 200+ jobs, and in all that time we had no trouble.

            Since our latest upgrade from 1.34 to 1.38, we face the same problem. Initially we applied the migration guide, and we ran into the problem on almost every build of the Job DSL job. After we reverted the changes to the parameterizedTrigger definition, as suggested by Adé, we almost never run into the problem anymore; it now occurs only once every 20-30 builds.

            Currently we have to resort to some dirty hacks to avoid restarting our Jenkins server, such as stopping the thread and clearing the lock as outlined by Alexander.
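            Why interrupting the thread does not help: the stuck executor in the original report is parked in AbstractQueuedSynchronizer.acquire (reached via Groovy's LockableObject.lock), and acquire ignores interrupts, waking up only to park again. The minimal sketch below reproduces that behaviour with java.util.concurrent.locks.ReentrantLock, whose lock() also goes through AQS acquire; it is a stand-in chosen for illustration, not Groovy's own class, and the class name UninterruptibleLockDemo is hypothetical. It shows that the waiter stays WAITING after interrupt() and only finishes once the owner releases the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

public final class UninterruptibleLockDemo {

    /**
     * Returns the waiter's state right after it was interrupted and again
     * after the owning thread released the lock.
     */
    public static Thread.State[] run() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // stand-in for a lock that is never released (assumption)
        Thread waiter = new Thread(() -> {
            lock.lock(); // AQS acquire(): an interrupt wakes it, then it parks again
            lock.unlock();
        }, "waiter");
        waiter.setDaemon(true);
        try {
            waiter.start();
            while (!lock.hasQueuedThread(waiter)) {
                Thread.sleep(10); // wait until the waiter is queued on the lock
            }
            waiter.interrupt();
            Thread.sleep(200);
            Thread.State afterInterrupt = waiter.getState(); // still parked
            lock.unlock(); // only the owning thread may release a ReentrantLock
            waiter.join(5_000);
            return new Thread.State[] {afterInterrupt, waiter.getState()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Thread.State[] states = run();
        System.out.println("after interrupt: " + states[0]); // WAITING
        System.out.println("after unlock:    " + states[1]); // TERMINATED
    }
}
```

            A lock acquired with lockInterruptibly() would instead throw InterruptedException, which is why the choice of acquire variant decides whether an abort can unblock the thread.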

            daspilker Daniel Spilker added a comment -

            Please report the JDK that you are using to run the Jenkins master. My guess is that the issue is somehow related to the JDK version.

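            To report the JDK, the java.version system property is the usual source; the same property can be read from the Jenkins script console. A small sketch, assuming you also want the major release number so that old-style strings such as "1.8.0_60" and newer ones such as "11.0.2" can be compared; the class name JavaVersion is hypothetical.

```java
public final class JavaVersion {

    /** Returns the major release for both "1.8.0_60"-style and "11.0.2"-style strings. */
    public static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Before JDK 9 the version string started with "1." (e.g. 1.7.0_80, 1.8.0_92)
        return first == 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version=" + version
                + " (major " + majorVersion(version) + ")"
                + ", java.vendor=" + System.getProperty("java.vendor"));
    }
}
```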
            kmoens Kenny Moens added a comment -

            I'm using JDK8u60 for my master node.

            daspilker Daniel Spilker added a comment -

            Jenkins uses Groovy 1.8, but only Groovy 2.3 and later officially support JDK8; see the Release notes for Groovy 2.3. Can you try JDK7?

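            The compatibility rule stated above can be written as a dotted-version comparison. This is a minimal sketch that encodes only the claim in this comment (Groovy 2.3 and later officially support JDK8); the class and method names are hypothetical, and it assumes plain numeric versions, optionally followed by a "-qualifier" that is ignored.

```java
public final class GroovyJdkSupport {

    /** Compares dotted versions numerically ("1.8.9" < "2.3" < "2.4.11"); missing parts count as 0. */
    public static int compare(String a, String b) {
        // Ignore qualifiers such as "-beta-1" (assumed format)
        String[] x = a.split("-")[0].split("\\.");
        String[] y = b.split("-")[0].split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Rule from the comment: only Groovy 2.3 and later officially support JDK 8. */
    public static boolean officiallySupportsJdk8(String groovyVersion) {
        return compare(groovyVersion, "2.3") >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.8.9", "2.3.0", "2.4.11"}) {
            System.out.println("Groovy " + v + " officially supports JDK 8: " + officiallySupportsJdk8(v));
        }
    }
}
```

            Comparing numerically rather than as strings matters here: a plain string comparison would put "2.4.11" before "2.4.8".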
            kmoens Kenny Moens added a comment -

            Sure, I can, but I will have to try it in our production environment, so it will take a while to evaluate whether it works.

            harisshahidmalik Haris Shahid added a comment -

            Both Jenkins instances my team has set up use Java 7, so I believe the Java version is not the culprit. Any other suggestions?

            sebhoss Sebastian Hoß added a comment -

            We are running into the same problem with the latest Jenkins (2.10) and the latest job-dsl plugin on JDK8 (u92). The thread dump shows the following:

            "Executor #5 for master : executing generated-jobs/generate-jobs #85" Id=95 Group=main TIMED_WAITING on org.apache.tools.ant.taskdefs.PumpStreamHandler$ThreadWithPumper@1182799
                at java.lang.Object.wait(Native Method)
                -  waiting on org.apache.tools.ant.taskdefs.PumpStreamHandler$ThreadWithPumper@1182799
                at java.lang.Thread.join(Thread.java:1253)
                at org.apache.tools.ant.taskdefs.PumpStreamHandler.finish(PumpStreamHandler.java:188)
                at org.apache.tools.ant.taskdefs.PumpStreamHandler.stop(PumpStreamHandler.java:158)
                at org.apache.tools.ant.taskdefs.Execute.execute(Execute.java:521)
                at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:631)
                at org.apache.tools.ant.taskdefs.ExecuteOn.runParallel(ExecuteOn.java:717)
                at org.apache.tools.ant.taskdefs.ExecuteOn.runExec(ExecuteOn.java:480)
                at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:498)
                at org.apache.tools.ant.taskdefs.Chmod.execute(Chmod.java:181)
                at hudson.Util.makeWritable(Util.java:323)
                at hudson.Util.tryOnceDeleteFile(Util.java:277)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:373)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
                at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
                at hudson.Util.deleteRecursive(Util.java:350)
                at hudson.model.AbstractItem.performDelete(AbstractItem.java:600)
                at hudson.model.Job.performDelete(Job.java:278)
                at org.jenkinsci.plugins.workflow.job.WorkflowJob.performDelete(WorkflowJob.java:580)
                at hudson.model.AbstractItem.delete(AbstractItem.java:589)
                -  locked org.jenkinsci.plugins.workflow.job.WorkflowJob@6c3a1409
                at hudson.model.Job.delete(Job.java:688)
                at javaposse.jobdsl.plugin.ExecuteDslScripts.updateGeneratedJobs(ExecuteDslScripts.java:329)
                at javaposse.jobdsl.plugin.ExecuteDslScripts.perform(ExecuteDslScripts.java:222)
                at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
                at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
                at hudson.model.Build$BuildExecution.build(Build.java:205)
                at hudson.model.Build$BuildExecution.doRun(Build.java:162)
                at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
                at hudson.model.Run.execute(Run.java:1720)
                at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
                at hudson.model.ResourceController.execute(ResourceController.java:98)
                at hudson.model.Executor.run(Executor.java:410)

            Once we see that, the job can no longer be aborted; restarting Jenkins is the only way to stop it.
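            Thread dumps like the one above usually come from jstack or from Jenkins' own thread dump page. As an alternative, here is a minimal sketch that collects a similar summary programmatically through java.lang.management, filtering on the "Executor #" name prefix visible in these traces; the class name ExecutorThreadDump is hypothetical, and the lock owner is printed only when the JVM can determine it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public final class ExecutorThreadDump {

    /** One line (name, state, lock, lock owner) per live thread whose name starts with the prefix. */
    public static List<String> summarize(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // true, true: also collect locked monitors and ownable synchronizers (AQS-based locks)
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (!info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            StringBuilder sb = new StringBuilder()
                    .append('"').append(info.getThreadName()).append("\" ")
                    .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Jenkins build threads are named like "Executor #5 for master : executing ..."
        summarize("Executor #").forEach(System.out::println);
    }
}
```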

            daspilker Daniel Spilker added a comment -

            sebhoss, that is a different problem with a different stack trace.

            Can someone reproduce the problem on Jenkins 2.0? If not, I'm going to close this ticket.

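            One way to tell the two hangs apart is to look for the characteristic frame in each stack: org.codehaus.groovy.reflection.ClassInfo.lock in this ticket's trace, versus org.apache.tools.ant.taskdefs.PumpStreamHandler.finish in the job-deletion trace. The sketch below is a hypothetical helper that matches only on those two frames taken from the traces in this thread; it is not a general diagnosis tool.

```java
import java.util.Map;

public final class HangClassifier {

    public enum Kind { GROOVY_CLASSINFO_LOCK, ANT_PUMP_STREAM_JOIN, OTHER }

    /** Classifies a stack by the first frame that matches one of the two reported hangs. */
    public static Kind classify(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            String method = frame.getClassName() + "." + frame.getMethodName();
            if (method.equals("org.codehaus.groovy.reflection.ClassInfo.lock")) {
                return Kind.GROOVY_CLASSINFO_LOCK; // the trace in this ticket's description
            }
            if (method.equals("org.apache.tools.ant.taskdefs.PumpStreamHandler.finish")) {
                return Kind.ANT_PUMP_STREAM_JOIN; // the job-deletion trace
            }
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getName().startsWith("Executor #")) {
                System.out.println(e.getKey().getName() + " -> " + classify(e.getValue()));
            }
        }
    }
}
```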
            pianoroy Roy Tinker added a comment -

            I'm having the exact problem and stack trace Sebastian pasted. Freestyle builds are hanging. This is Jenkins 2.53 on JDK7.

            akom Alexander Komarov added a comment -

            daspilker, I have not reproduced the problem in 2.x, but mostly because I'm too scared to try to kill a DSL run; I even have a red warning against that in the description. It's quite possible that it's fixed, because it hasn't happened without aborting anymore.

            daspilker Daniel Spilker added a comment -

            Groovy has been updated from 2.4.8 to 2.4.11 in Jenkins 2.61. The issue should be fixed, see https://issues.apache.org/jira/browse/GROOVY-8067. Can anyone reproduce the problem in 2.61 or later? If not, I'm going to close this ticket.

            daspilker Daniel Spilker added a comment -

            I'm closing the issue because the problem does not occur with newer versions of Jenkins and Job DSL. Please re-open if you can reproduce the problem.

            People

              daspilker Daniel Spilker
              akom Alexander Komarov
              Votes: 6
              Watchers: 13
