JENKINS-48685

Deadlock when running a Multijob with multiple slaves

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • multijob-plugin
    • None

    Description

      After upgrading from 2.73.3 to 2.89.2, our Jenkins instance has started to experience deadlocks.

      We use the Multijob plugin to run any number of other jobs that extend a common template. When the Multijob kicks off, it will spin up as many AWS slaves as it needs to run all of the child jobs in parallel (Test-Suites in the stack trace). Every time we run one of these Multijob jobs, Jenkins locks up.

      Attached are the deadlock stack traces from a thread dump.

      Executor #4 for Big Box (r4.2xlarge) (i-05a4635a2e6e063cf) : executing Test-Suites/test-suite-1 #1165 is in deadlock with Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307
      
      Executor #4 for Big Box (r4.2xlarge) (i-05a4635a2e6e063cf) : executing Test-Suites/test-suite-1 #1165 - priority:5 - threadId:0x00007f8fe4118800 - nativeId:0x3455 - state:BLOCKED
      stackTrace:
      java.lang.Thread.State: BLOCKED (on object monitor)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
      - waiting to lock <0x000000008cdab698> (a hudson.model.RunMap)
      at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
      at hudson.model.Run.fromExternalizableId(Run.java:2345)
      at hudson.model.Run$Replacer.readResolve(Run.java:1937)
      
      Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307 - priority:5 - threadId:0x00007f8ff868e000 - nativeId:0x32e9 - state:BLOCKED
      stackTrace:
      java.lang.Thread.State: BLOCKED (on object monitor)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
      - waiting to lock <0x000000008d744a90> (a hudson.model.RunMap)
      at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
      at hudson.model.Run.fromExternalizableId(Run.java:2345)
      at hudson.model.Run$Replacer.readResolve(Run.java:1937)
      at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
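
      For readers who want to reproduce the shape of this dump outside Jenkins, the following is a minimal sketch, not Jenkins code. It assumes the two executors acquire two RunMap monitors in opposite order, which is what the "is in deadlock with" line suggests. The class name DeadlockSketch, the thread names, and the plain Object locks are hypothetical stand-ins. It uses the standard ThreadMXBean API to report the cycle, much as a thread dump does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {

    /** Starts a daemon thread that locks `first`, waits until both threads hold a lock, then requests `second`. */
    static void lockInOrder(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();      // make sure the other thread owns its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread owns `second` and wants `first`.
                }
            }
        }, name);
        t.setDaemon(true);                      // lets the JVM exit despite the deadlock
        t.start();
    }

    /** Provokes a two-thread monitor deadlock and returns the ids of the deadlocked threads. */
    static long[] provokeAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the two hudson.model.RunMap monitors in the dump.
        Object runMapA = new Object();
        Object runMapB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockInOrder("executor-1", runMapA, runMapB, latch);
        lockInOrder("executor-2", runMapB, runMapA, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {         // poll for up to about five seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = provokeAndDetect();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

      Each reported thread is BLOCKED on a monitor owned by the other, which matches the two "waiting to lock" entries above.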

      We tried downgrading Jenkins, but we had already updated all of the other plugins, and after the downgrade the majority of them were no longer compatible.

          Activity

            oleg_nenashev Oleg Nenashev added a comment -

            Here is a root cause thread from the dump:

            Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307 - priority:5 - threadId:0x00007f8ff868e000 - nativeId:0x32e9 - state:BLOCKED
            stackTrace:
            java.lang.Thread.State: BLOCKED (on object monitor)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
            - waiting to lock <0x000000008d744a90> (a hudson.model.RunMap)
            at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
            at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
            at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
            at hudson.model.Run.fromExternalizableId(Run.java:2345)
            at hudson.model.Run$Replacer.readResolve(Run.java:1937)
            at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at com.thoughtworks.xstream.converters.reflection.SerializationMethodInvoker.callReadResolve(SerializationMethodInvoker.java:66)
            at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:271)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
            at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
            at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393)
            at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331)
            at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
            at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
            at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.readItem(AbstractCollectionConverter.java:71)
            at hudson.util.RobustCollectionConverter.populateCollection(RobustCollectionConverter.java:85)
            at com.thoughtworks.xstream.converters.collections.CollectionConverter.unmarshal(CollectionConverter.java:80)
            at hudson.util.RobustCollectionConverter.unmarshal(RobustCollectionConverter.java:76)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
            at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
            at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393)
            at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331)
            at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
            at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
            at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
            at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
            at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1189)
            at hudson.util.XStream2.unmarshal(XStream2.java:114)
            at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173)
            at hudson.XmlFile.unmarshal(XmlFile.java:167)
            at hudson.model.Run.reload(Run.java:336)
            at hudson.model.Run.<init>(Run.java:324)
            at hudson.model.AbstractBuild.<init>(AbstractBuild.java:173)
            at hudson.model.Build.<init>(Build.java:104)
            at com.tikal.jenkins.plugins.multijob.MultiJobBuild.<init>(MultiJobBuild.java:59)
            at sun.reflect.GeneratedConstructorAccessor501.newInstance(Unknown Source)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
            at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:165)
            at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:142)
            at hudson.model.RunMap.retrieve(RunMap.java:224)
            at hudson.model.RunMap.retrieve(RunMap.java:57)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380)
            - locked <0x000000008cdab698> (a hudson.model.RunMap)
            at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
            at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
            at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
            at hudson.model.Run.fromExternalizableId(Run.java:2345)
            at hudson.model.Run$Replacer.readResolve(Run.java:1937)
            at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown
            

            Needs investigation
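
            The trace above shows one thread holding a RunMap monitor (locked at getByNumber:380) while build loading calls back into getByNumber on a different RunMap. The following minimal sketch illustrates only that nesting. LazyMap and NestedLoadSketch are hypothetical stand-ins, not the Jenkins classes, and the loader lambdas merely imitate a build record that references a build in another job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class NestedLoadSketch {

    /** Hypothetical stand-in for a lazily loading run map; not the Jenkins implementation. */
    static class LazyMap {
        private final Map<Integer, String> loaded = new HashMap<>();
        private final IntFunction<String> loader;

        LazyMap(IntFunction<String> loader) {
            this.loader = loader;
        }

        String getByNumber(int n) {
            synchronized (this) {
                String build = loaded.get(n);
                if (build == null) {
                    build = loader.apply(n);    // the loader runs while this map's monitor is held
                    loaded.put(n, build);
                }
                return build;
            }
        }
    }

    /** Returns true if the parent map's monitor is still held while the child map loads a build. */
    static boolean outerLockHeldDuringInnerLoad() {
        LazyMap[] parentHolder = new LazyMap[1];
        boolean[] held = new boolean[1];

        // Loading a child build records whether the caller still owns the parent's monitor.
        LazyMap child = new LazyMap(n -> {
            held[0] = Thread.holdsLock(parentHolder[0]);
            return "child#" + n;
        });
        // Loading a parent build resolves a reference into the child map, as readResolve does in the trace.
        LazyMap parent = new LazyMap(n -> "parent#" + n + " -> " + child.getByNumber(7));
        parentHolder[0] = parent;

        parent.getByNumber(1);
        return held[0];
    }

    public static void main(String[] args) {
        System.out.println("parent monitor held while loading child: " + outerLockHeldDuringInnerLoad());
    }
}
```

            With a single thread this nesting is harmless. If a second thread takes the same two monitors in the opposite order at the same time, both threads block, which is consistent with the state:BLOCKED entries in the dump.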

            allan_burdajewicz Allan BURDAJEWICZ added a comment -

            oleg_nenashev jglick Could this be related to JENKINS-49328 and nested references? We see `hudson.model.Run$Replacer.readResolve` in the stack trace. I have also seen a case where hudson.model.RunMap#retrieve fails with a StackOverflowError, showing the same stack trace as we see here.

            People

              Unassigned Unassigned
              ketchumm Mark Ketchum