Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-48685

Deadlock when running a Multijob with multiple slaves

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: Critical Critical
    • multijob-plugin
    • None

      After upgrading from 2.73.3 to 2.89.2 our Jenkins has started to experience deadlock.

      We use the Multijob plugin to run any number of other jobs that extend a common template. When the Multijob kicks off, it will spin up as many AWS slaves as it needs to run all of the child jobs in parallel (Test-Suites in the stack trace). Every time we run one of these Multijob jobs, Jenkins locks up.

      Attached is the deadlock stack traces from a thread dump.

      Executor #4 for Big Box (r4.2xlarge) (i-05a4635a2e6e063cf) : executing Test-Suites/test-suite-1 #1165 is in deadlock with Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307
      
      Executor #4 for Big Box (r4.2xlarge) (i-05a4635a2e6e063cf) : executing Test-Suites/test-suite-1 #1165 - priority:5 - threadId:0x00007f8fe4118800 - nativeId:0x3455 - state:BLOCKED
      stackTrace:
      java.lang.Thread.State: BLOCKED (on object monitor)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
      - waiting to lock <0x000000008cdab698> (a hudson.model.RunMap)
      at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
      at hudson.model.Run.fromExternalizableId(Run.java:2345)
      at hudson.model.Run$Replacer.readResolve(Run.java:1937)
      
      Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307 - priority:5 - threadId:0x00007f8ff868e000 - nativeId:0x32e9 - state:BLOCKED
      stackTrace:
      java.lang.Thread.State: BLOCKED (on object monitor)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
      - waiting to lock <0x000000008d744a90> (a hudson.model.RunMap)
      at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
      at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
      at hudson.model.Run.fromExternalizableId(Run.java:2345)
      at hudson.model.Run$Replacer.readResolve(Run.java:1937)
      at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)

      We tried downgrading Jenkins again, but we had already updated all of the other plugins and after downgrading the majority of the plugins were not compatible.

          [JENKINS-48685] Deadlock when running a Multijob with multiple slaves

          Oleg Nenashev added a comment -

          Here is a root cause thread from the dump:

          Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307 - priority:5 - threadId:0x00007f8ff868e000 - nativeId:0x32e9 - state:BLOCKED
          stackTrace:
          java.lang.Thread.State: BLOCKED (on object monitor)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369)
          - waiting to lock <0x000000008d744a90> (a hudson.model.RunMap)
          at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
          at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
          at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
          at hudson.model.Run.fromExternalizableId(Run.java:2345)
          at hudson.model.Run$Replacer.readResolve(Run.java:1937)
          at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at com.thoughtworks.xstream.converters.reflection.SerializationMethodInvoker.callReadResolve(SerializationMethodInvoker.java:66)
          at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:271)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
          at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
          at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393)
          at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331)
          at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
          at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
          at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.readItem(AbstractCollectionConverter.java:71)
          at hudson.util.RobustCollectionConverter.populateCollection(RobustCollectionConverter.java:85)
          at com.thoughtworks.xstream.converters.collections.CollectionConverter.unmarshal(CollectionConverter.java:80)
          at hudson.util.RobustCollectionConverter.unmarshal(RobustCollectionConverter.java:76)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
          at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
          at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393)
          at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331)
          at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
          at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
          at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
          at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
          at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1189)
          at hudson.util.XStream2.unmarshal(XStream2.java:114)
          at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173)
          at hudson.XmlFile.unmarshal(XmlFile.java:167)
          at hudson.model.Run.reload(Run.java:336)
          at hudson.model.Run.<init>(Run.java:324)
          at hudson.model.AbstractBuild.<init>(AbstractBuild.java:173)
          at hudson.model.Build.<init>(Build.java:104)
          at com.tikal.jenkins.plugins.multijob.MultiJobBuild.<init>(MultiJobBuild.java:59)
          at sun.reflect.GeneratedConstructorAccessor501.newInstance(Unknown Source)
          at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
          at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
          at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:165)
          at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:142)
          at hudson.model.RunMap.retrieve(RunMap.java:224)
          at hudson.model.RunMap.retrieve(RunMap.java:57)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380)
          - locked <0x000000008cdab698> (a hudson.model.RunMap)
          at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
          at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926)
          at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137)
          at hudson.model.Run.fromExternalizableId(Run.java:2345)
          at hudson.model.Run$Replacer.readResolve(Run.java:1937)
          at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown
          

          Needs investigation

          Oleg Nenashev added a comment - Here is a root cause thread from the dump: Executor #2 for Big Box (r4.2xlarge) (i-057d9fdd7076c7c10) : executing Test-Suites/test-suite-2 #1307 - priority:5 - threadId:0x00007f8ff868e000 - nativeId:0x32e9 - state:BLOCKED stackTrace: java.lang. Thread .State: BLOCKED (on object monitor) at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:369) - waiting to lock <0x000000008d744a90> (a hudson.model.RunMap) at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231) at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926) at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137) at hudson.model.Run.fromExternalizableId(Run.java:2345) at hudson.model.Run$Replacer.readResolve(Run.java:1937) at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.thoughtworks.xstream.converters.reflection.SerializationMethodInvoker.callReadResolve(SerializationMethodInvoker.java:66) at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:271) at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72) at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66) at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393) at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331) at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270) at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72) at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50) at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.readItem(AbstractCollectionConverter.java:71) at hudson.util.RobustCollectionConverter.populateCollection(RobustCollectionConverter.java:85) at com.thoughtworks.xstream.converters.collections.CollectionConverter.unmarshal(CollectionConverter.java:80) at hudson.util.RobustCollectionConverter.unmarshal(RobustCollectionConverter.java:76) at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72) at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66) at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:393) at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:331) at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:270) at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72) at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66) at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50) at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134) at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32) at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1189) at hudson.util.XStream2.unmarshal(XStream2.java:114) at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173) at hudson.XmlFile.unmarshal(XmlFile.java:167) at hudson.model.Run.reload(Run.java:336) at hudson.model.Run.<init>(Run.java:324) at hudson.model.AbstractBuild.<init>(AbstractBuild.java:173) at hudson.model.Build.<init>(Build.java:104) at com.tikal.jenkins.plugins.multijob.MultiJobBuild.<init>(MultiJobBuild.java:59) at sun.reflect.GeneratedConstructorAccessor501.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:165) at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:142) at hudson.model.RunMap.retrieve(RunMap.java:224) at hudson.model.RunMap.retrieve(RunMap.java:57) at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500) at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482) at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380) - locked <0x000000008cdab698> (a hudson.model.RunMap) at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231) at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:926) at hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:137) at hudson.model.Run.fromExternalizableId(Run.java:2345) at hudson.model.Run$Replacer.readResolve(Run.java:1937) at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Needs investigation

          oleg_nenashev jglick Could this be related to JENKINS-49328 and nested references ? We see the `hudson.model.Run$Replacer.readResolve` in the stacktrace. I have also seen a case were the hudson.model.RunMap#retrieve fails with a StackOverflowError, showing the same stacktrace as we can see here.

          Allan BURDAJEWICZ added a comment - oleg_nenashev jglick Could this be related to JENKINS-49328 and nested references ? We see the `hudson.model.Run$Replacer.readResolve` in the stacktrace. I have also seen a case were the hudson.model.RunMap#retrieve fails with a StackOverflowError, showing the same stacktrace as we can see here.

            Unassigned Unassigned
            ketchumm Mark Ketchum
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: