Jenkins / JENKINS-63750

Thousands of JNLP4-connect threads blocked on object monitor


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: remoting
    • None

      Our JNLP agents often get stuck – the process is still running and Jenkins thinks they are still executing the build assigned to them, but the build makes no progress for days.

      When we started investigating the state of such stuck agents, the first thing we noticed was the following message in the agent's log:

      Sep 17, 2020 2:17:11 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Sep 17, 2020 2:17:30 PM hudson.remoting.RemoteInvocationHandler$Unexporter run
      SEVERE: Couldn't clean up oid=2 from null
      java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at hudson.remoting.DelegatingExecutorService.submit(DelegatingExecutorService.java:42)
        at hudson.remoting.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
        at hudson.remoting.InterceptingExecutorService.submit(InterceptingExecutorService.java:41)
        at org.jenkinsci.remoting.util.AnonymousClassWarnings.check(AnonymousClassWarnings.java:66)
        at org.jenkinsci.remoting.util.AnonymousClassWarnings$1.annotateClass(AnonymousClassWarnings.java:122)
        at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1290)
        at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1231)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
        at java.lang.Throwable.writeObject(Throwable.java:1014)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at hudson.remoting.Command.writeTo(Command.java:111)
        at hudson.remoting.AbstractByteBufferCommandTransport.write(AbstractByteBufferCommandTransport.java:287)
        at hudson.remoting.Channel.send(Channel.java:764)
        at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.cleanup(RemoteInvocationHandler.java:395)
        at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.access$1000(RemoteInvocationHandler.java:354)
        at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:612)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
      

      Because the process is still running, we can obtain a thread dump. It contains 3653 header lines for threads like these:

      "pool-1-thread-157 for JNLP4-connect connection to resources-ci-master-jenkins.grid.hosting.cerence.net/10.179.225.4:50003 id=1138934" #172 daemon prio=5 os_prio=0 tid=0x00007f2a1c002800 nid=0x3d3a waiting for monitor entry [0x00007f2a49c81000]
      "pool-1-thread-155 for JNLP4-connect connection to resources-ci-master-jenkins.grid.hosting.cerence.net/10.179.225.4:50003 id=1138866" #170 daemon prio=5 os_prio=0 tid=0x00007f29d8002000 nid=0x3d38 waiting for monitor entry [0x00007f2a4ad92000]
      "pool-1-thread-148 for JNLP4-connect connection to resources-ci-master-jenkins.grid.hosting.cerence.net/10.179.225.4:50003 id=1138261" #163 daemon prio=5 os_prio=0 tid=0x00007f2a8c043800 nid=0x131b8 waiting for monitor entry [0x00007f2a4a388000]
      

      From a quick look through the thread dump, each of those threads seems to be stuck in the same call:

      "pool-1-thread-13682 for JNLP4-connect connection to resources-ci-master-jenkins.grid.hosting.cerence.net/10.179.225.4:50003 id=2766416" #13698 daemon prio=5 os_prio=0 tid=0x00007f2a8cba6800 nid=0x12b54 waiting for monitor entry [0x00007f1fdd8a0000]
         java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.misc.Unsafe.defineClass(Native Method)
        at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
        at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
        at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
        at sun.reflect.MethodAccessorGenerator.generateMethod(MethodAccessorGenerator.java:75)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:53)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2177)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2068)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1572)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2286)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2210)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2068)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1572)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:430)
        at hudson.remoting.UserRequest.deserialize(UserRequest.java:290)
        at hudson.remoting.UserRequest.perform(UserRequest.java:189)
        at hudson.remoting.UserRequest.perform(UserRequest.java:54)
        at hudson.remoting.Request$2.run(Request.java:369)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
        at hudson.remoting.Engine$1$$Lambda$3/231593000.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)
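Tallying thousands of stack traces by eye is error-prone. The helper below is a minimal sketch (ThreadDumpSummary and summarize are hypothetical names, not part of Jenkins or Remoting) that reads a jstack-style dump, keeps only threads whose header line contains a fragment such as "JNLP4-connect", and counts them by thread state and innermost stack frame. It assumes the usual dump layout: a header line starting with the quoted thread name, a "java.lang.Thread.State:" line, "at ..." frames, and a blank line between threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: groups threads of a jstack-style dump by state and top frame. */
public class ThreadDumpSummary {
    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    public static Map<String, Integer> summarize(String dump, String nameFragment) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String[] lines = dump.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String header = lines[i].trim();
            // Thread headers start with the quoted thread name; skip everything else.
            if (!header.startsWith("\"") || !header.contains(nameFragment)) {
                continue;
            }
            String state = "UNKNOWN";
            String topFrame = "(no frames)";
            // Scan this thread's body until a blank line or the next header.
            for (int j = i + 1; j < lines.length; j++) {
                String body = lines[j].trim();
                if (body.isEmpty() || body.startsWith("\"")) {
                    break;
                }
                if (body.startsWith(STATE_PREFIX)) {
                    state = body.substring(STATE_PREFIX.length()).trim();
                } else if (body.startsWith("at ") && topFrame.equals("(no frames)")) {
                    topFrame = body.substring(3); // keep only the innermost frame
                }
            }
            counts.merge(state + " @ " + topFrame, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        summarize(dump, "JNLP4-connect")
            .forEach((key, count) -> System.out.println(count + "\t" + key));
    }
}
```

Run it as "java ThreadDumpSummary dump.txt". If the threads really are all stuck in the same call, nearly all of them should land in one bucket such as "BLOCKED (on object monitor) @ sun.misc.Unsafe.defineClass(Native Method)".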
      

      This massive deadlock grinds the agent to a halt. I suppose that is because, in this state, the user has reached its limit on the number of (lightweight) processes, 4096 (ulimit -u).
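The ever-growing thread names (pool-1-thread-157 up to pool-1-thread-13682) suggest that incoming requests run on a pool that starts a fresh thread whenever no idle one is available. Assuming that is the case (nothing in the logs above confirms it), the following sketch reproduces the mechanism with an unbounded pool configured like Executors.newCachedThreadPool. Every task needs one shared monitor; SHARED_LOCK is a hypothetical stand-in for whatever lock Unsafe.defineClass is waiting on. Because the lock is held while tasks arrive, no worker ever becomes idle, and the pool grows by one BLOCKED thread per request.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolPileUp {
    // Hypothetical stand-in for the contended monitor seen in the thread dump.
    private static final Object SHARED_LOCK = new Object();

    /** Returns {workers created, workers observed BLOCKED} after submitting `tasks` jobs. */
    public static int[] pileUp(int tasks) throws InterruptedException {
        // Same configuration as Executors.newCachedThreadPool(): no queueing, unbounded threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
        int created;
        int blocked = 0;
        synchronized (SHARED_LOCK) {
            for (int i = 0; i < tasks; i++) {
                // Each "request" needs the monitor that the caller is holding.
                pool.execute(() -> { synchronized (SHARED_LOCK) { /* contended work */ } });
            }
            // No worker can finish, so none is idle for reuse: one new thread per task.
            created = pool.getPoolSize();
            // Give the workers time to reach the monitor, then count the BLOCKED ones.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (blocked < tasks && System.nanoTime() < deadline) {
                Thread.sleep(10);
                blocked = countBlockedPoolThreads();
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {created, blocked};
    }

    // Default pool threads are named "pool-N-thread-M", as in the dump above.
    private static int countBlockedPoolThreads() {
        int n = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("pool-") && t.getState() == Thread.State.BLOCKED) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = pileUp(200);
        System.out.println("threads created: " + r[0] + ", BLOCKED: " + r[1]);
    }
}
```

Under a per-user limit such as ulimit -u 4096 this growth cannot continue indefinitely: once the limit is reached, starting the next thread fails with the same "OutOfMemoryError: unable to create new native thread" seen in the agent log, so avoid running the sketch with very large task counts on a shared machine.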

      The trigger could have been an error in network transmission between the Jenkins master and the agent; I don't trust the network much. We also have occasional file-system reliability issues. Anyhow, I am filing this ticket now, expecting that the problem may be partly caused by an issue in Jenkins or Remoting and that it will come back.

            Assignee: Unassigned
            Reporter: Matěj Korvas (mkorvas)
            Votes: 0
            Watchers: 3
