JENKINS-58290

WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator

    • durable-task 1.30

      The durable-task plugin runs a wrapper process which redirects the user process' stdout and stderr to a file and writes its exit code to another file. Thus there is no need for the agent JVM to hold onto a process handle for the wrapper; it should be fork-and-forget. In fact the Proc is discarded.
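
      To illustrate the fork-and-forget idea, here is a minimal sketch in plain sh. It is not the plugin's actual wrapper; the control directory and the file names (script.sh, log.txt, result.txt) are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Hypothetical control directory; the real plugin chooses its own paths.
ctl=$(mktemp -d)

# Stand-in for the user's script.
cat > "$ctl/script.sh" <<'EOF'
echo "hello from user process"
exit 3
EOF

# Launch the wrapper detached. Output goes to a log file and the exit code
# to a result file (written to a temp file, then renamed, so a reader never
# sees a partial write). Redirecting stdin/stdout/stderr away from the
# launcher means nothing ties the launcher to the lifetime of the script.
nohup sh -c "sh '$ctl/script.sh' > '$ctl/log.txt' 2>&1; echo \$? > '$ctl/result.tmp'; mv '$ctl/result.tmp' '$ctl/result.txt'" \
    < /dev/null > /dev/null 2>&1 &

# The launcher keeps no process handle; it only polls for the result file.
while [ ! -f "$ctl/result.txt" ]; do sleep 1; done
status=$(cat "$ctl/result.txt")
output=$(cat "$ctl/log.txt")
echo "exit code: $status"
```

      With this shape, whatever started the wrapper can return immediately and learn the outcome later from the two files alone.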

      Unfortunately, the current implementation in BourneShellScript does not actually allow the Proc to exit until the user process also exits. On a regular agent this does not matter much. But when you run sh steps inside a container on a Kubernetes agent, ContainerExecDecorator and ContainerExecProc actually keep a WebSocket open for the duration of the launched process. This consumes resources on the Kubernetes API server; it is possible to run out of connections. It also consumes three master-side Java threads per sh, such as:

      "OkHttp http://…/..." #361 prio=5 os_prio=0 tid=… nid=… runnable […]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:171)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at okio.Okio$2.read(Okio.java:140)
              at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
              at okio.RealBufferedSource.request(RealBufferedSource.java:68)
              at okio.RealBufferedSource.require(RealBufferedSource.java:61)
              at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
              at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
              at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
              at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
              at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
              at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
              at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      
      "OkHttp WebSocket http://…/..." #359 prio=5 os_prio=0 tid=… nid=… waiting on condition […]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <…> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      
      "pool-73-thread-1" #358 prio=5 os_prio=0 tid=… nid=… waiting on condition […]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
              at java.lang.Thread.sleep(Native Method)
              at io.fabric8.kubernetes.client.utils.NonBlockingInputStreamPumper.run(NonBlockingInputStreamPumper.java:57)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      To see the problem, you can run

      while (true) {
          podTemplate(label: BUILD_TAG, containers: [containerTemplate(name: 'ubuntu', image: 'ubuntu', command: 'sleep', args: 'infinity')]) {
              node (BUILD_TAG) {
                  container('ubuntu') {
                      branches = [:] // TODO cannot use collectEntries because: java.io.NotSerializableException: groovy.lang.IntRange
                      for (int x = 0; x < 1000; x += 5) {
                          def _x = x
                          branches["sleep$x"] = {
                              sleep time: _x, unit: 'SECONDS'
                              sh """set +x; while :; do echo -n "$_x "; date; sleep 10; done"""
                          }
                      }
                      parallel branches
                  }
              }
          }
      }
      

      and watch via

      while :; do jstack "$pid" | grep -F '"' | sort | grep -Ei 'ok|pool' > /tmp/x; clear; cat /tmp/x; sleep 5; done
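
      To get a count rather than a listing, a small helper along these lines may be handy. It is a sketch only: count_threads is a hypothetical name, and the three patterns simply mirror the thread names in the dump above, so the pool-N-thread-M count can include unrelated executor threads.

```shell
# Reads a jstack dump on stdin and prints how many thread headers match
# each of the three name patterns seen in the dump above.
count_threads() {
    awk '
        /^"OkHttp WebSocket /          { ws++; next }
        /^"OkHttp /                    { http++; next }
        /^"pool-[0-9]+-thread-[0-9]+"/ { pool++ }
        END { printf "OkHttp=%d OkHttpWebSocket=%d pool=%d\n", http, ws, pool }
    '
}
# Usage, assuming $pid is the JVM being inspected:
#   jstack "$pid" | count_threads
```

      If the leak is present, the first two counts should grow roughly in step with the number of sh steps currently running.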
      

            jglick Jesse Glick