Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-42187

Docker plugin causes queue hanging in the case of hanging requests

    XMLWordPrintable

Details

    Description

      Stacktrace explanation: When DockerOnceRetentionStrategy runs, it locks the Jenkins Queue. If it decides to terminate a Cloud agent, the a DockerSlave#_terminate() gets invoked. This method invokes the REST API call using docker-java. This call has no timeout: https://github.com/jenkinsci/docker-plugin/blob/master/docker-plugin/src/main/java/com/nirima/jenkins/plugins/docker/DockerSlave.java#L168 .

      If the REST API hangs due to whatever reason, the entire Queue hangs till the request gets interrupted somehow. We see it on one of the instances, where containers cannot be terminated sometimes.

      Queue hanging causes massive outage of the Jenkins functionality, including build scheduling and particular UI widgets.

      IMHO all calls to Docker Java REST API in the plugin should have the timeout specified. E.g. Yet Another Docker Plugin does it: https://github.com/KostyaSha/yet-another-docker-plugin/blob/6853301c885447ca31648d6cfa4861e6e272bf16/yet-another-docker-plugin/src/main/java/com/github/kostyasha/yad/commons/DockerStopContainer.java#L37-L41

      java.lang.Thread.State: RUNNABLE
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      at java.net.SocketInputStream.read(SocketInputStream.java:170)
      at java.net.SocketInputStream.read(SocketInputStream.java:141)
      at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
      at sun.security.ssl.InputRecord.read(InputRecord.java:503)
      at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
      - locked <0x0000000704c04ba8> (a java.lang.Object)
      at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
      at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
      - locked <0x0000000704c06bc8> (a sun.security.ssl.AppInputStream)
      at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
      at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
      at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:129)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:53)
      at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
      at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
      at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
      at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
      at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
      at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
      at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
      at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
      at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
      at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
      at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
      at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
      at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
      at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:681)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:437)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:343)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:31)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:12)
      at com.github.dockerjava.jaxrs.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:23)
      at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
      at com.github.dockerjava.core.command.StopContainerCmdImpl.exec(StopContainerCmdImpl.java:63)
      at com.nirima.jenkins.plugins.docker.DockerSlave._terminate(DockerSlave.java:168)
      at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:67)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1$1.run(DockerOnceRetentionStrategy.java:112)
      at hudson.model.Queue._withLock(Queue.java:1306)
      at hudson.model.Queue.withLock(Queue.java:1189)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1.run(DockerOnceRetentionStrategy.java:106)
      

      Attachments

        Issue Links

          Activity

            Code changed in jenkins
            User: Peter Darton
            Path:
            src/main/java/org/jenkinsci/plugins/vSphereCloud.java
            http://jenkins-ci.org/commit/vsphere-cloud-plugin/b9594afef5b74a99a65d0b9f426854e835346af8
            Log:
            Slave termination now deletes VMs asynchronously.

            JENKINS-42187 applies to us too; same cause, same fix.
            So we avoid trying to delete VMs in-line with the slave's
            terminate method and instead schedule deletion for later.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Peter Darton Path: src/main/java/org/jenkinsci/plugins/vSphereCloud.java http://jenkins-ci.org/commit/vsphere-cloud-plugin/b9594afef5b74a99a65d0b9f426854e835346af8 Log: Slave termination now deletes VMs asynchronously. JENKINS-42187 applies to us too; same cause, same fix. So we avoid trying to delete VMs in-line with the slave's terminate method and instead schedule deletion for later.
            oleg_nenashev Oleg Nenashev added a comment -

            jglick it could. It is what is being done in Yet Another Docker plugin.
            Timeout is just a minimal patch of course

            oleg_nenashev Oleg Nenashev added a comment - jglick it could. It is what is being done in Yet Another Docker plugin. Timeout is just a minimal patch of course
            jglick Jesse Glick added a comment -

            Could StopContainerCmdImpl.exec be run in a background thread, and DockerSlave._terminate return immediately?

            jglick Jesse Glick added a comment - Could StopContainerCmdImpl.exec be run in a background thread, and DockerSlave._terminate return immediately?

            People

              ndeloof Nicolas De Loof
              oleg_nenashev Oleg Nenashev
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: