  1. Jenkins
  2. JENKINS-42187

Docker plugin causes queue hanging in the case of hanging requests

Details

    Description

      Stacktrace explanation: When DockerOnceRetentionStrategy runs, it locks the Jenkins Queue. If it decides to terminate a Cloud agent, DockerSlave#_terminate() gets invoked. This method makes a REST API call using docker-java, and that call has no timeout: https://github.com/jenkinsci/docker-plugin/blob/master/docker-plugin/src/main/java/com/nirima/jenkins/plugins/docker/DockerSlave.java#L168 .

      If the REST API call hangs for any reason, the entire Queue hangs until the request is interrupted somehow. We see this on one of the instances, where containers sometimes cannot be terminated.

      A hanging Queue causes a massive outage of Jenkins functionality, including build scheduling and certain UI widgets.

      IMHO all calls to the Docker Java REST API in the plugin should specify a timeout. For example, Yet Another Docker Plugin does this: https://github.com/KostyaSha/yet-another-docker-plugin/blob/6853301c885447ca31648d6cfa4861e6e272bf16/yet-another-docker-plugin/src/main/java/com/github/kostyasha/yad/commons/DockerStopContainer.java#L37-L41

      java.lang.Thread.State: RUNNABLE
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      at java.net.SocketInputStream.read(SocketInputStream.java:170)
      at java.net.SocketInputStream.read(SocketInputStream.java:141)
      at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
      at sun.security.ssl.InputRecord.read(InputRecord.java:503)
      at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
      - locked <0x0000000704c04ba8> (a java.lang.Object)
      at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
      at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
      - locked <0x0000000704c06bc8> (a sun.security.ssl.AppInputStream)
      at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
      at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
      at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:129)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:53)
      at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
      at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
      at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
      at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
      at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
      at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
      at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
      at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
      at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
      at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
      at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
      at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
      at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
      at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:681)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:437)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:343)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:31)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:12)
      at com.github.dockerjava.jaxrs.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:23)
      at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
      at com.github.dockerjava.core.command.StopContainerCmdImpl.exec(StopContainerCmdImpl.java:63)
      at com.nirima.jenkins.plugins.docker.DockerSlave._terminate(DockerSlave.java:168)
      at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:67)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1$1.run(DockerOnceRetentionStrategy.java:112)
      at hudson.model.Queue._withLock(Queue.java:1306)
      at hudson.model.Queue.withLock(Queue.java:1189)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1.run(DockerOnceRetentionStrategy.java:106)
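
To illustrate the kind of bound the description asks for, here is a minimal sketch that caps how long a caller waits for a blocking call. It is not the plugin's code and not the docker-java API: `BoundedCall` and `callWithTimeout` are hypothetical names, and the remote call is represented by a plain `Callable`. Note that cancelling the worker only interrupts it; a thread blocked in a socket read, as in the trace above, may not react to the interrupt, so a read timeout configured on the HTTP client itself would likely still be needed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not part of docker-plugin or docker-java. */
class BoundedCall {
    // Daemon threads, so a stuck call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on a worker thread and waits at most timeoutMillis for its result. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a blocked socket read may ignore the interrupt.
            future.cancel(true);
            throw e;
        }
    }
}
```

With a wrapper like this, the thread holding the Queue lock would give up after the timeout instead of waiting indefinitely, at the cost of possibly leaving a worker thread stuck in the background.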
      

          Activity

            oleg_nenashev Oleg Nenashev created issue -
            jglick Jesse Glick added a comment -

            Could StopContainerCmdImpl.exec be run in a background thread, and DockerSlave._terminate return immediately?

            oleg_nenashev Oleg Nenashev added a comment -

            jglick it could. That is what Yet Another Docker Plugin does.
            A timeout is just a minimal patch, of course.
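
The background-thread approach discussed in these two comments can be sketched as follows. This is an illustration under assumptions, not the code of either plugin: `AsyncTerminator`, `terminate`, and `slowStop` are hypothetical names, and the slow remote stop is represented by a plain `Runnable`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for an agent whose cleanup involves a slow remote call. */
class AsyncTerminator {
    // Single daemon worker that performs cleanup outside the caller's thread.
    private static final ExecutorService CLEANUP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "agent-cleanup");
        t.setDaemon(true);
        return t;
    });

    /**
     * Schedules the slow stop and returns immediately, so a caller that holds
     * a lock (the Queue lock in this issue) is not blocked by the remote call.
     */
    static Future<?> terminate(Runnable slowStop) {
        return CLEANUP.submit(slowStop);
    }
}
```

Because the caller no longer waits for the result, any failure of the stop request would have to be logged or retried inside the submitted task. A single worker also means one hung stop delays later cleanups, so combining this with a timeout still seems worthwhile.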


            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Peter Darton
            Path:
            src/main/java/org/jenkinsci/plugins/vSphereCloud.java
            http://jenkins-ci.org/commit/vsphere-cloud-plugin/b9594afef5b74a99a65d0b9f426854e835346af8
            Log:
            Slave termination now deletes VMs asynchronously.

            JENKINS-42187 applies to us too; same cause, same fix.
            So we avoid trying to delete VMs in-line with the slave's
            terminate method and instead schedule deletion for later.

            ndeloof Nicolas De Loof made changes -
            Field Original Value New Value
            Assignee magnayn [ magnayn ] Nicolas De Loof [ ndeloof ]
            ndeloof Nicolas De Loof made changes -
            Link This issue is duplicated by JENKINS-38369 [ JENKINS-38369 ]
            ndeloof Nicolas De Loof made changes -
            Remote Link This issue links to "PR (Web Link)" [ 17672 ]
            ndeloof Nicolas De Loof made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal OSS-2010 (Web Link)" [ 18460 ]

            People

              ndeloof Nicolas De Loof
              oleg_nenashev Oleg Nenashev
              Votes: 1
              Watchers: 5
