JENKINS-58513

Flake in RestartPipelineTest#terminatedPodAfterRestart

      Description

      I'm getting test failures from time to time with the following error.

      Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 227.45 s <<< FAILURE! - in org.csanchez.jenkins.plugins.kubernetes.pipeline.RestartPipelineTest
       [ERROR] terminatedPodAfterRestart(org.csanchez.jenkins.plugins.kubernetes.pipeline.RestartPipelineTest)  Time elapsed: 52.479 s  <<< FAILURE!
       java.lang.AssertionError: 
       
       Expected: a string containing " was deleted, but do not have a node body to cancel"
            but: was "Started
       Running in Durability level: MAX_SURVIVABILITY
       [Pipeline] Start of Pipeline
       [Pipeline] podTemplate
       [Pipeline] {
       [Pipeline] node
       Agent terminatedpodafterrestart-kjsfm-7krt9 is provisioned from template Kubernetes Pod Template
       Agent specification [Kubernetes Pod Template] (terminatedPodAfterRestart): 
       ---
       apiVersion: "v1"
       kind: "Pod"
       metadata:
         annotations:
           buildUrl: "http://100.96.12.163:44000/jenkins/job/terminated%20Pod%20After%20Restart/1/"
         labels:
           jenkins: "slave"
           BUILD_NUMBER: "2"
           test: "terminatedPodAfterRestart"
           jenkins/terminatedPodAfterRestart: "true"
           BRANCH_NAME: "PR-546"
           class: "RestartPipelineTest"
         name: "terminatedpodafterrestart-kjsfm-7krt9"
       spec:
         containers:
         - command:
           - "/bin/cat"
           env:
           - name: "JENKINS_SECRET"
             value: "984e0b6a03b49c6ad8c2c8e24cef61e2dd279e885c46cb7278e0f56d8b920880"
           - name: "JENKINS_AGENT_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_URL"
             value: "http://100.96.12.163:44000/jenkins/"
           image: "busybox"
           imagePullPolicy: "IfNotPresent"
           name: "busybox"
           resources:
             limits: {}
             requests: {}
           securityContext:
             privileged: false
           tty: true
           volumeMounts:
           - mountPath: "/home/jenkins"
             name: "workspace-volume"
             readOnly: false
           workingDir: "/home/jenkins"
         - env:
           - name: "JENKINS_SECRET"
             value: "984e0b6a03b49c6ad8c2c8e24cef61e2dd279e885c46cb7278e0f56d8b920880"
           - name: "JENKINS_AGENT_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_URL"
             value: "http://100.96.12.163:44000/jenkins/"
           image: "jenkins/jnlp-slave:alpine"
           name: "jnlp"
           volumeMounts:
           - mountPath: "/home/jenkins"
             name: "workspace-volume"
             readOnly: false
         nodeSelector: {}
         restartPolicy: "Never"
         volumes:
         - emptyDir: {}
           name: "workspace-volume"
       
       Running on terminatedpodafterrestart-kjsfm-7krt9 in /home/jenkins/workspace/terminated Pod After Restart
       [Pipeline] {
       [Pipeline] container
       [Pipeline] {
       [Pipeline] sh
       + sleep 9999999
       Resuming build at Tue Jul 16 11:34:37 UTC 2019 after Jenkins restart
       Waiting to resume part of terminated Pod After Restart #1: In the quiet period. Expires in 0 ms
       Still trying to load Looking for path named ‘/home/jenkins/workspace/terminated Pod After Restart’ on computer named ‘terminatedpodafterrestart-kjsfm-7krt9’
       Waiting to resume part of terminated Pod After Restart #1: ‘terminatedpodafterrestart-kjsfm-7krt9’ is offline
       Still trying to load Looking for path named ‘/home/jenkins/workspace/terminated Pod After Restart’ on computer named ‘terminatedpodafterrestart-kjsfm-7krt9’
       Waiting to resume part of terminated Pod After Restart #1: ‘terminatedpodafterrestart-kjsfm-7krt9’ is offline
       Still trying to load Looking for path named ‘/home/jenkins/workspace/terminated Pod After Restart’ on computer named ‘terminatedpodafterrestart-kjsfm-7krt9’
       Waiting to resume part of terminated Pod After Restart #1: ‘terminatedpodafterrestart-kjsfm-7krt9’ is offline
       Still trying to load Looking for path named ‘/home/jenkins/workspace/terminated Pod After Restart’ on computer named ‘terminatedpodafterrestart-kjsfm-7krt9’
       Ready to run at Tue Jul 16 11:34:49 UTC 2019
       Agent terminatedpodafterrestart-kjsfm-7krt9 is provisioned from template Kubernetes Pod Template
       Agent specification [Kubernetes Pod Template] (terminatedPodAfterRestart): 
       ---
       apiVersion: "v1"
       kind: "Pod"
       metadata:
         annotations:
           buildUrl: "http://100.96.12.163:44000/jenkins/job/terminated%20Pod%20After%20Restart/1/"
         labels:
           jenkins: "slave"
           BUILD_NUMBER: "2"
           test: "terminatedPodAfterRestart"
           jenkins/terminatedPodAfterRestart: "true"
           BRANCH_NAME: "PR-546"
           class: "RestartPipelineTest"
         name: "terminatedpodafterrestart-kjsfm-7krt9"
       spec:
         containers:
         - command:
           - "/bin/cat"
           env:
           - name: "JENKINS_SECRET"
             value: "984e0b6a03b49c6ad8c2c8e24cef61e2dd279e885c46cb7278e0f56d8b920880"
           - name: "JENKINS_AGENT_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_URL"
             value: "http://localhost:44000/jenkins/"
           image: "busybox"
           imagePullPolicy: "IfNotPresent"
           name: "busybox"
           resources:
             limits: {}
             requests: {}
           securityContext:
             privileged: false
           tty: true
           volumeMounts:
           - mountPath: "/home/jenkins"
             name: "workspace-volume"
             readOnly: false
           workingDir: "/home/jenkins"
         - env:
           - name: "JENKINS_SECRET"
             value: "984e0b6a03b49c6ad8c2c8e24cef61e2dd279e885c46cb7278e0f56d8b920880"
           - name: "JENKINS_AGENT_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_NAME"
             value: "terminatedpodafterrestart-kjsfm-7krt9"
           - name: "JENKINS_URL"
             value: "http://localhost:44000/jenkins/"
           image: "jenkins/jnlp-slave:alpine"
           name: "jnlp"
           volumeMounts:
           - mountPath: "/home/jenkins"
             name: "workspace-volume"
             readOnly: false
         nodeSelector: {}
         restartPolicy: "Never"
         volumes:
         - emptyDir: {}
           name: "workspace-volume"
       
       terminatedpodafterrestart-kjsfm-7krt9 was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
       	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
       	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
       	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
       	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
       	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
       	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
       	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
       	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
       	at hudson.remoting.Channel.close(Channel.java:1452)
       	at hudson.remoting.Channel.close(Channel.java:1405)
       	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:844)
       	at hudson.slaves.SlaveComputer.kill(SlaveComputer.java:811)
       	at hudson.model.AbstractCIBase.killComputer(AbstractCIBase.java:89)
       	at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:233)
       	at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1576)
       	at jenkins.model.Nodes$6.run(Nodes.java:271)
       	at hudson.model.Queue._withLock(Queue.java:1379)
       	at hudson.model.Queue.withLock(Queue.java:1256)
       	at jenkins.model.Nodes.removeNode(Nodes.java:262)
       	at jenkins.model.Jenkins.removeNode(Jenkins.java:2092)
       	at org.csanchez.jenkins.plugins.kubernetes.pod.retention.Reaper.eventReceived(Reaper.java:122)
       	at org.csanchez.jenkins.plugins.kubernetes.pod.retention.Reaper.eventReceived(Reaper.java:48)
       	at io.fabric8.kubernetes.client.utils.WatcherToggle.eventReceived(WatcherToggle.java:49)
       	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:232)
       	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
       	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
       	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
       	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
       	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
       	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
       	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       	at java.lang.Thread.run(Thread.java:748)
       
       Cannot contact terminatedpodafterrestart-kjsfm-7krt9: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException
       [Pipeline] }
       [Pipeline] // container
       [Pipeline] }
       [Pipeline] // node
       [Pipeline] // node
       [Pipeline] }
       [Pipeline] // podTemplate
       [Pipeline] End of Pipeline
       Agent was removed
       Finished: ABORTED
       "
       	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
       	at org.junit.Assert.assertThat(Assert.java:956)
       	at org.junit.Assert.assertThat(Assert.java:923)
       	at org.jvnet.hudson.test.JenkinsRule.assertLogContains(JenkinsRule.java:1387)
       	at org.csanchez.jenkins.plugins.kubernetes.pipeline.RestartPipelineTest.lambda$terminatedPodAfterRestart$5(RestartPipelineTest.java:218)
       	at org.jvnet.hudson.test.RestartableJenkinsRule$3.evaluate(RestartableJenkinsRule.java:232)
       	at org.jvnet.hudson.test.RestartableJenkinsRule$6.evaluate(RestartableJenkinsRule.java:272)
       	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:599)
       	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
       	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
       	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       	at java.lang.Thread.run(Thread.java:748) 

            Activity

            jglick Jesse Glick added a comment -

            kubernetes #561 removed the failing assertion. Still need to track down the reason.

            vlatombe Vincent Latombe added a comment - edited

            Jesse Glick, while attempting to track down the root cause, I saw that in the failure cases it was running this line. Since it runs asynchronously, this is probably a race condition with the thread handling node deletion.
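
To illustrate the kind of race suspected above, here is a minimal, self-contained Java sketch. It is not the plugin's code: the class `NodeDeletionRace`, the `body` reference, the agent name `agent-1`, and the assumption that the deletion handler logs one of two messages depending on whether a node body exists yet are all hypothetical, chosen only to mirror the two outcomes visible in the log (the expected " was deleted, but do not have a node body to cancel" versus the observed "Agent was removed"). A latch forces each interleaving so both outcomes can be reproduced on demand; in a real race the order would be left to the scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class NodeDeletionRace {

    /**
     * Runs a hypothetical "resume" thread and a hypothetical "deletion" thread once.
     * The flag picks which of the two possible interleavings is forced.
     */
    public static List<String> run(boolean resumeBeforeDeletion) {
        List<String> log = new CopyOnWriteArrayList<>();
        // Stays null until the asynchronous resume has run (assumed model).
        AtomicReference<String> body = new AtomicReference<>();
        CountDownLatch resumed = new CountDownLatch(1);

        // Stand-in for the asynchronous work that restores the node body after restart.
        Thread resume = new Thread(() -> {
            body.set("node body");
            resumed.countDown();
        });

        // Stand-in for the thread that handles node deletion.
        Thread deletion = new Thread(() -> {
            try {
                if (resumeBeforeDeletion) {
                    resumed.await(); // force "resume wins the race"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (body.getAndSet(null) == null) {
                // No body yet: the outcome the test asserted on.
                log.add("agent-1 was deleted, but do not have a node body to cancel");
            } else {
                // Body already present: it is cancelled instead.
                log.add("Agent was removed");
            }
        });

        try {
            deletion.start();
            if (!resumeBeforeDeletion) {
                deletion.join(); // force "deletion wins the race"
            }
            resume.start();
            resume.join();
            deletion.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Under this assumed model, an assertion on the first message only holds when deletion is processed before the asynchronous resume, which would match a test that passes most of the time and fails occasionally.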

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              vlatombe Vincent Latombe