
Kubernetes Plugin Repeated Socket Ping Timeout Exceptions

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: kubernetes-plugin
    • Jenkins version: 2.182
      Kubernetes Plugin: 1.16.2
      Kubernetes: v1.13.7-eks-c57ff8

      I often get issues like this:
      java.net.SocketTimeoutException: sent ping but didn't receive pong within 1000ms (after 330 successful ping/pongs)

      A single failure like this breaks the entire task and makes it hard to even cancel the task. Should this not be retried rather than breaking execution? Our Jenkins will run longer-running tasks as well. Any task that stops in the middle is a real problem, and I don't see why one network failure after 330 successful pings (in this case) should be fatal.
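
To make the retry idea above concrete, here is a minimal Java sketch of retrying an operation only when it fails with a SocketTimeoutException. It is not the plugin's code and does not show how the plugin handles pings. The class name RetryOnTimeout and the parameters maxAttempts and backoffMillis are hypothetical names chosen for illustration.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryOnTimeout {

    /**
     * Runs the action and retries it only when it fails with SocketTimeoutException.
     * maxAttempts and backoffMillis are hypothetical knobs, not plugin settings.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SocketTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait a little longer after each failed attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // Every attempt timed out, so surface the last timeout to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: times out twice, then succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException(
                        "sent ping but didn't receive pong within 1000ms");
            }
            return "pong";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Any other exception type propagates immediately, so only the transient timeout case is retried. Whether a retry is safe depends on the operation being repeatable, which this sketch simply assumes.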

          [JENKINS-58301] Kubernetes Plugin Repeated Socket Ping Timeout Exceptions

          Deiwin Sarjas added a comment -

          We've also been seeing quite a bit of this recently on EKS with Jenkins 2.164.2, Kubernetes plugin 1.15.2.

          cheng jingtao added a comment -

          +1

          Tyrone Grech added a comment -

          We are also encountering this issue fairly often in our CI system running:

          • On premises Kubernetes cluster on version 1.14.1
          • Jenkins version 2.186
          • Kubernetes Plugin version 1.17.2

          Deiwin Sarjas added a comment -

          We configured -Dkubernetes.websocket.ping.interval=30000 for Jenkins based on this comment on another issue. I'll report back if it helps or not.
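
The -D flag in the comment above is an ordinary JVM system property, so it has to be on the command line of the Jenkins JVM itself (for example before -jar in the java invocation). A small sketch, assuming only the property name quoted above, of how one might confirm the value actually reached a JVM. The class name PingIntervalCheck and the fallback value are hypothetical.

```java
public class PingIntervalCheck {

    // Property name as quoted in the comment above.
    static final String PROPERTY = "kubernetes.websocket.ping.interval";

    /**
     * Returns the configured interval in milliseconds, or fallbackMillis
     * when the property is unset or is not a valid number.
     */
    static long configuredIntervalMillis(long fallbackMillis) {
        return Long.getLong(PROPERTY, fallbackMillis);
    }

    public static void main(String[] args) {
        long interval = configuredIntervalMillis(-1L);
        if (interval < 0) {
            System.out.println(PROPERTY + " is not set on this JVM");
        } else {
            System.out.println(PROPERTY + " = " + interval + " ms");
        }
    }
}
```

With Java 11 or later this can be tried as `java -Dkubernetes.websocket.ping.interval=30000 PingIntervalCheck.java`. The same System.getProperty lookup should also work from inside a running Jenkins, though that is not shown here.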

          Juha Tiensyrjä added a comment -

          That option helped us. However, the reason the pings started to fail was actually the JVM garbage collector, which caused the master to hang for more than 1 second. We switched from the default collector to G1GC to reduce the time the master is blocked, and this helped with other timeouts too.
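
As a rough way to check which collector a JVM is running and how much time it has spent collecting, the standard java.lang.management API can be queried. This is a sketch, not a diagnosis tool: the class name GcReport is hypothetical, and the reported time is a cumulative approximation, so it cannot show whether any single pause exceeded the 1000 ms ping timeout. GC logs are needed for that.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {

    /** Builds one summary line per garbage collector registered in this JVM. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds.
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

When the JVM is started with -XX:+UseG1GC, the collector names printed here typically begin with "G1", which gives a quick confirmation that the switch took effect.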

          Allan BURDAJEWICZ added a comment -

          I believe that this issue is resolved since the release of version 1.19.3 that uses kubernetes-client 4.6.0, in which the default ping interval is 30 seconds:

          • https://github.com/fabric8io/kubernetes-client/commit/2b1799497f46de81c841ea43808472d3239e7209#diff-7a4b549d7e10b88fbe20ebe680f6b25b
          • https://github.com/jenkinsci/kubernetes-plugin/commit/464320a012fa0fd47b92f3af3d0403afd22c41a5#diff-600376dffeb79835ede4a0b285078036
          • https://github.com/jenkinsci/kubernetes-client-api-plugin/blob/kubernetes-client-api-4.6.0-1/pom.xml#L20

          Maybe vlatombe can confirm?

            Assignee: Unassigned
            Reporter: autarch princeps
            Votes: 5
            Watchers: 9
