Jenkins / JENKINS-58301

Kubernetes Plugin Repeated Socket Ping Timeout Exceptions

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: kubernetes-plugin
    • Environment: Jenkins version: 2.182
      Kubernetes Plugin: 1.16.2
      Kubernetes: v1.13.7-eks-c57ff8

    Description

      I frequently see failures like this:
      java.net.SocketTimeoutException: sent ping but didn't receive pong within 1000ms (after 330 successful ping/pongs)

      A single missed pong fails the entire task and makes the task hard to even cancel. Shouldn't the ping be retried rather than aborting execution? Our Jenkins also runs long-running tasks, so any task that breaks partway through is a real problem, and I don't see why one network hiccup after 330 successful ping/pongs (in this case) should be fatal.
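
The retry behaviour asked for above could look roughly like the following. This is only a sketch of the general technique (bounded retry on `SocketTimeoutException`), not how the Kubernetes plugin or kubernetes-client actually handles pings; the class `PingRetry`, the method `callWithRetry`, and its parameters are hypothetical names introduced for illustration.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper, NOT part of the Kubernetes plugin or kubernetes-client.
final class PingRetry {
    private PingRetry() {}

    /**
     * Runs the action and retries it only when it throws SocketTimeoutException.
     * At most maxAttempts attempts are made, with a fixed pause between attempts.
     * Any other exception is propagated immediately.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // every attempt timed out
    }
}
```

With such a wrapper, one timeout after 330 successful ping/pongs would cost a short pause instead of the whole build, while a connection that keeps timing out would still fail after `maxAttempts` tries.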


          Activity

            deiwin Deiwin Sarjas added a comment -

            We've also been seeing quite a bit of this recently on EKS with Jenkins 2.164.2, Kubernetes plugin 1.15.2.

            chengjingtao cheng jingtao added a comment -

            +1

            tyrone_grech Tyrone Grech added a comment -

            We are also encountering this issue fairly often in our CI system running:

            • On premises Kubernetes cluster on version 1.14.1
            • Jenkins version 2.186
            • Kubernetes Plugin version 1.17.2
            deiwin Deiwin Sarjas added a comment -

            We configured -Dkubernetes.websocket.ping.interval=30000 for Jenkins based on this comment on another issue. I'll report back if it helps or not.
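
For readers unfamiliar with `-D` flags: a flag such as `-Dkubernetes.websocket.ping.interval=30000` sets a JVM system property that code in the same process can read. The sketch below shows that mechanism with the standard `Long.getLong` call. The class `PingIntervalProperty` and the `fallbackMillis` parameter are hypothetical, and the assumption that the value is in milliseconds is inferred from the 30000 used above; this is not the kubernetes-client source.

```java
// Illustrative only: reads the system property named in the comment above.
final class PingIntervalProperty {
    private PingIntervalProperty() {}

    // Property name as given in the comment above.
    static final String PROPERTY = "kubernetes.websocket.ping.interval";

    /**
     * Returns the configured interval (assumed to be milliseconds), or
     * fallbackMillis when the property is unset or is not a valid number.
     */
    static long pingIntervalMillis(long fallbackMillis) {
        return Long.getLong(PROPERTY, fallbackMillis);
    }
}
```

Started with the flag above, `PingIntervalProperty.pingIntervalMillis(1000L)` returns 30000; without the flag it returns the fallback passed in.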


            juhtie01 Juha Tiensyrjä added a comment -

            That option helped for us. But the reason the pings started to fail was actually the JVM garbage collector, which caused the master to hang for more than 1 second. We switched from the default collector to G1GC to reduce the time the master is blocked, and this helped with other timeouts too.
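
To check which collector a JVM is running and how much time it has spent collecting, the standard `java.lang.management` API can be queried. The sketch below is a minimal example of that; the class `GcReport` and the method `describeCollectors` are hypothetical names, and the comment above does not say how the switch was made (on HotSpot, G1 is normally selected with `-XX:+UseG1GC`).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Jenkins or the Kubernetes plugin.
final class GcReport {
    private GcReport() {}

    /** Returns one line per collector: name, collection count, accumulated time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in ms
            // (or -1 if undefined); it is a total, not the longest single pause.
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }
}
```

Because these figures are totals, they only hint at pause length. Confirming a stall longer than the 1000 ms ping timeout would need GC logging or a similar per-pause measurement.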
            allan_burdajewicz Allan BURDAJEWICZ added a comment -

            I believe this issue has been resolved since the release of version 1.19.3, which uses kubernetes-client 4.6.0, in which the default ping interval is 30 seconds:

            • https://github.com/fabric8io/kubernetes-client/commit/2b1799497f46de81c841ea43808472d3239e7209#diff-7a4b549d7e10b88fbe20ebe680f6b25b
            • https://github.com/jenkinsci/kubernetes-plugin/commit/464320a012fa0fd47b92f3af3d0403afd22c41a5#diff-600376dffeb79835ede4a0b285078036
            • https://github.com/jenkinsci/kubernetes-client-api-plugin/blob/kubernetes-client-api-4.6.0-1/pom.xml#L20

            Maybe vlatombe can confirm?

            People

              Assignee: Unassigned
              Reporter: autarchprinceps autarch princeps
              Votes: 5
              Watchers: 9