Jenkins / JENKINS-58463

Job build failed by "Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API"

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Plugin version: 1.16.3, CentOS; Jenkins started with Java option: -Dorg.csanchez.jenkins.plugins.kubernetes.clients.cacheExpiration=2592000
      Kubernetes: AWS EKS

       Jobs fail randomly with the message "Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API".

      The total-requests metric reached 64 when the issue happened.

      In the kubernetes-plugin code, the k8s client's maxRequests appears to be hard-coded to 64. No configuration can change it, and the Java system property kubernetes.max.concurrent.requests does not change it either.

      The issue occurs even when MaxConcurrentRequestsPerHost is set to a large number.

      After changing MaxConcurrentRequestsPerHost in Jenkins, the issue sometimes disappears, but it recurs after running for a period. This is likely because the k8s API client is recreated when the configuration changes.

      Would it be possible to add a configuration option for the k8s client's maxRequests?
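      The cap described here matches OkHttp's default Dispatcher limit of 64 concurrent requests. As a minimal, stdlib-only sketch (an illustrative model, not the plugin's actual code), a dispatcher with a fixed permit count behaves like this: once 64 requests are in flight, every additional request — including a websocket connection attempt — must wait, and an interrupted wait surfaces as the error above.

```java
import java.util.concurrent.Semaphore;

/**
 * Illustrative model of a dispatcher with a hard concurrency cap,
 * similar in spirit to OkHttp's default maxRequests = 64.
 * This is a sketch for reasoning about the symptom, not plugin code.
 */
class BoundedDispatcherDemo {
    private final Semaphore permits;

    BoundedDispatcherDemo(int maxRequests) {
        this.permits = new Semaphore(maxRequests);
    }

    /** Try to start a request without blocking; false means it queues. */
    boolean tryStartRequest() {
        return permits.tryAcquire();
    }

    /** Release a slot when a request completes. */
    void finishRequest() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedDispatcherDemo dispatcher = new BoundedDispatcherDemo(64);
        int started = 0;
        // Simulate 70 long-lived requests (e.g. pod watches) arriving at
        // once, with none of them finishing.
        for (int i = 0; i < 70; i++) {
            if (dispatcher.tryStartRequest()) started++;
        }
        // Only 64 run; the other 6 wait, and a waiting websocket
        // connection that gets interrupted produces the reported error.
        System.out.println("running=" + started + " queued=" + (70 - started));
        // prints: running=64 queued=6
    }
}
```

With long-lived watch requests occupying slots, even a modest number of concurrent agent launches can exhaust a fixed cap, which is why making the limit configurable (as the fix later did) resolves the symptom.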

       

          [JENKINS-58463] Job build failed by "Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API"

          David Schott added a comment - edited

          The PR build has passed and the artifact is available here if anyone wants to validate the new behavior.

          Allan BURDAJEWICZ added a comment -

          For those impacted, would you be able to run a script like dumpK8sClientsRequests.groovy under Manage Jenkins > Script Console to check what kind of requests are queuing and monopolizing the k8s clients' dispatchers?

          The fix brought here will help raise the limit and work around the problem, but I think it is still valuable to investigate what causes the number of concurrent requests to reach high numbers.

          Allan BURDAJEWICZ added a comment -

          shott85 Should we mark this as solved (the ability to set the max concurrent requests was released in 1.27.3)?

          Alec added a comment -

          What's the best way to get this to folks? My output is about 400 lines.

          David Schott added a comment -

          Released in 1.27.3

          David Schott added a comment -

          akloss if you have a file to share I think you can attach it to the ticket, or use https://gist.github.com (or similar) if you wish.

          Alec added a comment -

          Attached. I replaced parts of a bunch of the names with X to obfuscate the job names. I hope that's not too disorienting.

          Alec added a comment -

          How large a number should we expect to set this? The Helm chart for Jenkins has it semi-hard-coded at 32, which doesn't seem sufficient for us.

          Allan BURDAJEWICZ added a comment -

          akloss what version of the kubernetes plugin are you using?
          Maybe shott85 can confirm, but I think those /pods?fieldSelector and /events?fieldSelector requests are watches that wait for pods to be ready and maybe connected: https://github.com/jenkinsci/kubernetes-plugin/blob/kubernetes-1.28.4/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesLauncher.java#L158-L163. This would be the case if the provisioning of many agents was triggered at once. If such a build storm is expected in the environment, then raising the request limit might be the best option.

          Alec added a comment -

          We're using 1.27.4. It seems unfortunate that even though we have the build concurrency limit set to 50, we need more connections than that. This is a new Jenkins building a handful of existing repositories, some of which have many branches.

            Assignee: David Schott (shott85)
            Reporter: He Bihong (bibline)
            Votes: 0
            Watchers: 8
