
sh command hangs, no errors, pod hangs. sporadic.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Trivial
    • Component: kubernetes-plugin
    • Labels: None
    • Jenkins: 2.342
      OS: Linux - 5.4.0-1060-aws
      Kubernetes plugin: latest

      Jenkins runs behind NGINX with WebSockets. Kubernetes 1.21 on EKS.

      2 nodes:

      Windows node - JNLP connects and running PowerShell in the container works fine, but it hangs if a parallel Linux container runs; Jenkins then hangs and dies with error 404.

      Linux node - JNLP connects, then the sh command hangs and no error appears. Sometimes Jenkins hangs with an error and the build fails. The problem is sporadic: about 50% of runs pass fine.

      The pod is not destroyed at all if it hangs. Sometimes error 500 appears, sometimes NoSuchMethodError.

      The problem started appearing after making the changes needed to run Windows containers; the changes were made to AWS roles.

      Terminating the Jenkins job does not work - Jenkins hangs until a full reboot.

      When running an Ubuntu container, Jenkins hangs after running sh:

      Cannot contact aws-nuke-227-hqk39-nm7mx-28qkh: java.lang.InterruptedException

          [JENKINS-68322] sh command hangs, no errors, pod hangs. sporadic.

          george f created issue -
          george f made changes -
          Summary Original: sh command hangs, no erros, pod hangs. sporadic. New: sh command hangs, no errors, pod hangs. sporadic.

          george f added a comment -

          Ping failed. Terminating the channel aws-nuke-227-hqk39-nm7mx-28qkh.
          java.util.concurrent.TimeoutException: Ping started at 1650532290436 hasn't completed by 1650532530436
              at hudson.remoting.PingThread.ping(PingThread.java:132)
              at hudson.remoting.PingThread.run(PingThread.java:88)
          Apr 21, 2022 9:20:30 AM WARNING jenkins.agents.WebSocketAgents$Session error
          null
          java.util.concurrent.TimeoutException: Idle timeout expired: 300001/300000 ms
          Caused: org.eclipse.jetty.websocket.api.CloseException
              at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onReadTimeout(AbstractWebSocketConnection.java:564)
              at org.eclipse.jetty.io.AbstractConnection.onFillInterestedFailed(AbstractConnection.java:172)
              at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillInterestedFailed(AbstractWebSocketConnection.java:539)
              at org.eclipse.jetty.io.AbstractConnection$ReadCallback.failed(AbstractConnection.java:317)
              at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:140)
              at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
              at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
              at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
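
For anyone reading the log above, the two epoch-millisecond values in the "Ping failed" message and the two numbers in the Jetty idle-timeout message can be decoded with a few lines of Python. This is only a reading aid, not part of the original report; the constant and function names are made up for illustration, and the comparison with the `9:20:30 AM` log line assumes the controller logs in UTC, which the report does not state.

```python
from datetime import datetime, timezone

# Epoch-millisecond values copied from the "Ping failed" message above.
PING_STARTED_MS = 1650532290436
PING_DEADLINE_MS = 1650532530436

# Numbers copied from "Idle timeout expired: 300001/300000 ms".
IDLE_ELAPSED_MS = 300001
IDLE_LIMIT_MS = 300000


def to_utc(epoch_ms: int) -> str:
    """Format an epoch-millisecond timestamp as an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.isoformat(timespec="seconds")


# How long the ping was allowed to run before the channel was terminated.
ping_wait_s = (PING_DEADLINE_MS - PING_STARTED_MS) / 1000

if __name__ == "__main__":
    print("ping started :", to_utc(PING_STARTED_MS))
    print("ping deadline:", to_utc(PING_DEADLINE_MS))
    print("ping wait    :", ping_wait_s, "s")
    print("idle limit   :", IDLE_LIMIT_MS / 1000, "s, exceeded by",
          IDLE_ELAPSED_MS - IDLE_LIMIT_MS, "ms")
```

Run as a script, this shows the ping starting at 09:11:30 UTC with a 240-second deadline at 09:15:30 UTC, and a WebSocket idle limit of 300 seconds. If the controller clock is UTC, the idle-timeout warning at 9:20:30 AM comes exactly 300 seconds after the ping deadline, which would be consistent with the connection going silent around the time the ping failed.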

          george f added a comment -

          On further investigation, the sh step gets stuck after several sh lines.

          Tested:

          echo this is test

          echo this is test

          echo this is test

          echo this is test

          echo this is test

          After 5 executions of echo, the job hangs.

          I think it hits the ping timeout somehow.

          george f made changes -
          Priority Original: Major [ 3 ] New: Trivial [ 5 ]
          george f made changes -
          Comment [ After reading this article, I think I managed to solve it: the 5-second timeout (the Jenkins default) for the EKS API is too short. I extended everything to 120 seconds and it seems to work now.

          [https://support.cloudbees.com/hc/en-us/articles/360054642231-Considerations-for-Kubernetes-Clients-Connections-when-using-Kubernetes-Plugin] ]

            Assignee: Unassigned
            Reporter: george f (george_f)
            Votes: 0
            Watchers: 1