
sh command hangs, no errors, pod hangs. sporadic.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Trivial
    • Component: kubernetes-plugin
    • Labels: None
    • Jenkins: 2.342
      OS: Linux - 5.4.0-1060-aws
      Kubernetes plugin: latest

      Jenkins runs behind NGINX with WebSockets. Kubernetes 1.21 on EKS.

      2 nodes:

      Windows node - JNLP connects and running PowerShell in a container works fine, but it hangs if a parallel Linux container runs; Jenkins then hangs and dies with error 404.

      Linux node - JNLP connects, but the sh command hangs and no error appears; sometimes Jenkins hangs with an error and fails. Sporadic: it works about 50% of the time, and sometimes it passes fine.

      The pod is not destroyed at all if it hangs. Sometimes error 500 appears, sometimes NoSuchMethodError.

      The problem started appearing after making the changes needed to support Windows containers; the changes were made to AWS roles.

      Terminating the Jenkins job does not work - Jenkins hangs until a full reboot.

      When running an Ubuntu container, Jenkins hangs after running sh:

      Cannot contact aws-nuke-227-hqk39-nm7mx-28qkh: java.lang.InterruptedException

       

       

          [JENKINS-68322] sh command hangs, no errors, pod hangs. sporadic.

          george f added a comment -

          Ping failed. Terminating the channel aws-nuke-227-hqk39-nm7mx-28qkh.
          java.util.concurrent.TimeoutException: Ping started at 1650532290436 hasn't completed by 1650532530436
              at hudson.remoting.PingThread.ping(PingThread.java:132)
              at hudson.remoting.PingThread.run(PingThread.java:88)

          Apr 21, 2022 9:20:30 AM WARNING jenkins.agents.WebSocketAgents$Session error
          null
          java.util.concurrent.TimeoutException: Idle timeout expired: 300001/300000 ms
          Caused: org.eclipse.jetty.websocket.api.CloseException
              at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onReadTimeout(AbstractWebSocketConnection.java:564)
              at org.eclipse.jetty.io.AbstractConnection.onFillInterestedFailed(AbstractConnection.java:172)
              at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillInterestedFailed(AbstractWebSocketConnection.java:539)
              at org.eclipse.jetty.io.AbstractConnection$ReadCallback.failed(AbstractConnection.java:317)
              at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:140)
              at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
              at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
              at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
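
          For anyone reading the log above, the two timeouts can be decoded with a little arithmetic. The following is a minimal Python sketch, not part of the original report; the helper name to_utc is made up for illustration, and the numbers are copied from the log. The time zone of the "Apr 21, 2022 9:20:30 AM" log line is not stated, so it is not compared against the epoch values here.

```python
from datetime import datetime, timedelta, timezone

# Epoch-millisecond values copied from the PingThread message in the log.
ping_started_ms = 1650532290436
ping_deadline_ms = 1650532530436

# Values copied from Jetty's "Idle timeout expired: 300001/300000 ms".
idle_elapsed_ms = 300001
idle_limit_ms = 300000


def to_utc(epoch_ms):
    """Convert epoch milliseconds to an aware UTC datetime (hypothetical helper)."""
    seconds, millis = divmod(epoch_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# How long the ping was allowed to run before the channel was terminated.
ping_wait_s = (ping_deadline_ms - ping_started_ms) / 1000

if __name__ == "__main__":
    print("ping started :", to_utc(ping_started_ms).isoformat())
    print("ping deadline:", to_utc(ping_deadline_ms).isoformat())
    print("ping wait    :", ping_wait_s, "s")
    print("idle limit   :", idle_limit_ms / 1000, "s")
```

          The ping deadline falls 240 s after the ping start, and the WebSocket connection was closed after 300 s without any data being read, which is consistent with the agent channel going silent rather than with a single slow command.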


          george f added a comment -

          On further investigation, the sh step gets stuck after several sh lines.

          Tested:

          echo this is test

          echo this is test

          echo this is test

          echo this is test

          echo this is test

          After 5 executions of echo, the job hangs.

          I think it somehow hits the ping timeout.
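
          One way to check whether the shell itself is at fault, rather than the Jenkins agent channel, is to run the same five commands directly inside the container with a timeout. The following is a minimal Python sketch under that assumption; the function name run_echo_lines and the 10-second timeout are made up for illustration and do not come from the original test.

```python
import subprocess


def run_echo_lines(count=5, timeout_s=10):
    """Run `echo this is test` `count` times via sh and return each output line.

    subprocess.TimeoutExpired is raised if any single command exceeds
    timeout_s, which would indicate a hang in the shell itself.
    """
    outputs = []
    for _ in range(count):
        result = subprocess.run(
            ["sh", "-c", "echo this is test"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    for number, line in enumerate(run_echo_lines(), start=1):
        print(number, line)
```

          If all five commands return promptly here while the pipeline still hangs, the shell is probably not the cause, and the agent connection or the plugin is the more likely place to look.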

           


          george f added a comment - - edited

          The problem is solved with a workaround, but the bug still exists.

          When the io.fabric8.kubernetes logger is added and set to DEBUG ALL, commands get stuck when sh (bash) starts in Linux containers (only): Jenkins fails completely with HTTP error 500, the ping check fails, and the pods are stuck and won't delete. A full Jenkins restart and removing the io.fabric8.kubernetes debug log solves the problem.


            Assignee: Unassigned
            Reporter: george f (george_f)
            Votes: 0
            Watchers: 1