JENKINS-44483

Large console logs cause infinite loops in slave

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • pipeline, remoting
    • Environment: We run Jenkins using the official Docker image version 2.46.2, Docker version 17.03.1-ce, Amazon EC2 plugin 1.36, Ubuntu 14.04, Oracle Java 1.8.0_131 for the slave.jar process

      I have just been investigating a problem in our Jenkins setup that I think might be related to JENKINS-25218. We're using the EC2 plugin and running builds that generate quite large logs (230 MB). At some point during the build, the master loses track of the log and just starts logging the same block of text from the log over and over, for as long as I let it. The build completes successfully on the slave and nothing bad appears in the node log in the Jenkins UI, but the master continues to fill up the filesystem with the same repeated text forever. I changed the build to log much less and now this isn't happening. We're running 2.46.2. Could this potentially be one of the edge cases?
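
      For illustration, a minimal sketch of the kind of change we made to "log much less"; this assumes a Declarative Pipeline, the agent label and build script name are placeholders, and sh/archiveArtifacts are standard Pipeline steps:

      // Sketch only: send the verbose tool output to a file so the console log
      // (and the remoting log stream back to the master) stays small.
      pipeline {
          agent { label 'ec2-slave' }   // placeholder label
          stages {
              stage('Build') {
                  steps {
                      // './run_build.sh' is a hypothetical build script
                      sh './run_build.sh > build-output.log 2>&1'
                  }
              }
          }
          post {
              always {
                  // Keep the full output available as an artifact instead of console text.
                  archiveArtifacts artifacts: 'build-output.log', allowEmptyArchive: true
              }
          }
      }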

          [JENKINS-44483] Large console logs cause infinite loops in slave

          Chris Phillips added a comment -

          We do use Pipeline. Another variable that might be in play is that we were using an EFS volume for the Jenkins home. We've since migrated away to using EBS. We were having pretty typical NFS-type problems, with the master getting hung up with a very high load average yet using no CPU and high network bandwidth.

          Since we reduced the log verbosity we haven't had the problem (even before we switched off EFS). I didn't see anything in the system logs when it happened. The thread dump wasn't the same as JENKINS-25218. It really appeared to be a livelock situation; the threads weren't stuck outright. I'll try to reproduce and take some thread dumps.
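
          For reference, a thread dump of the master can be taken from the built-in $JENKINS_URL/threadDump page; a rough script-console sketch using only standard JVM APIs would be something like:

          // Rough sketch for the Jenkins script console (standard JVM APIs only).
          def sb = new StringBuilder()
          Thread.getAllStackTraces().each { thread, stack ->
              sb.append("\"${thread.name}\" state=${thread.state}\n")
              stack.each { frame -> sb.append("    at ${frame}\n") }
              sb.append("\n")
          }
          println sb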


          Oleg Nenashev added a comment -

          Pipeline has its own log collection logic, hence I am not sure this is purely a Remoting issue. CC jglick


          Jesse Glick added a comment -

          Probably a dupe of JENKINS-37575. Will be obsolete as soon as I can merge JENKINS-38381.


          Chris Phillips added a comment -

          I was able to recreate the issue just now by turning the verbosity of the build back up. Nothing was blocked in the thread dumps.

          I'll keep an eye out for the resolution of JENKINS-38381 and try again then. It sounds promising.


          George Davis added a comment -

          Hi, is there an update on whether this issue was fixed by the release of JENKINS-38381? Could you confirm the release/build number? I am still seeing large log files on our Jenkins master for Pipeline jobs.


          sudhakar natarajan added a comment -

          I get this issue too. Our pipeline logs can be more than 250 MB. This happens for slaves that are in a different domain than the master; the same Jenkinsfile works on slaves within the same domain as the master.


          Jesse Glick added a comment -

          Likely a dupe.


          mishal shah added a comment -

          Is there any workaround to this issue?


          Paolo Parlapiano added a comment -

          Hi all,

          please help me with another occurrence of this issue: I've just moved a CI environment from Jenkins 2.204.2, remoting 3.36.1, and slaves launched manually via JNLP, to a new environment with Jenkins 2.249.1, remoting 4.5, and slaves launched using the EC2 plugin.

          Our jobs run on both Windows and Linux slaves. Since we moved to the new environment, we are almost regularly experiencing the very same issue on Windows slaves: endless jobs that collect several GB of repeated output from the slave. Looking at the slave box, the job has actually completed all of its work.

          Any idea why we are experiencing the same issue using the latest Jenkins LTS and remoting 4.5?

          Thanks

          Regards,

          Paolo


          Zbynek Konecny added a comment - edited

          jglick I get the same symptoms with a Mac agent (SSH). One thread on the controller has this stack trace:

          org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#5296]: checking /Users/Shared/Jenkins/Agent/workspace/iOS-UiTest/dev/ios on ios-test / waiting for ios-test id=352763
          java.lang.Object.wait(Native Method)
          hudson.remoting.Request.call(Request.java:177)
          hudson.remoting.Channel.call(Channel.java:1000)
          hudson.FilePath.act(FilePath.java:1164)
          hudson.FilePath.act(FilePath.java:1153)
          hudson.FilePath.isDirectory(FilePath.java:1778)
          org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getWorkspace(DurableTaskStep.java:383)
          org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:566)
          org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:549)
          java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          java.util.concurrent.FutureTask.run(FutureTask.java:266)
          java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          java.lang.Thread.run(Thread.java:748)
          

          Using latest weekly, all plugins updated to latest versions. Any ideas how to debug this further?


            Assignee: Unassigned
            Reporter: Chris Phillips
            Votes: 4
            Watchers: 17
