  1. Jenkins
  2. JENKINS-37575

Delays in FileMonitoringTask.WriteLog can cause process output to be resent indefinitely


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • durable-task-plugin
    • Windows Server 2012 R2
      Jenkins 2.7.2
    • durable-task 1.23

      Problem

      Sometimes, after a certain number of builds, a build never ends on the node.
      It keeps sending the same 20-30 lines forever.
      The problem seems to occur more often when I restart Jenkins while some tasks are running, but I'm not sure.
      It also seems to be linked to high load or heavy network traffic on the master/node.
      We can see the difference between the timestamper date (set when the line is received by the master) and the log date (written during the PowerShell execution).

      Some log files grow beyond 10GB before I kill the process.
      (Yes, they really are stored in JENKINS_HOME.)

      I think it's worth mentioning that I have 3 Jenkins nodes on the same machine, and that my JENKINS_HOME is located on a network drive (CIFS/SMB).

      Investigation

      Steps

      Using Wireshark, I found that the node keeps sending the same logs forever.
      So the Jenkins master is not (directly) the culprit.
      After attaching a debugger to the slave, I found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop, somewhere in this file:
      durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java
      The same file is read again and again with a lastLocation of 1930670. lastLocation represents the number of bytes already read, but I don't understand why it doesn't increase.
      The process has terminated and the log file is no bigger than 3MB (its size can be seen in the upper left corner of the screenshot).

      Update 1: It seems that Jenkins reads the whole file. If the read fails, it returns 0. I suspect that Jenkins fails to close the file descriptor, so lastLocation is not updated even though the data is sent. Jenkins then retries reading the file, fails again, and so on. This is only a supposition for now.

      Update 2: The problem seems to come from the network: I captured a java.io.InterruptedIOException in this loop, in hudson.remoting.ProxyOutputStream.

      Update 3: It seems that the Jenkins master is at fault. I attached my debugger to it. The error occurs when it tries to write the log in its JENKINS_HOME, when executing the line highlighted in green on the following screenshot.

      The error is caught in DurableTaskStep$Execution.check, as it appears to be a workspace error. It seems that Jenkins can't find the workspace folder, because it looks for the Jenkins node's workspace in its own local file system: C:\\ci\\int12\nocoint... So it saves the log but interrupts the task, and tells the slave that the task was interrupted. The slave thinks that the logs have not been saved and resends them to the master, which again doesn't find the node workspace in its local file system, and so on.

      Update 4: It seems that when the master deserializes the node's response to a WriteLog call, it returns a null object instead of the logs...
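
The resend loop described in the updates above can be modelled in a few lines. This is only a toy sketch of one reading of the symptoms, not the plugin's code: `ResendLoopDemo`, `FlakySink` and `poll` are hypothetical names, and the sink is assumed to store the bytes it receives before reporting an interruption to the caller. Because the cursor is only advanced when the write returns normally, every failed attempt leaves a full copy of the same bytes on the receiving side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Toy model of the resend loop; every name here is hypothetical, not the plugin's API. */
class ResendLoopDemo {

    /** Sink that stores every byte it receives, then reports an interruption for the first few writes. */
    static final class FlakySink extends OutputStream {
        final ByteArrayOutputStream received = new ByteArrayOutputStream();
        private int failuresLeft;

        FlakySink(int failures) {
            this.failuresLeft = failures;
        }

        @Override public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override public void write(byte[] buf, int off, int len) throws IOException {
            received.write(buf, off, len);          // the data does reach the receiver...
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new InterruptedIOException(); // ...but the sender is told the write failed
            }
        }
    }

    /** Same contract as WriteLog.invoke in the report: the new cursor, or null when nothing is new. */
    static Long writeLog(byte[] logFile, long lastLocation, OutputStream sink) throws IOException {
        if (logFile.length <= lastLocation) {
            return null;
        }
        sink.write(logFile, (int) lastLocation, (int) (logFile.length - lastLocation));
        return (long) logFile.length;
    }

    /** Polls until the cursor reaches the end of the log; returns everything the sink received. */
    static String poll(String log, int failures) {
        byte[] file = log.getBytes(StandardCharsets.UTF_8);
        FlakySink sink = new FlakySink(failures);
        long lastLocation = 0;
        while (lastLocation < file.length) {
            try {
                Long next = writeLog(file, lastLocation, sink);
                if (next != null) {
                    lastLocation = next;
                }
            } catch (IOException e) {
                // The cursor is left untouched, so the next poll re-reads the same bytes.
            }
        }
        return new String(sink.received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three failed attempts plus one success: the two lines are received four times.
        System.out.print(poll("line 1\nline 2\n", 3));
    }
}
```

In this model the duplication stops after three failures only because `FlakySink` stops failing; if every write were interrupted, the loop would never end, which matches the ever-growing log file.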

      Update 5: I've modified the code of durable-task-plugin to get more logs in the console:

      // org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke
      // The LOGGER.log calls prefixed with "QDU" are the debug logging added for this investigation.
      @Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
          try {
              long len = f.length();
              if (len > lastLocation) {
                  RandomAccessFile raf = new RandomAccessFile(f, "r");
                  try {
                      raf.seek(lastLocation);
                      long toRead = len - lastLocation;
                      if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely
                          throw new IOException("large reads not yet implemented");
                      }
                      // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                      byte[] buf = new byte[(int) toRead];
                      raf.readFully(buf);
                      // Remote write to the master; the InterruptedIOException in the stack traces comes from this call.
                      sink.write(buf);
                  } finally {
                      raf.close();
                  }
                  LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
                  return len;
              } else {
                  // No new output since lastLocation.
                  LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
                  return null;
              }
          } catch (IOException e) {
              // Passing a Throwable selects log(Level, String, Throwable), so "{0}" is logged
              // literally and the exception is printed as a stack trace.
              LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
              throw e;
          } catch (Exception e) {
              LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
          }
          return null;
      }
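
The TODO in the method above mentions an alternative: returning a byte[] from the callable instead of writing to a remote stream inside it. A minimal sketch of that alternative follows, assuming plain local files and streams. `ReturnBytesSketch`, `readNewOutput` and `copyNewOutput` are hypothetical names, and nothing here is taken from the plugin or from the eventual fix. The point is only that the cursor is advanced by the same code that appends the bytes, by exactly the number of bytes appended.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the "return a byte[] from the callable" idea from the TODO. */
class ReturnBytesSketch {

    /** Node side: returns the bytes written after lastLocation, or null if there are none. */
    static byte[] readNewOutput(File f, long lastLocation) throws IOException {
        long len = f.length();
        if (len <= lastLocation) {
            return null;
        }
        long toRead = len - lastLocation;
        if (toRead > Integer.MAX_VALUE) {
            throw new IOException("large reads not yet implemented");
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(lastLocation);
            byte[] buf = new byte[(int) toRead];
            raf.readFully(buf);
            return buf;
        }
    }

    /** Caller side: appends the new bytes, then advances the cursor by what was appended. */
    static long copyNewOutput(File f, long lastLocation, OutputStream sink) throws IOException {
        byte[] chunk = readNewOutput(f, lastLocation);
        if (chunk == null) {
            return lastLocation;
        }
        sink.write(chunk);
        return lastLocation + chunk.length;
    }

    /** Writes a log in two steps and copies it incrementally; returns "cursor:copied text". */
    static String demo() {
        try {
            Path log = Files.createTempFile("durable-task-demo", ".log");
            try {
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                Files.write(log, "hello".getBytes(StandardCharsets.UTF_8));
                long cursor = copyNewOutput(log.toFile(), 0, sink);
                Files.write(log, " world".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
                cursor = copyNewOutput(log.toFile(), cursor, sink);
                cursor = copyNewOutput(log.toFile(), cursor, sink); // nothing new: cursor unchanged
                return cursor + ":" + new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "11:hello world"
    }
}
```

Whether this approach would avoid the behaviour in this report is not established here; it only makes the relationship between "bytes appended" and "cursor advanced" explicit.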
      

      On the Jenkins node, I get the following logs:

      Aug 22, 2016 2:56:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:56:41 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:56:52 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:56:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:02 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:57:09 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:12 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:57:22 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:57:24 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:33 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
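
Each stack trace above has the same shape: a write blocks in Object.wait inside PipeWindow$Real.get, the waiting thread is interrupted, and the InterruptedException surfaces as the cause of a java.io.InterruptedIOException. The sketch below reproduces that shape with plain JDK classes. `WindowWaitDemo` and `Window` are hypothetical stand-ins, not the remoting library's implementation.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a writer blocked on a flow-control window. */
class WindowWaitDemo {

    /** Writers block in acquire() until enough space has been released. */
    static final class Window {
        private int available;

        Window(int initial) {
            this.available = initial;
        }

        synchronized void acquire(int n) throws InterruptedException {
            while (available < n) {
                wait(); // an interrupt delivered here raises InterruptedException
            }
            available -= n;
        }

        synchronized void release(int n) {
            available += n;
            notifyAll();
        }
    }

    /** Waits for window space; an interrupt is rethrown as InterruptedIOException. */
    static void write(Window window, byte[] buf) throws InterruptedIOException {
        try {
            window.acquire(buf.length);
        } catch (InterruptedException e) {
            InterruptedIOException io = new InterruptedIOException();
            io.initCause(e);
            throw io;
        }
        // A real stream would send buf here.
    }

    /** Starts a writer against a window with no space, interrupts it, and reports the outcome. */
    static String demo() {
        Window full = new Window(0);
        AtomicReference<String> outcome = new AtomicReference<>();
        Thread writer = new Thread(() -> {
            try {
                write(full, new byte[16]);
                outcome.set("written");
            } catch (InterruptedIOException e) {
                outcome.set(e.getClass().getSimpleName()
                        + " caused by " + e.getCause().getClass().getSimpleName());
            }
        });
        writer.start();
        try {
            while (writer.isAlive()) {
                writer.interrupt();
                writer.join(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "InterruptedIOException caused by InterruptedException"
    }
}
```

The window never receives space in this demo, so the write can only end through the interrupt, which is the situation the traces suggest when the receiving side is slow to acknowledge data.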
      

      How to reproduce

      I tried to reproduce this bug for 3 days but did not manage to.
      I had to run my production instance in debug mode to inspect the error.
      As I don't understand why it fails, I can't write a reproduction procedure yet.
      I hope that someone searching for this error with the same keywords will post a comment here to say that they have the same error, along with more information.

        1. Capture d'écran de 2016-08-20 16-35-18.png
          286 kB
          Quentin Dufour
        2. Capture d'écran de 2016-08-20 17-04-34.png
          157 kB
          Quentin Dufour
        3. Capture d'écran de 2016-08-20 19-02-41.png
          137 kB
          Quentin Dufour
        4. Capture d'écran de 2016-08-20 19-35-13.png
          139 kB
          Quentin Dufour
        5. Capture d'écran de 2016-08-20 19-35-36.png
          141 kB
          Quentin Dufour
        6. Capture d'écran de 2016-08-20 23-41-53.png
          133 kB
          Quentin Dufour
        7. jekins_10GB_log.png
          36 kB
          Quentin Dufour

            jglick Jesse Glick
            superboum Quentin Dufour
            Votes: 13
            Watchers: 27
