
Memory leak in remoting causes Jenkins to crash

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • remoting
    • None

      Some of our jobs rely on an external slave. This worked for a while without any issues. Recently we increased the number of jobs that run daily, and that is when our problems started. After running for a while, Jenkins consumes all the memory available to the VM and locks up as a result.

      The log is full of this exception:

      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: SEVERE: This command is created here
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: Nov 12, 2018 5:03:53 PM hudson.remoting.Channel$1 handle
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: SEVERE: Failed to execute command Pipe.Flush(-1) (channel PLTSTSRV001)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: java.util.concurrent.ExecutionException: Invalid object ID -1 iota=1723
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:478)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.get(ExportTable.java:397)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.Channel.getExportedObject(Channel.java:780)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ProxyOutputStream$Flush.execute(ProxyOutputStream.java:307)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.Channel$1.handle(Channel.java:565)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:85)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: Caused by: java.lang.Exception: Object appears to be deallocated at lease before Mon Nov 12 16:39:51 CET 2018
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:474)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: ... 5 more

      I think this is what's causing Jenkins to leak.

        1. heap-histogram.txt
          15 kB
        2. Jenkins Job Output
          27 kB
        3. Pipeline.txt
          7 kB

          [JENKINS-54603] Memory leak in remoting causes Jenkins to crash

          Jesse Glick added a comment -

          Yeah the flush error would be JENKINS-54566 . I see no reason to think that would have any relationship to a memory leak.


          Jeff Thompson added a comment -

          smokeythebandit, we don't believe these flush error messages you're seeing are related to the failures you're experiencing. There's a PR to clean up the log messages a little bit: https://github.com/jenkinsci/remoting/pull/308/files . Can you provide more information about your out-of-memory issues or should we close this report out?


          Jesse Glick added a comment -

          jthompson the remoting PR was just a side fix. The main fix for the Pipe.Flush error is in the workflow-api plugin, under review, as linked from JENKINS-54566.


          Jeff Thompson added a comment -

          Oh, I missed that separation, jglick. Thanks for clarifying. Does that workflow-api plugin issue have anything to do with this reported memory leak?


          Jesse Glick added a comment -

          I cannot speculate about any relationship to a memory leak, since we have no diagnostics for that. The workflow-api plugin patch fixes (or purports to fix) the Failed to execute command Pipe.Flush error.


          Benjamin Martens added a comment (edited) -

          Hey guys, thank you for your response! I've upgraded the 'Pipeline: API' plugin from version 2.32 to 2.33. The server usually crashes within 24 hours of its last restart, so I will keep monitoring it to see whether the issue is resolved.

          jthompson It is the master that runs out of memory. I cannot pin down a specific event that causes it. I did notice that when I increased the memory allocated to the VM from 1024MB to 2048MB, it took longer for the server to crash, which suggests a memory leak.

          I've attached the log of one of the jobs. I had to remove some of the output because it contains sensitive information. The removed output is generated by a Python script that runs automated tests for our software.

          Included are the pipeline script and the output it generated for the job that I suspect is causing this issue.

          Jenkins Job Output

          Pipeline.txt


          Jesse Glick added a comment -

          smokeythebandit the build log is unlikely to be useful in diagnosing a memory leak. The bare minimum would be a heap histogram. This is most easily collected by installing the Support Core plugin, then selecting the Master Heap Histogram diagnostic when getting a Support bundle. You can attach that individual nodes/master/heap-histogram.txt, or send the bundle to one of us privately, or select the system option to anonymize support bundles and then attach the whole bundle here (always best to give the contents a manual review to look for sensitive information).


          Benjamin Martens added a comment (edited) -

          I installed this plugin and ran a couple of jobs. I'm not sure how to interpret the heap histogram, but I'm starting to suspect this plugin: https://wiki.jenkins.io/display/JENKINS/Test+Results+Analyzer+Plugin

          Anyway, here is the heap histogram.

          heap-histogram.txt
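
          For readers unsure how to read such a histogram, the following is a minimal Python sketch, not part of the original thread. It assumes the file follows the usual HotSpot class-histogram layout (a rank, an instance count, a byte count, and a class name per row); the real attachment may differ. The sample rows and the com.example.plugin class names are invented for illustration and are not taken from the attached file. The sketch ranks classes by retained bytes and sums bytes per package, which is one rough way to see whether a single plugin dominates the heap.

```python
import re
from typing import Dict, List, NamedTuple


class HistogramRow(NamedTuple):
    instances: int
    total_bytes: int
    class_name: str


# Assumed row layout: "<rank>: <#instances> <#bytes> <class name>".
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> List[HistogramRow]:
    """Parse histogram text into rows, ignoring non-data lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                HistogramRow(int(match.group(1)), int(match.group(2)), match.group(3))
            )
    return rows


def element_class(name: str) -> str:
    """Map an object-array name like '[Lfoo.Bar;' to 'foo.Bar'.

    Primitive arrays such as '[C' are returned unchanged.
    """
    stripped = name.lstrip("[")
    if stripped != name and stripped.startswith("L") and stripped.endswith(";"):
        return stripped[1:-1]
    return name


def top_by_bytes(rows: List[HistogramRow], n: int = 10) -> List[HistogramRow]:
    """Return the n rows with the largest byte totals."""
    return sorted(rows, key=lambda r: r.total_bytes, reverse=True)[:n]


def bytes_by_package(rows: List[HistogramRow], depth: int = 3) -> Dict[str, int]:
    """Sum bytes per package prefix, truncated to `depth` segments."""
    totals: Dict[str, int] = {}
    for row in rows:
        name = element_class(row.class_name)
        parts = name.split(".")
        # Names without a package (e.g. '[C') are grouped under themselves.
        key = ".".join(parts[:-1][:depth]) if len(parts) > 1 else name
        totals[key] = totals.get(key, 0) + row.total_bytes
    return totals


# Invented sample data, for illustration only.
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:          5000         400000  [C
   2:          1200         300000  com.example.plugin.CachedResult
   3:           100         150000  [Lcom.example.plugin.CachedResult;
   4:          4000          96000  java.lang.String
Total         10300         946000
"""

rows = parse_histogram(SAMPLE)
for row in top_by_bytes(rows, n=3):
    print(row.total_bytes, row.instances, row.class_name)
for package, total in sorted(bytes_by_package(rows).items(), key=lambda kv: -kv[1]):
    print(package, total)
```

          Comparing two histograms taken some hours apart is usually more telling than a single snapshot, since a leak shows up as a package whose byte total keeps growing between snapshots.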


          Jesse Glick added a comment -

          Indeed. If this plugin is not critical for your workflow, try disabling it for a while.


          Benjamin Martens added a comment -

          It is important for some developers in our company. I think it's best that they open a new ticket and figure it out with the maintainer of the plugin.

          jthompson jglick Thank you for your help!

            smokeythebandit Benjamin Martens