• Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Component: core
    • Linux server with 24 CPUs and 64GB RAM
      Jenkins version: LTS 1.509.4/1.532.1 on Jetty
      Memory allocated to the Jenkins/Jetty process: 42GB
      Environment: Jenkins with 600 highly active jobs + 40 slave machines (Linux and Windows)

      After upgrading Jenkins from LTS 1.509.3 to LTS 1.509.4, I noticed that over time (about 24 hours) Jenkins becomes very slow.
      It turns out that Jenkins (under the Jetty service) slowly "eats" the server memory: it takes about 24 hours to consume all the memory allocated to Jenkins (42GB). See the attached snapshots for examples...

      TEST #1 on LTS 1.509.4:
      1. Machine with Jetty up after a restart
      2. Jenkins used after 1 hour: 22GB
      3. Jenkins used after 12 hours: 27GB
      4. Jenkins used after 20 hours: 35GB -> Memory leaks between 10:00-10:20: as you can see, even after a GC the JVM still considers the memory in use and fails to clean it all up as it should.
      5. Jenkins used after 23 hours: 39GB -> Very slow response, and the heap is almost 100% full (a sketch for sampling these heap figures follows below).
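
      One hedged way to sample "Jenkins used" heap figures like the ones above is from the Jenkins Script Console, using only the standard java.lang.management API (a sketch, not the monitoring tool behind the attached screenshots):

        import java.lang.management.ManagementFactory

        // Sample current heap occupancy; run periodically to chart growth.
        def heap = ManagementFactory.memoryMXBean.heapMemoryUsage
        printf("heap used: %d MB / committed: %d MB / max: %d MB%n",
               heap.used >> 20, heap.committed >> 20, heap.max >> 20)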

      TEST #2 on LTS 1.509.4:
      I tried a manual GC. It doesn't help!
      (see attached file: "Monitor_Memory_Over_Time_Manual_GC")
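
      For reference, a manual GC can also be requested from the Script Console. A hedged sketch (MemoryMXBean.gc(), like System.gc(), is only a hint to the JVM; if used heap stays high afterwards, the memory is genuinely reachable, i.e. a real leak rather than uncollected garbage):

        import java.lang.management.ManagementFactory

        def mem = ManagementFactory.memoryMXBean
        println "before GC: ${mem.heapMemoryUsage.used >> 20} MB"
        mem.gc()  // equivalent to System.gc(): a request, not a guarantee
        println "after GC:  ${mem.heapMemoryUsage.used >> 20} MB"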

      TEST #3 on LTS 1.509.3:
      Unfortunately I had to downgrade to LTS 1.509.3 because of the memory leak; for me it's a blocker issue!

      Please note that on LTS 1.509.3 Jenkins runs stably, even under heavy load, without any memory leak (see attached files: "Good_GC_A1.509.3" and "Good_GC_B1.509.3"). Unfortunately there is a big unsolved bug in that version: I can't rename jobs (a deadlock, which was fixed in the next versions, LTS 1.509.4/1.532.1, that I can't use because of the memory leak).

      TEST #4 on LTS 1.532.1:
      Same issue! Jenkins gets stuck at 100% memory usage after only 12 hours!

      Thank You,
      Ronen.

        1. gc.log (15 kB)
        2. Good_GC_A1.509.3.JPG (268 kB)
        3. Good_GC_B1.509.3.jpg (322 kB)
        4. Monitor_After_1_Hour.jpg (238 kB)
        5. Monitor_After_12_Hours.jpg (224 kB)
        6. Monitor_After_20_Hours.jpg (321 kB)
        7. Monitor_After_23_Hours.JPG (253 kB)
        8. Monitor_Memory_Over_Time_Manual_GC.jpg (572 kB)

          [JENKINS-20620] Memory Leak on Jenkins LTS 1.509.4/1.532.1

          Oleg Nenashev added a comment -

          Could you collect a heap dump or any other memory statistics from your system?
          We use 1.509.4 with quite a similar configuration, but we have not experienced such memory leaks (the uptime is close to 3 months). There are many other issues, but the memory seems to be OK...

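          On "any other memory statistics": a hedged Script Console sketch that prints per-pool usage via the standard java.lang.management API (GC logging is another option, via JVM flags such as -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log, which would match the attached gc.log):

            import java.lang.management.ManagementFactory

            // Per-pool breakdown: eden, survivor, old gen, perm gen, ...
            ManagementFactory.memoryPoolMXBeans.each { pool ->
                def u = pool.usage
                if (u != null) {
                    printf("%-20s used: %6d MB / max: %6d MB%n",
                           pool.name, u.used >> 20, u.max > 0 ? (u.max >> 20) : -1)
                }
            }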

          Ronen Peleg added a comment -

          Thanks Oleg.

          I can't dump it because it's too big (40GB), and for some reason "jmap" is unable to connect to the Jenkins process. Anyway, I downgraded the version to 1.537...

          Oleg, if you have a system with 600 highly active jobs + 40 slave machines online and you don't have this issue, that's very strange, because I tested these versions on several different machines and each time the problem reproduced (after 12-24 hours).

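          When jmap can't attach (commonly because it runs as a different user than the Jenkins process), a heap dump can also be triggered from inside the JVM via the Script Console. A hedged sketch using the HotSpot-specific diagnostic MBean; the output path is illustrative, and live=true forces a full GC first so the file contains only reachable objects, usually much smaller than the full heap:

            import java.lang.management.ManagementFactory
            import com.sun.management.HotSpotDiagnosticMXBean

            def server = ManagementFactory.platformMBeanServer
            def hotspot = ManagementFactory.newPlatformMXBeanProxy(
                    server, "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean)
            hotspot.dumpHeap("/tmp/jenkins-heap.hprof", true)  // live objects only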

          Oleg Nenashev added a comment -

          Probably we have different job contents/plugins (e.g. we don't use the Maven plugin, and xUnit components are not popular in our setup due to integration with external systems).
          Could you provide a list of your plugins? Since we know which versions are stable/unstable, we can try to find an interoperability issue if one exists.


          Ronen Peleg added a comment -

          Oleg, I sent you an email with our Jenkins plugins list. Thank you.


          Nickolay Rumyantsev added a comment -

          BTW, Ronen, do you use System Groovy script build steps in your jobs?

          Ronen Peleg added a comment -

          Hi Nickolay, yes, we have Groovy scripts in our jobs.

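          For context on why the question matters: system Groovy build steps run inside the master JVM (unlike plain Groovy build steps, which run in a forked process), so any references they pin outlive the build. A deliberately hypothetical illustration of that leak pattern, not the reporter's code:

            // Hypothetical system Groovy build step: state stashed in a static
            // field stays in the master's heap for the life of the process.
            class BuildCache {                        // hypothetical helper
                static List<byte[]> results = []
            }
            BuildCache.results << new byte[10 << 20]  // pins 10 MB per build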

          Ronen Peleg added a comment -

          Update:
          The solution was to delete some slave machines from the Jenkins nodes.
          It turns out that Jenkins can't handle more than 100 slave machines.
          Currently (after cleanup) we have 70 slave machines and no memory leak!

          BTW: the memory leak occurs only when the Jenkins master runs on Linux with more than 100 slave machines; on Windows the issue doesn't exist!

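          A hedged Script Console sketch for auditing how many nodes a master is serving before/after such a cleanup (standard Jenkins core API; run as an administrator):

            import jenkins.model.Jenkins

            def computers = Jenkins.instance.computers  // includes the master
            println "computers: ${computers.length}"
            computers.each { c ->
                println "${c.name ?: '(master)'}  online=${!c.offline}"
            }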

          Oleg Nenashev added a comment -

          I've tested Jenkins 1.509.4 (patched with remoting-2.36) on RHEL 6.4 with about 150 slaves.
          There's no memory leak after 1 week. The test installation just builds several Jenkins plugins, hence there's no extreme load.

          Probably the error is in the communication layer. I'll try the remoting version from 1.532.1 with a bigger workload.


          Kohsuke Kawaguchi added a comment -

          We need more information to be able to solve problems like this. Please see https://wiki.jenkins-ci.org/display/JENKINS/I%27m+getting+OutOfMemoryError for how to get the details we need to work on problems like this.

          I'm not doubting that you are seeing the problem, and I am sorry about that. Please get us the details we need so that we can fix it.

          If you cannot post a heap dump, please get at least the histogram summary.
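
          On the histogram summary: a class histogram (the output of "jmap -histo") is only a few hundred kilobytes even for a 40GB heap. A hedged sketch that shells out to jmap from the Script Console (assumes jmap from the same JDK is on the Jenkins user's PATH; the pid trick relies on HotSpot reporting the runtime name as "pid@hostname"):

            import java.lang.management.ManagementFactory

            def pid = ManagementFactory.runtimeMXBean.name.split('@')[0]
            def out = ["jmap", "-histo", pid].execute().text
            // The top lines name the classes dominating the heap.
            println out.readLines().take(30).join('\n')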

          Ronen Peleg added a comment - edited

          @Oleg Nenashev, did you try it with 1200 active jobs? Anyway, this is what solved my problem. I guess the issue appears with 100+ slave machines connected to a heavily loaded Jenkins.

          @Kohsuke Kawaguchi, because I have 64GB RAM I can't do it; I can't save a 64GB dump on my HDD. Anyway, my problem is already solved, so it's safe to close this.


            Assignee: Oleg Nenashev (oleg_nenashev)
            Reporter: Ronen Peleg (ronenpg)
            Votes: 21
            Watchers: 15
