JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: sse-gateway-plugin

      We started running out of memory in our JVM (Xmx 8G). Looking at Melody's heap memory histogram (JENKINS_URL/monitoring?part=heaphisto), the top two items were:

       

      Class                                                       Size (Kb)  % size  Instances   % instances  Source
      org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry  2,890,399  44      92,492,793  43
      java.util.concurrent.ConcurrentLinkedQueue$Node             2,167,981  33      92,500,553  43

       
      Together these two items were using 77% of the heap, and that share kept growing while we were researching the problem.

      I have two support bundles from this time and an .hprof as well.

      I can either screen share with someone or, if you can tell me how to analyze these files, I would be happy to do it myself.
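As a rough sanity check on the histogram in the description, dividing each reported size by its instance count gives the average footprint per object. This is a minimal sketch, assuming the "Size (Kb)" column means 1 Kb = 1024 bytes; the class and method names are made up for illustration and are not part of Jenkins or any plugin.

```java
// Hypothetical helper (not part of Jenkins or sse-gateway): derives the
// average bytes per instance from one row of the heap histogram.
class HeapHistogramMath {

    /** Average object size in bytes, rounded to the nearest byte. Assumes 1 Kb = 1024 bytes. */
    static long bytesPerInstance(long sizeKb, long instances) {
        return Math.round(sizeKb * 1024.0 / instances);
    }

    public static void main(String[] args) {
        // Figures copied from the histogram in the issue description.
        long retryBytes = bytesPerInstance(2_890_399L, 92_492_793L);
        long nodeBytes = bytesPerInstance(2_167_981L, 92_500_553L);
        System.out.println("EventDispatcher$Retry: " + retryBytes + " bytes/instance");
        System.out.println("ConcurrentLinkedQueue$Node: " + nodeBytes + " bytes/instance");
    }
}
```

This works out to about 32 bytes per Retry and 24 bytes per Node. The two instance counts are also nearly identical (92,492,793 vs 92,500,553), which would be consistent with almost every queue node holding one Retry object, although the histogram alone does not prove that.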


          Christian Höltje created issue

          Christian Höltje added a comment (edited)

          Our Jenkins server has been up 23 hours and we're already seeing large numbers of the EventDispatcher objects:

           

          Class                                                       Size (Kb)  % size  Instances  % instances  Source
          org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry  96,964     2       3,102,853  4
          java.util.concurrent.ConcurrentLinkedQueue$Node             73,163     1       3,121,633  4

           

          It isn't a problem (yet) but this is alarming.

           

          Our other Jenkins server has no references to EventDispatcher$Retry in the memory histogram, even when expanding details.
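To put the 23-hour figure in perspective, here is a back-of-the-envelope estimate of the average growth rate. It assumes the Retry count started at zero when the server came up and grew linearly, neither of which is confirmed by the report; the class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope helper; not part of Jenkins or sse-gateway.
class RetryGrowthEstimate {

    /** Average number of instances created per hour, assuming growth from zero at startup. */
    static double instancesPerHour(long instances, double uptimeHours) {
        return instances / uptimeHours;
    }

    /** Hours needed to reach the target instance count at a constant rate. */
    static double hoursToReach(long targetInstances, double instancesPerHour) {
        return targetInstances / instancesPerHour;
    }

    public static void main(String[] args) {
        // 3,102,853 Retry instances after 23 hours of uptime (from the comment above).
        double rate = instancesPerHour(3_102_853L, 23.0);
        // 92,492,793 is the Retry count from the issue description.
        double hours = hoursToReach(92_492_793L, rate);
        System.out.printf("about %.0f Retry instances per hour%n", rate);
        System.out.printf("about %.1f days to reach the count in the description%n", hours / 24.0);
    }
}
```

Under those assumptions the rate is roughly 135,000 Retry objects per hour, and the count from the issue description would be reached after about four weeks of uptime.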


          In our logs, we're seeing messages like:

          May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
          Invalid SSE unsubscribe configuration. No active subscription matching filter: 
          May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
          Invalid SSE unsubscribe configuration. No active subscription matching filter: 
          May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
          Invalid SSE unsubscribe configuration. No active subscription matching filter: 
          May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
          Invalid SSE unsubscribe configuration. No active subscription matching filter: 

          Could that be related?

           


          And now it's at:

          Class                                                       Size (Kb)  % size  Instances  % instances  Source
          org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry  105,174    3       3,365,570  4
          java.util.concurrent.ConcurrentLinkedQueue$Node             79,051     2       3,372,854  4


          There have been a lot of changes/improvements in sse-gateway (mostly by rtyler) since 1.15 was released (back on Jan 16th, 2017).

          In the commit log, I see a commit titled "message leaks", which sounds interesting.

          I tried compiling the master branch and got errors:

          $ docker run --rm -it -v maven-repo:/root/.m2 -v $PWD:/src:rw -w /src maven:3-jdk-8-alpine mvn verify 
          ...
          [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.0:npm (npm install) on project sse-gateway: Failed to run task: 'npm install' failed. java.io.IOException: Cannot run program "/src/node/node" (in directory "/src"): error=2, No such file or directory -> [Help 1]
          


           

          I got further when using the non-alpine openjdk image. It still doesn't build, but now the failure is due to failing tests.

           

          $ docker run --rm -it -v maven-repo:/root/.m2 -v $PWD:/src:rw -w /src maven:3-jdk-8 mvn verify
          ...
          Results :
          Failed tests:
          org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest.test_autoDeleteOnExpire(org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest)
           Run 1: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
           Run 2: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
           Run 3: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
           Run 4: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
           Run 5: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
          org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest.test_delete_stale_events(org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest)
           Run 1: EventHistoryStoreTest.test_delete_stale_events:69
           Run 2: EventHistoryStoreTest.test_delete_stale_events:69
           Run 3: EventHistoryStoreTest.test_delete_stale_events:69
           Run 4: EventHistoryStoreTest.test_delete_stale_events:69
           Run 5: EventHistoryStoreTest.test_delete_stale_events:69
          
          Tests run: 12, Failures: 2, Errors: 0, Skipped: 0
          ...
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project sse-gateway: There are test failures.
          ...
          

           

           


          Some more events from our logs:

           

          May 03, 2018 1:36:00 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher doDispatch
          Error dispatching event to SSE channel. Write failed.
          May 03, 2018 1:36:36 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher doDispatch
          Error dispatching event to SSE channel. Write failed.
          May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
          Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612).
          May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
          Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612).
          May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
          Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612).
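The "Write failed" messages from processRetries above suggest one way the Retry count could grow: if every failed write is queued for a later retry and nothing bounds or drains that queue while the client stays unreachable, the queue only gets bigger. The following is a minimal sketch of that general pattern, not the plugin's actual code. `RetryQueueSketch`, `Retry`, `dispatch`, and `MAX_RETRIES` are hypothetical names, and the cap that drops the oldest entries is just one common mitigation.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a simplified model of a retry queue, NOT the sse-gateway implementation.
class RetryQueueSketch {

    /** Stand-in for a queued retry; the real class may hold different fields. */
    static final class Retry {
        final String eventId;
        Retry(String eventId) { this.eventId = eventId; }
    }

    static final int MAX_RETRIES = 1_000; // hypothetical cap

    private final Queue<Retry> retries = new ConcurrentLinkedQueue<>();
    // ConcurrentLinkedQueue.size() walks the whole queue, so the count is tracked separately.
    private final AtomicInteger size = new AtomicInteger();
    private final boolean bounded;

    RetryQueueSketch(boolean bounded) { this.bounded = bounded; }

    /** Simulates a dispatch whose write to the client always fails, so the event is queued. */
    void dispatch(String eventId) {
        retries.add(new Retry(eventId));
        if (size.incrementAndGet() > MAX_RETRIES && bounded) {
            // Over the cap: drop the oldest retry. The cap is approximate under concurrency.
            if (retries.poll() != null) {
                size.decrementAndGet();
            }
        }
    }

    int queued() { return size.get(); }

    public static void main(String[] args) {
        RetryQueueSketch unbounded = new RetryQueueSketch(false);
        RetryQueueSketch capped = new RetryQueueSketch(true);
        for (int i = 0; i < 5_000; i++) {
            unbounded.dispatch("event-" + i);
            capped.dispatch("event-" + i);
        }
        System.out.println("unbounded queue: " + unbounded.queued());
        System.out.println("capped queue:    " + capped.queued());
    }
}
```

In this model the unbounded queue keeps every failed event, while the capped one never holds more than `MAX_RETRIES` entries. Whether the plugin's queue behaves like the unbounded case is an assumption that would need to be checked against its source.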


          This morning:

          Class                                                       Size (Kb)  % size  Instances  % instances  Source
          org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry  283,979    5       9,087,330  10
          java.util.concurrent.ConcurrentLinkedQueue$Node             214,241    4       9,140,981  10

           

          Is there a way I can find out what's in the queues to track what's causing the Retry to happen?


          Looking in $JENKINS_HOME/logs/sse-events/ I see 16 .json files in jobs and 108 .json files in pipeline. And they change frequently (e.g. they are now, a minute later, 6 and 54).

          Is there another place I can look to find out why there are so many Retry objects?
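Because the .json counts mentioned above change from minute to minute, it can help to sample them with a small program instead of counting by hand. This is a minimal sketch using only java.nio.file. The `jobs` and `pipeline` subdirectories under `$JENKINS_HOME/logs/sse-events/` come from the comment above; the class name, the fallback path `/var/jenkins_home`, and the assumption that the files sit directly inside those subdirectories are not from the report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper; not part of Jenkins or sse-gateway.
class SseEventFileCount {

    /** Counts regular files ending in .json directly inside dir; returns 0 if dir is missing. */
    static long countJson(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".json"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Fallback path is an assumption; set JENKINS_HOME to match the real installation.
        String home = System.getenv().getOrDefault("JENKINS_HOME", "/var/jenkins_home");
        Path base = Paths.get(home, "logs", "sse-events");
        for (String sub : new String[] {"jobs", "pipeline"}) {
            System.out.println(sub + ": " + countJson(base.resolve(sub)) + " .json files");
        }
    }
}
```

Running it repeatedly (for example once a minute) gives a series of counts that can be compared against the Retry numbers in the heap histogram.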


          I managed to use jvisualvm to look at the .hprof from the initial comment, and it was our friends EventDispatcher$Retry and ConcurrentLinkedQueue$Node using up 90% of the memory.


            Assignee: Olivier Lamy (olamy)
            Reporter: Christian Höltje (docwhat)
            Votes: 16
            Watchers: 24
