JENKINS-75978

Jenkins master hangs due to in-memory WorkflowRun/CpsFlowExecution leak


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker

      We have been seeing frequent episodes where Jenkins becomes unresponsive. During these episodes the instance appears hung, with a large number of active threads.

      The heap was initially configured with 16GB. As memory usage continued to grow, we increased it to 18GB and later to 20GB, but the issue still persisted.

      We attempted to analyze the thread dumps to identify and terminate any problematic threads, but that didn't resolve the issue. Ultimately, the only reliable way to recover Jenkins has been to restart the instance.

      However, a safeRestart does not help in this scenario. We consistently need to go to the AWS console, update the service, and trigger a new deployment to restore Jenkins functionality.

      We took a heap dump and analyzed it with Eclipse MAT.

      1. Overall heap breakdown
         • Total heap inspected: ~9.4 GB

      2. The two leak suspects we analyzed:
         (a) WorkflowRun objects (~2.97 GB)
         (b) CpsFlowExecution objects (~2.30 GB)

      We then traced where these WorkflowRun objects come from and found that the top contributors were several different jobs on the affected Jenkins instance.

      Our current setup:

      • Seed job / Job DSL: defines a logRotator { ... } block in every pipelineJob { ... }, which injects a <logRotator> element into each job's config.xml.
      • Declarative Pipelines: none of our Jenkinsfile pipelines include an options { buildDiscarder(...) } block.
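
      For illustration, the seed-job side of this setup looks roughly like the sketch below. The job name, repository URL, branch, and retention values are hypothetical placeholders, not our real configuration.

      ```groovy
      // Job DSL seed script (sketch). All names and values are placeholders.
      pipelineJob('example-pipeline') {
          // Becomes the <logRotator> element in the generated config.xml.
          logRotator {
              numToKeep(20)   // assumed value: keep at most 20 builds
              daysToKeep(30)  // assumed value: drop builds older than 30 days
          }
          definition {
              cpsScm {
                  scm {
                      git {
                          remote { url('https://example.com/org/repo.git') }
                          branch('main')
                      }
                  }
                  // The Jenkinsfile itself has no options { buildDiscarder(...) } block.
                  scriptPath('Jenkinsfile')
              }
          }
      }
      ```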

      While investigating, we came across the view that using buildDiscarder in a Declarative Pipeline helps limit JVM heap growth by cleaning up old WorkflowRun and CPS VM state. We could not find this stated in the Jenkins documentation, but it appears to be widely accepted in the Jenkins community.

      Are we correct in understanding that it's necessary to include buildDiscarder within a Declarative Pipeline?
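
      For clarity, the block we are asking about would look something like the following minimal sketch. The retention values and the single stage are assumptions for illustration only, and we do not know whether adding this changes heap retention.

      ```groovy
      // Jenkinsfile (Declarative Pipeline) sketch. Values are illustrative.
      pipeline {
          agent any
          options {
              // Sets the job's build discarder from within the Pipeline.
              buildDiscarder(logRotator(numToKeepStr: '20', daysToKeepStr: '30'))
          }
          stages {
              stage('Build') {
                  steps {
                      echo 'Building'
                  }
              }
          }
      }
      ```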

      Is configuring logRotator in the Job DSL file, together with simpleBuildDiscarder in the Jenkins configuration, not sufficient to ensure cleanup of these potentially leaking objects?
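
      One way to confirm which retention settings actually ended up on each Pipeline job is a script console query along the lines of the sketch below. It assumes the workflow-job plugin is installed. It only reads each job's configured discarder and does not load builds.

      ```groovy
      // Script console sketch: print the build discarder configured on every Pipeline job.
      import jenkins.model.Jenkins
      import hudson.tasks.LogRotator
      import org.jenkinsci.plugins.workflow.job.WorkflowJob

      Jenkins.get().getAllItems(WorkflowJob).each { job ->
          def discarder = job.buildDiscarder
          if (discarder instanceof LogRotator) {
              println "${job.fullName}: numToKeep=${discarder.numToKeepStr}, daysToKeep=${discarder.daysToKeepStr}"
          } else {
              // Either nothing is configured or a non-LogRotator strategy is in use.
              println "${job.fullName}: ${discarder == null ? 'no build discarder' : discarder.class.name}"
          }
      }
      ```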

      If there are any additional suggestions or recommendations to help address this issue, we'd greatly appreciate your input.

            nkns165 nkns165
            raj_anand Raj