
SandboxResolvingClassLoader use of Guava cache can cause classloading bottleneck/deadlock

    • script-security 1.61

      Noted the following while investigating a system that was burning a lot of CPU running Pipelines. Native thread IDs showing high CPU use in top were matched to Java threads in a thread dump, and the stack traces of those threads were in SandboxResolvingClassLoader. The system also exhibited very high classloading/parsing times for some Pipelines.

      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000075b9264f8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:132)
        at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3586)
        at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2333)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2222)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
        at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
        at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader.loadClass(SandboxResolvingClassLoader.java:51)
        - locked <0x000000069c03be78> (a org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
        - locked <0x000000069fc00b48> (a org.jenkinsci.plugins.workflow.cps.CpsGroovyShell$TimingLoader)

      Native thread 16365 (nid 0x3FED): 44% CPU, fetching from JAR with the sandbox resolving classloader:
      SandboxResolvingClassLoader$2.compute(SandboxResolvingClassLoader.java:39)
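
The line above pairs a decimal native thread ID with its hexadecimal form, which is how a thread seen in top can be matched to the nid field of a Java thread dump. A minimal sketch of that conversion follows; the class and method names are illustrative and not part of any Jenkins tooling.

```java
final class NidConverter {
    /**
     * Converts a decimal native thread ID (as listed by top in per-thread mode)
     * to the lowercase hex form that thread dumps print in the nid= field.
     */
    static String toNid(int nativeThreadId) {
        return "0x" + Integer.toHexString(nativeThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(16365)); // prints 0x3fed
    }
}
```

Searching the thread dump for the resulting value (for example `nid=0x3fed`) locates the Java stack of the busy thread.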

      This is using a Guava LoadingCache rather than the much faster Caffeine cache, which can be a drop-in replacement.
       

          [JENKINS-47430] SandboxResolvingClassLoader use of Guava cache can cause classloading bottleneck/deadlock

          Sam Van Oort created issue -
          Sam Van Oort made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Sam Van Oort made changes -
          Remote Link New: This issue links to "Implemented in PR #160 (Web Link)" [ 17987 ]
          Devin Nusbaum made changes -
          Status Original: In Progress [ 3 ] New: Open [ 1 ]

          Devin Nusbaum added a comment -

          PR is stalled and would need to be updated to resolve merge conflicts. Would probably need some additional testing at that point as well to understand the impact.

          Devin Nusbaum made changes -
          Remote Link New: This issue links to "jenkinsci/script-security-plugin#160 (Web Link)" [ 22387 ]
          Devin Nusbaum made changes -
          Remote Link Original: This issue links to "Implemented in PR #160 (Web Link)" [ 17987 ]

          Devin Nusbaum added a comment -

          Noting also that I have seen evidence of a bug in Guava (not just a performance issue) in some cases, where many threads are waiting to load a value from the cache but no thread is actually loading a value, which is described in this upstream issue.
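
One way to check for the "many waiters, no loader" pattern is to count stacks that contain the waiting frame against stacks that contain a loading frame. The sketch below is only an illustration: `CacheWaiterCount` and `countStacksWithFrame` are hypothetical names, the waiter frame is taken from the stack trace in this issue, and treating `SandboxResolvingClassLoader$2.compute` as the loader frame is an assumption. `Thread.getAllStackTraces()` only sees the current JVM, so this would have to run inside the affected Jenkins process; the same comparison can be done by hand on a thread dump.

```java
import java.util.Arrays;
import java.util.Collection;

final class CacheWaiterCount {
    /** Counts stacks that contain at least one frame with the given class and method name. */
    static long countStacksWithFrame(Collection<StackTraceElement[]> stacks,
                                     String className, String methodName) {
        return stacks.stream()
                .filter(stack -> Arrays.stream(stack).anyMatch(frame ->
                        frame.getClassName().equals(className)
                                && frame.getMethodName().equals(methodName)))
                .count();
    }

    public static void main(String[] args) {
        Collection<StackTraceElement[]> stacks = Thread.getAllStackTraces().values();
        // Waiter frame as it appears in the stack trace reported in this issue.
        long waiters = countStacksWithFrame(stacks,
                "com.google.common.cache.LocalCache$LoadingValueReference", "waitForValue");
        // Assumed loader frame; adjust to whatever the loading thread shows in practice.
        long loaders = countStacksWithFrame(stacks,
                "org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader$2",
                "compute");
        System.out.println("waiters=" + waiters + " loaders=" + loaders);
    }
}
```

A nonzero waiter count with a loader count of zero that persists across several samples would be consistent with the behavior described above, though a single sample can miss a short-lived load.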

          My best guess for the cause of the issue in the cases I have seen is that a StackOverflowError thrown by the loading thread was somehow swallowed by Guava. We should investigate to understand if that issue is reproducible and if it is a bug in the Pipeline-Groovy layer or in Guava itself.
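
To make the hypothesis concrete, here is a toy load-once cache, written for illustration only. It is not Guava's implementation and makes no claim about where Guava would lose the error; `LoadOnceCache`, `completeOnError`, and `probe` are hypothetical names. It shows the general failure mode: if the loading thread dies with an `Error` and nothing completes the pending entry, later callers for the same key wait on a future that no thread will ever finish.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical load-once cache used only to illustrate the failure mode. */
final class LoadOnceCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> entries = new ConcurrentHashMap<>();
    private final boolean completeOnError;

    LoadOnceCache(boolean completeOnError) {
        this.completeOnError = completeOnError;
    }

    String get(String key, Function<String, String> loader, long timeoutMillis)
            throws ExecutionException, InterruptedException, TimeoutException {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = entries.putIfAbsent(key, mine);
        if (existing != null) {
            // Another caller owns the load, so wait for its result.
            return existing.get(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        try {
            mine.complete(loader.apply(key));
        } catch (Exception e) {
            fail(key, mine, e);
        } catch (Error e) {
            if (completeOnError) {
                fail(key, mine, e); // wakes waiters and allows a retry
            }
            // Otherwise the pending future stays in the map with no owner.
            throw e;
        }
        return mine.get();
    }

    private void fail(String key, CompletableFuture<String> future, Throwable cause) {
        future.completeExceptionally(cause);
        entries.remove(key, future);
    }

    /** Returns "stuck" if a second lookup times out after a failed load, else the loaded value. */
    static String probe(boolean completeOnError) {
        LoadOnceCache cache = new LoadOnceCache(completeOnError);
        try {
            cache.get("k", key -> { throw new StackOverflowError("simulated"); }, 200);
        } catch (StackOverflowError expected) {
            // The loading thread sees the error and moves on.
        } catch (ExecutionException | InterruptedException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
        try {
            return cache.get("k", key -> "loaded", 200);
        } catch (TimeoutException e) {
            return "stuck";
        } catch (ExecutionException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(false)); // prints stuck
        System.out.println(probe(true));  // prints loaded
    }
}
```

The sketch uses a timeout so that it terminates. The waiters in the reported stack trace go through `Uninterruptibles.getUninterruptibly`, which has no timeout, so in the real case they would stay parked indefinitely.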

          Devin Nusbaum made changes -
          Summary Original: SandboxResolvingClassLoader uses inefficient caching, imposes a classloading bottleneck New: SandboxResolvingClassLoader use of Guava cache can cause classloading bottleneck/deadlock
          Liam Newman made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

            Assignee: Unassigned
            Reporter: Sam Van Oort (svanoort)