[JENKINS-47430] SandboxResolvingClassLoader use of Guava cache can cause classloading bottleneck/deadlock

    • script-security 1.61

      I noticed the following while investigating a system that was burning a lot of CPU running Pipelines. The high-CPU native thread IDs shown in top were traced to Java threads in thread dumps that were using the SandboxResolvingClassLoader. The system also exhibited very high class loading/parsing times for some Pipelines.

      java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for <0x000000075b9264f8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
          at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
          at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:132)
          at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3586)
          at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2333)
          at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2222)
          at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
          at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
          at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
          at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader.loadClass(SandboxResolvingClassLoader.java:51)
          - locked <0x000000069c03be78> (a org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxResolvingClassLoader)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
          - locked <0x000000069fc00b48> (a org.jenkinsci.plugins.workflow.cps.CpsGroovyShell$TimingLoader)

      Native thread 16365 (0x3FED in the thread dump) was using 44% CPU fetching from a JAR in the sandbox-resolving class loader, at:
      SandboxResolvingClassLoader$2.compute(SandboxResolvingClassLoader.java:39)
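      Matching the two IDs is a plain decimal-to-hex conversion: top reports native thread IDs in decimal, while a jstack thread dump prints the same ID in hex as nid=0x.... A minimal sketch, assuming top was run in per-thread mode (for example top -H -p <pid>); the class name NativeThreadId is hypothetical:

```java
public class NativeThreadId {
    // Converts a decimal native thread ID (as shown by top -H) into the
    // "nid=0x..." form that jstack prints for the same thread.
    static String toNid(long decimalTid) {
        return "nid=0x" + Long.toHexString(decimalTid);
    }

    public static void main(String[] args) {
        // Defaults to the thread ID quoted in this report.
        long tid = args.length > 0 ? Long.parseLong(args[0].trim()) : 16365L;
        System.out.println(toNid(tid)); // prints nid=0x3fed for 16365
    }
}
```

      Searching the thread dump for the printed nid value finds the Java stack of the hot native thread.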

      SandboxResolvingClassLoader uses a Guava LoadingCache rather than the much faster Caffeine cache, which could serve as a drop-in replacement.
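      For comparison, here is a minimal JDK-only sketch of per-key atomic loading, the same contract Caffeine exposes through Cache.get(key, mappingFunction). It is not the plugin's code and deliberately avoids a Caffeine dependency so it runs standalone; ComputeIfAbsentCache is a hypothetical name, and unlike Caffeine it has no eviction or expiry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical JDK-only stand-in for a Caffeine-style loading cache.
// It illustrates per-key atomic loading only; there is no eviction or expiry.
public class ComputeIfAbsentCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<? super K, ? extends V> loader;

    public ComputeIfAbsentCache(Function<? super K, ? extends V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Fast path: a plain read takes no lock when the value is already cached.
        V cached = map.get(key);
        if (cached != null) {
            return cached;
        }
        // Slow path: the loader runs at most once per key; other callers for the
        // same key block until it returns. The loader must not call back into
        // this cache, since ConcurrentHashMap does not support recursive updates
        // from inside computeIfAbsent.
        return map.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        ComputeIfAbsentCache<String, Integer> cache = new ComputeIfAbsentCache<>(name -> {
            loads.incrementAndGet();
            return name.length();
        });
        cache.get("java.lang.String");
        cache.get("java.lang.String");
        System.out.println("loader calls: " + loads.get()); // prints loader calls: 1
    }
}
```

      The plain get() before computeIfAbsent mirrors a common optimization: on JDK 8, ConcurrentHashMap.computeIfAbsent can lock the hash bin even when the key is already present, so cache hits would otherwise contend.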

          Devin Nusbaum added a comment -

          The PR is stalled and would need to be updated to resolve merge conflicts. It would probably also need additional testing at that point to understand the impact.

          Devin Nusbaum added a comment -

          Noting also that in some cases I have seen evidence of a bug in Guava (not just a performance issue): many threads are waiting to load a value from the cache, but no thread is actually loading it, as described in this upstream issue.

          My best guess for the cause in the cases I have seen is that a StackOverflowError thrown by the loading thread was somehow swallowed by Guava. We should investigate whether the issue is reproducible, and whether it is a bug in the Pipeline-Groovy layer or in Guava itself.
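          The hang described above (waiters parked on a future that no thread will ever complete) can be illustrated with a hypothetical single-flight loader, assuming the failure mode is a loader Error that never reaches the shared future. This is a sketch of the pattern visible in the thread dump, not Guava's implementation; SingleFlightLoader is an illustrative name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical single-flight loader: the first caller for a key runs the loader,
// later callers wait on the same future (compare the LocalCache.waitForLoadingValue
// frames in the thread dump). Not Guava's implementation.
public class SingleFlightLoader<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();

    public V get(K key, Function<? super K, ? extends V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = entries.putIfAbsent(key, mine);
        if (existing != null) {
            // Another thread owns (or owned) this load: wait for its result.
            // join() rethrows a failure wrapped in a CompletionException.
            return existing.join();
        }
        try {
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (Throwable t) {
            // Catch Throwable, not just Exception: if an Error such as
            // StackOverflowError escaped here without completing the future,
            // every thread parked in join() would wait forever.
            entries.remove(key, mine); // let a later caller retry the load
            mine.completeExceptionally(t);
            throw t;
        }
    }
}
```

          Catching only Exception in the loading path would let a StackOverflowError escape while the future stays incomplete, which matches the symptom of many waiters and no active loader.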

          Devin Nusbaum added a comment -

          A fix for this issue was released in version 1.61 of the Script Security Plugin.

            Assignee: Unassigned
            Reporter: Sam Van Oort (svanoort)
            Votes: 1
            Watchers: 3