[JENKINS-55014] Deadlocking around support-core-plugin log publishing

      The following versions together cause a deadlock around log publishing:

      • jenkins: 2.89.4
      • git-plugin: 3.8.0
      • support-core-plugin: 2.45.1
      • workflow-cps: 2.51

      Attaching stack dump: jenkins-deadlock-workflow-git

          Mark Waite added a comment -

          Can you provide a detailed description of the configuration and environment where you're seeing the error?

          I'm not aware of any issues related to threads in git plugin 3.8.0 or in the most recent releases like 3.9.0 and 3.9.1.

          Andrew Feller added a comment - edited

          Hey markewaite,

          The only configuration detail I can think of worth noting is that we had a handful of custom log recorders, one of which logged everything at the FINEST level. As far as environment, we're using a variation of openshift/jenkins-2-centos7 with some plugins added and/or upgraded. Given the plugins deadlocking in the attached stack dump, I'm not sure what else would be useful to share, as these errors happen during Jenkins boot.

          Devin Nusbaum added a comment - edited

          I think this is an issue in support-core's custom log handler, and it seems related to JENKINS-27669.

          It looks like ReplayAction.ensurePermissionRegistered is being loaded; the load locks a class loader, and at some point while that lock is held, Jetty's class loader writes to the logs. That record is handled by CustomHandler, which tries to lock a StreamHandler.

          At the same time, GitSCM.onLoaded tries to log an exception, which is also intercepted by CustomHandler. It locks the StreamHandler, and then the custom log handler uses SupportLogFormatter.format to format the message. The formatter tries to load classes, which requires locking the same class loader that ReplayAction.ensurePermissionRegistered already holds, and so we have a deadlock.
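
          A minimal standalone sketch of that lock ordering; the two monitors below are illustrative stand-ins for the class loader and StreamHandler locks, not support-core's actual code:

              import java.util.concurrent.CountDownLatch;

              public class LockOrderSketch {
                  // Stand-ins for the two monitors described above.
                  static final Object classLoaderLock = new Object();
                  static final Object streamHandlerLock = new Object();

                  public static void main(String[] args) {
                      CountDownLatch bothLocked = new CountDownLatch(2);

                      // Thread 1: class loading holds the class loader lock; then a
                      // log record emitted mid-load makes the handler try to take
                      // the StreamHandler lock.
                      Thread classLoading = new Thread(() -> {
                          synchronized (classLoaderLock) {
                              bothLocked.countDown();
                              await(bothLocked);
                              synchronized (streamHandlerLock) { /* publish record */ }
                          }
                      });

                      // Thread 2: a plugin logs; the handler takes the StreamHandler
                      // lock, then the formatter triggers class loading.
                      Thread logging = new Thread(() -> {
                          synchronized (streamHandlerLock) {
                              bothLocked.countDown();
                              await(bothLocked);
                              synchronized (classLoaderLock) { /* format record */ }
                          }
                      });

                      classLoading.start();
                      logging.start(); // opposite acquisition order: deadlocks here
                  }

                  static void await(CountDownLatch latch) {
                      try {
                          latch.await();
                      } catch (InterruptedException e) {
                          Thread.currentThread().interrupt();
                      }
                  }
              }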

          Maybe preloading SupportLogFormatter here is not enough, and we should preload PrintWriter and Throwable as well? Or maybe the class loaders have to be consulted even though the classes have already been loaded, and so this is just a really nasty timing bug.
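
          A hedged sketch of that preloading idea; where this would live in support-core is an assumption, and the class names are just the JDK ones mentioned above:

              public class FormatterPreload {
                  static {
                      try {
                          // Force-load the classes the formatter touches before any
                          // record can be published, so format() never has to
                          // consult a class loader mid-publish.
                          Class.forName("java.io.PrintWriter");
                          Class.forName("java.lang.Throwable");
                      } catch (ClassNotFoundException e) {
                          throw new ExceptionInInitializerError(e);
                      }
                  }
              }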

          Maybe it could be fixed by getting rid of the class loading in support-core's SupportLogFormatter (not sure how it would affect the behavior), or maybe the scope of the custom handler could be changed to exclude Jetty's class loader, and then we could install a separate log handler for Jetty that doesn't need to do anything fancy. I'm not sure what is causing GitSCM.onLoaded and ReplayAction.ensurePermissionRegistered to be called at the same time, but it looks like they are running as part of the Reactor during startup, and I think it is normal for it to execute things with no interdependency in parallel.
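
          For the second option, a rough sketch of what a separate, non-fancy handler for Jetty could look like in plain java.util.logging; the org.eclipse.jetty logger namespace is an assumption, and the class and method names here are hypothetical:

              import java.util.logging.ConsoleHandler;
              import java.util.logging.Logger;
              import java.util.logging.SimpleFormatter;

              public class JettyLogIsolation {
                  // Strong reference so the configuration is not garbage-collected.
                  private static final Logger JETTY = Logger.getLogger("org.eclipse.jetty");

                  public static void install() {
                      // Give Jetty its own plain handler and stop its records from
                      // propagating to the root logger, where support-core's
                      // CustomHandler would otherwise pick them up.
                      ConsoleHandler plain = new ConsoleHandler();
                      plain.setFormatter(new SimpleFormatter()); // no reflective class loading
                      JETTY.addHandler(plain);
                      JETTY.setUseParentHandlers(false);
                  }
              }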

          andyfeller Are you able to reproduce this consistently? Did you start seeing the error immediately after updating Jenkins or any plugins?

          Andrew Feller added a comment -

          Hey dnusbaum,

          I believe it was coincidence that we hit this while upgrading the Jenkins openshift-login plugin from 1.0.8 to 1.0.11, since we hadn't brought the service down for several weeks. In attempting to roll out this upgrade, the deadlocking has consistently blocked the rollout. To test a theory, I removed all of the custom loggers, including the one logging everything at FINEST, and that appears to allow Jenkins startup to complete. AFAIC the real problem here is that support-core-plugin requires plugins to acquire a lock to perform logging, but I'm not an authority on the reasoning or the best way to address it.

          Thanks for chiming in!

          Devin Nusbaum added a comment -

          Ah, yeah I think that the custom logger logging everything at FINEST was the proximate cause for the issue. The code in Jetty that logs during class loading requires DEBUG level in Jetty's API, which corresponds to FINE when using java.util.logging.
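
          For illustration: FINEST is a lower threshold than FINE, so a catch-all recorder at FINEST will pick up Jetty's DEBUG output:

              import java.util.logging.Level;
              import java.util.logging.Logger;

              public class LevelMapping {
                  public static void main(String[] args) {
                      // Jetty's DEBUG maps to java.util.logging's FINE; a FINEST
                      // threshold (300) sits below FINE (500), so FINE records pass.
                      Logger root = Logger.getLogger("");
                      root.setLevel(Level.FINEST);
                      System.out.println(root.isLoggable(Level.FINE)); // prints: true
                  }
              }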

          Support Core's special formatting is certainly not ideal, but synchronous logging inside of a class loader is also asking for trouble. In your case, as a workaround, maybe you could change your custom logger to explicitly capture jenkins, hudson, and maybe org.jenkinsci instead of logging everything at FINEST. (I'm not super familiar with the proper syntax for wildcard logging with java.util.logging; maybe there is even a way to do an inverse filter and log everything at FINEST except for Jetty?)
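
          A hedged sketch of both workarounds in plain java.util.logging; the logger names and the org.eclipse.jetty prefix are assumptions, and in practice this would normally be configured through the Jenkins log recorder UI rather than code:

              import java.util.ArrayList;
              import java.util.List;
              import java.util.logging.Handler;
              import java.util.logging.Level;
              import java.util.logging.Logger;

              public class FinestExceptJetty {
                  // Hold strong references so the configured loggers are not GC'd.
                  private static final List<Logger> PINNED = new ArrayList<>();

                  public static void configure() {
                      // Option 1: capture only the Jenkins namespaces explicitly.
                      for (String name : new String[] {"jenkins", "hudson", "org.jenkinsci"}) {
                          Logger logger = Logger.getLogger(name);
                          logger.setLevel(Level.FINEST);
                          PINNED.add(logger);
                      }

                      // Option 2: the "inverse filter" idea, FINEST everywhere
                      // except Jetty, via a filter on the root logger's handlers.
                      Logger root = Logger.getLogger("");
                      root.setLevel(Level.FINEST);
                      for (Handler h : root.getHandlers()) {
                          h.setFilter(record -> {
                              String logger = record.getLoggerName();
                              return logger == null || !logger.startsWith("org.eclipse.jetty");
                          });
                      }
                  }
              }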

            Assignee: Unassigned
            Reporter: Andrew Feller (andyfeller)