Jenkins / JENKINS-65513

Saml plugin 2.x.x causes deadlock impacting Jenkins performance


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: saml-plugin
    • Labels:
    • Environment:
      Jenkins LTS 2.273.2
      SAML 2.0.3 (2.x.x as well)
      Running on CentOS-7 inside a docker container
      openjdk version "1.8.0_282"



    • Released As:
      saml-2.0.5

      Description

      We wanted to use SSO authentication on our Jenkins server, so we started using the SAML plugin 2.0.2. Since then, Jenkins has gradually become slower, to the point of being unusable. On further investigation, we found the following message in our logs (see the Saml logs.png attachment), which suggests a concurrency issue, possibly a deadlock.

      These messages are logged at least on every login (we are not sure whether anything else also triggers them). We also found that the number of threads kept increasing, with the new threads coming from the SAML plugin (see Jenkins_threads.PNG). This is the /threadDump entry for one of those threads:

       "Timer for org.opensaml.saml.metadata.resolver.impl.FilesystemMetadataResolver@11a2ac5b" Id=10090 Group=main TIMED_WAITING on java.util.TaskQueue@7aa956ec
           at java.lang.Object.wait(Native Method)
           -  waiting on java.util.TaskQueue@7aa956ec
           at java.util.TimerThread.mainLoop(Timer.java:552)
           at java.util.TimerThread.run(Timer.java:505)
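
      The thread in the dump is an idle java.util.Timer thread waiting for its next scheduled task. A thread in TIMED_WAITING inside TimerThread.mainLoop is not blocked on a monitor held by another thread, so a growing number of them is consistent with a thread leak rather than a lock-order deadlock. The JDK-only sketch below reproduces that pattern under an assumption this ticket does not confirm: that each login creates a new metadata resolver which starts its own background refresh timer and is never destroyed. HypotheticalResolver and simulateLogins are illustrative names, not OpenSAML or plugin APIs.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerLeakDemo {

    /** Hypothetical stand-in for a metadata resolver that refreshes itself on a background timer. */
    static final class HypotheticalResolver {
        private final Timer timer;

        HypotheticalResolver() {
            // Name the daemon timer thread after the instance, mirroring the names seen in the dump.
            timer = new Timer("Timer for " + HypotheticalResolver.class.getName() + "@"
                    + Integer.toHexString(System.identityHashCode(this)), true);
            // A periodic "refresh" keeps the thread parked in TIMED_WAITING between runs.
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // Re-reading metadata from disk would happen here.
                }
            }, 60_000L, 60_000L);
        }

        /** Cancelling the timer is what allows its thread to terminate. */
        void destroy() {
            timer.cancel();
        }
    }

    static final String PREFIX = "Timer for " + HypotheticalResolver.class.getName();

    /** Counts live threads whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    /** Polls until at most {@code limit} matching threads remain or the timeout expires. */
    static long awaitAtMost(String prefix, long limit, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long count = countThreads(prefix);
        while (count > limit && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            count = countThreads(prefix);
        }
        return count;
    }

    /** Simulates n logins and returns how many extra timer threads are still alive afterwards. */
    static long simulateLogins(int n, boolean destroyAfterUse) {
        long before = countThreads(PREFIX);
        for (int i = 0; i < n; i++) {
            // Assumption: one new resolver per login, as described in the lead-in.
            HypotheticalResolver resolver = new HypotheticalResolver();
            // ... validate the SAML response here ...
            if (destroyAfterUse) {
                resolver.destroy();
            }
        }
        // Cancelled timer threads exit asynchronously, so allow them a moment to finish.
        long after = destroyAfterUse ? awaitAtMost(PREFIX, before, 2_000L) : countThreads(PREFIX);
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("extra threads, no destroy(): " + simulateLogins(5, false));
        System.out.println("extra threads, destroy():    " + simulateLogins(5, true));
    }
}
```

      Without destroy() every simulated login leaves one more daemon timer thread behind; with it, the count returns to its baseline.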

      Once the thread count reached about 9k, performance was so poor that we had to restart Jenkins; in practice a restart was needed every 5-7 days.
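
      One way to spot this kind of growth long before it reaches thousands of threads is to sample live threads periodically and group them by name with the per-instance "@hash" suffix removed, so that a family such as "Timer for ...FilesystemMetadataResolver" stands out between two samples. The sketch below uses only JDK APIs; ThreadNameHistogram is a hypothetical helper, and running it inside the Jenkins JVM is an assumption (the same raw data is available from /threadDump).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class ThreadNameHistogram {

    // Matches an identity-hash suffix such as "@11a2ac5b" at the end of a thread name.
    private static final Pattern HASH_SUFFIX = Pattern.compile("@[0-9a-fA-F]+$");

    /** Normalises a thread name by dropping its trailing "@hex" suffix, if present. */
    static String family(String threadName) {
        return HASH_SUFFIX.matcher(threadName).replaceFirst("");
    }

    /** Returns thread-name family -> number of live threads in that family. */
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(family(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print the ten largest families; compare two samples taken hours apart.
        histogram().entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```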

      We tested several SAML plugin versions, and the problem occurs only with 2.x.x. We therefore rolled back to 1.1.7, which so far runs without issues.

      We compared the dependencies of the two major versions:

      • SAML Jenkins plugin 2.0.3 uses pac4j 3.9.0, which depends on opensaml-saml-impl 3.4.3.
      • SAML Jenkins plugin 1.1.7 uses pac4j 1.9.9, which depends on opensaml-saml-impl 3.2.0.

      We suspect the pac4j upgrade is the cause, since the plugin's own changes between those versions do not seem significant.
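
      Whatever the dependency-level cause, a general lifecycle pattern that avoids one timer thread per login is to reuse a single resolver per metadata source and destroy the previous instance only when its configuration changes. The sketch below is hypothetical throughout (SharedResolverHolder, Destroyable, TimerBackedResolver are illustrative names) and is not necessarily what the fix released in saml-2.0.4/2.0.5 does.

```java
import java.util.Objects;
import java.util.Timer;
import java.util.function.Function;

/** Hypothetical lifecycle contract: release background resources such as timers. */
interface Destroyable {
    void destroy();
}

public class SharedResolverHolder<C, R extends Destroyable> {
    private final Function<C, R> factory;
    private C currentConfig;
    private R current;

    public SharedResolverHolder(Function<C, R> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    /** Returns the shared resolver, rebuilding it only when the configuration has changed. */
    public synchronized R get(C config) {
        if (current == null || !Objects.equals(config, currentConfig)) {
            R replacement = factory.apply(config); // if this throws, the old resolver stays in use
            R previous = current;
            current = replacement;
            currentConfig = config;
            if (previous != null) {
                previous.destroy(); // stop the old instance's background timer
            }
        }
        return current;
    }

    /** Destroys the current resolver, e.g. when the security realm is replaced or Jenkins stops. */
    public synchronized void shutdown() {
        if (current != null) {
            current.destroy();
            current = null;
            currentConfig = null;
        }
    }

    /** Hypothetical resolver owning a daemon refresh timer. */
    public static final class TimerBackedResolver implements Destroyable {
        private final Timer timer;
        private volatile boolean destroyed;

        public TimerBackedResolver(String metadataPath) {
            this.timer = new Timer("metadata refresh for " + metadataPath, true);
        }

        @Override
        public void destroy() {
            timer.cancel();
            destroyed = true;
        }

        public boolean isDestroyed() {
            return destroyed;
        }
    }

    public static void main(String[] args) {
        SharedResolverHolder<String, TimerBackedResolver> holder =
                new SharedResolverHolder<>(TimerBackedResolver::new);
        TimerBackedResolver first = holder.get("idp-metadata.xml");
        for (int i = 0; i < 1_000; i++) {
            holder.get("idp-metadata.xml"); // simulated logins reuse the same instance
        }
        System.out.println("reused: " + (first == holder.get("idp-metadata.xml")));
        holder.shutdown();
        System.out.println("destroyed: " + first.isDestroyed());
    }
}
```

      Creating the replacement before destroying the old instance keeps a working resolver in place if construction fails; the cost is that two timers exist briefly during a configuration change.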

      Part of the Jenkins log is attached as SAML Log.txt, and more detailed debug logs as saml_debug_log.log.

        Attachments

        1. Jenkins_threads.PNG
          271 kB
        2. saml_debug_log.log
          26 kB
        3. SAML Log.txt
          26 kB
        4. Saml logs.png
          101 kB
        5. Thread dump.txt
          67 kB


            Activity

            georg020 Georgi created issue -
            georg020 Georgi made changes -
            Field Original Value New Value
            Description
            georg020 Georgi made changes -
            Attachment saml_debug_log.log [ 54725 ]
            georg020 Georgi made changes -
            Description
            georg020 Georgi made changes -
            Description
            georg020 Georgi made changes -
            Description
            georg020 Georgi made changes -
            Attachment Thread dump.txt [ 54741 ]
            ifernandezcalvo Ivan Fernandez Calvo made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            ifernandezcalvo Ivan Fernandez Calvo made changes -
            Remote Link This issue links to "PR (Web Link)" [ 26738 ]
            ifernandezcalvo Ivan Fernandez Calvo made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            ifernandezcalvo Ivan Fernandez Calvo made changes -
            Released As saml-2.0.4
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Resolved [ 5 ]
            ifernandezcalvo Ivan Fernandez Calvo made changes -
            Released As saml-2.0.4 saml-2.0.5

              People

              Assignee:
              ifernandezcalvo Ivan Fernandez Calvo
              Reporter:
              georg020 Georgi
              Votes:
              0
              Watchers:
              3
