• Type: Bug
• Resolution: Fixed
• Priority: Major
• Component/s: core

      We are seeing repeatable, very heavy lock congestion in FingerPrint.save.

      Jenkins apparently slides into a state where many threads either hold or are competing for a lock on a FingerPrint instance, and are also competing for a lock on AnnotationMapper (a singleton). Everything grinds to a halt.
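
The pattern described above (many threads queued on one shared monitor) can be reproduced in miniature. The following is a minimal Java sketch, not Jenkins code: `SHARED_LOCK` is a hypothetical stand-in for the singleton, and the class and method names are invented for illustration. It holds the lock, starts workers that need it, and uses the standard `ThreadMXBean` API to count how many of them end up BLOCKED on that monitor.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class SingletonContentionDemo {
    // Hypothetical stand-in for a process-wide singleton guarded by its monitor.
    static final Object SHARED_LOCK = new Object();

    /** Counts live threads that are BLOCKED waiting to enter a monitor on the given object. */
    static int countBlockedOn(Object lock) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int identity = System.identityHashCode(lock);
        String className = lock.getClass().getName();
        int blocked = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            LockInfo wanted = info.getLockInfo();
            if (wanted != null
                    && wanted.getIdentityHashCode() == identity
                    && wanted.getClassName().equals(className)) {
                blocked++;
            }
        }
        return blocked;
    }

    /** Holds the shared lock, starts workers that need it, and reports how many are blocked. */
    static int blockedWhileLockHeld(int workers) {
        List<Thread> threads = new ArrayList<>();
        try {
            int observed;
            synchronized (SHARED_LOCK) {
                for (int i = 0; i < workers; i++) {
                    Thread t = new Thread(() -> {
                        synchronized (SHARED_LOCK) {
                            // A real worker would do its save here.
                        }
                    }, "saver-" + i);
                    t.start();
                    threads.add(t);
                }
                // Wait until every worker is queued on the monitor we are holding.
                while (countBlockedOn(SHARED_LOCK) < workers) {
                    Thread.sleep(1);
                }
                observed = countBlockedOn(SHARED_LOCK);
            }
            for (Thread t : threads) {
                t.join();
            }
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Workers blocked on the shared lock: " + blockedWhileLockHeld(8));
    }
}
```

In a real instance the same information is visible in a thread dump as many threads in state BLOCKED that name the same monitor address.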

          [JENKINS-13154] Heavy thread congestion with FingerPrint.save

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/fdc090a3ae3830196b64535862425c6cd844d46b
          Log:
          [FIXED JENKINS-13154] AnnotationMapper bug was causing massive lock contention when saving fingerprints.

          Jesse Glick added a comment -

          And by the way, thank you @mp3 for your thread dump, which led to the fix.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2394
          [FIXED JENKINS-13154] AnnotationMapper bug was causing massive lock contention when saving fingerprints. (Revision fdc090a3ae3830196b64535862425c6cd844d46b)

          Result = SUCCESS
          Jesse Glick : fdc090a3ae3830196b64535862425c6cd844d46b
          Files :

          • changelog.html
          • core/pom.xml


          Eric Denman added a comment -

          FWIW, I just installed 1.510-SNAPSHOT (e837438c0138faf401bcb433450c8dfdc7ebbe6f) and our builds are still taking 5+ minutes on the "Waiting for Jenkins to finish collecting data" step and the whole jenkins app frequently stops responding to any requests. We have 228M of data in the "fingerprints" dir.

          I tried getting a thread dump by using the /threadDump when it was in the "not responding" state but that didn't work so I used jstack -F. Attaching that dump.


          Eric Denman added a comment -

          @jglick looks like my threaddump is blocked deep underneath Fingerprint.load so it may not be the same issue. Should I re-open this ticket or file a new one?


          Jesse Glick added a comment -

          @edenman: separate issue please. (Can use JIRA’s “related” link if appropriate.) Your thread dump anyway does not show Jenkins being blocked, but rather excessively busy doing a couple different things: loading fingerprints; and loading historical build records (perhaps after having discarded them under memory pressure).


          Eric Denman added a comment -

          @jglick thanks! Filed JENKINS-17412


          Mikko Peltonen added a comment -

          We are still seeing lots of congestion with Jenkins 1.517, and our builds can still sit for tens of minutes in the "Waiting for Jenkins to finish collecting data" phase.

          The location has changed, though. Thread dumps now show lots of these:

          java.lang.Thread.State: BLOCKED (on object monitor)
              at java.util.Collections$SynchronizedMap.get(Collections.java:2031)
              - waiting to lock <0x000000070c6dcfa8> (a java.util.Collections$SynchronizedMap)
              at com.thoughtworks.xstream.core.DefaultConverterLookup.lookupConverterForType(DefaultConverterLookup.java:49)
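
For context on why a `Collections$SynchronizedMap` shows up as a hot spot: every call on such a map, including `get`, takes the same monitor, so concurrent readers queue behind one another. The sketch below illustrates the general alternative of a read-mostly cache backed by `ConcurrentHashMap`, whose lookups generally do not block. It is only an illustration under stated assumptions: `TypeLookupCache` and its `resolver` are hypothetical names, this is not XStream's API, and nothing here claims to be the change made for this issue.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-type lookup cache; not part of XStream or Jenkins. */
final class TypeLookupCache<V> {
    private final ConcurrentHashMap<Class<?>, V> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, V> resolver;

    /** The resolver computes the value for a type on a cache miss and must not return null. */
    TypeLookupCache(Function<Class<?>, V> resolver) {
        this.resolver = resolver;
    }

    V lookup(Class<?> type) {
        // Fast path: a plain get does not take a map-wide lock.
        V cached = cache.get(type);
        if (cached != null) {
            return cached;
        }
        // Slow path: compute at most once per key, even under concurrent misses.
        return cache.computeIfAbsent(type, resolver);
    }

    public static void main(String[] args) {
        TypeLookupCache<String> names = new TypeLookupCache<>(Class::getSimpleName);
        System.out.println(names.lookup(String.class));
        System.out.println(names.lookup(Integer.class));
    }
}
```

With a synchronized map, the equivalent of the fast path would serialize all callers on one monitor, which matches the shape of the blocked stack frames quoted above.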

          Mikko Peltonen added a comment -

          Attached relevant parts of the whole thread dump.
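
When a dump contains many blocked threads, tallying the `waiting to lock <address>` lines makes the most contended monitor obvious. The following is a rough Java sketch for doing that over the text of a dump; the class name `WaitingToLockTally` is invented here, and the pattern assumes the line format shown in the excerpt quoted in this thread (other dump formats may not include these lines at all).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WaitingToLockTally {
    // Matches e.g.: waiting to lock <0x000000070c6dcfa8> (a java.util.Collections$SynchronizedMap)
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <(0x[0-9a-fA-F]+)> \\(a ([^)]+)\\)");

    /** Returns "address className" mapped to the number of threads waiting on that monitor. */
    static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = WAITING.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WaitingToLockTally <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        tally(dump).forEach((lock, waiters) -> System.out.println(waiters + "\t" + lock));
    }
}
```

A monitor with a large waiter count and a single owner is the usual signature of the kind of contention reported in this issue; a dump dominated by RUNNABLE threads points to a busy system instead.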

          Jesse Glick added a comment -

          @mp3 whatever you are seeing is a distinct bug. Please file separately and use JIRA’s “is related to” link as needed.

          (I would have suspected a regression from JENKINS-18775, but that is much newer than 1.517.)


            jglick Jesse Glick
            t_kurki Teppo Kurki
            Votes: 4
            Watchers: 12