• Bug
    • Resolution: Fixed
    • Major
    • core, maven-plugin
    • None
    • core 1.564-SNAPSHOT, remoting 2.41

      A number of the slaves at builds.apache.org, both Linux and Windows, hang after a while. The common thread seems to be Maven jobs running on them and eventually hanging, which causes everything else on the slave to hang (including, in some cases, attempts to get the thread dump from within Jenkins). The original Maven build hangs indefinitely, and any subsequent builds trying to run on the same slave get to the point of starting the git clone/svn checkout/etc. and then just hang. The Linux slaves are running Java 1.8.0_05, and the Windows slaves are running some Java 7 version; I'm not sure which.

      Threaddump for Linux is at https://gist.github.com/abayer/3d567b56776e1ce78ad7 (one job hanging for over a day, another that started an hour or so ago but is now hanging), threaddump for Windows is at https://gist.github.com/abayer/c99f72ca1232e4d8acfa (only one job running at all on there, hanging for 17 hours or so).

          [JENKINS-23098] Slaves hanging with Maven jobs

          Andrew Bayer added a comment -

          Very weird. It was idling at 99% CPU for 3 hours after the log said Maven was done, so...weird.


          Tony Bridges added a comment -

          This looks very similar to what I am seeing on Windows master/slave running 1.554.3 with maven plugin 2.4. I'm also seeing a particular maven job (not all) consistently hanging up after metadata collection.


          Tony Bridges added a comment -

          That latter hang, by the way, is not present with the maven plugin 2.1 after a downgrade. That might be a useful data point.


          Wilm Schomburg added a comment -

          We had the same issue with the maven plugin 2.3 and different Jenkins versions (1.554.2, 1.554.1 and older non-LTS versions). We had to downgrade to 2.1 to solve the issue and get our Jenkins stable again.

          Jesse Glick added a comment -

          @tbridges @wilm if you can reproduce the problem easily in newer plugin versions but not older, we really need you to git bisect until you find the plugin commit introducing the problem, since I at least have no other leads.


          Jesse Glick added a comment -

          Looks like the fix of JENKINS-22354, in 2.2, may have introduced this bug.


          Kohsuke Kawaguchi added a comment -

          The thread dump from abayer shows that something weird is happening with SplittableBuildListener.

          Below is my analysis of the issue from one of our customers (ZD-19531), which turns out to be the same problem:

          3 threads appear to be blocked on SplittableBuildListener.synchronizeOnMark of the same object, which is odd, as the execution of this is supposed to be sequential.

          • Computer.threadPoolForRemoting [#1099] is waiting to enter SplittableBuildListener.synchronizeOnMark.
          • Computer.threadPoolForRemoting [#1108] is inside synchronizeOnMark and on markCountLock.wait.
          • Computer.threadPoolForRemoting [#1113] has found the mark and is trying to report it, but is blocked trying to get in.
          • Computer.threadPoolForRemoting [#1104] is inside synchronizeOnMark waiting for Future.get().

          I think there's incorrect use of synchronization here. When wait() happens, the lock is released, which allows another thread to enter synchronizeOnMark. We need to use another lock to ensure synchronizeOnMark is not concurrently invoked.
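The observation that wait() releases the lock can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: the class WaitReleasesMonitorDemo and its helpers are hypothetical, and only the names synchronizeOnMark and markCountLock are borrowed from the analysis above. It counts how many threads are inside the same synchronized block at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only mimics the locking pattern described in the comment.
final class WaitReleasesMonitorDemo {
    private final Object markCountLock = new Object();
    private final CountDownLatch bothInside = new CountDownLatch(2);
    private boolean released;   // guarded by markCountLock
    private int inside;         // guarded by markCountLock
    private int maxInside;      // guarded by markCountLock

    void synchronizeOnMark() throws InterruptedException {
        synchronized (markCountLock) {
            inside++;
            maxInside = Math.max(maxInside, inside);
            bothInside.countDown();
            while (!released) {
                // wait() gives up markCountLock, so another caller can enter this block.
                markCountLock.wait();
            }
            inside--;
        }
    }

    void release() {
        synchronized (markCountLock) {
            released = true;
            markCountLock.notifyAll();
        }
    }

    /** Returns the largest number of threads seen inside the synchronized block at once. */
    static int maxConcurrentCallers() {
        WaitReleasesMonitorDemo demo = new WaitReleasesMonitorDemo();
        Runnable caller = () -> {
            try {
                demo.synchronizeOnMark();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(caller);
        Thread b = new Thread(caller);
        a.start();
        b.start();
        try {
            // Let both callers get inside before anyone is released.
            demo.bothInside.await(5, TimeUnit.SECONDS);
            demo.release();
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (demo.markCountLock) {
            return demo.maxInside;
        }
    }

    public static void main(String[] args) {
        System.out.println("max concurrent callers: " + maxConcurrentCallers());
    }
}
```

Both callers end up inside the block together even though it is synchronized, which is the kind of concurrent entry the analysis describes.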

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/maven/SplittableBuildListener.java
          http://jenkins-ci.org/commit/maven-plugin/b145d5925ddeae2d697743920da204e6991375ac
          Log:
          [FIXED JENKINS-23098]

          Reference: ZD-19531

          Looking at [4], one notices that three threads are in an effective deadlock state around synchronizeOnMark. I extracted the relevant part into [5].

          Thread #1661 is trying to report a discovered mark, but is blocked [1]. Thread #1665 is inside synchronizeOnMark, on markCountLock.wait() [2]. Thread #1667 is stuck on Future.get() and hasn't returned [3], which holds the lock that blocks [1] from unblocking [2].

          The root problem is that the synchronizeOnMark method is never meant to be concurrently executed. But given the way the lock is used, if one thread gets to wait(), it's possible that another thread would come along and go into this function.

          In this change, I'm preventing that by introducing another lock to serialize the execution of the entire synchronizeOnMark() call. I'm not using the "this" object for locking because it's already used for another purpose (see the lock() method).

          I'm not yet clear on why the synchronizeOnMark() method is called concurrently to begin with. The interaction with the -T option of Maven is suspected.

          [1] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L2
          [2] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L34
          [3] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L71
          [4] https://gist.github.com/abayer/7ff4de807c6373eec40d
          [5] https://gist.github.com/kohsuke/374c22e737a77c9b0421
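The approach in the commit log, a second lock that serializes the whole call, can be sketched as follows. This is an illustration under assumptions, not the committed change: the class SerializedSynchronizeDemo, the field synchronizeLock, the pendingMarks counter, and reportMark() are all hypothetical names, and the real method does more than wait for a mark.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; only synchronizeOnMark and markCountLock are names from the issue.
final class SerializedSynchronizeDemo {
    private final Object markCountLock = new Object();
    // Assumed name for the extra lock; deliberately not "this".
    private final Object synchronizeLock = new Object();
    private final CountDownLatch firstInside = new CountDownLatch(1);
    private int pendingMarks;   // guarded by markCountLock
    private int inside;         // guarded by markCountLock
    private int maxInside;      // guarded by markCountLock

    void synchronizeOnMark() throws InterruptedException {
        // The outer lock stays held while wait() releases markCountLock,
        // so a second caller cannot enter until the first one has finished.
        synchronized (synchronizeLock) {
            synchronized (markCountLock) {
                inside++;
                maxInside = Math.max(maxInside, inside);
                firstInside.countDown();
                while (pendingMarks == 0) {
                    markCountLock.wait();
                }
                pendingMarks--;
                inside--;
            }
        }
    }

    // Stand-in for the thread that finds a mark; it needs only markCountLock.
    void reportMark() {
        synchronized (markCountLock) {
            pendingMarks++;
            markCountLock.notifyAll();
        }
    }

    /** Returns the largest number of threads seen inside the inner block at once. */
    static int maxConcurrentCallers() {
        SerializedSynchronizeDemo demo = new SerializedSynchronizeDemo();
        Runnable caller = () -> {
            try {
                demo.synchronizeOnMark();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(caller);
        Thread b = new Thread(caller);
        a.start();
        b.start();
        try {
            demo.firstInside.await(5, TimeUnit.SECONDS);
            Thread.sleep(100); // give the second caller time to try to get in
            demo.reportMark();
            demo.reportMark();
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (demo.markCountLock) {
            return demo.maxInside;
        }
    }

    public static void main(String[] args) {
        System.out.println("max concurrent callers: " + maxConcurrentCallers());
    }
}
```

In this sketch the reporting side takes only markCountLock, so it can always wake the waiter. Holding an outer lock across wait() is safe only as long as whoever delivers the notification never needs that outer lock.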

          Kohsuke Kawaguchi added a comment -

          If you see this problem, can you please try out this build and report back if that fixes the problem?

          Kohsuke Kawaguchi added a comment -

          Released Maven plugin 2.5 with this fix.

            kohsuke Kohsuke Kawaguchi
            abayer Andrew Bayer