• Bug
    • Resolution: Fixed
    • Major
    • core, maven-plugin
    • None
    • core 1.564-SNAPSHOT, remoting 2.41

      On a number of the slaves at builds.apache.org, both Linux and Windows, we're seeing the slave hang after a while. The common thread seems to be Maven jobs being run on them and eventually hanging, causing everything else on the slave to hang (including, in some cases, attempts to get the threaddump from within Jenkins). The original Maven build hangs indefinitely, and any subsequent builds trying to run on the same slave get to the point of starting the git clone/svn checkout/etc and then just hang. The Linux slaves are running Java 1.8.0_05, and the Windows slaves are running some Java 7 version - not sure which.

      Threaddump for Linux is at https://gist.github.com/abayer/3d567b56776e1ce78ad7 (one job hanging for over a day, another that started an hour or so ago but is now hanging), threaddump for Windows is at https://gist.github.com/abayer/c99f72ca1232e4d8acfa (only one job running at all on there, hanging for 17 hours or so).
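For triaging dumps like the ones linked above, a quick first pass is to count how many threads sit in each state. The following is only a minimal sketch, assuming the dump is in the plain-text jstack style where each thread has a `java.lang.Thread.State:` line. The function name `count_thread_states` and the sample excerpt are hypothetical and are not taken from the gists.

```python
import re
from collections import Counter

# Matches the state line that jstack-style dumps print under each thread header.
# Assumption: the dump uses the "java.lang.Thread.State: <STATE>" layout.
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)", re.MULTILINE)


def count_thread_states(dump_text):
    """Return a Counter mapping thread state -> number of threads in that state."""
    return Counter(STATE_RE.findall(dump_text))


# Hypothetical excerpt for illustration only; not copied from the linked gists.
SAMPLE_DUMP = '''
"Channel reader thread: channel" #12 prio=5
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)

"pool-1-thread-3" #27 prio=5
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)

"pool-1-thread-4" #28 prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
	at example.Hypothetical.method(Hypothetical.java:1)
'''

if __name__ == "__main__":
    # Print states from most to least common.
    for state, n in count_thread_states(SAMPLE_DUMP).most_common():
        print(state, n)
```

A large share of BLOCKED or WAITING threads on the slave side would be a hint about where to look next, though it does not by itself identify the cause of a hang.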

          [JENKINS-23098] Slaves hanging with Maven jobs

          Andrew Bayer created issue -

          Andrew Bayer added a comment -

          kohsuke, jglick - any ideas? I don't know where to start. For what it's worth, the forked off Maven process for the hung job is still running in these cases, but not doing anything...

          Jesse Glick added a comment -

          Do not see any clues there. The master thread dump might also be relevant. Better to install the Support Core plugin and attach a diagnostic bundle that would have everything.

          Jesse Glick added a comment -

          (And consider using freestyle projects, which are much less trouble-prone.)

          Andrew Bayer added a comment -

          Yeah, I'd love to get off the Maven projects, but, well, there's 600 or so of them (out of 1150 or so jobs) and they're pretty well entrenched. If we can't resolve this, I'll try to start the ball rolling on a complete rebuild of the Apache Jenkins setup with the Maven plugin explicitly removed, but that'll be a giant pain in the ass given the fact that we're talking about a massive number of separate ASF projects each with their own teams, etc, etc...yeah.

          Installing Support Core now, and full thread dump up at https://gist.github.com/abayer/7ff4de807c6373eec40d.

          Might be worth mentioning that we see absolutely no hangs like this on the hadoopX slaves, which only run freestyle jobs, so far as I can tell, so it definitely looks like a problem in the Maven plugin...

          Andrew Bayer added a comment -

          ...and fwiw, in the new version of my Jenkins best practices talk, I harp quite a bit on how you should never use the Maven plugin because it's a morass of pain. =)

          Andrew Bayer added a comment -

          And also fwiw, the support core plugin doesn't actually seem to give me a real bundle. I'm guessing because the whole master is so borked. =)

          Jesse Glick added a comment -

          Handling GET /job/Mahout-Quality/ws/trunk/examples/target/site/apidocs/index.html sounds bad. Is someone seriously trying to load a generated site from the workspace? Avoid (remote) workspace browsing whenever possible.

          Jesse Glick added a comment -

          And Handling GET /job/river-qa-refactor-j9/ws/trunk/qa/result/*zip*/result.zip is even worse. Teach people to archive artifacts, then start disabling workspace browse permission. You are getting DoS’d I think.

          Andrew Bayer added a comment -

          Yeah, quite aware of that from another JIRA I opened. I've turned off anonymous workspace read access and am trying to get people to stop linking to workspaces in general, but again, at ASF it's hard to get everyone to even notice the emails I send them about what they should stop doing, let alone actually stop doing it. Fun!

            kohsuke Kohsuke Kawaguchi
            abayer Andrew Bayer
            Votes: 1
            Watchers: 8
