• Bug
    • Resolution: Fixed
    • Major
    • None
    • CentOS 5.4, x86_64, Java 1.6.0_16

      This has started happening more frequently since 1.420, and since updating to 1.424, is now pretty much guaranteed to happen after 24 hours of restarting. There have been no major changes to the build architecture, and this seems to happen to all jobs, on all hosts (slaves and master alike). The thread dump is attached, as well as the systemInfo page for one of our build slaves ("Alfred").

          [JENKINS-10607] Jenkins jobs hang when finished

          amcfague added a comment -

          We are on version 1.1. In fact, this issue started about a week ago, so we're guessing it started when we upgraded to the newest release. I'm not 100% sure that it's because of the global stats plug-in, but the stack traces show threads waiting on it. Plus, when we click "initialize" on the plug-in page itself, it just hangs forever.

          We've been using the plug-in since last year, and we've only now encountered this issue with the latest update.

          Thanks Frédéric!

          Andrew


          Frédéric Camblor added a comment -

          OK, thanks for the valuable input, Andrew.

          I'll try to investigate this over the next two weeks... holidays mean spare time for Jenkins.

          amcfague added a comment -

          We've disabled the plug-in (to verify that this is indeed the issue) until the issue can be fixed.

          Either way, thanks a lot for all your hard work, Frédéric! And don't spend ALL your time off working...

          Andrew


          Carl Quinn added a comment - - edited

          We just started running into this problem over the last month or so, and we are still running Jenkins 1.399. We do have, however, about 1500 jobs, some with many builds hanging around.

          The data file that this plugin writes, global-build-stats.xml, is over 100MB in our case. When each build finishes, the plugin rewrites the entire file, then moves it into place. This becomes a huge bottleneck when hundreds of builds finish every few minutes and 90% of the time is spent writing this XML file over and over.

          Maybe the disk-flush could be deferred on a background thread until activity dies down.

          Below is the code in question:

          core/src/main/java/hudson/util/AtomicFileWriter.java

          public void commit() throws IOException {
              close();
              if(destFile.exists() && !destFile.delete()) {
                  tmpFile.delete();
                  throw new IOException("Unable to delete "+destFile);
              }
              tmpFile.renameTo(destFile);
          }
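
The suggestion above, deferring the disk flush to a background thread until activity dies down, could look roughly like the sketch below. It is only an illustration: `DeferredSaver`, `requestSave`, and the quiet-period parameter are hypothetical names, not part of Jenkins core or the plugin, and the `Runnable` stands in for whatever actually serializes the stats.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Jenkins API): coalesces many save requests into one write. */
class DeferredSaver implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable writeToDisk;      // assumed: serializes the stats file
    private final long quietPeriodMillis;
    private ScheduledFuture<?> pending;      // guarded by "this"

    DeferredSaver(Runnable writeToDisk, long quietPeriodMillis) {
        this.writeToDisk = writeToDisk;
        this.quietPeriodMillis = quietPeriodMillis;
    }

    /** Called when a build finishes; returns immediately instead of blocking on I/O. */
    synchronized void requestSave() {
        if (pending != null) {
            // A write is already queued: drop it and restart the quiet period.
            pending.cancel(false);
        }
        pending = executor.schedule(writeToDisk, quietPeriodMillis, TimeUnit.MILLISECONDS);
    }

    /** Lets the last queued write run, then stops the background thread. */
    @Override
    public void close() {
        // By default, already-scheduled delayed tasks still run after shutdown().
        executor.shutdown();
        try {
            executor.awaitTermination(quietPeriodMillis + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape, fifty builds finishing in quick succession trigger a single write once things go quiet. One caveat of the sketch: under constant activity the write is postponed indefinitely, so a real implementation would probably also cap the maximum delay.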
          


          Frédéric Camblor added a comment -

          I investigated the problem today, and it seems there could be contention points, as you said Carl, when global build stats deals with large amounts of data.

          amcfague > Is that your case? What is your global-build-stats.xml file size?
          Maybe a workaround on my side would be to shard this file into monthly (or weekly) build result files.

          carl quinn > Do you know, approximately, how far back your global-build-stats history goes?

          Carl Quinn added a comment -

          Our file is just over 100MB, and has records for builds from May-2010.

          We've been talking to Kohsuke as we were tracking down our performance problem to this plugin, and he had some ideas about how the core might better support this kind of data storage for plugins.

          One other thing that we noticed is that the XML store contains stats for builds that have been removed from the main storage due to age / count. It might be nice if this plugin had at least an option to mirror that cleanup, as well as other kinds of retention constraints.
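
As a rough illustration of the "mirror that cleanup" idea, the sketch below drops stat records whose build no longer exists. Everything in it is hypothetical: `StatsRetention`, `BuildStat`, and the map of surviving build numbers per job are stand-ins for illustration, not the plugin's actual data model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: keep only stats whose build still exists in main storage. */
class StatsRetention {

    /** Minimal stand-in for one recorded build result. */
    static final class BuildStat {
        final String jobName;
        final int buildNumber;

        BuildStat(String jobName, int buildNumber) {
            this.jobName = jobName;
            this.buildNumber = buildNumber;
        }
    }

    /**
     * @param stats          every record currently held in the stats store
     * @param existingBuilds build numbers still present, keyed by job name (assumed input)
     * @return the records whose job and build number are both still present
     */
    static List<BuildStat> retainExisting(List<BuildStat> stats,
                                          Map<String, Set<Integer>> existingBuilds) {
        List<BuildStat> kept = new ArrayList<>();
        for (BuildStat stat : stats) {
            Set<Integer> numbers =
                    existingBuilds.getOrDefault(stat.jobName, Collections.emptySet());
            if (numbers.contains(stat.buildNumber)) {
                kept.add(stat);
            }
        }
        return kept;
    }
}
```

An age-based or count-based constraint would follow the same pattern with a different predicate in the loop.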


          amcfague added a comment -

          We actually hadn't even initialized it--and the initialization screen would hang forever when the initialize button was clicked in the config.


          Frédéric Camblor added a comment -

          Global-build-stats 1.2-SNAPSHOT, fixing job hanging on large Jenkins instances

          Frédéric Camblor added a comment -

          Hi all,

          Would you mind testing the attached hpi? It should solve the problem in two ways:

          • I generalized Kohsuke's pull request to delay job results serialization to a separate thread (so jobs won't hang after completion anymore).
          • I split job results into monthly files (instead of one fat global-build-stats.xml file). When a job result is added, only the current month's results are re-serialized.

          Before testing it, please back up your global-build-stats.xml file, since v1.2-SNAPSHOT changes how the global build stats files are organized. To go back to 1.1, you will have to restore your global-build-stats.xml file.

          Later, I will provide strategies for a job build results retention policy (based on Nicolas De Loof's pull request and my comments on it: https://github.com/jenkinsci/global-build-stats-plugin/pull/1).
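
The monthly split described in this comment can be pictured with the sketch below. It is not the plugin's code: `MonthlyShards`, the `jobResults-YYYY-MM.xml` file naming, and the use of plain strings as records are all assumptions made for illustration. The point is only that adding a result touches one month's bucket, so only that bucket needs to be rewritten.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group build results by month so one small file is rewritten per save. */
class MonthlyShards {
    private final Map<YearMonth, List<String>> shards = new TreeMap<>();

    /**
     * Adds a record and returns the only shard that now needs re-serializing.
     * UTC is an arbitrary choice for the sketch.
     */
    YearMonth add(long buildTimestampMillis, String record) {
        YearMonth month = YearMonth.from(
                Instant.ofEpochMilli(buildTimestampMillis).atZone(ZoneOffset.UTC));
        shards.computeIfAbsent(month, m -> new ArrayList<>()).add(record);
        return month;
    }

    /** Records held in one month's shard (empty if the month has none). */
    List<String> recordsFor(YearMonth month) {
        return shards.getOrDefault(month, Collections.emptyList());
    }

    /** Assumed naming scheme; the plugin's real file names may differ. */
    static String fileNameFor(YearMonth month) {
        return "jobResults-" + month + ".xml";
    }
}
```

With years of history, a save then rewrites a file holding one month of results rather than the full 100MB store.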

          Frédéric Camblor added a comment -

          Fix integrated in global-build-stats v1.2

            fcamblor Frédéric Camblor
            amcfague amcfague
            Votes: 2
            Watchers: 2