• Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: other
    • Labels: None
    • Platform: Sun, OS: Solaris

      When using later revisions of Hudson (1.257 or 1.259), Hudson puts new jobs
      in the queue waiting for the next available executor for no apparent reason.
      Hudson is not able to assign an executor to run a new job automatically;
      jobs simply remain in the queue waiting for the next available executor.

      I am on Sun Solaris 5.11 snv_55b i86pc i386, using Glassfish 9.1_02 (build
      b04-fcs). Of course, restarting Glassfish resolves the problem for the time
      being, and re-installing/re-configuring Glassfish from scratch resolves the
      issue for a day or so, but this is not a permanent solution.

      After further investigation, we decided to downgrade Hudson to an earlier
      version (1.200), and since then the problem seems to have gone away.

        1. dump.txt
          15 kB
        2. threadDump.htm
          38 kB
        3. threadDump.htm
          37 kB
        4. threaddump.txt
          39 kB
        5. threaddump.txt
          19 kB

          [JENKINS-2596] Hudson waiting for next available executor

          Kohsuke Kawaguchi added a comment -

          We haven't observed this kind of problem in our production, but I suspect
          a deadlock.

          When this problem happens next time (or if this is happening to someone reading
          this issue), please go to http://server/hudson/threadDump and obtain the full
          thread dump on the server, and attach it to this issue.


          gdurand added a comment -

          Created an attachment (id=441)
          Thread dump of blocked jobs (occurs every night, resumes on restart)


          gdurand added a comment -

          Attachment 441 is from a server with Windows XP SP2, Java 1.6.10, Tomcat
          5.5, and Hudson 1.261.
          The jobs are Maven 2.09 projects with submodules.
          Notice that all the jobs in the queue are colored red with the comment
          (roughly translated) "Seems to be blocked".


          gdurand added a comment -

          Created an attachment (id=449)
          Another dump, with blocked jobs triggered by an SVN change


          dagerber added a comment -

          Created an attachment (id=456)
          Another dump (on Linux, 1.261)


          mzlamal added a comment -

          Created an attachment (id=462)
          Thread dump from our production server running Tomcat 6.0.9 and JDK 1.6.0_02


          mzlamal added a comment -

          in hudson.node_monitors.DiskSpaceMonitor there is:

          protected Long monitor(Computer c) throws IOException, InterruptedException {
              FilePath p = c.getNode().getRootPath();
              if (p == null) return null;

              Long size = p.act(new GetUsableSpace());
              if (size != null && size != 0 && size / (1024 * 1024 * 1024) == 0) {
                  // TODO: this scheme should be generalized, so that Hudson can remember
                  // why it's marking the node as offline, as well as allowing the user
                  // to force Hudson to use it.
                  if (!c.isTemporarilyOffline()) {
                      LOGGER.warning(Messages.DiskSpaceMonitor_MarkedOffline(c.getName()));
                      c.setTemporarilyOffline(true);
                  }
              }
              return size;
          }

          which can take a node offline when it has less than 1 GB of usable disk
          space. It seems that when this happens on the master node, the node gets
          blocked and is not brought back online even after disk space is freed.
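The offline condition in the monitor above reduces to integer division: any usable space below 1 GiB makes size / (1024*1024*1024) equal to 0. A standalone illustration of just that arithmetic (isLowDiskSpace is a made-up helper for this sketch, not a Hudson API):

```java
public class DiskSpaceCheck {
    static final long GIB = 1024L * 1024 * 1024;

    // Mirrors the monitor's test: non-null, non-zero, and less than 1 GiB.
    // A null or zero size is treated as "unknown" rather than "low".
    static boolean isLowDiskSpace(Long size) {
        return size != null && size != 0 && size / GIB == 0;
    }

    public static void main(String[] args) {
        System.out.println(isLowDiskSpace(500L * 1024 * 1024)); // 500 MiB -> true
        System.out.println(isLowDiskSpace(2L * GIB));           // 2 GiB   -> false
        System.out.println(isLowDiskSpace(0L));                 // unknown -> false
    }
}
```

This is why a node hovering just under the 1 GiB threshold gets marked temporarily offline by the monitor.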


          tleese added a comment -

          Created an attachment (id=482)
          Hudson 1.265 Thread Dump


          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          trunk/hudson/main/core/src/main/java/hudson/model/ComputerSet.java
          trunk/hudson/main/core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
          trunk/hudson/main/core/src/main/java/hudson/node_monitors/DiskSpaceMonitorDescriptor.java
          trunk/hudson/main/core/src/main/java/hudson/node_monitors/MonitorMarkedNodeOffline.java
          trunk/hudson/main/core/src/main/java/hudson/node_monitors/NodeMonitor.java
          trunk/hudson/main/core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
          trunk/hudson/main/core/src/main/resources/hudson/model/ComputerSet/configure.jelly
          trunk/hudson/main/core/src/main/resources/hudson/model/ComputerSet/sidepanel.jelly
          trunk/hudson/main/core/src/main/resources/hudson/node_monitors/MonitorMarkedNodeOffline/message.jelly
          trunk/hudson/main/core/src/main/resources/hudson/node_monitors/MonitorMarkedNodeOffline/message.properties
          trunk/hudson/main/test/src/test/java/hudson/model/ComputerSetTest.java
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=17361
          Log:
          [FIXED JENKINS-2596] Preventive node monitoring of slave health metrics can be now configured individually. This will be in Hudson 1.301.


          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=17362
          Log:
          [FIXED JENKINS-2552] I originally recorded it for JENKINS-2596 but it should be also listed for JENKINS-2552. Will be in 1.301.


            Assignee: Unassigned
            Reporter: am111535
            Votes: 0
            Watchers: 0
