Jenkins / JENKINS-2263

Build stuck in pending state even with free executors


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: maven-plugin
    • Labels:
      None
    • Environment:
      Platform: PC, OS: Linux
      Description

      I am running Hudson 1.247 using Maven2 on:

      Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
      Java HotSpot(TM) Client VM (build 10.0-b23, mixed mode, sharing)

      Linux swmhq-build01 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux

      and after roughly 20-30 minutes of the builds automatically running correctly, a
      new build becomes stuck in the pending state even though there are free
      executors available. This issue sounds exactly the same as the one described in
      this thread:

      https://hudson.dev.java.net/servlets/ReadMsg?listName=users&msgNo=10917

      When I hover the mouse over the item in the build queue it says "waiting for
      next available executor".

      I am starting Hudson in a script:

      HUDSON_CMD="nohup java -jar $HUDSON/hudson.war -Xmx512m --javaHome=$JAVA_HOME > $HUDSON_LOG 2>&1"
      $HUDSON_CMD &
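      As an aside, JVM options such as -Xmx512m that appear after "-jar hudson.war" are
      passed to the application rather than to the JVM, and a redirection stored inside
      a shell variable is not re-parsed when the variable is expanded. A sketch of the
      launch script with both issues addressed (paths and variables are assumed from
      the snippet above):

```shell
#!/bin/sh
# JVM options must come before -jar; anything after the war file is treated
# as an application argument. $HUDSON, $JAVA_HOME and $HUDSON_LOG are
# assumed to be set as in the original script.
nohup java -Xmx512m -jar "$HUDSON/hudson.war" --javaHome="$JAVA_HOME" \
    > "$HUDSON_LOG" 2>&1 &
```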

      Requested debugging actions in the above-mentioned thread were:

      >>>> If you can attach a debugger to Hudson, can you set a break point on
      Queue.java line 858 and see if it the q.maintain() method is invoked?

      How would I do this? Is it possible to download the source to my local
      workstation, start Hudson on a Linux server, attach my local Eclipse IDE to
      the HTTP port, and set the break point? It's been some time since I've coded
      Java and used a debugger...
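      For reference, a remote debugger attaches to a dedicated JDWP port, not the HTTP
      port. A sketch of starting Hudson with a debug agent (port 8000 is an arbitrary
      choice; Eclipse then connects with a "Remote Java Application" launch
      configuration pointing at host:8000):

```shell
# suspend=n lets Hudson start normally; suspend=y would instead pause the
# JVM until a debugger attaches, which helps for breakpoints hit during
# startup.
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 \
    -jar hudson.war
```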

      >>>> Another thing I'd like you to try is to increase the log level of the
      "hudson.model.Queue" logger.

      How do I do this? I believe I just need to put a log4j.properties file on the
      classpath. As I am writing this, I realize what I need to do here, but I don't
      have access to the system until Monday. I'll try this and update the bug.
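      For what it's worth, Hudson's own loggers such as hudson.model.Queue go through
      java.util.logging rather than log4j, so one way to raise the level is a
      logging.properties file passed via a system property (the file path here is
      illustrative):

```shell
# /etc/hudson/logging.properties (illustrative path) would contain:
#   handlers = java.util.logging.ConsoleHandler
#   java.util.logging.ConsoleHandler.level = ALL
#   hudson.model.Queue.level = ALL
java -Djava.util.logging.config.file=/etc/hudson/logging.properties \
    -jar hudson.war
```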

      Full thread dump Java HotSpot(TM) Client VM (10.0-b23 mixed mode, sharing):

      "pool-2-thread-2" prio=10 tid=0xb52ef400 nid=0x6b34 waiting on condition
      [0xb53ba000..0xb53bafc0]
      java.lang.Thread.State: TIMED_WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x8c8584c0> (a
        java.util.concurrent.SynchronousQueue$TransferStack)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        at
        java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
        at
        java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
        at
        java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:944)
        at
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
        at java.lang.Thread.run(Thread.java:619)

      "RequestHandlerThread3" daemon prio=10 tid=0x0819b800 nid=0x681b in
      Object.wait() [0xb4ba7000..0xb4ba7fc0]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x8c6a0798> (a winstone.RequestHandlerThread)
        at java.lang.Object.wait(Object.java:485)
        at winstone.RequestHandlerThread.run(RequestHandlerThread.java:216)
      • locked <0x8c6a0798> (a winstone.RequestHandlerThread)
        at java.lang.Thread.run(Thread.java:619)

      "RequestHandlerThread4" daemon prio=10 tid=0x0833c000 nid=0x681a in
      Object.wait() [0xb4c9a000..0xb4c9b040]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x8c6a07d0> (a winstone.RequestHandlerThread)
        at java.lang.Object.wait(Object.java:485)
        at winstone.RequestHandlerThread.run(RequestHandlerThread.java:216)
      • locked <0x8c6a07d0> (a winstone.RequestHandlerThread)
        at java.lang.Thread.run(Thread.java:619)

      "Executor #1 for master" prio=10 tid=0xb5223000 nid=0x66ef in Object.wait()
      [0xb4f0d000..0xb4f0df40]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at java.lang.Object.wait(Object.java:485)
      at hudson.util.OneShotEvent.block(OneShotEvent.java:28)

      • locked <0x8cc5b560> (a hudson.util.OneShotEvent)
        at hudson.model.Queue.pop(Queue.java:495)
        at hudson.model.Executor.run(Executor.java:72)

      "Executor #0 for master" prio=10 tid=0xb5204400 nid=0x66ee in Object.wait()
      [0xb4f5e000..0xb4f5efc0]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x8cb01020> (a hudson.util.OneShotEvent)
        at java.lang.Object.wait(Object.java:485)
        at hudson.util.OneShotEvent.block(OneShotEvent.java:28)
      • locked <0x8cb01020> (a hudson.util.OneShotEvent)
        at hudson.model.Queue.pop(Queue.java:495)
        at hudson.model.Executor.run(Executor.java:72)

      "TCP slave agent listener port=0" prio=10 tid=0xb520b400 nid=0x66ed runnable
      [0xb4faf000..0xb4fb0040]
      java.lang.Thread.State: RUNNABLE
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)

      • locked <0x8c956298> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at hudson.TcpSlaveAgentListener.run(TcpSlaveAgentListener.java:82)

      "DestroyJavaVM" prio=10 tid=0x08059c00 nid=0x66dd waiting on condition
      [0x00000000..0xb7e20110]
      java.lang.Thread.State: RUNNABLE

      "LauncherControlThread[ControlPort=-1]" prio=10 tid=0x080ccc00 nid=0x66ec
      waiting on condition [0xb5000000..0xb50010c0]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at winstone.Launcher.run(Launcher.java:279)
      at java.lang.Thread.run(Thread.java:619)

      "ConnectorThread:[ajp13-8009]" daemon prio=10 tid=0x08287800 nid=0x66eb runnable
      [0xb5065000..0xb5066140]
      java.lang.Thread.State: RUNNABLE
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)

      • locked <0x8c87ad80> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at winstone.ajp13.Ajp13Listener.run(Ajp13Listener.java:111)
        at java.lang.Thread.run(Thread.java:619)

      "ConnectorThread:[http-8080]" daemon prio=10 tid=0x082b0c00 nid=0x66ea runnable
      [0xb50b6000..0xb50b71c0]
      java.lang.Thread.State: RUNNABLE
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)

      • locked <0x8c87aea8> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at winstone.HttpListener.run(HttpListener.java:127)
        at java.lang.Thread.run(Thread.java:619)

      "WinstoneHostConfigurationMgmt:default" daemon prio=10 tid=0x082afc00 nid=0x66e9
      waiting on condition [0xb5107000..0xb5107e40]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at winstone.HostConfiguration.run(HostConfiguration.java:176)
      at java.lang.Thread.run(Thread.java:619)

      "Thread-2" prio=10 tid=0x08853400 nid=0x66e8 in Object.wait()
      [0xb5158000..0xb5158ec0]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at hudson.model.ViewJob$ReloadThread.getNext(ViewJob.java:136)

      • locked <0x8c79a490> (a java.util.LinkedHashSet)
        at hudson.model.ViewJob$ReloadThread.run(ViewJob.java:152)

      "Hudson cron thread" prio=10 tid=0xb5203000 nid=0x66e7 in Object.wait()
      [0xb5369000..0xb5369f40]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at java.util.TimerThread.mainLoop(Timer.java:509)

      • locked <0x8c747b20> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:462)

      "WinstoneObjectPoolMgmt" daemon prio=10 tid=0x082f0800 nid=0x66e5 waiting on
      condition [0xb563d000..0xb563e040]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at winstone.ObjectPool.run(ObjectPool.java:103)
      at java.lang.Thread.run(Thread.java:619)

      "Low Memory Detector" daemon prio=10 tid=0x08097400 nid=0x66e3 runnable
      [0x00000000..0x00000000]
      java.lang.Thread.State: RUNNABLE

      "CompilerThread0" daemon prio=10 tid=0x0808c800 nid=0x66e2 waiting on condition
      [0x00000000..0xb598ac78]
      java.lang.Thread.State: RUNNABLE

      "Signal Dispatcher" daemon prio=10 tid=0x0808b400 nid=0x66e1 waiting on
      condition [0x00000000..0x00000000]
      java.lang.Thread.State: RUNNABLE

      "Finalizer" daemon prio=10 tid=0x08083000 nid=0x66e0 in Object.wait()
      [0xb5b54000..0xb5b54ec0]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)

      • locked <0x8c55d3a0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

      "Reference Handler" daemon prio=10 tid=0x08081c00 nid=0x66df in Object.wait()
      [0xb5ba5000..0xb5ba5f40]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at java.lang.Object.wait(Object.java:485)
      at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)

      • locked <0x8c55d428> (a java.lang.ref.Reference$Lock)

      "VM Thread" prio=10 tid=0x08080800 nid=0x66de runnable

      "VM Periodic Task Thread" prio=10 tid=0x08098800 nid=0x66e4 waiting on condition

      JNI global references: 902

      Heap
      def new generation total 2240K, used 1249K [0x8c070000, 0x8c2d0000, 0x8c550000)
      eden space 2048K, 51% used [0x8c070000, 0x8c178778, 0x8c270000)
      from space 192K, 100% used [0x8c270000, 0x8c2a0000, 0x8c2a0000)
      to space 192K, 0% used [0x8c2a0000, 0x8c2a0000, 0x8c2d0000)
      tenured generation total 28376K, used 19746K [0x8c550000, 0x8e106000, 0x90070000)
      the space 28376K, 69% used [0x8c550000, 0x8d898b70, 0x8d898c00, 0x8e106000)
      compacting perm gen total 18944K, used 18378K [0x90070000, 0x912f0000, 0x94070000)
      the space 18944K, 97% used [0x90070000, 0x91262bb8, 0x91262c00, 0x912f0000)
      ro space 8192K, 73% used [0x94070000, 0x946535a8, 0x94653600, 0x94870000)
      rw space 12288K, 58% used [0x94870000, 0x94f68878, 0x94f68a00, 0x95470000)
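      A thread dump like the one above can be captured without a debugger; two standard
      options, where the PID is whatever ps reports for the Hudson java process:

```shell
# Option 1: SIGQUIT (signal 3) makes the JVM print a full thread dump to
# its stdout/stderr, so with the launch script above it lands in
# $HUDSON_LOG.
kill -3 <hudson-java-pid>

# Option 2: jstack, shipped with the JDK, writes the dump to stdout.
jstack <hudson-java-pid> > threaddump.txt
```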

        Attachments

          Issue Links

            Activity

            clarkeja added a comment -

            Same issue found, running v1.262 on Glassfish.
            The only error message appearing in the logfile since the last restart was a
            DiskSpaceMonitor warning, so this seems to be in line with the DiskSpace
            comments in the thread. Hope this warning message helps identify the culprit code...

            [#|2008-11-28T18:18:27.887+1100|WARNING|sun-appserver9.1|hudson.node_monitors.DiskSpaceMonitor|_ThreadID=27;_ThreadName=Monitoring
            thread for Free Disk Space started on Fri Nov 2
            8 18:18:27 EST 2008;_RequestID=1da362cf-bf85-447b-bb50-f76381f53bc6;|Making
            offline temporarily due to the lack of disk space|#]

            [#|2008-11-28T18:18:27.988+1100|WARNING|sun-appserver9.1|hudson.node_monitors.SwapSpaceMonitor$1|_ThreadID=28;_ThreadName=Monitoring
            thread for Free Swap Space started on Fri Nov
            28 18:18:27 EST 2008;_RequestID=8b0a8bc4-f1d4-4584-8079-7dae05165772;|Failed to
            monitor master for Free Swap Space
            java.io.IOException: No suitable implementation found
            at org.jvnet.hudson.MemoryMonitor.obtain(MemoryMonitor.java:51)
            at org.jvnet.hudson.MemoryMonitor.get(MemoryMonitor.java:31)
            at
            hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:75)
            at
            hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:71)
            at hudson.remoting.LocalChannel.call(LocalChannel.java:22)
            at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:55)
            at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:53)
            at
            hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:123)

            #]
            Kohsuke Kawaguchi added a comment -

            Recent recurrence of this in the users list:
            http://www.nabble.com/jobs-hanging-in-the-build-queue-td22081780.html

            ... which includes pointers to the various related postings:
            [1] Peter Lappo Jul 16, 2008: "Builds stay in pending state even when there are
            free executors";
            https://hudson.dev.java.net/servlets/ReadMsg?listName=users&msgNo=10917
            [2] https://hudson.dev.java.net/issues/show_bug.cgi?id=2263
            [3] esmith1 Feb 03, 2009: "Excess workload detected but free executors";
            http://www.nabble.com/Excess-workload-detected-but-free-executors-td21812185.html
            [4] Comparing Revisions 14215 to 13157 for hudson/slaves/NodeProvisioner.java:
            http://fisheye4.atlassian.com/browse/hudson/trunk/hudson/main/core/src/main/java/hudson/slaves/NodeProvisioner.java?r1=13157&r2=14215

            SCM/JIRA link daemon added a comment -

            Code changed in hudson
            User: kohsuke
            Path:
            trunk/hudson/main/core/src/main/resources/lib/hudson/executors.jelly
            trunk/www/changelog.html
            http://fisheye4.cenqua.com/changelog/hudson/?cs=15435
            Log:
            [FIXED JENKINS-2263]
            indicate the executors of offline node as offline, not idle.

            Kohsuke Kawaguchi added a comment -

            My theory is that this was caused by the poor UI.

            Starting around 1.233, Hudson continuously monitors systems (both master and
            slaves) for potential early indicators of problems, such as a lack of disk
            space or swap space, unusually long response times, etc. Once such a
            situation is detected, the node is taken offline.

            When there's only the master in the system, the result of this effort isn't
            correctly rendered in the executors box on the top page. I believe this is the
            "cause" of most of the reports here: Hudson marked the node as offline,
            but it didn't show it in the UI.

            In 1.285, I corrected this in the UI, so that offline executors are clearly
            shown as offline, with a hyperlink to the cause.

            This proactive monitoring is useful for a large Hudson deployment, as there
            are almost always some nodes in the cluster that are in trouble. But for a
            smaller Hudson, maybe this doesn't make much sense, and perhaps I should just
            disable it altogether when there are no slaves at all. I appreciate your
            input on this.

            There are also separate issues filed for making some of this monitoring
            configurable.

            George Cimpoies added a comment -

            Has this been resolved?


              People

              Assignee:
              Unassigned
              Reporter:
              pauldcchen
              Votes:
              0
              Watchers:
              2

                Dates

                Created:
                Updated:
                Resolved: