Type: Bug
Resolution: Duplicate
Priority: Critical
Environment:
Jenkins 2.19.1 on Ubuntu 14.04 x64
Windows 10 (1607) Connected via JNLP
Freestyle jobs hang either at the beginning or the end of execution. I've let a few run for ~18 hours before they eventually crash with errors like these:
07:52:37 java.io.IOException: Unable to get hostname from slave. null
07:52:37 at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:1187)
07:52:37 at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
07:52:37 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
07:52:37 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
07:52:37 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
07:52:37 at hudson.model.Run.execute(Run.java:1720)
07:52:37 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
07:52:37 at hudson.model.ResourceController.execute(ResourceController.java:98)
07:52:37 at hudson.model.Executor.run(Executor.java:404)
Reverting to Jenkins 2.7.4 immediately resolved the issue. The issue was not observed on any Linux, OS X, or older Windows nodes.
[JENKINS-38834] Freestyle jobs hang in 2.19.1 on Windows 10 Nodes
Will do. I tried today to reproduce it in a non-production environment, without success. I'm working on deploying a clone of our production environment now, but it could take a few days...
Jenkins ThreadDump from one of our hung executors:
Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13
"Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13" Id=204 Group=main TIMED_WAITING on hudson.remoting.UserRequest@33f1c9bd
at java.lang.Object.wait(Native Method)
- waiting on hudson.remoting.UserRequest@33f1c9bd
at hudson.remoting.Request.call(Request.java:147)
at hudson.remoting.Channel.call(Channel.java:796)
at hudson.FilePath.act(FilePath.java:1007)
at hudson.FilePath.act(FilePath.java:996)
at hudson.FilePath.deleteRecursive(FilePath.java:1198)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:934)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1720)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:404)
ThreadDump from that server:
Channel reader thread: channel
"Channel reader thread: channel" Id=2301 Group=main RUNNABLE (in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
- locked java.io.BufferedInputStream@40bda018
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
main
"main" Id=1 Group=main WAITING on hudson.remoting.Engine@a3d1956
at java.lang.Object.wait(Native Method)
- waiting on hudson.remoting.Engine@a3d1956
at java.lang.Thread.join(Unknown Source)
at java.lang.Thread.join(Unknown Source)
at hudson.remoting.jnlp.Main.main(Main.java:150)
at hudson.remoting.jnlp.Main._main(Main.java:143)
at hudson.remoting.Launcher.run(Launcher.java:231)
at hudson.remoting.Launcher.main(Launcher.java:195)
Ping thread for channel hudson.remoting.Channel@662535d4:channel
"Ping thread for channel hudson.remoting.Channel@662535d4:channel" Id=2302 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:91)
pool-1-thread-2253 for channel
"pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
at com.sun.jna.Native.initIDs(Native Method)
at com.sun.jna.Native.<clinit>(Native.java:148)
at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)
at hudson.util.jna.Kernel32.<clinit>(Kernel32.java:37)
at hudson.util.jna.Kernel32Utils.getWin32FileAttributes(Kernel32Utils.java:77)
at hudson.util.jna.Kernel32Utils.isJunctionOrSymlink(Kernel32Utils.java:98)
at hudson.Util.isSymlink(Util.java:510)
at hudson.FilePath.deleteRecursive(FilePath.java:1221)
at hudson.FilePath.access$1000(FilePath.java:195)
at hudson.FilePath$14.invoke(FilePath.java:1201)
at hudson.FilePath$14.invoke(FilePath.java:1198)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@75d956c4
pool-1-thread-2256 for channel
"pool-1-thread-2256 for channel" Id=2309 Group=main RUNNABLE
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)
at org.jvnet.hudson.Windows.monitor(Windows.java:42)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:124)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@756a3221
pool-1-thread-2261 for channel
"pool-1-thread-2261 for channel" Id=2314 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(Unknown Source)
at hudson.Functions.getThreadInfos(Functions.java:1220)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:98)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:95)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@50474da7
RemoteInvocationHandler 19
"RemoteInvocationHandler 19" Id=2300 Group=main TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@fec0bff
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.ReferenceQueue$Lock@fec0bff
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:564)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
at java.lang.Thread.run(Unknown Source)
Thread-1
"Thread-1" Id=13 Group=main TIMED_WAITING on hudson.remoting.Channel@662535d4
at java.lang.Object.wait(Native Method)
- waiting on hudson.remoting.Channel@662535d4
at hudson.remoting.Channel.join(Channel.java:948)
at hudson.remoting.Engine.run(Engine.java:316)
Attach Listener
"Attach Listener" Id=5 Group=system RUNNABLE
Finalizer
"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@6641224e
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.ReferenceQueue$Lock@6641224e
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
Reference Handler
"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@272dd0cb
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.Reference$Lock@272dd0cb
at java.lang.Object.wait(Unknown Source)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
Signal Dispatcher
"Signal Dispatcher" Id=4 Group=system RUNNABLE
I've got 12 systems hung like this, all Windows 10 X64.
Your stack seems to be pretty similar to mine, in this bug:
https://issues.jenkins-ci.org/browse/JENKINS-39179
That bug is driving me and my team crazy. They could be the same.
Are all of your Windows 10 machines updated to the anniversary edition? For us, it is happening and locking up all builds. If you find the build with the JNI stack, you can kill that single slave, and all other slaves will be able to continue.
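For anyone trying to spot the stuck node from a saved dump, here is a rough sketch of a scanner. It assumes the "JNI stack" mentioned above means threads like pool-1-thread-2253 and pool-1-thread-2256 in the dump, which are RUNNABLE inside com.sun.jna class initialisation. The class name JnaStallFinder and the matching rule are made up for illustration and are not part of Jenkins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins: lists RUNNABLE threads in a
// text thread dump that have at least one com.sun.jna frame on their stack.
final class JnaStallFinder {

    static List<String> find(String dump) {
        List<String> hits = new ArrayList<>();
        String current = null;    // name of the thread whose frames are being read
        boolean runnable = false; // whether that thread's header says RUNNABLE
        boolean reported = false; // avoid listing the same thread twice
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Header line, e.g. "name" Id=2298 Group=main RUNNABLE
                int end = line.indexOf('"', 1);
                if (end < 0) {
                    current = null;
                    continue;
                }
                current = line.substring(1, end);
                runnable = line.substring(end + 1).contains("RUNNABLE");
                reported = false;
            } else if (current != null && runnable && !reported
                    && line.startsWith("at com.sun.jna.")) {
                hits.add(current);
                reported = true;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a saved thread dump (plain text)
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        find(new String(bytes, StandardCharsets.UTF_8)).forEach(System.out::println);
    }
}
```

A match only shows where a thread was at the moment of the dump. Taking two dumps a few minutes apart and checking that the same thread is still on the same JNA frame is a better indication that it is actually stuck.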
Yes, all are running Windows 10 Anniversary Edition. Luckily for us the issue remains isolated to specific nodes, and other jobs seem to run unimpeded. As I said above, downgrading to 2.7.4 made the issue disappear for me!
To clarify, all affected nodes run Windows 10 Anniversary Edition, but not all nodes running Windows 10 Anniversary Edition are affected by this issue (despite running the same kinds of jobs)?
That appears to be the case: all affected nodes are Win10 AE, but not all of my Win10 AE nodes are affected. The problem isn't consistent, either. Sometimes jobs run fine on these nodes, sometimes they don't.
I believe the difference between my environment (where this hang causes all builds to stall) and your environment (where only that single node stalls) is that we are using Pipeline builds, whereas you are using freestyle builds.
Because part of the code for a Pipeline build runs on the master, this hang seems to block all other activity on the master. That makes it a bit worse when Pipeline is being used, but either way, it's very bad.
We were able to recreate this problem with a pre-Anniversary Edition build of Windows 10, so it's not related to that.
It looks like this might be related to https://issues.jenkins-ci.org/browse/JENKINS-19445
It looks like upgrading my master to Ubuntu 16.04 resolved this issue. It was a fresh build, so it's possible Jenkins upgrades left some garbage in a config file somewhere, and my clean build didn't have that issue? Anyways, the issue is resolved for me, but it looks like people on certain configs/following certain upgrade paths may still have this issue.
While it appears to hang, create a thread dump: https://wiki.jenkins-ci.org/display/JENKINS/Obtaining+a+thread+dump
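If the wiki route isn't convenient, a dump of a JVM can also be produced from code through the standard JMX ThreadMXBean. The agent dump above contains a dumpAllThreads frame, so it was apparently collected through the same kind of call. This is a minimal sketch; the class name ThreadDumpSketch and the output format are my own, and it only dumps the JVM it runs in, so it would have to be run inside the hung process (for example from a script console) to be useful here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Jenkins: builds a plain-text dump of
// every live thread in the current JVM.
final class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them; otherwise
        // dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState())
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Running the JDK's jstack tool against the agent's process ID from outside is another option when the agent cannot be reached over the channel at all.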