- Bug
- Resolution: Cannot Reproduce
- Major
- None
- Ubuntu Server 10.04 64-bit
I don't know why this happens, but my slaves have begun to hang when they get to the Archiving Artifacts portion of my job:
Archiving artifacts
ERROR: Failed to archive artifacts: dist/**
hudson.util.IOException2: Failed to extract /mnt/hudsonslave/workspace/simplegeo-puppet-manifests/dist/**
at hudson.FilePath.readFromTar(FilePath.java:1577)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1491)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:117)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:580)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:558)
at hudson.model.Build$RunnerImpl.post2(Build.java:157)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
at hudson.model.Run.run(Run.java:1303)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:137)
Caused by: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:173)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
at hudson.org.apache.tools.tar.TarInputStream.read(TarInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at hudson.util.IOUtils.copy(IOUtils.java:33)
at hudson.FilePath.readFromTar(FilePath.java:1565)
... 12 more
I aborted this job, so I don't know if the error is related or not. The job runs fine, but it hangs during archiving. If I restart the connection to the node (in /computer/; not restarting Hudson or the node itself), jobs will build successfully again for a while.
is related to:
- JENKINS-11586 Regression introduced with Slave Side ChannelPinger (Resolved)
- JENKINS-13614 archiving artefacts from remote MacOS X, IBM AIX slave fails (Resolved)
- JENKINS-15682 archive artifacts hangs on ia64 slave due to JNA initialization error (Resolved)
[JENKINS-7641] Slaves hang when archiving artifacts
For what it's worth, we are seeing this too with 1.425 on ubuntu 10.04.03 x86_64. Strangely, it's only started being a problem in the past 2-3 weeks, but it happens quite frequently now, as often as once every other matrix build. I did a threadDump but didn't see anything particularly obvious. Would be happy to provide more information.
What we see is jobs which should normally take no longer than 20-30 minutes getting stuck like this:
Started 3 hr 10 min ago on $build_node
The console shows this at the end:
+ exit 0
Archiving artifacts (spinner)
If I cancel the job, I get this:
+ exit 0
Archiving artifacts
ERROR: Failed to archive artifacts: artifacts/**
hudson.util.IOException2: hudson.util.IOException2: Failed to extract /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/artifacts/**
at hudson.FilePath.readFromTar(FilePath.java:1662)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1580)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:682)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:657)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:635)
at hudson.model.Build$RunnerImpl.post2(Build.java:161)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:604)
at hudson.model.Run.run(Run.java:1400)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:175)
Caused by: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:109)
at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
at hudson.org.apache.tools.tar.TarInputStream.read(TarInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at hudson.util.IOUtils.copy(IOUtils.java:36)
at hudson.FilePath.readFromTar(FilePath.java:1654)
... 12 more
at hudson.FilePath.copyRecursiveTo(FilePath.java:1587)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:682)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:657)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:635)
at hudson.model.Build$RunnerImpl.post2(Build.java:161)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:604)
at hudson.model.Run.run(Run.java:1400)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:175)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.net.SocketException: Socket closed
at hudson.remoting.Request$1.get(Request.java:252)
at hudson.remoting.Request$1.get(Request.java:184)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1583)
... 11 more
Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Socket closed
at hudson.remoting.Request.abort(Request.java:273)
at hudson.remoting.Channel.terminate(Channel.java:719)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1060)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2282)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2295)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3035)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2836)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1616)
at java.io.ObjectInputStream.readTypeString(ObjectInputStream.java:1418)
at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:667)
at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:826)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1582)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1513)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1600)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1513)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1749)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1031)
Finished: SUCCESS
The job is indeed marked as successful, and life goes on, until the next one gets stuck...
Happening every single time on an OSX 10.4 SSH slave, Jenkins 1.467. It was not copying the artifacts at all with 1.461, and now it simply hangs. Cancelling the slave job does not work (the job does not die); I have to cancel the main job instead. I reverted to using 1.461 for the time being – it also does not work, but at least it does not hang.
Same behavior here on an OS X 10.4 slave (JRE 1.5.0_19, slave.jar 2.16), as described by @crusius, although I can cancel the job if I try twice. Then I get the following, slightly different error:
Archiving artifacts
ERROR: Failed to archive artifacts: build.log, dist/*
hudson.util.IOException2: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at hudson.util.HeadBufferingStream.fillSide(HeadBufferingStream.java:83)
at hudson.FilePath$TarCompression$2.extract(FilePath.java:619)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1771)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
at hudson.model.Run.execute(Run.java:1513)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1778)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
at hudson.model.Run.execute(Run.java:1513)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request$1.get(Request.java:278)
at hudson.remoting.Request$1.get(Request.java:210)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1774)
... 10 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:719)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.Command.readFrom(Command.java:90)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Same thing here (RHEL 5.5). From browsing the other reports, it seems to originate in a slow disk.
The configuration is as follows:
RHEL slave with the workspace on NFS. The network (100mbit) is shared, so the actual throughput is even less.
Master (RHEL 5.8, Jenkins 477) has its working directory on NFS too.
The archiving step never finished (the artifact is about 150M). Before I cancelled it, the step had been running for about 15 hours, half of that at night with no user activity on the network. I therefore conclude that something actually went wrong (probably on the slave) and that it is not simply a matter of time.
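As a rough check of the "not a matter of time" reasoning, the expected transfer time can be estimated from the figures given. This is only a back-of-the-envelope sketch: it assumes "150M" means 150 megabytes of 10^6 bytes, ignores NFS and protocol overhead, and the 1% link share is a made-up pessimistic case, not a measurement. The class and method names are hypothetical.

```java
// Hypothetical helper for a rough estimate; not part of Jenkins.
final class TransferEstimate {

    /** Seconds needed to move sizeMegabytes over a link that delivers mbitPerSecond. */
    static double seconds(double sizeMegabytes, double mbitPerSecond) {
        return sizeMegabytes * 8.0 / mbitPerSecond; // 8 bits per byte
    }

    /** Average rate in kbit/s implied by moving sizeMegabytes in elapsedSeconds. */
    static double impliedKbitPerSecond(double sizeMegabytes, double elapsedSeconds) {
        return sizeMegabytes * 8.0 * 1000.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        double artifactMb = 150.0; // "some 150M" from the report (assumed to be megabytes)

        // Whole 100 Mbit/s link available.
        System.out.printf("full link: %.0f s%n", seconds(artifactMb, 100.0));

        // Assumed worst case: only 1% of the shared link, i.e. 1 Mbit/s.
        System.out.printf("1%% of link: %.0f min%n", seconds(artifactMb, 1.0) / 60.0);

        // Rate that a 15-hour transfer would imply if it were merely slow.
        System.out.printf("implied by 15 h: %.1f kbit/s%n",
                impliedKbitPerSecond(artifactMb, 15 * 3600.0));
    }
}
```

Under these assumptions the copy should take about 12 seconds on an idle link and about 20 minutes at 1% of it, so a 15-hour run points to a stalled transfer rather than a slow one.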
After I cancelled the job, I saw the following lines in the log:
Archiving artifacts
ERROR: Failed to archive artifacts: *.pax.gz
hudson.util.IOException2: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at hudson.util.HeadBufferingStream.fillSide(HeadBufferingStream.java:83)
at hudson.FilePath$TarCompression$2.extract(FilePath.java:619)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1771)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
at hudson.model.Run.execute(Run.java:1527)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1778)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
at hudson.model.Run.execute(Run.java:1527)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: class hudson.remoting.Pipe$ConnectCommand cannot access its superclass hudson.remoting.Command
at hudson.remoting.Channel$4.adapt(Channel.java:696)
at hudson.remoting.Channel$4.adapt(Channel.java:691)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1774)
... 10 more
Caused by: java.lang.IllegalAccessError: class hudson.remoting.Pipe$ConnectCommand cannot access its superclass hudson.remoting.Command
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:478)
at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2308)
at java.lang.Class.getDeclaredField(Class.java:1897)
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1627)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:69)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:442)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:430)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:327)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:564)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1600)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1513)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1749)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.UserRequest.deserialize(UserRequest.java:182)
at hudson.remoting.UserRequest.perform(UserRequest.java:98)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Triggering a new build of SVFE-Basic #1
Finished: SUCCESS
I'm getting this error quite regularly since upgrading from v1.447.2 to v1.479. Both the master and slave are running Ubuntu 10.04 64-bit. There are just 4 artifacts being uploaded with the biggest only around 8MB.
Seeing the same issue for every build on three platforms: aix, hpux and linux-ia64. Now using the latest jenkins 1.487 and still seeing the problem.
Thread dump for a linux ia64 server currently with a hung job is as follows:
Thread Dump
Channel reader thread: channel
"Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:199)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked java.io.BufferedInputStream@27736da0
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at hudson.remoting.Command.readFrom(Command.java:90)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
main
"main" Id=1 Group=main WAITING on hudson.remoting.Channel@7a148bd3
at java.lang.Object.wait(Native Method)
- waiting on hudson.remoting.Channel@7a148bd3
at java.lang.Object.wait(Object.java:485)
at hudson.remoting.Channel.join(Channel.java:792)
at hudson.remoting.Launcher.main(Launcher.java:428)
at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:374)
at hudson.remoting.Launcher.run(Launcher.java:214)
at hudson.remoting.Launcher.main(Launcher.java:173)
Ping thread for channel hudson.remoting.Channel@7a148bd3:channel
"Ping thread for channel hudson.remoting.Channel@7a148bd3:channel" Id=10 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:86)
Pipe writer thread: channel
"Pipe writer thread: channel" Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6484439e
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6484439e
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:660)
pool-1-thread-29
"pool-1-thread-29" Id=72 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
at hudson.Functions.getThreadInfos(Functions.java:889)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:96)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:92)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:660)
Number of locked synchronizers = 1
- java.util.concurrent.locks.ReentrantLock$NonfairSync@2ac4e4b9
Finalizer
"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@b22379c
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.ReferenceQueue$Lock@b22379c
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Reference Handler
"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@7370e879
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.Reference$Lock@7370e879
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Signal Dispatcher
"Signal Dispatcher" Id=4 Group=system RUNNABLE
The relevant portion of the Jenkins log for the job is as follows (it appears that Jenkins thought the job had completed successfully):
INFO: prevent-davis.linux-ia64 #65 main build action completed: SUCCESS
Oct 22, 2012 6:48:49 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel b-linuxia64-01
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at hudson.remoting.Command.readFrom(Command.java:90)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
The slave on the server that processes the job does not seem to have any problem talking to the server as I had no problem displaying its System Information:
Unix slave, version 2.17
System Properties
Name Value
file.encoding ANSI_X3.4-1968
file.encoding.pkg sun.io
file.separator /
java.awt.graphicsenv sun.awt.X11GraphicsEnvironment
java.awt.printerjob sun.print.PSPrinterJob
java.class.path slave.jar
java.class.version 50.0
java.endorsed.dirs /usr/java/jdk1.6.0_26/jre/lib/endorsed
java.ext.dirs /usr/java/jdk1.6.0_26/jre/lib/ext:/usr/java/packages/lib/ext
java.home /usr/java/jdk1.6.0_26/jre
...
I'm getting an error message similar to the one the original bug poster is getting. My setup is two CentOS servers, one master and one slave. The slave finishes the build successfully and the archiving hangs. Jenkins version 1.4886 I believe.
This does indeed sound like it's related to JENKINS-11586.
This bug is a real pain for us since we have to figure out some way of 'archiving' the packages without using the built-in archiving tools now... Makes life much more difficult :/
I also see this error - it's impossible to archive artifacts on an ia64 slave (Red Hat Linux) - it just hangs.
Filed a separate issue for that: https://issues.jenkins-ci.org/browse/JENKINS-15682
We had a similar issue to this that was resolved by switching to Oracle's JDK from Open JDK.
"switching to Oracle's JDK from Open JDK"
Unfortunately that's not an option for OSX < 10.7
This issue just got worse (at least on OSX 10.4): with 1.502, when the build itself was successful, a failure to archive the artefacts at least did not change the overall build result to FAILURE. Unfortunately, with 1.511 it does:
===========================================
Running test
===========================================
Test passed
===========================================
Archiving artifacts
ERROR: Failed to archive artifacts: build.log
hudson.util.IOException2: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at hudson.util.HeadBufferingStream.fillSide(HeadBufferingStream.java:83)
at hudson.FilePath$TarCompression$2.extract(FilePath.java:625)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1926)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:133)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:774)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:724)
at hudson.model.Run.execute(Run.java:1600)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1933)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:133)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:774)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:724)
at hudson.model.Run.execute(Run.java:1600)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request$1.get(Request.java:278)
at hudson.remoting.Request$1.get(Request.java:210)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1929)
... 10 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Build step 'Archive the artifacts' changed build result to FAILURE
Finished: FAILURE
Hi, I use Jenkins version 1.514 and this issue has been happening for me for the past week. This is a big blocker for me in executing builds on slaves.
I have installed Jenkins on a Linux box, and all the slaves are also remote Linux boxes connected via SSH.
Can anyone advise how to resolve this as soon as possible? Thanks a mil for your help in advance.
Regards,
Aswini
Ok, so this is probably as old as Jenkins itself, and probably ready to become (if not already) its business card - what's that Jenkins thing? Ah, it does some CI and a bunch of servers hanging around. And the biggest problem is not that the issue described here happens, it's what is shown by this log:
01:53:56.709 Archiving artifacts
12:02:21.094 ERROR: Failed to archive artifacts: *.sum
12:02:21.096 hudson.util.IOException2: hudson.util.IOException2: Failed to extract /home/buildslave/workspace/cbuild/transfer of 1 files
12:02:21.096 Caused by: java.io.IOException
12:02:21.096 at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
So, it hung the build server for 10 hours. As usually happens, more jobs accumulate for this build slave and other builds wait for this build to complete, so the entire CI system comes to a halt, followed by kaboom. And if nobody's looking, Jenkins will happily and shamelessly lock up your servers for days.
So, why does this happen? Surely there's no single reason, but a carefully selected assortment of toxic stuff:
1. Bugs in JVM/Java - just because there are comments above saying that switching to another JDK version/provider seems to alleviate it.
2. Bugs in Jenkins - just because every new release brings only security fixes, as if the previous version was something like 0.0.1, so you may imagine there's a lot to fix yet.
3. But most importantly, that's the way Jenkins is written. To illustrate that, let's look at the linked JENKINS-11586, a comment straight from Kohsuke: "The ping currently is supposed to wait for 4 minutes before it gives up and kills the channel. But I have a hard time believing that the channel did really clog for 4 minutes." Here we go. The network over which that channel goes is UNRELIABLE. Everything you find hard to believe about it is actually true. It may fail to deliver stuff, it may clog for 4 or 40 minutes, the RIAA may knock on your door claiming you downloaded forbidden torrents, which you never did. Or take another example, straight from FastPipedInputStream.java as quoted in the stacktrace above. In its header, it confesses to be a java.io.PipedInputStream equivalent which just "uses proper synchronization" and "doesn't rely on polling". A seasoned engineer would recognize the smartness and foresight of the Java engineers who did not trust Java to do synchronization and instead used the strings-and-sticks method. Now smart kids came along who thought they could make it better, and now their code locks up servers throughout the world.
OK, enough intro to the problem area. Let's look straight into FastPipedInputStream.java:175, which was caught red-handed in the stacktrace above: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/FastPipedInputStream.java#L163
And what we see immediately is an infinite loop. Just see for yourself: it blocks in wait on the buffer for 10s, then does some liveness checks (which, as we already learned, will regularly fail to detect any issues), then checks for updates from outside, and if nothing happens, it will hang there forever. Here it is, Jenkins coding style.
Here's the patch: https://github.com/pfalcon/remoting/commit/239b3dcf26498ff296fabf770dffc7a456b2878c
So, why am I writing all this? Over time, I learned that when people come to me with artifact archiving issues, the best thing I can suggest is to AVOID NATIVE JENKINS ARTIFACT ARCHIVING LIKE THE PLAGUE. People listen, and the job whose stack trace is quoted above no longer uses it (the rest of our jobs haven't used it for years). So, while I hacked up the patch above, I don't really have a sandbox to test it in. If you experience this issue, I encourage you to try that patch and share the results. Just to clarify: the patch above is not going to fix this issue (if you want it not to fail, then Java and Jenkins are the wrong technologies). Instead, it makes the failure fast so it doesn't waste resources (it also addresses only one infinite loop; I'm sure there are dozens more).
The stacktrace above happened with Jenkins 1.532.1 on a Linux/Ubuntu master/slave setup (x86, x64, arm slaves affected).
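For anyone who doesn't want to click through, the fail-fast idea can be sketched roughly like this. It is a minimal illustration and not the code from the linked commit or from hudson.remoting: the class `BoundedWaitSketch`, its method names, and the timeout values are all made up here, and only the exception message is borrowed from the stack trace posted further down in this thread.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: wait in short slices, but give up after an overall deadline. */
class BoundedWaitSketch {
    private final Object lock = new Object();
    private boolean dataAvailable; // guarded by lock

    /** Called by the writing side when something has arrived. */
    void signalData() {
        synchronized (lock) {
            dataAvailable = true;
            lock.notifyAll();
        }
    }

    /** Blocks until data is signalled, or throws once totalTimeoutMillis has elapsed. */
    void awaitData(long sliceMillis, long totalTimeoutMillis) throws IOException {
        if (sliceMillis <= 0) {
            throw new IllegalArgumentException("sliceMillis must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMillis);
        synchronized (lock) {
            while (!dataAvailable) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    // Without this check the loop would keep waiting forever.
                    throw new IOException("High-level timeout waiting for activity on pipe");
                }
                try {
                    // Never pass 0 here: Object.wait(0) means "wait indefinitely".
                    lock.wait(Math.min(sliceMillis, remainingMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while waiting");
                }
            }
            dataAvailable = false; // consume the signal
        }
    }

    public static void main(String[] args) {
        BoundedWaitSketch pipe = new BoundedWaitSketch();
        try {
            pipe.awaitData(100, 500); // nobody ever signals, so this gives up
        } catch (IOException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

The point of the overall deadline is that a dead peer turns into a build failure after a bounded time instead of an executor that is occupied indefinitely.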
Thanks for listening to the rant, and happy (or sour) Jenkinsing!
Woo-hoo, the patch in the previous comment works:
00:18:59.873 FATAL: HTML Publisher failure
00:18:59.873 hudson.util.IOException2: hudson.util.IOException2: Failed to extract /srv/jenkins/workspace/job/doc/html/*/
00:18:59.873 at hudson.FilePath.readFromTar(FilePath.java:2066)
00:18:59.873 at hudson.FilePath.copyRecursiveTo(FilePath.java:1978)
00:18:59.873 at hudson.FilePath.copyRecursiveTo(FilePath.java:1889)
00:18:59.873 at hudson.FilePath.copyRecursiveTo(FilePath.java:1872)
00:18:59.873 at htmlpublisher.HtmlPublisher.perform(HtmlPublisher.java:213)
00:18:59.873 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
00:18:59.873 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:785)
00:18:59.873 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:757)
00:18:59.873 at hudson.model.Build$BuildExecution.post2(Build.java:183)
00:18:59.873 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:706)
00:18:59.873 at hudson.model.Run.execute(Run.java:1690)
00:18:59.873 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
00:18:59.873 at hudson.model.ResourceController.execute(ResourceController.java:88)
00:18:59.873 at hudson.model.Executor.run(Executor.java:246)
00:18:59.873 Caused by: java.io.IOException: High-level timeout waiting for activity on pipe
00:18:59.873 at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:180)
00:18:59.873 at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
00:18:59.873 at com.jcraft.jzlib.InflaterInputStream.fill(InflaterInputStream.java:175)
00:18:59.873 at com.jcraft.jzlib.InflaterInputStream.read(InflaterInputStream.java:106)
00:18:59.873 at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
00:18:59.873 at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
00:18:59.873 at hudson.org.apache.tools.tar.TarInputStream.getNextEntry(TarInputStream.java:228)
00:18:59.873 at hudson.FilePath.readFromTar(FilePath.java:2044)
00:18:59.873 ... 13 more
So, the timeout as it stands is 1000s. I'm trying to cut that in half to see whether any false-positive timeouts happen.
Like mjmac, I see hangs after "Archiving artifacts" if I don't have "Response Time" checked under "Preventive Node Monitoring" for my slave nodes. However, if the monitoring is enabled, I get the IOException. It happens more often than not, but is intermittent. If I use my Jenkins master (jenkins-1.549-1.1; Intel server with RHEL 6.4) as a slave, I don't see any issues, ever. My other slaves are IBM PPC servers (also with RHEL 6.4), and that's where I see the problem. I've only tested with a 20-byte file, so size isn't the issue.
If I kill a hung job I get the following:
Archiving artifacts
ERROR: Failed to archive artifacts: the_date.txt
java.io.IOException: java.io.IOException: Failed to extract /bgusr/home1/jenkins/bgqmsn0/workspace/gene_triggerer2/transfer of 1 files
at hudson.FilePath.readFromTar(FilePath.java:2088)
at hudson.FilePath.copyRecursiveTo(FilePath.java:2000)
at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:57)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:140)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:784)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:705)
at hudson.model.Run.execute(Run.java:1695)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
at com.jcraft.jzlib.InflaterInputStream.fill(InflaterInputStream.java:175)
at com.jcraft.jzlib.InflaterInputStream.read(InflaterInputStream.java:106)
at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
at hudson.org.apache.tools.tar.TarInputStream.getNextEntry(TarInputStream.java:228)
at hudson.FilePath.readFromTar(FilePath.java:2066)
... 12 more
at hudson.FilePath.copyRecursiveTo(FilePath.java:2007)
at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:57)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:140)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:784)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:705)
at hudson.model.Run.execute(Run.java:1695)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request$1.get(Request.java:278)
at hudson.remoting.Request$1.get(Request.java:210)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.FilePath.copyRecursiveTo(FilePath.java:2003)
... 11 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:782)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
This only became a problem for me when I upgraded from 556 to 561, but it now hangs on every slave build.
valkolovos: You're experiencing JENKINS-22734, which is unrelated and was newly introduced in 1.560.
I'm getting this about once every three or four builds now, mostly with Windows (cygwin) slaves but not exclusively. My artifacts range from 1-2 MB to 800 MB. Size appears not to be an issue.
At the time things hang (forever) both master and slave become quiet. Other jobs on other slaves progress without issue.
When I cancel the stuck build I see:
ERROR: Failed to archive artifacts: pool.tar.gz,*/.log,built.,pool//-Results.xml,_pool.tar.gz,.html,*.st
java.io.IOException: java.io.IOException: Failed to extract .....
<snip>
Caused by: java.io.IOException
at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:177)
at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
<snip>
Will pfalcon's patch cure this?
And the beat goes on...
I am getting the same exception with an Ubuntu 12.04 remote and a Jenkins 1.509.1 master... sigh, I know it's an older master, but still, yet another data point. This project worked and archived 25 days ago without any problem. Now it won't pass no matter what.
Is anyone experiencing this on Oracle JDK?
Quoting an earlier comment:
We had a similar issue to this that was resolved by switching to Oracle's JDK from Open JDK.
That would be a real data point...
Unfortunately Oracle Java isn't available on OS X < 10.7 but we only see the issue (see above: comment 1, comment 2) on 10.4 - and we still need to support it.
Hmm... my Windows slaves are the ones mostly (but not exclusively) involved in this bug. All my Windows slaves are using Oracle Java 1.7. My master is OpenJDK 1.7 on 64-bit Linux.
Just had a thought. We end up here because the sink is waiting for more data from the source (and the source is still connected), but the source has no more data to write. Perhaps there is another buffer in between the two that isn't getting flushed?
Oh, and my Windows slaves all use Oracle Java. My master is OpenJDK. I shall investigate!
After having had a real good stare and think about this, I have been inspecting the tests and code for FastPipedInputStream (FIS) and FastPipedOutputStream (FOS). I unfortunately have to disagree a little with Paul on this. The implementation uses remoting and synchronization through pure Java calls. The two objects wait and notify each other: FOS waits to enter the synchronized block and either writes and calls notify, or yields the block if FIS hasn't read yet. FIS waits for the sync block and then waits for FOS to notify it. So, given the heavy use of Java remoting, I can well believe that different versions/builds of the JRE can impact this.
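To make the wait/notify handshake described above concrete, here is a stripped-down sketch of the same pattern. It is not the remoting code: `HandoffSketch` is a hypothetical single-slot pipe written only for illustration, with a writer that yields while the slot is full and a reader that waits until it is notified.

```java
/** Hypothetical single-slot pipe showing the wait/notify handshake; not the remoting classes. */
class HandoffSketch {
    private final Object lock = new Object();
    private Integer slot;   // null means "empty"; guarded by lock
    private boolean closed; // guarded by lock

    /** Writer side: waits while the reader has not yet consumed the previous value. */
    void write(int value) {
        synchronized (lock) {
            while (slot != null) {
                waitQuietly();
            }
            slot = value;
            lock.notifyAll(); // wake the reader
        }
    }

    /** Reader side: waits to be notified; returns -1 once the pipe is closed and drained. */
    int read() {
        synchronized (lock) {
            while (slot == null && !closed) {
                waitQuietly(); // blocks forever if the writer vanishes without calling close()
            }
            if (slot == null) {
                return -1;
            }
            int value = slot;
            slot = null;
            lock.notifyAll(); // wake a writer waiting for the slot to free up
            return value;
        }
    }

    /** Writer side: marks end of stream and wakes any waiting reader. */
    void close() {
        synchronized (lock) {
            closed = true;
            lock.notifyAll();
        }
    }

    // Must be called while holding lock.
    private void waitQuietly() {
        try {
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        HandoffSketch pipe = new HandoffSketch();
        new Thread(() -> {
            pipe.write(1);
            pipe.write(2);
            pipe.close();
        }).start();
        for (int v = pipe.read(); v != -1; v = pipe.read()) {
            System.out.println("read " + v);
        }
    }
}
```

Note that the reader only ever wakes up when the other side notifies it. If the writer stops without writing or closing, the reader stays in `wait()` indefinitely, which matches the symptom reported in this thread.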
Using the Oracle VM doesn't seem to solve the problem. My master is Jenkins 1.580.2 on Ubuntu 14.04 (Oracle Java 1.8.0_25), and the slaves are running Windows XP (Oracle Java 1.8.0_25). Changing the way the slaves connect (Windows Service -> JNLP) didn't help either. After rebooting an affected slave, the issue goes away for some time. What's interesting: when I run into the "archiving artifacts stuck" situation, I'm also unable to download anything from the slave's workspace (browsing the workspace works properly, though).
I sometimes have this issue with
Jenkins Master v1.565.3 @ Ubuntu 2.6.24-23-server
and Jenkins Slave @ Windows 7
but it may have been because the build did not produce the *.zip file that is our artifact.
I see this also (the stack trace is from killing the job while it's archiving). The problem is sometimes it gets stuck while archiving. It looks as if everything is archived, but for some reason the job won't complete..until you cancel it.
Using 1.598. Jenkins running on RHEL6.4 using jdk 1.8.0_31.
12/21/15: Update: using 1.642, I see the same as before. Job appears to have completed, and the last visible line (before killing the job) is
"Archiving artifacts"
and there's a spinning icon below that.
This is not new. It has been an ongoing problem.
These are the testing portion of JDK builds. The test runs are separate jobs. The one in question has run to completion and (apparently) hangs while archiving. Logs do not reveal anything helpful.
This happens on several different platforms: windows-x86, windows-x64, solaris-x64, osx, linux-x86, sparc, linux-x64. It seems that if I wait it out, jobs will eventually complete. It may take 2-3 hours or more (jobs usually take 15 minutes to 1 hour).
We seem to also be running into this issue using swarm slaves.
We have just started experiencing the same problem, and it's killing our build farm. It's happening across platforms, as mentioned before, and does not seem specific to how the connection is launched. Both ssh from the master and webstart from a slave have the same problem.
Some information that might help. We don't always have artifacts, they are only present if tests fail and leave behind what they expected to be created which we can manually inspect later. Could it be because it didn't find anything?
I have the same problem using version 1.609.
I think the problem is somehow related to the size of the files to archive.
In fact, if I change my job configuration to archive fewer artifacts, the job completes successfully.
I'm running into this now in version 1.651.3 on a Windows slave using ssh.
I'm not sure what is preventing the Archiving step from succeeding, but whatever it is leaves the build in a hung state indefinitely. This build agent has successfully built before and only started having this problem recently.
The most disruptive part of this hang is that it disregards the build timeout. I have 'Abort the build if it is stuck' set with an absolute timeout of 45 minutes configured in the job and the build was stuck at the Archiving step for over 2 and a half hours.
We were hitting this issue, but at least in our case I was able to find the workaround below, and with my limited experience I suspect the actual problem is in how the Jenkins code interacts with DNS.
In my case the Jenkins master and the slave where artifact archiving hung were in different network domains.
For example, a simple ping from the Jenkins master:
ping windows_machine_with_problem
PING windows_machine_with_problem.DOMAIN1 (172.23.136.85) 56(84) bytes of data.
64 bytes from windows_machine_with_problem.DOMAIN1 (172.23.136.85): icmp_seq=1 ttl=127 time=0.287 ms
64 bytes from windows_machine_with_problem.DOMAIN1 (172.23.136.85): icmp_seq=2 ttl=127 time=0.335 ms
64 bytes from windows_machine_with_problem.DOMAIN1 (172.23.136.85): icmp_seq=3 ttl=127 time=0.377 ms
ping windows_machine_with_noproblem
PING windows_machine_with_noproblem.DOMAIN2 (172.28.8.87) 56(84) bytes of data.
64 bytes from 172.28.8.87: icmp_seq=1 ttl=128 time=0.379 ms
64 bytes from 172.28.8.87: icmp_seq=2 ttl=128 time=0.519 ms
64 bytes from 172.28.8.87: icmp_seq=3 ttl=128 time=0.400 ms
This indicated that any traffic from the master to windows_machine_with_problem passes through some networking entities.
To take DNS out of the equation, I replaced the "hostname" in the slave configuration directly with the IP address, and the artifact hanging disappeared.
So, to summarize: when I used the IP address instead of the DNS name, the artifact hanging issue was resolved, at least in my case.
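If you want to check whether name resolution from the master is slow or failing before switching to IP addresses, a small sketch like the one below can time it. This is only a diagnostic idea using the standard `java.net.InetAddress` API; the class name `ResolveTimingSketch` is made up, and it says nothing about how Jenkins itself resolves slave hostnames.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical diagnostic: time how long the JVM takes to resolve a host name. */
class ResolveTimingSketch {

    /** Returns the elapsed milliseconds for resolving host, or -1 if it cannot be resolved. */
    static long timeResolveMillis(String host) {
        long start = System.nanoTime();
        try {
            // For a literal IP address only the format is checked; no lookup is performed.
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Pass the slave's DNS name and its IP address as arguments to compare them.
        for (String host : args) {
            System.out.println(host + ": " + timeResolveMillis(host) + " ms");
        }
    }
}
```

Run it on the master with both the slave's hostname and its IP address as arguments. Keep in mind that the JVM caches successful lookups, so only the first resolution of a name within a process is representative.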
This still happens for us in jenkins 2.21 with a number of Ubuntu 14.04 slaves.
Another interesting data point: when the archive step hangs, jenkins does not honor the build timeout parameter: Abort the build if it's stuck was set to 25 minutes and the build sat there for an hour and 20 minutes.
Could someone try the new option to disable TCP_NODELAY (available since 1.28)? It makes the connection buffered, which should improve data transfer for large files.
see https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/CONFIGURE.md#advanced-settings
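For context, TCP_NODELAY is a standard socket option: when it is off, the TCP stack is allowed to coalesce small writes (Nagle's algorithm) instead of sending each one immediately. In plain `java.net` it is toggled as in the sketch below. This only illustrates the option itself; the class `NoDelaySketch` is hypothetical and this is not the ssh-slaves plugin's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

/** Hypothetical sketch showing how TCP_NODELAY is set and read back on a Java socket. */
class NoDelaySketch {

    /** Sets TCP_NODELAY on a fresh, unconnected socket and returns the value read back. */
    static boolean roundTrip(boolean noDelay) {
        try (Socket socket = new Socket()) {
            // false = allow the TCP stack to buffer small writes; true = send immediately.
            socket.setTcpNoDelay(noDelay);
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY disabled -> " + roundTrip(false));
        System.out.println("TCP_NODELAY enabled  -> " + roundTrip(true));
    }
}
```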
I have the same problem where some jobs hang when archiving artifacts.
...
Archiving artifacts
I am stuck on this, and it would be useful to have some advice on how to investigate the issue further.
For information, in this particular case the server and the slave run on the same machine.
The machine is RedHat 3 / i686, 32-bit:
$ uname -a
Linux 2.4.21-20.EL #1 Wed Aug 18 20:58:25 EDT 2004 i686 i686 i386 GNU/Linux
I had another case some weeks ago where the slave and the server were on different machines, and the artifact archiving also stalled.
The slave that time was RedHat 4 / x86_64, 64-bit:
$ uname -a
Linux 2.6.9-89.0.23.ELsmp #1 SMP Fri Mar 5 23:27:13 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
At that time I upgraded the Java VM from 32-bit to 64-bit, and so far we haven't suffered from it again, but I suspect that did not fundamentally solve the problem.