Bug
Resolution: Done
Major
Jenkins 2.7.2
amazon-ecs plugin 1.5
We have our Jenkins master running on an NFS 4.1 mount (AWS EFS).
When a job completes, we see the following in the logs:
Aug 09, 2016 4:21:09 PM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSService deleteTask
Delete ECS Slave task: arn:aws:ecs:us-east-1:XXXXXXXXXXXX:task/80b3bc38-774c-4bbe-9e79-f9db126a2b9b
Aug 09, 2016 4:21:09 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor jenkins-ecs-d6d51f89f024 for Free Disk Space
java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request$1.get(Request.java:282)
    at hudson.remoting.Request$1.get(Request.java:207)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request.abort(Request.java:303)
    at hudson.remoting.Channel.terminate(Channel.java:847)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1080)
    at hudson.remoting.Channel$1.handle(Channel.java:501)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
Caused by: hudson.remoting.Channel$OrderlyShutdown
    ... 3 more
Caused by: Command close created at
    at hudson.remoting.Command.<init>(Command.java:62)
    at hudson.remoting.Command.<init>(Command.java:47)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
    at hudson.remoting.Channel.close(Channel.java:867)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:825)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1106)
Aug 09, 2016 4:21:09 PM WARNING hudson.slaves.SlaveComputer kill
Unable to delete agent logs
java.io.IOException: Unable to delete '/var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
    at hudson.Util.deleteRecursive(Util.java:355)
    at hudson.slaves.SlaveComputer.kill(SlaveComputer.java:670)
    at hudson.model.AbstractCIBase.killComputer(AbstractCIBase.java:86)
    at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:205)
    at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1338)
    at jenkins.model.Nodes$4.run(Nodes.java:219)
    at hudson.model.Queue._withLock(Queue.java:1320)
    at hudson.model.Queue.withLock(Queue.java:1197)
    at jenkins.model.Nodes.removeNode(Nodes.java:210)
    at jenkins.model.Jenkins.removeNode(Jenkins.java:1860)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:70)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.terminate(ECSComputer.java:62)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.taskCompleted(ECSComputer.java:47)
    at hudson.model.queue.WorkUnitContext.synchronizeEnd(WorkUnitContext.java:140)
    at hudson.model.Executor.finish1(Executor.java:457)
    at hudson.model.Executor.run(Executor.java:430)
Caused by: java.nio.file.FileSystemException: /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003: Device or resource busy
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
    at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
    at java.nio.file.Files.deleteIfExists(Files.java:1165)
    at hudson.Util.tryOnceDeleteFile(Util.java:296)
    at hudson.Util.tryOnceDeleteRecursive(Util.java:373)
    at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
    at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
    at hudson.Util.deleteRecursive(Util.java:350)
    ... 15 more
Looking at the Jenkins master, I do indeed see the log file still open, in the form of an NFS .nfs* placeholder file:
eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ sudo lsof +D /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF                 NODE NAME
java    26119 jenkins  486w   REG   0,25       22 11428909888000270242 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ find /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024 -ls
414054371676357616    4 drwxr-xr-x   2 jenkins  jenkins      4096 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
11428909888000270242  4 -rw-r--r--   1 jenkins  jenkins        22 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
As a result, agents are left behind and have to be cleaned up manually.
The agent log file should be fully closed before the agent (slave) is deleted.
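
For illustration, here is a minimal sketch of the ordering requested above. It is not the Jenkins implementation: the class name CloseThenDelete and the method closeThenDelete are hypothetical, and the sketch assumes the caller holds the only open handle on the log file. The background is that an NFS client renames a file that is unlinked while still open to a .nfsXXXX placeholder, and that placeholder cannot be removed until the handle is closed, which matches the "Device or resource busy" error in the trace. Closing the stream first should avoid the placeholder altogether.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Jenkins: close the log handle first,
// then delete the log directory.
public class CloseThenDelete {

    static void closeThenDelete(Closeable logStream, Path logDir) throws IOException {
        // Step 1: release the file handle so NFS has no reason to keep
        // a .nfsXXXX placeholder around when the file is unlinked.
        logStream.close();

        // Step 2: delete files first, then each directory once it is empty.
        Files.walkFileTree(logDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an agent log directory; the names are made up for the demo.
        Path dir = Files.createTempDirectory("agent-log-demo");
        OutputStream log = Files.newOutputStream(dir.resolve("slave.log"));
        log.write("agent connected\n".getBytes());

        closeThenDelete(log, dir);
        System.out.println("directory still exists: " + Files.exists(dir));
    }
}
```

On a local filesystem this runs cleanly in either order, so the sketch only shows the intended sequence; the failure itself reproduces only on an NFS mount such as the EFS volume described above.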