Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-37293

ECS Agents are not deleted because Jenkins is unable to delete logs due to NFS locks

XMLWordPrintable

      We have our Jenkins master running on a NFS 4.1 mount (AWS EFS).

      When a job completes we see the following in the logs:

      Aug 09, 2016 4:21:09 PM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSService deleteTask
      Delete ECS Slave task: arn:aws:ecs:us-east-1:XXXXXXXXXXXX:task/80b3bc38-774c-4bbe-9e79-f9db126a2b9b
      Aug 09, 2016 4:21:09 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
      Failed to monitor jenkins-ecs-d6d51f89f024 for Free Disk Space
      java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      	at hudson.remoting.Request$1.get(Request.java:282)
      	at hudson.remoting.Request$1.get(Request.java:207)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      	at hudson.remoting.Request.abort(Request.java:303)
      	at hudson.remoting.Channel.terminate(Channel.java:847)
      	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1080)
      	at hudson.remoting.Channel$1.handle(Channel.java:501)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      	... 3 more
      Caused by: Command close created at
      	at hudson.remoting.Command.<init>(Command.java:62)
      	at hudson.remoting.Command.<init>(Command.java:47)
      	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
      	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
      	at hudson.remoting.Channel.close(Channel.java:867)
      	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:825)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1106)
      
      Aug 09, 2016 4:21:09 PM WARNING hudson.slaves.SlaveComputer kill
      Unable to delete agent logs
      java.io.IOException: Unable to delete '/var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
      	at hudson.Util.deleteRecursive(Util.java:355)
      	at hudson.slaves.SlaveComputer.kill(SlaveComputer.java:670)
      	at hudson.model.AbstractCIBase.killComputer(AbstractCIBase.java:86)
      	at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:205)
      	at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1338)
      	at jenkins.model.Nodes$4.run(Nodes.java:219)
      	at hudson.model.Queue._withLock(Queue.java:1320)
      	at hudson.model.Queue.withLock(Queue.java:1197)
      	at jenkins.model.Nodes.removeNode(Nodes.java:210)
      	at jenkins.model.Jenkins.removeNode(Jenkins.java:1860)
      	at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:70)
      	at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.terminate(ECSComputer.java:62)
      	at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.taskCompleted(ECSComputer.java:47)
      	at hudson.model.queue.WorkUnitContext.synchronizeEnd(WorkUnitContext.java:140)
      	at hudson.model.Executor.finish1(Executor.java:457)
      	at hudson.model.Executor.run(Executor.java:430)
      Caused by: java.nio.file.FileSystemException: /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003: Device or resource busy
      	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      	at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
      	at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
      	at java.nio.file.Files.deleteIfExists(Files.java:1165)
      	at hudson.Util.tryOnceDeleteFile(Util.java:296)
      	at hudson.Util.tryOnceDeleteRecursive(Util.java:373)
      	at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
      	at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
      	at hudson.Util.deleteRecursive(Util.java:350)
      	... 15 more
      

      Looking at the Jenkins master I do indeed see the log file still open:

      eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ sudo lsof +D /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
      COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF                 NODE NAME
      java    26119 jenkins  486w   REG   0,25       22 11428909888000270242 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
      
      eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ find /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024 -ls
      414054371676357616    4 drwxr-xr-x   2 jenkins  jenkins      4096 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
      11428909888000270242    4 -rw-r--r--   1 jenkins  jenkins        22 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
      

      This means agents remain around and need to be cleaned up manually.

      The logs should be fully closed before the slave is deleted.

            roehrijn2 Jan Roehrich
            borgstrom Evan Borgstrom
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: