  1. Jenkins
  2. JENKINS-37293

ECS Agents are not deleted because Jenkins is unable to delete logs due to NFS locks

      Description

      We have our Jenkins master running on an NFS 4.1 mount (AWS EFS).

      When a job completes we see the following in the logs:

      Aug 09, 2016 4:21:09 PM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSService deleteTask
      Delete ECS Slave task: arn:aws:ecs:us-east-1:XXXXXXXXXXXX:task/80b3bc38-774c-4bbe-9e79-f9db126a2b9b
      Aug 09, 2016 4:21:09 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
      Failed to monitor jenkins-ecs-d6d51f89f024 for Free Disk Space
      java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      	at hudson.remoting.Request$1.get(Request.java:282)
      	at hudson.remoting.Request$1.get(Request.java:207)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      	at hudson.remoting.Request.abort(Request.java:303)
      	at hudson.remoting.Channel.terminate(Channel.java:847)
      	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1080)
      	at hudson.remoting.Channel$1.handle(Channel.java:501)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      	... 3 more
      Caused by: Command close created at
      	at hudson.remoting.Command.<init>(Command.java:62)
      	at hudson.remoting.Command.<init>(Command.java:47)
      	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
      	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:822)
      	at hudson.remoting.Channel.close(Channel.java:867)
      	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:825)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1106)
      
      Aug 09, 2016 4:21:09 PM WARNING hudson.slaves.SlaveComputer kill
      Unable to delete agent logs
      java.io.IOException: Unable to delete '/var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
      	at hudson.Util.deleteRecursive(Util.java:355)
      	at hudson.slaves.SlaveComputer.kill(SlaveComputer.java:670)
      	at hudson.model.AbstractCIBase.killComputer(AbstractCIBase.java:86)
      	at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:205)
      	at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1338)
      	at jenkins.model.Nodes$4.run(Nodes.java:219)
      	at hudson.model.Queue._withLock(Queue.java:1320)
      	at hudson.model.Queue.withLock(Queue.java:1197)
      	at jenkins.model.Nodes.removeNode(Nodes.java:210)
      	at jenkins.model.Jenkins.removeNode(Jenkins.java:1860)
      	at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:70)
      	at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.terminate(ECSComputer.java:62)
      	at com.cloudbees.jenkins.plugins.amazonecs.ECSComputer.taskCompleted(ECSComputer.java:47)
      	at hudson.model.queue.WorkUnitContext.synchronizeEnd(WorkUnitContext.java:140)
      	at hudson.model.Executor.finish1(Executor.java:457)
      	at hudson.model.Executor.run(Executor.java:430)
      Caused by: java.nio.file.FileSystemException: /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003: Device or resource busy
      	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      	at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
      	at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
      	at java.nio.file.Files.deleteIfExists(Files.java:1165)
      	at hudson.Util.tryOnceDeleteFile(Util.java:296)
      	at hudson.Util.tryOnceDeleteRecursive(Util.java:373)
      	at hudson.Util.tryOnceDeleteContentsRecursive(Util.java:392)
      	at hudson.Util.tryOnceDeleteRecursive(Util.java:372)
      	at hudson.Util.deleteRecursive(Util.java:350)
      	... 15 more
      

      Looking at the Jenkins master, lsof confirms that the Jenkins Java process still holds the log file open:

      eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ sudo lsof +D /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
      COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF                 NODE NAME
      java    26119 jenkins  486w   REG   0,25       22 11428909888000270242 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
      
      eborgstrom@devops-jenkins-7d98bfe3:/var/lib/jenkins$ find /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024 -ls
      414054371676357616    4 drwxr-xr-x   2 jenkins  jenkins      4096 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024
      11428909888000270242    4 -rw-r--r--   1 jenkins  jenkins        22 Aug  9 23:21 /var/lib/jenkins/logs/slaves/jenkins-ecs-d6d51f89f024/.nfs9e9ba4fd7c349fa200000003
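
The `.nfs…` entry in the listing above is the placeholder an NFS client creates when a file is unlinked while a process still has it open (a "silly rename"). The directory cannot be removed until that handle is closed. As a rough diagnostic, a small scan for such placeholders might look like the sketch below. The class and method names (`NfsPlaceholderScan`, `findNfsPlaceholders`) are hypothetical and are not part of Jenkins or the ECS plugin.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Jenkins code: lists NFS placeholder files in one directory.
class NfsPlaceholderScan {

    // Returns every direct child of dir whose file name starts with ".nfs".
    static List<Path> findNfsPlaceholders(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        // The glob is matched against the file name only; a leading dot is an ordinary character.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, ".nfs*")) {
            for (Path entry : entries) {
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory to inspect is passed as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path placeholder : findNfsPlaceholders(dir)) {
            System.out.println("still open on some client: " + placeholder);
        }
    }
}
```

A non-empty result only says that some NFS client still holds the file open. Tools such as `lsof`, as used above, are still needed to identify the process.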
      

      As a result, terminated agents are left behind in Jenkins and have to be cleaned up manually.

      Jenkins should fully close the agent log before it deletes the agent.
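
The ordering requested above can be sketched as follows: close the log stream first, and only then delete the log directory. This is a minimal illustration under stated assumptions, not the actual Jenkins or ECS plugin code. `AgentLogCleanup`, `closeThenDelete`, and `deleteRecursively` are hypothetical names. On a local filesystem the delete would succeed either way, so the sketch shows the intended order rather than reproducing the NFS failure.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not Jenkins code: releases the log handle before removing the directory.
class AgentLogCleanup {

    // Closes the open log stream, then removes the whole log directory.
    static void closeThenDelete(Closeable logStream, Path logDir) throws IOException {
        logStream.close();          // release the file handle first
        deleteRecursively(logDir);  // nothing is left open, so no .nfs placeholder is expected
    }

    // Deletes root and everything below it, children before their parents.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse path order places each entry before the directory that contains it.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an agent log directory; a temporary directory is used for the demo.
        Path dir = Files.createTempDirectory("jenkins-ecs-demo");
        OutputStream log = Files.newOutputStream(dir.resolve("slave.log"));
        log.write("agent connected\n".getBytes(StandardCharsets.UTF_8));
        closeThenDelete(log, dir);
        System.out.println("log directory still exists: " + Files.exists(dir));
    }
}
```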

            People

            Assignee:
            roehrijn2 Jan Roehrich
            Reporter:
            borgstrom Evan Borgstrom