JENKINS-53401

Random FileNotFoundException when creating lots of agents in parallel threads

      When many agents are created in parallel (cloud provisioning of containers), I sometimes see random exceptions reported while temporary files are moved to node/config.xml.

      Also:   java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp -> /var/jenkins_home/nodes/myagent-5pr7b/config.xml
      		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
      		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
      		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
      		at java.nio.file.Files.move(Files.java:1395)
      		at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:191)
      java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp
      	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
      	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
      	at java.nio.file.Files.move(Files.java:1395)
      	at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:206)
      	at hudson.XmlFile.write(XmlFile.java:198)
      	at jenkins.model.Nodes.save(Nodes.java:289)
      	at hudson.util.PersistedList.onModified(PersistedList.java:173)
      	at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
      	at hudson.model.Slave.<init>(Slave.java:198)
      	at hudson.slaves.AbstractCloudSlave.<init>(AbstractCloudSlave.java:51)
      	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.<init>(KubernetesSlave.java:116)
      	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$Builder.build(KubernetesSlave.java:408)
      	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:122)
      	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:35)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      I tracked the root cause to the nodeProperties field in hudson.model.Slave (https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/Slave.java#L149).

      When many agents are created in different threads, each thread ends up calling Jenkins.get().getNodesObject().save(). This method is not thread-safe, and it affects the storage of all nodes. As a result, save() throws an exception in some threads because the node has already been processed by another thread.

      In JENKINS-31055, Stephen made Node implement Saveable, which means the persisted lists should be tied to the node instead of the Nodes object. The corresponding save() operation is fine-grained, so the issue would be avoided completely.
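
      The fine-grained idea can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the Jenkins implementation: the class PerNodeSaveSketch and its methods saveNode and createNodesInParallel are hypothetical names, and the XML payload is a placeholder. Each task writes only its own node's config.xml through a temporary file followed by an atomic move (similar to what the stack trace shows in AtomicFileWriter.commit), so parallel tasks never touch each other's files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: per-node saves instead of one save() that rewrites every node.
class PerNodeSaveSketch {

    /** Writes a single node's config.xml: temp file in the node directory, then atomic move. */
    static void saveNode(Path nodesDir, String nodeName, String xml) {
        try {
            Path dir = Files.createDirectories(nodesDir.resolve(nodeName));
            Path tmp = Files.createTempFile(dir, "atomic", "tmp");
            Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
            // Only this node's directory is touched, so other threads cannot remove or reuse tmp.
            Files.move(tmp, dir.resolve("config.xml"), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates {@code count} nodes from a thread pool and returns how many config.xml files exist afterwards. */
    static int createNodesInParallel(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            Path nodesDir = Files.createTempDirectory("nodes");
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                String name = "myagent-" + i;
                String xml = "<slave><name>" + name + "</name></slave>"; // placeholder payload
                pending.add(pool.submit(() -> saveNode(nodesDir, name, xml)));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure raised in a worker thread
            }
            int saved = 0;
            for (int i = 0; i < count; i++) {
                if (Files.isRegularFile(nodesDir.resolve("myagent-" + i).resolve("config.xml"))) {
                    saved++;
                }
            }
            return saved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("saved nodes: " + createNodesInParallel(50));
    }
}
```

      Because each task owns its node directory, no lock is needed in this sketch. A save that rewrites all nodes from every thread would instead need synchronization around the whole operation.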


          Vincent Latombe created issue -
          Vincent Latombe made changes -
          Summary Original: Random errors when creating lots of agents in parallel threads New: Random FileNotFoundException when creating lots of agents in parallel threads
          Vincent Latombe made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Vincent Latombe made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]
          Vincent Latombe made changes -
          Remote Link New: This issue links to "PR #3609 (Web Link)" [ 21435 ]
          Vincent Latombe made changes -
          Link New: This issue relates to JENKINS-31055 [ JENKINS-31055 ]
          Daniel Beck made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Review [ 10005 ] New: Resolved [ 5 ]
