The implementation of AtomicFileWriter is not actually atomic.

      There are several issues with it.

      1. the rename first deletes the target giving a window of opportunity for no file to exist.
      2. the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
        This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

      Jenkins should make use of java.nio to make sure

      1. data is actually synced
      2. it uses atomic operations for move on platforms where it is available.

          [JENKINS-34855] AtomicFileWriter isn't Atomic

          Oleg Nenashev added a comment -

          The fix has been integrated towards 2.93.

          batmat schristou please assign the label if you want the fix to be backported to 2.89.2/3. I am not 100% sure it is safe to backport it without lots of soak testing

          Oleg Nenashev added a comment - The fix has been integrated towards 2.93. batmat schristou please assign the label if you want the fix to be backported to 2.89.2/3. I am not 100% sure it is safe to backport it without lots of soak testing

          Oleg Nenashev added a comment -

          Actually, added the label. But the discussion is required

          Oleg Nenashev added a comment - Actually, added the label. But the discussion is required

          oleg_nenashev the issue is only partly fixed.  And probably only improved should we say for now. We agreed to merge the first shot so that we could split the risk and indeed let it soak separately to possibly better understand and narrow down potential issues later on.

          https://github.com/jenkinsci/jenkins/pull/3170 contains the final fix.

          That is why I had kept this JIRA open for now. 

          Baptiste Mathus added a comment - oleg_nenashev the issue is only partly fixed.  And probably only improved should we say for now. We agreed to merge the first shot so that we could split the risk and indeed let it soak separately to possibly better understand and narrow down potential issues later on. https://github.com/jenkinsci/jenkins/pull/3170  contains the final fix. That is why I had kept this JIRA open for now. 

          https://github.com/jenkinsci/jenkins/pull/3170 now has a successful build and is ready for review. So, yes, ideally it would be great once it's reviewed and merged if we can have it in the .2, or maybe .3 for whatever reason.

          With .2, with the christmas release break (.2 ETA January 17th), it would make it 4+ weeks of soak testing. So iff we don't see bad feedback or alerts, I think it would be worth it to get it backported indeed. It's a pretty nasty issue, when it shows up. 

          Baptiste Mathus added a comment - https://github.com/jenkinsci/jenkins/pull/3170  now has a successful build and is ready for review. So, yes, ideally it would be great once it's reviewed and merged if we can have it in the .2, or maybe .3 for whatever reason. With .2, with the christmas release break (.2 ETA January 17th), it would make it 4+ weeks of soak testing. So iff we don't see bad feedback or alerts, I think it would be worth it to get it backported indeed. It's a pretty nasty issue, when it shows up. 

          FYI an opt-out flag has been added to the ongoing PR: https://github.com/jenkinsci/jenkins/pull/3170 as per svanoort good points. Reviews welcome.

          Baptiste Mathus added a comment - FYI an opt-out flag has been added to the ongoing PR: https://github.com/jenkinsci/jenkins/pull/3170  as per svanoort good points. Reviews welcome.

          Once merged & released, this should probably be picked up in next LTS only once/if JENKINS-48407 is fixed too.

          Baptiste Mathus added a comment - Once merged & released, this should probably be picked up in next LTS only once/if  JENKINS-48407 is fixed too.

          Code changed in jenkins
          User: Baptiste Mathus
          Path:
          core/src/main/java/hudson/util/FileChannelWriter.java
          core/src/test/java/hudson/util/FileChannelWriterTest.java
          http://jenkins-ci.org/commit/jenkins/22efbad9450fc95748e0bc3bee04ec49702d7ed2
          Log:
          JENKINS-34855 Create a FileChannelWriter

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Baptiste Mathus Path: core/src/main/java/hudson/util/FileChannelWriter.java core/src/test/java/hudson/util/FileChannelWriterTest.java http://jenkins-ci.org/commit/jenkins/22efbad9450fc95748e0bc3bee04ec49702d7ed2 Log: JENKINS-34855 Create a FileChannelWriter

          Code changed in jenkins
          User: Baptiste Mathus
          Path:
          core/src/main/java/hudson/util/AtomicFileWriter.java
          core/src/main/java/hudson/util/FileChannelWriter.java
          core/src/test/java/hudson/util/FileChannelWriterTest.java
          test/src/test/java/hudson/util/AtomicFileWriterPerfTest.java
          http://jenkins-ci.org/commit/jenkins/438e9296f0646d920e19e05c7696d8cec5307a88
          Log:
          Merge pull request #3170 from batmat/JENKINS-34855

          [FIX JENKINS-34855] Create a FileChannelWriter and use it for AtomicFileWriter "enforced" flushing

          Compare: https://github.com/jenkinsci/jenkins/compare/02d0531add35...438e9296f064

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Baptiste Mathus Path: core/src/main/java/hudson/util/AtomicFileWriter.java core/src/main/java/hudson/util/FileChannelWriter.java core/src/test/java/hudson/util/FileChannelWriterTest.java test/src/test/java/hudson/util/AtomicFileWriterPerfTest.java http://jenkins-ci.org/commit/jenkins/438e9296f0646d920e19e05c7696d8cec5307a88 Log: Merge pull request #3170 from batmat/ JENKINS-34855 [FIX JENKINS-34855] Create a FileChannelWriter and use it for AtomicFileWriter "enforced" flushing Compare: https://github.com/jenkinsci/jenkins/compare/02d0531add35...438e9296f064

          Fixed towards 2.102, expected to be released in the next ~12 hours.

          Baptiste Mathus added a comment - Fixed towards 2.102, expected to be released in the next ~12 hours.

          Reminder: we must pick up JENKINS-48407 with this one if accepted for next LTS.

          Baptiste Mathus added a comment - Reminder: we must pick up  JENKINS-48407 with this one if accepted for next LTS.

            batmat Baptiste Mathus
            teilo James Nord
            Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: