[JENKINS-66001] Want a way to disable fsync(2) for Pipeline but retaining atomicity

      The Scaling Pipeline documentation states:

      Atomic writes: All settings except "maximum durability" currently avoid atomic writes — what this means is that if the operating system running Jenkins fails, data that is buffered for writing to disk will not be flushed, it will be lost. This is quite rare, but can happen as a result of container or virtualization operations that halt the operating system or disconnect storage. Usually this data is flushed pretty quickly to disk, so the window for data loss is brief. On Linux this flush-to-disk can be forced by running 'sync'. In some rare cases this can also result in a build that cannot be loaded.

      I think this is a bit misleading. I was running with SURVIVABLE_NONATOMIC for a while. My kernels don't panic and my storage doesn't drop writes. But my JVM doesn't always go down cleanly. My Ansible deployment scripts don't POST to /exit before stopping the container. And sometimes the JVM runs out of memory and the Docker container restarts. According to the documentation, those shouldn't be a big deal: after all, my OS and storage are stable.

      But JVM restarts are a big deal in SURVIVABLE_NONATOMIC mode. Yes, SURVIVABLE_NONATOMIC avoids XmlFile#write and therefore avoids fsync(2), which is great. But SURVIVABLE_NONATOMIC is not so great, because rather than writing the XML file to a temporary location and, once it has been closed, atomically moving it on top of the destination, it truncates the destination and then writes the new content into it. That leaves a window of time during which, if the JVM goes down, the destination file can be left empty or only partially written. "But you didn't want atomicity!" you say. Fair enough, but the documentation told me that all I had to do was make sure my OS and storage were stable. It said nothing about keeping my JVM from shutting down uncleanly, and that's a tall order. In fact, I explicitly ran sync(1) before (uncleanly) shutting down my JVM in the hope that I wouldn't have any durability problems, but I still ran into them. Looking at the implementation after the fact, it all makes sense now: sync(1) only flushes data that has already been handed to the kernel, so it cannot complete a file that the JVM had truncated but not yet finished rewriting.
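A minimal Java sketch of the two write strategies described above, assuming plain java.nio file I/O rather than Jenkins' actual XmlFile or AtomicFileWriter code. WriteStrategies, SimulatedCrash, and the crashAfter parameter are hypothetical; crashAfter stops the write partway through to stand in for the JVM going down.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Sketch: in-place rewrite vs. temp file plus rename, with a simulated crash mid-write. */
final class WriteStrategies {

    /** Hypothetical stand-in for the JVM dying partway through a write. */
    static final class SimulatedCrash extends RuntimeException {}

    /** Writes at most crashAfter bytes, then "crashes" if the data was longer than that. */
    private static void writeUntilCrash(OutputStream out, byte[] data, int crashAfter) throws IOException {
        int n = Math.min(crashAfter, data.length);
        out.write(data, 0, n);
        out.flush();
        if (n < data.length) {
            throw new SimulatedCrash();
        }
    }

    /** Non-atomic: truncate the destination, then write the new content into it. */
    static void writeInPlace(Path target, byte[] data, int crashAfter) throws IOException {
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            writeUntilCrash(out, data, crashAfter);
        }
    }

    /** Atomic: write a sibling temp file, close it, then rename it over the destination. */
    static void writeAtomically(Path target, byte[] data, int crashAfter) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), target.getFileName().toString(), ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp, StandardOpenOption.WRITE)) {
            writeUntilCrash(out, data, crashAfter);
        }
        // Only reached if the write completed, so readers see either the old file or the new one.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the in-place version a crash leaves the destination holding only the bytes written so far; in the atomic version the destination keeps its old content and only a stray temp file is left behind. Neither version calls fsync(2), so both remain exposed to OS or storage failures.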

      What I really want is a mode for Pipeline to stop calling fsync(2) but to keep writing out temporary files and atomically moving them on top of the destination. That way I don't need to worry about my JVM crashing, but I do need to worry about my kernel panicking or my storage dropping writes. I can handle that. Yet there seems to be no such option short of the nuclear -Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true tunable, which disables fsync(2) not just for Pipeline but for all of Jenkins. That is a little more risk than I am willing to take on as well.

      What is to be done about all of this? I could just use -Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true and live with the risk, but I think something ought to be done about Pipeline. If we are to treat the current description of SURVIVABLE_NONATOMIC as normative, then I think it should continue skipping the fsync(2) but still use temporary files and atomic moves. But then doing atomic moves with a mode called "non-atomic" makes no sense, so we would likely need a new mode with the description of the current SURVIVABLE_NONATOMIC mode (i.e., protects you from JVM crashes but not OS/storage failures). The current SURVIVABLE_NONATOMIC mode would need to have its description changed to alert users that they are subject to durability issues with unclean JVM shutdowns as well as OS and storage problems.

      If there is consensus about the above, it wouldn't be that hard to implement. We could expose a new XmlFile#write(Object, boolean) in core that allows one to disable the fsync(2). Then it's just a matter of defining a new durability mode and plumbing that through to workflow-support's PipelineIOUtils. A lot of plumbing but a relatively simple change. This would allow Pipeline users to opt out of fsync(2) for Pipeline while retaining atomicity and protection from JVM restarts, while also retaining fsync(2) for non-Pipeline use cases.
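A rough sketch of what a per-call fsync(2) switch could look like, assuming java.nio rather than the real XmlFile or AtomicFileWriter internals. AtomicWrite.write(Path, byte[], boolean) is a hypothetical stand-in for the proposed XmlFile#write(Object, boolean): the temp-file-and-rename step always happens, and only the fsync(2) is optional.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: always atomic (temp file + rename), fsync(2) only on request. */
final class AtomicWrite {
    static void write(Path target, byte[] data, boolean force) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), target.getFileName().toString(), ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                if (force) {
                    // FileChannel.force(true) is Java's route to fsync(2): data and metadata reach the device.
                    ch.force(true);
                }
            }
            // Atomic either way: a JVM crash leaves the old file or the new one, never a truncated mix.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; removes the temp file if anything above failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Under the proposal, MAX_SURVIVABILITY would pass force=true and the reworked SURVIVABLE_NONATOMIC would pass false. Making the rename itself durable would also require syncing the parent directory, which this sketch leaves out.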

          Basil Crow added a comment -

          Perhaps raihaan (who has done some performance work recently), batmat (who added the hudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH property to core), or dnusbaum (who is maintaining Pipeline) might have some thoughts about this. If not, sorry for the mention.

          Devin Nusbaum added a comment -

          A few thoughts:

          • If you think that the documentation is wrong or misleading, please feel free to file a PR to update it (https://github.com/jenkins-infra/jenkins.io/blob/master/content/doc/book/pipeline/scaling-pipeline.adoc).
          • Making it possible to disable calls to fsync for a specific XmlFile instance seems fine to me (i.e., making it possible to disable the fixes for JENKINS-34855 on a case-by-case basis). Even if the naming is confusing, I would not recommend introducing a new Pipeline durability level with this behavior; I would just update the implementation of SURVIVABLE_NONATOMIC, since the proposed behavior is probably how users expect that setting to work anyway.
          • Going by some usage statistics that I have access to, SURVIVABLE_NONATOMIC is an extremely unpopular setting: fewer than 0.5% of roughly 5000 controllers use it at all, while PERFORMANCE_OPTIMIZED and MAX_SURVIVABILITY each have roughly 50% usage. In the past I have thought about removing it entirely to simplify things for both users and maintainers, since, the way it currently works, there is not really any reason to use it over PERFORMANCE_OPTIMIZED.
          • Just FYI, I am not really maintaining Pipeline these days (I was moved to a different team at my employer). I try to help out where I can, though, with reviews and such.

          Basil Crow added a comment -

          If you think that the documentation is wrong or misleading, please feel free to file a PR to update it

          I think I'd rather just fix SURVIVABLE_NONATOMIC to work the way I want it to (and the way the documentation currently describes).

          Going by some usage statistics that I have access to, SURVIVABLE_NONATOMIC is an extremely unpopular setting

          Probably because in practice it results in the durability problems that I described. I used it for all of 2 months before I had to revert to MAX_SURVIVABILITY, because builds' on-disk state was corrupt after unclean JVM shutdowns. But the documentation describes a useful scenario, and indeed it's my scenario: wanting higher performance without losing too much durability, given a trustworthy OS and storage system.

          Just FYI, I am not really maintaining Pipeline these days (I was moved to a different team at my employer).

          Thanks for letting me know. We miss you from the community side. Is there anyone else that is now officially maintaining Pipeline that I could ping for design reviews and such? I guess I could mention jglick at the risk of annoying him. (Please let me know if so and I'll stop.)

          Jesse Glick added a comment -

          I am not maintaining Pipeline either. I do not think there is anyone working seriously on fundamentals. Just triaging miscellaneous PRs is plenty of work.

          From my dim memory of this stuff, I would like there to be fewer knobs, and ideally there would just be one well-tested mode which offers decent performance as well as being defensive against abrupt JVM crashes and common filesystem outages. (For inspiration, the design of the Mercurial binary store relies on append-only journals, and lower-level structures are always written before higher-level structures, so losing some writes after a certain point just means the most recent mutation was lost but everything before it remains intact.)
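A toy Java sketch of the append-only idea, assuming a simple [length][CRC32][payload] record format; this is not Mercurial's actual on-disk format, and the Journal class is hypothetical. Records are only ever appended, and replay stops at the first torn or corrupt record, so a crash mid-append costs at most the most recent mutation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical append-only journal: each record is [int length][int crc32][payload]. */
final class Journal {
    static void append(Path journal, String entry) throws IOException {
        byte[] payload = entry.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(payload.length);
            out.writeInt((int) crc.getValue());
            out.write(payload);
        }
        // Existing records are never rewritten; new data only ever goes at the end.
        Files.write(journal, bytes.toByteArray(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Replays intact records, stopping at the first torn or corrupt one. */
    static List<String> replay(Path journal) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(journal));
        List<String> entries = new ArrayList<>();
        while (buf.remaining() >= 8) {
            int length = buf.getInt();
            int expectedCrc = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                break; // torn tail: header present, payload incomplete
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if ((int) crc.getValue() != expectedCrc) {
                break; // corrupt tail: checksum mismatch
            }
            entries.add(new String(payload, StandardCharsets.UTF_8));
        }
        return entries;
    }
}
```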

          There was also a notion to rewrite the storage layer for the flow graph to use a more reasonable format, rather than a directory with a ton of little XML files. It gets tricky because there is also a program.dat and build.xml and you would really like the whole build state to be written atomically.
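One common way to make several files change together is to write a complete new copy and then switch a single small pointer file atomically, similar in spirit to the CURRENT file LevelDB uses to name its active manifest. A hypothetical Java sketch, assuming one gen-N directory per version of the build state; GenerationStore and this layout are illustrative only, not anything Pipeline does today.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

/** Hypothetical store that commits several files as one unit by swapping a pointer file. */
final class GenerationStore {
    private final Path root;

    GenerationStore(Path root) {
        this.root = root;
    }

    /** Writes every file into a fresh generation directory, then atomically repoints CURRENT at it. */
    void commit(long generation, Map<String, String> files) throws IOException {
        Path dir = Files.createDirectories(root.resolve("gen-" + generation));
        for (Map.Entry<String, String> e : files.entrySet()) {
            Files.write(dir.resolve(e.getKey()), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        // The single commit point: until this rename, readers still see the previous generation.
        Path tmp = Files.createTempFile(root, "CURRENT", ".tmp");
        Files.write(tmp, dir.getFileName().toString().getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, root.resolve("CURRENT"), StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Reads one file from whichever generation CURRENT names. */
    String read(String name) throws IOException {
        String current = new String(Files.readAllBytes(root.resolve("CURRENT")), StandardCharsets.UTF_8);
        return new String(Files.readAllBytes(root.resolve(current).resolve(name)), StandardCharsets.UTF_8);
    }
}
```

Deleting old generations and syncing the new files are omitted; the point is only that a crash before the pointer swap leaves build.xml and program.dat consistent with each other.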

          Basil Crow added a comment -

          I am not maintaining Pipeline either. I do not think there is anyone working seriously on fundamentals. Just triaging miscellaneous PRs is plenty of work.

          I know that, Jesse, and I am trying to respect that. But the fact is that CloudBees employees are the only individuals with commit access to Pipeline at present. So whether Pipeline maintenance is being funded by CloudBees or not, there is a need to at least communicate openly with the community.

          From my dim memory of this stuff, I would like there to be fewer knobs, and ideally there would just be one well-tested mode which offers decent performance as well as being defensive against abrupt JVM crashes and common filesystem outages.

          That aligns with Devin's opinion. I think that SURVIVABLE_NONATOMIC as documented (but not as currently implemented) is the mode you are describing and the mode I want to use for my use case, if only it behaved as documented. (As implemented, it is not resilient to JVM crashes, which is the point of this bug.) I'll go ahead and file a series of PRs to make it work as documented; namely, to do atomic writes (to be resilient to JVM crashes, which is not the current behavior) but to skip fsync(2) (which improves performance at the cost of exposing you to OS and storage failures, which is the current and desired behavior).

          Jesse Glick added a comment -

          That sounds reasonable.

          Raihaan Shouhell added a comment -

          I think changing SURVIVABLE_NONATOMIC is very reasonable, given where it stands.

          MAX_SURVIVABILITY - Protects against JVM- and OS-level issues
          SURVIVABLE_NONATOMIC - Protects against some JVM issues but no OS issues; better perf than the previous tier
          PERFORMANCE_OPTIMIZED - No protection, but offers the best perf

          The change to SURVIVABLE_NONATOMIC makes it more resilient to data corruption, possibly at some cost in performance (the added move operation).

          You might want to consider adding the other relevant bits to the plugins (maybe using AtomicFileWriter directly) before changing XmlFile, because depending on a new core API is just going to force lots of core version bumps in the plugins.

          Basil Crow added a comment -

          For comparison, see how SQLite handles fsync(2). Its synchronous setting has FULL, NORMAL, and OFF modes. OFF does no fsync(2) (like Jenkins' -Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true), FULL is like the current behavior of Pipeline in maximum survivability mode, and in NORMAL mode "the SQLite database engine will still sync at the most critical moments, but less often than in FULL mode," which is very similar to what I'm proposing here. With my proposed changes, we would still fsync(2) for queue maintenance, the flow execution list, and other "most critical moments", but far less often than in maximum survivability mode.
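The NORMAL-style policy boils down to one decision per write. A minimal Java sketch, assuming a hypothetical SyncPolicy with SQLite-like mode names and a caller-supplied criticalWrite flag; neither exists in Jenkins, and which writes count as critical is the caller's choice.

```java
/** Hypothetical SQLite-like sync policy; mode names borrowed from SQLite for illustration. */
final class SyncPolicy {
    enum Mode { FULL, NORMAL, OFF }

    /** Decides whether a given write should be followed by fsync(2). */
    static boolean shouldFsync(Mode mode, boolean criticalWrite) {
        switch (mode) {
            case FULL:
                return true;          // every write is forced to stable storage
            case NORMAL:
                return criticalWrite; // e.g. the queue or the flow execution list, but not flow nodes
            case OFF:
            default:
                return false;         // like DISABLE_FORCED_FLUSH=true: never force
        }
    }
}
```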

            Assignee: Unassigned
            Reporter: Basil Crow
            Votes: 0
            Watchers: 6