AtomicFileWriter performance issue on CephFS in case of Empty File creation



    • Jenkins 2.206

      Hello, during migration from NFS to CephFS file storage we encountered a performance degradation of server startup caused by RunIdMigrator.

       

      After trace analysis we found the following:

      AtomicFileWriter creates its FileChannelWriter with only one OpenOption: StandardOpenOption.WRITE.

      When AtomicFileWriter is used to create a new, empty file (for example in jenkins.model.RunIdMigrator#save), this leads to a full filesystem sync instead of an fsync of only the dirty inodes.

      As a result, this operation took up to 5 seconds on CephFS.

      As a fix, we added the StandardOpenOption.CREATE OpenOption. Pull request: https://github.com/jenkinsci/jenkins/pull/4357
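
      The difference between the two sets of open options can be tried out with a small standalone sketch. This is not the Jenkins code: the class EmptyFileSyncSketch and the helper createEmptyAndSync are hypothetical names, and the sketch assumes the writer syncs through FileChannel.force(true). It creates an empty temporary file, opens it either with WRITE only or with CREATE and WRITE, and forces it to storage without writing any data.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class EmptyFileSyncSketch {

    // Hypothetical helper, not Jenkins code: creates an empty temp file in dir,
    // opens it with WRITE only or with CREATE + WRITE, and forces it to storage
    // without writing any bytes. Returns the path of the (still empty) file.
    static Path createEmptyAndSync(Path dir, boolean withCreate) throws IOException {
        Path tmp = Files.createTempFile(dir, "atomic", ".tmp");
        OpenOption[] options = withCreate
                ? new OpenOption[] {StandardOpenOption.CREATE, StandardOpenOption.WRITE}
                : new OpenOption[] {StandardOpenOption.WRITE};
        try (FileChannel channel = FileChannel.open(tmp, options)) {
            // Assumption: the sync is done via force(true), which also flushes metadata.
            channel.force(true);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Pass a directory on the filesystem under test (e.g. a CephFS mount) as args[0].
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("sync-sketch");
        for (boolean withCreate : new boolean[] {false, true}) {
            long start = System.nanoTime();
            Path file = createEmptyAndSync(dir, withCreate);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((withCreate ? "CREATE+WRITE" : "WRITE only") + ": " + micros + " us");
            Files.delete(file);
        }
    }
}
```

      Running it against a directory on the CephFS mount and comparing the two timings should show whether the same effect appears in a given environment; on a local filesystem the two variants will likely be indistinguishable.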

       

      Ceph logs Before Fix:

       [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
       [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0

       

      Ceph logs After Fix:

       [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0

      Server startup with 2k jobs that required migration:

      • before the fix, startup took ~30 min
      • after the fix, startup took 2 min

            Assignee:
            Konstantin Bulanov
            Reporter:
            Konstantin Bulanov
            Archiver:
            Jenkins Service Account
