
AtomicFileWriter performance issue on CephFS in case of Empty File creation

    • Jenkins 2.206

Hello, during a migration from NFS to CephFS file storage we faced a performance degradation of server startup due to RunIdMigrator.

       

After trace analysis we figured out the following:

AtomicFileWriter creates a FileChannelWriter with only one OpenOption - StandardOpenOption.WRITE.

When AtomicFileWriter is used to create a new empty file (e.g. jenkins.model.RunIdMigrator#save), this leads to a full filesystem sync instead of an fsync of the dirty inodes.

As a result this operation takes up to 5 seconds per file on CephFS.

As a fix we add the StandardOpenOption.CREATE OpenOption. PR: https://github.com/jenkinsci/jenkins/pull/4357
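For illustration, a minimal sketch of the difference the extra OpenOption makes when opening the temporary file's channel. This is not the actual Jenkins source; the class name OpenOptionSketch and the path atomic1234.tmp are illustrative only, and the comments restate the behaviour observed in the Ceph logs below.
{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class OpenOptionSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative temporary file path (AtomicFileWriter works on a temp file
        // next to the target and renames it into place afterwards).
        Path tmp = Paths.get("atomic1234.tmp");

        // Before the fix: the channel was opened with WRITE only.
        // For an empty file, fsync on CephFS showed "dirty caps are -" and
        // fell back to a full filesystem sync (~5 s per file).
        // try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) { ... }

        // After the fix (jenkinsci/jenkins#4357): CREATE is passed as well, so the
        // open itself creates the file; fsync then sees dirty caps (Fw) and only
        // flushes that inode, returning quickly.
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            ch.write(ByteBuffer.wrap("".getBytes(StandardCharsets.UTF_8))); // empty payload
            ch.force(true); // fsync of the dirty inode only
        }
    }
}
{code}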

       

      Ceph logs Before Fix:

       [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
       [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0

       

      Ceph logs After Fix:

       [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0

Server startup with 2k jobs that required migration:

• before the fix, startup took ~30 min
• after the fix, ~2 min

          [JENKINS-60167] AtomicFileWriter performance issue on CephFS in case of Empty File creation

          Konstantin Bulanov created issue -
Konstantin Bulanov made changes -
Description edited: reformatted (bold markers and {code} blocks added), content unchanged
Konstantin Bulanov made changes -
Description edited: added link to pull request https://github.com/jenkinsci/jenkins/pull/4357
          Adrien Lecharpentier made changes -
          Assignee New: Konstantin Bulanov [ bulanovk ]
          Adrien Lecharpentier made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Adrien Lecharpentier made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]
Konstantin Bulanov made changes -
Description edited: typo fix ("migration for NFS" changed to "migration from NFS")
          Oleg Nenashev made changes -
          Released As New: Jenkins 2.206
          Resolution New: Fixed [ 1 ]
          Status Original: In Review [ 10005 ] New: Resolved [ 5 ]
          Oleg Nenashev made changes -
          Labels New: lts-candidate
          Oliver Gondža made changes -
          Remote Link New: This issue links to "jenkinsci/jenkins#4357 (Web Link)" [ 24022 ]
          Oliver Gondža made changes -
          Labels Original: lts-candidate New: 2.204.1-rejected lts-candidate

Assignee: Konstantin Bulanov (bulanovk)
Reporter: Konstantin Bulanov (bulanovk)
Votes: 0
Watchers: 2
