Type: Bug
Resolution: Fixed
Priority: Critical
Affected Version(s): Jenkins 2.168-2.204
Released As: Jenkins 2.206
Hello, during migration from NFS to CephFS file storage we encountered a performance degradation of server startup caused by RunIdMigrator.
After trace analysis we found the following:
AtomicFileWriter creates a FileChannelWriter with only one OpenOption: StandardOpenOption.WRITE.
When AtomicFileWriter is used to create a new empty file (e.g. jenkins.model.RunIdMigrator#save), this leads to a full filesystem sync instead of an fsync of the dirty inode.
As a result, this operation took up to 5 seconds on CephFS.
As a fix, we added the StandardOpenOption.CREATE OpenOption. PR: https://github.com/jenkinsci/jenkins/pull/4357
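For illustration, a minimal standalone sketch of the open-option difference (my own simplification under the assumptions described above, not the actual AtomicFileWriter/FileChannelWriter code; class and file names below are made up):
{code:java}
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class OpenOptionSketch {
    public static void main(String[] args) throws Exception {
        Path tmp = Paths.get("atomic-writer-demo.tmp"); // hypothetical temp file

        // Before the fix (simplified): the empty temp file already exists and the
        // channel is opened with WRITE only. Per the report, fsync of such an
        // untouched inode degenerates into a full filesystem sync on CephFS:
        //
        // try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
        //     ch.force(true);
        // }

        // After the fix (simplified): CREATE is passed as well, so the open itself
        // produces a dirty inode and force() only has to fsync this one file.
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // Nothing is written: the problematic case is an empty file,
            // as in jenkins.model.RunIdMigrator#save.
            ch.force(true); // fsync data + metadata for this file only
        }
    }
}
{code}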
Ceph logs before the fix:
{code:java}
[Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
[Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0
{code}
Ceph logs after the fix:
{code:java}
[Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0
{code}
Server startup with 2k jobs that required migration:
- before the fix, startup took ~30 min
- after the fix, startup took ~2 min
[JENKINS-60167] AtomicFileWriter performance issue on CephFS in case of Empty File creation
Description |
Original:
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to RunIdMigrator. After trace analysis we figure out following thing: AtomicFileWriter create FileChannelWriterwith with only one OpenOption - StandardOpenOption.WRITE. For a newly created File, in case when AtomicFileWriter used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. Ceph logs for not Empty file: [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0 Ceph logs for Empty file: [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0 As a fix we add StandardOpenOption.CREATE OpenOption. Server startup with 2k job required to be migrated before fix startup took ~30min after startup 2 min |
New:
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*. After trace analysis we figure out following thing: *AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*. For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. As a fix we add *StandardOpenOption.CREATE* OpenOption. Ceph logs Before Fix: {code:java} [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code} Ceph logs After Fix: {code:java} [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code} Server startup with 2k job required to be migrated: * before fix startup took ~30min * after startup 2 min |
Description |
Original:
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*. After trace analysis we figure out following thing: *AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*. For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. As a fix we add *StandardOpenOption.CREATE* OpenOption. Ceph logs Before Fix: {code:java} [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code} Ceph logs After Fix: {code:java} [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code} Server startup with 2k job required to be migrated: * before fix startup took ~30min * after startup 2 min |
New:
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*. After trace analysis we figure out following thing: *AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*. For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. As a fix we add *StandardOpenOption.CREATE* OpenOption. MR - [https://github.com/jenkinsci/jenkins/pull/4357] Ceph logs Before Fix: {code:java} [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code} Ceph logs After Fix: {code:java} [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code} Server startup with 2k job required to be migrated: * before fix startup took ~30min * after startup 2 min |
Assignee | New: Konstantin Bulanov [ bulanovk ] |
Status | Original: Open [ 1 ] | New: In Progress [ 3 ] |
Status | Original: In Progress [ 3 ] | New: In Review [ 10005 ] |
Description |
Original:
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*. After trace analysis we figure out following thing: *AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*. For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. As a fix we add *StandardOpenOption.CREATE* OpenOption. MR - [https://github.com/jenkinsci/jenkins/pull/4357] Ceph logs Before Fix: {code:java} [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code} Ceph logs After Fix: {code:java} [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code} Server startup with 2k job required to be migrated: * before fix startup took ~30min * after startup 2 min |
New:
Hello, during migration from NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*. After trace analysis we figure out following thing: *AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*. For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes As a result this operation took up to 5 sec on CephFS. As a fix we add *StandardOpenOption.CREATE* OpenOption. MR - [https://github.com/jenkinsci/jenkins/pull/4357] Ceph logs Before Fix: {code:java} [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are - [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code} Ceph logs After Fix: {code:java} [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code} Server startup with 2k job required to be migrated: * before fix startup took ~30min * after startup 2 min |
Released As | New: Jenkins 2.206 |
Resolution | New: Fixed [ 1 ] |
Status | Original: In Review [ 10005 ] | New: Resolved [ 5 ] |
Labels | New: lts-candidate |
Released in Jenkins 2.206. It might be a good LTS candidate, but I am not sure about 2.204.1. Maybe it needs more soak testing, though the change itself is definitely valid.