- Bug
- Resolution: Fixed
- Critical
- Jenkins 2.168-2.204
- Jenkins 2.206
Hello, during migration from NFS to CephFS file storage, we ran into degraded server startup performance caused by RunIdMigrator.
After analyzing traces, we found the following:
AtomicFileWriter creates its FileChannelWriter with only one OpenOption, StandardOpenOption.WRITE.
When AtomicFileWriter is used to create a new empty file (e.g. jenkins.model.RunIdMigrator#save), this leads to a full filesystem sync instead of an fsync on the dirty inodes only.
As a result, this operation took up to 5 sec on CephFS.
As a fix, we added the StandardOpenOption.CREATE OpenOption. Pull request: https://github.com/jenkinsci/jenkins/pull/4357
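To illustrate the change, here is a minimal standalone sketch of opening a channel with both WRITE and CREATE, writing content, and forcing it to storage. It is not the Jenkins code: the class `AtomicWriteSketch` and the helper `writeAndSync` are hypothetical names for illustration, and the sketch only shows the standard `java.nio` calls involved, not the CephFS behavior itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class AtomicWriteSketch {

    // Hypothetical helper (not part of Jenkins): opens the target with
    // WRITE and CREATE, writes the content, then forces data and metadata.
    static void writeAndSync(Path target, String content) throws IOException {
        try (FileChannel channel = FileChannel.open(
                target,
                StandardOpenOption.WRITE,
                StandardOpenOption.CREATE)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            // force(true) asks the OS to flush file content and metadata.
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("atomic-sketch", ".tmp");
        try {
            // Empty write, similar in spirit to saving an empty marker file.
            writeAndSync(tmp, "");
            System.out.println("size after empty write: " + Files.size(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With only `StandardOpenOption.WRITE`, `FileChannel.open` requires the file to exist already; adding `CREATE` lets the same call create it when missing and open it otherwise.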
Ceph logs before the fix:
[Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
[Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0
Ceph logs after the fix:
[Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0
Server startup with 2k jobs that required migration:
- before the fix, startup took ~30 min
- after the fix, startup took ~2 min