Jenkins / JENKINS-27591

Improve performance of initial scan by narrowing the job config pattern

      Description

      Our Jenkins jobs archive quite a lot of artifacts, and we've been running our jobs for a couple of years. As a result, under our jobs directory there are tens of millions of files and directories.

      When the SCM Sync Configuration Plugin is activated on such a large directory, it scans for files that match the pattern **/jobs/*/config.xml under the root directory Hudson.getInstance().getRootDir(). The initial ** segment matches any number of directory levels, so the scan is recursive and every directory has to be traversed.

      In our Jenkins installation, the SCM Sync Configuration Plugin never completes the scan; there is simply too much data to scan. I gave up after a couple of hours and aborted it.

      The jobs directory is hardcoded in Jenkins core to be located at Jenkins.getInstance().getRootDir() + "jobs". See jenkins.model.Jenkins.getRootDirFor(String name) for an example. So, as far as I can see, there is no use for the initial ** segment in the pattern.

      I suggest narrowing the pattern to improve the performance of the initial scan. By changing the PATTERNS constant in hudson.plugins.scm_sync_configuration.strategies.impl.JobConfigScmSyncStrategy to jobs/*/config.xml, the scan is no longer recursive. I've tested this change on our Jenkins installation, and it lets the scan finish within seconds instead of hours.
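A minimal Java sketch of the narrowed lookup, assuming the standard layout <root>/jobs/<name>/config.xml. The class and method names (NarrowJobConfigScan, listJobConfigs) are hypothetical, and the plugin itself matches pattern strings rather than running code like this. The point it illustrates is that only jobs/ and its immediate children are listed, so the cost grows with the number of jobs rather than with the number of archived build files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the equivalent of the narrowed pattern "jobs/*/config.xml".
class NarrowJobConfigScan {

    // Lists <root>/jobs/<name>/config.xml without descending into builds/, workspace/, etc.
    static List<Path> listJobConfigs(Path root) {
        Path jobsDir = root.resolve("jobs");
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(jobsDir)) {
            return result; // no jobs directory: nothing to sync
        }
        // A single directory listing: only the immediate children of jobs/.
        try (DirectoryStream<Path> jobs = Files.newDirectoryStream(jobsDir)) {
            for (Path jobDir : jobs) {
                Path config = jobDir.resolve("config.xml");
                if (Files.isRegularFile(config)) {
                    result.add(config);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result); // listing order is unspecified; sort for stable output
        return result;
    }

    public static void main(String[] args) {
        // Pass the Jenkins root directory as the first argument (defaults to ".").
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path config : listJobConfigs(root)) {
            System.out.println(root.relativize(config));
        }
    }
}
```

Note that a config.xml inside a build's archive directory is never even looked at, because the listing stops one level below jobs/.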

            Activity

            jonaskaveby Jonas Lind created issue -
            fcamblor Frédéric Camblor added a comment -

            This change was introduced by @ndeloof for the CloudBees Folders plugin (see this commit).

            Wondering what to do...

            fcamblor Frédéric Camblor made changes -
            Priority: Blocker → Major
            jonaskaveby Jonas Lind made changes -
            Description: reformatted (code identifiers marked as monospace; text unchanged)
            jonaskaveby Jonas Lind added a comment -

            Ah, we're not using the Cloudbees folder plugin so unfortunately I missed that feature. Thank you for the information.

            The problem still remains, but my proposed solution is not valid. I'll see if I can come up with a better solution.
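To see why the narrowed pattern breaks folders, here is a hypothetical Java sketch of what **/jobs/*/config.xml has to do: walk the entire tree and keep every file whose path relative to the root ends in jobs/<name>/config.xml, at any depth. It assumes that the CloudBees Folders plugin stores child items under jobs/<folder>/jobs/<child>/config.xml; a pattern anchored at jobs/*/config.xml only sees the first level and would miss those children. The names RecursiveJobConfigScan, matchesJobConfigPattern and listJobConfigs are illustrative, not the plugin's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the work implied by the pattern "**/jobs/*/config.xml".
class RecursiveJobConfigScan {

    // True if the path (relative to the Jenkins root) ends in jobs/<name>/config.xml,
    // preceded by zero or more directories, like an Ant-style leading "**/".
    static boolean matchesJobConfigPattern(Path relative) {
        int n = relative.getNameCount();
        return n >= 3
                && relative.getName(n - 1).toString().equals("config.xml")
                && relative.getName(n - 3).toString().equals("jobs");
    }

    // Visits every file under root, build logs and archived artifacts included.
    static List<Path> listJobConfigs(Path root) {
        try (Stream<Path> all = Files.walk(root)) {
            return all.filter(Files::isRegularFile)
                    .filter(p -> matchesJobConfigPattern(root.relativize(p)))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listJobConfigs(root).forEach(p -> System.out.println(root.relativize(p)));
    }
}
```

The traversal is the expensive part: Files.walk visits every file under the root, even though almost none of them can match.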

            fcamblor Frédéric Camblor added a comment -

            I think you might be interested in JENKINS-19659, which would solve your problem once implemented.

            fcamblor Frédéric Camblor made changes -
            Link: This issue is related to JENKINS-19659
            jonaskaveby Jonas Lind added a comment -

            Thanks, that might be a solution.

            So, if JENKINS-19659 is implemented, I would be able to disable default includes and then manually add the following includes:
            • config.xml
            • hudson*.xml
            • scm-sync-configuration.xml
            • jobs/*/config.xml
            • users/*/config.xml
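For illustration, the following Java sketch checks paths relative to the Jenkins root against those five includes using java.nio.file glob matchers. It is an assumption that this approximates the plugin's own include matching, which may be Ant-style; for these particular patterns, which contain no **, a single * stays within one path segment in both syntaxes. The class name ManualIncludes is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: the manual include list above, checked with NIO glob matchers.
class ManualIncludes {

    static final List<String> INCLUDES = Arrays.asList(
            "config.xml",
            "hudson*.xml",
            "scm-sync-configuration.xml",
            "jobs/*/config.xml",
            "users/*/config.xml");

    // One PathMatcher per include; "*" never crosses a directory boundary.
    static final List<PathMatcher> MATCHERS = INCLUDES.stream()
            .map(p -> FileSystems.getDefault().getPathMatcher("glob:" + p))
            .collect(Collectors.toList());

    // relativePath must be relative to the Jenkins root directory.
    static boolean isIncluded(Path relativePath) {
        return MATCHERS.stream().anyMatch(m -> m.matches(relativePath));
    }

    public static void main(String[] args) {
        for (String p : args) {
            System.out.println(p + " -> " + isIncluded(Paths.get(p)));
        }
    }
}
```

With these includes, a folder-nested path such as jobs/folder/jobs/child/config.xml is not selected, since jobs/*/config.xml only matches one level below jobs/.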

            If I understand the current suggestion in JENKINS-19659 and the SCM Sync Configuration Plugin source code, such a solution would have the drawback of losing the following functionality:

            • the SCM Sync status footer
            • detailed commit messages
            • the ability to customize commit messages

            Perhaps that is an acceptable tradeoff.

            fcamblor Frédéric Camblor added a comment -

            In JENKINS-19659, I was planning to disable strategies one by one (job strategies, standard config strategies, etc.).

            Anyway, I don't understand your assumption:

            If I understand the current suggestion in JENKINS-19659 and the SCM Sync Configuration Plugin source code, such a solution would have the drawback of losing the following functionality:

            • the SCM Sync status footer
            • detailed commit messages
            • the ability to customize commit messages

            I don't think we would have this drawback.
            What led you to this assumption?

            jonaskaveby Jonas Lind added a comment -

            Okay, disabling strategies one by one makes more sense; that was my misunderstanding.

            the SCM Sync status footer

            I was wrong; this doesn't seem to be affected (I read footer.jelly too sloppily yesterday).

            detailed commit messages

            Since the job strategy would be disabled and the manual strategy used instead, the commit messages would be "Modification on configuration(s)" / "Item renamed" / "File hierarchy deleted" instead of "Job XYZ configuration updated" / "Job XYZ hierarchy renamed from ABC to DEF" / "Job XYZ hierarchy deleted". That makes it less clear what has changed when skimming the history in an SCM browser, but it's not a blocker.

            the ability to customize commit messages

            The decorateOnsubmitForm logic in footer.jelly is only shown if the current page matches a PageMatcher from a strategy. If the job strategy is disabled and the manual strategy used instead, I assume there will be no PageMatcher matching URIs like view/All/job/myjob/configure, so the user won't be given an opportunity to provide a custom commit message when configuring jobs. Not a big issue.
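The reasoning above can be modelled in a few lines of Java. This is a hypothetical sketch, not footer.jelly or the plugin's PageMatcher: each enabled strategy may contribute page patterns, and the commit-message prompt is offered only if one of them matches the current URI. The job-configure regexp is illustrative only (it assumes URIs relative to the Jenkins root, without leading or trailing slashes) and is unrelated to the PageMatcher regexp bug mentioned below; following the assumption above, the manual strategy contributes no pattern.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model: the commit-message prompt appears only when an enabled
// strategy contributes a page pattern that matches the current URI.
class CommitPromptCheck {

    // Illustrative pattern for "job/<name>/configure", optionally under "view/<view>/".
    // Not the plugin's actual PageMatcher regexp.
    static final Pattern JOB_CONFIGURE_PAGE =
            Pattern.compile("(?:view/[^/]+/)?job/[^/]+/configure");

    // matches() requires the whole URI to match the pattern.
    static boolean showsCommitPrompt(List<Pattern> enabledPagePatterns, String uri) {
        return enabledPagePatterns.stream().anyMatch(p -> p.matcher(uri).matches());
    }

    public static void main(String[] args) {
        String uri = "view/All/job/myjob/configure";
        // Job strategy enabled: its page pattern matches, so the prompt is offered.
        System.out.println(showsCommitPrompt(Arrays.asList(JOB_CONFIGURE_PAGE), uri));
        // Only the manual strategy enabled: no page pattern, so no prompt.
        System.out.println(showsCommitPrompt(Collections.<Pattern>emptyList(), uri));
    }
}
```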

            So the tradeoff would be improved performance against less clarity in the SCM history. I think it's an acceptable tradeoff, although ideally I would keep both, of course.

            By the way, I found a bug in the PageMatcher regexp in JobConfigScmSyncStrategy. I'll post a separate JIRA issue for that.

            fcamblor Frédéric Camblor added a comment -

            I agree with your assumptions; you're right.

            jonaskaveby Jonas Lind added a comment -

            I agree with you that JENKINS-19659 is the proper solution to the problem reported in this issue. Thank you for your time analysing this!

            I've written JENKINS-27649 to track the regexp bug.

            rtyler R. Tyler Croy made changes -
            Workflow: JNJira → JNJira + In-Review
            fcamblor Frédéric Camblor made changes -
            Assignee: Frédéric Camblor → Craig Rodrigues
            rodrigc Craig Rodrigues made changes -
            Assignee: Craig Rodrigues → Unassigned

              People

              Assignee: Unassigned
              Reporter: jonaskaveby Jonas Lind
              Votes: 2
              Watchers: 3