JENKINS-27591

Improve performance of initial scan by narrowing the job config pattern

    Details

      Description

      Our Jenkins jobs archive quite a lot of artifacts, and we've been running our jobs for a couple of years. As a result, under our jobs directory there are tens of millions of files and directories.

      When the SCM Sync Configuration Plugin is activated on such a large directory, it scans for files that match the pattern **/jobs/*/config.xml under the root directory Hudson.getInstance().getRootDir(). The leading ** segment makes the scan recursive, so every directory under the root has to be traversed.

      In our Jenkins installation, the SCM Sync Configuration Plugin never completes the scan because there is simply too much data to traverse. I aborted it after a couple of hours.

      The jobs directory is hardcoded in Jenkins core as the jobs subdirectory directly under Jenkins.getInstance().getRootDir(). See jenkins.model.Jenkins.getRootDirFor(String name) for an example. So, as far as I can see, the initial ** segment in the pattern serves no purpose.

      I suggest narrowing the pattern to improve the performance of the initial scan. Changing the PATTERNS constant in hudson.plugins.scm_sync_configuration.strategies.impl.JobConfigScmSyncStrategy to jobs/*/config.xml makes the scan non-recursive. I've tested this change on our Jenkins installation, and the scan now finishes within seconds rather than hours.
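
      To illustrate the difference between the two patterns, here is a minimal standalone sketch. It is not the plugin's code: the class JobConfigScan and both method names are made up for this example, and it uses plain java.nio instead of the pattern matching the plugin relies on. scanRecursive visits every file under the root, which is what a leading ** forces, while scanJobsDir only lists the direct children of jobs and checks each one for a config.xml.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names exist in the plugin.
class JobConfigScan {

    /** Walks the whole tree under root, as a leading "**" segment requires. */
    static List<Path> scanRecursive(Path root) throws IOException {
        // A java.nio glob "**/" needs at least one parent directory, so the
        // top-level jobs directory is covered by a second matcher.
        PathMatcher nested = FileSystems.getDefault().getPathMatcher("glob:**/jobs/*/config.xml");
        PathMatcher topLevel = FileSystems.getDefault().getPathMatcher("glob:jobs/*/config.xml");
        try (Stream<Path> all = Files.walk(root)) {
            return all.map(root::relativize)
                      .filter(p -> nested.matches(p) || topLevel.matches(p))
                      .sorted()
                      .collect(Collectors.toList());
        }
    }

    /** Lists only the direct children of root/jobs; no recursion into builds or artifacts. */
    static List<Path> scanJobsDir(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        Path jobs = root.resolve("jobs");
        if (!Files.isDirectory(jobs)) {
            return found;
        }
        try (DirectoryStream<Path> jobDirs = Files.newDirectoryStream(jobs)) {
            for (Path jobDir : jobDirs) {
                Path config = jobDir.resolve("config.xml");
                if (Files.isRegularFile(config)) {
                    found.add(root.relativize(config));
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in for a Jenkins home with one archived artifact.
        Path root = Files.createTempDirectory("jenkins-home");
        Files.createDirectories(root.resolve("jobs/alpha/builds/1/archive"));
        Files.createFile(root.resolve("jobs/alpha/config.xml"));
        Files.createFile(root.resolve("jobs/alpha/builds/1/archive/out.txt"));

        System.out.println("recursive: " + scanRecursive(root));
        System.out.println("narrowed:  " + scanJobsDir(root));
    }
}
```

      On a layout where jobs only live directly under the root, both methods return the same files. The cost of the narrowed scan grows with the number of job directories, while the recursive one grows with the total number of files and directories, including every archived artifact.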

            Activity

            jonaskaveby Jonas Lind created issue -
            fcamblor Frédéric Camblor made changes -
            Field Original Value New Value
            Priority Blocker [ 1 ] Major [ 3 ]
            jonaskaveby Jonas Lind made changes -
            Description [ code identifiers and patterns wrapped in {{...}} markup; wording unchanged ]
            fcamblor Frédéric Camblor made changes -
            Link This issue is related to JENKINS-19659 [ JENKINS-19659 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 161803 ] JNJira + In-Review [ 180844 ]
            fcamblor Frédéric Camblor made changes -
            Assignee Frédéric Camblor [ fcamblor ] Craig Rodrigues [ rodrigc ]
            rodrigc Craig Rodrigues made changes -
            Assignee Craig Rodrigues [ rodrigc ]

              People

              Assignee:
              Unassigned
              Reporter:
              jonaskaveby Jonas Lind
              Votes:
              2
              Watchers:
              3