[JENKINS-54999] Performance issue due to the bundle anonymization feature of the support-core plugin

      CPU usage spikes when content filtering is enabled:

      <com.cloudbees.jenkins.support.filter.ContentFilters plugin="support-core@2.50">
        <enabled>true</enabled>
      </com.cloudbees.jenkins.support.filter.ContentFilters>
      

      The logs show stack traces related to the anonymization:

          at java.util.regex.Pattern$Start.match(Pattern.java:3463)
          at java.util.regex.Matcher.search(Matcher.java:1248)
          at java.util.regex.Matcher.find(Matcher.java:637)
          at java.util.regex.Matcher.replaceAll(Matcher.java:951)
          at com.cloudbees.jenkins.support.filter.ContentMapping.filter(ContentMapping.java:96)
          at com.cloudbees.jenkins.support.filter.SensitiveContentFilter.filter(SensitiveContentFilter.java:56)
          at com.cloudbees.jenkins.support.filter.AllContentFilters.filter(AllContentFilters.java:43)
          at com.cloudbees.jenkins.support.filter.FilteredOutputStream.filterFlushLines(FilteredOutputStream.java:185)
          at com.cloudbees.jenkins.support.filter.FilteredOutputStream.write(FilteredOutputStream.java:125)
      

      Workaround
      Disable bundle anonymization in the global settings.
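      Assuming the XML persisted form shown earlier, the disabled state would presumably look like this (same element, with enabled set to false):

```xml
<com.cloudbees.jenkins.support.filter.ContentFilters plugin="support-core@2.50">
  <enabled>false</enabled>
</com.cloudbees.jenkins.support.filter.ContentFilters>
```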


          Francisco Fernández added a comment:

          After analysing the issue, I have seen that CPU usage is high even in support-core 2.47, the version prior to anonymization, since the plugin is already moving huge files around.
          Anonymization implies processing those huge files, such as logs; the difference is the extra time needed to anonymize them. That filtering is performed line by line, applying every filter to each line of each log and file, so anonymization takes a long time, and such a long-running, heavy process makes instances unresponsive. Where can we intervene to improve performance?

          1. InetAddressContentFilter searches for any String that could be an IP address and creates a new mapping object. However, even when the mapping already exists, the mapping file is saved every time by the ContentMappings class, not only when the mapping is new. That results in many unnecessary disk writes.
          2. Both InetAddressContentFilter and SensitiveContentFilter use ContentMapping#filter, which invokes Pattern#matcher and so compiles the pattern every time the filter is executed. This is a heavy operation, and since the filter runs line by line, the process takes a long time. Because the mapping already holds the replacement String, the Pattern objects can be replaced with StringUtils invocations.
          3. As the logs show, the filtering happens inside the OutputStream objects (FilteredOutputStream and FilteredWriter) that write the file contents into the final zip. It is cheaper, in processing time, to filter the content before writing it to the zip: the more streams involved, the worse the performance.
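Point 2 can be illustrated with a minimal sketch. Class and method names here are illustrative, not the actual support-core code; the fast variant relies on the mapping already holding the literal replacement string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of the per-line filtering hot path described above.
// Names are illustrative; the real logic lives in ContentMapping#filter.
public class FilterSketch {

    // Slow variant: a fresh Pattern is compiled for every line filtered,
    // which is the per-line overhead the issue describes.
    static String filterSlow(String line, String original, String replacement) {
        return Pattern.compile("\\b" + Pattern.quote(original) + "\\b")
                .matcher(line)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Faster variant: since the mapping already stores the replacement
    // String, a literal replace avoids regex compilation entirely.
    // (The comment mentions StringUtils; java.lang.String#replace is used
    // here as a dependency-free stand-in. Note it does not honour word
    // boundaries the way the regex variant does.)
    static String filterFast(String line, String original, String replacement) {
        return line.replace(original, replacement);
    }

    public static void main(String[] args) {
        String line = "Connection from jenkins-master refused";
        System.out.println(filterSlow(line, "jenkins-master", "host_1"));
        System.out.println(filterFast(line, "jenkins-master", "host_1"));
    }
}
```

A middle ground would be to compile the Pattern once per mapping and reuse it, which keeps the word-boundary semantics while still avoiding per-line compilation.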


          Francisco Fernández added a comment:

          Comparative numbers (checks using two 85 MB and one 105 MB log files):

          • Version 2.47: 10 sec.
          • master branch with anonymization disabled: 90 sec.
          • master branch with anonymization enabled: 638 sec.
          • PR#158 with anonymization disabled: 90 sec.
          • PR#158 with anonymization enabled: 169 sec.
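A quick derivation from these figures (simple arithmetic, not part of the issue): relative to the 90-second anonymization-disabled baseline, the filtering overhead drops from roughly 7x on master to under 2x with PR#158.

```java
// Overhead ratios derived from the timings quoted above,
// using the 90 s anonymization-disabled run as the baseline.
public class Overhead {
    public static void main(String[] args) {
        double baseline = 90.0;          // anonymization disabled
        double masterEnabled = 638.0;    // master, anonymization enabled
        double prEnabled = 169.0;        // PR#158, anonymization enabled
        System.out.printf("master overhead: %.1fx%n", masterEnabled / baseline);
        System.out.printf("PR#158 overhead: %.1fx%n", prEnabled / baseline);
    }
}
```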


            fcojfernandez Francisco Fernández
            Votes: 0
            Watchers: 1
