TAP Plugin Uses High CPU/IO When Projects Have Lots of Tests

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • tap-plugin
    • None
    • Jenkins 1.536-1.1
      TAP Plugin v 1.15
      Redhat EL 6.3
      Java 1.6.0 / Java 1.7.0_07 (tried both)

      We have several projects which run TAP tests. One such project is getting hundreds of new tests each week, and we're finding Jenkins is getting slower and slower each time we add some new tests.

      It seems that the TAP plugin reads a lot of files and their contents every time you click around inside a project (it may be due to the trend chart, or to report "Latest test result" on the project home page - or both, or something else entirely). I did an 'strace' of the Java process, and found lots of this sort of thing:

      [pid 9323] read(155, " <org.tap4j.model.TestResult re"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15171584
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15171584, SEEK_SET) = 15171584
      [pid 9323] read(155, "<org.tap4j.model.TestResult refe"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15179776
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15179776, SEEK_SET) = 15179776
      [pid 9323] read(155, "ce=\"../../tapLines/org.tap4j.mod"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15187968
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15187968, SEEK_SET) = 15187968
      [pid 9323] read(155, ".tap4j.model.Comment[2]\"/>\n "..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15196160
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15196160, SEEK_SET) = 15196160
      [pid 9323] read(155, "mment reference=\"../../tapLines/"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15204352
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15204352, SEEK_SET) = 15204352
      [pid 9323] read(155, "stResult[106]/comments/org.tap4j"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15212544
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15212544, SEEK_SET) = 15212544
      [pid 9323] read(155, "Result[136]/comments/org.tap4j.m"..., 8192) = 8192
      [pid 9323] fstat(155, {st_mode=S_IFREG|0644, st_size=17020120, ...}) = 0
      [pid 9323] lseek(155, 0, SEEK_CUR) = 15220736
      [pid 9323] lseek(155, 0, SEEK_END) = 17020120
      [pid 9323] lseek(155, 15220736, SEEK_SET) = 15220736

      (occasionally, I see a bit of text that looks like the TAP test text in there too).
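
For anyone wanting to quantify a trace like the one above, the following is a minimal sketch that totals the bytes returned by read() per file descriptor. It assumes the strace output has been saved to a text file in the format shown; the function name bytes_read_per_fd is made up for illustration. On Linux, the descriptor number can then be mapped back to a file by inspecting /proc/<pid>/fd.

```python
import re
from collections import Counter

# Matches lines such as: [pid 9323] read(155, "..."..., 8192) = 8192
# The \b keeps pread()/readv()-style calls out of the tally, and failed
# reads ("= -1 EAGAIN ...") are skipped because the result must be digits only.
READ_RE = re.compile(r'\bread\((\d+),.*\)\s*=\s*(\d+)\s*$')


def bytes_read_per_fd(lines):
    """Sum the byte counts returned by read() for each file descriptor."""
    totals = Counter()
    for line in lines:
        match = READ_RE.search(line)
        if match:
            fd = int(match.group(1))
            nbytes = int(match.group(2))
            totals[fd] += nbytes
    return totals
```

Calling bytes_read_per_fd(open("strace.log")) (file name assumed) and printing totals.most_common() lists the busiest descriptors first, which makes it easier to see whether a single large file dominates the I/O.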

      The good news is that when Jenkins is working on a slow request, it doesn't seem to affect the rest of the server - clicking about in other projects seems to work fine (maybe a bit slower than usual, but acceptable).

      We now have >10000 tests in one particular project, and Jenkins pretty much never responds to clicks. We've had to remove blocks of tests that we think work so that Jenkins runs properly again. Disabling the TAP plugin also works, although it means we can't see our test results.

      What can we do about this? What can I/we do to help find a solution?


          Ralph Bolton added a comment - - edited

          On further investigation, it seems it's slow to read all the previous build.xml files (e.g. /var/lib/jenkins/jobs/projectname/builds/2013-10-21_16-47-52/build.xml). Our workaround for now is to have either fewer tests or less build history.
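
A quick way to check whether build.xml size is the culprit is to list the files by size. This is only a sketch: it assumes the jobs/<project>/builds/<build>/build.xml layout from the path above, and build_xml_sizes is a hypothetical helper name.

```python
from pathlib import Path


def build_xml_sizes(job_dir):
    """Return (size_in_bytes, path) for each builds/*/build.xml, largest first."""
    builds = Path(job_dir) / "builds"
    sizes = [(p.stat().st_size, p) for p in builds.glob("*/build.xml")]
    # Sort on the size only, so ties never fall back to comparing paths.
    return sorted(sizes, key=lambda item: item[0], reverse=True)
```

For example, summing the sizes returned by build_xml_sizes("/var/lib/jenkins/jobs/projectname") gives a rough idea of how much XML has to be read when every retained build is loaded.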


          Ken Raffenetti added a comment -

          I use the tap2junit converter and the default test reporter for now.

          http://search.cpan.org/~gtermars/TAP-Formatter-JUnit/bin/tap2junit

          I also have too many tests and too much history for the TAP plugin to be useful.

          Ralph Bolton added a comment -

          If JUnit is the only way to go, then maybe we'll go ahead and generate JUnit results instead of TAP (although the converter may find its way into our Perl unit testing). It would be a shame though, as TAP is all we really need, and it is very console-friendly. For now, we've restricted the problematic (massive) project to keeping only 5 previous builds. It seems to behave itself much better now, but it's not really a great solution.


          Ralph Bolton added a comment -

          Sadly, the pain level got too high and so I switched one of our testing systems to use JUnit instead of TAP (this system has a couple of dozen projects, each with say a dozen tests in them, but a couple of the projects have thousands of tests, and that's where the TAP plugin didn't work so well).

          In retrospect, TAP is actually a much better format, although JUnit integrates with Jenkins a little better. It's not a whole lot of work to switch back, so if the TAP plugin gets fixed we could go back to it.


          David Whitley added a comment -

          The problem seems to be that the TAP plugin saves its test results in the main build.xml file. One of our core modules has over 60,000 tests, which leads to build.xml reaching close to 100 MB in size. When the Jenkins UI populates the build history display on the left of the project page, it parses build.xml to extract the <result> tag to flag the build as success, failed, unstable, etc. However, it must parse the entire XML to extract this tag, so with a 20-build history it parses 2 GB of XML, which causes the UI to grind to a halt for minutes. A more elegant solution would be to structure the TAP plugin like the warnings plugin, where the number of test passes / failures / skips is stored in build.xml (allowing quick rendering of the trend plots etc.), with the actual test results / output stored in a separate XML file that only needs to be parsed when the TAP results page is loaded.

          Does this sound sensible / achievable?
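
To make the proposal concrete, here is a rough sketch of the summary/details split. It is not how the TAP plugin or the warnings plugin is implemented: the file names, the status strings, and the use of JSON instead of XML are all assumptions made to keep the example short. The point is only that a history or trend view needs the three counters, so it never has to open the large per-test file.

```python
import json
from pathlib import Path

# Hypothetical file names, chosen for this illustration only.
SUMMARY_FILE = "tap-summary.json"
DETAILS_FILE = "tap-details.json"

# Assumed status strings for each test result.
KEY_FOR_STATUS = {"ok": "passed", "not ok": "failed", "skip": "skipped"}


def save_split(build_dir, results):
    """Write a small summary file and a separate, larger details file.

    `results` is a list of dicts such as {"name": "t1", "status": "ok"}.
    """
    build_dir = Path(build_dir)
    summary = {"passed": 0, "failed": 0, "skipped": 0}
    for result in results:
        summary[KEY_FOR_STATUS[result["status"]]] += 1
    (build_dir / SUMMARY_FILE).write_text(json.dumps(summary))
    (build_dir / DETAILS_FILE).write_text(json.dumps(results))
    return summary


def load_summary(build_dir):
    """Read only the small summary; the details file is never opened."""
    return json.loads((Path(build_dir) / SUMMARY_FILE).read_text())
```

With this layout, rendering 20 builds of history would read 20 small summaries rather than 20 files of close to 100 MB each; the details file would be loaded only when a specific results page is opened.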


            kinow Bruno P. Kinoshita
            coofercat Ralph Bolton
            Votes: 5
            Watchers: 6