After upgrading the "build-timeout" plugin from 1.12.2 to 1.13, all builds took about 50% longer to complete. This resulted in an unacceptably long build queue and in builds timing out. The problem continued for 6 or 7 hours until an emergency downgrade of "build-timeout" from 1.13 back to 1.12.2, which cleared it immediately. No other plugins were upgraded or downgraded during this time, nor were any other system-wide configuration changes made.

      I suspect the use of "synchronized" in the source code change made writing to the console effectively single-threaded for all running builds. (My Jenkins instance has 125 slave nodes, so I have several dozen concurrent builds running all the time.)

      Priority: I've made this "Major" because merely installing this plugin version makes my Jenkins instance unusable: builds run slower, which causes a growing build queue and timed-out builds.

      Environment: I'm running Jenkins core LTS version 1.532.2. I'll be glad to furnish more information as you request it.

          [JENKINS-23012] Build-timeout plugin causes builds to slow

          Darrel Vuncannon created issue -

          ikedam added a comment -

          Those synchronized methods are called at the start and the end of builds, so I don't think they cause the slowdown.

          Rather, the changes that watch log output may be causing the problem.
          https://github.com/jenkinsci/build-timeout-plugin/commit/2129c5d8fc4a9d9432cf95ce34ce522e646eb3ff#diff-891dfa43e0d85dea7162d46b430299d7R184

          I want to make some testing versions to identify the cause,
          but I'm not sure how to reproduce the problem.

          Can you try testing versions if I provide them?

          ikedam made changes -
          Assignee Original: Kohsuke Kawaguchi [ kohsuke ] New: ikedam [ ikedam ]

          Darrel Vuncannon added a comment -

          No, I cannot subject my employer's production Jenkins to this problem again because that would impact hundreds of developers. That's unfortunate, because I know you need a place to investigate this problem.

          Can you think of another way to proceed? If you think you know the problem and can generate a version that you think is 70% likely to fix this problem, then I will risk my Jenkins instance on trying it.

          ikedam added a comment -

          I agree that it's risky to install testing versions in the production environment. I'm not sure yet what the root cause is or how to fix it.
          I'll try to reproduce it in my local environment.

          Please let me know the following:

          • OS of the master node and slave nodes.
          • Outline of your build process. For example, running Maven, running gcc, or running other native processes.
            • I think whether builds are performed in Java or in native processes can affect this problem.
          • How much log output is there? I think the size of the whole log and the time the builds take will be helpful.
            • I think a large amount of log output may trigger the problem.
          • Can you see which process gets slow in builds?
            • If the build processes output timestamps, please compare them before and after downgrading the build-timeout plugin.
            • If you have the Timestamper plugin installed, please compare the timestamps in the console output.
              • Timestamps logged by timestamper-plugin may differ from the actual activity of the build process, as the output can be buffered and delayed.
            • If you don't have timestamper-plugin installed, you'd better not install it, as that plugin also captures log output and may cause the same problem.
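The timestamp comparison requested above can be sketched roughly as follows. This is a minimal sketch, assuming the timestamps appear as `HH:MM:SS` values on the same day; the function names `hms_to_seconds` and `elapsed_seconds` are hypothetical helpers, not part of Jenkins or any plugin.

```shell
#!/bin/bash
# Hypothetical helpers: convert an HH:MM:SS timestamp to seconds since
# midnight, and compute the elapsed seconds between two such timestamps.
# Assumption: both timestamps fall on the same day (no midnight wrap).

hms_to_seconds() {
  # Split the argument on ':' into hours, minutes and seconds.
  IFS=: read -r h m s <<EOF
$1
EOF
  # Strip one leading zero so values such as "08" are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

elapsed_seconds() {
  # $1 = earlier timestamp, $2 = later timestamp
  echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}
```

Running `elapsed_seconds` on the first and last timestamp of the same build step, once with each plugin version, gives two durations that can be compared directly.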

          I think there are the following possible causes of slow builds:

          • Native processes launched in builds get slow.
            • For example, processes launched with "Execute shell".
            • I don't think Jenkins can affect native processes directly, as they should be completely separated by the OS.
            • But slowed log consumption can fill the output buffers of those processes and may cause them to block for a while.
          • Jenkins takes a long time to proceed through build steps.
            • In this case, native processes don't get slow.
          • Jenkins takes a long time to start and stop builds.
            • This can be caused by synchronized.
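The first cause in the list above (a slow log reader making a native process wait) can be illustrated outside Jenkins with an ordinary pipe. This is only a sketch under stated assumptions: the function name `producer_seconds_with_slow_reader` is hypothetical, `head` stands in for a build process, and `sleep` followed by `cat` stands in for a slow log consumer. It does not model what the plugin actually does.

```shell
#!/bin/bash
# Hypothetical helper (not part of Jenkins or the plugin): prints how many
# whole seconds a producer needs to write BYTES bytes into a pipe whose
# reader stays idle for DELAY seconds before draining it.
producer_seconds_with_slow_reader() {
  bytes=$1
  delay=$2
  result_file=$(mktemp)
  start=$(date +%s)
  {
    # Producer: stands in for a build process printing its log.
    head -c "$bytes" /dev/zero
    # Recorded only after the pipe has accepted the last byte.
    echo $(( $(date +%s) - start )) > "$result_file"
  } | {
    # Consumer: stands in for a slow log reader.
    sleep "$delay"
    cat > /dev/null
  }
  cat "$result_file"
  rm -f "$result_file"
}
```

For example, `producer_seconds_with_slow_reader 1000000 2` writes more data than a typical pipe buffer holds, so the producer cannot finish until the reader starts draining the pipe after two seconds.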


          Darrel Vuncannon added a comment -

          I've put my reply in-line with a copy of your questions. My text is dark red to make it easier to follow. Again, thank you for your attention to this.

          • OS of the master node and slave nodes.

            - Master and slaves are Windows Server 2008 R2 Standard.
          • Outline of your build process. For example, running maven, running gcc, or running other native process.
            • I think whether builds are performed in Java or native processes can affect this problem.

              - There are 561 defined projects that have built in the past two weeks, so I'll have to generalize.
              - Our projects are 80 to 90% "free-style software projects", the rest being "multi-configuration projects".
              - We use Git and Gerrit for source code management; however, some projects select "none" and use multiple git repos.
              - Our build steps are normally "execute shell" or "execute windows batch command".
              - Overwhelmingly we use WAF for building C/C++ source files; that's both with licensed compilers and "free" compilers.
              - For authorization, we use "project-based Matrix Authorization Strategy" with 40 defined users plus anonymous. Some projects also enable project-based security.
          • How much log outputs? I think the size of whole log and the time builds take will be helpful.
            • I think much log outputs may trigger the problem.

              - Quick survey says... 20,000 to 40,000 lines of text for our most popular projects.
              - These builds average 25 to 50 minutes; strangely, the quicker projects tend to generate more logs.
          • Can you see what process gets slow in builds?
            • If building processes outputs timestamps, please compare them before and after downgrading build-timeout plugin.
            • If you installed Timestamper plugin, please compare timestamps in console outputs.
              • Timestamps logged with timestamper-plugin may differ from the activity of the building process as they can be buffered and delayed.
            • If you don't have timestamper-plugin installed, you'd better not install that as that plugin also captures log output and may cause the same problem.

              - Plugin timestamper is installed, but only some projects use it. However, the builds in question are no longer saved because they're so long ago, therefore I cannot examine them with the timestamps.
              - I do have some data captured in a database about those builds. That's data like job-name, build-number, result, build-duration, and interestingly, excerpts from the logs for failed/aborted builds.

          ikedam added a comment -

          Thanks for the information.

          I tried to reproduce the problem using native processes, and I think I succeeded.
          I'll continue the investigation.

          How I reproduced it:

          • Installed build-timeout-plugin 1.13.
          • Created a free-style project with an "Execute shell" build step:
            #!/bin/bash
            
            for i in $(seq 300); do
              for j in $(seq 65535); do
                echo ${i} ${j}
              done
            done
            
            • I tested this on Windows 8, using 64-bit Cygwin.
          • Ran a build with and without "Abort the build if it's stuck".
            • Used the "Absolute" timeout strategy with 30 minutes.
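For scale, the loop above emits 300 × 65535 = 19,660,500 lines per build. A scaled-down variant can be handy for checking the output volume before running the full job. The sketch below assumes nothing beyond bash and `seq`; the function name `generate_log` and its two parameters are hypothetical and not part of the original reproduction.

```shell
#!/bin/bash
# Hypothetical scaled-down variant of the reproduction loop:
# prints OUTER * INNER lines of the form "<i> <j>".
generate_log() {
  outer=$1
  inner=$2
  for i in $(seq "$outer"); do
    for j in $(seq "$inner"); do
      echo "${i} ${j}"
    done
  done
}

# Example: count the lines produced by a small run.
generate_log 3 4 | wc -l
```

Calling `generate_log 300 65535` reproduces the original output volume, so the same function could be used in the "Execute shell" step with smaller arguments for quicker experiments.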

          Result:

          Condition             | Duration
          Without build-timeout | 10 minutes
          With build-timeout    | 12 minutes

          I also found that the duration grows to 32 minutes if I enable timestamper-plugin... Amazing.
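To put these durations on the same footing as the roughly 50% slowdown in the original report, the relative slowdown can be computed as (measured - baseline) / baseline. A minimal sketch, using a hypothetical helper named `slowdown_percent` and integer arithmetic (the result is truncated toward zero):

```shell
#!/bin/bash
# Hypothetical helper: percentage by which MEASURED exceeds BASELINE.
# Both arguments are whole minutes; integer division truncates the result.
slowdown_percent() {
  baseline=$1
  measured=$2
  echo $(( (measured - baseline) * 100 / baseline ))
}

slowdown_percent 10 12   # build-timeout enabled: prints 20
slowdown_percent 10 32   # timestamper-plugin enabled: prints 220
```

So this local reproduction shows a 20% slowdown with build-timeout alone, and 220% with timestamper-plugin enabled.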

          • I do have some data captured in a database about those builds. That's data like job-name, build-number, result, build-duration, and interestingly, excerpts from the logs for failed/aborted builds.

          If that data includes the amount of log output, I'd like to know whether the amount of log output affects how much the builds slow down.


          ikedam added a comment -

          I identified the cause: it is the watching of log output.

          Condition                                 | Duration
          Without build-timeout                     | 10 minutes
          With build-timeout                        | 12 minutes
          With build-timeout, log watching disabled | 10 minutes


          ikedam added a comment -

          https://github.com/jenkinsci/build-timeout-plugin/pull/26

          Darrel Vuncannon added a comment -

          Wow, great job reproducing the problem!

          Questions:

          • Do you have any special advice I should pass on to my Jenkins users regarding their behavior to avoid slow builds?
          • What's the next step in getting the fix?

          And thank you so much for your work on this!
