Jenkins / JENKINS-52842

xUnit plugin blocks PingThread responses

Details

    • 2.3.8

    Description

      If your xUnit parsing takes a long time (don't ask), the entire build agent can be kicked offline when pings go unanswered past the timeout threshold (default 4 minutes).
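A simplified model of this failure: the controller sends a ping and waits a bounded time for the reply. If the agent is too busy to answer, the wait ends in a `java.util.concurrent.TimeoutException`, which is the exception type in the server log for this issue, and the channel is terminated. This sketch is illustrative only. `PingModel` and `ping` are hypothetical names, and a sleeping task stands in for an agent starved by report parsing. It is not how `hudson.remoting.PingThread` is implemented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PingModel {
    /**
     * Sends a simulated ping and waits at most timeoutMs for the reply.
     * replyDelayMs models how long a busy agent takes to answer.
     * Returns true if the reply arrived in time, false otherwise.
     */
    static boolean ping(long replyDelayMs, long timeoutMs) {
        ExecutorService agent = Executors.newSingleThreadExecutor();
        // The simulated agent answers only after replyDelayMs.
        Future<String> reply = agent.submit(() -> {
            Thread.sleep(replyDelayMs);
            return "pong";
        });
        try {
            reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // A real pinger would treat the channel as dead at this point.
            reply.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            agent.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("responsive agent answered in time: " + ping(10, 1000));
        System.out.println("starved agent answered in time: " + ping(1000, 50));
    }
}
```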

       

       

      Agent:

      12:39:02 [xUnit] [INFO] - [NUnit-2 (default)] - 110 test report file(s) were found with the pattern '*/TestResult.xml' relative to '/home/builder/jenkins/workspace/foo' for the testing framework 'NUnit-2 (default)'.
      12:44:28 [xUnit] [ERROR] - The plugin hasn't been performed correctly: Remote call on JNLP4-connect connection from 1.1.1.1/1.1.1.1:1880 failed

      Server:

      INFO: Ping failed. Terminating the channel JNLP4-connect connection from 1.1.1.1/1.1.1.1:7468.
      java.util.concurrent.TimeoutException: Ping started at 1532979041327 hasn't completed by 1532979281327
      at hudson.remoting.PingThread.ping(PingThread.java:134)
      at hudson.remoting.PingThread.run(PingThread.java:90)
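The two numbers in the `TimeoutException` message are epoch milliseconds, so their difference is the ping timeout that was applied. A quick check confirms it is 240,000 ms, matching the 4-minute default mentioned in the description. The class and method names here are arbitrary.

```java
import java.time.Duration;

public class PingWindow {
    /** Time between ping start and deadline, from the epoch-millisecond values in the log. */
    static Duration window(long startedMs, long deadlineMs) {
        return Duration.ofMillis(deadlineMs - startedMs);
    }

    public static void main(String[] args) {
        // Values copied from "Ping started at 1532979041327 hasn't completed by 1532979281327"
        Duration d = window(1532979041327L, 1532979281327L);
        // prints "240000 ms = 240 s = 4 min"
        System.out.println(d.toMillis() + " ms = " + d.getSeconds() + " s = " + d.toMinutes() + " min");
    }
}
```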

        Activity

          johnlengeling John Lengeling added a comment -

          Will test the plugin version that you provided.

          johnlengeling John Lengeling added a comment -

          Nikolas,

          Running xUnit version 2.3.8-rc831.cbb77af6dfed, I had 7 successful builds before the PingThread killed the node during xUnit processing.

          Console Output:

          [2020-02-10T13:35:50.172Z] WARNING: XUnitBuilder step is deprecated since 2.x, it has been replaced by XUnitPublisher. This builer will be remove in version 3.x
          [2020-02-10T13:35:50.175Z] INFO: Starting to record.
          [2020-02-10T13:35:50.175Z] INFO: Processing GoogleTest-1.8
          [2020-02-10T13:35:50.250Z] INFO: [GoogleTest-1.8] - 959 test report file(s) were found with the pattern 'j/wr/build/_TestArtifacts*/*.xml' relative to '/home/jenkins/workspace/kb/os' for the testing framework 'GoogleTest-1.8'.

          Threads related to this AWS node i-0b11d0f08d8d59c51:

          dev-large (i-0b11d0f08d8d59c51)

          "org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#93] / waiting for EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) id=166000" daemon prio=5 TIMED_WAITING
              java.lang.Object.wait(Native Method)
              hudson.remoting.Request.call(Request.java:177)
              hudson.remoting.Channel.call(Channel.java:954)
              hudson.FilePath.act(FilePath.java:1069)
              hudson.FilePath.act(FilePath.java:1058)
              org.jenkinsci.plugins.xunit.XUnitProcessor.processTestsReport(XUnitProcessor.java:195)
              org.jenkinsci.plugins.xunit.XUnitProcessor.process(XUnitProcessor.java:159)
              org.jenkinsci.plugins.xunit.XUnitBuilder.perform(XUnitBuilder.java:126)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
              org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
              org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$339/847132060.run(Unknown Source)
              java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              java.util.concurrent.FutureTask.run(FutureTask.java:266)
              java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              java.lang.Thread.run(Thread.java:748)

          "Channel reader thread: EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51)" daemon prio=5 WAITING
              java.lang.Object.wait(Native Method)
              java.lang.Object.wait(Object.java:502)
              com.trilead.ssh2.channel.FifoBuffer.read(FifoBuffer.java:212)
              com.trilead.ssh2.channel.Channel$Output.read(Channel.java:127)
              com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:933)
              com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
              com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
              hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
              hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
              hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
              hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
              hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

          "Monitoring EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) for Remoting Version / waiting for EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) id=166965" daemon prio=5 TIMED_WAITING
              java.lang.Object.wait(Native Method)
              hudson.remoting.Request.call(Request.java:177)
              hudson.remoting.Channel.call(Channel.java:954)
              hudson.plugin.versioncolumn.VersionMonitor$1.monitor(VersionMonitor.java:58)
              hudson.plugin.versioncolumn.VersionMonitor$1.monitor(VersionMonitor.java:55)
              hudson.node_monitors.AbstractNodeMonitorDescriptor.monitor(AbstractNodeMonitorDescriptor.java:154)
              hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)

          nfalco Nikolas Falco added a comment -

          I can try increasing the sleep time after every 20 processed input files, and maybe also limiting the maximum number of threads used by Saxon. By default, Saxon uses the maximum number of threads, driving the CPU to full load.
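The throttling idea can be sketched as a loop that pauses for a fixed time after every N processed files. Each pause gives other threads, such as the one answering remoting pings, a chance to run. This is a hypothetical illustration, not the xUnit plugin's code. `ThrottledProcessor`, `processAll`, `batchSize` and `sleepMs` are invented names, and the handler stands in for the per-report XSL conversion. Limiting Saxon's own thread count is not shown.

```java
import java.util.List;
import java.util.function.Consumer;

public class ThrottledProcessor {
    /**
     * Hands each file to the handler and sleeps sleepMs after every batchSize files.
     * A sleepMs of 0 disables throttling. Returns the number of pauses taken;
     * stops early (restoring the interrupt flag) if the thread is interrupted.
     */
    static int processAll(List<String> files, Consumer<String> handler, int batchSize, long sleepMs) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int processed = 0;
        int pauses = 0;
        for (String file : files) {
            handler.accept(file); // stands in for the CPU-heavy conversion of one report
            processed++;
            if (sleepMs > 0 && processed % batchSize == 0) {
                try {
                    Thread.sleep(sleepMs); // give the CPU to other threads for a moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return pauses;
                }
                pauses++;
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.xml", "b.xml", "c.xml", "d.xml", "e.xml");
        int pauses = processAll(files, f -> System.out.println("processing " + f), 2, 5);
        System.out.println("pauses: " + pauses);
    }
}
```

Pausing on a counter keeps the overhead proportional to the number of files, and setting `sleepMs` to 0 reproduces the unthrottled behavior.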

          johnlengeling John Lengeling added a comment -

          Other than the 1 hung job soon after I loaded xunit-2.3.8-rc831.cbb77af6dfed, the issue looks improved.

          The test job has been running hourly for the past 2 days with 0 hangs, versus several hangs per day with 2.3.7, so you might be on the right track.

          I also increased the ping thread interval and timeout values to 1500/1200, but I have now restored them to the defaults (300/240). I will let that bake for a few days.

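The sketch below assumes the interval/timeout pairs above (1500/1200 and the defaults 300/240) were set through the Jenkins core system properties `hudson.slaves.ChannelPinger.pingIntervalSeconds` and `hudson.slaves.ChannelPinger.pingTimeoutSeconds`. Those properties take values in seconds and are set on the controller's JVM, for example with `-D` flags. It reads the properties with the same defaults to show which values are in effect. It is not the ChannelPinger source, so check the property names against the system-properties documentation for your Jenkins version.

```java
public class PingSettings {
    static final String INTERVAL_PROP = "hudson.slaves.ChannelPinger.pingIntervalSeconds";
    static final String TIMEOUT_PROP = "hudson.slaves.ChannelPinger.pingTimeoutSeconds";

    /** Ping interval in seconds, falling back to the default of 300. */
    static int intervalSeconds() {
        return Integer.getInteger(INTERVAL_PROP, 300);
    }

    /** Ping timeout in seconds, falling back to the default of 240. */
    static int timeoutSeconds() {
        return Integer.getInteger(TIMEOUT_PROP, 240);
    }

    public static void main(String[] args) {
        // Run with e.g. -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=1500
        //               -Dhudson.slaves.ChannelPinger.pingTimeoutSeconds=1200
        System.out.println("interval=" + intervalSeconds() + "s, timeout=" + timeoutSeconds() + "s");
    }
}
```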
          nfalco Nikolas Falco added a comment -

          I have completed the work. The sleep time is now configurable, so further changes to the code should not be needed.

          For backward compatibility, the sleep time is 0 ms for old configurations (this does not apply to pipelines). The default for new definitions is *10* ms.

          johnlengeling, your output log warns that you are using a deprecated class, which your pipeline instantiates by reflection (steps.step([$class : 'XUnitBuilder', testTimeMargin: '3000', ....), and that approach is subject to any class change. I strongly suggest replacing it with the xunit step (which is not XUnitBuilder), which supports configuring the sleep parameter (XUnitBuilder does not).

          If I remember correctly, you tested 20 ms every 20 processed reports. The default is now 10 ms every 10 processed reports.

          xunit(thresholdMode: 2,
                thresholds: [
                    failed(failureThreshold: '100', unstableThreshold: '100'),
                    skipped(failureThreshold: '100', unstableThreshold: '100')
                ],
                tools: [
                    GoogleTest(deleteOutputFiles: true, failIfNotNew: false, pattern: $filePattern, skipNoTestFiles: true, stopProcessingIfError: true)
                ],
                sleepTime: 15)
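For a sense of scale, this sketch estimates the extra build time the pauses add for the 959 GoogleTest reports in the console output earlier in this issue. It assumes one pause after each full batch of reports. `ThrottleOverhead` and `totalPauseMs` are hypothetical names. Reading `sleepTime: 15` as 15 ms per batch of 10 reports is also an assumption, based on the 10 ms / 10 reports default described in the comment.

```java
public class ThrottleOverhead {
    /** Total pause time in ms when sleeping sleepMs after every full batch of batchSize files. */
    static long totalPauseMs(int files, int batchSize, long sleepMs) {
        return (long) (files / batchSize) * sleepMs;
    }

    public static void main(String[] args) {
        int reports = 959; // from the GoogleTest-1.8 console output in this issue
        System.out.println("10 ms every 10 reports: " + totalPauseMs(reports, 10, 10) + " ms"); // 950 ms
        System.out.println("20 ms every 20 reports: " + totalPauseMs(reports, 20, 20) + " ms"); // 940 ms
        // Assumed interpretation of sleepTime: 15 (15 ms per batch of 10)
        System.out.println("15 ms every 10 reports: " + totalPauseMs(reports, 10, 15) + " ms"); // 1425 ms
    }
}
```

Under these assumptions, the throttling adds on the order of one second to the build.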
          

          People

            nfalco Nikolas Falco
            directhex Jo Shields
            Votes: 1
            Watchers: 5
