JENKINS-52842

xUnit plugin blocks PingThread responses


Details

    • 2.3.8

    Description

      If your xUnit parsing takes a long time (don't ask), the entire build agent can be kicked offline because the ping timeout threshold (default 4 minutes) is exceeded.

       

       

      Agent:

      12:39:02 [xUnit] [INFO] - [NUnit-2 (default)] - 110 test report file(s) were found with the pattern '**/TestResult*.xml' relative to '/home/builder/jenkins/workspace/foo' for the testing framework 'NUnit-2 (default)'.
      12:44:28 [xUnit] [ERROR] - The plugin hasn't been performed correctly: Remote call on JNLP4-connect connection from 1.1.1.1/1.1.1.1:1880 failed

      Server:

      INFO: Ping failed. Terminating the channel JNLP4-connect connection from 1.1.1.1/1.1.1.1:7468.
      java.util.concurrent.TimeoutException: Ping started at 1532979041327 hasn't completed by 1532979281327
      at hudson.remoting.PingThread.ping(PingThread.java:134)
      at hudson.remoting.PingThread.run(PingThread.java:90)
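
The two numbers in the server-side exception are epoch timestamps in milliseconds, and their difference matches the 4-minute default mentioned in the description. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Jenkins or the remoting library.

```java
import java.time.Duration;

// Illustrative helper, not part of Jenkins or hudson.remoting.
class PingWindow {
    // Returns how long the ping was outstanding, given the two
    // epoch-millisecond values printed in the TimeoutException message.
    static Duration elapsed(long startedAtMillis, long deadlineMillis) {
        return Duration.ofMillis(deadlineMillis - startedAtMillis);
    }

    public static void main(String[] args) {
        // Values copied from the server log above.
        Duration window = elapsed(1532979041327L, 1532979281327L);
        System.out.println(window.toMillis() + " ms = " + window.toMinutes() + " minutes");
    }
}
```

For the log above this gives 240000 ms, i.e. 4 minutes.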


        Activity

          directhex Jo Shields created issue -
          nfalco Nikolas Falco added a comment -

          Please keep in mind that transformation is a CPU-intensive operation. With complex XSLT, large XML reports, and heavy volume, 100% CPU usage can be expected; this may cause the agent's JVM threads to freeze and fail to serve the PingThread.

          Let me try a few things: a thread pool, a sleep after every 10 transformations, updating the Saxon libraries...
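
The "sleep after every N transformations" idea can be sketched roughly as below. This is a hypothetical illustration only, not the xUnit plugin's actual code: `ThrottledBatch`, `processAll`, and the `work` callback are made-up names, and the callback stands in for whatever performs one transformation.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not the xUnit plugin's implementation.
class ThrottledBatch {
    // Applies `work` to every item, sleeping for `pauseMillis` after each
    // full batch of `batchSize` items so that other threads get CPU time.
    // Returns the number of pauses taken. If the thread is interrupted
    // while sleeping, the interrupt flag is restored and processing stops.
    static <T> int processAll(List<T> items, int batchSize, long pauseMillis,
                              Consumer<? super T> work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pauses = 0;
        int processed = 0;
        for (T item : items) {
            work.accept(item);
            processed++;
            // Pause only between batches, not after the final item.
            if (processed % batchSize == 0 && processed < items.size()) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return pauses;
                }
                pauses++;
            }
        }
        return pauses;
    }
}
```

With 25 report files and a batch size of 10, this pauses twice (after the 10th and the 20th file). Whether such pauses are enough to let a ping reply through depends on where the agent is actually blocked, which this thread has not established.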

          nfalco Nikolas Falco added a comment -

          Reopen if this happens again.

          nfalco Nikolas Falco made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]

          johnlengeling John Lengeling added a comment -

          Nikolas,

          It looks like I have run into this issue. We generate a lot of XML files, sometimes 800+. It looks like the ping thread killed it.

          We are running Jenkins 2.204.1 and xunit 2.37. Is there a workaround or a snapshot to test?

          Thanks!

          Error message:

          Processing xunit results failed, archiving test result files foo/wr8-64/build/_TestArtifacts*/*.xml for troubleshooting
          [Pipeline] archiveArtifacts
          10:07:27 EC2 (foo-es-aws) - team-foo.fooTool-large (i-06488c01860a95d82) was marked offline: Connection was broken: java.util.concurrent.TimeoutException: Ping started at 1580227324607 hasn't completed by 1580227564613
          10:07:27 at hudson.remoting.PingThread.ping(PingThread.java:133)
          10:07:27 at hudson.remoting.PingThread.run(PingThread.java:89)
          10:07:27
          [Pipeline] }
          [Pipeline] // script
          Error when executing always post condition:
          java.lang.IllegalArgumentException: Failed to prepare archiveArtifacts step
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeDescribable(DSL.java:419)
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:182)
              at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
              at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
              at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
              at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
              at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
              at com.fooCorp.pipeline.fooTool.publishUnitTests(fooTool.groovy:597)
              at com.fooCorp.pipeline.fooTool.publishfooToolReports(fooTool.groovy:666)
              at com.fooCorp.pipeline.fooTool.publish(fooTool.groovy:610)
              at WorkflowScript.run(WorkflowScript:61)
              at __cps.transform__(Native Method)
              at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
              at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
              at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
              at sun.reflect.GeneratedMethodAccessor960.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
              at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
              at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
              at sun.reflect.GeneratedMethodAccessor988.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
              at com.cloudbees.groovy.cps.impl.LocalVariableBlock$LocalVariable.get(LocalVariableBlock.java:39)
              at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
              at com.cloudbees.groovy.cps.impl.LocalVariableBlock.evalLValue(LocalVariableBlock.java:28)
              at com.cloudbees.groovy.cps.LValueBlock$BlockImpl.eval(LValueBlock.java:55)
              at com.cloudbees.groovy.cps.LValueBlock.eval(LValueBlock.java:16)
              at com.cloudbees.groovy.cps.Next.step(Next.java:83)
              at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
              at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
              at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
              at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
              at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
              at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:405)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:317)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:281)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
          Caused by: org.codehaus.groovy.runtime.InvokerInvocationException: java.io.IOException: Unable to create live FilePath for EC2 (foo-es-aws) - team-foo.fooTool-large (i-06488c01860a95d82)
              at org.jenkinsci.plugins.workflow.cps.CpsStepContext.replay(CpsStepContext.java:496)
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:317)
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeDescribable(DSL.java:417)
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:182)
              at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
              at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
              at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
              at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
              at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
              ... 41 more
          Caused by: java.io.IOException: Unable to create live FilePath for EC2 (foo-es-aws) - team-foo.fooTool-large (i-06488c01860a95d82)
              at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:64)
              at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:47)
              at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:94)
              at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:138)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:135)
              at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
              at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:67)
              at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:264)
              at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:263)
              ... 48 more
              Suppressed: hudson.model.Computer$TerminationRequest: Termination requested at Tue Jan 28 10:06:04 CST 2020 by Thread[Ping thread for channel hudson.remoting.Channel@1f0be3d7:EC2 (foo-es-aws) - team-foo.fooTool-large (i-06488c01860a95d82),5,main] [id=79994]
                  at hudson.model.Computer.recordTermination(Computer.java:226)
                  at hudson.model.Computer.disconnect(Computer.java:490)
                  at hudson.slaves.SlaveComputer.disconnect(SlaveComputer.java:727)
                  at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:198)
                  at hudson.remoting.PingThread.ping(PingThread.java:133)
                  at hudson.remoting.PingThread.run(PingThread.java:89)

          johnlengeling John Lengeling added a comment -

          We are seeing the PingThread kill xUnit processing as described. I have provided the stack trace and error messages.
          johnlengeling John Lengeling made changes -
          Resolution Fixed [ 1 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          nfalco Nikolas Falco added a comment - - edited

          Which step is at fooTool.groovy:597?

          I have no idea what to do, because I know neither the exact point where it blocks nor the reason. The only thing I can assume is that the JVM is keeping the CPU busy with the XSLT transformations. The only test I can do is to add a thread sleep after every 50 transformations.

          Obviously this change may have no effect if the ping thread only monitors before and after the execution of a callable (that is, the whole step).
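
To make the assumption above concrete: a ping is essentially a small request that has to be answered within a deadline, and it fails if the remote side cannot get around to answering in time. The sketch below is a simplified model using only `java.util.concurrent`; it is not the `hudson.remoting.PingThread` implementation, and the single-threaded executor merely stands in for an agent that is too busy to schedule the reply.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified model of a ping with a deadline; not Jenkins remoting code.
class PingModel {
    // Submits a no-op "ping" to the remote executor and reports whether it
    // completed within timeoutMillis.
    static boolean respondsWithin(ExecutorService remote, long timeoutMillis) {
        Future<?> reply = remote.submit(() -> { });
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The remote side never got to run the ping in time.
            reply.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

In this model, a ping queued behind a long-running task times out even though nothing is broken, which is the behaviour the reports in this issue would be consistent with if the agent is starved during parsing.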

          nfalco Nikolas Falco added a comment - Please try with this: https://ci.jenkins.io/job/Plugins/job/xunit-plugin/job/feature%252FJENKINS-52842/2/artifact/org/jenkins-ci/plugins/xunit/2.3.8-rc831.cbb77af6dfed/xunit-2.3.8-rc831.cbb77af6dfed.hpi

          johnlengeling John Lengeling added a comment -

          Sorry, I sent you the stack trace from the job console, which was captured after the job was aborted. fooTool.groovy is our Jenkins pipeline library; it has a publishUnitTests method which calls XUnitBuilder.

          try {
              steps.step([$class: 'XUnitBuilder', testTimeMargin: '3000', thresholdMode: 2,
                  thresholds: [
                      [
                          $class              : 'FailedThreshold',
                          failureNewThreshold : '100',
                          failureThreshold    : '100',
                          unstableNewThreshold: '100',
                          unstableThreshold   : '100'
                      ],
                      [
                          $class              : 'SkippedThreshold',
                          failureNewThreshold : '100',
                          failureThreshold    : '100',
                          unstableNewThreshold: '100',
                          unstableThreshold   : '100'
                      ]
                  ],
                  tools: [
                      [
                          $class               : 'GoogleTestType',
                          deleteOutputFiles    : true,
                          failIfNotNew         : false,
                          pattern              : filePattern,
                          skipNoTestFiles      : true,
                          stopProcessingIfError: true
                      ]
                  ]
              ])

          Here is the thread dump of the thread that is hanging in xunit:

          "org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution 2624 / waiting for EC2 (foo-aws) - team-bar-large (i-086328a69a4a0b130) id=8001805" daemon prio=5 TIMED_WAITING
              java.lang.Object.wait(Native Method)
              hudson.remoting.Request.call(Request.java:177)
              hudson.remoting.Channel.call(Channel.java:954)
              hudson.FilePath.act(FilePath.java:1069)
              hudson.FilePath.act(FilePath.java:1058)
              org.jenkinsci.plugins.xunit.XUnitProcessor.processTestsReport(XUnitProcessor.java:195)
              org.jenkinsci.plugins.xunit.XUnitProcessor.process(XUnitProcessor.java:159)
              org.jenkinsci.plugins.xunit.XUnitBuilder.perform(XUnitBuilder.java:126)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)

          johnlengeling John Lengeling added a comment -

          Will test the plugin version that you provided.

          johnlengeling John Lengeling added a comment -

          Nikolas,

          I had 7 successful builds before the PingThread killed the node during xunit processing, running xunit version 2.3.8-rc831.cbb77af6dfed.

          Console Output:

          [2020-02-10T13:35:50.172Z] WARNING: XUnitBuilder step is deprecated since 2.x, it has been replaced by XUnitPublisher. This builer will be remove in version 3.x
          [2020-02-10T13:35:50.175Z] INFO: Starting to record.
          [2020-02-10T13:35:50.175Z] INFO: Processing GoogleTest-1.8
          [2020-02-10T13:35:50.250Z] INFO: [GoogleTest-1.8] - 959 test report file(s) were found with the pattern 'j/wr/build/_TestArtifacts*/*.xml' relative to '/home/jenkins/workspace/kb/os' for the testing framework 'GoogleTest-1.8'.

          Threads related to this AWS node i-0b11d0f08d8d59c51:

          dev-large (i-0b11d0f08d8d59c51)

          "org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#93] / waiting for EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) id=166000" daemon prio=5 TIMED_WAITING
              java.lang.Object.wait(Native Method)
              hudson.remoting.Request.call(Request.java:177)
              hudson.remoting.Channel.call(Channel.java:954)
              hudson.FilePath.act(FilePath.java:1069)
              hudson.FilePath.act(FilePath.java:1058)
              org.jenkinsci.plugins.xunit.XUnitProcessor.processTestsReport(XUnitProcessor.java:195)
              org.jenkinsci.plugins.xunit.XUnitProcessor.process(XUnitProcessor.java:159)
              org.jenkinsci.plugins.xunit.XUnitBuilder.perform(XUnitBuilder.java:126)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
              org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
              org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
              org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$339/847132060.run(Unknown Source)
              java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              java.util.concurrent.FutureTask.run(FutureTask.java:266)
              java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              java.lang.Thread.run(Thread.java:748)

          "Channel reader thread: EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51)" daemon prio=5 WAITING
              java.lang.Object.wait(Native Method)
              java.lang.Object.wait(Object.java:502)
              com.trilead.ssh2.channel.FifoBuffer.read(FifoBuffer.java:212)
              com.trilead.ssh2.channel.Channel$Output.read(Channel.java:127)
              com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:933)
              com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
              com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
              hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
              hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
              hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
              hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
              hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

          "Monitoring EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) for Remoting Version / waiting for EC2 (foo-aws) - dev-large (i-0b11d0f08d8d59c51) id=166965" daemon prio=5 TIMED_WAITING
              java.lang.Object.wait(Native Method)
              hudson.remoting.Request.call(Request.java:177)
              hudson.remoting.Channel.call(Channel.java:954)
              hudson.plugin.versioncolumn.VersionMonitor$1.monitor(VersionMonitor.java:58)
              hudson.plugin.versioncolumn.VersionMonitor$1.monitor(VersionMonitor.java:55)
              hudson.node_monitors.AbstractNodeMonitorDescriptor.monitor(AbstractNodeMonitorDescriptor.java:154)
              hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
          nfalco Nikolas Falco added a comment -

          I can try to increase the sleep time after every 20 processed input files, and maybe also try to cap the maximum number of threads used by Saxon. By default Saxon uses the maximum number of threads, which puts the CPU under full load.

          johnlengeling John Lengeling added a comment -

          Other than the one hung job soon after I loaded xunit-2.3.8-rc831.cbb77af6dfed, the issue looks to be improved.

          The test job has been running hourly for the past 2 days with 0 hangs, versus several hangs per day with 2.3.7. So you might be on the right track.

          I also increased the ping thread interval and timeout values to 1500/1200, but I have now restored them to the defaults (300/240). I will let that bake for a few days.

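          For readers unfamiliar with the interval/timeout pair mentioned above, here is a minimal sketch of the underlying idea: a ping is sent and must be answered within the timeout (240 seconds by default according to this thread), otherwise the channel is considered dead. This is not the actual hudson.remoting.PingThread code; the class and method names below are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the Jenkins remoting implementation.
public class PingTimeoutSketch {

    /**
     * Runs the given ping on a separate thread and reports whether it
     * answered within timeoutMillis. A busy agent that cannot answer in
     * time yields false, which is the situation described in this issue.
     */
    public static boolean pingCompletes(Callable<Void> ping, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> reply = executor.submit(ping);
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // No answer in time, or the ping itself failed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt a ping that is still running so the thread can exit.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A responsive agent answers immediately.
        System.out.println(pingCompletes(() -> null, 240_000L));
        // A starved agent (simulated by a sleep) misses a short deadline.
        System.out.println(pingCompletes(() -> { Thread.sleep(500); return null; }, 100L));
    }
}
```

          Raising the timeout only widens the window; it does not help if the agent stays unresponsive for longer than the new value, which is presumably why the defaults were restored for the test above.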
          nfalco Nikolas Falco added a comment -

          I have completed the work. The sleep time is now configurable, so further changes in the code should not be needed.

          For backward compatibility it is 0 ms for old configurations (this is not the case for pipelines). The default for new definitions is 10 ms.

          johnlengeling, your output log warns that you are using a deprecated class, and you instantiate it by reflection through the pipeline (steps.step([$class : 'XUnitBuilder', testTimeMargin: '3000', ....), which is subject to any class changes. I strongly suggest replacing it with the xunit step (which is not XUnitBuilder), since it handles the configuration of the sleep parameter (XUnitBuilder does not).

          If I remember correctly, you tested 20 ms every 20 processed reports. The default is now 10 ms every 10 processed reports.

          xunit thresholdMode: 2, thresholds: [failed(failureThreshold: '100', unstableThreshold: '100'), skipped(failureThreshold: '100', unstableThreshold: '100')], tools: [GoogleTest(deleteOutputFiles: true, failIfNotNew: false, pattern: $filePattern, skipNoTestFiles: true, stopProcessingIfError: true)], sleepTime: 15
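
          The throttling described in this comment (pause briefly after every 10 processed reports so other threads get CPU time) can be sketched roughly as follows. This is an assumption-laden illustration, not the plugin's actual code: the class name, method name and parameters are hypothetical, and only the 10-report batch and the millisecond sleep come from the comment above.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not the xUnit plugin implementation.
public class ThrottledReportLoop {

    /**
     * Processes every report and sleeps for sleepMillis after each full
     * batch of batchSize reports. A sleepMillis of 0 disables the pause,
     * mirroring the backward-compatible setting for old configurations.
     * Returns the number of pauses that were taken.
     */
    public static int processAll(List<String> reports, int batchSize,
                                 long sleepMillis, Consumer<String> processor) {
        int processed = 0;
        int pauses = 0;
        for (String report : reports) {
            processor.accept(report);
            processed++;
            if (sleepMillis > 0 && processed % batchSize == 0) {
                try {
                    Thread.sleep(sleepMillis);
                    pauses++;
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop processing early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<String> reports = Collections.nCopies(25, "TestResult.xml");
        int pauses = processAll(reports, 10, 10L, r -> { /* transform the report here */ });
        System.out.println("pauses taken: " + pauses);
    }
}
```

          The pause is deliberately tiny; the point is to yield the CPU periodically rather than to slow the conversion down noticeably.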
          
          nfalco Nikolas Falco made changes -
          Resolution Fixed [ 1 ]
          Status Reopened [ 4 ] Fixed but Unreleased [ 10203 ]
          nfalco Nikolas Falco made changes -
          Released As 2.3.8
          Status Fixed but Unreleased [ 10203 ] Resolved [ 5 ]

          People

            nfalco Nikolas Falco
            directhex Jo Shields
            Votes: 1
            Watchers: 5