[JENKINS-63586] xUnit publishing occasionally hangs indefinitely on Windows agent

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • xunit-plugin
    • None

       

      After finishing testing, the build started to parse GoogleTest results with the xUnit plugin.
      At some point in the log it hung indefinitely.
      It occurs about once every 2 months, at a rate of around 600 builds per day on Windows machines.
      It also occurred with xUnit version 1.102, which we used previously. We hoped it would not occur with the new version.

      After killing the hung process, the results are processed properly.

      23:45:13 INFO: Starting to record.
      23:45:13 INFO: Processing GoogleTest-1.8
      23:45:13 INFO: [GoogleTest-1.8] - 1 test report file(s) were found with the pattern '*result.xml' relative to '...' for the testing framework 'GoogleTest-1.8'.
      
      # killed process (3 processes were grouped together, of which I left the JVM process and the connection initiator running)
      
      10:54:58 INFO: Check 'Failed Tests' threshold.
      10:54:58 INFO: The total number of tests for the threshold 'Failed Tests' exceeds the specified "unstable threshold" value.
      10:54:58 INFO: Setting the build status to UNSTABLE
      10:54:58 INFO: Stopping recording.
      10:54:58 Build step 'Publish xUnit test result report' changed build result to UNSTABLE

      Seemingly irrelevant, but the agent connection console showed the following suspicious messages:

      org.jvnet.winp.Native loadByUrlWarning: Failed to load dll from static location
      java.lang.UnsatisfiedLinkError: Native Library ... cache/jars/44/winp.x64....dll already loaded in another classloader
      ...
      at org.jvnet.winp.WinProcess.enableDebugPrivilege(WinProcess.java:245)
      ...

      Maybe related, but not obvious: https://issues.jenkins-ci.org/browse/JENKINS-20913

      Warning at function xunit:is-empty on line 51 SXWN9000: A function that computes atomic values should use xsl:sequence rather than xsl:value-of

      This warning did not appear with the older xUnit version, which still had the hang.

       

       


          Nikolas Falco added a comment -

          It is very difficult to understand the root cause.

          The warning is not a problem; I will change the XSLT to avoid it.

          About the hang: if it is due to the plugin, then increase the "Process release time (ms)" advanced option. This leaves more time between one block of 10 unit tests and the next (it releases the JVM threads for a while because of the high computation load). If that does not resolve it, it is impossible to tell where it is blocked. A thread dump taken while it hangs could help.
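
For readers unsure what such a thread dump contains, here is a minimal sketch using the standard `java.lang.management` API. It dumps the threads of the JVM it runs in. For a hung agent, a dump would normally be taken from outside, for example with the JDK's `jstack` tool against the agent's process ID. The class and method names below are illustrative and are not part of Jenkins or the xUnit plugin.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only; not part of Jenkins or xUnit.
final class ThreadDumpSketch {

    // Builds a text dump listing each live thread's name, state, and stack frames.
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details to keep the dump short.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a dump taken during the hang, the thread of interest would be the one running the xUnit callable on the agent; its top frames show where it is blocked.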

          Rudolf Horvath added a comment -

          Okay, will double the process release time, and try to create a thread dump next time.

          Nikolas Falco added a comment -

          Use 30/40 ms, depending on how many test cases you have to process.

          I will leave this bug open for two months; after that I will close it as not reproducible, because I will assume it's OK.
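
To see why the value depends on the number of test cases, here is a rough sketch of the idea as described in this thread: pause for the release time between blocks of 10 items. It is not the plugin's actual implementation. The class name, method names, and the assumption of exactly one pause between consecutive blocks are all illustrative.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of "process release time"; not the plugin's code.
final class ReleaseTimeSketch {
    static final int BLOCK_SIZE = 10; // block size mentioned in the thread

    // Total pause added for a run, assuming one pause between consecutive blocks.
    static long totalPauseMillis(int testCases, long releaseMillis) {
        if (testCases <= 0) {
            return 0;
        }
        int blocks = (testCases + BLOCK_SIZE - 1) / BLOCK_SIZE; // round up
        return (blocks - 1) * releaseMillis;
    }

    // Handles items in order, sleeping before each new block except the first.
    static <T> void processInBlocks(List<T> items, long releaseMillis, Consumer<? super T> handler) {
        for (int i = 0; i < items.size(); i++) {
            if (i > 0 && i % BLOCK_SIZE == 0) {
                try {
                    Thread.sleep(releaseMillis); // give other threads a chance to run
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while pausing", e);
                }
            }
            handler.accept(items.get(i));
        }
    }
}
```

Under these assumptions, 5,000 test cases at 40 ms add 499 pauses, or 19,960 ms (about 20 seconds) to the publishing step.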

          Rudolf Horvath added a comment -

          Yes, it is ok, we will see.

          Rudolf Horvath added a comment - edited

          It started to occur more often (2 times in 6 hours). Increased Process release time to 40 ms.
          Here are the stack traces, caused by the job abort:

          INFO: Starting to record.
          INFO: Processing GoogleTest-1.8
          INFO: [GoogleTest-1.8] - 1 test report file(s) were found with the pattern '*result.xml' relative to '...' for the testing framework 'GoogleTest-1.8'.
          ERROR: Step ‘Publish xUnit test result report’ aborted due to exception: 
          java.lang.InterruptedException
          	at java.base/java.lang.Object.wait(Native Method)
          	at hudson.remoting.Request.call(Request.java:176)
          	at hudson.remoting.Channel.call(Channel.java:1000)
          	at hudson.FilePath.act(FilePath.java:1070)
          	at hudson.FilePath.act(FilePath.java:1059)
          	at org.jenkinsci.plugins.xunit.XUnitProcessor.processTestsReport(XUnitProcessor.java:195)
          	at org.jenkinsci.plugins.xunit.XUnitProcessor.process(XUnitProcessor.java:159)
          	at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:186)
          	at jenkins.tasks.SimpleBuildStep.perform(SimpleBuildStep.java:112)
          	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
          	at hudson.model.Build$BuildExecution.post2(Build.java:186)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
          	at hudson.model.Run.execute(Run.java:1905)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:428)
          Finished: FAILURE
          
          INFO: Starting to record.
          INFO: Processing GoogleTest-1.8
          INFO: [GoogleTest-1.8] - 1 test report file(s) were found with the pattern '*result.xml' relative to '...' for the testing framework 'GoogleTest-1.8'.
          ERROR: Step ‘Publish xUnit test result report’ aborted due to exception: 
          java.lang.InterruptedException
          	at java.base/java.lang.Object.wait(Native Method)
          	at hudson.remoting.Request.call(Request.java:176)
          	at hudson.remoting.Channel.call(Channel.java:1000)
          	at hudson.FilePath.act(FilePath.java:1070)
          	at hudson.FilePath.act(FilePath.java:1059)
          	at org.jenkinsci.plugins.xunit.XUnitProcessor.processTestsReport(XUnitProcessor.java:195)
          	at org.jenkinsci.plugins.xunit.XUnitProcessor.process(XUnitProcessor.java:159)
          	at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:186)
          	at jenkins.tasks.SimpleBuildStep.perform(SimpleBuildStep.java:112)
          	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
          	at hudson.model.Build$BuildExecution.post2(Build.java:186)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
          	at hudson.model.Run.execute(Run.java:1905)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:428)
          ERROR: Step ‘Groovy Postbuild’ aborted due to exception: 
          java.nio.channels.ClosedByInterruptException
          	at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
          	at java.base/sun.nio.ch.FileChannelImpl.endBlocking(FileChannelImpl.java:162)
          	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:285)
          	at hudson.util.FileChannelWriter.write(FileChannelWriter.java:73)
          	at java.base/java.io.Writer.write(Writer.java:290)
          	at hudson.util.AtomicFileWriter.write(AtomicFileWriter.java:162)
          	at java.base/java.io.Writer.write(Writer.java:249)
          	at hudson.XmlFile.write(XmlFile.java:191)
          	at hudson.model.Run.save(Run.java:2050)
          	at org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder.perform(GroovyPostbuildRecorder.java:414)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
          	at hudson.model.Build$BuildExecution.post2(Build.java:186)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
          	at hudson.model.Run.execute(Run.java:1905)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:428)
          Finished: FAILURE

          I will take a thread dump next time; I have just communicated the situation to our team.

          Rudolf Horvath added a comment -

          In some cases the plugin hangs only when somebody clicks into the cmd window that initiated the Jenkins connection (agent.jar).
          Our guess is that the click somehow blocks the stderr stream, which the plugin tries to write to.
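
If that guess is right, one possible mitigation is to start the process with its output sent to a log file rather than the console. A Windows console in QuickEdit mode is known to pause writers while a selection is active. The sketch below uses only the standard `ProcessBuilder` API. The class and method names are hypothetical, and the actual agent command line is not given in this thread, so it is left to the caller.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical launcher for illustration; the command to run is supplied by the caller.
final class DetachedOutputSketch {

    // Runs the command with stdout and stderr appended to logFile instead of
    // the console, waits for it to exit, and returns its exit code.
    static int runWithLog(List<String> command, File logFile) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.appendTo(logFile))
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for process", e);
        }
    }
}
```

For a long-running agent, the call blocks for the lifetime of the process. The point is only that nothing the process writes depends on the state of a console window.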

          Nikolas Falco added a comment -

          The master log is not useful; I need the client (agent) log to see where the callable code hangs or raises the exception.

          Nikolas Falco added a comment -

          In version 2.4.0 I changed the build step to be non-blocking and asynchronous, which means it no longer blocks the execution thread. This should resolve the issue. If not, please reopen.
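
The stack traces earlier in this thread show the build thread parked in `Object.wait` under `Request.call`, waiting for the agent to answer. The plugin's actual 2.4.0 change is not shown here. As a general illustration of not blocking a caller forever, the sketch below runs a task asynchronously and waits for it with a timeout. All names in it are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of a bounded wait; not the plugin's implementation.
final class BoundedWaitSketch {

    // Runs the task on a worker thread and waits at most the given time.
    // Returns the result, or Optional.empty() if the task did not finish in time.
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return Optional.of(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck task
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow(); // release the worker thread
        }
    }
}
```

With a bounded wait, a stuck remote call surfaces as a timeout that can be logged and retried instead of an indefinite hang.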

          Rudolf Horvath added a comment -

          Thank you, we will check it out!

            nfalco Nikolas Falco
            rudolf Rudolf Horvath