Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53030

Intermittent hudson.util.IOException2 + org.xml.sax.SAXParseException

    • Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Icon: Major Major

      I am working on a build + test + deployment script for multiple platforms. So far the script is still mostly a dummy with the actual work still left to fill in. Nevertheless, when running the script I get intermittent (about one in three or four runs) failures caused by always the same two exceptions:

       

      org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/Temp.ScriptTest/branches/branches-nodenesting.skpm19/builds/109/changelog2.xml; lineNumber: 1; columnNumber: 1; Premature end of file.
       at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
       at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
       at org.apache.commons.digester.Digester.parse(Digester.java:1871)
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76)
      Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/Temp.ScriptTest/branches/branches-nodenesting.skpm19/builds/109/changelog2.xml
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80)
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1024)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1232)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:127)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
       at hudson.security.ACL.impersonate(ACL.java:290)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)

       

      The file involved is always changelog2.xml.

      The script I am running is this:

       

      // --------------------------------------------------------------------------
      // Project definitions used in the declarative pipeline below
      // --------------------------------------------------------------------------
      def maintainers = "${env.MAIL_VGIMPLE}"
          // these people will be notified when the build fails; comma separated
      def platforms = util.StringToArray(env.CONAN_PLATFORMS)
          // alternative: ['win32', 'x64', ...]
      def upstreamjobs = "BuildSystem.CMake.all"
          // these jobs will trigger this job; comma separated
      def parallelbuild = true
          // set to true to execute builds parallel;
      def crossplatformbuild = true
          // use cross platform builds where possible (i.e. Linux ARM builds will
          // be carried out on Linux PC nodes)
      
      // --------------------------------------------------------------------------
      // Diagnostic output and preparations
      // --------------------------------------------------------------------------
      echo "**** Building for the following platforms:"
      print platforms
      
      builds = [:]
      builds.failFast = true
      
      stashNameArtifacts = "artifacts." + env.BUILD_NUMBER + "."
      stashNameTests = "tests." + env.BUILD_NUMBER + "."
      
      // --------------------------------------------------------------------------
      // The build and test step
      // --------------------------------------------------------------------------
      def BuildAndTestStep(platform) {
          def isCrossPlatformBuild = util.IsCrossPlatformBuild(platform)
          try {        testplatform = platform.startsWith('build.') ? platform - 'build.' : platform
              node (platform) {
                  checkout scm
                  echo "--------> build step for platform " + platform
                  echo "--------> stash " + stashNameArtifacts + testplatform
                  if (isCrossPlatformBuild) {
                      echo "--------> stash " + stashNameTests + testplatform
                  } else {
                      echo "--------> execute tests directly on the node where the build ran"
                  }
              }
              if (isCrossPlatformBuild) {
                  node (testplatform) {
                      echo "--------> unstash " + stashNameTests + testplatform
                      echo "--------> execute cross platform test"
                  }
              }
          } catch(Exception e) {
              echo "**** An error occurred on build platform " + platform
              print e
              throw e
          }
      }
      
      // --------------------------------------------------------------------------
      // Pipeline Declaration
      // --------------------------------------------------------------------------
      pipeline {
          agent none
          triggers {
              pollSCM('@hourly')
              upstream(upstreamProjects: upstreamjobs, threshold: hudson.model.Result.SUCCESS)
          }
          stages {
              // ------------------------------------------------------------------
              // Build and Test step
              // ------------------------------------------------------------------
              stage ('build & test') {
                  steps {
                      script {
                          // now execute the builds as determined - either parallel
                          // or sequentially
                          prefix = crossplatformbuild ? "build." : ""
                          if (parallelbuild) {
                              for (int i = 0; i < platforms.size(); ++i) {
                                  def plt = prefix + platforms[i]
                                  builds[plt] = { -> BuildAndTestStep(plt) }
                              }
                              parallel builds
                          } else {
                              for (int i = 0; i < platforms.size(); ++i) {
                                  BuildAndTestStep(prefix + platforms[i])
                              }
                          }
                      }
                  }
              }
              // ------------------------------------------------------------------
              // Deployment step
              // ------------------------------------------------------------------
              stage ('deploy') {
                  steps {
                      script {
                          for (int i = 0; i < platforms.size(); ++i) {
                              node { // actual node does not matter!
                                  echo "--------> unstash " + stashNameArtifacts + platforms[i]
                                  echo "--------> install unstashed artifacts to either cvdev2 or conan"
                              }
                          }
                      }
                  }
              }
          }
          post {
              failure {
                  echo "*** Build Regression ***"
                  script {
                      mail (to: "${env.MAIL_VGIMPLE}",
                      subject: "Build failed in Jenkins: ${JOB_NAME} #${BUILD_NUMBER}",
                      body: 'See ' + currentBuild.absoluteUrl)
                  }
              }
              fixed {
                  echo "*** Build Fixed ***"
                  script {
                      mail (to: "${env.MAIL_VGIMPLE}",
                      subject: "Jenkins build is back to normal: ${JOB_NAME} #${BUILD_NUMBER}",
                      body: 'See ' + currentBuild.absoluteUrl)
                  }
              }
          }
      }

      A typical output log is attached (file Log114.txt).

      Looking at the failed builds it looks like in most cases the (generally slower) ARM-based test nodes are the ones where the exception is raised. The exceptions never occur when executing the script sequentially (i.e. when setting 'parallelbuild' to false and the 'parallel' block is never hit).

      My next move will be to try and filter out these particular exceptions and see if I have a chance at completing the build regardless but an approach where this is not necessary would of course be my preference.

       

          [JENKINS-53030] Intermittent hudson.util.IOException2 + org.xml.sax.SAXParseException

          Volker Gimple added a comment -

          After doing some more testing with more finely grained and selective try/catch blocks I can add the following details:

          • If it does occur, then the IOException2 always occurs during the scm checkout call in BuildAndTestStep()
          • I cannot confirm my earlier impression that the exception is more likely to happen for one of the ARM platforms (in fact the scm step is usually not executed on one of those as these are typically used for testing only, not for building)
          • I have not seen the exception fly when there is no workspace left from the previous run
          • When the exception is thrown, the sources on the affected machines have not been updated to the latest scm version

          For the moment I am working around this by sleeping something between 500 and 1500 milliseconds before retrying the scm checkout step.

          Volker Gimple added a comment - After doing some more testing with more finely grained and selective try/catch blocks I can add the following details: If it does occur, then the IOException2 always occurs during the scm checkout call in BuildAndTestStep() I cannot confirm my earlier impression that the exception is more likely to happen for one of the ARM platforms (in fact the scm step is usually not executed on one of those as these are typically used for testing only, not for building) I have not seen the exception fly when there is no workspace left from the previous run When the exception is thrown, the sources on the affected machines have not been updated to the latest scm version For the moment I am working around this by sleeping something between 500 and 1500 milliseconds before retrying the scm checkout step.

          Hi all,

          I had a similar issue when running some (declarative) pipeline steps in parallel:

          org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml; lineNumber: 1; columnNumber: 537; Content is not allowed in trailing section.
          	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
          	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
          	at org.apache.commons.digester.Digester.parse(Digester.java:1871)
          	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76)
          Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml
          	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80)
          	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1045)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1253)
          	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136)
          	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:90)
          	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:77)
          	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
          	at hudson.security.ACL.impersonate(ACL.java:290)
          	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)

          The created changelog2.xml was:

          <?xml version="1.0" encoding="UTF-8"?><log><logentry revision="3931"><author>tom.g</author><date>2018-10-03T10:29:53.640919Z</date><paths><path action="M" localPath="src/xra31rpc/xra31rpc.thrift" kind="file">/software/branches/1.x-sc_qam/rpc/src/xra31rpc/xra31rpc.thrift</path><path action="M" localPath="src/samplecode/test_cpp_client.cpp" kind="file">/software/branches/1.x-sc_qam/rpc/src/samplecode/test_cpp_client.cpp</path></paths><msg>The `inputLevel` is not yet available for the `PhysicalConnectorStatus`.</msg></logentry></log>ware/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.cpp</path><path action="M" localPath="src/xra31ctrl/PcapFileSaver.h" kind="file">/software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.h</path></paths><msg>- Remove packet filtering (i.e. `PacketFilter`) support in `PcapFileSaver`.
          - Use correct thread protection by using `std::atomic` where applicable.
          - Disable storing unused (member) variables.
          - Code cleanup.</msg></logentry><logentry revision="3930"><author>tom.g</author><date>2018-10-03T09:41:08.941382Z</date><paths><path action="M" localPath="src/xra31ctrl/DsSynchronizer.cpp" kind="file">/software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/DsSynchronizer.cpp</path></paths><msg>- Use C++ interface for `stdio`.</msg></logentry></log>

          The localPath in this changelog file are part of different checkouts (from the same commit in the same repository though), but still I suspect it has something to do with the parallel execution...

          I hope this helps!

          With best regards,
          Tom.

          Tom Ghyselinck added a comment - Hi all, I had a similar issue when running some ( declarative ) pipeline steps in parallel : org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml; lineNumber: 1; columnNumber: 537; Content is not allowed in trailing section. at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643) at org.apache.commons.digester.Digester.parse(Digester.java:1871) at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76) Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80) at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43) at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1045) at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143) at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1253) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:90) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:77) at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50) at hudson.security.ACL.impersonate(ACL.java:290) at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) The created changelog2.xml was: <?xml version= "1.0" encoding= "UTF-8" ?> <log> <logentry revision= "3931" > <author> tom.g </author> <date> 2018-10-03T10:29:53.640919Z </date> <paths> <path action= "M" localPath= "src/xra31rpc/xra31rpc.thrift" kind= "file" > /software/branches/1.x-sc_qam/rpc/src/xra31rpc/xra31rpc.thrift </path> <path action= "M" localPath= "src/samplecode/test_cpp_client.cpp" kind= "file" > /software/branches/1.x-sc_qam/rpc/src/samplecode/test_cpp_client.cpp </path> </paths> <msg> The `inputLevel` is not yet available for the `PhysicalConnectorStatus`. </msg> </logentry> </log> ware/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.cpp </path> <path action= "M" localPath= "src/xra31ctrl/PcapFileSaver.h" kind= "file" > /software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.h </path> </paths> <msg> - Remove packet filtering (i.e. `PacketFilter`) support in `PcapFileSaver`. - Use correct thread protection by using `std::atomic` where applicable. - Disable storing unused (member) variables. - Code cleanup. </msg> </logentry> <logentry revision= "3930" > <author> tom.g </author> <date> 2018-10-03T09:41:08.941382Z </date> <paths> <path action= "M" localPath= "src/xra31ctrl/DsSynchronizer.cpp" kind= "file" > /software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/DsSynchronizer.cpp </path> </paths> <msg> - Use C++ interface for `stdio`. </msg> </logentry> </log> The localPath in this changelog file are part of different checkouts ( from the same commit in the same repository though ), but still I suspect it has something to do with the parallel execution... I hope this helps! With best regards, Tom.

          Paul Allen added a comment - - edited

          Sorry updating the wrong job.

          Internally we have the same problem with the changelog files being read before the writer has finished. For the moment we work around this issue in the `p4-plugin` by catching the exception, logging the error and returning an empty ChangeLogSet.

          Paul Allen added a comment - - edited Sorry updating the wrong job. Internally we have the same problem with the changelog files being read before the writer has finished. For the moment we work around this issue in the `p4-plugin` by catching the exception, logging the error and returning an empty ChangeLogSet.

          I had this error before when doing parallel checkouts.

          I removed parallel and I now do sequential checkouts. It's working fine since several months.

          Emmanuel Ruhnau added a comment - I had this error before when doing parallel checkouts. I removed parallel and I now do sequential checkouts. It's working fine since several months.

          Emmanuel Ruhnau added a comment - Seems to be fixed by  https://issues.jenkins-ci.org/browse/JENKINS-34313

          Mark Waite added a comment -

          We're regularly using ARM builds on ci.jenkins.io without seeing these issues. I'm regularly using ARM builds with ARM servers from AWS and Oracle without seeing this issue. I'm regularly usign Raspberry Pi single board computers as build agents without seeing this issue. I'm closing it as "cannot reproduce".

          Mark Waite added a comment - We're regularly using ARM builds on ci.jenkins.io without seeing these issues. I'm regularly using ARM builds with ARM servers from AWS and Oracle without seeing this issue. I'm regularly usign Raspberry Pi single board computers as build agents without seeing this issue. I'm closing it as "cannot reproduce".

            vaimr Denis Saponenko
            vgimple Volker Gimple
            Votes:
            3 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: