JENKINS-43176: MercurialChangeLogParser fails in parallel checkouts


Details

    Description

      Using parallel nodes and checkout sometimes causes the following error:

      hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/parallel-test/builds/28/changelog0.xml: '<?xml version="1.0" encoding="UTF-8"?>
       <changesets>
       '
       at hudson.plugins.mercurial.MercurialChangeLogParser.parse(MercurialChangeLogParser.java:55)
       at hudson.plugins.mercurial.MercurialChangeLogParser.parse(MercurialChangeLogParser.java:26)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:746)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1500(WorkflowRun.java:125)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:936)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:123)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:83)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:73)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
       at hudson.security.ACL.impersonate(ACL.java:260)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
       Suppressed: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/parallel-test/builds/28/changelog0.xml: '<?xml version="1.0" encoding="UTF-8"?>
       <changesets>
       '
       ... 16 more
       Caused by: org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/parallel-test/builds/28/changelog0.xml; lineNumber: 3; columnNumber: 1; XML document structures must start and end within the same entity.
       at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
       at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
       at org.apache.commons.digester.Digester.parse(Digester.java:1871)
       at hudson.plugins.mercurial.MercurialChangeLogParser.parse(MercurialChangeLogParser.java:51)
       ... 15 more
       Caused by: org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/parallel-test/builds/28/changelog0.xml; lineNumber: 3; columnNumber: 1; XML document structures must start and end within the same entity.
       at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
       at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
       at org.apache.commons.digester.Digester.parse(Digester.java:1871)
       at hudson.plugins.mercurial.MercurialChangeLogParser.parse(MercurialChangeLogParser.java:51)
       ... 15 more
       Finished: FAILURE

      Here is the code that I used to generate that error:

       

      #!groovy

      try
      {
          parallel (
              0: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              1: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              2: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              3: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              4: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              5: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              6: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              7: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              8: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } },
              9: { node { checkout($class: 'MercurialSCM', source: 'https://bitbucket.org/vicyap/jenkins-parallel-test', clean: true) } }
          )
      }
      catch (e)
      {
          throw e
      }
      

       

      However, this error happens randomly. The attachments show two identical runs: one passes, the other has multiple failed nodes.

       

      Also, I was able to reproduce this error with just two parallel nodes.

       

      My current workaround is to not use parallel nodes, but then jobs run much slower. Does anyone have an alternative workaround or solution? Or how do I even get started trying to debug this?
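
      For concreteness, here is a minimal sketch of that serialized workaround (illustrative only; it assumes the same repository as the repro above and simply runs the checkouts one node allocation at a time):

      #!groovy

      // Same ten checkouts as the parallel repro, but run sequentially,
      // so only one changelog fragment is being written at any moment.
      for (int i = 0; i < 10; i++) {
          node {
              checkout($class: 'MercurialSCM',
                       source: 'https://bitbucket.org/vicyap/jenkins-parallel-test',
                       clean: true)
          }
      }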

       

      Attachments

        Issue Links

          Activity

            kveretennicov Konstantin Veretennicov added a comment -

            We see the same issue and stack trace in our declarative multi-branch pipeline. It has parallel stages and does checkouts in each (dealing with x86 and x64 builds).

            We don't want to sacrifice the changelog, so our workaround is to wrap every checkout scm in a lock:

            lock("master-changelog-${BUILD_TAG}") {
              checkout scm
            }

             
            This means using the skipDefaultCheckout() option and an explicit checkout scm in all stages, as in the sketch below.
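
            A minimal declarative sketch of that arrangement (for illustration only: the stage names and agent labels are placeholders, and the lock step comes from the Lockable Resources plugin):

            pipeline {
                agent none
                options {
                    // Declarative normally checks out automatically on every agent;
                    // disable that so every checkout goes through the lock below.
                    skipDefaultCheckout()
                }
                stages {
                    stage('Build') {
                        parallel {
                            stage('x86') {
                                agent { label 'x86' }   // placeholder label
                                steps {
                                    // Serialize changelog writes within this build only.
                                    lock("master-changelog-${env.BUILD_TAG}") {
                                        checkout scm
                                    }
                                    // ... x86 build steps ...
                                }
                            }
                            stage('x64') {
                                agent { label 'x64' }   // placeholder label
                                steps {
                                    lock("master-changelog-${env.BUILD_TAG}") {
                                        checkout scm
                                    }
                                    // ... x64 build steps ...
                                }
                            }
                        }
                    }
                }
            }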

            svanoort Sam Van Oort added a comment -

            Reducing priority to reflect that this (a) is intermittent, (b) only applies to one specific SCM, (c) has a known workaround, and (d) doesn't completely break the system, only a fraction of the jobs.


            kveretennicov Konstantin Veretennicov added a comment -

            Some comments on your priority justification, svanoort:

            a) Intermittent is actually worse than failing reliably.

            b) A comment above reports this issue for Subversion too. Anyway, even if it's somehow Mercurial-specific, that's of little consolation when Mercurial is your VCS - it's not like we can snap our fingers and switch to another one. It might be easier to switch CI systems instead.

            d) That fraction can be 100% of the jobs that really matter.

             

            But c) is correct. We still use the workaround. It clutters our pipeline code to some extent, but we learned to look away.

            svanoort Sam Van Oort added a comment -

            kveretennicov I understand your frustration at seeing the priority downgraded, but it's only been downgraded a step – please note that Major priority still denotes an important issue: to quote the Wiki, a "Major loss of function", whereas Critical is reserved for issues that cause "Crashes, loss of data, severe memory leak." It will still be fixed, but please bear in mind that we are responsible for a huge amount of functionality and it's important we don't miss issues that cause catastrophic failures.

            That said, perhaps rsandell could take a look and might have some insight into this – it sounds superficially like there's some sort of synchronization/race condition at play here with changelog read/write? It seems to me that while the likely bug is in the SCM implementation itself, there may be something we can do in Pipeline to protect against this...?


            kveretennicov Konstantin Veretennicov added a comment -

            svanoort, to be clear, for me it's completely fair to downgrade from "Blocker" when a workaround is available. I appreciate the sheer number of other issues you have to deal with in a project like Jenkins. I only wanted to correct the assessment of the impact before it's used by someone else to decrease the priority even further.


            People

              Assignee: Unassigned
              Reporter: vicyap Victor Yap
              Votes: 13
              Watchers: 14
