-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Jenkins 2.121.2 and Jenkins 2.81 Pipeline Groovy Plugin 2.54
I'm extracting xml file (nuspec) from some nuget packages and trying to parse it. In most cases it works fine, but in some the xml was written using UTF-8 with BOM encoding, and then parser gets upset and reports:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
The way I'm parsing xml is:
@NonCPS def parsePackage(packageName, packageVersion) { def packageFullName = "${packageName}.${packageVersion}" bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg""" bat """unzip ${packageFullName}.nupkg -d ${packageFullName}""" def nuspecPath = """${packageFullName}\\${packageName}.nuspec""" def nuspecContent = readFile file:nuspecPath def nuspecXML = new XmlSlurper( false, false ).parseText(nuspecContent) println nuspecXML.metadata.version def newXml = XmlUtil.serialize(nuspecXML) return newXml }
It looks like readFile is not supporting UTF-8 with BOM as it is passing leading BOM characters into returned string.
I tried to replicate it directly in groovy doing
def xmldata = new File("Newtonsoft.Json.nuspec").text def pkg = new XmlSlurper().parseText(xmldata) println pkg.metadata.version.text()
But here the leading BOM characters are not passed into xmldata variable
Attached example nuspec with BOM in it.
[JENKINS-53901] Using readFile does not handle UTF-8 with BOM files
Description |
Original:
The readFile step, when used inside a environment closure, whether top-level or in a stage, causes the following error: an exception which occurred: in field com.cloudbees.groovy.cps.impl.BlockScopeEnv.locals in object com.cloudbees.groovy.cps.impl.LoopBlockScopeEnv@29044815 in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@25c9f135 in field com.cloudbees.groovy.cps.impl.CallEnv.caller in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@307ab985 in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@5a92c230 in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@37a0a42f in field com.cloudbees.groovy.cps.impl.CallEnv.caller in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@184a6ff5 in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@676c6c8d in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@19f01356 in field com.cloudbees.groovy.cps.impl.CallEnv.caller in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@74d1467b in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@4d098490 in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@28223d82 in field com.cloudbees.groovy.cps.impl.CallEnv.caller in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@6e27611b in field com.cloudbees.groovy.cps.Continuable.e in object org.jenkinsci.plugins.workflow.cps.SandboxContinuable@78ff9c41 in field org.jenkinsci.plugins.workflow.cps.CpsThread.program in object org.jenkinsci.plugins.workflow.cps.CpsThread@7841b6fe in field org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.threads in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce Caused: java.io.NotSerializableException: java.util.TreeMap$Entry at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:860) at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65) at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56) at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50) at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344) at java.util.HashMap.internalWriteEntries(HashMap.java:1785) at java.util.HashMap.writeObject(HashMap.java:1362) at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65) at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56) at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50) at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344) at java.util.TreeMap.writeObject(TreeMap.java:2438) at sun.reflect.GeneratedMethodAccessor176.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032) at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988) at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854) at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58) at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111) at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:140) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:458) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:434) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:422) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:362) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) A test repo was created to replicate this. https://github.com/sflynn-dell/pipeline-test Branches: declarative-script - readFile is successful when used inside a script closure. declarative-env - readFile fails when used inside an environment enclosure. |
New:
Using Jenkins ver. 2.121.2 I'm extracting xml file (nuspec) from some nuget packages and trying to parse it. In most cases it works fine, but in some the xml was written using UTF-8 with BOM encoding, and then parser gets upset and reports: {code:java} org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog. {code} The way I'm parsing xml is: {code:java} @NonCPS def parsePackage(packageName, packageVersion) { def packageFullName = "${packageName}.${packageVersion}" bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg""" bat """unzip ${packageFullName}.nupkg -d ${packageFullName}""" def nuspecPath = """${packageFullName}\\${packageName}.nuspec""" def nuspecContent = readFile file:nuspecPath def nuspecXML = new XmlSlurper( false, false ).parseText(nuspecContent) println nuspecXML.metadata.version def newXml = XmlUtil.serialize(nuspecXML) return newXml } {code} It looks like readFile is not supporting UTF-8 with BOM as it is passing leading BOM characters into returned string. I tried to replicate it directly in groovy doing {code:java} def xmldata = new File("Newtonsoft.Json.nuspec").text def pkg = new XmlSlurper().parseText(xmldata) println pkg.metadata.version.text() {code} But here the leading BOM characters are not passed into xmldata variable Attached example nuspec with BOM in it. |
Attachment | New: Newtonsoft.Json.nuspec [ 44656 ] |
Environment | Original: Jenkins 2.73.1 and Jenkins 2.81 Pipeline Groovy Plugin 2.40 | New: Jenkins 2.121.2 and Jenkins 2.81 Pipeline Groovy Plugin 2.54 |
Description |
Original:
Using Jenkins ver. 2.121.2 I'm extracting xml file (nuspec) from some nuget packages and trying to parse it. In most cases it works fine, but in some the xml was written using UTF-8 with BOM encoding, and then parser gets upset and reports: {code:java} org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog. {code} The way I'm parsing xml is: {code:java} @NonCPS def parsePackage(packageName, packageVersion) { def packageFullName = "${packageName}.${packageVersion}" bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg""" bat """unzip ${packageFullName}.nupkg -d ${packageFullName}""" def nuspecPath = """${packageFullName}\\${packageName}.nuspec""" def nuspecContent = readFile file:nuspecPath def nuspecXML = new XmlSlurper( false, false ).parseText(nuspecContent) println nuspecXML.metadata.version def newXml = XmlUtil.serialize(nuspecXML) return newXml } {code} It looks like readFile is not supporting UTF-8 with BOM as it is passing leading BOM characters into returned string. I tried to replicate it directly in groovy doing {code:java} def xmldata = new File("Newtonsoft.Json.nuspec").text def pkg = new XmlSlurper().parseText(xmldata) println pkg.metadata.version.text() {code} But here the leading BOM characters are not passed into xmldata variable Attached example nuspec with BOM in it. |
New:
I'm extracting xml file (nuspec) from some nuget packages and trying to parse it. In most cases it works fine, but in some the xml was written using UTF-8 with BOM encoding, and then parser gets upset and reports: {code:java} org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog. {code} The way I'm parsing xml is: {code:java} @NonCPS def parsePackage(packageName, packageVersion) { def packageFullName = "${packageName}.${packageVersion}" bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg""" bat """unzip ${packageFullName}.nupkg -d ${packageFullName}""" def nuspecPath = """${packageFullName}\\${packageName}.nuspec""" def nuspecContent = readFile file:nuspecPath def nuspecXML = new XmlSlurper( false, false ).parseText(nuspecContent) println nuspecXML.metadata.version def newXml = XmlUtil.serialize(nuspecXML) return newXml } {code} It looks like readFile is not supporting UTF-8 with BOM as it is passing leading BOM characters into returned string. I tried to replicate it directly in groovy doing {code:java} def xmldata = new File("Newtonsoft.Json.nuspec").text def pkg = new XmlSlurper().parseText(xmldata) println pkg.metadata.version.text() {code} But here the leading BOM characters are not passed into xmldata variable Attached example nuspec with BOM in it. |
Component/s | New: workflow-basic-steps-plugin [ 21712 ] | |
Component/s | Original: pipeline-model-definition-plugin [ 21706 ] | |
Assignee | Original: Andrew Bayer [ abayer ] |
Resolution | New: Not A Defect [ 7 ] | |
Status | Original: Open [ 1 ] | New: Closed [ 6 ] |
quas This is a known with the Unicode spec and the Java platform implementation of it, not Pipeline. In UTF-8 the BOM is neither needed nor suggested - since the BOM is essentially meaningless in UTF-8, Java transparently passes the BOM through.
First I'd make sure to add the "encloding: 'UTF-8'" argument to your readFile step to ensure it reads as UTF-8. Then we do postprocessing to correct for nonstandard input.
Some suggested solutions are available on StackOverflow.
Personally, I'd do something like this to sanitize your input:
(might need to be \u FEFF, try it both ways).
There's also code snippets out there that do a more efficient approach, which only considers the leading bytes of the String.