-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Jenkins 2.121.2 and Jenkins 2.81; Pipeline Groovy Plugin 2.54
I'm extracting the XML manifest (.nuspec) from some NuGet packages and trying to parse it. In most cases this works fine, but some of the files were written as UTF-8 with a BOM, and for those the parser fails with:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
This is how I'm parsing the XML:
@NonCPS
def parsePackage(packageName, packageVersion) {
    def packageFullName = "${packageName}.${packageVersion}"
    bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
    bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""
    def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
    def nuspecContent = readFile file: nuspecPath
    def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
    println nuspecXML.metadata.version
    def newXml = XmlUtil.serialize(nuspecXML)
    return newXml
}
It looks like readFile does not handle UTF-8 with a BOM: the leading BOM character ends up in the returned string.
I tried to reproduce this in plain Groovy with:
def xmldata = new File("Newtonsoft.Json.nuspec").text
def pkg = new XmlSlurper().parseText(xmldata)
println pkg.metadata.version.text()
But here the leading BOM character is not passed into the xmldata variable, and parsing succeeds.
I've attached an example nuspec file that starts with a BOM.
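As a possible workaround until readFile handles this, the string could be cleaned before it reaches the parser. This is a minimal sketch, assuming the BOM arrives as a single leading U+FEFF character (which is what the UTF-8 bytes EF BB BF decode to). The helper name stripBom is hypothetical and not part of Jenkins or Groovy.

```groovy
// Hypothetical helper: returns the text without a leading U+FEFF (BOM).
// Null, empty, and BOM-free input is returned unchanged.
String stripBom(String text) {
    return (text != null && text.startsWith('\uFEFF')) ? text.substring(1) : text
}

// Sample input that mimics a nuspec file read with its BOM intact.
def withBom = '\uFEFF<?xml version="1.0" encoding="utf-8"?><package><metadata><version>1.0.0</version></metadata></package>'
def cleaned = stripBom(withBom)

// The XML declaration is now the very first thing in the string,
// which is what the parser expects in the prolog.
assert cleaned.startsWith('<?xml')
```

In the pipeline above, this would presumably be used as new XmlSlurper(false, false).parseText(stripBom(nuspecContent)). I have not verified this on the affected Jenkins versions.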