JENKINS-53901

readFile does not handle UTF-8 files with a BOM


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Environment: Jenkins 2.121.2 and Jenkins 2.81, Pipeline Groovy Plugin 2.54

      I'm extracting the XML file (nuspec) from some NuGet packages and trying to parse it. In most cases this works fine, but in some packages the XML was written as UTF-8 with a BOM, and the parser then fails with:

      org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
      

      This is how I parse the XML:

      import groovy.xml.XmlUtil

      @NonCPS
      def parsePackage(packageName, packageVersion) {
        def packageFullName = "${packageName}.${packageVersion}"
        // Download the package and unpack it into a directory named after it
        bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
        bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

        // Read the nuspec and parse it (non-validating, not namespace-aware)
        def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
        def nuspecContent = readFile file: nuspecPath
        def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
        println nuspecXML.metadata.version

        def newXml = XmlUtil.serialize(nuspecXML)
        return newXml
      }
      

      It looks like readFile does not support UTF-8 with a BOM: it passes the leading BOM character through into the returned string.
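
      A possible workaround, as a minimal sketch only: strip a leading U+FEFF from the string before handing it to the parser. The helper name stripBom and the sample XML are hypothetical and not part of Jenkins or Groovy; the sketch assumes a Groovy 2.x runtime, as bundled with Jenkins, where XmlSlurper is available without an import.

```groovy
// Hypothetical helper, not a Jenkins or Groovy API.
// U+FEFF is the character that the UTF-8 BOM bytes EF BB BF decode to.
String stripBom(String text) {
    return (text != null && text.startsWith('\uFEFF')) ? text.substring(1) : text
}

// Illustrative content standing in for what readFile returned.
def nuspecContent = '\uFEFF<package><metadata><version>1.0.0</version></metadata></package>'

// Parsing succeeds once the leading BOM character has been removed.
def nuspecXML = new XmlSlurper(false, false).parseText(stripBom(nuspecContent))
assert nuspecXML.metadata.version.text() == '1.0.0'
```

      In the pipeline above this would amount to wrapping the readFile result, for example parseText(stripBom(nuspecContent)). Strings without a BOM pass through unchanged.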

       

      I tried to reproduce this directly in Groovy with:

      def xmldata = new File("Newtonsoft.Json.nuspec").text
      def pkg = new XmlSlurper().parseText(xmldata) 
      println pkg.metadata.version.text()
      

      But here the leading BOM is not passed into the xmldata variable.
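
      The difference may come from how the bytes are decoded. The sketch below shows that a plain UTF-8 decode on the JVM keeps the BOM as the character U+FEFF at the start of the string. Groovy's File.text appears to detect and skip the BOM, which would explain the result above. Whether readFile decodes the file this way internally is an assumption, and the byte values are illustrative.

```groovy
// The UTF-8 BOM bytes EF BB BF followed by the bytes of "<a/>".
byte[] raw = [0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E] as byte[]

// A plain UTF-8 decode does not drop the BOM; it becomes the first character.
String decoded = new String(raw, 'UTF-8')

assert decoded.length() == 5
assert (int) decoded.charAt(0) == 0xFEFF
assert decoded.substring(1) == '<a/>'
```

      An XML parser given such a string sees a character before the first '<', which matches the "Content is not allowed in prolog" error reported at line 1, column 1.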

       

      An example nuspec file with a BOM is attached.

       

       

            Assignee: Unassigned
            Reporter: Jakub Pawlinski (quas)
            Votes: 1
            Watchers: 5