Jenkins / JENKINS-48493

Compress Artifacts Plugin corrupts non-ASCII file names on Windows


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Environment: compress-artifacts-plugin 1.10
      Jenkins 2.73.3
      Java(TM) SE Runtime Environment 1.8.0_144-b01
      Windows Server 2012 R2

      I have a project that produces "cite/käyttötapakuvaus.html" as an artifact. With compress-artifacts-plugin 1.10 installed, the resulting archive.zip has the following file header in its central directory:

      • central file header signature: 50 4B 01 02
      • version made by: 3F 00, i.e. spec v6.3
      • version needed to extract: 14 00, i.e. spec v2.0
      • general purpose bit flag: 08 00, i.e. the file name is not claimed to be UTF-8
      • compression method: 08 00
      • last mod file time: 0D A7
      • last mod file date: 88 4B
      • crc-32: D2 3A 07 C1
      • compressed size: 05 07 00 00
      • uncompressed size: DF 17 00 00
      • file name length: 1A 00
      • extra field length: 00 00, i.e. no alternative file name is stored in the extra field
      • file comment length: 00 00
      • disk number start: 00 00
      • internal file attributes: 00 00
      • external file attributes: 00 00 00 00
      • relative offset of local header: 2E 90 03 00
      • file name: 63 69 74 65 2F 6B E4 79 74 74 F6 74 61 70 61 6B 75 76 61 75 73 2E 68 74 6D 6C, i.e. "ä" was encoded as 0xE4 and "ö" as 0xF6. This matches Latin-1 and Windows-1252, but neither CP437 (which would give 0x84 and 0x94) nor UTF-8 (which would give C3 A4 and C3 B6).
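
To double-check which charset produced the bytes in the file name field, the two non-ASCII characters can be encoded in each candidate charset and compared with the dump above. This is only a sketch: the class name NameEncodings and the helper hex are made up for illustration, and it assumes the JRE ships the IBM437 charset (standard JDK builds do).

```java
import java.nio.charset.Charset;

final class NameEncodings {
    /** Encodes text in the given charset and formats the bytes as space-separated hex. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u00E4\u00F6"; // "äö", written with escapes so the source encoding does not matter
        for (String name : new String[] {"ISO-8859-1", "windows-1252", "IBM437", "UTF-8"}) {
            System.out.println(name + ": " + hex(sample, Charset.forName(name)));
        }
        // ISO-8859-1: E4 F6
        // windows-1252: E4 F6
        // IBM437: 84 94
        // UTF-8: C3 A4 C3 B6
    }
}
```

Only the first two rows match the E4 and F6 bytes seen in the central directory.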

      However, the artifacts listing in Jenkins shows the link <a href="k%EF%BF%BDytt%EF%BF%BDtapakuvaus.html">k�ytt�tapakuvaus.html</a>, i.e. each non-ASCII character has been replaced with U+FFFD REPLACEMENT CHARACTER. This link actually works, but it looks very ugly. Other HTML artifacts link to the file as <a href="k%C3%A4ytt%C3%B6tapakuvaus.html">käyttötapakuvaus</a>, and those links do not work.
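
The U+FFFD characters are what one would get if the Latin-1 bytes were decoded as UTF-8 when the archive is read back. That the plugin actually decodes as UTF-8 is an assumption here, not something confirmed from its source. A minimal sketch with hypothetical names (ReplacementDemo, misdecode, percentEncode); URLEncoder does form encoding, which is only an approximation of how Jenkins builds the href:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ReplacementDemo {
    /** Encodes the name as Latin-1, then decodes those bytes as if they were UTF-8. */
    static String misdecode(String name) {
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        // Malformed UTF-8 sequences are silently replaced with U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    /** Percent-encodes the UTF-8 bytes of a string. */
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "k\u00E4ytt\u00F6tapakuvaus.html";
        System.out.println(percentEncode(misdecode(original)));
        // k%EF%BF%BDytt%EF%BF%BDtapakuvaus.html
        System.out.println(percentEncode(original));
        // k%C3%A4ytt%C3%B6tapakuvaus.html
    }
}
```

The first output is the href that Jenkins generates; the second is the href used by the other HTML artifacts, which does not match any entry name as Jenkins sees it.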

      If I understand correctly, the file names in archive.zip should not be Latin-1 at all. APPNOTE.TXT - .ZIP File Format Specification v6.3.4 says they should be CP437 by default, or UTF-8 if bit 11 of the general purpose bit flag is set. However, TrueZipArchiver.java creates the stream with zip = new ZipOutputStream(out, Charset.defaultCharset()), and I suppose the default charset is Windows-1252 here.
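
For reference, the bit 11 check can be done by hand on the two flag bytes from the header. The fields are little-endian, so "08 00" is the value 0x0008. The class and method names below are illustrative only:

```java
final class ZipFlags {
    /** Bit 11 of the general purpose bit flag: file name and comment are UTF-8. */
    static final int UTF8_NAMES = 1 << 11; // 0x0800

    /** Combines two bytes, in the order they appear in the file, into a 16-bit value. */
    static int littleEndian16(int first, int second) {
        return (first & 0xFF) | ((second & 0xFF) << 8);
    }

    static boolean claimsUtf8(int flags) {
        return (flags & UTF8_NAMES) != 0;
    }

    public static void main(String[] args) {
        int flags = littleEndian16(0x08, 0x00); // bytes from the central file header
        System.out.println(claimsUtf8(flags)); // false: only bit 3 (data descriptor) is set
    }
}
```

A header written with UTF-8 names and the same data descriptor bit would carry "08 08" instead.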

      I'm not sure which charset ZipFile expects when ZipStorage.java constructs it as new ZipFile(archive); the javadocs used to be at java.net, which has been shut down. RawZipFile.DEFAULT_CHARSET suggests it may be expecting UTF-8.

      Because the archive.zip files are intended to be read back by the compress-artifacts-plugin itself rather than published as is, I think it would be best to hardcode UTF-8 in TrueZipArchiver.java.
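
To illustrate the idea, here is a sketch of a round trip with an explicit UTF-8 charset on both the writing and the reading side. Note that it uses java.util.zip from the JDK rather than the TrueZIP classes the plugin actually uses, so it shows the principle and is not a patch; Utf8ZipRoundTrip and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class Utf8ZipRoundTrip {
    /** Writes one entry with the given name, reads the archive back, and returns the name found. */
    static String roundTrip(String entryName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            // The charset is fixed, so the result does not depend on the platform default.
            try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write("<html></html>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            try (ZipInputStream zip = new ZipInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()), StandardCharsets.UTF_8)) {
                ZipEntry entry = zip.getNextEntry();
                return entry == null ? null : entry.getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "cite/k\u00E4ytt\u00F6tapakuvaus.html";
        System.out.println(name.equals(roundTrip(name))); // true
    }
}
```

As long as writer and reader agree on the charset, the entry name survives regardless of the Windows code page.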

            Assignee: Unassigned
            Reporter: Kalle Niemitalo (kon)