• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Environment: Debian Wheezy 7.2 on OpenVZ
      Jenkins 1.540, installed from packages

      The encoding of file names in ZIP archives seems to be broken.

      Steps to reproduce:
      1. Create a custom job.
      1.1. Add a shell script build step that creates a file whose name contains an accented character (é), e.g. echo "foobar" > accentué.txt
      1.2. Add a post-build step to archive all artifacts (**/*).
      2. Build.
      3. Go to the build details, last successful artifacts: the file name is listed correctly.
      4. Click the "(all files in zip)" link: the file names listed in the archive have the wrong encoding.

      I tried extracting on Windows 7 with Explorer and 7-Zip, and on Linux with File Roller and unzip. In all cases the problem seems to lie in the encoding used for the file names inside the ZIP.
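A minimal sketch of one plausible mechanism, assuming the archive is written with the single-argument `java.util.zip.ZipOutputStream` constructor (which encodes entry names as UTF-8) and that the extracting tool decodes names as Cp437 (IBM437) when it does not recognize them as UTF-8. The class and method names are hypothetical; this does not confirm what each of the tools above actually does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipMojibakeDemo {
    // Writes a one-entry ZIP in memory. The single-argument ZipOutputStream
    // constructor encodes entry names as UTF-8.
    static byte[] buildZip(String entryName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("foobar\n".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    // Returns the raw name bytes of the first local file header: the name length
    // is a little-endian 16-bit field at offset 26 and the name starts at offset 30.
    static byte[] firstEntryNameBytes(byte[] zip) {
        int nameLength = (zip[26] & 0xFF) | ((zip[27] & 0xFF) << 8);
        return Arrays.copyOfRange(zip, 30, 30 + nameLength);
    }

    // Decodes raw name bytes the way a tool that assumes Cp437 would.
    static String asCp437(byte[] nameBytes) {
        return new String(nameBytes, Charset.forName("IBM437"));
    }

    public static void main(String[] args) throws IOException {
        byte[] name = firstEntryNameBytes(buildZip("accentu\u00e9.txt"));
        // Decodes to "accentué.txt": the bytes for é are C3 A9 in UTF-8.
        System.out.println(new String(name, StandardCharsets.UTF_8));
        // Decodes to "accentu├⌐.txt": C3 and A9 are box-drawing/math symbols in Cp437.
        System.out.println(asCp437(name));
    }
}
```

The two bytes UTF-8 uses for é become two unrelated characters under Cp437, which is the kind of garbled listing described in step 4.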

          [JENKINS-20663] file name encoding broken in zip archives

          Simon Poirier created issue -

          Lauri Taaleš added a comment -

          I have the same problem with file names that contain 'õ', 'ä', 'ö' or 'ü'. Before version 1.532 it could be fixed by adding -Dfile.encoding=Cp775 to the <arguments> element in jenkins.xml. I have traced this particular "regression" back to this commit, which replaced hudson.util.io.ZipArchiver with java.util.zip.ZipOutputStream in hudson.model.DirectoryBrowserSupport. As a temporary work-around I compiled version 1.532.1 from source on Java 7 (whose ZipOutputStream constructor accepts an additional Charset parameter) and passed in the value of System.getProperty("file.encoding"), but I don't want to do this for every update.
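The work-around can be sketched roughly as follows. This is not the actual patch (which is not shown in this thread): it assumes a hypothetical helper `zipNames` that hands a charset to the Java 7 `ZipOutputStream(OutputStream, Charset)` constructor, with the charset resolved from the `file.encoding` property and falling back to UTF-8. The read-back helper shows the catch: any name not flagged as UTF-8 must be decoded with the same charset on the other end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class PlatformCharsetZip {
    // Resolves the charset named by the file.encoding property (what
    // -Dfile.encoding=Cp775 sets), falling back to UTF-8 if unset or unknown.
    static Charset platformCharset() {
        String name = System.getProperty("file.encoding");
        if (name == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return StandardCharsets.UTF_8;
        }
    }

    // Writes empty entries, encoding their names with cs via the Java 7+
    // ZipOutputStream(OutputStream, Charset) constructor.
    static byte[] zipNames(Charset cs, String... names) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out, cs)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    // Reads the first entry name back; names not flagged as UTF-8 are decoded
    // with cs, so the reader has to use the same charset the writer used.
    static String firstEntryName(byte[] archive, Charset cs) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive), cs)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cs = platformCharset();
        String name = "\u00e4\u00f6\u00fc\u00f5.txt"; // "äöüõ.txt"
        if (!cs.newEncoder().canEncode(name)) {
            System.out.println(cs + " cannot encode " + name);
            return;
        }
        byte[] archive = zipNames(cs, name);
        System.out.println(cs + " round trip: " + name.equals(firstEntryName(archive, cs)));
    }
}
```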

          Lauri Taaleš made changes -
          Assignee New: Jesse Glick [ jglick ]
          Daniel Beck made changes -
          Labels New: lts-candidate regression
          Jesse Glick made changes -
          Labels Original: lts-candidate regression New: encoding lts-candidate regression

          Jesse Glick added a comment -

          passed in the value of System.getProperty("file.encoding")

          I.e., Charset.defaultCharset().

          I am not convinced this prior behavior was actually correct, though there is no clear answer. Cp437 is apparently the traditional encoding for ZIP files. JAR files specify UTF-8 unless told otherwise, which is what the current code uses. The file encoding that the Jenkins master happens to use may or may not match the file encoding used by the slave that ran the build producing the artifact, or the computer which has downloaded the artifact ZIP and wishes to extract it. Possibly the encoding should be taken from the Accept-Charset header in the request, if any.

          Really the only safe policy is to use UTF-8 on all computers you touch; this is the default file encoding on all modern Linux distributions that I know of, but other OSs use different defaults.
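The Accept-Charset suggestion could look roughly like this. `ZipCharsetNegotiation.fromAcceptCharset` is a hypothetical helper, not Jenkins API: it picks the supported charset with the highest q-value (keeping header order on ties), ignores `*` and q=0 entries, and falls back to UTF-8, since a client may not send the header at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ZipCharsetNegotiation {
    // Hypothetical helper: picks the charset for ZIP entry names from an
    // Accept-Charset header value, falling back to UTF-8.
    static Charset fromAcceptCharset(String header) {
        Charset best = null;
        double bestQ = 0.0;
        if (header != null) {
            for (String part : header.split(",")) {
                String[] fields = part.trim().split(";");
                String name = fields[0].trim();
                double q = 1.0; // weight defaults to 1 when no q parameter is given
                for (int i = 1; i < fields.length; i++) {
                    String param = fields[i].trim();
                    if (param.startsWith("q=")) {
                        try {
                            q = Double.parseDouble(param.substring(2));
                        } catch (NumberFormatException e) {
                            q = 0.0; // malformed weight: treat as not acceptable
                        }
                    }
                }
                // Skip the wildcard, q=0 entries, and anything not strictly better.
                if (name.isEmpty() || name.equals("*") || !(q > bestQ)) {
                    continue;
                }
                try {
                    if (Charset.isSupported(name)) {
                        best = Charset.forName(name);
                        bestQ = q;
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name: ignore this entry.
                }
            }
        }
        return best != null ? best : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(fromAcceptCharset("ISO-8859-1;q=0.5, utf-8;q=0.9")); // UTF-8
        System.out.println(fromAcceptCharset("ISO-8859-1, *;q=0.1"));          // ISO-8859-1
        System.out.println(fromAcceptCharset(null));                            // UTF-8
    }
}
```

Defaulting to UTF-8 matches the "use UTF-8 everywhere" policy when the client expresses no preference.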

          Jesse Glick made changes -
          Link New: This issue is blocking JENKINS-17236

          Jesse Glick added a comment -

          In particular, zipinfo (3.0.0) does not show the contents correctly unless you pass -O UTF-8 (unzip also handles this option), but jar tvf shows it correctly by default. And there seem to be several “standards” about specifying the charset in the ZIP itself.
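One of those in-archive markers is bit 11 of the general purpose bit flag (the "language encoding flag" in PKWARE's APPNOTE.TXT), which declares the entry name to be UTF-8. The sketch below, with the hypothetical class name `ZipUtf8FlagCheck`, assumes OpenJDK's `ZipOutputStream` behavior of setting that bit only when the charset is UTF-8; it writes the same entry with UTF-8 and with ISO-8859-1 and reads the flag from the first local file header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipUtf8FlagCheck {
    // Bit 11 of the general purpose bit flag: "file name is encoded as UTF-8".
    static final int LANGUAGE_ENCODING_FLAG = 1 << 11;

    // Writes a one-entry ZIP in memory, encoding the entry name with cs.
    static byte[] zipOneEntry(String name, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out, cs)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    // The general purpose bit flag is a little-endian 16-bit field at
    // offset 6 of the first local file header.
    static boolean nameMarkedUtf8(byte[] zip) {
        int flags = (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
        return (flags & LANGUAGE_ENCODING_FLAG) != 0;
    }

    public static void main(String[] args) throws IOException {
        String name = "accentu\u00e9.txt";
        System.out.println(nameMarkedUtf8(zipOneEntry(name, StandardCharsets.UTF_8)));      // true
        System.out.println(nameMarkedUtf8(zipOneEntry(name, StandardCharsets.ISO_8859_1))); // false
    }
}
```

Java's `ZipInputStream` and `ZipFile` decode names as UTF-8 when this bit is set, which would be consistent with `jar tvf` listing the names correctly; a tool that ignores the bit falls back to its own default, which is where an option like `-O UTF-8` comes in.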


          Daniel Beck added a comment -

          FWIW, modern OS X also uses UTF-8; it switched from MacRoman in 10.7 or so.


            Assignee: Jesse Glick (jglick)
            Reporter: Simon Poirier (simpoir)
            Votes: 0
            Watchers: 5