• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • Debian wheezy 7.2 on OpenVZ
      Jenkins 1.540 installed from packages

      The encoding of file names in the zip archive seems to be broken.

      Steps to reproduce:
      1. Create a custom job.
      1.1. Add a script build step that creates a file with an accent in its name (eacute), e.g. echo "foobar" > accentué.txt
      1.2. Add a post-build step to archive all **/*
      2. Build.
      3. Go to the build details, last successful artifacts; the file name is listed correctly.
      4. Click the (all files in zip) link. The file listing of the archive has the wrong encoding.

      I tried extracting on Win7 with Explorer and 7zip, and on Linux with file-roller and unzip. In all cases the problem seems to lie in the encoding used for the zip file names.

          [JENKINS-20663] file name encoding broken in zip archives

          Lauri Taaleš added a comment -

          I have the same problem with filenames that contain 'õ', 'ä', 'ö' or 'ü'. Prior to version 1.532 it could be fixed by adding -Dfile.encoding=Cp775 to the <arguments> element in jenkins.xml. I have traced this particular "regression" back to this commit, which replaced usage of hudson.util.io.ZipArchiver with java.util.zip.ZipOutputStream in hudson.model.DirectoryBrowserSupport. As a temporary work-around I have compiled version 1.532.1 from source on Java 7, where the ZipOutputStream constructor accepts an additional Charset parameter, and passed in the value of System.getProperty("file.encoding"), but I don't feel like doing this for every update.
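For readers unfamiliar with the Java 7 constructor mentioned in this comment, here is a minimal sketch of passing a Charset to java.util.zip.ZipOutputStream and reading the entry name back with the same Charset. It is not the patch that was applied to Jenkins; the class name ZipNameCharsetSketch and its helper methods are hypothetical, and the sketch uses UTF-8 where the work-around described above passed the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class ZipNameCharsetSketch {

    /** Writes a single entry; the entry name is encoded with nameCharset (Java 7+ constructor). */
    static byte[] zipSingleFile(String name, String content, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, nameCharset)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    /** Reads the name of the first entry, decoding it with nameCharset. */
    static String firstEntryName(byte[] zipBytes, Charset nameCharset) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes), nameCharset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "accentu\u00e9.txt"; // "accentué.txt", escaped so the source encoding does not matter
        // Assumption: UTF-8 is used here; the work-around in the comment passed Charset.defaultCharset().
        Charset charset = StandardCharsets.UTF_8;
        byte[] zipBytes = zipSingleFile(name, "foobar\n", charset);
        System.out.println(name.equals(firstEntryName(zipBytes, charset)));
    }
}
```

The name only survives when the writer and the reader agree on the charset, which is the crux of the discussion that follows.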


          Jesse Glick added a comment -

          passed in the value of System.getProperty("file.encoding")

          I.e., Charset.defaultCharset().

          I am not convinced this prior behavior was actually correct, though there is no clear answer. Cp437 is apparently the traditional encoding for ZIP files. JAR files specify UTF-8 unless told otherwise, which is what the current code uses. The file encoding that the Jenkins master happens to use may or may not match the file encoding used by the slave that ran the build producing the artifact, or the computer which has downloaded the artifact ZIP and wishes to extract it. Possibly the encoding should be taken from the Accept-Charset header in the request, if any.

          Really the only safe policy is to use UTF-8 on all computers you touch; this is the default file encoding on all modern Linux distributions that I know of, but other OSs use different defaults.
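To make the mismatch concrete, the following sketch encodes a file name with one charset and decodes the resulting bytes with another, which is roughly what happens when the machine that writes the ZIP and the tool that extracts it disagree. The class name FileNameMojibakeSketch and the method reinterpret are hypothetical; the sketch also assumes the running JDK ships the IBM437 (Cp437) charset and falls back to ISO-8859-1 if it does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class FileNameMojibakeSketch {

    /** Encodes name with the writer's charset, then decodes the same bytes with the reader's charset. */
    static String reinterpret(String name, Charset writer, Charset reader) {
        return new String(name.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String name = "accentu\u00e9.txt"; // "accentué.txt"

        // Assumption: IBM437 is available in this JDK; otherwise use ISO-8859-1 as a stand-in.
        Charset legacy = Charset.isSupported("IBM437")
                ? Charset.forName("IBM437")
                : StandardCharsets.ISO_8859_1;

        // Writer uses UTF-8, reader assumes a legacy code page: the accented letter is garbled.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, legacy));

        // Writer and reader agree: the name is unchanged.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

The single character 'é' becomes two bytes in UTF-8, so a reader that assumes a single-byte code page shows two unrelated characters in its place.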


          Jesse Glick added a comment -

          In particular, zipinfo (3.0.0) does not show the contents correctly unless you pass -O UTF-8 (unzip also handles this option), but jar tvf shows it correctly by default. And there seem to be several “standards” about specifying the charset in the ZIP itself.
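One of the mechanisms for declaring the name charset inside the ZIP itself is the "language encoding" flag, bit 11 of the general purpose bit flag in each entry header, which marks the name as UTF-8. Whether this is one of the "standards" the comment had in mind is a guess. The sketch below writes a small archive and inspects that bit in the first local file header; the class ZipUtf8FlagSketch and its methods are hypothetical, and the observation that java.util.zip sets the bit for UTF-8 is based on OpenJDK behaviour rather than on anything stated in this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class ZipUtf8FlagSketch {

    /** Bit 11 of the general purpose bit flag: entry name and comment are UTF-8. */
    static final int UTF8_NAME_FLAG = 0x0800;

    /** Writes a single entry whose name is encoded with nameCharset. */
    static byte[] zipSingleFile(String name, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, nameCharset)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write("foobar\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    /** Reads the general purpose bit flag of the first local file header. */
    static int firstEntryFlags(byte[] zipBytes) {
        // Local file header layout: signature "PK\3\4" (4 bytes), version needed (2 bytes),
        // general purpose bit flag (2 bytes, little-endian).
        if (zipBytes.length < 8 || zipBytes[0] != 'P' || zipBytes[1] != 'K'
                || zipBytes[2] != 3 || zipBytes[3] != 4) {
            throw new IllegalArgumentException("not a ZIP local file header");
        }
        return (zipBytes[6] & 0xFF) | ((zipBytes[7] & 0xFF) << 8);
    }

    static boolean declaresUtf8Names(byte[] zipBytes) {
        return (firstEntryFlags(zipBytes) & UTF8_NAME_FLAG) != 0;
    }

    public static void main(String[] args) throws IOException {
        String name = "accentu\u00e9.txt"; // "accentué.txt"
        System.out.println(declaresUtf8Names(zipSingleFile(name, StandardCharsets.UTF_8)));
        System.out.println(declaresUtf8Names(zipSingleFile(name, StandardCharsets.ISO_8859_1)));
    }
}
```

Extraction tools that ignore this bit fall back to their own default code page, which may be why some tools need an explicit option while others show the name correctly.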


          Daniel Beck added a comment -

          FWIW modern OS X also uses UTF-8, switched from MacRoman in 10.7 or so.


          Lauri Taaleš added a comment -

          I'm not really a Java dev, but I took some time to test the output of both java.util.zip.ZipOutputStream and org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream. I used both to zip a file named 'õäöü.txt' and then tried to unzip it with the default application, i.e. whatever runs from the context menu, on Windows 7 and Manjaro 0.8.10.

          The archive created with ZipOutputStream had a mangled file name when unzipped on both Windows and Linux. The same was true for ZipArchiveOutputStream when using only the default options. But by setting additional flags as described at http://info.michael-simons.eu/2010/01/05/create-zip-archives-containing-unicode-filenames-with-java/ the archive produced became readable on Linux. Even though it does not help me with my Windows problem, I consider this preferable to the current behavior.

          Using ZipArchiveOutputStream would also allow adding a (maybe job-specific) configuration option for setting the zip archive encoding for those poor souls stuck on Windows.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/model/DirectoryBrowserSupport.java
          http://jenkins-ci.org/commit/jenkins/84c76253862a2f36f813a7aa45b77d99c1616be4
          Log:
          [FIXED JENKINS-20663] For now, go back to using ZipOutputStream from Ant that supports setting the filename encoding (present in java.util.zip only in Java 7+).

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3531
          [FIXED JENKINS-20663] For now, go back to using ZipOutputStream from Ant that supports setting the filename encoding (present in java.util.zip only in Java 7+). (Revision 84c76253862a2f36f813a7aa45b77d99c1616be4)

          Result = SUCCESS
          Jesse Glick : 84c76253862a2f36f813a7aa45b77d99c1616be4
          Files :

          • core/src/main/java/hudson/model/DirectoryBrowserSupport.java
          • changelog.html


          Daniel Beck added a comment -

          Jesse's fix was released today in 1.574. Feedback would be great.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/DirectoryBrowserSupport.java
          http://jenkins-ci.org/commit/jenkins/e6a46d880dc7eaf06a6df368b0a42156447c0a6d
          Log:
          [FIXED JENKINS-20663] For now, go back to using ZipOutputStream from Ant that supports setting the filename encoding (present in java.util.zip only in Java 7+).
          (cherry picked from commit 84c76253862a2f36f813a7aa45b77d99c1616be4)

          Conflicts:
          changelog.html

          Lauri Taaleš added a comment -

          This patch has fixed the problem I described in my original comment. Since the original reporter has not responded, I think this issue can be closed.


          Jesse Glick added a comment -

          This is already closed.


          Daniel Beck added a comment -

          lauri_taalesh: We're not consistently using the Closed issue status; most issues are done when Resolved. I tried to document this here: https://wiki.jenkins-ci.org/display/JENKINS/Issue+Tracking


            jglick Jesse Glick
            simpoir Simon Poirier
            Votes: 0
            Watchers: 5