[JENKINS-20663] file name encoding broken in zip archives

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: core
Labels:
Environment:
Debian wheezy 7.2 on OpenVZ
Jenkins 1.540 installed from packages

The encoding of file names in the ZIP archive of build artifacts appears to be broken.

Steps to reproduce:
1. Create a custom job.
1.1. Add a script build step that creates a file with an accented character (e-acute) in its name, e.g. echo "foobar" > accentué.txt
1.2. Add a post-build step to archive all **/*
2. Build.
3. Go to the build details, last successful artifacts; the file name is listed correctly.
4. Click the (all files in zip) link. The file listing of the archive shows the name in the wrong encoding.

I tried extracting on Windows 7 with Explorer and 7-Zip, and on Linux with file-roller and unzip. In all cases the problem seems to lie in the encoding used for the ZIP entry names.
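
The symptom described above is what a charset mismatch between the ZIP writer and the extracting tool would look like. A minimal sketch, assuming the entry name was written as UTF-8 and read back as ISO-8859-1 (both charsets are assumptions for illustration; the report does not say which ones were involved, and the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ZipNameMojibake {
    /** Encodes a name with the writer's charset, then decodes the same bytes with the reader's. */
    static String reinterpret(String name, Charset writer, Charset reader) {
        return new String(name.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String name = "accentu\u00e9.txt"; // "accentué.txt", escaped to keep the source ASCII-only
        // UTF-8 stores U+00E9 as the two bytes C3 A9; ISO-8859-1 decodes them as two characters.
        String garbled = reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("accentu\u00c3\u00a9.txt")); // true
    }
}
```

When the writer and the reader agree on the charset, reinterpret returns the name unchanged.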

is blocking

JENKINS-17236 Pluggable artifact transfer & storage

Resolved

Simon Poirier created issue - 2013-11-19 22:06

Simon Poirier made changes - 2013-11-19 22:07

Description

Lauri Taaleš added a comment - 2014-02-18 17:53

I have the same problem with file names that contain 'õ', 'ä', 'ö' or 'ü'. Prior to version 1.532 it could be fixed by adding -Dfile.encoding=Cp775 to the <arguments> element in jenkins.xml. I have traced this particular "regression" back to this commit, which replaced the use of hudson.util.io.ZipArchiver with java.util.zip.ZipOutputStream in hudson.model.DirectoryBrowserSupport. As a temporary workaround I compiled version 1.532.1 from source on Java 7, added a Charset parameter to the ZipOutputStream constructor call, and passed in the value of System.getProperty("file.encoding"), but I don't feel like doing this for every update.
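
The workaround relies on the two-argument ZipOutputStream constructor that Java 7 added. A minimal standalone sketch of that idea, not the actual patch to DirectoryBrowserSupport: the class and method names are hypothetical, and the archive is built in memory with empty entries just to show how the charset affects entry names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class CharsetZipSketch {
    /** Writes one empty entry per name, encoding the entry names with the given charset. */
    static byte[] zip(List<String> names, Charset nameCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Lists entry names, decoding them with the given charset. */
    static List<String> list(byte[] zip, Charset nameCharset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Calling zip(names, Charset.defaultCharset()) would mirror the workaround described above. The names only survive if whoever extracts the archive decodes them with the same charset, which is what list does here.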

Lauri Taaleš made changes - 2014-05-09 07:24

Assignee

New: Jesse Glick [ jglick ]

Daniel Beck made changes - 2014-06-28 15:29

Labels

New: lts-candidate regression

Jesse Glick made changes - 2014-07-01 19:31

Labels

Original: lts-candidate regression

New: encoding lts-candidate regression

Jesse Glick added a comment - 2014-07-01 19:41

passed in the value of System.getProperty("file.encoding")

I.e., Charset.defaultCharset().

I am not convinced this prior behavior was actually correct, though there is no clear answer. Cp437 is apparently the traditional encoding for ZIP files. JAR files specify UTF-8 unless told otherwise, which is what the current code uses. The file encoding that the Jenkins master happens to use may or may not match the file encoding used by the slave that ran the build producing the artifact, or the computer which has downloaded the artifact ZIP and wishes to extract it. Possibly the encoding should be taken from the Accept-Charset header in the request, if any.

Really the only safe policy is to use UTF-8 on all computers you touch; this is the default file encoding on all modern Linux distributions that I know of, but other OSs use different defaults.
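
To make the Accept-Charset idea concrete, here is a rough sketch of how a charset could be chosen from that header, falling back to UTF-8. This is purely illustrative and not something Jenkins is known to do; the class and method names are hypothetical, and the parsing only handles the simple "name;q=value" form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ZipNameCharsetChooser {
    /** Picks the supported charset with the highest q-value; falls back to UTF-8. */
    static Charset choose(String acceptCharset) {
        if (acceptCharset == null) {
            return StandardCharsets.UTF_8;
        }
        Charset best = null;
        double bestQ = 0.0;
        for (String part : acceptCharset.split(",")) {
            String[] pieces = part.trim().split(";");
            String name = pieces[0].trim();
            double q = 1.0; // q defaults to 1 when not given
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException ex) {
                        q = 0.0; // treat an unparsable weight as "not acceptable"
                    }
                }
            }
            // Skip wildcards, q=0 entries, and anything not better than the current choice.
            if (name.isEmpty() || name.equals("*") || q <= bestQ) {
                continue;
            }
            try {
                if (Charset.isSupported(name)) {
                    best = Charset.forName(name);
                    bestQ = q;
                }
            } catch (IllegalArgumentException ex) {
                // Illegal charset name: ignore this entry.
            }
        }
        return best != null ? best : StandardCharsets.UTF_8;
    }
}
```

For example, choose("ISO-8859-1,utf-8;q=0.7,*;q=0.3") returns ISO-8859-1, while a missing header yields UTF-8. Even then, the browser's preferred charset need not match what the extraction tool on the same machine assumes.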

Jesse Glick made changes - 2014-07-01 19:41

Link

New: This issue is blocking JENKINS-17236

Jesse Glick added a comment - 2014-07-01 19:49

In particular, zipinfo (3.0.0) does not show the contents correctly unless you pass -O UTF-8 (unzip also handles this option), but jar tvf shows it correctly by default. And there seem to be several “standards” about specifying the charset in the ZIP itself.
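
One of those conventions is the language encoding flag from the PKWARE ZIP specification: bit 11 of the general purpose bit flag in each entry header, which marks the name as UTF-8. Another is the Info-ZIP Unicode Path extra field (header ID 0x7075). A small sketch that checks only the first convention on the first local file header of an archive; the class and method names are hypothetical, and whether a given writer sets the flag depends on the tool and its version.

```java
final class ZipUtf8Flag {
    /** Returns true if the first local file header has general purpose bit 11 (UTF-8 names) set. */
    static boolean firstEntryDeclaresUtf8(byte[] zip) {
        // A local file header starts with the signature "PK\003\004".
        if (zip.length < 8 || zip[0] != 'P' || zip[1] != 'K' || zip[2] != 3 || zip[3] != 4) {
            throw new IllegalArgumentException("archive does not start with a local file header");
        }
        // Bytes 4-5 hold the version needed to extract; bytes 6-7 hold the flags, little-endian.
        int flags = (zip[6] & 0xff) | ((zip[7] & 0xff) << 8);
        return (flags & 0x0800) != 0;
    }
}
```

Passing the bytes of a downloaded artifact ZIP to firstEntryDeclaresUtf8 shows whether the archive itself tells extractors to decode names as UTF-8, or leaves them to guess.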

Daniel Beck added a comment - 2014-07-07 13:43

FWIW modern OS X also uses UTF-8, switched from MacRoman in 10.7 or so.

Assignee: Jesse Glick
Reporter: Simon Poirier
Votes: 0
Watchers: 5
Created: 2013-11-19 22:06
Updated: 2014-10-06 16:13
Resolved: 2014-07-18 20:46
