Jenkins / JENKINS-24953

Filenames with accents cause the changelog to appear empty

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: mercurial-plugin
    • Environment: Jenkins 1.480.3 + Mercurial plugin 1.45, or
      Jenkins 1.565.2 + Mercurial plugin 1.50;
      Windows 7 x64

      If a file whose name contains an accented character (e.g. "é") appears in the list of changes, the resulting changelog.xml file cannot be read. For example, the character "é" is written to the XML file as the single byte 0xE9 (its Cp1252/Latin-1 encoding) rather than as the UTF-8 sequence 0xC3 0xA9. Jenkins then shows "No changes." in the build information. After manually editing changelog.xml to replace the 0xE9 byte with a properly encoded "é", simply refreshing the page shows the correct changes.
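      To illustrate the symptom, here is a minimal Python sketch (not the plugin's code), assuming the filename bytes are Cp1252 while the rest of changelog.xml is UTF-8. A strict UTF-8 XML parser rejects the stray 0xE9 byte. The hypothetical helper `repair_changelog_bytes` automates the manual fix by re-encoding only the invalid bytes as UTF-8 and leaving valid UTF-8, such as an accented commit message, untouched. The sample uses element names like those in the plugin's hg log template.

```python
import codecs
import xml.etree.ElementTree as ET


def _cp1252_fallback(exc):
    """Hypothetical codec error handler: reinterpret invalid UTF-8 bytes as Cp1252."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Cp1252 leaves a few bytes undefined; map those to U+FFFD instead of failing.
    return bad.decode("cp1252", errors="replace"), exc.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_changelog_bytes(raw: bytes) -> bytes:
    """Keep valid UTF-8 as-is and re-encode stray Cp1252 bytes (e.g. 0xE9) as UTF-8."""
    return raw.decode("utf-8", errors="cp1252_fallback").encode("utf-8")


# Sample fragment: the filename holds a raw Cp1252 0xE9, the message is valid UTF-8.
raw = (b"<changesets><changeset><msg>r\xc3\xa9sum\xc3\xa9</msg>"
       b"<files>caf\xe9.txt</files></changeset></changesets>")

try:
    ET.fromstring(raw)  # expat expects UTF-8 here and stops at the 0xE9 byte
except ET.ParseError as err:
    print("rejected:", err)

root = ET.fromstring(repair_changelog_bytes(raw))
print(root.find("changeset/files").text)  # café.txt
print(root.find("changeset/msg").text)    # résumé
```

      Decoding the whole file as Cp1252 instead would garble the parts that are already valid UTF-8 ("é" would become "Ã©"), which is why the fallback is applied only to the invalid bytes.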

      Note that accents in changeset comments are handled correctly.

      I set the priority to "Major" because one consequence is that no e-mails are sent: since Jenkins sees no changes, it concludes that nobody made any changes and finds nobody to notify.
      The job console output shows: An attempt to send an e-mail to empty list of recipients, ignored.
      This can mean that a build failure goes unnoticed (which is what happened to us). Whether this is a minor or a major loss of function depends on the usage, I guess.


          Daniel Beck added a comment -

          Does this behavior depend on the value of the file.encoding Java system property? What is its value when this issue occurs? (You can check it at the /systemInfo URL.)
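          For context on why file.encoding could matter: on the Java versions of that era, APIs that read or write text without an explicit charset fall back to the platform default, which file.encoding sets. The Python sketch below is only an illustration (Python stands in for the JVM) of the same bytes decoded under UTF-8 and under Cp1252. UTF-8 text read as Cp1252 does not fail; it silently turns into mojibake.

```python
# Python stands in for the JVM: decoding with an explicit charset shows what a
# different platform default charset would change for the same bytes.
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
cp1252_bytes = "café".encode("cp1252")   # b'caf\xe9'

print(utf8_bytes.decode("utf-8"))        # café
print(utf8_bytes.decode("cp1252"))       # cafÃ©  (UTF-8 bytes read as Cp1252)
print(cp1252_bytes.decode("cp1252"))     # café
```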

          Samuel Delisle added a comment -

          It's currently set to Cp1252. I have no idea how I could try it with different values.

          In case it's relevant, the command Jenkins runs, as seen in the console output, passes --encoding UTF-8:

          [workspace] $ hg log --template "<changeset node='{node}' author='{author|xmlescape}' rev='{rev}' date='{date}'><msg>{desc|xmlescape}</msg><added>{file_adds|stringify|xmlescape}</added><deleted>{file_dels|stringify|xmlescape}</deleted><files>{files|stringify|xmlescape}</files><parents>{parents}</parents></changeset>\n" --rev default:0 --follow --prune 437fba1b4ca5207fa18a08d0b72c88d753c2412b --encoding UTF-8 --encodingmode replace

          Daniel Beck added a comment -

          In the script that starts Jenkins (java -jar jenkins.war), add the parameter -Dfile.encoding=UTF-8 before the -jar parameter.

          If Jenkins was installed using the Windows installer, you need to add it to jenkins.xml instead.

          (Note that this may have unintended side effects.)
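          The ordering matters because the java launcher treats everything after -jar <file> as arguments for the application rather than as JVM options. A small Python sketch that assembles such a command line; `jenkins_command` is a hypothetical helper, and the same ordering presumably applies to the argument list in jenkins.xml.

```python
import shlex
from typing import List, Mapping


def jenkins_command(war_path: str, system_props: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: build a java command with -D options placed before -jar.

    Anything after `-jar <file>` is handed to the application as a program
    argument, so a -D option placed there would not set a JVM system property.
    """
    cmd = ["java"]
    cmd += [f"-D{key}={value}" for key, value in system_props.items()]
    cmd += ["-jar", war_path]
    return cmd


print(shlex.join(jenkins_command("jenkins.war", {"file.encoding": "UTF-8"})))
# java -Dfile.encoding=UTF-8 -jar jenkins.war
```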


          Samuel Delisle added a comment (edited) -

          Using -Dfile.encoding=UTF-8 has no effect on the contents of changelog.xml; the issue is still present.

          I only tested this with the Jenkins 1.565.2 installation.


          Jesse Glick added a comment -

          The Jenkins plugin passes --encoding UTF-8 since it also writes the XML header and footer and has to parse the file in a specific encoding. It could of course omit this flag and parse the XML in the system default encoding. But this would have the unfortunate effect that, say, Jenkins running on a Windows server would be unable to handle non-ASCII commit messages created by developers working on Linux, and it would not prevent a malformed changelog.xml from being written. Mercurial itself punts on the whole issue and just treats all filenames and messages as opaque 8-bit content, but Jenkins has to interoperate with the wider world (your web browser especially), so it really needs a defined encoding. Apparently the filenames you are committing to your repository are in Cp1252, and Mercurial running with --encoding UTF-8 does not notice this for some reason.
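          The pipeline described above (hg output wrapped in a UTF-8 XML header and footer, then parsed in a fixed encoding) can be sketched in Python as follows. This is a hypothetical illustration, not the Mercurial plugin's code: the exact header text, the names `build_changelog` and `changed_files`, and the up-front UTF-8 check (one way to avoid writing a malformed changelog.xml) are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed wrapper text; the plugin's actual header and footer may differ.
HEADER = b"<?xml version='1.0' encoding='UTF-8'?>\n<changesets>\n"
FOOTER = b"</changesets>\n"


def build_changelog(hg_stdout: bytes) -> bytes:
    """Wrap raw `hg log --template ...` output into a single XML document.

    The header declares UTF-8, so the body must be UTF-8 too. This hypothetical
    check refuses to produce a malformed file instead of writing it anyway.
    """
    try:
        hg_stdout.decode("utf-8")  # strict decoding raises on bytes such as 0xE9
    except UnicodeDecodeError as err:
        raise ValueError(f"hg output is not valid UTF-8 at byte {err.start}") from err
    return HEADER + hg_stdout + FOOTER


def changed_files(changelog: bytes) -> List[str]:
    """Parse the changelog and return the <files> text of each <changeset>."""
    root = ET.fromstring(changelog)
    return [cs.findtext("files", default="") for cs in root.iter("changeset")]


good = "<changeset><files>café.txt</files></changeset>\n".encode("utf-8")
bad = b"<changeset><files>caf\xe9.txt</files></changeset>\n"

print(changed_files(build_changelog(good)))  # ['café.txt']
try:
    build_changelog(bad)
except ValueError as err:
    print(err)  # hg output is not valid UTF-8 at byte 21
```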

          As a workaround, it might suffice for the plugin to also pass --encodingmode replace (or perhaps ignore). If that solves your issue well enough, file a pull request to that effect. However, my understanding is that the default is strict, which ought to have prevented changelog.xml from being written at all (or from being completed once started), and that is not what you seem to have observed. I know of no mode in which Mercurial may write 0xE9 when --encoding UTF-8 has been specified, so this might be a Mercurial bug.
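          Mercurial is written in Python, and its --encodingmode option accepts strict, replace and ignore, the same names as Python's codec error handlers. Assuming Mercurial applies the mode as such a handler (an assumption; which conversions it is applied to is not established here), the Python sketch below shows what each mode does to the byte 0xE9 when UTF-8 is expected. `decode_with_mode` is a hypothetical helper. None of the three modes passes the raw 0xE9 byte through.

```python
from typing import Optional

# A Cp1252-encoded filename: 0xE9 is "é" in Cp1252 but is not valid UTF-8 here.
data = b"caf\xe9.txt"


def decode_with_mode(raw: bytes, mode: str) -> Optional[str]:
    """Decode as UTF-8 with a Python error handler; return None if 'strict' fails."""
    try:
        return raw.decode("utf-8", errors=mode)
    except UnicodeDecodeError:
        return None


for mode in ("strict", "replace", "ignore"):
    # ascii() escapes U+FFFD so the output is readable on any console.
    print(mode, "->", ascii(decode_with_mode(data, mode)))
# strict -> None
# replace -> 'caf\ufffd.txt'
# ignore -> 'caf.txt'
```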

          The only full and portable solution is to consistently use UTF-8 in your repository (in filenames and commit messages).
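          As a starting point for moving a repository to UTF-8 filenames, a minimal Python sketch that flags entries of a file listing that are not valid UTF-8. It assumes the raw stdout of `hg manifest` (one path per line) has been captured as bytes, for example with subprocess; the function name `non_utf8_names` is hypothetical.

```python
from typing import List


def non_utf8_names(manifest: bytes) -> List[bytes]:
    """Return the entries of a newline-separated file list that are not valid UTF-8."""
    bad = []
    for name in manifest.splitlines():
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Example input shaped like raw `hg manifest` output.
listing = b"readme.txt\ndocs/caf\xe9.txt\nsrc/r\xc3\xa9sum\xc3\xa9.py\n"
print(non_utf8_names(listing))  # [b'docs/caf\xe9.txt']
```

          This is a heuristic: a Cp1252 name whose bytes happen to form valid UTF-8 passes the check, although such names are rare in practice.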


            Jesse Glick (jglick)
            Samuel Delisle (samapico)
            Votes: 0
            Watchers: 4