[JENKINS-6203] Git plugin uses default encoding to read change log file

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: git-plugin
    • Labels: None
    • Environment: Windows Server 2003 R2, Cygwin 1.7.2, git 1.7.0.4, Hudson 1.353, GIT plugin 0.8.2

      [Problem]
      When we view the changes, the commit log text is garbled.

      [Cause]
      Git assumes that commit messages are encoded in UTF-8 unless the "i18n.commitencoding" property is set.

      The hudson.plugins.git.GitChangeLogParser class uses java.io.FileReader at line 27.
      java.io.FileReader reads the file using the platform default character encoding.
      The Japanese version of Windows uses "MS932" as its default character encoding.

      So when Hudson runs on such a system, the UTF-8 commit log is decoded as MS932 and comes out garbled.
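
The effect described above can be reproduced in memory. The following is a minimal sketch, not code from the plugin: the class and method names are hypothetical, and because MS932 is not guaranteed to be available in every JRE, the sketch falls back to ISO-8859-1 as a stand-in legacy encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the git plugin.
class MojibakeDemo {
    // Stand-in for the platform default on Japanese Windows.
    // Assumption: fall back to ISO-8859-1 when the JRE does not ship MS932.
    static Charset legacyCharset() {
        return Charset.isSupported("MS932")
                ? Charset.forName("MS932")
                : StandardCharsets.ISO_8859_1;
    }

    // Encode the text as UTF-8 (as git does by default), then decode the
    // bytes with the charset a reader would use.
    static String decode(String original, Charset readerCharset) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        // Unicode escapes for a short Japanese string, so the source file
        // itself does not depend on the compiler's source encoding.
        String message = "\u65e5\u672c\u8a9e";
        System.out.println("decoded as UTF-8:  " + decode(message, StandardCharsets.UTF_8));
        System.out.println("decoded as legacy: " + decode(message, legacyCharset()));
    }
}
```

Only the UTF-8 decode returns the original text; the legacy decode produces different characters, which is the garbling seen in the change log.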

      [Solution]
      Use java.io.FileInputStream and java.io.InputStreamReader instead of java.io.FileReader.
      java.io.InputStreamReader has a constructor that takes the encoding.

      [Example]
      BufferedReader rdr = null;
      try {
          // fetch the encoding from the configuration
          rdr = new BufferedReader(new InputStreamReader(new FileInputStream(changelogFile), encoding));
          /* ... */
      } finally {
          // close rdr whether or not the constructors threw an exception
          if (rdr != null) rdr.close();
      }
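
For anyone who wants to try the idea outside the plugin, here is a self-contained sketch of the same approach using try-with-resources (Java 7 or later). The class name, the helper methods, and the temporary file are assumptions made for illustration; the plugin's real code differs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the plugin's GitChangeLogParser.
class ChangelogReadSketch {
    // Read all lines of a file with an explicit encoding instead of the
    // platform default that java.io.FileReader would use.
    static List<String> readLines(Path file, Charset encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader rdr = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), encoding))) {
            String line;
            while ((line = rdr.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Write the text as UTF-8 to a temporary file, then read it back with
    // the given encoding. The temporary file stands in for a change log.
    static List<String> writeUtf8ThenRead(String text, Charset readEncoding) {
        try {
            Path tmp = Files.createTempFile("changelog", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readLines(tmp, readEncoding);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e5 \u00e4 \u00f6\nsecond line\n";
        System.out.println(writeUtf8ThenRead(text, StandardCharsets.UTF_8));
        System.out.println(writeUtf8ThenRead(text, StandardCharsets.ISO_8859_1));
    }
}
```

Reading with the encoding the file was written in returns the original lines; reading with a different single-byte encoding returns garbled text for the non-ASCII characters.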

          Andrey Rogozhnikov added a comment - edited

          The same: http://i.imgur.com/63mSQKC.jpg

          Alexander Radionov added a comment -

          Any update?

          Gen Nishimura added a comment -

          The call to FileUtils.lineIterator() in GitChangeLogParser.java omits the second parameter (the encoding), which seems to cause a fallback to the default encoding:
          lineIterator = FileUtils.lineIterator(changelogFile);

          Since the changelog is written in UTF-8 (in GitSCM.java), the following change should fix the issue (it worked for me):
          lineIterator = FileUtils.lineIterator(changelogFile, "UTF-8");

          Gennady Evstratov added a comment -

          This can be avoided by setting the default file encoding to UTF-8 in the system properties: add -Dfile.encoding=UTF-8 to the Java options when starting Jenkins, or set it through the JAVA_TOOL_OPTIONS environment variable.
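
To check whether the option took effect, a small program like the one below can print what the JVM reports. This is only a sketch with a hypothetical class name; note that on JDK 18 and later the default charset is UTF-8 unless overridden, so the difference is mainly visible on older JDKs.

```java
import java.nio.charset.Charset;

// Hypothetical helper for inspecting the JVM's default encoding.
class DefaultEncodingCheck {
    // Report both the system property and the charset the JVM actually uses
    // for readers and writers created without an explicit encoding.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

After compiling, run it once as `java DefaultEncodingCheck` and once as `java -Dfile.encoding=UTF-8 DefaultEncodingCheck` and compare the two lines of output.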

          Jörgen Lundberg added a comment -

          We also have this problem. We're running Jenkins as a Windows service.
          I added -Dfile.encoding=UTF-8 to the start-up parameters and restarted
          the service. The log messages still show garbled UTF-8 characters instead
          of, for example, å ä ö.

          Jörgen Lundberg added a comment -

          It was my mistake. I didn't realize the change log was written to disk, so the logs of earlier builds stay garbled. The UTF-8 setting works for all builds done after the server restart.

          hayarobi Park added a comment - edited

          I'm running Jenkins on Windows 7 (Korean version, default encoding CP949) with msysgit-1.9.5-xxx. The git repository is on a remote Linux machine.
          My issue is that the recent change messages are corrupted.

          After some testing and debugging of git-plugin, I found that, at least in my case, the log message received from the standard output of the external git.exe was already corrupted. So the changelog.xml file in the builds directory stores corrupted text.

          The git-client plugin runs the external git.exe with a command like "git.exe whatchanged --no-abbrev -M --pretty=raw df1cca6135b7019dbd583693b59f6b97f408f5c5", and git.exe writes the change log to standard output. The git-client plugin reads that output, and at this point the message it receives has already been wrongly converted.

          The original message was UTF-8, but it was assumed to be in the current OS encoding (CP949 on my computer), so a wrong encoding conversion (CP949 to UCS-2) occurred. I don't yet know which component does this wrong conversion: hudson LocalLauncher, git.exe, or something else.

          Gennady Trafimenkov added a comment -

          I might have a working fix for this issue. Here is the fix: https://github.com/gennady/git-client-plugin/commit/aef7fff3ff765e2f8fd2b270d89e3f6b462cc2de

          Give it a try if you don't mind.

          You can compile the plugin yourself with

          mvn package

          or try the already compiled version: https://github.com/gennady/git-client-plugin/raw/8383bd7c222b52e26b0d1b395b2eb26766f86cf7/compiled-plugin/git-client.hpi

          How to try:

          • stop Jenkins
          • remove git-client, git-client.hpi, git-client.jpi from the plugins folder
          • copy git-client.hpi to the plugins folder
          • start Jenkins

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Gennady Trafimenkov
          Path:
          src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          src/test/java/org/jenkinsci/plugins/gitclient/GitAPITestCase.java
          src/test/resources/unicodeCharsInChangelogRepo.zip
          src/test/resources/unicodeCharsInChangelogRepoCreate.sh
          http://jenkins-ci.org/commit/git-client-plugin/c99c91fcf497e784204398761be5c10f438d0e55
          Log:
          Fixed garbled commit messages on Windows

          On windows changelog commit messages with unicode characters are
          not saved correctly to changelog.xml when CliGitAPI
          implementation is in use.

          That happens because "git whatchanged" gives byte stream of data.
          Commit messages in that stream are encoded in UTF-8. It is
          necessary to explicitly decode bytestream to strings using UTF-8
          encoding, otherwise default system encoding will be used.

          This should fix issues:
          https://issues.jenkins-ci.org/browse/JENKINS-6203
          https://issues.jenkins-ci.org/browse/JENKINS-14798
          https://issues.jenkins-ci.org/browse/JENKINS-23091
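
The idea in the commit log, decoding a byte stream explicitly as UTF-8, can be sketched as follows. This is not the code from CliGitAPIImpl.java: the class and method names are hypothetical, and a ByteArrayInputStream stands in for the standard output of a git process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the plugin's implementation.
class Utf8StreamDecodeSketch {
    // Collect all bytes first, then decode once as UTF-8. Decoding the whole
    // buffer avoids splitting a multi-byte character at a chunk boundary.
    static String readAllAsUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: in real use the stream would be the stdout of a git
        // process, for example Process.getInputStream().
        byte[] fakeGitOutput = "Fix \u00e5\u00e4\u00f6 handling".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAllAsUtf8(new ByteArrayInputStream(fakeGitOutput)));
    }
}
```

Passing the charset explicitly makes the result independent of the platform default, so the same bytes decode to the same text on Windows and Linux.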

          Mark Waite added a comment -

          Resolved in git client plugin 1.19.3 released 6 Feb 2016

            Assignee: Nicolas De Loof (ndeloof)
            Reporter: bleis
            Votes: 10
            Watchers: 22