Jenkins / JENKINS-6203

Git plugin uses default encoding to read change log file

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Component/s: git-plugin
    • Labels:
      None
    • Environment:
      Windows Server 2003 R2, Cygwin 1.7.2, git 1.7.0.4, Hudson 1.353, GIT plugin 0.8.2
      Description

      [Problem]
      When viewing the changes for a build, the commit log is garbled.

      [Cause]
      Git assumes that commit logs are written in UTF-8 unless the "i18n.commitencoding" property is set.

      The hudson.plugins.git.GitChangeLogParser class uses java.io.FileReader at line 27.
      java.io.FileReader reads files using the platform default character encoding.
      The Japanese version of Windows uses "MS932" as its default character encoding.

      So when Hudson runs on such a system, the commit log is garbled.
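
The mismatch described under [Cause] can be reproduced in isolation. The following is a minimal sketch, not code from the plugin: the class and method names are hypothetical, and ISO-8859-1 is used only as a stand-in for a non-UTF-8 platform default such as MS932, because it is guaranteed to be available in every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the git plugin.
final class MojibakeDemo {
    /** Encodes text as UTF-8 (as git does), then decodes the bytes with the reader's charset. */
    static String decodeUtf8BytesAs(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        // Non-ASCII sample text, written with Unicode escapes so the source encoding does not matter.
        String original = "\u00e5\u00e4\u00f6";
        // Reader agrees with the writer: the text survives.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.UTF_8));
        // Reader assumes a different charset: each UTF-8 byte becomes a separate wrong character.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1));
    }
}
```

The second call is the situation a reader that relies on the platform default ends up in whenever that default is not UTF-8.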

      [Solution]
      Use java.io.FileInputStream and java.io.InputStreamReader instead of java.io.FileReader.
      java.io.InputStreamReader has a constructor that accepts the encoding.

      [Example]
      BufferedReader rdr = null;
      try {
          // fetch the encoding from the configuration
          rdr = new BufferedReader(new InputStreamReader(new FileInputStream(changelogFile), encoding));
          /* ... */
      } finally {
          // close rdr whether or not the constructors threw an exception
          if (rdr != null) rdr.close();
      }

            Activity

            bleis bleis created issue -
            sogabe sogabe made changes -
            Field Original Value New Value
            Assignee magnayn [ magnayn ]
            abayer Andrew Bayer added a comment -

            Resolved

            abayer Andrew Bayer made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            dogfood dogfood added a comment -

            Integrated in plugins_hudson-git-plugin #35
            adept Anton Smirnov added a comment -

            1. Perform a git commit in UTF-8 (Russian alphabet).
            2. The build finishes with a wrong changes description (??? instead of Russian letters).

            "git log" displays the right description (Russian letters) in the cloned Jenkins repository (in '..\jobs%project_name%\workspace).
            TortoiseGit displays the right log messages (in UTF-8, i.e. the commit was definitely made in UTF-8).

            environment:
            windows 7 professional,
            system default encoding is win1251,
            jenkins 1.405,
            git-plugin 1.1.6,
            msysgit-1.7.? (git 1.7.1)

            adept Anton Smirnov made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            adept Anton Smirnov made changes -
            Assignee magnayn [ magnayn ] Anton Smirnov [ adept ]
            adept Anton Smirnov added a comment -

            Debugged it and checked:
            private String computeChangeLog(...) in GitSCM.java
            returns the right log description, but the problem remains.

            The Jenkins build page still contains the wrong text.
            Moving the issue to Jenkins issues...

            adept Anton Smirnov made changes -
            Resolution Not A Defect [ 7 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            adept Anton Smirnov made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            shadowcat Dmitry Salashnik added a comment -

            As far as I can see, changelog.xml contains correct UTF-8 text. BUT it is not an XML file!
            The problem may be this:
            the plugin parses the git output correctly and stores it in the file correctly, but Jenkins does not know which encoding was used (all data in changelog.xml is stored as plain text) and uses the default encoding when reading the file...

            shadowcat Dmitry Salashnik made changes -
            Assignee Anton Smirnov [ adept ] Dmitry Salashnik [ shadowcat ]
            Resolution Not A Defect [ 7 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            shadowcat Dmitry Salashnik made changes -
            Assignee Dmitry Salashnik [ shadowcat ] abayer [ abayer ]
            evernat evernat added a comment -

            Possibly fixed by this pull request:
            https://github.com/jenkinsci/git-plugin/pull/98

            alex_orlando Alex Orlando added a comment -

            This is still broken. It's been almost 3 years...

            Environment
            ===========
            Ubuntu 12.04.2
            git 1.7.9.5
            Jenkins 1.502
            Jenkins GIT client plugin 1.0.2
            Jenkins GIT plugin 1.2.0

            evernat evernat added a comment -

            @Alex
            Do you want to maintain the git plugin?

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Nicolas De Loof
            Path:
            src/main/java/hudson/plugins/git/GitChangeLogParser.java
            src/main/java/hudson/plugins/git/GitSCM.java
            http://jenkins-ci.org/commit/git-plugin/cdf149a73a51861bff07897aa42fa0535b88f99e
            Log:
            [FIXED JENKINS-6203] force UTF-8 when reading change log
            according to https://www.kernel.org/pub/software/scm/git/docs/git-log.html, git CLI uses UTF-8 by default to produce change log entries, so no impact on git-client.


            ndeloof Nicolas De Loof made changes -
            Assignee abayer [ abayer ] Nicolas De Loof [ ndeloof ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            mr_const Konst Kolesnichenko added a comment -

            It seems I have the same issue with the latest Jenkins and git. The job's build changelog shows garbled Cyrillic characters (they are UTF-8 but displayed as CP1251). Here is a screenshot: http://i.imgur.com/wLuXUWV.png

            mr_const Konst Kolesnichenko made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            andrey_x64 Andrey Rogozhnikov added a comment - edited

            The same: http://i.imgur.com/63mSQKC.jpg
            alexradionov Alexander Radionov added a comment -

            Any update?

            gm7add9 Gen Nishimura added a comment -

            The call to FileUtils.lineIterator() in GitChangeLogParser.java lacks the second parameter (encoding), which seems to cause the fallback to the default encoding.
            lineIterator = FileUtils.lineIterator(changelogFile);

            Since the changelog is written in UTF-8 (in GitSCM.java), the following change should clear the issue (worked for me).
            lineIterator = FileUtils.lineIterator(changelogFile, "UTF-8");
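
The same idea, reading line by line with an explicitly named charset, can be sketched with the JDK alone. This is an illustrative analogue and not the plugin's code: the class, the method names, and the temporary file are assumptions, and java.nio.file.Files.newBufferedReader stands in for the Commons IO call quoted above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the git plugin.
final class Utf8ChangelogReader {
    /** Reads a file line by line, always decoding as UTF-8 regardless of the platform default. */
    static List<String> readLinesUtf8(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Demo helper: writes text as UTF-8 to a temporary file and reads it back. */
    static List<String> roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("changelog", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readLinesUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic sample text written with Unicode escapes.
        System.out.println(roundTrip("commit \u043f\u0440\u0438\u0432\u0435\u0442\nsecond line"));
    }
}
```

Because the charset is passed explicitly, the result does not change with the JVM's default encoding.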

            egv Gennady Evstratov added a comment -

            This can be avoided by setting the default file encoding to UTF-8 in the system properties: add -Dfile.encoding=UTF-8 to the Java options when starting Jenkins, or set it through the JAVA_TOOL_OPTIONS environment variable.
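
To confirm which encoding a JVM actually picked up after applying such a setting, a small check along these lines may help. It is a sketch only: the class and method names are hypothetical, and it reports the values for the JVM it runs in, so it would need to be started with the same options as Jenkins to be meaningful.

```java
import java.nio.charset.Charset;

// Hypothetical helper; run it with the same JVM options that Jenkins is started with.
final class DefaultCharsetCheck {
    /** Returns the file.encoding system property and the charset the JVM uses by default. */
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```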

            jorgen99 Jörgen Lundberg added a comment -

            We also have this problem. We're running Jenkins as a Windows server
            and I added -Dfile.encoding=UTF-8 to the start-up parameters and restarted
            the service. The log messages still show the UTF-8 byte sequences instead
            of, for example, å ä ö.

            jorgen99 Jörgen Lundberg added a comment -

            My mistake. I didn't realize the log was written to disk. So the UTF-8 setting works for all builds done after the server restart.

            hayarobipark hayarobi Park added a comment - edited

            I'm running Jenkins on Windows 7 (Korean version, encoding is CP949) with msysgit-1.9.5-xxx. The git repository is on a remote Linux machine.
            I have the issue that the recent changes message is corrupted.

            After some testing and debugging of git-plugin, I found that, at least in my case, the log message received from the standard output of the external git.exe was already corrupted. So the changelog.xml file in the builds directory stores corrupted text.

            The git-client plugin executes the external git.exe with a command like "git.exe whatchanged --no-abbrev -M --pretty=raw df1cca6135b7019dbd583693b59f6b97f408f5c5", and git.exe writes the change log to standard output. The git-client plugin reads that output. At this point, the message received by the git-client plugin has already been wrongly converted.

            The original message was UTF-8, but it was assumed to be in the current OS encoding (CP949 on my computer), so a wrong encoding conversion (CP949 to UCS-2) occurred. I don't yet know which component does this wrong conversion: hudson LocalLauncher, git.exe, or something else.

            eratolekov Era Tolekov made changes -
            Link This issue is related to JENKINS-23091 [ JENKINS-23091 ]
            gtrafimenkov Gennady Trafimenkov added a comment -

            I might have a working fix for this issue. Here is the fix: https://github.com/gennady/git-client-plugin/commit/aef7fff3ff765e2f8fd2b270d89e3f6b462cc2de

            Give it a try if you don't mind.

            You can compile the plugin yourself with

            mvn package

            or try the already compiled version: https://github.com/gennady/git-client-plugin/raw/8383bd7c222b52e26b0d1b395b2eb26766f86cf7/compiled-plugin/git-client.hpi

            How to try:

            • stop Jenkins
            • remove git-client, git-client.hpi, and git-client.jpi from the plugins folder
            • copy git-client.hpi to the plugins folder
            • start Jenkins
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Gennady Trafimenkov
            Path:
            src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
            src/test/java/org/jenkinsci/plugins/gitclient/GitAPITestCase.java
            src/test/resources/unicodeCharsInChangelogRepo.zip
            src/test/resources/unicodeCharsInChangelogRepoCreate.sh
            http://jenkins-ci.org/commit/git-client-plugin/c99c91fcf497e784204398761be5c10f438d0e55
            Log:
            Fixed garbled commit messages on Windows

            On windows changelog commit messages with unicode characters are
            not saved correctly to changelog.xml when CliGitAPI
            implementation is in use.

            That happens because "git whatchanged" gives byte stream of data.
            Commit messages in that stream are encoded in UTF-8. It is
            necessary to explicitly decode bytestream to strings using UTF-8
            encoding, otherwise default system encoding will be used.

            This should fix issues:
            https://issues.jenkins-ci.org/browse/JENKINS-6203
            https://issues.jenkins-ci.org/browse/JENKINS-14798
            https://issues.jenkins-ci.org/browse/JENKINS-23091
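
The approach described in the commit log, decoding the byte stream explicitly as UTF-8, can be illustrated roughly as follows. This is a sketch under assumptions and not the code from CliGitAPIImpl.java: the class and method names are hypothetical, and a ByteArrayInputStream stands in for the standard output of the git process.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical demo class; not the git-client plugin's implementation.
final class ProcessOutputDecoder {
    /** Decodes a raw byte stream as UTF-8 instead of relying on the platform default. */
    static String decodeUtf8(InputStream rawOutput) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(rawOutput, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated process output: a UTF-8 encoded commit message with Cyrillic text.
        byte[] simulated = "commit message: \u043f\u0440\u0438\u0432\u0435\u0442"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(new ByteArrayInputStream(simulated)));
    }
}
```

With a real process, the same reader would wrap the stream returned by Process.getInputStream().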

            markewaite Mark Waite added a comment -

            Resolved in git client plugin 1.19.3 released 6 Feb 2016

            markewaite Mark Waite made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 136296 ] JNJira + In-Review [ 187164 ]
            markewaite Mark Waite made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

              People

              Assignee:
              ndeloof Nicolas De Loof
              Reporter:
              bleis bleis
              Votes:
              10
              Watchers:
              22
