
Unicode characters in console logs do not print correctly in Workflow builds

      On my production Jenkins system, Workflow jobs which run a shell script that happens to print Unicode characters instead display the null/question mark placeholder in the console log view. The same shell script, run by a freestyle job, prints correctly. I can't reproduce this on my test Jenkins server, so I suspect this is partly an environment issue, but I don't know what to check to confirm that and it's interesting that it only impacts Workflow jobs.

      Workflow script which reproduces this:
      node {
        sh 'env printf "\u2024 \u2024 \u2024 \u2024 \u2024 \n"'
      }

          [JENKINS-31096] Unicode characters in console logs do not print correctly in Workflow builds

          Owen Mehegan added a comment - - edited

          /systemInfo tells me the encoding is ANSI_X3.4-1968. I get the same failure whether running on the master or other nodes. I am setting LANGUAGE = en_US.UTF-8 and LC_ALL = en_US.UTF-8 in the environment variables config for my build nodes, but not for the master.

          Output of 'locale' on the master:

          LANG=
          LANGUAGE=
          LC_CTYPE="POSIX"
          LC_NUMERIC="POSIX"
          LC_TIME="POSIX"
          LC_COLLATE="POSIX"
          LC_MONETARY="POSIX"
          LC_MESSAGES="POSIX"
          LC_PAPER="POSIX"
          LC_NAME="POSIX"
          LC_ADDRESS="POSIX"
          LC_TELEPHONE="POSIX"
          LC_MEASUREMENT="POSIX"
          LC_IDENTIFICATION="POSIX"
          LC_ALL=
          

          Output of 'locale' on a build node:

          LANG=en_US.UTF-8
          LANGUAGE=en_US.UTF-8
          LC_CTYPE="en_US.UTF-8"
          LC_NUMERIC="en_US.UTF-8"
          LC_TIME="en_US.UTF-8"
          LC_COLLATE="en_US.UTF-8"
          LC_MONETARY="en_US.UTF-8"
          LC_MESSAGES="en_US.UTF-8"
          LC_PAPER="en_US.UTF-8"
          LC_NAME="en_US.UTF-8"
          LC_ADDRESS="en_US.UTF-8"
          LC_TELEPHONE="en_US.UTF-8"
          LC_MEASUREMENT="en_US.UTF-8"
          LC_IDENTIFICATION="en_US.UTF-8"
          LC_ALL=en_US.UTF-8
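
For readers wondering how the ANSI_X3.4-1968 value from /systemInfo relates to ASCII, here is a minimal Java sketch. It is not Jenkins code, and the class and method names (`CharsetCheck`, `canonicalName`, `describeJvm`) are made up for illustration. It assumes the JDK registers ANSI_X3.4-1968 as an alias of US-ASCII, which I believe is the case on common JDKs, and it prints what the running JVM considers its default charset.

```java
import java.nio.charset.Charset;

class CharsetCheck {
    // Resolve a charset name (possibly an alias) to its canonical name.
    static String canonicalName(String reported) {
        return Charset.forName(reported).name();
    }

    // Summarize the encoding settings of the JVM running this code.
    static String describeJvm() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The name shown on /systemInfo in the report above.
        System.out.println(canonicalName("ANSI_X3.4-1968"));
        // Depends on the locale and on -Dfile.encoding, so no fixed output.
        System.out.println(describeJvm());
    }
}
```

Running this with and without `-Dfile.encoding=UTF-8` on the affected machine should show whether the flag changes the default charset there.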


          Jesse Glick added a comment -

          So your master is set to use ASCII, meaning it cannot process such characters in files at all. The workaround is to set your master to use UTF-8 encoding, as virtually all modern Linux distributions do. If you cannot set it systemwide, I think you can run the Jenkins Java process with -Dfile.encoding=UTF-8.
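
As a rough illustration of why an ASCII default produces question marks, the following Java sketch encodes a string to bytes in a given charset and decodes it back. This is not how Jenkins itself is written; `AsciiRoundTrip` and `roundTrip` are hypothetical names. It relies only on the documented behavior of `String.getBytes(Charset)`, which substitutes the charset's replacement byte for unmappable characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiRoundTrip {
    // Encode text to bytes in the given charset, then decode those bytes again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+2024 (ONE DOT LEADER), the character used in the reproduction script.
        String dots = "\u2024 \u2024";

        // US-ASCII cannot represent U+2024, so each one becomes '?'.
        System.out.println(roundTrip(dots, StandardCharsets.US_ASCII).equals("? ?")); // prints true

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(roundTrip(dots, StandardCharsets.UTF_8).equals(dots)); // prints true
    }
}
```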

          Now I have had a debate with kohsuke about this but we do not see eye to eye. His perspective is that Jenkins should always store log files in the system native encoding, which I think is mainly useful for Japanese users who are accustomed to using one of the two or three traditional encodings in that country; it is only helpful if you are directly accessing the log file from the filesystem (over the web in console it is rendered according to the encoding of the browser, or the REST client in the case of consoleText). My position is that all text files stored by Jenkins should be in UTF-8; external processes forked on a slave can be assumed to produce output in the system native encoding, which should then be recoded internally to UTF-8.

          Now the twist here is freestyle vs. Workflow. In the case of a freestyle build, the system encoding on the slave is noted when the build starts, recorded as part of the build record, and used to save the log. That allows the build to handle varying encodings in one CI server. But such a trick is impossible for Workflow, which starts the build and then gets a slave, or several, or none. So it cannot use a slave-specific charset. It could either use the master’s charset, as it currently does (meaning it cannot track output from slaves using a charset which is a superset of that); or it could use UTF-8 unconditionally, which might be a better idea (though I would need some flag to indicate old build logs saved in another character set).

          In either case, process output should probably be recoded to that expected by the build log. This is devilishly hard to prove in an automated test, I am afraid.


          Owen Mehegan added a comment - - edited

          FWIW, even setting it systemwide using locale didn't fix this; I had to add -Dfile.encoding=UTF-8 to my Java arguments. Now my logs display nicely. Thanks for the explanation!


          Jesse Glick added a comment -

          As part of the JENKINS-38381 refactoring, I have switched the log handling for Pipeline builds to use UTF-8 everywhere, adjusted the sh/bat steps to convert output from the agent’s system default charset to UTF-8, and written a test demonstrating that ISO-8859-2 text is correctly processed.
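
The conversion described above can be sketched in a few lines of Java. This is only an illustration of recoding bytes from one charset to UTF-8, not the actual plugin code; `Recode` and `recode` are hypothetical names, and it assumes the JDK ships the ISO-8859-2 charset, which is not among the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Recode {
    // Decode bytes using the source charset, then re-encode them in the target charset.
    static byte[] recode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-2 is available in this JDK.
        Charset latin2 = Charset.forName("ISO-8859-2");

        // 0xE8 is U+010D (c with caron) in ISO-8859-2.
        byte[] agentOutput = {(byte) 0xE8};

        // In UTF-8 the same character is the two bytes 0xC4 0x8D.
        byte[] utf8 = recode(agentOutput, latin2, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, new byte[] {(byte) 0xC4, (byte) 0x8D})); // prints true
    }
}
```

Decoding with the agent's charset first is the important step: writing the raw bytes straight into a UTF-8 log would leave an invalid byte sequence.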


          Marslo Jiao added a comment - - edited

          I'm running newman (postman) script in Jenkins. The console output shows:

           

          But when I click View as plain text, the characters display correctly:

           

          Here is more system information:

          • Jenkins Version: 2.70
          • Job Type: Free Style
          • Slave and Jenkins Master OS: Ubuntu 16.04 LTS
          • Jenkins Master JAVA_ARG: JAVA_ARGS=" -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=utf-8"
          • Jenkins Master installed by: apt install jenkins.
          • Jenkins System Properties:
            • file.encoding: UTF-8
            • file.encoding.pkg: sun.io
            • sun.io.unicode.encoding: UnicodeLittle
            • sun.jnu.encoding: utf-8
            • java.runtime.name: Java(TM) SE Runtime Environment
            • java.runtime.version: 1.8.0_131-b11
            • user.language: en
            • sun.java.launcher: SUN_STANDARD
          • Jenkins Environment Variable
            • LANG: en_US.UTF-8
            • LC_ALL: en_US.UTF-8

           

          Output of the locale command on both the Jenkins master and the slave:

          LANG=en_US.UTF-8
          LANGUAGE=en_US.UTF-8
          LC_CTYPE="en_US.UTF-8"
          LC_NUMERIC="en_US.UTF-8"
          LC_TIME="en_US.UTF-8"
          LC_COLLATE="en_US.UTF-8"
          LC_MONETARY="en_US.UTF-8"
          LC_MESSAGES="en_US.UTF-8"
          LC_PAPER="en_US.UTF-8"
          LC_NAME="en_US.UTF-8"
          LC_ADDRESS="en_US.UTF-8"
          LC_TELEPHONE="en_US.UTF-8"
          LC_MEASUREMENT="en_US.UTF-8"
          LC_IDENTIFICATION="en_US.UTF-8"
          LC_ALL=en_US.UTF-8

           

          I've also added the comments here.


          Jesse Glick added a comment -

          marslo this issue is about Pipeline, so please do not comment on issues affecting freestyle builds.


          Tsvi Mostovicz added a comment -

          I have exactly the same issue as owenmehegan. Production won't show Unicode characters correctly, development does.

          I tried the workaround mentioned: adding -Dfile.encoding=UTF-8 as well as -Dsun.jnu.encoding=utf-8 to JENKINS_JAVA_OPTIONS under /etc/sysconfig/jenkins (Using RedHat systems here), but it didn't work. (I did confirm that the options were updated by looking at <JENKINS_URL>/systemInfo)

          The only difference I can see is that the development master is RHEL 6 and the production uses RHEL 7.

          The output is given by a python module which is run using the PyEnv plugin.

          Any other workarounds in the meantime?

          Jesse Glick added a comment -

          Relevant PRs have been brought up to date once again.


          Jesse Glick added a comment -

          JENKINS-48923 proposes an analogous change to Jenkins overall.


          Jesse Glick added a comment -

          Released. Please check the plugin wikis for changelog information—it is important to update all four affected plugins.


            jglick Jesse Glick
            owenmehegan Owen Mehegan