JENKINS-31096

Unicode characters in console logs do not print correctly in Workflow builds

Details

    Description

      On my production Jenkins system, Workflow jobs that run a shell script printing Unicode characters display the null/question mark placeholder in the console log view instead of the characters themselves. The same shell script, run by a freestyle job, prints correctly. I can't reproduce this on my test Jenkins server, so I suspect this is partly an environment issue, but I don't know what to check to confirm that, and it's interesting that it only impacts Workflow jobs.

      Workflow script which reproduces this:
      node {
          sh 'env printf "\u2024 \u2024 \u2024 \u2024 \u2024 \n"'
      }

          Activity

            jglick Jesse Glick added a comment -

            Probably dependent on system encoding. I do not recall ever testing with anything other than UTF-8.

            Are you using a slave? If so, does its encoding match that of the master, or differ? Use the locale command to check. Also from /systemInfo in Jenkins you can see the file.encoding according to Java.
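
A minimal sketch of the same check done from inside a JVM, for cases where /systemInfo is not handy. The class name `EncodingReport` is hypothetical; the two system properties and `Charset.defaultCharset()` are standard Java. Run it with the same user, environment and Java options as the Jenkins process, otherwise the values may differ from what Jenkins sees.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jenkins: reports the JVM's encoding settings.
class EncodingReport {

    /** Collects the encoding-related settings of the current JVM. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        // The property shown as file.encoding on the /systemInfo page.
        report.put("file.encoding", System.getProperty("file.encoding", "(unset)"));
        // Encoding the JVM uses for file names and other native strings.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding", "(unset)"));
        // Charset used when code does not name one explicitly.
        report.put("defaultCharset", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Comparing this output between the master and each agent should show whether the two sides disagree.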

            jglick Jesse Glick added a comment -

            BTW the behavior of freestyle projects is not of much interest from a diagnostic perspective, since Workflow uses a totally different system for both running external processes and for generating the build log than traditional Jenkins projects.

            owenmehegan Owen Mehegan added a comment - edited

            /systemInfo tells me the encoding is ANSI_X3.4-1968. I get the same failure whether running on the master or other nodes. I am setting LANGUAGE = en_US.UTF-8 and LC_ALL = en_US.UTF-8 in the environment variables config for my build nodes, but not for the master.

            Output of 'locale' on the master:

            LANG=
            LANGUAGE=
            LC_CTYPE="POSIX"
            LC_NUMERIC="POSIX"
            LC_TIME="POSIX"
            LC_COLLATE="POSIX"
            LC_MONETARY="POSIX"
            LC_MESSAGES="POSIX"
            LC_PAPER="POSIX"
            LC_NAME="POSIX"
            LC_ADDRESS="POSIX"
            LC_TELEPHONE="POSIX"
            LC_MEASUREMENT="POSIX"
            LC_IDENTIFICATION="POSIX"
            LC_ALL=
            

            Output of 'locale' on a build node:

            LANG=en_US.UTF-8
            LANGUAGE=en_US.UTF-8
            LC_CTYPE="en_US.UTF-8"
            LC_NUMERIC="en_US.UTF-8"
            LC_TIME="en_US.UTF-8"
            LC_COLLATE="en_US.UTF-8"
            LC_MONETARY="en_US.UTF-8"
            LC_MESSAGES="en_US.UTF-8"
            LC_PAPER="en_US.UTF-8"
            LC_NAME="en_US.UTF-8"
            LC_ADDRESS="en_US.UTF-8"
            LC_TELEPHONE="en_US.UTF-8"
            LC_MEASUREMENT="en_US.UTF-8"
            LC_IDENTIFICATION="en_US.UTF-8"
            LC_ALL=en_US.UTF-8
            jglick Jesse Glick added a comment -

            So your master is set to use ASCII, meaning it cannot process such characters in files at all. The workaround is to set your master to use UTF-8 encoding, as virtually all modern Linux distributions do. If you cannot set it systemwide, I think you can run the Jenkins Java process with -Dfile.encoding=UTF-8.
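
A small sketch of why an ASCII default cannot carry the characters from the reproduction script. `ANSI_X3.4-1968` is an alias that Java resolves to `US-ASCII`; the class and method names here (`AsciiLimit`, `canonicalName`, `canEncode`, `roundTrip`) are hypothetical and only illustrate the limitation, not how Jenkins writes its logs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Jenkins code.
class AsciiLimit {

    /** Canonical Java name for a charset name such as the one shown on /systemInfo. */
    static String canonicalName(String reported) {
        return Charset.forName(reported).name();
    }

    /** True if the charset can represent every character of the text. */
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    /** Encodes and decodes with one charset, as writing and re-reading a file would. */
    static String roundTrip(Charset charset, String text) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String dots = "\u2024 \u2024"; // U+2024 ONE DOT LEADER, as in the script above
        System.out.println(canonicalName("ANSI_X3.4-1968"));
        System.out.println(canEncode(StandardCharsets.US_ASCII, dots));
        System.out.println(canEncode(StandardCharsets.UTF_8, dots));
        // Unmappable characters are replaced with '?' when encoding to ASCII.
        System.out.println(roundTrip(StandardCharsets.US_ASCII, dots));
    }
}
```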

            Now I have had a debate with kohsuke about this but we do not see eye to eye. His perspective is that Jenkins should always store log files in the system native encoding, which I think is mainly useful for Japanese users who are accustomed to using one of the two or three traditional encodings in that country; it is only helpful if you are directly accessing the log file from the filesystem (over the web in console it is rendered according to the encoding of the browser, or the REST client in the case of consoleText). My position is that all text files stored by Jenkins should be in UTF-8; external processes forked on a slave can be assumed to produce output in the system native encoding, which should then be recoded internally to UTF-8.

            Now the twist here is freestyle vs. Workflow. In the case of a freestyle build, the system encoding on the slave is noted when the build starts, recorded as part of the build record, and used to save the log. That allows the build to handle varying encodings in one CI server. But such a trick is impossible for Workflow, which starts the build and then gets a slave, or several, or none. So it cannot use a slave-specific charset. It could either use the master’s charset, as it currently does (meaning it cannot track output from slaves using a charset which is a superset of that); or it could use UTF-8 unconditionally, which might be a better idea (though I would need some flag to indicate old build logs saved in another character set).
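
To make the mismatch concrete, here is a sketch of what happens when bytes written in one charset are read back under another. It assumes, purely for illustration, that the process writes UTF-8 and the reader assumes US-ASCII; whether this is the exact path taken inside Workflow is not established here. `CharsetMismatch` and `misread` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Jenkins code.
class CharsetMismatch {

    /** Decodes bytes written in one charset as if they had been written in another. */
    static String misread(String text, Charset written, Charset assumed) {
        byte[] onDisk = text.getBytes(written);
        return new String(onDisk, assumed);
    }

    public static void main(String[] args) {
        String dots = "\u2024 \u2024";
        // Matching charsets: the text survives.
        System.out.println(misread(dots, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // UTF-8 bytes read as ASCII: bytes above 0x7F become U+FFFD placeholders.
        System.out.println(misread(dots, StandardCharsets.UTF_8, StandardCharsets.US_ASCII));
    }
}
```

The spaces survive because ASCII is a subset of UTF-8; only the characters outside that subset are lost.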

            In either case, process output should probably be recoded to that expected by the build log. This is devilishly hard to prove in an automated test, I am afraid.

            owenmehegan Owen Mehegan added a comment - edited

            FWIW, even setting it systemwide using locale didn't fix this, I had to add -Dfile.encoding=UTF-8 to my java arguments. Now my logs display nicely. Thanks for the explanation!

            jglick Jesse Glick added a comment -

            As part of the JENKINS-38381 refactoring, I have switched the log handling for Pipeline builds to use UTF-8 everywhere, adjusted the sh/bat steps to convert output from the agent’s system default charset to UTF-8, and written a test demonstrating that ISO-8859-2 text is correctly processed.
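
The general idea of converting process output from the agent's charset to UTF-8 can be sketched with standard Java I/O. This is only an illustration under assumptions, not the code from the plugins: `RecodeToUtf8` and both `recode` methods are hypothetical names, and the ISO-8859-2 sample bytes are produced in the example itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the plugin implementation.
class RecodeToUtf8 {

    /** Reads text in the source charset and writes the same text as UTF-8. */
    static void recode(InputStream in, Charset source, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, source);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }
        writer.flush();
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] recode(byte[] agentBytes, Charset source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            recode(new ByteArrayInputStream(agentBytes), source, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        String text = "\u0141\u00f3d\u017a"; // a Polish word with three non-ASCII letters
        byte[] fromAgent = text.getBytes(latin2); // what a Latin-2 agent would emit
        byte[] forLog = recode(fromAgent, latin2); // what a UTF-8 log would store
        System.out.println(fromAgent.length + " bytes in, " + forLog.length + " bytes out");
        System.out.println(new String(forLog, StandardCharsets.UTF_8).equals(text));
    }
}
```

Decoding through a `Reader` rather than converting fixed-size byte chunks avoids splitting a multi-byte character across buffer boundaries.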

            marslo Marslo Jiao added a comment - edited

            I'm running newman (postman) script in Jenkins. The console output shows:

             

            But when I click View as plain text, the characters show correctly:

             

            Here is more system information:

            • Jenkins Version: 2.70
            • Job Type: Free Style
            • Slave and Jenkins Master OS: Ubuntu 16.04 LTS
            • Jenkins Master JAVA_ARG: JAVA_ARGS=" -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=utf-8"
            • Jenkins Master installed by: apt install jenkins.
            • Jenkins System Properties:
              • file.encoding: UTF-8
              • file.encoding.pkg: sun.io
              • sun.io.unicode.encoding: UnicodeLittle
              • sun.jnu.encoding: utf-8
              • java.runtime.name: Java(TM) SE Runtime Environment
              • java.runtime.version: 1.8.0_131-b11
              • user.language: en
              • sun.java.launcher: SUN_STANDARD
            • Jenkins Environment Variable
              • LANG: en_US.UTF-8
              • LC_ALL: en_US.UTF-8

             

            Output of the locale command on both the Jenkins master and the slave:

            LANG=en_US.UTF-8
            LANGUAGE=en_US.UTF-8
            LC_CTYPE="en_US.UTF-8"
            LC_NUMERIC="en_US.UTF-8"
            LC_TIME="en_US.UTF-8"
            LC_COLLATE="en_US.UTF-8"
            LC_MONETARY="en_US.UTF-8"
            LC_MESSAGES="en_US.UTF-8"
            LC_PAPER="en_US.UTF-8"
            LC_NAME="en_US.UTF-8"
            LC_ADDRESS="en_US.UTF-8"
            LC_TELEPHONE="en_US.UTF-8"
            LC_MEASUREMENT="en_US.UTF-8"
            LC_IDENTIFICATION="en_US.UTF-8"
            LC_ALL=en_US.UTF-8

             

            I've added these comments here as well.

            jglick Jesse Glick added a comment -

            marslo this issue is about Pipeline, so please do not comment on issues affecting freestyle builds.


            tsvi Tsvi Mostovicz added a comment -

            I have exactly the same issue as owenmehegan. Production won't show Unicode characters correctly, development does.

            I tried the workaround mentioned: adding -Dfile.encoding=UTF-8 as well as -Dsun.jnu.encoding=utf-8 to JENKINS_JAVA_OPTIONS under /etc/sysconfig/jenkins (Using RedHat systems here), but it didn't work. (I did confirm that the options were updated by looking at <JENKINS_URL>/systemInfo)

            The only difference I can see is that the development master is RHEL 6 and the production uses RHEL 7.

            The output is given by a python module which is run using the PyEnv plugin.

            Any other workarounds in the meantime?

            jglick Jesse Glick added a comment -

            Relevant PRs have been brought up to date once again.

            jglick Jesse Glick added a comment -

            JENKINS-48923 proposes an analogous change to Jenkins overall.

            jglick Jesse Glick added a comment -

            Released. Please check the plugin wikis for changelog information—it is important to update all four affected plugins.


            People

              jglick Jesse Glick
              owenmehegan Owen Mehegan
              Votes: 7
              Watchers: 13