Jenkins / JENKINS-50162

NUnit test results don't encode characters correctly in unit test report

      Description

      When Jenkins evaluates NUnit test results, the test report shows wrongly encoded characters: some characters are turned into question marks.

      Specifically our test reports contain the following test case and similar ones:

      <test-case name="Test.Namespace.TestSuite.SaveAlphaNumericals(&quot;Ö&quot;)" executed="True" result="Success" success="True" time="5.411" asserts="19"/>

      (test name simplified and anonymized)

      In the test report view of Jenkins this test case is displayed as:

      SaveAlphaNumericals("??")

      The special characters are turned to ??.

      This happens for all test cases containing special characters such as Ä, Ö, Ü, or °, so these cases can no longer be distinguished in the test results.

          Activity

          exomo Kai Bublitz created issue -
          slide_o_mix Alex Earl added a comment -

          Can you provide an nunit output file?

          exomo Kai Bublitz added a comment -

          I have to clarify whether I'm allowed to provide one of our real test files. If not, I can try to create a simple test result that reproduces the issue.

          slide_o_mix Alex Earl added a comment -

          Either way works for me

          exomo Kai Bublitz made changes -
          Field: Description
          Both recorded values repeat the description above, except that the displayed test name is written as a link:
          [SaveAlphaNumericals("??")|http://our.jenkins.url/job/Run%20Acceptance%20Tests/12345/testReport/Test.Namespace/TestSuite/SaveAlphaNumericals______]
          exomo Kai Bublitz made changes -
          Attachment TestResult.xml [ 41833 ]
          exomo Kai Bublitz added a comment -

          I have attached a TestResult.xml containing some cases of a test called TestWithParameters that take umlaut characters as parameters.

          Both test cases are displayed as UnitTests.HelloWorldTests.TestWithParameters("??")

          slide_o_mix Alex Earl added a comment -

          I can reproduce this easily when running unit tests, but I am not sure how to proceed. I already use UTF-8 for reading files. From what I can gather, the class I use to replace invalid XML entities is replacing these characters. I am unsure why right now. I'll continue to look into it.

          smorita Shin-ichi Morita added a comment - - edited

          Hi team, thank you for the valuable plugin.

          InvalidXmlInputStream.isValid seems to assume that its input is a decoded character, but what it actually receives is an encoded byte that has gone through a widening primitive conversion to int.

          According to https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
          > The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).

          Therefore, all non-ASCII bytes are converted to negative ints by the widening primitive conversion and are treated as invalid.

          In addition, I think an InputStream cannot assume any character encoding.

          Character encoding should be handled by the upper layers, such as XML parsers or text readers, although I don't know what mechanism Transformers use internally.

          Regards,
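
The widening behaviour described in the comment above can be reproduced in a few lines of Java. This is only a minimal sketch: isValidXmlChar is a hypothetical stand-in that follows the XML 1.0 Char ranges, not the plugin's actual InvalidXmlInputStream.isValid, and the class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ByteWideningDemo {
    // Hypothetical check written for decoded characters (XML 1.0 Char ranges).
    static boolean isValidXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // U+00D6 (O with diaeresis) is encoded as the two bytes 0xC3 0x96 in UTF-8.
        byte[] utf8 = "\u00D6".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            int widened = b; // widening primitive conversion keeps the sign
            System.out.println(widened + " valid=" + isValidXmlChar(widened));
        }
        // prints "-61 valid=false" and "-106 valid=false"
    }
}
```

Under these assumptions each of the two bytes fails the check on its own, which would match one character turning into two question marks.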

          slide_o_mix Alex Earl added a comment -

          The invalid XML stream was added to handle broken XML files. The plugin used to just break completely; this was a way to continue parsing if possible. I agree it is not the best solution, but the upper XML parsers and so forth didn't handle the issues that were run into.

          smorita Shin-ichi Morita added a comment - - edited

          I understand the difficulty.

          As a second-best solution, how about converting negative values to unsigned ints in isValid()?

          input = input & 0xff;

          It would work for me as long as the character encoding is UTF-8.

          Regards,
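
For reference, the following minimal sketch shows only what the proposed mask does to a single byte. The class and method names are hypothetical and are not taken from the plugin. Note that the masked values are still the individual bytes of the UTF-8 sequence (0xC3 and 0x96 here), not the decoded code point U+00D6.

```java
import java.nio.charset.StandardCharsets;

final class MaskDemo {
    /** Returns the unsigned value (0..255) of a byte, the same range InputStream.read() reports. */
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00D6".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("signed=%d unsigned=0x%02X%n", b, toUnsigned(b));
        }
        // prints "signed=-61 unsigned=0xC3" and "signed=-106 unsigned=0x96"
    }
}
```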

          smorita Shin-ichi Morita added a comment -

          I'm sorry, my last comment is totally wrong. Please forget it.

          In order to handle invalid characters correctly, maybe the following should be done:

          1. Decode chars from utf-8 bytes.
          2. Replace invalid chars.
          3. Encode the chars to utf-8 bytes.
          4. Pass the utf-8 bytes to Transformers.
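
The first three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's code: XmlCharSanitizer, sanitize, and isValidXmlChar are hypothetical names, the validity check follows the XML 1.0 Char ranges, the whole input is held in memory, and invalid characters are replaced with '?'.

```java
import java.nio.charset.StandardCharsets;

final class XmlCharSanitizer {
    // Hypothetical check based on the XML 1.0 Char ranges.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static byte[] sanitize(byte[] utf8Input) {
        // Step 1: decode chars from UTF-8 bytes.
        String decoded = new String(utf8Input, StandardCharsets.UTF_8);
        // Step 2: replace invalid chars, working on code points rather than bytes.
        StringBuilder cleaned = new StringBuilder(decoded.length());
        decoded.codePoints().forEach(cp ->
                cleaned.appendCodePoint(isValidXmlChar(cp) ? cp : '?'));
        // Step 3: encode the chars back to UTF-8 bytes.
        return cleaned.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = "Save(\"\u00D6\")\u0001".getBytes(StandardCharsets.UTF_8);
        String result = new String(sanitize(input), StandardCharsets.UTF_8);
        // The umlaut is kept; only the control character U+0001 becomes '?'.
        System.out.println(result.equals("Save(\"\u00D6\")?"));
    }
}
```

For step 4, the returned bytes could be wrapped in a java.io.ByteArrayInputStream before being handed to the transformer.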

          But I prefer treating broken XML files as build errors, because the core problem is that something generates broken XML files.

          As a CI manager, I want to detect such problems as soon as possible instead of hiding them, so that I can investigate and correct them.

          One possible solution might be to make this behavior (replacing non-ASCII bytes with ?) optional.

          Regards,

           

          slide_o_mix Alex Earl added a comment -

          That may be what you prefer, but there are several people who use the plugin and want something different. So, I need to determine the best way to handle it.

          smorita Shin-ichi Morita added a comment -

          OK, I understand that.

          Let me give some background on our situation. We write most of our test names and assertion messages in Japanese, so our job shows unreadable test results like the following:

          "?"???????????????? 6 ms OK
          "?"???????????????? 39 ms OK
          9?13?????????? 6 ms OK
          ??????#: ???????????? 6 ms OK
          ??????#: ?????????????????? 6 ms OK

          It would be nice to have a way to avoid replacing non-ASCII bytes with ?.

          Thank you.

          slide_o_mix Alex Earl made changes -
          Assignee Alex Earl [ slide_o_mix ]
          smorita Shin-ichi Morita made changes -
          Assignee Shin-ichi Morita [ smorita ]
          smorita Shin-ichi Morita made changes -
          Released As https://github.com/jenkinsci/nunit-plugin/releases/tag/nunit-0.27
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]

            People

            Assignee:
            smorita Shin-ichi Morita
            Reporter:
            exomo Kai Bublitz
            Votes:
            5
            Watchers:
            7