
Open Tasks Scanner throws java.nio.charset.UnmappableCharacterException

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor
    • warnings-ng-plugin
    • None
    • Windows 10 VM, Jenkins 2.156, warnings-ng 1.0.0

      Since migrating to warnings-ng 1.0.0, open task scanner throws "java.nio.charset.UnmappableCharacterException" on a handful of files that the old open tasks plugin could read in without complaint.
      Some of the files in question are UTF-8 with BOM, some without. The Delphi IDE saves them as UTF-8 with BOM automatically when the file contains unicode characters.

      The attached files (with and without BOM) display fine in the Delphi IDE, VS Code and Notepad++.

      {{[Open Tasks Scanner] [ERROR] Exception while reading the source code file 'F:\Jenkins\workspace_Test - Migration - Warnings NG\McdLib\RAC.CharacterConsts.pas':
      [Open Tasks Scanner] [ERROR] java.nio.charset.UnmappableCharacterException: Input length = 1
      [Open Tasks Scanner] [ERROR] at java.nio.charset.CoderResult.throwException(Unknown Source)
      [Open Tasks Scanner] [ERROR] at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
      [Open Tasks Scanner] [ERROR] at sun.nio.cs.StreamDecoder.read(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.io.InputStreamReader.read(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.io.BufferedReader.fill(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.io.BufferedReader.readLine(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.io.BufferedReader.readLine(Unknown Source)
      [Open Tasks Scanner] [ERROR] [wrapped] java.io.UncheckedIOException: java.nio.charset.UnmappableCharacterException: Input length = 1
      [Open Tasks Scanner] [ERROR] at java.io.BufferedReader$1.hasNext(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.util.Spliterators$IteratorSpliterator.tryAdvance(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.util.Spliterators$1Adapter.hasNext(Unknown Source)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.warnings.tasks.TaskScanner.scanTasks(TaskScanner.java:231)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.warnings.tasks.TaskScanner.scan(TaskScanner.java:197)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.warnings.tasks.AgentScanner.invoke(AgentScanner.java:91)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.warnings.tasks.AgentScanner.invoke(AgentScanner.java:28)
      [Open Tasks Scanner] [ERROR] at hudson.FilePath.act(FilePath.java:1078)
      [Open Tasks Scanner] [ERROR] at hudson.FilePath.act(FilePath.java:1061)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.warnings.tasks.OpenTasks.scan(OpenTasks.java:159)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.core.steps.IssuesScanner.scan(IssuesScanner.java:64)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.core.steps.IssuesRecorder.scanWithTool(IssuesRecorder.java:654)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.core.steps.IssuesRecorder.record(IssuesRecorder.java:622)
      [Open Tasks Scanner] [ERROR] at io.jenkins.plugins.analysis.core.steps.IssuesRecorder.perform(IssuesRecorder.java:597)
      [Open Tasks Scanner] [ERROR] at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
      [Open Tasks Scanner] [ERROR] at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
      [Open Tasks Scanner] [ERROR] at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:51)
      [Open Tasks Scanner] [ERROR] at hudson.security.ACL.impersonate(ACL.java:290)
      [Open Tasks Scanner] [ERROR] at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:48)
      [Open Tasks Scanner] [ERROR] at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.util.concurrent.FutureTask.run(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      [Open Tasks Scanner] [ERROR] at java.lang.Thread.run(Unknown Source)
      }}

          [JENKINS-55350] Open Tasks Scanner throws java.nio.charset.UnmappableCharacterException

          Lübbe Onken added a comment -

          Interesting. The exceptions are no longer thrown. Now the errors read: Can't read source file 'xxx.pas', defined encoding 'windows-1252' seems to be wrong. If I open the file in Notepad++, it says the encoding is UTF-8 with BOM. Also sourceCodeEncoding UTF-8 is defined in the recordIssues task.

          The previous version of open task scanner was less strict. It read in the files and returned the open tasks. Maybe an umlaut in a task description was wrong, but that's a minor issue for me.

          Is it really critical that the character encoding of the file matches 100%? If a byte sequence isn't proper UTF-8 or whatever encoding was defined, the regex run against the file won't be thrown off track for the rest of the file.

          Maybe you can extend your test case by adding a line like:

          {{ // TODO: This Message should be shown 'äöüß'}}

          to the two files that I attached to this ticket and see if your test finds them with the following regex:

          (?i)^.*(?://|\{|\(\*)\s*(TODO)(?:\s|:|-)+(.*)$

          The files have been saved in Notepad++ under Windows.
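
          As a rough check of the point above, the sketch below runs the TODO pattern against the suggested test line, once decoded correctly and once with the UTF-8 bytes deliberately read as windows-1252. It assumes the pattern was originally written with its metacharacters escaped (\{ and \(\*) and with * quantifiers after . and \s, which the issue tracker's markup appears to have swallowed. The class and method names are made up for illustration and are not part of the plugin.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TodoRegexDemo {
    // Assumed reconstruction of the pattern from the comment (escapes and '*' restored).
    static final Pattern TODO = Pattern.compile(
            "(?i)^.*(?://|\\{|\\(\\*)\\s*(TODO)(?:\\s|:|-)+(.*)$");

    /** Returns the task message if the line contains a TODO comment, otherwise null. */
    static String findTask(String line) {
        Matcher m = TODO.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Unicode escapes spell 'äöüß' so the source compiles under any file encoding.
        String line = " // TODO: This Message should be shown '\u00E4\u00F6\u00FC\u00DF'";

        // Simulate a wrong encoding: UTF-8 bytes read as windows-1252.
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, Charset.forName("windows-1252"));

        System.out.println("correct encoding: " + findTask(line));
        System.out.println("wrong encoding:   " + findTask(misread));
    }
}
```

          In this sketch the task is found in both cases; only the umlauts in the message come out mangled when the wrong charset is used.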

           

           


          Ulli Hafner added a comment -

          Yes, I am catching the exception in order to provide a better error message.

          In your case, it seems the files are being read with windows-1252 as the source encoding, so you must tell the task scanner to use UTF-8. BOM vs. no BOM is not the problem; the scanner is simply using the default encoding of your platform.

          I think I found the problematic settings in your step: you set the sourceCodeEncoding within the tasks() statement, it should be outside!

          Example:

          recordIssues enabledForFailure: true, tool: taskScanner(includePattern:'**/*.java', excludePattern:'target/**/*', highTags:'FIXME', normalTags:'TODO'), sourceCodeEncoding: 'UTF-8'  
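
          The diagnosis can be reproduced outside Jenkins. The following is a minimal sketch, not the plugin's code: it decodes a few byte sequences with a strict windows-1252 decoder. The BOM bytes and the UTF-8 encodings of 'äöüß' all happen to map to windows-1252 characters, whereas byte 0x81 (part of the UTF-8 encoding of 'Á', chosen here purely as an example) has no mapping and makes a strict decoder fail. StrictDecodingDemo and decodesStrictly are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodingDemo {
    /** Returns true if the bytes decode without error under a strict (REPORT) decoder. */
    static boolean decodesStrictly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Covers both MalformedInputException and UnmappableCharacterException.
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        // Unicode escapes spell 'äöüß' and 'Á' so the source compiles under any file encoding.
        byte[] umlauts = "\u00E4\u00F6\u00FC\u00DF".getBytes(StandardCharsets.UTF_8);
        byte[] aAcute = "\u00C1".getBytes(StandardCharsets.UTF_8); // bytes C3 81

        System.out.println("BOM as windows-1252:     " + decodesStrictly(bom, cp1252));
        System.out.println("umlauts as windows-1252: " + decodesStrictly(umlauts, cp1252));
        System.out.println("0xC3 0x81 as windows-1252: " + decodesStrictly(aAcute, cp1252));
        System.out.println("0xC3 0x81 as UTF-8:        " + decodesStrictly(aAcute, StandardCharsets.UTF_8));
    }
}
```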
          


          Lübbe Onken added a comment -

          Thanks, well spotted...

          Is there any chance to find configuration mistakes like this before committing to the repository? I know there's a Jenkins linter for VS Code, but AFAIK it's only for declarative pipelines.

          But shouldn't sourceCodeEncoding really be inside each tool? I might be scanning different filesets with different tasks with different encodings. In the groovyScript parser there's a recordEncoding for each parser. So I assumed it would be the same with sourceCodeEncoding.


          Ulli Hafner added a comment -

          Hmm, up to now I always used the same encoding for the sources. But if someone has a requirement to support different encodings for different tools I will move (or copy) the property inside the tool.


          Ulli Hafner added a comment -

          Such a lint tool would be quite helpful. It would be even better if the workflow engine showed an error (or warning) when a pipeline step command contains non-mappable properties. Maybe it would be a good idea to create a feature request for the pipeline engine.


          Lübbe Onken added a comment -

          OK, I re-ran the build with sourceCodeEncoding: UTF-8 outside the task statement, and now the task scanner fails to read 99% of the files because "defined encoding 'UTF-8' seems to be wrong".

          The old task scanner had no problem reading in the files and finding the open tasks regardless of encoding. The new one fails if the encoding of the file doesn't match the advertised encoding. In my opinion this is a regression, because the results are worse.

          I'll revert to not defining an encoding, because unfortunately we have different file encodings throughout the legacy code base.


          Ulli Hafner added a comment -

          Hmm, I don't know how I can help here in the plugin configuration. I need to choose an encoding if I read a file. I don't see how we can make the encoding configurable for each file. Do you have an idea? Wouldn't it be easier if you use the same encoding for each job?


          Lübbe Onken added a comment -

          There's no need to make the encoding configurable per file. There's a need to be more tolerant towards unexpected encoding.

          The encoding passed to the parser can't be much more than a hint about what to expect. If reality doesn't match the expectation, the parser should read the file assuming a default (fallback) encoding instead of failing, and do as well as it can when it has no idea what the encoding is.

          Issue a warning: "defined encoding 'windows-1252' seems to be wrong - falling back to UTF-8 instead." and continue with UTF-8. What real damage can be done? Some characters in the issue view and source code view may be wrong, but at least you capture all the issues.
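
          The fallback proposed here could look roughly like the sketch below. It is not how warnings-ng behaves; it only illustrates the idea: try the configured charset strictly, and if that fails, log a warning and decode leniently with a fallback charset (bytes that cannot be decoded become U+FFFD). TolerantDecoder and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class TolerantDecoder {
    /** Decodes strictly; throws if any byte sequence does not fit the charset. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Tries the configured charset first, then falls back to a lenient decode. */
    static String decodeWithFallback(byte[] bytes, Charset configured, Charset fallback) {
        try {
            return decodeStrict(bytes, configured);
        } catch (CharacterCodingException e) {
            System.err.printf(
                    "defined encoding '%s' seems to be wrong - falling back to %s instead.%n",
                    configured, fallback);
            // new String(byte[], Charset) never throws; bad input is replaced with U+FFFD.
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        // "\u00C1" is 'Á'; its UTF-8 encoding contains byte 0x81, unmapped in windows-1252.
        byte[] utf8 = "// TODO: \u00C1".getBytes(StandardCharsets.UTF_8);
        String text = decodeWithFallback(
                utf8, Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

          A design question this leaves open is which fallback to pick; the sketch takes it as a parameter rather than deciding.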


          Ulli Hafner added a comment -

          What would be the default encoding which the parsers should use?


          Lübbe Onken added a comment -

          Nowadays UTF-8 would probably be a good fallback (unless you have to deal with a legacy code base, but even some up-to-date source code from GitHub contains UTF-8 and is saved in ANSI).

          Alternatively, the system default encoding could be used as fallback.


            Assignee: Ulli Hafner (drulli)
            Reporter: Lübbe Onken (luebbe)