JENKINS-1166

Strange characters appear on pages

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None
    • Environment:
      Platform: All, OS: All

      Description

      The character 'Â' appears in several places on the Hudson GUI, most noticeably
      to the left of the "New Job" and "Manage Hudson" links in the sidebar and to
      the right of the search box at the top. It also occasionally appears at the
      top of the third column of the "Build Executor Status" table in the sidebar.

      I am running the latest CVS snapshot of Hudson (later than 167 but before the
      168 release), using Jetty 5.1.10 on Ubuntu 7. This is the first time I've
      tried this particular combination, so I don't know if it was an issue in
      previous builds.

      I guess this is some kind of encoding issue, but I'm not sure exactly what.

      This does not appear to be a browser issue as it is the same in both Safari and
      Opera.

            Activity

            dwdyer dwdyer added a comment -

            Created an attachment (id=158)
            Screenshot (from Safari, Opera is identical)

            kohsuke Kohsuke Kawaguchi added a comment -

            This happens because Hudson is sending pages in UTF-8 but the browser is
            interpreting the page as iso-8859-1.

            The page Hudson sends often includes the Unicode "non-breaking space" character
            (written as &nbsp; in HTML), which has Unicode code point U+00A0.

            When this character is sent as UTF-8, the encoding will turn this single
            character into two bytes, 0xC2 followed by 0xA0.

            If this byte sequence is decoded as iso-8859-1, you'll get "A circumflex" (Â)
            followed by "non-breaking whitespace".
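
The byte-level effect described here can be reproduced in a few lines. This is a minimal Python sketch for illustration only; it is not taken from Hudson, and the variable names are made up for the example.

```python
# Illustrative sketch: encode a no-break space as UTF-8, then decode the
# same bytes with the wrong charset (ISO-8859-1).
NBSP = "\u00a0"  # Unicode NO-BREAK SPACE, i.e. &nbsp; in HTML

utf8_bytes = NBSP.encode("utf-8")             # two bytes: 0xC2 0xA0
misdecoded = utf8_bytes.decode("iso-8859-1")  # one character per byte

print(utf8_bytes.hex())                              # c2a0
print([f"U+{ord(ch):04X}" for ch in misdecoded])     # ['U+00C2', 'U+00A0']
```

U+00C2 is 'Â', so every non-breaking space on the page shows up with a stray 'Â' in front of it.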

            So the thing I'd like you to find out is why the browser is decoding this as
            iso-8859-1. There are several places to check:

            1. If your browser can display HTTP response headers, please check whether they
            contain "Content-type: text/html; charset=UTF-8". Hudson is supposed to be
            doing this for all its pages.

            2. Make sure you are not forcing browsers to interpret every page in iso-8859-1.

            3. If you have a tool like wget that can capture the HTML response
            byte-by-byte, please use it, zip the result up, and attach it here. Using zip
            makes sure that java.net won't mess up the encoding.
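
For the first check, the charset can also be read out of a captured Content-Type header value programmatically. The sketch below uses Python's standard email.message module; charset_of is a hypothetical helper name chosen for this example, not part of Hudson or of any library.

```python
from email.message import Message


def charset_of(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(charset_of("text/html; charset=UTF-8"))  # utf-8
print(charset_of("text/html"))                 # None
```

A result of None means the response leaves the choice of encoding to the client.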

            Perhaps we can also send a <meta http-equiv="..."> tag to further insist that we
            really do mean UTF-8.

            dwdyer dwdyer added a comment -

            The JVM was picking up the default encoding as ANSI_X3.4-1968. I fixed up the
            server's locale and encoding so that it is now en_GB.UTF-8, but the problem
            persists. The server is using Apache 2.2.4 in front of Jetty, connected by
            mod_jk and ajp13.

            I also checked in Firefox to see if it was any different, but it wasn't so
            that's a trio of browsers (all on OS X) that exhibit the problem.

            The Content-Type header returned by the server does not include the charset:

            Content-Type: text/html

            I can't find anywhere in the Hudson source where content-type is set (other
            than in the Japex plugin and a couple of Javascript files).
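
Why a missing charset parameter produces the stray character can be sketched as follows. This assumes a client that falls back to ISO-8859-1 when the header declares no charset, which is what the browsers in this report appear to have done; real browsers may pick a different default. decode_body and its fallback parameter are hypothetical names for this illustration.

```python
from email.message import Message


def decode_body(body, content_type, fallback="iso-8859-1"):
    """Decode body with the declared charset, or with the assumed fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    return body.decode(charset)


# A UTF-8 encoded page fragment containing a non-breaking space.
body = "New\u00a0Job".encode("utf-8")

with_charset = decode_body(body, "text/html; charset=UTF-8")
without_charset = decode_body(body, "text/html")

print(repr(with_charset))     # 'New\xa0Job'
print(repr(without_charset))  # 'NewÂ\xa0Job'
```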

            dwdyer dwdyer added a comment -

            OK, I found where Hudson is setting Content-Type (I was searching for
            "Content-Type" rather than "contentType") - it's in layout.jelly. The expires
            header set in the same place appears to work, but the content type has the
            charset stripped from it.

            dwdyer dwdyer added a comment -

            The more I investigate this, the more I think it is a bug in Jetty 5.1.10 (I've
            eliminated Apache by using Jetty directly).

            The exact same version of Hudson works in Jetty 6.1.7 via the Maven plugin
            (albeit on a different machine - OS X instead of Ubuntu).

            With no better ideas, I tried using Stapler's header tag instead of the
            contentType tag, and surprisingly that fixes the problem (I've looked at the
            Stapler source and can't understand why this would be - it must be a Jetty bug).

            Are you happy for me to commit this fix, or would you like to change Stapler
            instead (perhaps the contentType tag would work if it explicitly called
            setCharacterEncoding() - though this is just a guess)?

            dwdyer dwdyer added a comment -

            I've committed the change to layout.jelly (revision 1.40) since it's only a
            single-line change and I don't think it is going to cause problems elsewhere. I think
            it's worthwhile since it makes Hudson work nicely with the latest stable
            version of Jetty.

            If you come up with something better (either in Hudson or Stapler) you can
            revert my change.

            kreutziger Fabian Mutzbauer added a comment -

            With an upgrade to version 1.598, the A letters appeared on our system (Ubuntu 14.04) next to nbsp. We use Apache 2.4.7 as a reverse proxy. I checked the problem with different browsers (Chromium, Firefox) and the charset is always set to UTF-8. How can I investigate this issue further?


              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              dwdyer dwdyer