Jenkins / JENKINS-4241

CVS SCM implementation changes the encoding of CVS/Entries files

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • cvs-plugin
    • None
    • Platform: All, OS: All

      In our projects, we work with some files whose names contain non-ASCII
      characters (mainly accented characters: é, è, à, ...) and that are stored in a CVS
      repository.

      The first checkout of a project works, but any subsequent update fails
      with this kind of error:

      cvs update: move away <a_file_with_accentuated_chars>; it is in the way

      Examining the error message closely shows that a file like
      liste_des_répertoires_de_configuration.doc conflicts with
      liste_des_r@"pertoires_de_configuration.doc. After some searching, I found that the
      related CVS/Entries file contains the wrong characters. I then ran from the command
      line the same CVS checkout command that Hudson performs, and this time the
      CVS/Entries file was correct. Notepad++ reports that the correct file has an ANSI
      encoding and the wrong file has an ANSI+UTF-8 encoding. Please note that my CVS
      client is the cvsnt.exe provided with TortoiseSVN and runs on a Windows XP
      workstation where Hudson is installed.
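
      The mismatch can be reproduced in isolation. The following is a minimal
      sketch, assuming that "ANSI" here behaves like ISO-8859-1 for the accented
      characters involved; the class and method names are hypothetical. It shows
      that the same file name has different bytes in the two encodings, so a name
      written as UTF-8 and read back as ISO-8859-1 no longer matches the file on
      disk.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EntriesMojibakeDemo {
    // File name from the report. The accent is written as a Unicode escape so
    // the example does not depend on the encoding of this source file.
    static final String NAME = "liste_des_r\u00e9pertoires_de_configuration.doc";

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        // The accented letter is one byte (0xE9) in ISO-8859-1
        // but two bytes (0xC3 0xA9) in UTF-8.
        System.out.println("ISO-8859-1 bytes: " + NAME.getBytes(latin1).length);
        System.out.println("UTF-8 bytes:      " + NAME.getBytes(utf8).length);

        // Same charset on both sides: the name survives the round trip.
        System.out.println("same charset ok:  " + reinterpret(NAME, latin1, latin1).equals(NAME));

        // UTF-8 bytes read as ISO-8859-1: the name is no longer the same string.
        System.out.println("mixed charset ok: " + reinterpret(NAME, utf8, latin1).equals(NAME));
    }
}
```

      The exact garbage characters that appear depend on the tool displaying the
      file, so the sketch only checks whether the name still matches.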

      I thought that Hudson might be updating the CVS/Entries files and corrupting
      them by not using the right encoding, so I took some time to read the source
      code and found that the hudson.scm.CVSSCM implementation reads the hidden
      CVS/Entries files to detect and remove the "sticky date", if any (see the
      internal StickyDateCleanUpTask class).
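
      For context, a sticky date lives in the last field of an Entries line. Below
      is a minimal sketch of stripping it, assuming the usual
      /name/revision/timestamp/options/tagdate layout in which the last field
      starts with D for a date and T for a tag. The class and method names and the
      sample lines are hypothetical, and this is not the actual
      StickyDateCleanUpTask code.

```java
final class StickyDateSketch {
    /**
     * Returns the Entries line with a trailing sticky date removed.
     * Lines without a sticky date (no tag, a sticky tag, directory entries)
     * are returned unchanged.
     */
    static String stripStickyDate(String line) {
        int idx = line.lastIndexOf('/');
        if (idx < 0) {
            return line; // not a regular entry line, leave it alone
        }
        String tagOrDate = line.substring(idx + 1);
        if (tagOrDate.startsWith("D")) {
            // Keep everything up to and including the last slash.
            return line.substring(0, idx + 1);
        }
        return line;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, for illustration only.
        System.out.println(stripStickyDate("/a.doc/1.2/Mon Aug 10 10:00:00 2009//D2009.08.10.10.00.00"));
        System.out.println(stripStickyDate("/a.doc/1.2/Mon Aug 10 10:00:00 2009//Tsome_branch"));
        System.out.println(stripStickyDate("D/src////"));
    }
}
```

      Because the file name is the first field of each line, any rewrite of the
      file for this purpose also rewrites the names, which is where the encoding
      matters.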

      I think there are two problems here:

      • First, the content is read using the FileUtils.readFileToString(entries)
        method instead of FileUtils.readFileToString(entries, encoding), so
        the content of these files is read with the default VM encoding instead of the
        real file encoding.
      • Second, the new content is written using the AtomicFileWriter class, which
        always writes files in UTF-8. This is the origin of the bug that puts
        wrong characters in the CVS/Entries files.
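
      The effect of the two problems can be sketched with plain java.nio calls
      rather than the Hudson and commons-io classes named above. This is only an
      illustration: the class name, method name and sample Entries line are
      hypothetical, and the file is assumed to be ISO-8859-1 on disk. The bytes
      survive a rewrite only when the read and write charsets both match the
      real file encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class EntriesRoundTrip {
    // Hypothetical Entries line; the accent is escaped to keep the source ASCII.
    static final String LINE =
            "/liste_des_r\u00e9pertoires_de_configuration.doc/1.1/Mon Aug 10 10:00:00 2009//\n";

    /**
     * Creates an ISO-8859-1 file, reads it with readAs, writes it back with
     * writeAs, and reports whether the bytes on disk are unchanged.
     */
    static boolean rewritePreservesBytes(Charset readAs, Charset writeAs) {
        try {
            Path entries = Files.createTempFile("Entries", ".txt");
            try {
                byte[] original = LINE.getBytes(StandardCharsets.ISO_8859_1);
                Files.write(entries, original);

                String content = new String(Files.readAllBytes(entries), readAs);
                Files.write(entries, content.getBytes(writeAs));

                return Arrays.equals(original, Files.readAllBytes(entries));
            } finally {
                Files.deleteIfExists(entries);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println("read ISO-8859-1, write ISO-8859-1: " + rewritePreservesBytes(latin1, latin1));
        System.out.println("read ISO-8859-1, write UTF-8:      " + rewritePreservesBytes(latin1, utf8));
        System.out.println("read UTF-8,      write UTF-8:      " + rewritePreservesBytes(utf8, utf8));
    }
}
```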

      Now here is what I did to fix the bug:

      • Added a constructor to AtomicFileWriter that takes the file and the encoding
        as arguments. The original constructor now invokes the new one with UTF-8
        for backward compatibility.
      • Added a cvsHiddenFilesEncoding property to the CVSSCM class.
      • Modified every read of a CVS file to use the encoding.
      • Modified the call to AtomicFileWriter to use the encoding.
      • For now, cvsHiddenFilesEncoding is initialized to ISO-8859-1 directly in
        the constructor. This is not the right thing to do, but it works and fixes my bug
        right now. I think this property should be configurable in the CVS
        management page, but I don't know how to do this.
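
      The constructor change can be sketched as follows. This is a simplified,
      hypothetical stand-in written against java.nio, not the real
      AtomicFileWriter: the class name, the commit() method and the
      writeAndReadBack helper are assumptions made for illustration. The
      single-argument constructor delegates to the new one with UTF-8, so
      existing callers keep their behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical encoding-aware writer: writes to a temp file, then moves it over the target. */
final class EncodedAtomicWriter extends Writer {
    private final Path target;
    private final Path temp;
    private final Writer out;

    /** Original-style constructor: keeps UTF-8 for backward compatibility. */
    EncodedAtomicWriter(Path target) throws IOException {
        this(target, StandardCharsets.UTF_8);
    }

    /** New constructor: the caller chooses the encoding. */
    EncodedAtomicWriter(Path target, Charset encoding) throws IOException {
        this.target = target;
        // The temp file sits next to the target so the final move stays on one file system.
        this.temp = Files.createTempFile(target.toAbsolutePath().getParent(), "atomic", ".tmp");
        this.out = Files.newBufferedWriter(temp, encoding);
    }

    @Override public void write(char[] buf, int off, int len) throws IOException { out.write(buf, off, len); }
    @Override public void flush() throws IOException { out.flush(); }
    @Override public void close() throws IOException { out.close(); }

    /** Closes the temp file and replaces the target with it. */
    void commit() throws IOException {
        close();
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Demo helper: writes text with the given encoding (null means the default constructor). */
    static byte[] writeAndReadBack(String text, Charset encodingOrNull) {
        try {
            Path dir = Files.createTempDirectory("entries-demo");
            Path target = dir.resolve("Entries");
            EncodedAtomicWriter w = (encodingOrNull == null)
                    ? new EncodedAtomicWriter(target)
                    : new EncodedAtomicWriter(target, encodingOrNull);
            w.write(text);
            w.commit();
            byte[] bytes = Files.readAllBytes(target);
            Files.delete(target);
            Files.delete(dir);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "r\u00e9"; // "r" followed by an accented e
        System.out.println("default constructor bytes: " + writeAndReadBack(text, null).length);
        System.out.println("ISO-8859-1 bytes:          "
                + writeAndReadBack(text, StandardCharsets.ISO_8859_1).length);
    }
}
```

      A caller in the CVS code would then pass the value of
      cvsHiddenFilesEncoding to the two-argument constructor, while other callers
      stay on the UTF-8 default.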

            mc1arke Michael Clarke
            wismax wismax
            Votes: 2
            Watchers: 2
