Platform: All, OS: All
In our projects, we work with some files that contain non-ASCII
characters (mainly accented characters é, è, à, ...) and that are stored in a CVS
repository.
The first checkout of a project works, but every subsequent update fails
with this kind of error:
cvs update: move away <a_file_with_accentuated_chars>; it is in the way
Examining the error message closely shows that a file like
liste_des_répertoires_de_configuration.doc conflicts with
liste_des_r@"pertoires_de_configuration.doc. After some searching, I found that the
related CVS/Entries file contains the wrong characters. So I ran from the command
line the same CVS checkout command that Hudson performs, and this time the
CVS/Entries file is right. Notepad++ tells me that the right file has an ANSI
encoding and the wrong file has an ANSI+UTF-8 encoding. Please note that my CVS
client is the cvsnt.exe provided with TortoiseSVN and runs on a Windows XP
workstation where Hudson is installed.
I suspected that Hudson was updating the CVS/Entries files and corrupting them by
not using the right encoding, so I took some time to read the source code and
found that the hudson.scm.CVSSCM implementation reads the hidden
CVS/Entries files to detect and remove the "sticky date", if any (see the
internal StickyDateCleanUpTask class).
I think there are two problems here:
- First, the content is read using the FileUtils.readFileToString(entries)
method instead of FileUtils.readFileToString(entries, encoding), so
the content of these files is read with the default VM encoding instead of the
real file encoding.
- Second, the new content is written using the AtomicFileWriter class. This
class always writes files in UTF-8. This is the origin of the bug that puts
wrong characters in the CVS/Entries files.
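
To illustrate the second problem in isolation, here is a minimal sketch in plain Java. It does not use any Hudson code: the class name EntriesEncodingDemo and the method rewriteAndReread are made up for this example, and it assumes the Entries file on disk is ISO-8859-1, which is only my guess for what "ANSI" means on this workstation. It shows that text read correctly but rewritten in UTF-8 no longer matches when a tool such as the CVS client reads it back in the original encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntriesEncodingDemo {

    /**
     * Hypothetical helper: simulates a file stored in diskCharset that is
     * read with readCharset, rewritten with writeCharset, and then read
     * again by a tool that still expects diskCharset.
     */
    static String rewriteAndReread(String original, Charset diskCharset,
                                   Charset readCharset, Charset writeCharset) {
        byte[] onDisk = original.getBytes(diskCharset);       // what the CVS client wrote
        String inMemory = new String(onDisk, readCharset);    // what the rewriting code sees
        byte[] rewritten = inMemory.getBytes(writeCharset);   // what the rewriting code writes back
        return new String(rewritten, diskCharset);            // what the CVS client reads next time
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;         // assumption: the "ANSI" encoding
        String name = "liste_des_r\u00e9pertoires_de_configuration.doc";

        String viaUtf8 = rewriteAndReread(name, latin1, latin1, StandardCharsets.UTF_8);
        String viaLatin1 = rewriteAndReread(name, latin1, latin1, latin1);

        // The UTF-8 rewrite turns the single byte for the accented letter into two bytes.
        System.out.println("rewritten as UTF-8 matches:      " + name.equals(viaUtf8));
        System.out.println("rewritten as ISO-8859-1 matches: " + name.equals(viaLatin1));
    }
}
```

The first comparison fails and the second succeeds, which is consistent with CVS treating the rewritten entry as a different file name.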
Now here is what I did to fix the bug:
- Added a constructor to AtomicFileWriter that takes the file and the encoding
as arguments. The original constructor now invokes the new one with UTF-8
encoding for backward compatibility.
- Added a cvsHiddenFilesEncoding property to the CVSSCM class.
- Modified every read of a CVS file to use the encoding.
- Modified the call to AtomicFileWriter to use the encoding.
- For now, cvsHiddenFilesEncoding is initialized to ISO-8859-1 directly in
the constructor. This is not the right thing to do, but it works and fixes my bug
right now. I think that this property should be configurable in the CVS
management page, but I don't know how to do this.
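
The constructor change can be sketched roughly as follows. This is not the actual Hudson AtomicFileWriter source and not my patch verbatim: EncodingAwareAtomicWriter is a made-up stand-in class, written with standard JDK calls only, to show the idea of a constructor that takes the encoding while the one-argument constructor keeps the UTF-8 default.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for an atomic writer with a configurable encoding. */
public class EncodingAwareAtomicWriter extends Writer {
    private final File destination;
    private final File temp;
    private final Writer out;

    /** Keeps the previous behaviour: UTF-8 when no encoding is given. */
    public EncodingAwareAtomicWriter(File destination) throws IOException {
        this(destination, StandardCharsets.UTF_8);
    }

    /** New constructor: the caller chooses the encoding of the written file. */
    public EncodingAwareAtomicWriter(File destination, Charset encoding) throws IOException {
        this.destination = destination;
        // Write to a temporary file in the same directory, then move it into place.
        File dir = destination.getAbsoluteFile().getParentFile();
        this.temp = File.createTempFile("atomic", ".tmp", dir);
        this.out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(temp), encoding));
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        out.write(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    /** Closes the temporary file and replaces the destination with it. */
    public void commit() throws IOException {
        close();
        Files.move(temp.toPath(), destination.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With something like this, the code that rewrites CVS/Entries would construct the writer with the same Charset it used for reading (ISO-8859-1 in my case), write the cleaned content, and call commit().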