Monday, September 14, 2009

UTF-8 Problems and Subversion

I was writing some unit tests recently and needed to store files containing UTF-8 text in Subversion. The file names also contained UTF-8 characters (see below). I was using Eclipse for the development work and ran into problems sharing the files between Linux and Windows.

The problem was that a file created and committed on a Windows machine was not synchronising correctly to the Linux machine (and vice versa). The character encodings in the file names were being changed somewhere along the commit and synchronise path.
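The root cause is easiest to see at the byte level. Here is a minimal Python sketch that uses a hypothetical file name, `café.txt`, rather than the actual test files. It shows that the same name is stored as different bytes under Cp1252 (the Windows default) and under UTF-8, so a client that assumes the other encoding will read back a different name.

```python
# Hypothetical file name with one non-ASCII character.
name = "café.txt"

# Python's codec name for Windows code page 1252 is "cp1252".
cp1252_bytes = name.encode("cp1252")  # b'caf\xe9.txt'      (é is one byte, 0xE9)
utf8_bytes = name.encode("utf-8")     # b'caf\xc3\xa9.txt'  (é is two bytes, 0xC3 0xA9)

# The characters are identical, but the bytes sent to the repository are not.
assert cp1252_bytes != utf8_bytes
assert len(cp1252_bytes) == 8 and len(utf8_bytes) == 9
```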

So here is what I needed to do to make it work. First, I needed to make sure Eclipse marked the contents of the file as UTF-8. This is straightforward: simply right-click the file name, select Properties and change the text file encoding to UTF-8 (on Windows the default is Cp1252).
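The encoding setting controls how Eclipse reads and writes the file, so it is worth checking that nothing saved earlier under Cp1252 still has Cp1252 bytes in it. Below is a minimal Python sketch for that check. The function names are hypothetical and are not part of Eclipse or Subversion. Passing the check does not prove a file is UTF-8, because pure ASCII is valid in both encodings. A Cp1252 file containing a character such as é will fail it, though.

```python
from __future__ import annotations

from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode as UTF-8 without errors."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root: Path) -> list[Path]:
    """Return files under root (skipping .svn metadata) that are not valid UTF-8."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and ".svn" not in p.parts and not is_valid_utf8(p)
    )
```

Running `find_non_utf8_files(Path("src/test"))` (a hypothetical test folder) before committing lists any files that still need converting.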

The next step involves configuring the CVS repository. Change to the CVS perspective, right-click the repository and select Properties.

Change the Server Encoding to UTF-8 (on Linux the default Server Encoding is already UTF-8, hence the problem sharing files between the two platforms).
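The effect of the Server Encoding setting can be modelled in a few lines of Python. This is a simplified, hypothetical model, not how the CVS client is actually implemented. The committing client encodes the name with its Server Encoding, the repository keeps the raw bytes, and the updating client decodes them with its own setting. `transfer_name` and the file name are illustrative only.

```python
def transfer_name(name: str, committer_encoding: str, reader_encoding: str) -> str:
    """Hypothetical model of a file name passing between two clients.

    The committer encodes the name with its Server Encoding, the repository
    stores the raw bytes, and the reader decodes them with its own setting.
    Undecodable bytes become U+FFFD instead of raising an error.
    """
    stored = name.encode(committer_encoding)
    return stored.decode(reader_encoding, errors="replace")


name = "café.txt"

# Mismatched settings (Windows at Cp1252, Linux at UTF-8) corrupt the name.
windows_to_linux = transfer_name(name, "cp1252", "utf-8")  # 'caf\ufffd.txt'
linux_to_windows = transfer_name(name, "utf-8", "cp1252")  # 'cafÃ©.txt'

# With both clients set to UTF-8 the name survives unchanged.
both_utf8 = transfer_name(name, "utf-8", "utf-8")          # 'café.txt'
```

What fixes the problem is making the setting match on every client. UTF-8 is the sensible common choice because it can represent any file name, whereas Cp1252 only covers Western European characters.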

The files should now commit and synchronise with the correct contents and names. When you commit the files you should probably also commit the org.eclipse.core.resources.prefs file that Eclipse generates in your project's .settings folder. This file stores the character encodings for the file contents.
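The prefs file is a plain key=value text file, so it is easy to check what was recorded before committing it. Here is a rough Python sketch. It assumes entries of the form `encoding/<project>=UTF-8` for the project default and `encoding//<path>=UTF-8` for individual resources. Treat that format as an assumption and compare it against your own file. The function names and the `MyProject` name in the test data are hypothetical. Java-properties escapes such as `\u00E9` in keys are left undecoded in this sketch.

```python
from __future__ import annotations

from pathlib import Path


def read_resource_encodings(prefs_text: str) -> dict[str, str]:
    """Map project/resource keys to encodings from prefs file text.

    Assumes simple key=value lines whose keys start with "encoding/";
    blank lines, comments (#) and other settings are skipped.
    """
    encodings: dict[str, str] = {}
    for raw in prefs_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("encoding/"):
            # Drop the "encoding/" prefix, leaving the project or resource path.
            encodings[key[len("encoding/"):]] = value.strip()
    return encodings


def non_utf8_entries(prefs_path: Path) -> dict[str, str]:
    """Return the entries in a prefs file whose encoding is not UTF-8."""
    # Java properties files default to ISO-8859-1.
    text = prefs_path.read_text(encoding="iso-8859-1")
    return {
        k: v for k, v in read_resource_encodings(text).items() if v.upper() != "UTF-8"
    }
```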