Mercurial convert filename encoding


I have Mercurial repositories served by Apache with mod_wsgi. All filenames in the repositories are encoded in windows-1251. This encoding is used for historical reasons: the repositories were converted to Mercurial from SVN, and windows-1251 is the default Windows encoding for the Russian locale.

Now the programmers want to use the Crucible tool for code review. It can't understand filenames in any encoding other than UTF-8, so I need to convert them from windows-1251 to UTF-8. Does anyone know how to do this? The Mercurial convert extension doesn't have an option to convert encodings.

hgweb.config:

[web]
#encoding = UTF-8
encoding = windows-1251
#allow_archive = gz, zip, bz2
allow_archive = zip
allow_push = *
push_ssl = false

[extensions]

[collections]
/data/mercurial = /data/mercurial

Best Answer

You are right that the convert extension doesn't support this in a nice way currently. That is, you cannot ask it to recode from encoding X to encoding Y. However, you can ask it to rename the files one by one for you! First create a file called rename.py with

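import sys

# Python 3: read and write raw bytes so the file names are never
# decoded with the terminal's locale encoding.
for line in sys.stdin.buffer:
    old = line.rstrip(b"\r\n")  # strip newline
    new = old.decode("cp1251").encode("utf-8")
    sys.stdout.buffer.write(b'rename "%s" "%s"\n' % (old, new))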

Then run

$ hg manifest --all | python rename.py > rename.txt

This creates your file map. You can now use

$ hg convert --filemap rename.txt cp1251-repo utf-8-repo

to convert the repository into a new repository. In the new repository, it will look like the files have always been saved using UTF-8 file names.
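To see what a single line of the file map looks like, and to catch names that may already be UTF-8 before converting, a minimal sketch along these lines can help. The helper names `filemap_line` and `already_utf8` and the sample path are made up for illustration. The UTF-8 check is only a heuristic, since some cp1251 byte sequences also happen to be valid UTF-8.

```python
# Sketch only: the helper names and the sample path are hypothetical.

def filemap_line(old: bytes) -> bytes:
    """Return one 'rename' directive mapping a cp1251 name to its UTF-8 form."""
    new = old.decode("cp1251").encode("utf-8")
    return b'rename "%s" "%s"' % (old, new)


def already_utf8(name: bytes) -> bool:
    """Heuristic: True if a non-ASCII name already decodes as UTF-8.

    Recoding such a name from cp1251 again would garble it.
    """
    if name.isascii():
        return False
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A name as it would be stored in the old repository (cp1251 bytes).
sample = "docs/файл.txt".encode("cp1251")
line = filemap_line(sample)
```

Names for which `already_utf8` returns True are worth checking by hand before they go into rename.txt.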

Note: The file names are now stored as UTF-8 in the repository. This means that checkouts will look fine on modern Linux machines. Windows, however, does not use UTF-8 file names. The FixUtf-8 extension must be used to make Mercurial convert the UTF-8 file names into UTF-16 on the fly. This will create readable file names on Windows too.
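The Windows problem can be illustrated with plain Python. This is a conceptual sketch, not a description of how FixUtf-8 is implemented; it assumes a machine whose legacy code page is cp1251, and the file name is made up.

```python
name = "файл.txt"                # hypothetical file name
stored = name.encode("utf-8")    # bytes as recorded in the converted repository

# Interpreting the UTF-8 bytes in the legacy cp1251 code page gives mojibake.
misread = stored.decode("cp1251", errors="replace")

# Decoding as UTF-8 first and re-encoding as UTF-16 keeps the name intact.
wide = stored.decode("utf-8").encode("utf-16-le")
```

Here `misread` differs from `name`, while `wide` decodes back to the original name.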

Note: Everybody will have to re-clone the new repository! Changing any part of the history inevitably changes all the changeset hashes too. So to pull this off, you need to either

  1. make everybody push to the server,
  2. convert the repositories on the server,
  3. have people re-clone

or

  1. make everybody run the above commands on their local repositories
  2. convert the repositories on the server

Either way works since the conversion is deterministic and so your users can run it themselves if they have Python available. If they only have a TortoiseHg installation, then it's probably easiest if you convert for them on your server.

I looked at making the convert extension support this more directly and have sent a patch for it to the Mercurial mailing list.
