Filenames and Unicode Normalization Forms

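Mac OS X stores filenames in NFD (Normalization Form D, the decomposed form; strictly speaking, HFS+ uses its own variant of NFD). Most other systems, such as Windows and GNU/Linux, do not normalize filenames at all, and in practice end up with NFC (Normalization Form C, the precomposed form) because that is what keyboards and most software produce. This can lead to unpleasant surprises when moving files across platforms.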
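
To make the difference concrete, here is a small Python sketch using the standard unicodedata module. The filename is just an illustration; it is written with escape sequences so that the editor cannot silently normalize it.

```python
import unicodedata

# "café.txt" with é as a single precomposed code point (U+00E9): NFC.
name_nfc = "caf\u00e9.txt"
# The same name decomposed: "e" followed by a combining acute accent (U+0301).
name_nfd = unicodedata.normalize("NFD", name_nfc)

print(len(name_nfc), len(name_nfd))  # 8 9
print(name_nfc == name_nfd)          # False: same appearance, different code points
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)  # True
```

Both strings look identical on screen, which is exactly why the problem is easy to miss.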

Example Problem

You rsync over a file archive from your Mac to your GNU/Linux server. The filenames are not changed from NFD to NFC since rsync doesn’t care and Linux just treats filenames as byte sequences. Now you try to access the same files over Samba from your Mac. You discover that files with international characters aren’t accessible and curse Apple for Thinking Differently when it comes to filenames. Then you follow the steps below and everything works fine.
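
Why a byte-oriented lookup misses the file can be sketched in Python as follows. The dictionary is only a stand-in for a directory that compares names byte for byte; it is an illustration, not how Linux or Samba are actually implemented.

```python
import unicodedata

name = "M\u00fcller.txt"  # illustrative filename containing ü

nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc_bytes.hex())  # ü encoded as c3 bc
print(nfd_bytes.hex())  # u (75) followed by the combining diaeresis cc 88

# Stand-in for a directory holding the NFD name that rsync copied over.
directory = {nfd_bytes: "file contents"}

# A client asking for the NFC spelling finds nothing.
print(nfc_bytes in directory)  # False
```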

Solution: NFD to NFC Conversion

Here’s how you can convert NFD filenames to NFC:

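  1. Install convmv. Read the documentation.
  2. Do a dry run first. Without --notest, convmv only prints what it would rename:
     convmv -f utf-8 -t utf-8 -r --nfc directory
  3. If the output looks right, rename for real:
     convmv -f utf-8 -t utf-8 -r --notest --nfc directory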
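
If convmv isn't available, roughly the same thing can be sketched in Python. This is not how convmv works internally, just a minimal approximation meant to be run on the GNU/Linux side; the function names find_non_nfc and rename_to_nfc are made up for this example, and it assumes the filenames are valid UTF-8.

```python
import os
import unicodedata


def find_non_nfc(root):
    """Return (current_path, nfc_path) pairs for entries whose name is not NFC."""
    pairs = []
    # Walk bottom-up so that a directory's contents are listed (and later
    # renamed) before the directory itself.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name != name:
                pairs.append((os.path.join(dirpath, name),
                              os.path.join(dirpath, nfc_name)))
    return pairs


def rename_to_nfc(root, dry_run=True):
    """Print the planned renames; perform them only if dry_run is False."""
    pairs = find_non_nfc(root)
    for old, new in pairs:
        if dry_run:
            print(f"would rename {old!r} -> {new!r}")
        elif os.path.exists(new):
            # Never overwrite an existing entry with the NFC name.
            print(f"skipping {old!r}: {new!r} already exists")
        else:
            os.rename(old, new)
    return pairs
```

Calling rename_to_nfc("directory") only lists the candidates, much like convmv without --notest; pass dry_run=False to actually rename.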

Acknowledgements

My instructions first said that one should convert to UTF-8 NFC via some other encoding (like ISO-8859-1), but then Nicholas Sherlock informed me about the above solution. Thanks!

Resources