Charset policy
Why?
Though rpm supports translating the description and summary inside the spec file itself, this has proven to be a nightmare: packagers use their own encoding of choice, leaving a file that mixes several encodings. Such a spec file cannot be handled properly by any editor once more than one translation exists. Thus, for description or summary translations, po files are the way to go.
In ROSA, such in-spec translations are forbidden by policy. Summary translations can be done in the mdv-rpm-summary package, which has some disadvantages of its own. There are several proposals for improving the translation process (and decoupling it from the build process), but none of them has been implemented yet.
Others may be unable to read your file
Everybody has their own system, and people may not migrate because their old encoding "just works". However, many of the encodings in use around the world are incompatible with each other. Their only common part is the ASCII character set (so ASCII is safe for spec files). Non-ASCII ISO-8859-1 characters, for instance, show up as junk for users of other locales.
This, of course, requires both the writer and the reader to migrate to UTF-8, but UTF-8 is already the de facto standard in the Linux world. Otherwise, people are less likely to help, because what you code or write is unreadable to them, unless you do not expect anyone else to help in the first place.
Editors may damage the spec file
This is related to the previous point, but worse. On systems whose legacy charset is not ISO-8859-1, text editors may fail to handle the "junk" text and may even try to "correct" it by modifying characters, leaving the file even more broken.
How to fix it?
In the spec file itself
The spec file is what matters, so use UTF-8 throughout it. To test whether your spec file really is in UTF-8, filter it through iconv:
iconv -f UTF-8 -t UTF-8 -o /dev/null yourpackage.spec
If it does not complain, the file is valid UTF-8 (or plain ASCII). Otherwise, iconv reports the position at which the UTF-8 check failed; you can also drop the -o /dev/null argument to see the converted output up to that point yourself.
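The same iconv test can be run over a whole directory of spec files. The following is a minimal POSIX sh sketch; the function names check_specs and to_utf8 are made up for illustration, and to_utf8 assumes you already know the legacy charset the file was written in (guessing wrong just produces different junk).

```shell
#!/bin/sh
# Print "ok" or "not UTF-8" for every .spec file in a directory
# (default: the current one); return non-zero if any file fails.
check_specs() {
    dir=${1:-.}
    status=0
    for spec in "$dir"/*.spec; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$spec" ] || continue
        if iconv -f UTF-8 -t UTF-8 -o /dev/null "$spec" 2>/dev/null; then
            echo "ok: $spec"
        else
            echo "not UTF-8: $spec"
            status=1
        fi
    done
    return "$status"
}

# Convert a file from a KNOWN legacy charset to UTF-8, replacing the
# original only if iconv succeeds. Usage: to_utf8 ISO-8859-1 yourpackage.spec
to_utf8() {
    from=$1
    file=$2
    if iconv -f "$from" -t UTF-8 "$file" > "$file.utf8"; then
        mv "$file.utf8" "$file"
    else
        rm -f "$file.utf8"
        return 1
    fi
}
```

For example, check_specs . lists every spec file in the current directory, and to_utf8 ISO-8859-1 yourpackage.spec converts a file you know to be Latin-1. Run check_specs again afterwards to confirm the result.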
In various config files
Also use UTF-8 in ~/.rpmmacros, which is where your name is read from (the %packager line):
- %distribution ROSA Linux
- %vendor ROSA
- %packager Test Packager <blahblah@rosalinux.org>
If you also use rebuild-rpm or similar build tools, remember to make sure your name is UTF-8 encoded in their config files too.
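A similar check works for ~/.rpmmacros. In this sketch, check_rpmmacros is a hypothetical helper name; it takes the file to check as an optional argument, so it can also be pointed at the config files of rebuild-rpm or other tools, whose locations are not covered here.

```shell
#!/bin/sh
# Check that an rpm macros file (default: ~/.rpmmacros) is valid UTF-8,
# then print its %packager line so you can verify your name by eye.
check_rpmmacros() {
    file=${1:-"$HOME/.rpmmacros"}
    if ! iconv -f UTF-8 -t UTF-8 -o /dev/null "$file" 2>/dev/null; then
        echo "not UTF-8: $file" >&2
        return 1
    fi
    grep '^%packager' "$file"
}
```

On a system with rpm installed, rpm --eval '%{packager}' additionally shows the packager value rpm actually expands.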
UTF-8 enabled editors
Many text editors, both graphical and text-mode, support UTF-8 natively.
Language environment
Use UTF-8 in your shell environment too whenever possible; that eliminates a lot of headaches. For example, these are the locale settings as reported by the locale command:
- LANG=ru_RU.UTF-8
- LC_CTYPE=ru_RU.UTF-8
- LC_NUMERIC=ru_RU.UTF-8
- LC_TIME=ru_RU.UTF-8
- LC_COLLATE=ru_RU.UTF-8
- LC_MONETARY=ru_RU.UTF-8
- LC_MESSAGES=ru_RU.UTF-8
- LC_PAPER=ru_RU.UTF-8
- LC_NAME=ru_RU.UTF-8
- LC_ADDRESS=ru_RU.UTF-8
- LC_TELEPHONE=ru_RU.UTF-8
- LC_MEASUREMENT=ru_RU.UTF-8
- LC_IDENTIFICATION=ru_RU.UTF-8
- LC_ALL=
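To check quickly whether your current environment really is UTF-8, the locale charmap command prints the character set of the active locale. The helper below is an illustrative sketch; is_utf8_locale is a made-up name.

```shell
#!/bin/sh
# Succeed if the active locale uses the UTF-8 character map.
# "locale charmap" prints the codeset of LC_CTYPE, e.g. "UTF-8".
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if is_utf8_locale; then
    echo "locale is UTF-8"
else
    echo "locale is not UTF-8: $(locale charmap 2>/dev/null)" >&2
fi
```

Keep in mind that a non-empty LC_ALL overrides every other LC_* variable, which is why it is left empty in the listing above.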
You can change the language settings in the ~/.i18n file.
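The exact contents of ~/.i18n depend on your system, so treat the following only as an assumed example: it supposes the file holds plain KEY=value lines using the same variable names as the locale listing above.

```shell
# Assumed ~/.i18n layout (one KEY=value per line); verify against your own file.
LANG=ru_RU.UTF-8
LC_CTYPE=ru_RU.UTF-8
LC_MESSAGES=ru_RU.UTF-8
```

Changes made this way will most likely only apply to new login sessions, not to shells that are already running.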