Obsolete:UTF-8 tips
using xterm
With recent installs of XFree86 and Xorg-x11, xterm works with UTF-8 for any "normal use".
There are other UTF-8 terminals that may do a better job than xterm on some very advanced Unicode features such as right-to-left writing, but for normal European and Asian characters, the xterm of a recent X11 install works just fine if a proper UTF-8 locale is installed and the system is set up correctly.
xterm -en utf-8
works to some extent, but you should only use it if your system does not have a proper UTF-8 locale.
xterm -u8
is better if you have a UTF-8 locale, but it still does not switch the locale environment of the shell (or other subprocess) running inside xterm to a UTF-8 locale, so you would have to run something like
export LC_ALL=en_US.UTF-8
as the first thing inside the xterm.
If you start xterm from a shell, it is best to simply set the locale environment to a UTF-8 locale; xterm then switches to UTF-8 mode automatically. Example:
LC_ALL=en_US.UTF-8 xterm
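Starting xterm with LC_ALL set only helps if that locale is actually installed. The sketch below checks the output of `locale -a` before launching. It assumes a POSIX shell; the function names `norm_locale`, `have_locale` and `utf8_xterm` are made up for this example, and en_US.UTF-8 is only a placeholder default. Because `locale -a` usually prints names like `en_US.utf8`, both spellings are normalized before comparing.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): start xterm in a UTF-8 locale, but only
# if `locale -a` says that locale is installed.

# norm_locale: lowercase a locale name and drop every '-', so that
# "en_US.UTF-8" matches the "en_US.utf8" spelling printed by `locale -a`.
norm_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# have_locale: succeed if `locale -a` lists the given locale name.
have_locale() {
  locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
    grep -Fqx "$(norm_locale "$1")"
}

# utf8_xterm LOCALE [xterm options...]: run xterm with LC_ALL set, so that
# xterm switches to UTF-8 mode and the shell inside it inherits the locale.
utf8_xterm() {
  loc=${1:-en_US.UTF-8}   # placeholder default; use your own UTF-8 locale
  [ $# -gt 0 ] && shift
  if ! have_locale "$loc"; then
    echo "locale $loc is not installed (see: locale -a)" >&2
    return 1
  fi
  LC_ALL=$loc xterm "$@"
}
```

For example, `utf8_xterm de_DE.UTF-8 &` starts a German UTF-8 xterm in the background, or prints an error if that locale has not been generated.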
editors
Before starting an editor, make sure that the LC_CTYPE category of your locale environment is set to a UTF-8 locale. You can check which locale is in force for processes newly forked from a shell with the locale command.
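The locale command reports LC_CTYPE according to the POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG. The sketch below applies the same rule to the shell's own variables and then checks whether the resulting name carries a UTF-8 codeset. The function names `effective_ctype` and `names_utf8` are hypothetical; the check only looks at the name and does not prove the locale is installed. Since the function reads the shell's variables, make sure they are also exported, or child processes will not see them.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): which locale name governs LC_CTYPE,
# and does that name denote UTF-8?

# effective_ctype: print the locale that controls LC_CTYPE, using the POSIX
# precedence LC_ALL > LC_CTYPE > LANG; unset or empty variables are skipped.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' POSIX   # implementation default, normally the POSIX locale
  fi
}

# names_utf8: succeed if a locale name carries a UTF-8 codeset, spelled
# "UTF-8" or "utf8", optionally followed by an @modifier.
names_utf8() {
  case $1 in
    *.[Uu][Tt][Ff]-8 | *.[Uu][Tt][Ff]8 | *.[Uu][Tt][Ff]-8@* | *.[Uu][Tt][Ff]8@*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Typical use before starting the editor: `names_utf8 "$(effective_ctype)" || echo "LC_CTYPE is not UTF-8" >&2`. On glibc systems, `locale charmap` reports the codeset of the current locale directly.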
joe
UTF-8 does not work in joe versions before 3.1. Version 3.1 (at least the package that ships with SuSE 9.2) works fine with UTF-8.
vim
vim has supported UTF-8 for well over a year, so if you have a recent release (version 6) and everything (xterm, locale) is set up correctly, it should just work; otherwise you may need to install a newer version.
To sidestep xterm issues in the editor, you can simply set the locale to a UTF-8 locale as described above and use gvim, the graphical version of vim. It behaves like the text-only vim, but opens its own X11 window and provides menus for people who prefer to use the mouse. Nobody forces you to use the mouse, though: you can work in gvim entirely from the keyboard, just as with vim in an xterm.
vim and gvim also have the nice feature that if a file contains a byte
sequence that is not valid UTF-8 (that is, it does not encode a valid
Unicode character), they assume the file is encoded in latin1
(ISO-8859-1) instead, convert it to UTF-8 in memory, and convert it back
to latin1 on save. You just have to be aware of this: the conversion is
indicated by "[converted]" in the message line after the file is read
and written.
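vim drives this fallback through its 'fileencodings' option: when a file is read, the listed encodings are tried in order until one decodes without error. The same check-then-convert idea can be reproduced outside the editor. This is a minimal sketch, assuming the iconv(1) utility is installed (glibc provides one); `is_valid_utf8` and `to_utf8` are hypothetical names. Unlike vim, it only prints the converted text and never writes anything back.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): treat a file as UTF-8 if it decodes cleanly,
# otherwise fall back to latin1, the same order vim uses.

# is_valid_utf8 FILE: iconv exits non-zero at the first byte sequence
# that is not valid UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# to_utf8 FILE: print FILE as UTF-8 on stdout.
to_utf8() {
  if is_valid_utf8 "$1"; then
    cat "$1"
  else
    # latin1 assigns a character to every byte value, so this step
    # cannot fail; that is why it works as the last fallback.
    iconv -f ISO-8859-1 -t UTF-8 "$1"
  fi
}
```

For example, `to_utf8 notes.txt > notes.utf8.txt` yields a UTF-8 copy whether `notes.txt` was UTF-8 or latin1. The same caveat as in vim applies: a file in some other 8-bit encoding will be silently misread as latin1.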