Converting files to UTF-8 is a common task on Linux, especially when dealing with different character sets or ensuring file compatibility. The methods below rely mainly on command-line tools such as iconv, recode, and convmv.
Use the iconv tool. iconv is a very powerful tool that is widely used for character encoding conversion and can be used to convert files from other character sets to UTF-8 encoding:
iconv -f ORIGINAL_ENCODING -t utf-8 INPUT_FILE > OUTPUT_FILE
-f ORIGINAL_ENCODING: specifies the current encoding of the input file.
-t utf-8: converts the file to UTF-8.
INPUT_FILE: the file to convert.
OUTPUT_FILE: the converted file.
If you have a file example.txt encoded in ISO-8859-1 and you want to convert it to UTF-8, you can use the following command:
iconv -f ISO-8859-1 -t utf-8 example.txt > example_utf8.txt
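If you are not sure of the original encoding of the file, you can first check it with the file command. The result is a guess based on the file's contents, so treat it as a hint; adding the -i option prints the MIME type and charset instead of a description: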
file example.txt
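Because the output of file is only a guess, it can help to test whether a file is already valid UTF-8 before converting it. The following is a minimal sketch of one way to do that, assuming an iconv that exits with a non-zero status on invalid input (as the glibc version does). The function name is_utf8 and the demo file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: use iconv as a validator. Converting UTF-8 to UTF-8 fails
# (non-zero exit status) when the input contains byte sequences that
# are not valid UTF-8. is_utf8 is a hypothetical helper name.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Demo files (hypothetical names) created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"    # "cafe" with e-acute, UTF-8 bytes
printf 'caf\351\n'     > "$tmpdir/latin1.txt"  # same text, ISO-8859-1 bytes

for f in "$tmpdir/utf8.txt" "$tmpdir/latin1.txt"; do
    if is_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8, no conversion needed"
    else
        echo "$(basename "$f"): not valid UTF-8, needs conversion"
    fi
done

rm -rf "$tmpdir"
```

A file that passes this check may still be plain ASCII, which is a subset of UTF-8, so no conversion is needed in that case either.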
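To convert and overwrite the original file, write to a temporary file first. Redirecting the output straight to the input file would truncate it before iconv has read it: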
iconv -f ISO-8859-1 -t utf-8 example.txt > example.txt.temp && mv example.txt.temp example.txt
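The temporary-file pattern above can be wrapped in a small function so that the original is only touched when the conversion succeeds. This is a sketch, not a definitive implementation: the function name to_utf8 is hypothetical, and it copies the converted bytes back with cat instead of mv so that the original file keeps its permissions.

```shell
#!/bin/sh
# to_utf8 FROM FILE: convert FILE from encoding FROM to UTF-8 in place.
# to_utf8 is a hypothetical helper name used for illustration.
to_utf8() {
    from=$1
    file=$2
    # Create the temporary file next to the original.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        # Conversion succeeded: copy the result over the original,
        # keeping the original's permissions, then remove the temp file.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        # Conversion failed: leave the original untouched.
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file containing ISO-8859-1 bytes.
demo=$(mktemp)
printf 'caf\351\n' > "$demo"
to_utf8 ISO-8859-1 "$demo" && od -An -tx1 "$demo"   # show the converted bytes
rm -f "$demo"
```

If iconv reports an error, for example because the stated source encoding is wrong, the function returns a non-zero status and the original file is left as it was.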
recode is another character set conversion tool that converts between many character sets, and it also makes converting files to UTF-8 easy.
On most Linux distributions, you can install recode from the package manager. Debian/Ubuntu:
sudo apt install recode
Fedora:
sudo dnf install recode
Arch Linux:
sudo pacman -S recode
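Basic syntax:
recode ORIGINAL_ENCODING..UTF-8 INPUT_FILE
Suppose you have an ISO-8859-1 encoded file and want to convert it to UTF-8. Note that there are no spaces around the two dots, and that recode overwrites the input file in place:
recode ISO-8859-1..UTF-8 example.txt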
convmv is a tool for converting the encoding of file names; it does not change file contents. It is especially useful for renaming many files in one batch. Install convmv, Debian/Ubuntu:
sudo apt install convmv
Fedora:
sudo dnf install convmv
Arch Linux:
sudo pacman -S convmv
Basic syntax:
convmv -f ORIGINAL_ENCODING -t utf-8 --notest FILE
-f ORIGINAL_ENCODING: the current encoding of the file name.
-t utf-8: the target encoding, here UTF-8.
--notest: actually perform the rename. Without this option, convmv runs in test mode, which only shows the renames it would perform and changes nothing.
For example, to convert the name of example.txt from ISO-8859-1 to UTF-8 (a name that contains only ASCII characters, like this one, is left unchanged):
convmv -f ISO-8859-1 -t utf-8 --notest example.txt
If you only need to convert a single small file, you can convert it to UTF-8 manually in the vim editor. Open the file with vim:
vim example.txt
In vim, enter the following command to view the current file encoding:
:set fileencoding
To convert the file to UTF-8, set the encoding that vim uses when it writes the file:
:set fileencoding=utf-8
Save the file and exit vim:
:wq
For simple character substitutions, sed can replace specific characters in a file, although it is not an encoding converter like iconv or recode. It is suitable only for simple, known replacements.
sed 's/OLD_CHARACTER/NEW_CHARACTER/g' INPUT_FILE > OUTPUT_FILE
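As an illustration of such a substitution, the sketch below replaces typographic double quotes (U+201C and U+201D) in UTF-8 text with plain ASCII quotes. The function name straighten_quotes is hypothetical, and the choice of quotes as the characters to replace is only an example.

```shell
#!/bin/sh
# UTF-8 byte sequences for the typographic quotes U+201C and U+201D,
# built with printf so that the script itself stays pure ASCII.
open_quote=$(printf '\342\200\234')
close_quote=$(printf '\342\200\235')

# straighten_quotes (hypothetical helper name) reads standard input and
# replaces both typographic double quotes with a plain ASCII quote.
straighten_quotes() {
    sed -e "s/$open_quote/\"/g" -e "s/$close_quote/\"/g"
}

# Demo: the quoted word comes out with plain ASCII quotes around it.
printf '%s\n' "${open_quote}UTF-8${close_quote} is the target encoding" | straighten_quotes
```

This changes individual characters only; the file's encoding stays the same.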
However, this approach is very limited, and iconv, recode, or similar tools are recommended for real encoding conversion.
iconv supports a wide range of encodings and is the usual choice for converting file contents to UTF-8. recode also handles many character sets and converts files in place. convmv is ideal for batch conversion of file names, not file contents. vim is suitable for converting a single file. sed is good for simple character substitutions, not for real encoding conversion. Choose the tool according to your needs: iconv and recode suit high-volume conversions, while vim or sed suit simple tasks.