Character set conversion

If the MTA's initial probe of the CHARSET-CONVERSION mapping table (which determines whether any character set conversion or message reformatting needs to be performed) finds that the message is to be reformatted, the MTA proceeds to check each part of the message. It locates the text parts and uses each part's character set parameter to generate a second probe. The MTA performs this second probe only after it has checked and found that conversions may be needed. The input string for the second probe has the form:

    IN-CHAN=in-channel;OUT-CHAN=out-channel;IN-CHARSET=in-char-set

The in-channel and out-channel are the same as before, and in-char-set is the name of the character set associated with the particular part in question. (Note that the  MTA option, regardless of its setting, has no effect on the form of this second probe of the CHARSET-CONVERSION mapping table.)

If no match occurs for this second probe, no character set conversion is performed (although message reformatting, e.g., changes to MIME structure, may still be performed in accordance with the keyword matched on the first probe). If a match does occur, it should typically produce a string of the form:

    OUT-CHARSET=out-char-set

Here out-char-set specifies the name of the character set to which data in in-char-set should be converted. Both character sets must be defined in the character set definition table, charsets.txt, located in the MTA table directory; no conversion is done if either character set is not properly defined in this file. This is rarely a problem, since the file defines several hundred character sets, covering most of those in use today. See the description of the chbuild utility for further information on the charsets.txt file.
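The second probe for a single text part can be modelled roughly in Python. This is an illustrative sketch, not the MTA's implementation: the table is a hypothetical list of (pattern, template) pairs, `fnmatch`-style `*` wildcards stand in for the mapping table's own pattern syntax, and the channel names are made up.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def second_probe(in_chan: str, out_chan: str, in_charset: str) -> str:
    # Probe format as described in the text above.
    return f"IN-CHAN={in_chan};OUT-CHAN={out_chan};IN-CHARSET={in_charset}"


def lookup(table: list[tuple[str, str]], probe: str) -> str | None:
    # First matching entry wins. fnmatch '*' wildcards are only a rough,
    # assumed stand-in for the real mapping-table pattern syntax.
    for pattern, template in table:
        if fnmatchcase(probe, pattern):
            return template
    return None


def out_charset(template: str | None) -> str | None:
    # No match (None) means no character set conversion for this part.
    if template is None:
        return None
    for clause in template.split(","):
        name, _, value = clause.partition("=")
        if name.strip().upper() == "OUT-CHARSET":
            return value.strip()
    return None


# Hypothetical table entry: channel names and charsets are illustrative only.
TABLE = [
    ("IN-CHAN=tcp_local;OUT-CHAN=*;IN-CHARSET=ISO-8859-1", "OUT-CHARSET=UTF-8"),
]

probe = second_probe("tcp_local", "ims-ms", "ISO-8859-1")
print(probe)                         # IN-CHAN=tcp_local;OUT-CHAN=ims-ms;IN-CHARSET=ISO-8859-1
print(out_charset(lookup(TABLE, probe)))  # UTF-8
```

A part labelled with a charset that no entry matches (for example KOI8-R here) yields no template, so it is left unconverted, mirroring the "no match" case above.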

If all these conditions are met, the MTA builds the character set mapping and performs the conversion. The converted message part is relabelled with the name of the character set to which it was converted. Encoded-words in message headers (text encoded according to the rules of RFC 2047) also have the specified charset conversion applied.
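A rough Python model of the conversion and relabelling step follows. Python's built-in codecs stand in for the definitions in charsets.txt, the function names are hypothetical, and re-encoding every converted encoded-word with the B (base64) encoding is a choice made for this sketch, not documented MTA behaviour.

```python
from __future__ import annotations

import base64
import quopri
import re

# Simplified RFC 2047 encoded-word matcher, with an optional "*lang" suffix:
# =?charset[*lang]?B|Q?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?*]+)((?:\*[^?]*)?)\?([BbQq])\?([^?]*)\?=")


def convert_part(body: bytes, in_charset: str, out_charset: str) -> tuple[bytes, str]:
    """Convert a text part; return the new bytes and the new charset label."""
    text = body.decode(in_charset)  # strict: raises on invalid input
    return text.encode(out_charset), out_charset


def convert_encoded_words(value: str, out_charset: str) -> str:
    """Re-encode every encoded-word in a header value into out_charset."""
    def repl(m: re.Match) -> str:
        charset, lang, enc, text = m.groups()
        raw = text.encode("ascii")
        if enc.upper() == "B":
            data = base64.b64decode(raw)
        else:  # Q encoding: '_' means space, =XX is a hex-encoded byte
            data = quopri.decodestring(raw, header=True)
        decoded = data.decode(charset)
        # Sketch choice: always re-emit with B encoding.
        b64 = base64.b64encode(decoded.encode(out_charset)).decode("ascii")
        return f"=?{out_charset}{lang}?B?{b64}?="
    return ENCODED_WORD.sub(repl, value)


body, label = convert_part("café".encode("iso-8859-1"), "ISO-8859-1", "UTF-8")
print(label, body)  # UTF-8 b'caf\xc3\xa9'
print(convert_encoded_words("=?ISO-8859-1?Q?caf=E9?=", "UTF-8"))
# =?UTF-8?B?Y2Fmw6k=?=
```

A fuller version would also split converted encoded-words that grow past the 75-character limit RFC 2047 places on a single encoded-word.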

The following additional types of output request are also supported.

When working on text parts of messages, one may also specify an encoding in which the MTA should output that part:

    OUT-ENCODING=encoding-name

Here encoding-name must be the name of an encoding supported by the MTA.

Both an output charset and an output encoding may be specified by separating the two clauses with a comma, for example OUT-CHARSET=out-char-set,OUT-ENCODING=encoding-name.
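Because a template can carry several comma-separated clauses, splitting it and applying the requested encoding is straightforward. In this hypothetical sketch, `parse_template` and `apply_out_encoding` are illustrative names, and only base64 and quoted-printable are handled, using Python's standard `base64` and `quopri` modules rather than the MTA's own encoders.

```python
from __future__ import annotations

import base64
import quopri


def parse_template(template: str) -> dict[str, str]:
    """Split 'NAME=value,NAME=value' into an upper-cased NAME -> value dict."""
    clauses: dict[str, str] = {}
    for clause in template.split(","):
        name, sep, value = clause.partition("=")
        if not sep:
            raise ValueError(f"malformed clause: {clause!r}")
        clauses[name.strip().upper()] = value.strip()
    return clauses


def apply_out_encoding(data: bytes, encoding_name: str) -> bytes:
    # Only two well-known MIME encodings are sketched here.
    name = encoding_name.lower()
    if name == "base64":
        return base64.encodebytes(data)  # folds at 76 chars, adds trailing newline
    if name == "quoted-printable":
        return quopri.encodestring(data)
    raise ValueError(f"encoding not handled in this sketch: {encoding_name}")


clauses = parse_template("OUT-CHARSET=UTF-8,OUT-ENCODING=base64")
body = "café".encode(clauses["OUT-CHARSET"])
print(apply_out_encoding(body, clauses["OUT-ENCODING"]))  # b'Y2Fmw6k=\n'
```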

For encoded-words in message header lines (material encoded using the RFC 2047 encoding rules), only a restricted set of output encodings is permitted; attempting to set any other output encoding will result in the "unknown" encoding being used.

There are also several additional options that can be applied when converting the charset in message headers. Specifying OUT-CHARSET=out-charset,RELABEL-ONLY=1 in the template (right-hand side) of a mapping entry means that the MTA will simply use the specified out-charset name wherever the original charset name had appeared. This is intended for cases where the original charset label was wrong and the goal is simply to replace it with the correct label, without performing any actual charset conversion.
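RELABEL-ONLY changes the label, never the bytes. The sketch below, using a hypothetical `relabel_only` helper, rewrites the charset field of encoded-words whose label matches the input charset and leaves the encoded text untouched; the regular expression is a simplified encoded-word matcher, not the MTA's parser.

```python
from __future__ import annotations

import re

# Simplified RFC 2047 encoded-word matcher, with an optional "*lang" suffix.
ENCODED_WORD = re.compile(r"=\?([^?*]+)((?:\*[^?]*)?)\?([BbQq])\?([^?]*)\?=")


def relabel_only(value: str, in_charset: str, out_charset: str) -> str:
    """Swap the charset label on matching encoded-words; bytes stay as-is."""
    def repl(m: re.Match) -> str:
        charset, lang, enc, text = m.groups()
        if charset.lower() != in_charset.lower():
            return m.group(0)  # some other charset: leave unchanged
        return f"=?{out_charset}{lang}?{enc}?{text}?="
    return ENCODED_WORD.sub(repl, value)


# Latin-1 text that was mislabelled as US-ASCII:
print(relabel_only("=?US-ASCII?Q?caf=E9?=", "US-ASCII", "ISO-8859-1"))
# =?ISO-8859-1?Q?caf=E9?=
```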

Specifying IN-CHARSET=* in the template (right-hand side) of a mapping entry requests that the MTA examine ("sniff") the data to try to determine which character set was actually used. Currently, the only useful determination the MTA can make is among US-ASCII, EUC-JP, SHIFT-JIS, and ISO-2022-JP.
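Distinguishing these four charsets is feasible because ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences, while EUC-JP and Shift_JIS use 8-bit bytes. The deliberately crude heuristic below (a hypothetical `sniff_japanese` function, not the MTA's algorithm) relies on that difference and on strict decoding with Python's codecs, preferring EUC-JP when both 8-bit codecs accept the data.

```python
from __future__ import annotations


def sniff_japanese(data: bytes) -> str | None:
    """Crude guess among US-ASCII, ISO-2022-JP, EUC-JP and SHIFT-JIS."""
    if all(b < 0x80 for b in data):
        # 7-bit data: ISO-2022-JP switches character sets with ESC sequences.
        return "ISO-2022-JP" if b"\x1b" in data else "US-ASCII"
    # 8-bit data: try EUC-JP first, then Shift_JIS (assumed preference order).
    for codec, label in (("euc_jp", "EUC-JP"), ("shift_jis", "SHIFT-JIS")):
        try:
            data.decode(codec)  # strict decoding
        except UnicodeDecodeError:
            continue
        return label
    return None  # undetermined


for cs in ("iso2022_jp", "euc_jp", "shift_jis"):
    print(cs, "->", sniff_japanese("日本語".encode(cs)))
print(sniff_japanese(b"plain text"))  # US-ASCII
```

Real detectors usually score many candidate sequences rather than stopping at the first codec that decodes, since short 8-bit strings can be valid in both EUC-JP and Shift_JIS.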

Specifying OUT-LANGUAGE=lang-tag in the template (right-hand side) of a mapping entry tells the MTA to set the specified language tag as the value of the Content-language: header line. Specifying OUT-LANGUAGE=*lang-tag tells the MTA to insert the specified language tag alongside the charset name inside encoded-words on header lines, if no explicit language tag was already present in those encoded-words.
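Both OUT-LANGUAGE forms are sketched below with hypothetical helper names. The encoded-word form assumes the RFC 2231 syntax for language tags in encoded-words (`=?charset*lang?encoding?text?=`); the text above does not name the exact syntax the MTA emits.

```python
from __future__ import annotations

import re
from email.message import Message

# Simplified RFC 2047 encoded-word matcher, with an optional "*lang" suffix.
ENCODED_WORD = re.compile(r"=\?([^?*]+)((?:\*[^?]*)?)\?([BbQq])\?([^?]*)\?=")


def set_content_language(msg: Message, lang_tag: str) -> None:
    """OUT-LANGUAGE=lang-tag: set the Content-Language: header."""
    del msg["Content-Language"]  # removes existing copies; no error if absent
    msg["Content-Language"] = lang_tag


def tag_encoded_words(value: str, lang_tag: str) -> str:
    """OUT-LANGUAGE=*lang-tag: add a language suffix (assumed RFC 2231
    syntax) to encoded-words that do not already carry one."""
    def repl(m: re.Match) -> str:
        charset, lang, enc, text = m.groups()
        if lang:  # explicit language already present: leave as-is
            return m.group(0)
        return f"=?{charset}*{lang_tag}?{enc}?{text}?="
    return ENCODED_WORD.sub(repl, value)


print(tag_encoded_words("=?US-ASCII?Q?Keith_Moore?=", "EN"))
# =?US-ASCII*EN?Q?Keith_Moore?=
```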

See also:
 * CHARSET-CONVERSION mapping table
 * Converting ISO-2022-JP to UTF-8 and back
 * chbuild utility