Charset7, charset8, charsetesc Channel Options

Automatic character set labelling (charset7, charset8, charsetesc)

The MIME specification provides a mechanism to label the charset used in plain text messages: A "charset=" parameter can be specified as part of the Content-type: header line. Various charset names are defined in MIME, including US-ASCII (the default), ISO-8859-1, ISO-8859-2, and so on, and many more have been registered with the Internet Assigned Numbers Authority (IANA).

Some existing systems and user agents, however, do not provide any mechanism for generating these charset labels. The charset7, charset8 and charsetesc channel options, when placed on a source channel, provide a mechanism to specify charset names to be inserted into message headers. Each option requires an argument giving a charset name. The names are not checked for validity. Note, however, that charset conversion can only be done on charsets specified in the MTA's charset definition file charsets.txt. The names defined in this file should be used if possible.

The charset7 name is used if the message contains only seven bit characters; the charset8 name is used if eight bit data is found in the message; the charsetesc name is used if a message containing only seven bit data also contains one or more escape characters. If the appropriate option is not specified, no charset name is inserted during MIME processing into Content-type: header lines for text parts that lack an existing charset label.
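The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the decision order, not the MTA's implementation; the function name pick_charset_label and its parameters are hypothetical and simply mirror the three channel options.

```python
from typing import Optional

ESC = 0x1B  # ASCII escape character


def pick_charset_label(body: bytes,
                       charset7: Optional[str] = None,
                       charset8: Optional[str] = None,
                       charsetesc: Optional[str] = None) -> Optional[str]:
    """Return the charset name to insert, or None if no label applies.

    Hypothetical helper: each argument stands for the value given to the
    corresponding channel option, or None if the option is not set.
    """
    # Any eight bit data selects the charset8 name.
    if any(b >= 0x80 for b in body):
        return charset8
    # Seven bit data containing an escape character selects charsetesc.
    if ESC in body:
        return charsetesc
    # Plain seven bit data selects charset7.
    return charset7
```

Returning None models the case where the appropriate option is not set: a seven bit message containing escape characters gets no label when only charset7 is configured.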

When a charset* channel option causes a MIME "charset" parameter to be added to an incoming message, the more fundamental MIME-version: and Content-type: header lines are also added if they are not already present.

Note that the charset8 option also controls the MIME encoding of eight bit characters found in message headers, where eight bit data is unconditionally illegal. The MTA normally MIME encodes any such illegal eight bit data encountered in message headers, labelling it with the UNKNOWN charset if no charset8 value has been specified on the current source channel. Actual addresses are a special case: in the RFC 822 addr-spec, where eight bit data categorically must not appear, the MTA replaces any eight bit data with the asterisk character, *. An RFC 822 phrase, or "personal name", is instead subject to the MIME encoding described above, using the charset8 charset name.
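The two header cases can be illustrated with a small Python sketch. It is an approximation under stated assumptions: the helper names are hypothetical, the choice of the "B" (base64) encoded-word form is an assumption made for brevity (the text above does not say which encoding the MTA uses), and the encoded-word length limit is ignored.

```python
import base64
from typing import Optional


def encode_phrase(raw: bytes, charset8: Optional[str] = None) -> str:
    """MIME encode a phrase ("personal name") that contains eight bit data.

    Hypothetical helper; charset8 stands for the channel option value.
    """
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")  # nothing illegal, leave as-is
    # Fall back to the UNKNOWN label when no charset8 value is available.
    charset = charset8 or "UNKNOWN"
    payload = base64.b64encode(raw).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def sanitize_addr_spec(raw: bytes) -> str:
    """Replace each eight bit byte in an addr-spec with an asterisk."""
    return "".join(chr(b) if b < 0x80 else "*" for b in raw)
```

For example, a phrase holding the Latin-1 bytes for "José" would be encoded and labelled with the charset8 name, while the same byte inside the address itself would become "*".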

These charset specifications never override existing labels; that is, they have no effect if a message already has a charset label or is of a type other than text.
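The "never override" rule, together with the addition of the basic MIME header lines, can be sketched with Python's standard email package. The function name label_if_unlabelled is hypothetical, and the sketch only models the documented behaviour for a single message part; it is not the MTA's code.

```python
from email import message_from_string
from email.message import Message


def label_if_unlabelled(msg: Message, charset_name: str) -> bool:
    """Add charset_name to a text part lacking a label; return True if changed."""
    # Non-text parts are never labelled.
    if msg.get_content_maintype() != "text":
        return False
    # An existing charset label is never overridden.
    if msg.get_param("charset") is not None:
        return False
    # Adding the parameter implies the basic MIME headers must exist.
    if "MIME-Version" not in msg:
        msg["MIME-Version"] = "1.0"
    # set_param creates "Content-Type: text/plain" if the header is missing.
    msg.set_param("charset", charset_name)
    return True


plain = message_from_string("Subject: hi\n\nbody\n")
label_if_unlabelled(plain, "ISO-8859-1")
```

After the call, the unlabelled message carries MIME-Version and Content-Type header lines with the new charset parameter, whereas a message that already has a charset label, or a non-text type, is left untouched.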

The charsetesc option tends to be particularly useful on channels that receive unlabelled messages using Japanese or Korean character sets that contain the escape character (e.g., iso-2022-jp or iso-2022-kr).
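To see why such messages fall into the charsetesc case, the following Python sketch encodes sample text with the standard iso2022_jp and iso2022_kr codecs and checks the resulting bytes. The helper name describe is hypothetical and the sample strings are arbitrary.

```python
from typing import Tuple

ESC = 0x1B  # ASCII escape character


def describe(data: bytes) -> Tuple[bool, bool]:
    """Return (is_seven_bit, has_escape) for a byte string."""
    is_seven_bit = all(b < 0x80 for b in data)
    has_escape = ESC in data
    return is_seven_bit, has_escape


# Arbitrary sample text encoded with Python's built-in ISO-2022 codecs.
japanese = "こんにちは".encode("iso2022_jp")
korean = "안녕하세요".encode("iso2022_kr")

jp_result = describe(japanese)
kr_result = describe(korean)
```

Both encodings produce seven bit data that includes escape sequences, which is the condition under which the charsetesc name is applied.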
