Alternate outgoing channel for DSNs and MDNs
How do I send DSNs and MDNs by using an alternate channel?
This article describes how to configure an MTA to send delivery status notifications (DSNs) and message disposition notifications (MDNs) through an alternate channel and why it is worth configuring.
A problem commonly seen at customer sites is that messages are not delivered to the Internet fast enough: they wait in the outgoing queue because all the threads that deliver messages are busy. The MTA can be configured to use more outgoing processes and threads, but this often helps only in the short term.
Legitimate messages sent by authenticated users to recipients on the Internet are typically delivered quickly: the recipient addresses are correct, there are no delivery problems, and so on. However, these messages may not get a chance to be delivered immediately, because all delivery threads are busy attempting to deliver messages that are harder to deliver.
These harder-to-deliver messages are delivery status notifications. The MTA generates them itself when it fails to deliver messages to local users. Once an MTA accepts a message for delivery, the relevant standards require it either to deliver the message or to return a failure notification to the sender.
Therefore, the MTA should make its best effort not to accept messages that it knows it cannot deliver, but it can always happen that it accepts a message and then cannot deliver it. Sending delivery notifications is problematic when they are responses to spam, because spam senders very often use fake domains whose MX hosts (according to DNS) cannot be connected to. The MTA spends minutes trying to connect to such servers, tying up threads that should be delivering legitimate mail. When the connection fails, the MTA treats the failure as temporary and queues the notification for later delivery. If the backoff on the channel is set for frequent retries, the problem repeats often.
The problem is especially noticeable at sites that use the grace period feature (which is enabled by default) and do not discard spam immediately. Such sites generate many notifications for spam messages that could not be delivered because the recipients are over quota.
DSN generation must not and cannot be switched off: it is mandated by the relevant standards, and some percentage of DSNs are legitimate. There are, however, ways to limit the effect that DSN delivery problems have on other types of messages, especially those sent by authenticated users. The approach is to configure the MTA to send DSNs through a separate outgoing channel and to associate that channel with a separate process pool. That process pool will be busy trying to deliver the notifications, while SMTP_POOL remains available for legitimate mail. If the problems described above occur, they affect DSNs only, and users do not notice delays in their communication.
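The effect of separating the pools can be illustrated with a toy queueing model. The sketch below is not MTA code: the job durations, the arrival pattern, and the thread counts are made-up assumptions, chosen only to show how slow DSN deliveries can starve fast deliveries when both share one pool.

```python
import heapq
from collections import defaultdict

def mean_wait(jobs, threads):
    """Serve jobs in arrival order on `threads` delivery threads.

    jobs: iterable of (arrival_time, duration, kind) tuples, in seconds.
    Returns the mean time each kind of job spent queued before a thread
    picked it up.
    """
    free_at = [0] * threads          # when each thread next becomes idle
    heapq.heapify(free_at)
    waits = defaultdict(list)
    for arrival, duration, kind in sorted(jobs):
        start = max(arrival, heapq.heappop(free_at))
        waits[kind].append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return {kind: sum(w) / len(w) for kind, w in waits.items()}

# Hypothetical workload: every second for one minute, one user message that
# is delivered in 1 s and one DSN whose connection attempt hangs for 60 s.
legit = [(t, 1, "legit") for t in range(60)]
dsns = [(t, 60, "dsn") for t in range(60)]

shared = mean_wait(legit + dsns, threads=4)      # one pool for everything
separate = {**mean_wait(legit, threads=2),       # stand-in for SMTP_POOL
            **mean_wait(dsns, threads=2)}        # stand-in for NOTIFY_POOL

print("shared pool, mean wait for user mail:   ", shared["legit"])
print("separate pools, mean wait for user mail:", separate["legit"])
```

Both cases use four threads in total. In this model user mail never waits when it has its own threads, whereas in the shared pool it queues behind the hung DSN connections.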
The MTA can be reconfigured to send the DSNs it generates through a separate channel. Several pieces of configuration are needed to achieve that.
A new rewrite rule is needed to handle top-level domains, so that DSNs to these domains are rewritten to the tcp_notify channel, which will be created in the subsequent steps. The new rule should be:
Note that this rule needs to appear before the regular "." rule:
Next, to configure the DSNs to be queued to an alternate channel, put the following keywords on the
defaults .. notificationchannel process_dsn dispositionchannel process_dsn
(the same channel is used for both DSNs and MDNs, to simplify the configuration). And create the
Next, create a new tcp_notify channel, which will be used to deliver the DSNs and MDNs to the Internet (in
tcp_notify smtp mx single_sys remotehost inner subdirs 20 maxjobs 10 pool NOTIFY_POOL backoff "pt4h" "pt8h" notices 1 loopcheck
tcp_notify-daemon
There are several interesting things in this definition:
- The delivery retry interval (4 hours, and then every 8 hours) is unusual. This is reasonable because if a DSN is not delivered on the first attempt, it is unlikely to be delivered at all. There is no point in retrying often, especially since a DSN delivery attempt can consume a thread for a long time.
- Similarly, there is no point in keeping undeliverable DSNs in the queue for long. They will be kept for one day only.
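To make the retry schedule concrete, the following sketch lists when delivery attempts would happen for one undeliverable DSN. It assumes a simplified reading of the channel definition: one immediate attempt, a retry after 4 hours, further retries every 8 hours, and no attempts once the one-day limit is reached. The function names are hypothetical, and the MTA's actual scheduling may differ in detail.

```python
from datetime import timedelta

def attempt_offsets(first_retry, later_retry, lifetime):
    """Return the offset (from first queuing) of each delivery attempt.

    Assumes one immediate attempt, a retry after `first_retry`, further
    retries every `later_retry`, and no attempts once `lifetime` is reached.
    """
    offsets = [timedelta(0)]
    step = first_retry
    while offsets[-1] + step < lifetime:
        offsets.append(offsets[-1] + step)
        step = later_retry
    return offsets

def hours(offsets):
    """Convert a list of timedeltas to hours as floats."""
    return [o / timedelta(hours=1) for o in offsets]

# Schedule described in this article: 4 h, then every 8 h, kept for one day.
notify = attempt_offsets(timedelta(hours=4), timedelta(hours=8),
                         timedelta(days=1))
print(hours(notify))   # [0.0, 4.0, 12.0, 20.0]

# Hypothetical comparison: a channel that retries every 30 minutes.
frequent = attempt_offsets(timedelta(minutes=30), timedelta(minutes=30),
                           timedelta(days=1))
print(len(notify), "attempts versus", len(frequent))   # 4 attempts versus 48
```

Under these assumptions each undeliverable DSN ties up a thread at most four times before it leaves the queue, instead of dozens of times with a frequent retry schedule.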
This channel will use a separate process pool, NOTIFY_POOL. The actual number of processes ( job_limit, these values should be equal as nothing else uses that pool) should be sized according to the load on and capabilities of the system.
loopcheck should be enabled on this channel, as MX records for spammers' domains sometimes point at
Next, the process pool needs to be configured in
Some protocol-level timeouts for the tcp_notify channel can be decreased. The MTA is unlikely to be able to connect to the MX hosts for spammers' domains, so it does not make sense to keep trying to connect for long (and thereby lock up threads). For instance, one could set:
so that the server waits only one minute for a successful connection to a remote host. As the configuration file applies to tcp_notify only, other connections will not be affected.
If LMTP is used for mailbox delivery, the vast majority of DSNs (including those for over-quota users) are generated on the front-end, so there is no need to recognize DSNs generated on the back-end. If SMTP is used, however, such DSNs are also generated on the back-end, and when they are relayed to the front-end they are treated as legitimate mail from users. It is better for the front-end MTA to treat DSNs coming from the back-end as if it had generated them itself and to associate them with tcp_notify, so that the same characteristics apply and legitimate mail is not affected when the number of such DSNs is high.
The trick is to configure the back-end so that it sends DSNs through its own tcp_notify channel, and to configure that channel to connect to an alternate port on the front-end. The front-end associates that port with its own tcp_notify channel and sends messages arriving on that port as if they were its own DSNs.
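The port-based split can be summarised with a small model. This is only an illustration of the routing decision, using the port numbers and channel names this article uses (port 25 with tcp_intranet for ordinary mail, port 2025 with tcp_notify for DSNs). The dictionary and function names are hypothetical; the real decision is made by the MTA's dispatcher and channel configuration.

```python
# Back-end: which front-end port a message is sent to, keyed by the
# back-end channel that handles it.
BACKEND_PORT_FOR_CHANNEL = {
    "tcp_notify": 2025,   # DSNs generated in process_dsn
    "tcp_local": 25,      # all other mail
}

# Front-end: which source channel is assigned, keyed by the listening port.
FRONTEND_SOURCE_CHANNEL = {
    25: "tcp_intranet",
    2025: "tcp_notify",
}

def frontend_source_channel(backend_channel):
    """Return the front-end source channel for mail that leaves the
    back-end through `backend_channel`."""
    port = BACKEND_PORT_FOR_CHANNEL[backend_channel]
    return FRONTEND_SOURCE_CHANNEL[port]

print(frontend_source_channel("tcp_notify"))  # tcp_notify
print(frontend_source_channel("tcp_local"))   # tcp_intranet
```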
In a two-tier deployment, the architecture consists of front-end and back-end systems. The back-end system sends DSNs generated in the process_dsn channel to port 2025 on the front-end through its tcp_notify channel, and sends other mail to port 25 on the front-end through tcp_local. The front-end uses tcp_intranet as the source channel for mail from the back-end that arrives on port 25, and tcp_notify for mail that arrives on port 2025. That mail, along with DSNs generated by the front-end in process_dsn, is relayed to the Internet via its
The back-end system needs to be configured as described above. The tcp_notify channel should differ in two ways:
- daemon front-end-domain-or-ip should be added to the tcp_notify channel, just like on tcp_local, so that all mail is relayed through the front-end.
- The tcp_notify channel should be configured with port 2025 (2025 is the alternate port on the front-end).
The front-end should be configured as described above, with the following additions:
A dispatcher instance should be configured in dispatcher.cnf to listen on port 2025, and it should be associated with the tcp_notify channel.
[SERVICE=SMTP]
PORT=2025
IMAGE=IMTA_BIN:tcp_smtp_server
LOGFILE=IMTA_LOG:tcp_smtp_server.log
STACKSIZE=2048000
PARAMETER=CHANNEL=tcp_notify
The new "." rewrite rule should be changed to:
Anyone who can connect to port 2025 on the MTA can send mail to the Internet. Even if access from the Internet is already blocked, it is a good idea to configure the MTA so that only the back-end can connect to that port. This prevents malicious local users who can reach port 2025 from relaying mail and thereby bypassing whatever access control is set on port 25.
Such permissions can be set in the PORT_ACCESS mapping table:
PORT_ACCESS

  ... Existing mapping entries ...
  *|*|2025|*|*   $C$|BE_IP;$2|$Y$E
  *|*|2025|*|*   $N
where BE_IP is a new mapping table that defines the IP addresses allowed to connect to port 2025, for instance:
BE_IP

  10.1.4.6    $Y
  10.1.4.7    $Y
  127.0.0.1   $Y
  *           $N
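The intended outcome of these two tables can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the mapping engine: it only reproduces the accept or reject result for connections to port 2025, using the example addresses from the BE_IP table, and the function name is hypothetical.

```python
import ipaddress

# Addresses taken from the example BE_IP table; everything else is rejected.
BE_IP = {ipaddress.ip_address(a)
         for a in ("10.1.4.6", "10.1.4.7", "127.0.0.1")}

def notify_port_allowed(client_ip):
    """Model of the two port-2025 entries: accept a connection only if the
    client address is listed in BE_IP, otherwise reject it."""
    return ipaddress.ip_address(client_ip) in BE_IP

for ip in ("10.1.4.6", "127.0.0.1", "10.1.4.99", "192.0.2.10"):
    print(ip, "accepted" if notify_port_allowed(ip) else "rejected")
```

A check like this can be handy when reviewing the list of back-end addresses before putting it into the mapping file.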