Choosing which messages to archive

The choice of which messages to archive is a critical one for sites, especially when archiving is done for compliance purposes. The choice has three components: (1) choosing whose messages, or which types of messages, to archive; (2) choosing in what form, and at what stage(s) of processing and transit through the MTA, messages should be captured for archiving; and (3) choosing whether Message Store IMAP APPEND operations (adding a message to a folder) should cause archiving. Of these three questions, the first (whose or which types of messages to archive) is usually well specified. The third (whether to archive due to Message Store operations) also tends to be straightforward to decide. The second question, however, may require additional consideration. Between initial message submission and eventual final delivery into a mailbox, messages transiting the MTA undergo various transformations, some trivial and some potentially dramatic, so a copy captured at one stage can differ noticeably from a copy of the same message captured at another. Such transformations can include:

 * addition of Received: header lines
 * addition of other header lines (such as missing-but-required header lines like Date:, spam filtering header lines, or mailing list header lines)
 * transformation ("address reversal") of addresses in header lines
 * alias or list expansion, changing the currently active set of envelope recipients
 * "split up" of a multi-recipient message into different copies for different subsets of recipients
 * addition of "disclaimer" text
 * changes in Content-Transfer-Encoding
 * document conversion processing
 * conversion to a different charset
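To make the point about capture stages concrete, here is a minimal Python sketch. It assumes a toy message model, not anything in the MTA itself: each hypothetical stage applies one of the transformations described above, and `snapshots` records what an archived copy would contain if it were captured after that stage. The stage names, the alias table, and the disclaimer text are all illustrative.

```python
"""Hypothetical model: how the archived copy depends on the capture stage."""
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Message:
    headers: tuple[tuple[str, str], ...]  # (name, value) pairs, in order
    recipients: tuple[str, ...]           # currently active envelope recipients
    body: str


def add_received(msg: Message) -> Message:
    # Each enqueue prepends a Received: header line.
    return replace(msg, headers=(("Received", "by mta.example.com"),) + msg.headers)


def add_missing_date(msg: Message) -> Message:
    # Add a required Date: header line only if the submitter omitted it.
    if any(name.lower() == "date" for name, _ in msg.headers):
        return msg
    return replace(msg, headers=msg.headers + (("Date", "Mon, 1 Jan 2024 00:00:00 +0000"),))


# Hypothetical alias table used to model alias/list expansion.
ALIASES = {"team@example.com": ("alice@example.com", "bob@example.com")}


def expand_aliases(msg: Message) -> Message:
    # Replace each alias with its members; other recipients pass through.
    expanded: list[str] = []
    for rcpt in msg.recipients:
        expanded.extend(ALIASES.get(rcpt, (rcpt,)))
    return replace(msg, recipients=tuple(expanded))


def add_disclaimer(msg: Message) -> Message:
    # Append illustrative "disclaimer" text to the body.
    return replace(msg, body=msg.body + "\n-- \nHypothetical disclaimer text.")


# Ordered processing stages; the names are illustrative, not real channel names.
PIPELINE: list[tuple[str, Callable[[Message], Message]]] = [
    ("enqueue", add_received),
    ("header-fixup", add_missing_date),
    ("alias-expansion", expand_aliases),
    ("outbound", add_disclaimer),
]


def snapshots(msg: Message) -> dict[str, Message]:
    """Return the message as it would look if captured after each stage."""
    result = {"submission": msg}
    for stage, transform in PIPELINE:
        msg = transform(msg)
        result[stage] = msg
    return result


if __name__ == "__main__":
    original = Message(headers=(("Subject", "Q3 figures"),),
                       recipients=("team@example.com",), body="See attached.")
    for stage, snap in snapshots(original).items():
        print(f"{stage:16} headers={len(snap.headers)} rcpts={snap.recipients}")
```

In this model, a copy captured at "submission" has one header line and one (alias) recipient, while a copy captured at "outbound" carries three header lines, two expanded recipients, and the added disclaimer. This is the kind of divergence that makes the choice of capture point matter.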

Three possible approaches for selecting which messages transiting the MTA are eligible for archiving are:

 Flow-based: Those messages passing through certain channels (such as channels delivering to the Message Store, or channels sending out to the Internet) should be archived.

 User-based: Those messages sent to or from certain users (perhaps all users; perhaps all users in certain domains; perhaps only some distinguished subset of users) should be archived.

 Content-based: Those messages containing certain content should be archived.
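The three approaches above amount to three different predicates over a message in transit. The following Python sketch is purely illustrative: the `Transit` model, the channel names, the user and domain sets, and the `X-Archive-Class` label header are all hypothetical, and in a real deployment these decisions are made by MTA configuration rather than by code like this.

```python
"""Hypothetical sketch of flow-, user-, and content-based selection predicates."""
from dataclasses import dataclass, field


@dataclass
class Transit:
    """One message passing through one channel (toy model, not an MTA API)."""
    channel: str
    sender: str
    recipients: tuple[str, ...]
    headers: dict[str, str] = field(default_factory=dict)


ARCHIVE_CHANNELS = {"store_delivery", "internet_out"}  # hypothetical channel names
ARCHIVE_USERS = {"cfo@example.com"}                     # distinguished subset of users
ARCHIVE_DOMAINS = {"legal.example.com"}                 # all users in these domains
ARCHIVE_LABEL = ("X-Archive-Class", "retain")           # hypothetical content label


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


def flow_based(t: Transit) -> bool:
    # Archive anything passing through a designated channel.
    return t.channel in ARCHIVE_CHANNELS


def user_based(t: Transit) -> bool:
    # Archive if the sender or any recipient is a designated user or in a designated domain.
    addresses = (t.sender, *t.recipients)
    return any(a.lower() in ARCHIVE_USERS or domain_of(a) in ARCHIVE_DOMAINS
               for a in addresses)


def content_based(t: Transit) -> bool:
    # Archive if the message carries the label; header names compared case-insensitively.
    name, value = ARCHIVE_LABEL
    lowered = {k.lower(): v for k, v in t.headers.items()}
    return lowered.get(name.lower(), "").strip().lower() == value


def eligible(t: Transit) -> dict[str, bool]:
    return {"flow": flow_based(t), "user": user_based(t), "content": content_based(t)}


if __name__ == "__main__":
    t = Transit("relay_hop", "cfo@example.com", ("bob@partner.example",), {"Subject": "Q3"})
    print(eligible(t))  # {'flow': False, 'user': True, 'content': False}
```

Note that the content-based predicate only checks for a label. As the techniques below explain, something else (the user agent, a Sieve script, or third-party scanning software) must first apply that label.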

These approaches correspond, respectively, to the following techniques:

 For flow-based archiving, it would be typical to trigger archiving via channel   options (if using an archiving callout approach) or via channel Sieve filters (using a " " action in a channel Sieve filter located via a   or   , as relevant). Choosing the "correct" channels on which to trigger archiving is critical.

 For user-based archiving, it would be typical to trigger archiving via some user-level (or, new in MS 8.0, domain-level) LDAP attribute; see Capture triggered via LDAP attributes. Use of a class-of-service may be helpful in setting such an attribute on all users (or on large subsets of users). Note that when such an   or (new in MS 8.0)   named LDAP attribute is used, capture will occur at whatever channel stage a user alias is expanded (capturing messages to the user), as well as whenever address reversal occurs (capturing messages from the user). Since address reversal in particular normally occurs during every message enqueue, deployments involving multiple channel "hops" or multiple relay hosts may find multiple "copies" of messages (one "copy" per channel "hop") getting captured for archiving. Thus an alternative to such global use of an LDAP attribute is instead to use a Sieve filter " " action, perhaps consulting a Sieve external list (which may consist of consulting a user-level LDAP attribute). This technique of using a channel-specific Sieve filter that consults a Sieve external list allows more precisely timed archiving (limited to specific channels) that is still based on (provisioned via) LDAP attribute settings; see for instance Example Sieve external lists with properties.

 For content-based archiving, it is critical to detect and label which messages contain the sort of content that needs archiving. If users and user e-mail agents can be relied upon to label such content ab initio, when messages are first generated, that is one solution for labelling. Very simple, easily detected content criteria may be codable into a Sieve script, for instance detecting certain MIME Content-type: labelling. More complex content detection, especially where uncooperative users may attempt to evade archiving requirements, may require special third-party scanning-and-detection software, similar to spam/virus filter software. As usual, the preferred approach for integrating such third-party packages is via the MTA's spamfilter plug-in facility; if the third-party package does not support such callout use, then the second-best choice is to deploy the package "on the side" of the MTA using the usual   /alternate conversion channel approach. In any case, once the messages are labelled in whatever way is chosen, the actual trigger for archiving can use a Sieve filter based " " action triggered by the presence of the relevant label.
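The duplicate-capture issue described for user-based archiving can be illustrated with a small count. The following Python sketch is a simplified model built on one assumption taken from the text: address reversal, and therefore attribute-triggered capture of a sender's messages, happens at every enqueue. Under that assumption it compares a global LDAP-attribute trigger with a filter attached to a single channel that consults the same list. The hop names and the `CAPTURE_USERS` set are hypothetical stand-ins for channels and for users provisioned with the capture attribute.

```python
"""Hypothetical model: global attribute-triggered capture vs. a channel-specific filter."""

# Stand-in for users whose LDAP entry carries the capture attribute.
CAPTURE_USERS = {"alice@example.com"}


def attribute_triggered(route: list[str], sender: str) -> list[str]:
    """Capture points when capture fires on every enqueue (address reversal)."""
    return [hop for hop in route if sender.lower() in CAPTURE_USERS]


def channel_filter_triggered(route: list[str], sender: str, capture_channel: str) -> list[str]:
    """Capture points when only a filter on one channel consults the same list."""
    return [hop for hop in route
            if hop == capture_channel and sender.lower() in CAPTURE_USERS]


if __name__ == "__main__":
    route = ["submit", "conversion", "relay", "store_delivery"]  # hypothetical hops
    print(len(attribute_triggered(route, "alice@example.com")))  # 4
    print(len(channel_filter_triggered(route, "alice@example.com", "store_delivery")))  # 1
```

Both functions consult the same provisioning data. The only difference is where the check is allowed to fire, which is why the channel-specific approach yields one archived copy per message instead of one per hop.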

See also:
 * Character set conversion
 * destinationspamfilter1 Option
 * Archive spamfilterN_config_file
 * Sieve capture extension
 * destinationfilter Option
 * Capture triggered via LDAP attributes
 * ldap_capture MTA Option
 * ldap_domain_attr_capture MTA Option
 * Address reversal
 * Sieve external lists
 * aliasdetourhost Option
 * Archiving messages