Sieve Implementation

From Messaging Server Technical Reference Wiki

IMPORTANT NOTE: This page is obsolete, and has been replaced by Sieve hierarchy. It has been retained because too many things reference it.

The Sieve Email Filtering Language is heavily used throughout the iMS MTA. This article describes the iMS Sieve implementation in detail.

Supported Standardized Extensions

In addition to the core Sieve functionality specified in RFC 5228, the iMS Sieve implementation supports the following standardized extensions:

  • Body extension (RFC 5173). The body test provides the means to test material in the message body. Note that there are a number of restrictions on the implementation of body. (7U2)
  • Copy extension (RFC 3894). Copy is a simple extension that allows the redirect and fileinto actions to be used without canceling the default action of saving the message to the "inbox". (6.1)
  • Date extension (RFC 5260). The date test provides the means to test fields in date-time values. (7U4)
  • Editheader extension (RFC 5293). The addheader and deleteheader actions provide the ability to alter the message header. Additionally, the replaceheader action described in http://tools.ietf.org/html/draft-degener-sieve-editheader-00 has been implemented. This provides an especially convenient way to add tags to subject fields.
  • Encoded-character extension (RFC 5228). This extension provides a way to specify Unicode characters by numeric value in Sieve character strings. (7.0) Additionally, \r, \n, and \t can be used to represent carriage return, line feed, and tab characters respectively in quoted strings.
  • Envelope extension (RFC 5228). This extension consists of an envelope test that can access MAIL FROM and RCPT TO address information.
  • Envelope-dsn (RFC 6009). This extension provides access to additional envelope information provided by the delivery status notification SMTP extension. (7U2)
  • Environment extension (RFC 5183). Environment provides scripts access to information outside of the current message. (7.0U1)
  • Extlists extension (RFC 6134). This extension adds the :list match-type to the address, currentdate, date, deleteheader, envelope, environment, hasflag, header, replaceheader, spamtest, string, and virustest tests. :list in turn allows the test to check values against externally stored information. (7U1)
  • Fileinto extension (RFC 5228). This extension adds the fileinto action for specifying a folder where the message is to be delivered.
  • Ihave extension (RFC 5463). Ihave makes it possible to write scripts that use a given extension if it is available but continue to operate if it is not.
  • Index extension (RFC 5260). The index extension adds a :index nonpositional parameter to the address, date, and header tests, which in turn provides the means to check a specific instance of header fields that occur multiple times. (7U4)
  • Imap4flags extension (RFC 5232). Imap4flags provides the means to set flags on messages delivered to the message store. (6.3P1)
  • Mime extension (RFC 5703). The mime extension provides facilities for testing headers in inner MIME parts of messages. (7U1)
  • Notify extension (RFC 5435) and the mailto notification method (RFC 5436). Notify with the mailto method provides the means to send a notification email about the current message being processed. Note that an earlier draft of the notify action is also still supported. (6.2, 7U5)
  • Relational extension (RFC 5231). Relational adds relational comparisons (less than, greater than, etc.) to the header, address, and envelope tests. It also adds the ability to count the number of entities that match the test criteria. (6.0)
  • Redirect-dsn extension (RFC 6009). This extends Sieve's redirect action to provide control over delivery status notification parameters. (7U2)
  • Subaddress extension (RFC 3598). Subaddress adds :user and :detail tagged arguments for the envelope and address tests to access information embedded in the local part of an address. (iMS uses a plus sign as the separator between user and detail information in the local part.) (6.0)
  • Spamtest and virustest extensions (RFC 5235). The tests defined by these extensions provide a means for Sieve scripts to access spam and virus filter "scores".
  • Vacation extension (RFC 5230). The vacation action defined by this extension can be used to generate "out of office" messages in response to incoming email.
  • Vacation-seconds extension (RFC 6131). This extends the vacation action to allow the interval between responses to be specified in seconds rather than days.
  • Variables extension (RFC 5229). The core Sieve language does not provide any means of saving state from one statement to the next. This extension adds variables to the language. (6.2)
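
As a rough illustration of how several of the extensions listed above combine in a single script, the sketch below uses subaddress, spamtest with a relational comparison, and imap4flags. The detail value, folder names, flag name, and score threshold are assumptions chosen for the example, not values taken from any iMS configuration.

```sieve
require ["envelope", "subaddress", "fileinto", "spamtest", "relational",
         "comparator-i;ascii-numeric", "imap4flags"];

# Subaddress: mail for user+lists@example.com is filed into "Lists"
# (iMS uses "+" as the separator; the folder name is hypothetical).
if envelope :detail :is "to" "lists" {
    fileinto "Lists";
}
# Spamtest plus relational: a score of 5 or more (assumed threshold)
# is flagged and filed separately.
elsif spamtest :value "ge" :comparator "i;ascii-numeric" "5" {
    addflag "$Junk";
    fileinto "Junk";
}
```

Each extension must appear in the require clause, and each is only usable on releases at or after the version noted in its list entry.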

Envelope Extension Implementation

In addition to supporting the standard envelope arguments specified in RFC 5228 and in the envelope-dsn extension, the envelope test supports a conversiontag argument. This test checks the current list of tags associated with the current recipient, one at a time. Note that the :count modifier, if specified, allows checking of the number of active conversion tags. This type of envelope test is restricted to system sieves. Also note that this test only "sees" the set of tags that were present prior to sieve processing - the effects of setconversiontag and addconversiontag actions are not visible.
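
A minimal sketch of the two forms of this test, intended for a system sieve. The tag name "scanme" and the header names are hypothetical; the conversiontag argument is used as described above, and the :count form relies on the relational extension.

```sieve
require ["envelope", "relational", "comparator-i;ascii-numeric",
         "editheader"];

# True if any conversion tag on the current recipient equals "scanme"
# (hypothetical tag name).
if envelope :is "conversiontag" "scanme" {
    addheader "X-Conversion-Tag" "scanme";
}

# :count tests the number of active conversion tags; here, at least one.
if envelope :count "ge" :comparator "i;ascii-numeric" "conversiontag" "1" {
    addheader "X-Has-Conversion-Tags" "yes";
}
```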

Environment Extension Implementation

All of the standard items defined in RFC 5183 are supported. Additionally, the following vendor-specific items are provided:

  • vnd.sun.authenticated-sender-address provides access to the sender address that's associated with the authentication state for the SMTP session. (7.0U3)
  • vnd.sun.authenticated-sender-id provides access to the authenticated user identity. (7.0U3)
  • vnd.sun.autoreply-internal returns "TRUE" when the autoreply criteria for using an internal autoreply response have been met, "FALSE" otherwise. (7.0U3)
  • vnd.sun.destination-channel returns the name of the current destination channel. (7.0U1)
  • vnd.sun.source-channel returns the name of the current source channel. (7.0U1)
  • vnd.oracle.last-verdict provides information associated with previous script evaluations. When the sieves associated with a recipient are evaluated in order, each evaluation that performs an explicit handling action sets this item as it finishes so the next sieve in the sequence can check it. A script that doesn't perform an explicit handling action will leave this item unchanged. Possible values that can be set are:
    • refuse
    • reject
    • ereject
    • jettison
    • fileinto
    • redirect
    • keep
    • discard
    Note that testing this environment item makes the script recipient-specific in the same fashion an envelope "to" test does, and will result in this and subsequent sieves being reevaluated for every recipient. Although any script can test this item, it is intended for use when the MTA is being used to perform antispam and antivirus checks by other applications and utilities like imexpire. (7U5)

Additionally, as of the 7.0U3 release, the $E metacharacter can be used in FROM_ACCESS, SEND_ACCESS, ORIG_SEND_ACCESS, MAIL_ACCESS, and ORIG_MAIL_ACCESS mappings to set additional environment items which can then be accessed through the environment test in any script.
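
The vendor-specific items are tested with the ordinary environment test from RFC 5183. The following is a sketch only: the channel name tcp_submit and the header names are assumptions, and the items require the releases noted in the list above.

```sieve
require ["environment", "editheader"];

# Mark messages that arrived on the (assumed) submission channel.
if environment :is "vnd.sun.source-channel" "tcp_submit" {
    addheader "X-Submitted" "yes";
}

# React to the verdict left by an earlier script in the sequence.
# Testing this item makes the script recipient-specific.
if environment :is "vnd.oracle.last-verdict" "redirect" {
    addheader "X-Earlier-Verdict" "redirect";
}
```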

Body Extension Implementation

The restrictions on the support of RFC 5173 are:

  1. The only match types supported are :contains and :is; :matches and :regex are not supported. This is likely to be a permanent restriction due to the possible performance impact of supporting these match-types.
  2. The only body transforms supported are :raw and :text; :content is not supported. This restriction may be lifted in a future release.
  3. Variable substitutions are not allowed in body test arguments. If they are used an error is likely to occur. For example, this script will fail:

      require ["variables", "body"];
      set "a" "testing";
      # Fails: variable substitution is not permitted in a body argument.
      if body :contains "${a}" { discard; }

    This restriction exists so that a list of all arguments to body in all scripts can be computed in advance and searched for in a single pass. If this restriction were lifted, it would be easy to construct scripts that require an arbitrary number of passes over the message, which is unacceptable in a server environment. As such, this should be considered a permanent restriction.
  4. The :text body transform operates on all message parts with a text type or a 7bit/8bit encoding. If a charset other than utf-8 is specified on a text part that part is converted to utf-8 before being searched.

The availability of the body test is controlled by the enable_sieve_body MTA option. A value of 0, the default, disables the extension. A value of 1 enables the extension for use in all sieves. A value of 2 enables the use of body in system-level sieves only.
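
For contrast with the failing script above, here is a sketch of body tests that stay within the listed restrictions: literal string arguments, the :contains match type, and the :text or :raw transforms. The search strings and folder name are assumptions, and enable_sieve_body must be set to a nonzero value for the script to be accepted.

```sieve
require ["body", "fileinto"];

# :text searches text parts, converted to utf-8 where necessary.
if body :text :contains ["wire transfer", "claim your prize"] {
    fileinto "Suspect";
}

# :raw searches the undecoded body; this looks for a uuencode marker.
if body :raw :contains "begin 644 " {
    fileinto "Suspect";
}
```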

Multiple Script Support

The main extension in the iMS implementation, however, is a nonstandard one: The ability for multiple scripts to apply to a single recipient. (The Sieve specifications assume a single script per user.) iMS supports a number of additional types of Sieve scripts.

Script Types

The various types, in order from the most general to the most specific, are:

  1. Spam filter scripts. Results produced by spam filter plugins are interpolated into Sieve scripts. Up to 8 filters can be defined so up to 8 scripts can be produced. (*)
  2. Source channel sieves. The sourcefilter channel option is used to specify the Sieve URL for source channel sieves. (*)
  3. System script. A single system-wide sieve can be specified that applies to all recipients of all messages. The normal location for this script is the file /var/opt/SUNWmsgsr/config/imta.filter. (*)
  4. Destination channel script. The destinationfilter channel option specifies the Sieve URL for destination channel sieves. (*)
  5. ORIG_SEND_ACCESS, SEND_ACCESS, ORIG_MAIL_ACCESS, and MAIL_ACCESS mapping scripts. The $S sequence, when specified in any of these mappings, causes a Sieve URL to be read from the mapping result string.
  6. Mailing list domain scripts. The domain entry associated with mailing lists defined in LDAP can use the mailDomainSieveRuleSource attribute to specify a Sieve script. (*)
  7. Mailing list scripts. Mailing lists defined in LDAP can use the mailSieveRuleSource attribute to specify a Sieve script. Lists defined in the aliases file or database can use a [filter] nonpositional parameter to specify a Sieve URL.
  8. User domain scripts. The domain entry associated with users defined in LDAP can use the mailDomainSieveRuleSource attribute to specify a Sieve script.
  9. User scripts. Users defined in LDAP can use the mailSieveRuleSource attribute to specify a Sieve script. Users defined in the aliases file or database can use a [filter] nonpositional parameter to specify a Sieve URL. Finally, the filter channel option can be applied to the destination channel; if used it specifies a Sieve URL for a user sieve.
  10. Head of household scripts. LDAP user entries can contain an attribute (specified by the ldap_filter_reference MTA option) that provides the distinguished name of the so-called "head of household", another LDAP user entry. This entry is read and any Sieve stored in the attribute specified by the ldap_hoh_filter MTA option (which defaults to mailSieveRuleSource) will be processed.

The types marked with (*) are considered to be "system-level" scripts. Certain capabilities, most notably the capture actions, are only available to system-level scripts.

Sieve URLs

Many types of scripts are specified through a Sieve URL. Several types of URLs are supported:

  • file: URLs can be used to refer to files stored in the local filesystem.
  • ldap: URLs can be used to refer to scripts stored in the LDAP directory.
  • data: URLs make it possible to specify a script directly in the URL itself (as long as the script isn't longer than 256 characters).
  • imap: URLs can be used to access scripts using IMAP. The credentials specified by the imap_username and imap_password MTA options are used to log in to the IMAP server and the URL is resolved with the URLFETCH IMAP command. Note that imap: URL resolution is part of the server-side support for the BURL SMTP extension (used to implement forwarding of messages without having to download them) so any Sieve usage must take the fact that there's only one set of login credentials into account.

A file: URL type is normally assumed so bare filenames will also work as an argument in most places.

Semantics of Multiple Scripts

Since multiple scripts can apply to each recipient and different scripts can produce different results, there has to be a way to resolve conflicting results. The rules for determining the final result are:

  1. The scripts associated with a particular recipient are scanned in order from most specific to most general. The result of the most specific script that executes an action which determines the status of a message is used preferentially. The actions that determine the status of a message are:
    1. discard
    2. fileinto
    3. keep
    4. redirect
    5. reject
    6. ereject (7.0 or later)
  2. A set of special nonstandard actions is provided which, if used, work the other way around: The most general script that specifies them is used preferentially. These special actions are:
    1. jettison (like discard)
    2. refuse (like reject)
  3. Capture actions in system-level scripts are executed unconditionally, regardless of whether or not the script that contains the action is selected as the one which determines message handling for this recipient.
  4. Conversion tags set or added by the setconversiontag and addconversiontag actions respectively are processed unconditionally in a fashion similar to capture actions.
  5. An error in any script forces a keep action and aborts further scanning. Additionally, a notification is sent to the script owner reporting the problem.
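
To make rule 1 concrete, consider two hypothetical scripts that both apply to the same recipient. The subject tag and folder name are assumptions made for the example. First, a system sieve (the more general script):

```sieve
# System sieve: applies to all recipients of all messages.
if header :contains "subject" "[bulk]" {
    discard;
}
```

Second, a user sieve (the more specific script):

```sieve
# User sieve: applies to one recipient only.
require ["fileinto"];

if header :contains "subject" "[bulk]" {
    fileinto "Bulk";
}
```

Going by rule 1, the user's fileinto is the most specific action that determines message status, so for this recipient the message is filed rather than discarded. Had the system sieve used jettison in place of discard, rule 2 would instead give preference to the system sieve's result.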

Prior to the 7.0 release, refuse actions were only available to system-level scripts, and refuse, when used, forced the return of a 5yz response to the DATA command in SMTP. In 7.0 and later this is no longer the case: refuse now behaves like jettison, making it possible for it to apply to only a subset of all recipients. However, the MTA still returns a 5yz response whenever it is possible to do so.

Evaluation of Multiple Scripts

The various different types of scripts are located, loaded, and associated with the appropriate recipient addresses as early as possible: Source channel and system scripts are dealt with during MAIL FROM processing and all others with the exception of spam filter sieves are dealt with during RCPT TO processing. Spam filter sieves are determined last - since they are derived from spam filter verdict processing they can only be determined after message data is available.

Linkage Internals Before 7.0

Prior to the 7.0 release the internal linkage of scripts to recipients looked something like this:

Sieve Evaluation 1.png

Note that any given tier of the linkage tree can be omitted. As the arrows indicate, evaluation of scripts proceeded from the bottom to the top while interpretation of script results proceeded from the recipients at the leaves down to the root.

Problems With This Approach

This organization was fine in terms of script evaluation semantics. However, after it was implemented and various additional Sieve extensions were defined a number of problems emerged:

  • Information flow up the tree from the root to the leaves wasn't possible. This was a nonissue prior to the availability of the editheader, spamtest, and virustest extensions. But once these extensions entered the picture it was only logical that the effects of more general scripts would be visible to more specific scripts. For example, a system sieve that performs some test and decides a message is likely to be spam might want to indicate this fact either by adjusting the spamtest score or by inserting a header. But this doesn't accomplish much when more specific scripts cannot see the results of these actions.
  • In some situations scripts were loaded but then the recipient addresses they were associated with ended up being dropped from the recipient list. When this happened there was no easy way to remove the scripts from the evaluation list and scripts were evaluated unnecessarily.
  • Since the envelope test can examine recipient addresses, any script that employs such a test is necessarily recipient-specific and must be reevaluated for each recipient. Even worse, since this can change the result of the script, it has a cascade effect forcing all more specific scripts to be reevaluated as well. In order to get these semantics the linkage tree had to be split and dummy nodes had to be inserted. For example, if the system sieve called for the envelope extension, the recipient tree shown above ended up looking like this:

Sieve Evaluation 3.png

  • All Sieve scripts must list all of the extensions they employ in an initial require clause. However, it is common for an extension to be listed but not actually used - and this is not necessarily a result of poor coding practice. For example, a script might perform a header or address test and depending on the result of that test only then perform an envelope test:
      require ["envelope", "subaddress"];
      if address :is "from" "user1@example.com" {
        if envelope :is :detail "to" "whatever" { ... }
      }
    In this case the script is only recipient-specific given certain header values and it is unnecessary to reevaluate it for every recipient. But since the linkage tree has to be constructed prior to script evaluation the presence of the envelope extension in the require clause forces unnecessary reevaluations.
  • Scripts often can be written to take advantage of a given extension if it is available but still function if it is not. The approach of enumerating all extensions in the require clause does not allow the construction of such scripts. The ihave extension eliminates this restriction by adding a test that succeeds if the requested extension is available and fails if it is not. This would be extremely difficult to implement using the linkage tree approach since the extensions a given script uses can no longer be determined prior to script evaluation.
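
As an illustration of the last point, the earlier envelope example can be recast with ihave (RFC 5463). This is a sketch only; the addresses and detail value are carried over from the example above, and discard is an arbitrary choice of action. The require clause no longer names envelope or subaddress, so whether the script is recipient-specific can only be discovered while it runs.

```sieve
require ["ihave"];

if address :is "from" "user1@example.com" {
    # Use envelope and subaddress only if the implementation has them.
    if ihave ["envelope", "subaddress"] {
        if envelope :is :detail "to" "whatever" {
            discard;
        }
    }
}
```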

Linkage Internals In 7.0

A new way of linking scripts to users was needed and has been implemented in 7.0. The linkage tree is gone, replaced with a per-recipient array:

Sieve Evaluation 2.png

This new structure eliminates all of the issues the linkage tree had. Scripts are now evaluated during final recipient address processing, eliminating unnecessary evaluation. Evaluation proceeds down the array, and if a particular script has already been evaluated on behalf of some other recipient the results can easily be checked for recipient specificity and reused if no dependency exists. Even better, information can be passed from more general scripts to more specific ones, and the additional checks for recipient-specific information inheritance are reasonably straightforward. And finally, when reevaluation is required the resulting "split" is much more straightforward.
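
The information flow described here can be sketched with a pair of hypothetical scripts, assuming that a header added by a more general script is visible to a more specific one in 7.0 and later, as the text indicates. The header name, subject string, and folder name are invented for the example. A system sieve marks the message:

```sieve
# System sieve: mark suspicious messages for later scripts.
require ["editheader"];

if header :contains "subject" "free money" {
    addheader "X-Suspect" "yes";
}
```

A user sieve then acts on the mark:

```sieve
# User sieve: act on the mark left by the system sieve.
require ["fileinto"];

if header :is "X-Suspect" "yes" {
    fileinto "Junk";
}
```

Under the pre-7.0 linkage tree the second script could not have seen the added header, which is the first problem listed in the previous section.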

For example, given the previous set of scripts, a recipient-specific system sieve results in the following augmented data structure:

Sieve Evaluation 4.png

Implementation Internals

Sieve parsing and evaluation is implemented using a generic parse/evaluation subsystem. In addition to Sieve, this system has also been used to implement other languages, most notably the language used by the PMDF-DIRSYNC product. Specific language details, in particular what "functions" can be called and what arguments they require, are specified through callbacks. Three basic routines are provided:

  1. sy_parse_expression - Parses an "expression" (i.e., a Sieve script), converting it into a series of instructions for the evaluator.
  2. sy_eval_expression - Evaluates an "expression" with a specific set of inputs.
  3. sy_dispose_expression - Frees a previously parsed expression.

Parsed expressions are stored in two separate linked lists of arrays: One for instructions and the other for string data. The use of a series of array segments makes it possible to write parsed expressions out to disk and read them back in later. This feature is used to store the system sieve in the compiled configuration so it doesn't need to be reparsed.

This subsystem only understands basic script syntax; it knows nothing about specific Sieve semantics. Information about this is provided through callbacks passed to the parser and evaluator. The Sieve-specific callbacks are the routines mm_check_function and mm_eval_function.