Search Technology Comparison

The search technologies can be compared on several criteria. The following sections cover some of them.

Substring vs. Indexed Search
The classic search technology built into imapd performs a substring search (a search term matches any word containing that substring). The indexed search technologies (Elasticsearch, DSE Solr, and ISS) use a word-based search where a search term must match an entire word (or root word). Wildcard options for indexed search are available, but they add significant overhead, so searches take longer and require more resources. It may be desirable to configure wildcard use for address headers. To comply strictly with the IMAP standard for searching, all textual searches must be configured as substring searches (using the  option). However, modern web searches tend to be word-based, so our product defaults to word-based searches for message text, as we believe that behavior will be faster and more familiar to end users.
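The difference between the two matching styles can be illustrated with a minimal Python sketch. It is a toy model only: the function names are hypothetical, it does not reflect how imapd or any of the indexed engines are implemented, and it ignores root-word (stemming) matches.

```python
import re

def substring_match(term: str, text: str) -> bool:
    """Classic-style matching: the term may appear anywhere inside the text."""
    return term.lower() in text.lower()

def word_match(term: str, text: str) -> bool:
    """Indexed-style matching: the term must equal a whole word of the text."""
    words = re.findall(r"\w+", text.lower())
    return term.lower() in words

subject = "Quarterly budget forecast"
print(substring_match("cast", subject))  # True: "cast" is inside "forecast"
print(word_match("cast", subject))       # False: no whole word equals "cast"
print(word_match("forecast", subject))   # True
```

A wildcard search such as a prefix or suffix match sits between these two behaviors, which is why it costs more than a plain word lookup.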

Search Complexity
Elasticsearch supports all searches that IMAP supports (flag and annotation terms in an IMAP search command are converted to UID lists by imapd). DSE Solr supports all searches; however, the ANNOTATE extension (RFC 5257) is not implemented on Cassandra Store. ISS supports only a subset of search operations; see the  option for a description of ISS limitations.

Index Storage
Elasticsearch can store indexes on Elasticsearch local disk without the need for a storage array. Use of local SSD will reduce the number of nodes required in the Elasticsearch cluster for a given workload. Use of local SSD is recommended for DSE Solr search. Use of a storage array with fast iSCSI support is recommended for ISS.

Search Host Failures
The imapd Elasticsearch client will round-robin between hosts listed in the  option; it will periodically retry Elasticsearch hosts that were previously down. The message store ISC client will fail over to a backup host if more than one host is listed in the  option; otherwise, conversion will be deferred and performed by   later. The DSE Solr client will use the  option to bootstrap a list of available DSE Solr servers. DSE Solr then uses the Cassandra Gossip protocol to stay current on the list of available Cassandra/Solr nodes, as long as at least one of the nodes listed in   is online when the server process starts. When the ISS host is unavailable, search will be performed by classic search. With ISS, it is important to keep your JMQ broker running reliably.
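The round-robin-with-retry behavior described for the Elasticsearch client can be sketched as follows. This is an illustrative model, not the imapd implementation: the class name, the `retry_after` interval, and the host names are all assumptions made for the example.

```python
import time

class RoundRobinHosts:
    """Rotate through hosts, skipping down hosts until a retry interval passes."""

    def __init__(self, hosts, retry_after=30.0, clock=time.monotonic):
        self._hosts = list(hosts)
        self._retry_after = retry_after   # hypothetical retry interval, in seconds
        self._clock = clock
        self._down_since = {}             # host -> time it was marked down
        self._next = 0

    def mark_down(self, host):
        self._down_since[host] = self._clock()

    def mark_up(self, host):
        self._down_since.pop(host, None)

    def pick(self):
        """Return the next usable host, or None if every host is down."""
        now = self._clock()
        for _ in range(len(self._hosts)):
            host = self._hosts[self._next]
            self._next = (self._next + 1) % len(self._hosts)
            down_at = self._down_since.get(host)
            # Use the host if it is up, or if it is due for a retry.
            if down_at is None or now - down_at >= self._retry_after:
                return host
        return None

pool = RoundRobinHosts(["es1", "es2", "es3"])   # hypothetical host names
pool.mark_down("es2")
print([pool.pick() for _ in range(4)])  # ['es1', 'es3', 'es1', 'es3']
```

In this sketch the caller is expected to call `mark_up` after a retried host answers successfully, and `mark_down` again if it does not.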

High Availability
Elasticsearch has built-in high availability; a minimum cluster size of three nodes is recommended. DSE Solr search uses Cassandra's high availability mechanism. A minimum cluster of three Cassandra-only and two Cassandra/Solr nodes is recommended for the msgindex table. ISS has no HA mechanism; a reliable filesystem is recommended, and fallback to classic search occurs when ISS fails.

Data Growth
Elasticsearch and DSE Solr scale horizontally; as your data size grows, you will need to add servers to your Elasticsearch or Cassandra clusters. For ISS, once the capacity of a message store or ISS server is exceeded, you must use the  tool to move users to a new message store with a new ISS server.

Reindexing
Elasticsearch will require a reindex when a classic store userid is renamed. However, rehostuser and renaming mailboxes will not require a reindex with Elasticsearch as long as source and target hosts share the same Elasticsearch cluster. Cassandra store supports separate external and persistent userids so DSE Solr will not require a reindex when the external userid is changed. ISS requires a reindex of impacted content when a userid is renamed, rehostuser is used, or a mailbox is renamed.

Stop Words
In order to reduce the size of the index, English stop words are not included in the index by default for Elasticsearch and DSE Solr search. Stop words in a search query will match no messages if they are part of a single search key or are in a boolean OR clause. For Elasticsearch, stop words in an AND clause will match all messages. Stop words used in a wildcard search will only match text containing a word that contains the stop word as a substring.
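The stop-word rules above can be modelled with a small Python sketch. It only mimics the observable behavior described in this section (no match for a single key or an OR branch, match-all inside an AND clause for Elasticsearch); the stop-word list, function names, and index structure are assumptions for illustration and are not taken from Elasticsearch or DSE Solr.

```python
STOP_WORDS = {"the", "and", "of", "a"}  # tiny illustrative list, not the real default set

def build_index(messages):
    """Map each non-stop word to the set of message ids that contain it."""
    index = {}
    for msg_id, text in messages.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(msg_id)
    return index

def search_term(index, term):
    """Single search key: a stop word is not in the index, so it matches nothing."""
    return set(index.get(term.lower(), set()))

def search_or(index, terms):
    """OR clause: a stop-word branch contributes no messages."""
    result = set()
    for term in terms:
        result |= search_term(index, term)
    return result

def search_and(index, terms, all_ids):
    """AND clause, modelling the Elasticsearch behavior: a stop word matches all."""
    result = set(all_ids)
    for term in terms:
        if term.lower() in STOP_WORDS:
            continue  # behaves like "match all", so it does not narrow the result
        result &= search_term(index, term)
    return result

messages = {1: "the budget review", 2: "minutes of the meeting"}
index = build_index(messages)
print(search_term(index, "the"))                      # set()
print(search_or(index, ["the", "budget"]))            # {1}
print(search_and(index, ["the", "budget"], messages)) # {1}
```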

Whitespace
The classic search ignores whitespace, so search terms will match across whitespace breaks. Whitespace is significant to indexed search technologies.

Punctuation
The classic search is punctuation sensitive. The indexed search technologies treat most punctuation as whitespace equivalent.
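The whitespace and punctuation differences can be illustrated with a toy Python model. Both functions are hypothetical simplifications: the classic model simply drops whitespace before a substring comparison, and the indexed model assumes a multi-word term is matched as consecutive whole words after every punctuation character is treated as a separator (the real engines exempt some punctuation).

```python
import re

def classic_match(term: str, text: str) -> bool:
    """Toy classic search: whitespace is ignored, punctuation is kept."""
    def squash(s):
        return re.sub(r"\s+", "", s.lower())
    return squash(term) in squash(text)

def indexed_match(term: str, text: str) -> bool:
    """Toy indexed search: punctuation acts like whitespace, whole words only."""
    def tokens(s):
        return re.findall(r"\w+", s.lower())
    wanted, words = tokens(term), tokens(text)
    if not wanted:
        return False
    # Assumption: the term's words must appear consecutively in the text.
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))

print(classic_match("data base", "The database is down"))   # True
print(indexed_match("data base", "The database is down"))   # False
print(classic_match("well known", "a well-known issue"))    # False
print(indexed_match("well known", "a well-known issue"))    # True
```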

Diacritical Sensitivity
The classic search is case and diacritical insensitive by default. The  option can be used to make classic search diacritical sensitive for certain languages. The indexed search technologies are diacritical sensitive by default.
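As an illustration of the two comparison modes, the following Python sketch uses the standard `unicodedata` module. It is not the product's implementation; the function names are hypothetical, and folding case in the diacritical-sensitive comparison is an assumption made only to isolate the effect of diacritics.

```python
import unicodedata

def strip_diacritics(s: str) -> str:
    """Decompose characters and drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def insensitive_equal(a: str, b: str) -> bool:
    """Case and diacritical insensitive comparison."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()

def sensitive_equal(a: str, b: str) -> bool:
    """Diacritical sensitive comparison (case folding is an assumption here)."""
    return (unicodedata.normalize("NFC", a).casefold()
            == unicodedata.normalize("NFC", b).casefold())

print(insensitive_equal("Résumé", "resume"))  # True
print(sensitive_equal("Résumé", "resume"))    # False
print(sensitive_equal("Résumé", "résumé"))    # True
```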

Convergence/ISS Attachment Search
The attachment search feature is available only with ISS, which is deprecated. If your deployment depends on this feature, please contact Oracle support and explain how you use it, to help Oracle determine whether to implement an equivalent feature with Elasticsearch.

Non-IMAP Search API
ISS supports a non-IMAP search API via an HTTP-based protocol. At this time, Oracle reserves the right to make incompatible changes to the Elasticsearch and DSE Solr indexing models if that is necessary to improve the product (the odds of such changes will decrease over time). If you have a business use case that requires an HTTP-based search API, please contact Oracle support for consideration of such a request.

See also:
 * Store Index and search
 * diacritical_sensitive_languages Option
 * enable Option
 * hostlist Option
 * ischosts Option
 * prefix_search Option
 * solrconnectpoints Store Option
 * substring_search Option
 * suffix_search Option