System Structure

Indexing

The mailbox servers detect the arrival of jobs (emails) in the system and send a notification to the search system for each one. The master servers accept these jobs and dispatch them to the indexer servers. The indexers perform the actual parsing and indexing of the emails and save the index data to disk. Users are not tied to any particular server; the indexing system can spread an individual's index across any number of servers. As a result, indexing work can be shared among many machines, which keeps it fast and lets the system scale.
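As a rough illustration of the master's role, the sketch below hands each incoming job to the next indexer in rotation. The class name JobMaster, the method names, and the round-robin policy are all assumptions made for this example; the text above does not say how the real masters choose an indexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a master that spreads incoming email jobs across indexers.
// Round-robin is an assumed policy, chosen only to keep the example small.
final class JobMaster {
    // One in-memory queue per indexer stands in for a real network hand-off.
    private final List<List<String>> indexerQueues = new ArrayList<>();
    private int next = 0;

    JobMaster(int indexerCount) {
        if (indexerCount <= 0) {
            throw new IllegalArgumentException("need at least one indexer");
        }
        for (int i = 0; i < indexerCount; i++) {
            indexerQueues.add(new ArrayList<>());
        }
    }

    // Accept a job and pass it to the next indexer in rotation.
    // Returns the position of the indexer that received the job.
    synchronized int accept(String emailId) {
        int target = next;
        indexerQueues.get(target).add(emailId);
        next = (next + 1) % indexerQueues.size();
        return target;
    }

    // Snapshot of the jobs currently assigned to one indexer.
    synchronized List<String> jobsOn(int indexer) {
        return new ArrayList<>(indexerQueues.get(indexer));
    }
}
```

Because no job is bound to a particular indexer, consecutive emails for the same user can land on different machines, which matches the idea of one user's index being spread across servers.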

Searching

Lucene provides a powerful search class, ParallelMultiSearcher, which allows us to search multiple indexes from a remote machine (in this case, the query master) via Java RMI calls. A query typed into the search box in Noteworthy Webmail is sent to the query master, which distributes it to all of the indexer machines. The indexers return their results to the query master, which aggregates them in sorted order and returns them to the Noteworthy Webmail application.
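The aggregation step can be pictured with a small sketch in plain Java. It does not use Lucene or RMI; it only shows the idea of combining per-indexer result lists into one list sorted by score. The Hit and QueryMaster classes, the score field, and the limit parameter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result record: one matching email and its relevance score.
final class Hit {
    final String emailId;
    final double score;

    Hit(String emailId, double score) {
        this.emailId = emailId;
        this.score = score;
    }
}

// Hypothetical sketch of the query master's aggregation step.
final class QueryMaster {
    // Combine the hits returned by every indexer and keep the best `limit`
    // of them, ordered from highest to lowest score.
    static List<Hit> merge(List<List<Hit>> perIndexer, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexer) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int end = Math.max(0, Math.min(limit, all.size()));
        return new ArrayList<>(all.subList(0, end));
    }
}
```

In the real system the per-indexer lists would arrive over the network; here they are ordinary in-memory lists so that the merge logic can be read on its own.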

Backup/Failover

The system is designed so that any machine can be turned off without affecting system performance or results. All jobs are replicated across machines and the masters have failover machines in place.
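One simple way to picture failover is a caller that tries a list of equivalent machines in order and uses the first one that answers. The sketch below assumes that approach; the Failover class and the firstAvailable method are hypothetical, and the text above does not describe the actual failover mechanism.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: call a primary machine and fall back to its standby.
final class Failover {
    // Try each machine in order and return the first successful answer.
    // A machine that is down is modelled as a call that throws.
    static <M, R> R firstAvailable(List<M> machines, Function<M, R> call) {
        RuntimeException last = null;
        for (M machine : machines) {
            try {
                return call.apply(machine);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next machine
            }
        }
        throw new IllegalStateException("no machine available", last);
    }
}
```

With a list such as a primary master followed by its standby, the caller gets the same answer whether or not the primary is up, provided the standby holds a replica of the same data.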

Search Infrastructure