"Many Google teams provide pieces of the spam-protection puzzle, from distributed computing to language detection. For example, we use optical character recognition (OCR) developed by the Google Book Search team to protect Gmail users from image spam. And machine-learning algorithms developed to merge and rank large sets of Google search results allow us to combine hundreds of factors to classify spam," explains Google. "Gmail supports multiple authentication systems, including SPF (Sender Policy Framework), DomainKeys, and DKIM (DomainKeys Identified Mail), so we can be more certain that your mail is from who it says it's from. Also, unlike many other providers that automatically let through all mail from certain senders, making it possible for their messages to bypass spam filters, Gmail puts all senders through the same rigorous checks."
- Official Gmail Blog: How our spam filter works
- A Distributed Bayesian Spam Filtering using Hadoop Map/Reduce
- Parallelizing Support Vector Machines on Distributed Computers
- Sender Reputation in a Large Webmail Service
- Spam Filtering using Google/GMAIL
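
To make the quote's idea of combining many factors into one spam decision more concrete, here is a minimal sketch in Python. It is not Gmail's actual method: the signal names, the weights, the bias, and the logistic scoring are all assumptions chosen purely for illustration, and a real filter would learn its weights from data and use far more signals.

```python
import math

# Hypothetical signal weights, invented for illustration only.
# Negative weights lower the spam score; positive weights raise it.
WEIGHTS = {
    "spf_pass": -1.5,               # sender passed an SPF check
    "dkim_pass": -2.0,              # message carries a valid DKIM signature
    "image_text_spammy": 2.5,       # OCR'd image text looks like spam
    "poor_sender_reputation": 3.0,  # sender has a bad history
}
BIAS = -0.5  # assumed baseline score when no signal is present


def spam_probability(signals):
    """Combine boolean signals into a probability with a logistic function."""
    score = BIAS + sum(WEIGHTS[name] for name, present in signals.items() if present)
    return 1.0 / (1.0 + math.exp(-score))


def is_spam(signals, threshold=0.5):
    """Classify a message as spam when its probability reaches the threshold."""
    return spam_probability(signals) >= threshold


authenticated = {
    "spf_pass": True,
    "dkim_pass": True,
    "image_text_spammy": False,
    "poor_sender_reputation": False,
}
suspicious = {
    "spf_pass": False,
    "dkim_pass": False,
    "image_text_spammy": True,
    "poor_sender_reputation": True,
}

print(is_spam(authenticated))  # False
print(is_spam(suspicious))     # True
```

Every sender goes through the same scoring function here, which mirrors the quote's point that no sender bypasses the checks. Authentication results only shift the score; they do not decide the outcome on their own.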