StATS: Technology to end spam (March 8, 2005)

In my job I get a lot of spam, partly because until recently I listed my email address on my web site. The research community is trying to find technological solutions to spam (unsolicited commercial email), and some of the approaches are quite fascinating. The folks at Microsoft have looked at a system that limits the amount of email someone can send in a single day by asking the sender to solve a moderately difficult computational challenge for each message sent. If each challenge costs about ten seconds of CPU time, a single computer can send at most about 8,640 messages a day (86,400 seconds divided by 10). That is no burden for an ordinary correspondent but crippling for a spammer trying to reach millions of people. A message sent this way also gives the reader a reason to open it, because the sender spent a moderate amount of effort to get it to you. Such a system tells you that you are not just one of a million different recipients of the same commercial pitch.
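To make the idea of a per-message computational challenge concrete, here is a minimal sketch of a hashcash-style "stamp". This is a swapped-in illustration, not necessarily the puzzle the Microsoft researchers proposed. The sender must search for a counter that makes a SHA-256 hash of the recipient, the date, and the counter begin with a chosen number of zero bits, while the recipient can check the stamp with a single hash. The stamp layout (`recipient:date:counter`) and the function names `mint_stamp` and `check_stamp` are hypothetical choices made for this example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a nonzero byte, bit_length() is 1..8; the remaining high bits are zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient: str, date: str, difficulty: int) -> str:
    """Sender side (hypothetical stamp format "recipient:date:counter").

    Try counters until the SHA-256 hash starts with `difficulty` zero bits.
    The expected work is about 2**difficulty hash attempts.
    """
    prefix = f"{recipient}:{date}:"
    for counter in count():
        stamp = f"{prefix}{counter}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return stamp


def check_stamp(stamp: str, recipient: str, date: str, difficulty: int) -> bool:
    """Recipient side: one hash is enough to verify the sender's work."""
    if not stamp.startswith(f"{recipient}:{date}:"):
        return False  # stamp was minted for a different recipient or day
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= difficulty


stamp = mint_stamp("alice@example.com", "2005-03-08", difficulty=16)
print(stamp, check_stamp(stamp, "alice@example.com", "2005-03-08", 16))
```

The design relies on an asymmetry: each extra bit of difficulty doubles the sender's expected work, but verification always costs one hash. Tying the stamp to a recipient and a date keeps a spammer from reusing one expensive stamp on a million different addresses.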

The technical details are at

Another interesting approach uses Bayesian statistics to produce a probability estimate that a message is spam. This approach looks at words that appear commonly in spam messages and rarely in legitimate messages, and combines the evidence from all the words in a new message into a single probability.
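One common way to carry out this calculation is a naive Bayes classifier, sketched below under some simplifying assumptions: words are treated as independent given the class, word counts get add-one (Laplace) smoothing so an unseen word does not force a probability of zero, and words never seen in training are ignored. The four training messages are made up for illustration, and the class name `NaiveBayesSpamFilter` and the labels "spam" and "ham" (legitimate mail) are hypothetical choices, not a specific product's method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one labeled training message."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def prob_spam(self, text: str) -> float:
        """Return the posterior probability P(spam | words in text)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages that carry this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # no evidence either way for never-seen words
                # Smoothed likelihood P(word | label).
                n = self.word_counts[label][word]
                score += math.log((n + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize the two log scores without overflow (log-sum-exp trick).
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now, claim your free prize", "spam")
spam_filter.train("cheap pills, free shipping, buy now", "spam")
spam_filter.train("meeting moved to tuesday afternoon", "ham")
spam_filter.train("draft of the statistics paper attached", "ham")

print(spam_filter.prob_spam("free money prize"))
print(spam_filter.prob_spam("statistics meeting on tuesday"))
```

Working with log probabilities keeps a long message from multiplying many small numbers down to zero; the last step converts back to an ordinary probability between 0 and 1.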

There is a nice article about email spoofing (making an email look like it is coming from a different person's account). Spoofing is a way that spammers hide their tracks and can also be used to try to get someone in hot water. Spoofing is illegal, but you have to track down the person who did it, which is not always easy.

Further reading about spam:

Update: September 14, 2017. I had originally listed some commercial products here, but it is probably a mistake to talk about these products without having reviewed them in detail. So I have removed those links.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Bayesian statistics.