|
I just installed SpamBayes and was quite impressed by the results. This spam filter tackles the problem with a statistical approach based on Bayes' theorem.
When a new email arrives, it is split into tokens (words or groups of words). Each token's spam probability is looked up in the local database, these probabilities are combined into an overall spam probability for the message, and the email is filtered accordingly.
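As a rough illustration of that scoring step, here is a minimal sketch in Python. It uses the naive Bayes combination described in Paul Graham's article and is not SpamBayes's actual code; the function names `tokenize` and `spam_score`, the sample `token_probs` table, and the 0.4 default for unknown tokens are all assumptions made for this example.

```python
import math
import re

def tokenize(text):
    """Split a message into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def spam_score(text, token_probs, unknown=0.4):
    """Combine per-token spam probabilities into one score between 0 and 1."""
    # Work in log space so that many small factors do not underflow.
    log_spam = 0.0
    log_ham = 0.0
    for token in set(tokenize(text)):
        # Tokens missing from the table get a neutral-ish default (an assumption).
        p = token_probs.get(token, unknown)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # P = prod(p) / (prod(p) + prod(1 - p)), rewritten with logs.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical probabilities, as if read from the local database.
token_probs = {"viagra": 0.99, "free": 0.90, "meeting": 0.05, "python": 0.02}

print(spam_score("FREE viagra, click now", token_probs))
print(spam_score("Python meeting moved to Monday", token_probs))
```

The first message scores close to 1 and the second close to 0; a filter would then compare the score against a threshold to decide where the email goes.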
The beauty of the algorithm is that you build the database yourself by telling SpamBayes which messages are spam and which are not. You have to train it at first so that it performs well, and the filter then adapts to the kind of messages you do or do not want. The more messages you have received and correctly classified, the better the filter becomes and the less you have to interact with it.
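To make the training idea concrete, here is a simplified sketch in Python. It is not the SpamBayes API: the `TokenDatabase` class, its `train` and `probability` methods, the 0.4 default, and the clamping bounds 0.01 and 0.99 are assumptions for this example, and the estimate is a plain ratio of frequencies rather than the more refined formulas a real filter would use.

```python
from collections import Counter

class TokenDatabase:
    """Toy per-user database of token counts, built from messages the user labels."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record one labeled message; each distinct token counts once per message."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def probability(self, token, unknown=0.4):
        """Estimate how likely a message containing `token` is to be spam."""
        spam = self.spam_counts[token]
        ham = self.ham_counts[token]
        if spam + ham == 0 or self.n_spam == 0 or self.n_ham == 0:
            return unknown  # never seen, or not enough training yet
        spam_freq = spam / self.n_spam
        ham_freq = ham / self.n_ham
        p = spam_freq / (spam_freq + ham_freq)
        # Clamp so that no single token is treated as absolute proof.
        return min(0.99, max(0.01, p))

db = TokenDatabase()
db.train("cheap pills buy now", is_spam=True)
db.train("buy cheap watches", is_spam=True)
db.train("lunch meeting at noon", is_spam=False)
db.train("can you buy milk", is_spam=False)

for token in ("cheap", "meeting", "buy", "hello"):
    print(token, db.probability(token))
```

Every message you label adds to the counts, so the estimates drift toward the mail you actually receive, which is why the filter needs less correction over time.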
A few links on this topic:
- A very interesting article by Paul Graham, who popularized the Bayesian approach to spam filtering.
- SpamBayes, an open source implementation of Paul Graham's algorithm in Python, hosted on SourceForge. SpamBayes works with most email clients and platforms through a POP proxy, and can also be integrated into MS Outlook as a plugin.
- POPFile, another implementation (in Perl), also hosted on SourceForge.
- An interesting comparison of SpamBayes and POPFile.
|