The English noun "spam", as in "spam filter", roughly translates as "waste" or "junk". Originally it referred to canned meat. In IT, spam denotes unsolicited electronic messages, i.e. messages delivered without the recipient having asked for them. Most of them contain advertising. According to the Hamburg-based statistics portal Statista, the worldwide number of spam emails per year stood at 28 billion in 2014. Spam is a global problem that spam filters are meant to address: a computer program sorts out the unwanted messages. The originator of such mail is called a spammer, and the practice is called spamming.
Areas of application of a spam filter
Traditionally, spam filters were used only to sort out unwanted emails. For this purpose, filtering algorithms were built into modules for email programs and mail servers. As advertising on the Internet has grown in importance, newer programs also filter web content: spam filters are now used in web browsers, wikis and blogs as well.
Working methods of a spam filter
Spam filters draw on information that is directly tied to an email. This can be the content of the message itself, and the sender of a message can also be checked to a limited extent. Three methods have become established:
a) The blacklist method. A blacklist is a list of entries that are treated as unwanted. In terms of content, such a list holds certain terms and keywords. An algorithm searches an email for these keywords; if it finds any, the email is sorted out. The same procedure can be extended to senders. Many spam filters that use the blacklist method already ship with a large amount of data, and users can extend some of it to suit their own needs.
b) The Bayesian filter method. The Bayesian filter method is based on probability theory and requires the user's cooperation, especially at the start of use. If set up correctly, it is superior to the blacklist method. The user must classify received emails as spam or non-spam. In the background, the Bayesian filter learns the rules from these decisions, without the user having to modify the algorithm. After about 1,000 manually sorted emails, the filter works on its own. It also keeps learning whenever the user later re-sorts messages.
c) Database-based solutions. Advertising emails in particular contain pieces of data that are meant to lead to a concrete contact. Above all, these are the URL of a website and a telephone number. Database-based solutions let algorithms search emails for this information. If it is found, the emails are sorted out. The success rate of such procedures can be described as very good. Advertising emails can be redesigned over and over again in unlimited variations, but certain data always remains the same.
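The three methods above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular spam filter is implemented: the list entries, the contact database, the regular expressions and all function and class names are assumptions made for this example. The Bayesian part uses a simple naive Bayes model with add-one smoothing, which is one common way to apply probability theory to this task.

```python
import math
import re
from collections import Counter

# Hypothetical blacklist entries, chosen only for illustration.
BLACKLISTED_TERMS = {"viagra", "lottery"}
BLACKLISTED_SENDERS = {"promo@spam.example"}

# Hypothetical database of contact data seen in earlier spam mails.
KNOWN_SPAM_CONTACTS = {"http://cheap-pills.example", "+15550100"}

WORD_RE = re.compile(r"[a-z']+")
URL_RE = re.compile(r"https?://[^\s/]+")      # scheme and host only
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")  # very rough phone pattern


def tokenize(text):
    """Split a text into lowercase words."""
    return WORD_RE.findall(text.lower())


def blacklist_hit(sender, body):
    """a) Flag mail from a listed sender or containing a listed keyword."""
    if sender.lower() in BLACKLISTED_SENDERS:
        return True
    return any(word in BLACKLISTED_TERMS for word in tokenize(body))


def contact_hit(body):
    """c) Flag mail whose URLs or phone numbers are already in the database."""
    urls = {url.lower() for url in URL_RE.findall(body)}
    phones = {re.sub(r"[\s-]", "", p) for p in PHONE_RE.findall(body)}
    return bool((urls | phones) & KNOWN_SPAM_CONTACTS)


class BayesFilter:
    """b) Naive Bayes classifier trained on mails the user has sorted."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    def train(self, body, label):
        """Learn from one mail; label is 'spam' or 'ham' (non-spam)."""
        self.word_counts[label].update(tokenize(body))
        self.mail_counts[label] += 1

    def spam_probability(self, body):
        """Estimate the probability that a mail is spam."""
        seen = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab = len(seen) + 1  # one extra slot for unseen words
        total_mails = sum(self.mail_counts.values())
        words = tokenize(body)
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior, then smoothed word likelihoods.
            score = math.log((self.mail_counts[label] + 1) / (total_mails + 2))
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab))
            log_score[label] = score
        # Clamp the difference so math.exp cannot overflow on long mails.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

In use, each mail the user sorts by hand would be passed to `train`, and new mail would be treated as spam once `spam_probability` exceeds a chosen threshold such as 0.5. The blacklist and database checks need no training, but they only catch what is already on their lists.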
Error rates of spam filters
Spam emails have become increasingly sophisticated over time. As a result, spam filters have to keep evolving. This involves effort and costs, which is why some providers charge for the service. In addition, automated sorting is subject to an error rate, which can be reduced through training. A false negative occurs when a spam email reaches the regular inbox; a false positive, on the other hand, occurs when a normal email is held back as spam. While optimization measures reduce the false negative rate to between ten and one percent, the false positive rate tends toward zero.
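The two error rates can be made concrete with a short calculation. The sketch below assumes that the false negative rate is the share of all spam emails that reach the inbox and that the false positive rate is the share of all normal emails that are held back as spam; the function name and the sample data are hypothetical.

```python
def error_rates(actual_is_spam, flagged_as_spam):
    """Return (false_negative_rate, false_positive_rate) for paired labels.

    actual_is_spam[i] is the true class of mail i,
    flagged_as_spam[i] is the filter's decision for the same mail.
    """
    if len(actual_is_spam) != len(flagged_as_spam):
        raise ValueError("both lists must have the same length")

    pairs = list(zip(actual_is_spam, flagged_as_spam))
    spam_total = sum(1 for actual, _ in pairs if actual)
    ham_total = len(pairs) - spam_total

    # Spam that slipped through to the inbox.
    missed_spam = sum(1 for actual, flagged in pairs if actual and not flagged)
    # Normal mail wrongly held back as spam.
    blocked_ham = sum(1 for actual, flagged in pairs if not actual and flagged)

    false_negative_rate = missed_spam / spam_total if spam_total else 0.0
    false_positive_rate = blocked_ham / ham_total if ham_total else 0.0
    return false_negative_rate, false_positive_rate


# Made-up sample: 10 spam mails (one missed) and 10 normal mails (none blocked).
actual = [True] * 10 + [False] * 10
flagged = [True] * 9 + [False] + [False] * 10
fn_rate, fp_rate = error_rates(actual, flagged)
```

With this sample, one of ten spam emails slips through, so the false negative rate is ten percent, while no normal email is blocked, so the false positive rate is zero.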
A well-known spam filter is SpamAssassin, for example, which is used by most email providers.