Fighting Image Spam
The Issue
Spammers are developing more and more sophisticated methods to avoid
filters. Generally, this entails attempts at sending out e-mail "waves" in
which each and every e-mail is in some way unique and different from its
predecessors. The relative success of each wave is then analyzed by the
spammer, and the resulting findings become "features" of the next spam wave.
New methods of detecting spam waves, such as extracting their core characteristics and pushing these characteristics out to clients as spam signatures, are in the final phase of development. Attempts are also being made at finding methods to predict spam changes.
Many of the filtering methods used by BitDefender have become more robust at dealing with all of the little variations encountered in spam flows. However, in 2006 there has been an increase in image spam. Simple e-mails with apparently similar images (but unique, judging by their computational differences) started polluting our inboxes in large quantities.
When image spam-fighting techniques were just emerging, an effective detection method was to build signatures from the image metadata. However, the BitDefender antispam lab has since found in-the-wild spam e-mails that use fresh image poisoning techniques intended to defeat spam filters, so an entirely new technology is now needed to counter this development.
The Original Approach
In 2005, "image spam" accounted for approximately 10% of the total amount of spam. Such message series typically consisted of about 5-6 spam images with some minor variations. In recent months, however, spammers have noticed that many of the current antispam solutions are almost ineffective against this new trick, so they have started attacking this niche in earnest. Image spam has increased to 30-40% of the total amount of circulating spam, with random noise changing with almost every image sent. Detection rates have dropped even further, from more than 97% to about 65-75%.
Spam images usually contain pictures of Viagra pills, computer hardware, pornographic images, or just the classical spam message (some text and a URL) but written in a noisy image.
To do any sort of content analysis on such e-mails would mean, on the face of it, that the pictures need to be run through an optical character recognition (OCR) module. Yet common OCR filters are computationally expensive and their accuracy leaves much to be desired.
[Figure: Image spam evolution in the first 8 months of 2006]
BitDefender's Approach
For a more reliable detection, BitDefender offers an alternative to OCR, namely a filter which ignores the text within images (the message, from a human point of view) and instead learns by experience some common characteristics of the images proper.
This alternative relies on two techniques, histogram* extraction and histogram comparison, which have proved fruitful over time in applications that involve image processing.
They are generally used in content-based image retrieval (e.g. extracting all pictures of dolphins from a set of vacation photos), with a rather high false positive rate. Therefore, considering them as instruments in an AntiSpam solution was quite problematic at first, as false positives meant lost e-mails for the user, which was not to be taken lightly.
Experimentation has revealed that a new formula derived from these techniques, called SID (short for Spam Image Distance), can be relied upon to produce few false positives.
The Spam Image Distance algorithm picks out images based on how much they resemble each other in the quantity of similar colors rather than in shape content. From a SID perspective, for instance, although all pictures of printed pages look somewhat alike, being white or off-white with some quantity of a darker grey, a page of the Encyclopedia Britannica does not look quite like a page of a text ad, because the proportions of white and grey are so different.
SID is used to compare images and assess the "distance" between them, which essentially means finding out how dissimilar they are. The distances found with the SID formula are used to compare images already included in the spam database to new images which might be spam. If the image analysis returns a score lower than a given threshold, then the image is added to the BitDefender spam images database. That is why SID is the technique of choice when dealing with spam images which are variations of other, older spam images.
While this new technique can be shown to perform well on "clean" images, there remains the problem of images that have undergone obfuscation (e.g. added noise). Fortunately, the obfuscation techniques used by spammers are well-known and the arsenal of countermeasures is similarly wide. For instance, spammers will split an image into subimages and embed them into an HTML table to reconstruct the initial image. This problem can be tackled by stitching together the histograms of the subimages, reconstructing the histogram of the initial image and then applying a SID-based analysis on the resulting composite histogram.
Common "noising" techniques:
• Adding random pixels in the image
• Animated GIFs with noisy bogus frames
• Similar colors between different parts of the text in the image
• A long line at the end of the image (some kind of border) with random parts missing
• Splitting the image into subimages and using the table facilities in HTML to reconstruct the image
• Sending different sizes of the same image
• Image poisoning - inserting legitimate pictorial content such as company logos in spam messages
• Sending noisy legitimate pictures to ...
• Sending legitimate pictures with content close to spam (e.g. mortgage images from legit mortgage companies)
*) A histogram can be defined as a list of colors and their relative preponderance in an image; it indicates what colors and how many pixels of a given color exist in that image.
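The histogram extraction, histogram comparison and subimage stitching described in this section can be illustrated with a minimal Python sketch. It is not the SID formula, which is not given here: the distance below is a plain stand-in (half the L1 distance between color proportions), and the function names, the quantization into 4 levels per channel and the 0.1 threshold are all assumptions made for illustration. Images are represented simply as lists of (r, g, b) pixels.

```python
from collections import Counter

def color_histogram(pixels, levels=4):
    """Count pixels per quantized color; pixels are (r, g, b) tuples in 0..255."""
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def stitch_histograms(histograms):
    """Rebuild the histogram of a split image by summing its subimage histograms."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)  # Counter.update adds counts
    return total

def histogram_distance(h1, h2):
    """Stand-in distance (NOT the SID formula): half the L1 distance between
    color proportions. 0.0 means the same color mix, 1.0 means no shared colors."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    if n1 == 0 or n2 == 0:
        raise ValueError("cannot compare an empty histogram")
    colors = set(h1) | set(h2)
    return 0.5 * sum(abs(h1[c] / n1 - h2[c] / n2) for c in colors)

def resembles_known_spam(histogram, spam_histograms, threshold=0.1):
    """True if the histogram is closer than the (assumed) threshold to any known one."""
    return any(histogram_distance(histogram, known) < threshold
               for known in spam_histograms)

# Toy data: a known spam "text ad", a noised variant of it, and a denser book page.
WHITE, GREY, RED = (255, 255, 255), (90, 90, 90), (200, 0, 0)
spam_ad = [WHITE] * 700 + [GREY] * 300
noisy_variant = [WHITE] * 690 + [GREY] * 290 + [RED] * 20  # 2% random red pixels
book_page = [WHITE] * 950 + [GREY] * 50

database = [color_histogram(spam_ad)]

# The variant arrives split into two subimages inside an HTML table:
# summing the partial histograms recovers the histogram of the whole image.
parts = [noisy_variant[:400], noisy_variant[400:]]
composite = stitch_histograms(color_histogram(part) for part in parts)

print(resembles_known_spam(composite, database))
print(resembles_known_spam(color_histogram(book_page), database))
```

Because a histogram only counts pixels per color, summing the subimage histograms gives exactly the histogram of the unsplit image, which is why stitching works regardless of how the table cells are arranged. A real filter would also need the noise-handling countermeasures mentioned above, which this sketch does not attempt.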
Detection Rates
This patent-pending technology shows a 98.7% detection rate on the BitDefender corpus of spam images (a few million samples extracted from real spam). 1.23% of these images are malformed, which means that their histograms cannot be extracted but they cannot be displayed either. A further 0.07% represent false positive results. If images that are malformed are deleted from the corpus, the detection rate quickly jumps to 100%.
With such promising results, the SID algorithm is a worthwhile addition to the arsenal of any modern antispam solution and the advances in noise reduction are expected to further improve the potential of this already very useful tool.
6301 NW 5th Way, Suite 3500
Fort Lauderdale, Florida 33309
Phone: +1 (1) 8003-888062
Fax: +1 (1) 8003 888064


