Lies, Damn Lies, and Anti-Spam Vendor Press Releases

J.D. Falk

There's a lot of chatter about a recent study purporting to show that 29.1% of internet users have bought something from spam. As ITWire reported, "Marshal were not only interested in how many people were purchasing from a spam source, but also what goods and services they were buying. Perhaps less surprisingly this revealed that sex and drugs sell well online." But at downloadsquad, Lee Mathews discovered the shocking truth: "the survey only involved 600 people."

Lee goes on to ask "is it worse that about 180 of those people bought products from spam, or that media outlets are willing to jump all over a statistic that comes from a sampling of less than .0001% of the roughly 360 million people currently using the internet?" I'd go with the latter. Vendors who make such outrageous claims only make it more difficult for the real facts to be revealed — and the facts are scary enough without any self-serving augmentation.
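As a quick sanity check on the numbers in that quote, a few lines of Python reproduce the arithmetic. This is only a sketch that takes the 600-respondent and 360-million-user figures at face value, exactly as quoted. It shows that 29.1% of 600 is about 175 respondents, and that 600 people are roughly 0.00017% of 360 million, a little more than the quoted ".0001%", though still a vanishingly small fraction.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: the sample size and user count are taken exactly as quoted;
# neither number is independently verified here.
SAMPLE_SIZE = 600
REPORTED_RATE = 0.291          # 29.1% said they had bought from spam
INTERNET_USERS = 360_000_000   # population figure as quoted

buyers = REPORTED_RATE * SAMPLE_SIZE
sample_share_pct = SAMPLE_SIZE / INTERNET_USERS * 100

print(f"respondents who bought: {buyers:.1f}")
print(f"sample as % of population: {sample_share_pct:.6f}%")
```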

This article was originally posted on Box of Meat.

By J.D. Falk, Internet Standards and Governance.

Related topics: Email, Spam

Comments

Small samples and statistics validity Valdis Kletnieks  –  Aug 22, 2008 5:10 AM PST

Given that the media are willing to jump all over polls showing one presidential candidate 3 or 4 percent ahead of the other, when those polls are based on similarly small sample sizes: yes, a sample that small is OK.

In reality, the "sample mean" (the value measured in the sample) ends up being very close to the "population mean" even for relatively small numbers of samples; you can start doing some statistical tests with a sample size as small as 30. The number called the "margin of error" is an estimate of how far apart the sample and population means probably are (usually at 95% confidence). For a sample size of 600, the margin of error is probably around 5%, which means that we can be 95% sure that if the sample said 29%, the real value is between 29-5 and 29+5, or the range 24-34%. And if you think about it, it doesn't really matter: we have the same problem whether one out of four or one out of three people are buying from spam.
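For readers who want to check the margin-of-error figure, here is a minimal sketch using the textbook normal-approximation (Wald) formula for a proportion, z * sqrt(p(1-p)/n), with z = 1.96 for roughly 95% confidence. It assumes a simple random sample, which is exactly the assumption questioned in the next paragraph. For n = 600 it gives about ±3.6 points at the observed 29.1% and about ±4 points in the worst case (p = 0.5), so the rough 5% figure above errs on the safe side.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n.
    z = 1.96 corresponds to roughly 95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)


n = 600
for p in (0.291, 0.5):  # the observed rate, and the worst case p = 0.5
    moe = margin_of_error(p, n)
    print(f"p={p:.3f}: +/-{moe * 100:.1f} points -> "
          f"{(p - moe) * 100:.1f}% to {(p + moe) * 100:.1f}%")
```

Because p(1-p) peaks at p = 0.5, the worst-case line gives an upper bound on the margin for any result from a sample of 600.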

A much *bigger* issue, and one that *cannot* be fixed by using a bigger sample, is "selection bias". For a survey to be statistically valid, you have to establish that the sample reflects the population. For some things, like quality control on a production line, it's pretty easy: pick a random 1 of 200 widgets for testing, and you're done. For things like political surveys, you have to pre-filter your sample for "likely voters"; you *don't* want to include children, non-citizens, felons, and others not able or likely to vote. If it's a survey on a website, it becomes more complicated. First, the people who visit the website may not be representative of Internet users in general (for instance, how did they *find* the survey to take it?). If the survey was advertised via a banner ad or mass mailing, then the results are even more likely to be biased, because the people who responded are the people who respond to the same techniques when used by spammers. So all you've proven is "the people who click on banner ads and reply to spam survey e-mails are likely to click on banner ads and reply to spam e-mails".

And even if they had 100,000 responses rather than 600, *THAT* flaw would still remain.

(There are other survey design issues, such as "are people likely to have replied honestly?" How do you filter out pranksters who say "yeah, I've bought " even though they never actually have? However, the selection bias issue is big enough to overwhelm those other issues.)
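The claim that even 100,000 responses would not remove the flaw can be illustrated with a small simulation. This is a sketch under entirely hypothetical numbers, not anything from the Marshal survey: 10% of users truly buy from spam, and buyers are assumed to be five times as likely as everyone else to answer a survey advertised the way spam is. Under those assumptions the estimate should land near 36% whether there are 600 or 100,000 respondents; the larger sample only makes the wrong answer more precise.

```python
import random

# Hypothetical parameters, chosen only for illustration (not from the survey):
TRUE_RATE = 0.10         # share of all users who really buy from spam
RESPOND_IF_BUYER = 0.20  # response propensity of spam-buyers to a spam-style ad
RESPOND_IF_NOT = 0.04    # response propensity of everyone else (5x lower)


def expected_biased_rate() -> float:
    """Share of buyers among respondents that this recruitment converges to."""
    buyer_responses = TRUE_RATE * RESPOND_IF_BUYER
    other_responses = (1 - TRUE_RATE) * RESPOND_IF_NOT
    return buyer_responses / (buyer_responses + other_responses)


def biased_survey(n_respondents: int, rng: random.Random) -> float:
    """Reach random users, keep only those who answer the ad, and return
    the share of respondents who buy from spam."""
    buyers = respondents = 0
    while respondents < n_respondents:
        is_buyer = rng.random() < TRUE_RATE
        p_respond = RESPOND_IF_BUYER if is_buyer else RESPOND_IF_NOT
        if rng.random() < p_respond:
            respondents += 1
            buyers += int(is_buyer)
    return buyers / respondents


rng = random.Random(2008)  # fixed seed so the run is repeatable
results = {n: biased_survey(n, rng) for n in (600, 100_000)}
for n, estimate in results.items():
    print(f"n={n:>7,}: estimate {estimate:.3f}, "
          f"expected {expected_biased_rate():.3f}, true rate {TRUE_RATE:.3f}")
```

The bias comes entirely from who chooses to respond, so no value of n in the loop moves the estimate toward the true 10%.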
