
Lies, Damn Lies, and Anti-Spam Vendor Press Releases

There’s a lot of chatter about a recent study purporting to show that 29.1% of internet users have bought something from spam. As ITWire reported, “Marshal were not only interested in how many people were purchasing from a spam source, but also what goods and services they were buying. Perhaps less surprisingly this revealed that sex and drugs sell well online.” But at downloadsquad, Lee Mathews discovered the shocking truth: “the survey only involved 600 people.”

Lee goes on to ask “is it worse that about 180 of those people bought products from spam, or that media outlets are willing to jump all over a statistic that comes from a sampling of less than .0001% of the roughly 360 million people currently using the internet?” I’d go with the latter. Vendors who make such outrageous claims only make it more difficult for the real facts to be revealed—and the facts are scary enough without any self-serving augmentation.

This article was originally posted on Box of Meat.

By J.D. Falk, Internet Standards and Governance


Comments

Small samples and statistics validity Valdis Kletnieks  –  Aug 22, 2008 1:10 PM

Given that the media are willing to jump all over polls that show one candidate for president is 3 or 4 percent ahead of the other, when those polls are based on similarly small sample sizes, yes, it is OK.

In reality, the “sample mean” (the value measured in the sample) ends up being very close to the “population mean”, even for relatively small numbers of samples - you can start doing some statistical tests with a sample size as small as 30.  That number called the “margin of error” is an estimate of how far apart the sample and population means probably are (usually at a 95% confidence level).  For a sample size of 600, the margin of error is about 4 percentage points - which means that we can be 95% sure that if the sample said 29%, the real value is between 29-4 and 29+4, or the range 25-33%.  And if you think about it, it doesn’t really matter - we have the same problem if one out of four, or one out of three, are buying from spam.
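The margin-of-error arithmetic is easy to check. The sketch below uses the standard normal-approximation formula for a proportion and assumes a simple random sample (which, as discussed further down, is the shaky part). The function name `margin_of_error` and the optional finite-population correction are illustrative choices of mine, not anything taken from the survey itself.

```python
import math

def margin_of_error(p, n, z=1.96, population=None):
    """Half-width of the normal-approximation confidence interval for a
    proportion p measured on a simple random sample of size n.
    z=1.96 corresponds to a 95% confidence level.
    If a population size is given, apply the finite population correction."""
    moe = z * math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

n = 600               # respondents in the survey
p_hat = 0.291         # reported share who bought from spam
internet_users = 360_000_000  # the population figure quoted in the post

moe = margin_of_error(p_hat, n)
worst_case = margin_of_error(0.5, n)   # p = 0.5 gives the widest interval
moe_fpc = margin_of_error(p_hat, n, population=internet_users)

print(f"margin of error at p = {p_hat}: {moe * 100:.1f} points")
print(f"worst-case margin (p = 0.5):  {worst_case * 100:.1f} points")
print(f"95% interval: {(p_hat - moe) * 100:.1f}% to {(p_hat + moe) * 100:.1f}%")
print(f"change from population correction: {abs(moe - moe_fpc):.2e}")
```

Under these assumptions the margin comes out at a bit under 4 percentage points, in the same ballpark as the rough figure quoted above. The correction for the size of the population changes it by a negligible amount, which is why sampling 600 people out of hundreds of millions is not, by itself, the problem.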

A much *bigger* issue, and one that *cannot* be fixed by using a bigger sample, is “selection bias”.  For a survey to be statistically valid, you have to establish that the sample reflects the population.  For some things like quality control on a production line, it’s pretty easy - pick a random 1 of 200 widgets for testing, and you’re done.  For things like political surveys, you have to pre-filter your sample for “likely voters” - you *don’t* want to include children, non-citizens, felons, and others not able or likely to vote.  If it’s a survey on a website, it becomes more complicated - first, you have the fact that the people who visit the website may not be representative of Internet users in general (for instance, how did they *find* the survey to take it?).  If the survey was advertised via a banner ad or mass mailing, then the results are even more likely to be biased - because the people who responded are the people who respond to the same techniques when used by spammers.  So all you’ve proven is “the people who click on banner ads and reply to spam survey e-mails are likely to click on banner ads and reply to spam e-mails”.

And even if they had 100,000 responses rather than 600, *THAT* flaw would still remain.
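A small simulation makes the point concrete. Every number below is invented purely for illustration: a hypothetical population in which 10% of users are “clickers” who respond to banner ads and mass mailings, clickers buy from spam 30% of the time, everyone else 2% of the time, and clickers are 20 times more likely to answer the survey. None of these figures come from the Marshal study.

```python
import random

# Hypothetical parameters, chosen only to illustrate selection bias.
CLICKER_SHARE = 0.10      # share of all users who respond to ads and mass mail
BUY_RATE_CLICKER = 0.30   # share of clickers who have bought from spam
BUY_RATE_OTHER = 0.02     # share of everyone else who has bought from spam
RESPONSE_BOOST = 20.0     # how much more likely a clicker is to take the survey

def true_rate():
    """Actual share of the whole population that has bought from spam."""
    return (CLICKER_SHARE * BUY_RATE_CLICKER
            + (1.0 - CLICKER_SHARE) * BUY_RATE_OTHER)

def biased_survey(n, rng):
    """Simulate n respondents recruited by a method that over-samples clickers."""
    w_clicker = CLICKER_SHARE * RESPONSE_BOOST
    w_other = 1.0 - CLICKER_SHARE
    p_clicker = w_clicker / (w_clicker + w_other)  # clicker share among respondents
    buyers = 0
    for _ in range(n):
        is_clicker = rng.random() < p_clicker
        rate = BUY_RATE_CLICKER if is_clicker else BUY_RATE_OTHER
        if rng.random() < rate:
            buyers += 1
    return buyers / n

rng = random.Random(2008)  # fixed seed so the run is repeatable
small = biased_survey(600, rng)
large = biased_survey(100_000, rng)

print(f"true rate in the population: {true_rate():.1%}")
print(f"survey estimate, n = 600:     {small:.1%}")
print(f"survey estimate, n = 100,000: {large:.1%}")
```

In this made-up setup the true rate is just under 5%, while both surveys report something above 20%. The larger sample only pins down the *wrong* number more precisely.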

(There are other survey design issues, such as “are people likely to have replied honestly?”.  How do you filter out pranksters who say “yeah, I’ve bought” even though they never actually have?  However, the selection bias issue is big enough to overwhelm those other issues.)
