Silly Bing

John Levine

Bing is Microsoft's newish search engine, whose name I am reliably informed stands for Bing Is Not Google.

A couple of months ago, as an experiment, I put up a one-page link farm at wild.web.sp.am. As should be apparent after about three seconds of clicking on the links there, each page has links to 12 other pages, and each page's host name is made of three first names, like http://aaron.louise.celia.web.sp.am. The pages are generated by a small Perl script and a database of a thousand first names. All the pages share the same IP address, although there are about a billion possible host names (1000 cubed, since each host name contains three names). I forgot about it until earlier this week, when the disk with my web logs filled up.
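The generator itself isn't published, so here is a minimal Python sketch of what such a script might do (the original is Perl). The short name list, the helper names `random_host`, `render_page` and `possible_hosts`, and the choice to seed the random generator with the host name so each page shows the same links on every fetch are all assumptions for illustration.

```python
import random

# Hypothetical stand-in for the author's database of about a thousand first names.
NAMES = ["aaron", "louise", "celia", "david", "maria", "omar", "yuki", "zoe"]
SUFFIX = "web.sp.am"
LINKS_PER_PAGE = 12


def random_host(rng: random.Random) -> str:
    """Build a host name from three first names, e.g. aaron.louise.celia.web.sp.am."""
    return ".".join(rng.choice(NAMES) for _ in range(3)) + "." + SUFFIX


def render_page(host: str) -> str:
    """Return an HTML page for `host` that links to 12 distinct other generated hosts."""
    # Seeding with the host name (an assumption) makes each page stable across fetches.
    rng = random.Random(host)
    targets = []
    while len(targets) < LINKS_PER_PAGE:
        candidate = random_host(rng)
        # Skip self-links and repeats so the page links to 12 different other hosts.
        if candidate != host and candidate not in targets:
            targets.append(candidate)
    items = "\n".join(f'<li><a href="http://{t}/">{t}</a></li>' for t in targets)
    return f"<html><head><title>{host}</title></head><body><ul>\n{items}\n</ul></body></html>"


def possible_hosts(n_names: int) -> int:
    """Distinct three-name host names when names may repeat: n_names cubed."""
    return n_names ** 3


if __name__ == "__main__":
    print(render_page("aaron.louise.celia.web.sp.am"))
    print(possible_hosts(1000))
```

Every generated link points at yet another host name on the same address, so a crawler that follows links never runs out of "new" sites.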

My web logs are normally 10 to 15 megabytes a week, but all of a sudden the logs ballooned past a gigabyte. A quick look at the logs revealed that my web server was getting hammered by the bingbot.
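A quick look like this can be as simple as counting requests and bytes per crawler. The sketch below assumes the server writes the common Apache/nginx "combined" log format (the post doesn't say which format was used); `classify` and `summarize` are hypothetical helpers, and the substring match on the User-Agent is deliberately crude.

```python
import re
from collections import Counter

# Apache/nginx "combined" log format; assumed, since the post doesn't say.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(agent: str) -> str:
    """Bucket a User-Agent string by crawler name (simple substring match)."""
    lowered = agent.lower()
    for bot in ("bingbot", "googlebot"):
        if bot in lowered:
            return bot
    return "other"


def summarize(lines):
    """Return (requests per bucket, bytes sent per bucket) for an iterable of log lines."""
    hits, sent = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines in some other format
        bucket = classify(m["agent"])
        hits[bucket] += 1
        if m["size"] != "-":  # "-" means no body was sent
            sent[bucket] += int(m["size"])
    return hits, sent
```

Passing an open log file (for example `summarize(open("access.log"))`, with a hypothetical file name) and printing `hits.most_common()` would show at a glance which crawler dominates.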

Every search engine has a "spider" or "bot" that visits web pages to collect data for its index. It's quite normal to see a fair number of log entries from bots as various search engines wander around your web pages looking to see what's changed.

But it was not normal to see the bingbot hammering on my link farm at ten queries a second, day after day. By the time I noticed, the bingbot had already visited about 15 million times, fetching 15 million nearly identical pages. I added a robots.txt file telling bingbot to go away. It didn't help, which wasn't that surprising: since each page is in a different domain, each page could in principle have its own robots file, so while the robots file should stop future indexing, it won't affect any pages that Bing had already queued up from previous visits. How many did it have queued up? A lot. Bing scooped up over a million copies of the robots file, at which point I changed the web server configuration to return an error page when the bingbot tried to fetch a link farm page, while still returning the robots file normally. That didn't help either; it fetched a lot of robots files and a lot of error pages, I think for different domains.
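Because robots.txt applies to a single host name, a crawler that honors it has to fetch and parse a separate copy for each host before it can drop that host's queued URLs. The minimal sketch below shows that per-host bookkeeping with Python's standard `urllib.robotparser`; the robots.txt contents, the `PerHostRobots` class and the `fetch_robots` callback are assumptions, not what the author or Bing actually used.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# One plausible robots.txt for the link farm (the post doesn't quote the real one).
BLOCK_BINGBOT = """\
User-agent: bingbot
Disallow: /
"""


class PerHostRobots:
    """Cache one parsed robots.txt per host name, as a polite crawler must."""

    def __init__(self, fetch_robots):
        self.fetch_robots = fetch_robots  # hypothetical callback: host -> robots.txt text
        self.parsers = {}
        self.fetches = 0  # how many robots.txt files had to be downloaded

    def allowed(self, agent: str, url: str) -> bool:
        host = urlsplit(url).hostname
        parser = self.parsers.get(host)
        if parser is None:
            # A new host name means a new robots.txt, even on the same IP address.
            self.fetches += 1
            parser = RobotFileParser()
            parser.parse(self.fetch_robots(host).splitlines())
            self.parsers[host] = parser
        return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = PerHostRobots(lambda host: BLOCK_BINGBOT)
    queued = [
        "http://aaron.louise.celia.web.sp.am/",
        "http://celia.aaron.louise.web.sp.am/",
        "http://aaron.louise.celia.web.sp.am/other",
    ]
    print([robots.allowed("bingbot", u) for u in queued])
    print(robots.fetches)  # one robots.txt per distinct host name
```

With a billion possible host names behind one address, the per-host cache never gets a chance to help, which fits the million-plus robots.txt fetches in the logs.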

Since the link farm has its own IP address, it was easy to add low-level packet filters to reject all traffic to that address from the 12 addresses of the bingbot. I removed the filters for a few minutes today, and it's still hammering as hard as ever.
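The post doesn't say which packet filter was used. On a Linux host, the rules might look like the `iptables` commands generated by this small Python sketch; the addresses are documentation placeholders (RFC 5737 and RFC 3849), not the real farm or bingbot addresses, and `reject_rules` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress


def reject_rules(farm_ip: str, bot_ips: list[str], chain: str = "INPUT") -> list[str]:
    """Build one REJECT rule per crawler address, limited to traffic to the farm's IP."""
    farm = ipaddress.ip_address(farm_ip)
    rules = []
    for ip in bot_ips:
        src = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if src.version != farm.version:
            raise ValueError(f"{src} and {farm} are different IP versions")
        tool = "iptables" if farm.version == 4 else "ip6tables"
        rules.append(f"{tool} -A {chain} -s {src} -d {farm} -j REJECT")
    return rules


if __name__ == "__main__":
    # Placeholder addresses; the author blocked 12 bingbot addresses.
    for rule in reject_rules("192.0.2.10", ["198.51.100.1", "198.51.100.2"]):
        print(rule)
```

Matching on the farm's destination address as well as the crawler's source address leaves Bing's access to every other site on the server untouched.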

While this isn't doing any great damage, if I didn't have the skills to look at logs and write suitable packet filters, or if I were paying by the byte for network traffic, it could have crashed my system or cost me a lot of money.

Bing is not the only search engine to have discovered my link farm. Google's Googlebot-Mobile/2.1 visits the link farm every few seconds, claiming to be various kinds of Japanese mobile phones. But Bing's traffic is orders of magnitude more than everyone else's put together. (This is a problem only for the link farm; the rest of my web sites get along with Bing just fine.)

My main question is how these highly sophisticated search engines have failed to notice that they have fetched several million almost identical pages from the same IP address, and to blacklist it. I have reason to believe that Bing management is aware of the issue, so maybe they'll stop it some time. Or maybe they'll even let on what happened.
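One cheap signal a crawler could use, offered here purely as a hypothetical sketch and not as anything Bing or Google is known to do, is to count how many distinct host names it has crawled on each IP address and stop scheduling new ones past a threshold. The class name `PerIPHostLimiter` and the default threshold are illustrative assumptions.

```python
from collections import defaultdict


class PerIPHostLimiter:
    """Hypothetical crawler guard: stop crawling an IP that serves too many host names."""

    def __init__(self, max_hosts_per_ip: int = 10_000):
        self.max_hosts = max_hosts_per_ip
        self.hosts_by_ip = defaultdict(set)  # IP address -> host names seen there
        self.blacklisted = set()

    def should_fetch(self, ip: str, host: str) -> bool:
        if ip in self.blacklisted:
            return False
        hosts = self.hosts_by_ip[ip]
        hosts.add(host)
        if len(hosts) > self.max_hosts:
            # Too many distinct host names on one address: treat it as a farm.
            self.blacklisted.add(ip)
            del self.hosts_by_ip[ip]  # the verdict is kept in `blacklisted`
            return False
        return True
```

A host-count threshold alone would also trip on large shared-hosting servers, which legitimately put thousands of unrelated sites on one address, so a real crawler would presumably combine it with a check that the pages really are near-duplicates.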

By John Levine, Author, Consultant & Speaker.

Related topics: Web

Comments

You've been BINGED! Phil Howard  –  Jul 16, 2012 1:04 PM PDT
