Silly Bing

John Levine

Bing is Microsoft's newish search engine, whose name I am reliably informed stands for Bing Is Not Google.

A couple of months ago, as an experiment, I put up a one-page link farm at wild.web.sp.am. As should be apparent after about three seconds of clicking on the links there, each page links to 12 other pages, and each page's host name is made of three first names, like http://aaron.louise.celia.web.sp.am. The pages are generated by a small Perl script and a database of a thousand first names. All the pages have the same IP address, although there are about a billion possible domains (1000 cubed, since there are three names in each page name). I forgot about it until earlier this week, when the disk with my web logs filled up.
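To make the structure concrete, here is a minimal sketch of how such a page's links could be generated. It is not the original Perl script: the short `NAMES` list stands in for the database of a thousand first names, and the function names are made up for illustration.

```python
import random

# Hypothetical stand-in for the database of a thousand first names.
NAMES = ["aaron", "louise", "celia", "dave", "erin", "frank"]
BASE_DOMAIN = "web.sp.am"
LINKS_PER_PAGE = 12


def random_host(rng, names=NAMES):
    """Build a host name from three randomly chosen first names."""
    labels = [rng.choice(names) for _ in range(3)]
    return ".".join(labels + [BASE_DOMAIN])


def page_links(rng, count=LINKS_PER_PAGE):
    """Return the URLs that one generated page would link to."""
    return ["http://" + random_host(rng) for _ in range(count)]


def possible_hosts(name_count):
    """Number of distinct three-name hosts for a given name list size."""
    return name_count ** 3


if __name__ == "__main__":
    rng = random.Random(0)
    for url in page_links(rng):
        print(url)
    # With a thousand names this is 1000 cubed, about a billion hosts.
    print(possible_hosts(1000))
```

Every one of those host names resolves to the same server, so a crawler that follows links blindly never runs out of "new" sites to visit.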

My web logs are normally 10 to 15 megabytes a week, but all of a sudden the logs ballooned past a gigabyte. A quick look at the logs revealed that my web server was getting hammered by the bingbot.
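The "quick look" can be as simple as tallying requests per user agent. The sketch below assumes the common Apache/nginx "combined" log format, in which the user agent is the last quoted field on each line; the sample lines are invented for illustration and are not taken from the actual logs.

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Illustrative user agent strings for the made-up sample lines.
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
OTHER_UA = "ExampleBrowser/1.0"


def sample_line(user_agent):
    """Build one made-up log line in combined format."""
    return ('192.0.2.1 - - [01/Jan/2012:00:00:00 +0000] '
            f'"GET / HTTP/1.1" 200 512 "-" "{user_agent}"')


def count_user_agents(lines):
    """Count requests per user agent, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    lines = [sample_line(BINGBOT_UA), sample_line(OTHER_UA), sample_line(BINGBOT_UA)]
    for agent, hits in count_user_agents(lines).most_common():
        print(hits, agent)
```

On a real log, the agent at the top of that list is the one filling the disk.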

Every search engine has a "spider" or "bot" that visits web pages to collect data for its index. It's quite normal to see a fair number of log entries from bots as various search engines wander around your web pages looking to see what's changed.

But it was not normal to see the bingbot hammering on my link farm, ten queries a second, day after day. When I noticed it, the bingbot had already made about 15 million visits, fetching 15 million nearly identical pages. I added a robots.txt file telling bingbot to go away. It didn't help, which wasn't that surprising: since each page is in a different domain, each page could hypothetically have its own robots file, so while the robots file should stop future indexing, it won't affect any pages that Bing had queued up from previous visits.

How many did it have queued up? A lot. Bing scooped up over a million copies of the robots file, at which point I adjusted the web server configuration to return an error page when the bingbot tried to fetch a link farm page, but to return the robots file normally. That still didn't help. It fetched a lot of robots files and a lot of error pages, I think for different domains.
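The reason a single robots file cannot shut the whole thing off is that robots.txt is looked up per host name. The sketch below uses Python's standard `urllib.robotparser` to illustrate that; the robots file contents are an assumption (the actual file is not shown here), and the second host name is just another made-up combination of the same three names.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed contents: a robots file that tells bingbot to stay away entirely.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /
"""


def robots_url(page_url):
    """Return the robots.txt URL that governs a page: one per host name."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def allowed(user_agent, page_url, robots_text=ROBOTS_TXT):
    """Check whether a user agent may fetch a page under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    pages = [
        "http://aaron.louise.celia.web.sp.am/",
        "http://celia.aaron.louise.web.sp.am/",  # hypothetical second host
    ]
    for page in pages:
        # Each host needs its own robots.txt fetch before the rule applies.
        print(robots_url(page), allowed("bingbot", page))
```

A well-behaved crawler with millions of distinct hosts already queued therefore has to fetch millions of robots files before it learns that every one of them says the same thing.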

Since the link farm has its own IP address, it was easy to add low-level packet filters to reject all traffic to that address from the 12 addresses of the bingbot. I unfiltered for a few minutes today, and it's still hammering as hard as ever.

While this isn't doing any great damage, if I didn't have the skills to look at logs and write suitable packet filters, or if I were paying by the byte for network traffic, it could have crashed my system or cost me a lot of money.

Bing is not the only search engine to have discovered my link farm. Google's Googlebot-Mobile/2.1 visits the link farm every few seconds, claiming to be various kinds of Japanese mobile phones. But Bing's traffic is orders of magnitude more than everyone else's put together. (This is just a problem for the link farm; the rest of my web sites get along with Bing just fine.)

My main question is how these highly sophisticated search engines have failed to notice that they have fetched several million almost identical pages from the same IP address, and failed to blacklist it. I have reason to believe that Bing management is aware of the issue, so maybe they'll stop it some time. Or maybe even let on what happened.
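For what it's worth, the kind of check I have in mind is not complicated. The following is a purely hypothetical sketch, not a description of how Bing or any other search engine works: it remembers the first page fetched from each IP address and flags the address once enough later pages look almost the same. The threshold, the limit, and the class name are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off for "almost identical"
MAX_NEAR_DUPLICATES = 3     # assumed limit, kept tiny so a demo triggers it


class DuplicateTracker:
    """Hypothetical crawler-side check for near-identical pages per IP."""

    def __init__(self):
        self.reference = {}                      # ip -> first page body seen
        self.near_duplicates = defaultdict(int)  # ip -> near-duplicate count

    def record(self, ip, body):
        """Record a fetched page; return True once the IP looks like a link farm."""
        if ip not in self.reference:
            self.reference[ip] = body
            return False
        # autojunk=False keeps difflib from discarding frequent characters.
        matcher = SequenceMatcher(None, self.reference[ip], body, autojunk=False)
        if matcher.ratio() >= SIMILARITY_THRESHOLD:
            self.near_duplicates[ip] += 1
        return self.near_duplicates[ip] >= MAX_NEAR_DUPLICATES


def make_page(host):
    """Build a made-up page body that differs only in the host it links to."""
    filler = "lorem ipsum " * 20
    return f"<html><body>{filler}<a href='http://{host}.web.sp.am'>x</a></body></html>"


if __name__ == "__main__":
    tracker = DuplicateTracker()
    for host in ["aaron.louise.celia", "celia.aaron.louise", "louise.celia.aaron",
                 "aaron.celia.louise"]:
        print(host, tracker.record("192.0.2.10", make_page(host)))
```

Comparing every page against one reference is crude and would not scale as written, but even something this simple would trip long before the fifteen millionth fetch.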

By John Levine, Author, Consultant & Speaker
Related topics: Web

You've been BINGED! Phil Howard  –  Jul 16, 2012 12:04 PM PST

You've been BINGED!
