
Google's Most Popular and Least Popular Top-Level Domains

Bill Slawski

What are the most popularly used top-level domains (TLDs), or at least, which are the ones that show up on pages indexed in Google?

I wondered this yesterday after seeing a news article stating that registrations of .cn (China) top-level domain names topped 1 million for the first time by the end of 2005. The seed for my wonderment was probably planted when EGOL, at Cre8asite Forums, asked about using a '.info' top-level domain earlier that day.

So I decided to check to see which were the most popular in Google, since that was the easiest place to get some statistics.

I found a couple of lists of top-level domains (generic TLDs and country code TLDs), and searched Google for each one using the advanced "site" search operator, noting the number of results reported. For example, a search for "site:.com" without the quotation marks shows approximately how many pages in Google's index are on sites using a ".com" top-level domain.
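As a rough illustration of that method, here is a minimal Python sketch that turns a list of TLDs into "site:" queries and search URLs. The function names and the sample TLD list are my own assumptions for illustration; the result counts themselves still have to be read off the results page, and no automated collection is shown here.

```python
from urllib.parse import urlencode


def site_query(tld: str) -> str:
    """Build the search text for one TLD, e.g. 'com' -> 'site:.com'."""
    return "site:." + tld.strip().lstrip(".").lower()


def search_url(tld: str, base: str = "https://www.google.com/search") -> str:
    """Build a search URL carrying the site: query in the 'q' parameter."""
    return base + "?" + urlencode({"q": site_query(tld)})


# Hypothetical sample; the real lists covered all generic and country code TLDs.
tlds = ["com", ".info", "CN"]
queries = [site_query(t) for t in tlds]

for t in tlds:
    print(site_query(t), "->", search_url(t))
```

Normalizing the input (leading dot, upper case) keeps the queries consistent no matter how the TLD lists were formatted.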

I have listed the 20 most popular, then the 20 least popular, and finally the whole list.
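Producing the two rankings from the collected counts is a simple sort. The sketch below is one way it might be done; the `rank_tlds` helper is hypothetical, and the three sample figures are only the approximate counts quoted in the comments on this post (.dk, .kr, .kp), not the full data set.

```python
def rank_tlds(counts, n=20):
    """Return (most popular, least popular) lists of (tld, count) pairs.

    The most popular list is ordered largest first; the least popular
    list is ordered smallest first.
    """
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    most = ordered[:n]
    least = ordered[-n:][::-1]
    return most, least


# Approximate counts mentioned in the comments; used here only as sample input.
sample_counts = {"dk": 19_000_000, "kr": 13_000_000, "kp": 120}

most, least = rank_tlds(sample_counts, n=2)
print("most popular:", most)
print("least popular:", least)
```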

By Bill Slawski, Internet Consultant. More blog posts from Bill Slawski can also be read here.

Related topics: DNS, Domain Names, Registry Services, Top-Level Domains


Comments

Re: Google's Most Popular and Least Popular Top-Level Domains Jothan Frakes  –  Jan 13, 2006 5:40 PM PDT

William, this is some good research.

In the bottom 20 were some extensions that aren't delegated to anyone to manage, like .ax, .eh, .kp, and .cs, and most interesting is that there was anything in .ap.  .ap is one of ten ISO 3166 list additions that were set up for WIPO and that no TLDs exist for.

Re: Google's Most Popular and Least Popular Top-Level Domains Geoffrey Sisson  –  Jan 13, 2006 7:18 PM PDT

Interesting — .uk has the most pages indexed of any ccTLD, nearly three times as many as the next-highest ranked ccTLD (.ca).  However, .de has nearly double the number of domain names registered as .uk.  I wonder if this reflects a Google bias (perhaps inadvertent, or at least benign) towards indexing English-language web pages?

Re: Google's Most Popular and Least Popular Top-Level Domains Jaeyoun Kim  –  Jan 14, 2006 7:25 AM PDT

I really wonder why there are about 120 search results for .kp (Korea, Democratic People's Republic). As Jothan mentioned, it was never delegated.

One more interesting thing is that there are only about 13 million search results for .kr (Korea, Republic of). This suggests that Google's search results for non-English web pages are still not very reliable.

Re: Google's Most Popular and Least Popular Top-Level Domains Daniel R. Tobias  –  Jan 15, 2006 9:35 PM PDT

Nonexistent addresses seem to be able to get into the Google index somehow; maybe they add any site that anybody links to, whether it really exists or not.  The search results for .iq include some obvious "joke" entries like phrases ending in "low.iq".  None of them actually work when you try to follow the links from the Google results.

Re: Google's Most Popular and Least Popular Top-Level Domains Bill Slawski  –  Jan 15, 2006 11:02 PM PDT

Thanks for the comments.  I wasn't sure what I would find when I started collecting this information, but I have received a few comments here and on the blog that have made me think some more about what I'm seeing.

Looking at the sites that are actually listed in some of those ccTLDs with very small numbers of results, this does seem to tell us more about Google than it does about top-level domains.

Google will index URLs that it finds on pages even though the site isn't available.  That I knew. Sometimes pages aren't available for one reason or another.

But, it also looks like Google will index URLs that shouldn't even exist.  That surprised me a little.  I just conducted a search for "site:.xxx" (without the quotation marks) and received 852 results.  A search for "site:.xyz" comes back with another 98 results. 

The URLs returned all shared some characteristics, which I also see on some of the other ccTLDs that don't have many results because they may not have been delegated, or have expired.  These are:

Use of the URL as the snippet title.
Lack of any snippets of text from the sites themselves.
Lack of a link to a cache of the page.

Those characteristics are usually good indicators that Google has found a link to a page, but had problems visiting the page.  I'm surprised that they would include URLs with TLDs that are nonexistent, like the "xyz" one I mentioned above.
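The three signs listed above can be written down as a simple check. This is only a sketch of my own rule of thumb, not anything Google publishes; the `SearchResult` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str            # text shown as the result's title
    url: str              # URL of the result
    snippet: str          # text excerpt shown under the title, if any
    has_cache_link: bool  # whether a link to a cached copy is shown


def _bare(address: str) -> str:
    """Lower-case an address and drop the scheme and any trailing slash."""
    address = address.strip().lower()
    for prefix in ("http://", "https://"):
        if address.startswith(prefix):
            address = address[len(prefix):]
    return address.rstrip("/")


def looks_unvisited(result: SearchResult) -> bool:
    """True when all three signs of a linked-to but unfetched page are present."""
    title_is_url = _bare(result.title) == _bare(result.url)
    no_snippet = not result.snippet.strip()
    return title_is_url and no_snippet and not result.has_cache_link


# Hypothetical results for illustration.
ghost = SearchResult("www.example.xyz/", "http://www.example.xyz/", "", False)
normal = SearchResult("Example Domain", "http://www.example.com/",
                      "This domain is for use in examples.", True)
print(looks_unvisited(ghost), looks_unvisited(normal))
```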

While the numbers of results are small, I would have expected the search engine to filter out some of these results, or even more likely, ignore them at an earlier stage when it is crawling pages and collecting URLs for indexing.  It's possible that the effort involved in doing that isn't worth the processing power.  Having these results in the index probably doesn't affect the relevancy of results for too many searches.

Geoffrey's comparison of indexed pages as opposed to registered sites is interesting.  I don't know if we can conclude search engine bias in the indexing of sites based upon language from the data that I collected.

There are a number of sites that generate pages dynamically, calling them forth from a database, and depending upon how the pages are set up, could be said to have an almost infinite number of pages to index.  For instance, if a site can serve pages that display multiple data variables in the URLs to those pages, a page could be included multiple times based upon different orderings of those data variables.  Or, if a site uses session IDs, and the search engine has access to those session IDs in the URLs of pages, it could index many pages more than once.
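To make the duplication concrete, here is a small Python sketch that reduces such URLs to a single form by sorting the query parameters and dropping session IDs. It is an illustration of the problem only, not a description of how any search engine handles it, and the session parameter names are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session-ID parameter names; real sites use many different ones.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Sort query parameters and remove session IDs so variants compare equal."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    params.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(params), "")
    )


# Two addresses for the same page: different parameter order, one with a session ID.
a = "http://example.com/list?color=red&size=9"
b = "http://example.com/list?size=9&color=red&sid=abc123"
print(canonical_url(a))
print(canonical_url(a) == canonical_url(b))
```

Counted as raw URLs, `a` and `b` are two pages; after normalizing they are one, which is why raw result counts can overstate the size of a dynamically generated site.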

But, when a search engine crawler collects pages to index, it will follow a number of different importance metrics that tell it which pages are important and which to visit next.  Those should keep it from trying to grab too many pages from a site that might have one of the problems I mentioned above.

I'm really not sure how useful this data is, but I'll probably look at these numbers again in a few months to see if they have changed dramatically.

Re: Google's Most Popular and Least Popular Top-Level Domains Robert Martin-Legene  –  Jan 20, 2006 6:31 AM PDT

I tried site:.dk using google.com which gave the 19m you got too. Then I tried the same query using google.dk and got 42m. Then I just pressed reload a few times in the google.dk-window and it turns out it went back and forth between 19m and 42m. I guess Google has a little problem there.

Unfortunately this could mean that the data you collected isn't totally reliable.

Re: Google's Most Popular and Least Popular Top-Level Domains Bill Slawski  –  Jan 20, 2006 12:57 PM PDT

Hi Robert,

It sounds like you were receiving result amounts from different data centers, which Google will do while load balancing.

One of the interesting things about collecting data like this, and sharing it with others is that you get a good number of views, conclusions, opinions, and sometimes even conflicting information, like yours.

A couple of other folks have pointed out to me that they are seeing very different numbers from other data centers.  Not every data center has the same information within it, and not every Google data center uses exactly the same algorithm to serve results to searchers.  I knew that going into this. 

What I find interesting is that some of the numbers vary drastically from one data center to another.  Someone reported in a comment on the blog post that they were seeing 7 billion results for a "site:.com" search.  So, I tried using a number of different data centers to see how much difference I could spot, and my results varied by more than a billion results from one data center to another. 

So yes, my collection of data might tell us a little more about the reflections of the web that Google holds in its index, and not as much about the actual web.  We know that there are large parts of the web that aren't indexed at all, and that there are inaccuracies in what the search engines have indexed.

I'd love to see some accurate numbers about the actual page counts in the different TLDs.  The closest method I could come up with for getting an approximation was to look at one of the tools that we use to search the web.  It's been helpful to define some of the limitations of that approach, like this issue with different data centers that you raise.  Thanks.
