Search Engines and Registrars Getting Creative with Whois Database?

One of the best sources of information about sites on the web is the Whois database. A trio of patent applications from Go Daddy, published last week at the US Patent and Trademark Office, explores whether adding more information to the Whois database might help reduce spam, phishing, and other fraudulent practices and improve search engine results.

The patent filings from Go Daddy would add reputation information to the published Whois data to let others use it for a number of reasons, including enabling search engines to incorporate it into their ranking mechanisms.

While reading through those documents, I recalled the sections of a patent application from Google last year, which indicated that the search engine may be looking at Whois to help it rank pages. Google cited a number of ways that it could use Whois information in Information retrieval based on historical data.

Is this something that either company can do? Is it a use consistent with the way that Whois information is supposed to be used?

I don’t know how easy it would be to set up the processes described by Go Daddy, verify the reputation information that they describe, and maintain the records the system would depend upon. But is that even worth wondering about now?

The purpose of Whois information

A recent decision by the folks at ICANN to limit the use of Whois information makes it seem unlikely that the scenarios envisioned by these documents will happen. ICANN’s Generic Names Supporting Organization held a vote in which they decided upon the sole purpose of Whois information:

“The purpose of the gTLD Whois service is to provide information sufficient to contact a responsible party for a particular gTLD domain name who can resolve, or reliably pass on data to a party who can resolve, issues related to the configuration of the records associated with the domain name within a DNS nameserver.”

Does this vote also potentially quash Google’s use of Whois information as described in their patent application on Historical Data?

The Go Daddy documents

The patent applications from Go Daddy were originally filed with the patent office in October of 2004, and contain an expansive view of Whois information, where registrars could add information about the domain names under their control.

It would be accessible to the public and search engines, and might contain material on a site (or even on individual URLs) from the registrar and from organizations like TRUSTe, VeriSign, SenderBase.org, Spamcop, and others. This could include data on the amount and frequency of spam from a specific domain as well as complaints about spam and phishing.

It would also contain information about the content of a site. The patent applications list the following content types as examples of what might be included:

alcohol, tobacco, adult related terms, violence, hate, intolerance, fraud, racism, etc.

In addition, a score would be given to the site based upon this reputation information that search engines could use in their rankings of pages.
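
As a rough illustration of the idea, here is a minimal sketch of reputation data attached to a Whois record and reduced to a single score. The field names, the starting value, and the weights are all assumptions made up for this example; the patent applications do not specify a record layout or a scoring formula.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReputationRecord:
    """Hypothetical reputation data a registrar might attach to a Whois record."""
    domain: str
    spam_complaints: int = 0
    phishing_complaints: int = 0
    content_flags: List[str] = field(default_factory=list)   # e.g. ["alcohol"]
    certifications: List[str] = field(default_factory=list)  # e.g. ["TRUSTe"]


def reputation_score(record: ReputationRecord) -> float:
    """Return a score clamped to [0, 100]. The weights are arbitrary assumptions."""
    score = 50.0                                  # neutral starting point
    score += 10.0 * len(record.certifications)    # third-party endorsements help
    score -= 5.0 * record.spam_complaints         # spam complaints hurt
    score -= 15.0 * record.phishing_complaints    # phishing complaints hurt more
    return max(0.0, min(100.0, score))


certified = ReputationRecord("example.com", certifications=["TRUSTe"])
reported = ReputationRecord("example.net", spam_complaints=3, phishing_complaints=2)
print(reputation_score(certified), reputation_score(reported))
```

Content flags are stored but deliberately left out of the score here, since the applications list them as descriptive information rather than saying how they would be weighed.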

The Go Daddy patent applications were published at the US Patent and Trademark Office on May 4, 2006, with Warren Adelman and Michael Chadwick listed as inventors. They differ in their abstract and claims sections, but the description sections for all three are the same.

Publishing domain name related reputation in Whois records (US Patent Application 20060095459)

Tracking domain name related reputation (US Patent Application 20060095586)

Presenting search engine results based on domain name related reputation (US Patent Application 20060095404)

While I was reading through these applications, some sections had me wondering if such a system would be easily abused. I hadn’t considered this broadening of Whois information in light of the original intent behind Whois. It is an interesting approach, though. While the Go Daddy method described in the patent applications would increase the amount of information that could be shared with people, a recent CircleID article noted a strong movement towards less access to Whois information.

Should Whois information be used in this manner? CircleID has a large number of articles on the use of Whois data in their Privacy Matters section, but I didn’t see anything there about the use of Whois information by search engines.

Google and Whois information

The Google patent application from last March, Information retrieval based on historical data, detailed a number of ways that Whois information could be used to aid in the rankings of web pages.

The patent application from Google focuses upon fighting web spam using a wide range of data, including that associated with domain names. They would look at information like the length of a web site’s registration, or other aspects of the registration, such as:

  • Whether physically correct address information exists over a period of time,
  • Whether contact information for the domain changes relatively often,
  • Whether there is a relatively high number of changes between different name servers and hosting companies,
  • Whether there is known-bad contact information, name servers, and/or IP addresses associated with a domain.
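
The registration checks listed above could be sketched roughly as follows. This assumes someone has already collected dated snapshots of a domain's Whois record; the `WhoisSnapshot` fields and the `known_bad` set are hypothetical, and nothing here reflects how Google actually computes or weighs these signals.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class WhoisSnapshot:
    """One dated observation of a domain's Whois record (hypothetical fields)."""
    seen_on: date
    expires_on: date
    contact: str
    name_server: str


def count_changes(values: List[str]) -> int:
    """Count how many times consecutive values differ."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


def registration_signals(history: List[WhoisSnapshot], known_bad: Set[str]) -> dict:
    """Summarize a non-empty, chronologically ordered history into simple signals."""
    latest = history[-1]
    return {
        # How far ahead the domain is paid up, as of the latest snapshot.
        "years_until_expiry": (latest.expires_on - latest.seen_on).days / 365.25,
        "contact_changes": count_changes([s.contact for s in history]),
        "name_server_changes": count_changes([s.name_server for s in history]),
        # True if any contact or name server ever matched the known-bad set.
        "has_known_bad": any(
            s.contact in known_bad or s.name_server in known_bad for s in history
        ),
    }
```

The function only produces raw counts. Deciding what counts as changing "relatively often" would need thresholds that the patent application does not give.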

Their inquiry would also view information about name servers as a way to determine if a domain is “legitimate,” such as the length of time of a domain on a name server, or:

  • Whether it hosts a mix of different domains from different registrars and has a history of hosting those domains,
  • Whether the name server hosts mainly pornography or doorway domains or domains with commercial words,
  • Whether it contains primarily bulk domains from a single registrar,
  • Whether the name server is brand new.
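
A name server profile along the lines of the list above might be summarized like this. The `HostedDomain` type and its `suspicious` label are assumptions for the sake of the example; how a domain would come to be labeled as a doorway or bulk domain is outside what this sketch covers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostedDomain:
    """A domain served by the name server being profiled (hypothetical fields)."""
    name: str
    registrar: str
    suspicious: bool  # e.g. labeled a doorway domain by some other process


def name_server_profile(domains: List[HostedDomain]) -> dict:
    """Summarize the mix of domains on one name server (list must be non-empty)."""
    registrar_counts = Counter(d.registrar for d in domains)
    top_registrar_count = registrar_counts.most_common(1)[0][1]
    return {
        "distinct_registrars": len(registrar_counts),
        # A share near 1.0 suggests bulk domains from a single registrar.
        "top_registrar_share": top_registrar_count / len(domains),
        "suspicious_share": sum(d.suspicious for d in domains) / len(domains),
    }
```

A low `top_registrar_share` together with a low `suspicious_share` would correspond to the "mix of different domains from different registrars" case described as a sign of legitimacy.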

We can’t really be certain that Google is presently using this information, but there are some indications that they may be. A Search Engine Watch Forums thread, Does New Google Patent Validate Sandbox Theory?, discusses the topic, and a Search Engine Roundtable post goes into more detail on the use of that information: Google Admits to Improve Search Quality with Registrar Data. Google became a domain name registrar last year, but announced that their intent for doing so had little to do with registering domain names.

Conclusion

Adding reputation information to the Whois database or using it to rank web pages appears at odds with the findings of the GNSO task force that explored the purpose of Whois information.

That doesn’t bode well for the expansion of Whois information as described in the Go Daddy patent applications. What might it mean for a search engine’s use of Whois data to rank web pages? We don’t know for certain if Google has been using Whois information in the manner described in their patent application.

If Google is using Whois information, will this vote from ICANN’s Generic Names Supporting Organization force their use to change? Or will further discussions about how Whois information can be used by registrars be required? It looks like those are areas being explored by ICANN. A call for papers went out from ICANN on April 11th, which asked for input on a number of issues, including the use of Whois data by registrars. Here are the questions it posed about registrars’ use of Whois data:

5. Uses of registry data

Registry data is available to the registry as a consequence of registry operation. Examples of registry data could include information on domain name registrants, information in domain name records, and traffic data associated with providing the DNS resolution services associated with the registry.

5a. Examine whether or not there should be a policy regarding the use of registry data for purposes other than for which it was collected, and if so, what the elements of that policy should be.

[insert comments]

5b. Determine whether any policy is necessary to ensure non-discriminatory access to registry data that is made available to third parties.

[insert comments]

The papers were due by May 5th. Will they lead to a decision that’s consistent with the newly redefined purpose of Whois information?

By Bill Slawski, Internet Consultant

Comments

Ram Mohan  –  May 15, 2006 6:40 AM

Even as companies fight to get patents on Whois data, those in the trade realize that Whois data is at least outdated, charitably obsolete, and at most completely false.

There are not many known ways to economically and accurately validate addresses worldwide.

Chris McElroy  –  May 24, 2006 5:20 PM

I have to agree with Ram. Using whois, as it is now, to establish reputation, historical data, or anything else seems moot.

The idea that a domain name registered for 2-5 years means it is more valid than one that is registered for one year as in the Google patent applications isn’t based on sound reasoning.

The idea that reputation can be established using whois data is equally unreasonable. Anything that CAN be manipulated WILL be manipulated.

I would love to see a better system, but I am just not sure it is going to come from using whois data.

Bill Slawski  –  May 24, 2006 6:38 PM

I’m not sure that the accuracy of that whois information is going to be improving any time soon.  The idea of adding more information to it, where there has historically been a problem with accuracy, seems to be a less than ideal situation.

I was impressed with some of the ideas in the Go Daddy patent applications, but even those show some of the friction involved in making the system they envision work.  One statement they make is that people would be willing to opt in to sharing reputational information associated with their whois information, and would pay for the opportunity to do so.  I don’t see a benefit there worth paying for.  I think that occurred to them too.  The patent applications then go on to note that registrars could be the ones collecting information, regardless of the desires of the registrants.

I also question the assumption that Google makes in their patent application that the length of time of the registration of a domain is an indication that the site may have been created primarily as one used to create web spam.  Sure, people who might be registering a domain solely to do something risky on it that might get them banned from search engines are unlikely to register for longer than a year.  But so are many other people who don’t hold those types of ambitions.

Still, looking at these patent applications, and the ideas associated with them, has made me sit back and think a little about what whois information might be like five or ten years from now.

Chris McElroy  –  May 24, 2006 6:45 PM

Another question is how using whois information is going to work with whois privacy becoming an issue. Is it only going to be private to the public but accessible to Google and GoDaddy, etc.?

Ram Mohan  –  May 24, 2006 7:14 PM

The Whois of the future will likely only provide “taken” or “available” answers.  The domain name business is the only one I know of where each registrar (distributor) is required to publish their customer list for anyone’s usage.

Tiered access will probably provide strong access to law enforcement and also offer good privacy coverage; even law enforcement says that at best Whois only provides a “clue” as to who the owner may be.

Rob Larkins  –  Jun 1, 2006 1:45 AM

The internet has a tradition of being a free and open society. As such I’d like to see the WHOIS database remain open to the public in some form, but I agree that something needs to be done to discourage spam.

But as for the patents… what’s the benefit to adding this extra data to the WHOIS database and not some other database? Some of this information seems better suited to be collected by Alexa.com than, uh, whoever it is that collects the WHOIS data.

Bill Slawski  –  Jun 1, 2006 3:06 AM

“The internet has a tradition of being a free and open society. As such I’d like to see the WHOIS database remain open to the public in some form, but I agree that something needs to be done to discourage spam.”

My independent research in law school was on property rights, and the treaty governing the use of Antarctica.  Almost everyone else I knew was writing about the Business Judgment Rule, and Directors’ discretion.  In those pre-WWW days, I wasn’t sure if there was really a use for the type of research that I conducted. 

Interestingly, the tradition of “freedom and openness” on the web is similar in a number of ways to the international cooperation in Antarctica in spite of a number of claims of ownership by different nations, with an equal number of different theories of property law to back those claims.

“But as for the patents… what’s the benefit to adding this extra data to the WHOIS database and not some other database? Some of this information seems better suited to be collected by Alexa.com than, uh, whoever it is that collects the WHOIS data.”

There is a property relationship between registrars and registrants.  There’s a necessary collection of billing information, and an obligation to allow the registrants to use the name for the period of time they paid for.  There are some legally enforceable limitations on both sides of that relationship. The people who are in the best position to collect reliable information are the ones who collect money for the use of a domain name.  That doesn’t necessarily mean that it happens, but they are the ones in the best position.

There isn’t a similar legal relationship between the people who register domain names, and services like Alexa, or like Google and other search engines.

I don’t think that it would be likely that the additional information that the Go Daddy patent applications describe will become part of whois information.  But the registrars might be the actor involved that would have the best chance of collecting accurate information, because of the legally recognized relationship that exists between registrars and registrants.

Bill Slawski  –  Jun 1, 2006 3:17 AM

“The Whois of the future will likely only provide ‘taken’ or ‘available’ answers.  The domain name business is the only one I know of where each registrar (distributor) is required to publish their customer list for anyone’s usage.”

The client information probably isn’t necessary.

The closest I can think of to domain name registration is the incorporation of businesses in the different US States, with a sharing of (some) information amongst registered agents of corporations.  I don’t believe that any States require that the name of the customer be shared, without legal service of process, and an opportunity to quash a subpoena.

Ram Mohan  –  Jun 2, 2006 8:09 AM

“The closest I can think of to domain name registration is the incorporation of businesses in the different US States, with a sharing of (some) information amongst registered agents of corporations.”

...and in that case, I believe the information is made public because the government collects this data using public funds and shares it (presumably) in the public interest… and it’s unlikely that you’re going to receive a direct mail piece asking you to switch states!
