Similarity of gTLD Applications: Required Reading for Evaluators

Werner Staub

ICANN's evaluators should look at data published on this site. This sort of data helps dramatically reduce time and expense for the evaluation of new gTLD applications while increasing quality. The principle is simple: find the similarities between the 1930 applications.

It is a proof-of-concept project by Arnoldo Mueller-Molina, a young Costa Rican researcher with a doctorate in computational analytics. The analysis is work in progress, but what it shows is striking: identical answers are ubiquitous, even for the most TLD-specific questions.

For many of the key questions, the number of unique answers is below 600 (compared to 1930 applications). For some questions the nominal number of unique answers is higher, but mind you, this is just the first iteration. As the algorithm is refined and adapted, more patterns will be found.

Similarity comparisons make sense for questions 15 through 50. For questions 14a-14e, similarity comparisons make sense in some cases. ICANN does not publish the technical plan (Q30b-44) or the financial plan (Q45-50b). The site therefore compares the responses to 14a-30a (a total of 26 response fields).

If ICANN ran the similarity engine on questions 30b-50b (another 26 response fields, 15 for the technical plan and 11 for the financial plan), it would be certain to find an overwhelming degree of similarity. It would be appropriate for ICANN to publish anonymized statistics on how many response fields contain exactly the same content after allowing for variations in TLD string and applicant name.

This is all thanks to the Costa Rica ICANN meeting in March this year. Arnoldo was there with a "Newcomer" badge, having come home with a Ph.D. after spending several years in Japan and Germany. I regard it as proof that the ICANN community is a magnet for talent beyond politics and domain investors.

Arnoldo talked to a number of ICANN delegates and learned about the new gTLD program. He specializes in computational analytics, not domain names. His area of specialization is the art of detecting patterns of similarity. That is a good fit for the new gTLD program. With 1930 applications, it is too time-consuming to find similarities by hand. Arnoldo parsed all the 1930 applications and compared them side by side, using the Okapi BM25 algorithm for a start.
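For readers who have not met Okapi BM25, the sketch below shows the standard scoring formula applied to answers treated as bags of words. It is only an illustration: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the "+1" form of the inverse document frequency are common defaults assumed here, not details known about Arnoldo's engine.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a real system would do much more.
    return text.lower().split()

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against `query` with Okapi BM25.

    `documents` must be a non-empty list of non-empty strings.
    """
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n   # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        df.update(set(d))
    q_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)                     # term frequency in this document
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            # The "+1" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Using one applicant's answer as the query and all answers to the same question as the documents gives a ranking of which other answers resemble it most; answers that share no words with the query score zero.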

For each question or sub-question, Arnoldo's engine identifies groups (or clusters) of similar answers. For each group, the algorithm identifies the "model of the group": the answer in the group that appears to be "more similar than others" within the group. That sounds like the Orwellian "more equal than others" but does actually make sense. Arnoldo's site offers side-by-side comparison of each element in a group to the model of the group.
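One plausible reading of "more similar than others" is what statisticians call a medoid: the member whose total similarity to all other members is highest. The sketch below takes that reading as an assumption. The Jaccard word-overlap measure is only a stand-in for whatever similarity the engine really uses, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two answers, from 0.0 to 1.0 (stand-in measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def group_model(answers, similarity=jaccard):
    """Return the index of the answer most similar to the rest of its group."""
    best_index, best_total = 0, float("-inf")
    for i, candidate in enumerate(answers):
        # Sum the similarity of this candidate to every other group member.
        total = sum(similarity(candidate, other)
                    for j, other in enumerate(answers) if j != i)
        if total > best_total:
            best_index, best_total = i, total
    return best_index
```

With the model chosen this way, every other answer in the group can be shown side by side with a single reference text, which is the comparison the site offers.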

Arnoldo is using his own time and resources to provide this service to the community. A lot more could be done with funding. ICANN could save millions of dollars for itself and others by spending a modest amount on services from experts like Arnoldo. It would get some cultural help too: Arnoldo speaks Spanish, English and Japanese, and he is perfectly at ease with CJK domains.

(Disclaimer: I have no relationship with Arnoldo or simMachines, his start-up company. I am just impressed by his work and I know for a fact that it is sorely needed. My only involvement is having responded to Arnoldo's questions, as did others who met him in San José.)

By Werner Staub
