Clustering in brand protection, that is, the ability to flexibly identify links between disparate findings[1] from a brand monitoring solution, is one of the great unsolved problems in the industry[2].
Clustering has several key benefits. It identifies high-volume or serial infringers, who can serve as priority targets for enforcement and whose pattern of activity helps to demonstrate ‘bad faith’. It offers the potential for efficient bulk takedowns of groups of associated results in a single action. It also supports building a full profile of the activity associated with a particular entity through an OSINT (open-source intelligence) style of investigative approach[3].
In general, several characteristics of any finding from a brand monitoring programme can serve as a basis for clustering, some of which depend on the channel or type of content. Domain names are one of the ‘richest’ sources of such data points, many of which can be determined through standard look-ups. These include features of the whois record[4], such as registrant (owner) and registrar contact details; hosting information (e.g. host IP address and hosting service provider); characteristics of the domain name itself, such as name patterns[5] and TLD[6]; the providers of any MX (mail exchange) records, which allow e-mail functionality; the providers of any SSL (secure sockets layer) certificates, i.e. the authentication feature allowing the domain to use an https URL; and features of any associated website. Many of these characteristics are also relevant to other types of general Internet content, and other features may apply to content from other channels (such as seller names in e-commerce marketplace listings).
These features can also serve as the basis for quantifying the level of potential threat posed by an identified result. This can be a key step in prioritising the results (which may, in general, comprise a large dataset), so as to identify the priority targets for further analysis, enforcement or content tracking[7, 8].
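As a rough illustration of how such features might feed a prioritisation step, the sketch below computes a simple weighted score per finding and sorts the results by it. The feature names and weights are hypothetical assumptions chosen for the example, not values taken from any published scoring framework; a real scheme would need to be calibrated against analyst experience.

```python
# Hypothetical feature weights (assumptions for illustration only).
WEIGHTS = {
    "brand_in_domain": 3.0,      # brand term appears in the domain name
    "active_mx_record": 2.0,     # domain is configured to handle e-mail
    "has_ssl_certificate": 1.0,  # site can be served over https
    "live_website": 2.0,         # domain resolves to active content
}


def threat_score(finding, weights=WEIGHTS):
    """Sum the weights of every feature flagged as true on the finding."""
    return sum(w for name, w in weights.items() if finding.get(name))


def prioritise(findings, weights=WEIGHTS):
    """Return the findings ordered from highest to lowest score."""
    return sorted(findings, key=lambda f: threat_score(f, weights), reverse=True)


findings = [
    {"domain": "parked.example"},
    {"domain": "brand-fans.example", "brand_in_domain": True,
     "live_website": True},
    {"domain": "brand-login.example", "brand_in_domain": True,
     "active_mx_record": True, "has_ssl_certificate": True,
     "live_website": True},
]

ranked = [f["domain"] for f in prioritise(findings)]
print(ranked)
```

A flat weighted sum is the simplest possible choice here; it makes the contribution of each feature easy to explain to an analyst, at the cost of ignoring interactions between features.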
The simplest type of clustering analysis, and one which is still the only offering from many brand-protection service providers, is based on a single common characteristic of a specific type (i.e. associated with a single ‘label’, such as the registrant name or host IP address) shared across the set of results in question. For instance, if a group of sites all have the same registrant name, those sites can be taken to be connected to each other (provided that the registrant name is suitably distinctive). This very simple approach achieves nothing more than can be done through manual analysis (essentially, carrying out a series of ‘reverse look-ups’) and, while it can have value, that value is often limited.
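Single-field clustering of this kind amounts to grouping records by one column. The following minimal sketch assumes each finding is a dictionary with a `domain` key and a `registrant` key (both names are illustrative, as is the sample data), and treats values as matching after trimming whitespace and lowercasing.

```python
from collections import defaultdict


def cluster_by_field(findings, field, min_size=2):
    """Group findings that share the same normalised value of one field."""
    groups = defaultdict(list)
    for finding in findings:
        value = finding.get(field)
        if not value:
            continue  # skip missing or redacted values
        groups[value.strip().lower()].append(finding["domain"])
    # Keep only values shared by at least `min_size` findings.
    return {v: d for v, d in groups.items() if len(d) >= min_size}


findings = [
    {"domain": "brand-shop.example", "registrant": "Acme Holdings Ltd"},
    {"domain": "brand-outlet.example", "registrant": "acme holdings ltd "},
    {"domain": "unrelated.example", "registrant": "Someone Else"},
    {"domain": "private.example", "registrant": None},
]

clusters = cluster_by_field(findings, "registrant")
print(clusters)
```

Note that the sketch does nothing to check whether a value is ‘suitably distinctive’; a generic value such as a privacy-service name would need to be excluded before the groups could be trusted.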
Clustering becomes more insightful and useful if links can be drawn on the basis of identical (or similar) characteristics appearing in different fields (or labels) of the database of information associated with the set of ‘candidate’ findings. For example, if a particular e-mail address appears in the whois record of some domains, but in the website content of a series of others, both groups of findings can reasonably be assumed to be associated with each other. However, these insights are generally much harder to obtain, essentially because it is not known in advance where the commonalities may appear. The situation becomes even more complex if links must be followed in order to find the common features, e.g. crawling from a marketplace listing to the associated seller information page to identify company names, addresses, telephone numbers, etc. These are the kinds of cases where artificial intelligence (AI) tools can potentially begin to provide value.
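The cross-field case can be framed, in its simplest form, as a connected-components problem: two findings are linked if the same value appears anywhere in either record, whatever the field. The sketch below uses a plain union-find structure rather than any AI technique, and assumes exact matching after lowercasing; the field names (`whois_emails`, `page_emails`) and sample records are hypothetical.

```python
from collections import defaultdict


def cluster_across_fields(findings, fields):
    """Link findings that share any value across any of the given fields."""
    parent = {f["id"]: f["id"] for f in findings}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # normalised value -> first finding that contained it
    for f in findings:
        for field in fields:
            for value in f.get(field, []):
                key = value.strip().lower()
                if key in first_seen:
                    union(first_seen[key], f["id"])
                else:
                    first_seen[key] = f["id"]

    clusters = defaultdict(set)
    for f in findings:
        clusters[find(f["id"])].add(f["id"])
    return sorted(sorted(c) for c in clusters.values())


findings = [
    {"id": "a.example", "whois_emails": ["ops@mailbox.example"], "page_emails": []},
    {"id": "b.example", "whois_emails": [], "page_emails": ["OPS@mailbox.example"]},
    {"id": "c.example", "whois_emails": ["other@else.example"],
     "page_emails": ["ops@mailbox.example"]},
    {"id": "d.example", "whois_emails": ["solo@alone.example"], "page_emails": []},
]

result = cluster_across_fields(findings, ["whois_emails", "page_emails"])
print(result)
```

Because values are pooled regardless of the field they came from, the e-mail address found in one domain's whois record and in another's page content joins the two into one cluster. Similar (rather than identical) values, and values that must be discovered by following links, are outside the scope of this sketch.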
Beyond even the initial complexity described above in constructing an effective clustering tool, there are a number of additional points to consider:
The construction of a truly effective clustering tool able to take account of all the factors discussed in this article is likely to be an extremely difficult problem to solve. However, appropriate application of AI capabilities may be able to provide a stepwise approach towards addressing the issue.
The benefits of successfully doing so will be enormous, potentially building insights and efficiencies into the processes of brand protection monitoring, analysis and enforcement which are essentially not available through any ‘classic’ approaches. Any service provider able to put a compelling solution of this nature in place in the short to medium term may find itself a long way ahead of its mainstream competitors, particularly if the solution also offers other attractive AI or machine-learning features. Examples include automatic ‘tuning’ of search parameters to identify and categorise the most significant results, the ability to be ‘trained’ on analyst feedback about the quality of the outputs, and semi-automated production and sending of enforcement notices.