Figure from the PREDATOR research showing samples of newly registered spammer domains and their blacklisting delays; some delays run as long as several weeks or even several months.
A team of researchers from Princeton University and the University of California has developed a machine-learning algorithm named PREDATOR that can accurately establish domain reputation at the time of domain registration.
PREDATOR, which stands for 'Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration,' is based on the notion that criminals need to obtain many domains to ensure profitability and attack agility, which leads to abnormal registration behaviors such as burst registrations and textually similar names. "The intuition has always been that the way that malicious actors use online resources somehow differs fundamentally from the way legitimate actors use them," says Princeton University computer science professor Nick Feamster, one of the researchers who worked on the project. "We were looking for those signals: what is it about a domain name that makes it automatically identifiable as a bad domain name?"
The research team, which in addition to Feamster included recently graduated Ph.D. student Shuang Hao, Alex Kantchelian and Vern Paxson from the University of California-Berkeley, and Brad Miller from Google, presented their paper at the 2016 ACM Conference on Computer and Communications Security on Oct. 27. They explained:
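To make the two signals named above more concrete, here is a minimal Python sketch of how burst registrations and textually similar names might be measured from a registration log. It is an illustration only and not PREDATOR's actual feature set or code: the function `registration_features`, the five-minute window, the record layout, and the sample domains in the usage note are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone
from difflib import SequenceMatcher


def registration_features(registrations, window_minutes=5):
    """Compute two illustrative time-of-registration signals per domain.

    `registrations` is a list of (domain, registrar, timestamp) tuples.
    This record layout and the window size are assumptions for this sketch.
    """
    window_seconds = window_minutes * 60

    def bucket(ts):
        # Fixed time window that a registration timestamp falls into.
        return int(ts.timestamp() // window_seconds)

    # Burst signal: how many domains the same registrar saw in the same window.
    bursts = Counter((registrar, bucket(ts)) for _, registrar, ts in registrations)

    # Compare only the leftmost label, so "cheap-pills-01.com" -> "cheap-pills-01".
    labels = {domain: domain.split(".")[0] for domain, _, _ in registrations}

    features = {}
    for domain, registrar, ts in registrations:
        # Similarity signal: closest match among the other names in the batch.
        others = [label for d, label in labels.items() if d != domain]
        similarity = max(
            (SequenceMatcher(None, labels[domain], other).ratio() for other in others),
            default=0.0,
        )
        features[domain] = {
            "burst_size": bursts[(registrar, bucket(ts))],
            "max_name_similarity": similarity,
        }
    return features
```

Given a hypothetical batch in which one registrar records `cheap-pills-01.com`, `cheap-pills-02.com`, and `cheap-pills-03.com` within the same few minutes, each of those names would receive a burst size of 3 and a name similarity close to 1, while an unrelated registration such as `gardenclub.net` would score low on both. A real system would feed many such signals into a trained classifier rather than read them off directly.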
"Miscreants register thousands of new domains every day to launch Internet-scale attacks, such as spam, phishing, and drive-by downloads. Quickly and accurately determining a domain's reputation (association with malicious activity) provides a powerful tool for mitigating threats and protecting users. Yet, existing domain reputation systems work by observing domain use (e.g., lookup patterns, content hosted) — often too late to prevent miscreants from reaping benefits of the attacks that they launch. ... Our results show that PREDATOR can provide more accurate and earlier detection compared to existing blacklists, and significantly reduce the number of suspicious domains requiring more resource-intensive or time-consuming inspection."
PREDATOR was evaluated using registration logs of second-level .com and .net domains over five months and was found to achieve a 70% detection rate with a false positive rate of 0.35%. Although the performance is not perfect, the group believes their system "enables prioritizing domains for subsequent, detailed analysis, and to find more malicious pages given a fixed amount of resources (e.g., via URL crawlers or human-involved identification)."
As valuable as research studies such as PREDATOR are, Feamster has expressed concern that the upcoming Federal Communications Commission (FCC) privacy rules may have unintended consequences for the research community. "Although the forthcoming rulemaking targets the collection, use, and sharing of customer data with 'third parties,' an important, and oft-forgotten, facet of this discussion is that (1) ISPs rely on the collection, use, and sharing of CPNI to operate and secure their networks and (2) network researchers (myself included) rely on this data to conduct our research." See Feamster's post published earlier today.
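What those two rates mean for triage depends on how many new domains are malicious to begin with, a figure this article does not give. The short Python sketch below works through the arithmetic under stated assumptions: the 70% detection rate and 0.35% false positive rate come from the evaluation above, while the function name `triage_estimate`, the domain volume, and the malicious fraction are hypothetical inputs chosen by the caller.

```python
def triage_estimate(total_domains, malicious_fraction,
                    detection_rate=0.70, false_positive_rate=0.0035):
    """Estimate how many domains a classifier would flag for closer inspection.

    The default rates are the ones reported for PREDATOR; `total_domains` and
    `malicious_fraction` are hypothetical inputs, not measured values.
    """
    malicious = total_domains * malicious_fraction
    benign = total_domains - malicious

    true_positives = malicious * detection_rate       # malicious domains caught
    false_positives = benign * false_positive_rate    # benign domains wrongly flagged
    flagged = true_positives + false_positives

    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of flagged domains that are actually malicious.
        "precision": true_positives / flagged if flagged else 0.0,
        # Share of all registrations that need follow-up inspection.
        "inspection_share": flagged / total_domains if total_domains else 0.0,
    }
```

Calling `triage_estimate(100_000, 0.05)`, for example, models a purely hypothetical batch of 100,000 registrations of which 5% are malicious. The point of the exercise is that a low false positive rate keeps the flagged set small relative to the full registration stream, which is what makes the follow-up inspection the researchers describe affordable.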