There's a new sheriff in town and he's riding the horse of "predictive policing". Back in July the Santa Cruz Police Department began deploying police officers to places where crime is likely to occur in the future. The Department is using new predictive modeling programs that provide daily forecasts of crime hotspots, allowing it to preempt more serious crimes. You can find a story describing how Santa Cruz is sending in the police before there's a crime in The New York Times.
In essence, this is another physical-world application of machine learning and clustering technologies — applied to preempting a criminal problem. In the cyber-world we've been applying these techniques for a number of years with great success. In fact many of the most important advances in dealing with cybercrime revolve around the replacement of legacy IP reputation systems and domain filtering technologies with dynamic reputation systems — systems easily capable of scaling with both the threat and an ever-expanding Internet (e.g. IPv6).
Just last week Manos Antonakakis (a principal scientist here at Damballa Labs) presented at the USENIX Security 2011 conference in San Francisco about a new generation of technology capable of identifying domain names being used for malicious purposes weeks, if not months, in advance of malware samples being intercepted, analyzed and "protected" against by legacy anti-virus approaches.
The patent-pending technology utilizes passive DNS observations within the upper DNS hierarchy. The first generation of research (and cybercrime proof-points) is described in the paper "Detecting Malware Domains at the Upper DNS Hierarchy" [PDF]. The system running here within Damballa Labs is affectionately known as "Kopis" and has proved its worth time and again by preemptively identifying new botnets and cybercrime campaigns, keeping our Threat Analyst team busy enumerating the real-world criminals behind the domain abuse.
The Kopis system extends many of the principles and research we learnt and formulated when developing the Notos technology [PDF] — a next generation dynamic reputation system for DNS.
In several ways the Santa Cruz Police Department's modeling system approximates an early generation of such a dynamic reputation system: it combines long-term observations and historical information with real-time crime updates to produce a forecast of daily crime hotspots.
Damballa Labs utilizes Notos and its derivatives in a number of ways. For example, we're able to take any observed DNS record (e.g. domain name and resolved IP address) and provide a real-time score of its reputation, even if this is the first time anyone on the Internet has ever tried to resolve that particular domain name. In practice this means that we can predict (with a scale of confidence) whether a device reached through that particular domain name (or IP) is malicious or benign, and the nature of the threat it represents. All of this is done through passive means, without ever having directly observed malicious behavior from the device in the past.
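To make the idea of scoring a never-before-seen domain more concrete, here is a deliberately tiny Python sketch. It is not how Notos works; the real system is described in the linked paper. The class name `ToyReputation`, the smoothing, and the confidence formula are all assumptions made up for illustration. The only point it demonstrates is that a brand-new domain can inherit a reputation from the passively observed history of the IP address it resolves to.

```python
from collections import defaultdict


class ToyReputation:
    """Hypothetical toy model: scores a domain purely from the labelled
    history of the IP it resolves to. Illustrative only, not Notos."""

    def __init__(self):
        # ip -> [malicious_observations, benign_observations]
        self.ip_history = defaultdict(lambda: [0, 0])

    def observe(self, ip, malicious):
        """Record one passively observed, labelled resolution to `ip`."""
        self.ip_history[ip][0 if malicious else 1] += 1

    def score(self, domain, ip):
        """Return (p_malicious, confidence) for a possibly unseen domain.

        `domain` is unused on purpose: the toy relies only on the IP's
        history, which is why it can score a first-time lookup.
        """
        bad, good = self.ip_history.get(ip, (0, 0))
        total = bad + good
        p_malicious = (bad + 1) / (total + 2)   # Laplace-smoothed estimate
        confidence = total / (total + 5)        # assumed: grows with evidence
        return p_malicious, confidence


rep = ToyReputation()
for _ in range(3):
    rep.observe("203.0.113.7", malicious=True)

# A domain nobody has resolved before, pointing at a known-bad IP.
print(rep.score("never-seen-before.example", "203.0.113.7"))
# An IP with no history yields a neutral score and zero confidence.
print(rep.score("other.example", "198.51.100.1"))
```

A production system would draw on far more evidence than a single IP counter; the sketch only shows why no prior observation of the domain itself is required.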
Systems like Notos make use of big data (i.e. colossal volumes of historical and streaming data) gathered from a global array of sensors. The mix of historical observations and real-time data feeds means that prediction models can be dynamic enough to keep pace with truly agile threats (and threat operators), and can yield new approaches to unveiling advanced and sophisticated threats. For example, a possible query could be "provide me a list of domain names that are pointing to residential DSL IP addresses within Villainstan, that have never been looked up by any hosts within the country of Villainstan, that have only been looked up by hosts located within Fortune-100 companies in the USA, and that have been looked up by fewer than 5 such Fortune-100 companies over the last 12 months." The result of the query would be a (long) list of domain names that are very strong candidates for involvement in APT activity against those companies, which then drives specialist counter-intelligence analysts and law enforcement to uncover the nature of the threat.
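The example query above can be sketched as a filter over passive DNS lookup records. Everything in this Python sketch is hypothetical: the `Lookup` record layout, the `apt_candidates` function, the `"VS"` country code standing in for the fictional country, and the placeholder company names. It does not reflect Damballa's actual schema or query language; it only spells out the four conditions of the query.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lookup:
    """One passively observed DNS lookup (hypothetical record layout)."""
    domain: str
    ip_country: str      # country of the resolved IP
    ip_type: str         # e.g. "residential_dsl", "datacenter"
    client_country: str  # country of the host that asked
    client_org: str      # organization of the host that asked
    days_ago: int        # age of the observation


def apt_candidates(lookups, fortune_100, target="VS", home="US",
                   max_orgs=5, window_days=365):
    """Return domains matching all four conditions of the example query."""
    by_domain = defaultdict(list)
    for rec in lookups:
        if rec.days_ago <= window_days:
            by_domain[rec.domain].append(rec)

    hits = []
    for domain, recs in by_domain.items():
        # 1. Points to residential DSL space inside the target country.
        on_target_dsl = all(r.ip_country == target and
                            r.ip_type == "residential_dsl" for r in recs)
        # 2. Never looked up from inside the target country.
        never_local = all(r.client_country != target for r in recs)
        # 3. Only looked up by hosts inside Fortune-100 companies at home.
        only_f100 = all(r.client_country == home and
                        r.client_org in fortune_100 for r in recs)
        # 4. Fewer than `max_orgs` distinct companies did so.
        few_orgs = len({r.client_org for r in recs}) < max_orgs
        if on_target_dsl and never_local and only_f100 and few_orgs:
            hits.append(domain)
    return sorted(hits)


F100 = {"CorpA", "CorpB", "CorpC", "CorpD", "CorpE"}  # placeholder names
sample = [
    Lookup("a.example", "VS", "residential_dsl", "US", "CorpA", 10),
    Lookup("a.example", "VS", "residential_dsl", "US", "CorpB", 40),
    Lookup("b.example", "VS", "residential_dsl", "VS", "LocalISP", 5),
]
print(apt_candidates(sample, F100))
```

At real data volumes this would be a query against a large data store rather than an in-memory loop; the sketch is only meant to make the query's logic explicit.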
In the meantime I'll be watching with keen interest the successes of the Santa Cruz Police Department and their new modeling programs. Here at Damballa we've had phenomenal success in using machine learning and advanced clustering techniques in unveiling and forecasting new threats.