Home / News I have a News Tip

Large Open-Source Data Set Released to Help Train Algorithms Spot Malware

For the first time, a large dataset has been released by a security firm to help AI research and training of machine learning models that statically detect malware. The data set released by cybersecurity firm Endgame is called EMBER is a collection of more than a million representations of benign and malicious Windows-portable executable files. Hyrum Anderson, Endgame's technical director of data science who worked on EMBER, says: "This dataset fills a void in the information security machine learning community: a benign/malicious dataset that is large, open and general enough to cover several interesting use cases. ... [We] hope that the dataset, code and baseline model provided by EMBER will help invigorate machine learning research for malware detection, in much the same way that benchmark datasets have advanced computer vision research."

The liability involved with the availability of such open data sets is something researchers involved with EMBER say they have thought through and that the hope is openness will outweigh the risks.

Follow CircleID on
Related topics: Cybersecurity, Malware
SHARE THIS POST

If you are pressed for time ...

... this is for you. More and more professionals are choosing to publish critical posts on CircleID from all corners of the Internet industry. If you find it hard to keep up daily, consider subscribing to our weekly digest. We will provide you a convenient summary report once a week sent directly to your inbox. It's a quick and easy read.

I make a point of reading CircleID. There is no getting around the utility of knowing what thoughtful people are thinking and saying about our industry.

Vinton Cerf, Co-designer of the TCP/IP Protocols & the Architecture of the Internet

Share your comments

To post comments, please login or create an account.

Related

Topics

Cybersecurity

Sponsored byVerisign

Domain Names

Sponsored byVerisign

DNS Security

Sponsored byAfilias

New TLDs

Sponsored byAfilias

IP Addressing

Sponsored byAvenue4 LLC