Home / News

Large Open-Source Data Set Released to Help Train Algorithms Spot Malware

For the first time, a large dataset has been released by a security firm to help AI research and training of machine learning models that statically detect malware. The data set released by cybersecurity firm Endgame is called EMBER is a collection of more than a million representations of benign and malicious Windows-portable executable files. Hyrum Anderson, Endgame’s technical director of data science who worked on EMBER, says: “This dataset fills a void in the information security machine learning community: a benign/malicious dataset that is large, open and general enough to cover several interesting use cases. ... [We] hope that the dataset, code and baseline model provided by EMBER will help invigorate machine learning research for malware detection, in much the same way that benchmark datasets have advanced computer vision research.”

The liability involved with the availability of such open data sets is something researchers involved with EMBER say they have thought through and that the hope is openness will outweigh the risks.

By CircleID Reporter

CircleID’s internal staff reporting on news tips and developing stories. Do you have information the professional Internet community should be aware of? Contact us.

Visit Page

Filed Under

Comments

Comment Title:

  Notify me of follow-up comments

We encourage you to post comments and engage in discussions that advance this post through relevant opinion, anecdotes, links and data. If you see a comment that you believe is irrelevant or inappropriate, you can report it using the link at the end of each comment. Views expressed in the comments do not represent those of CircleID. For more information on our comment policy, see Codes of Conduct.

CircleID Newsletter The Weekly Wrap

More and more professionals are choosing to publish critical posts on CircleID from all corners of the Internet industry. If you find it hard to keep up daily, consider subscribing to our weekly digest. We will provide you a convenient summary report once a week sent directly to your inbox. It's a quick and easy read.

I make a point of reading CircleID. There is no getting around the utility of knowing what thoughtful people are thinking and saying about our industry.

VINTON CERF
Co-designer of the TCP/IP Protocols & the Architecture of the Internet

Related

Topics

Cybersecurity

Sponsored byVerisign

DNS

Sponsored byDNIB.com

Domain Names

Sponsored byVerisign

New TLDs

Sponsored byRadix

Brand Protection

Sponsored byCSC

Threat Intelligence

Sponsored byWhoisXML API

IPv4 Markets

Sponsored byIPv4.Global