GDPR PII Time-Bomb? Kill it With Fire!

Neil Schwartzman

Hi! My name is spamfighter. I investigate spam and phish in a post-GDPR dystopia. Recently, I invented Fire, to save you millions of €uros.

One day, my Boss suggested I automate some of my processes. I, for one, welcome our Robot Overlords (and a happy boss), but I can be exacting about the tools I use. Perhaps not to the degree of the infamous Van Halen 'no brown M&M's' contractual clause, but I have no patience for poorly-designed software, and truly dislike typing when a click will do, if you get my drift (save your breath, Poindexter: I lived life on the CLI when the great VMS-UNIX wars raged, then discovered Clarus the Dogcow, and have never looked back).

I was apprehensive, but 200 open BBEdit documents suggested I might need help. I asked my friend Adam, a cargo cult programmer by trade, to watch my flow. "AH-hah!" he exclaimed when he spotted my most obvious pain point: VirusTotal. My long-outdated query tool demanded a replacement.

There was nothing available off-the-Git-shelf, you see, since most VT Tools focus on Malware aspects of the service. That's not where I live, though. So we built a tool for me.

VirusTotal Batcher (VTB) is tuned for optimum speed when handling the passive DNS part of VT (IPs and Domains) and URL score queries. It also handles URL submission (force-scoring), a function missing from most other tools. Oh, and One Last Thing: VTB can display the live resolution IP of a host alongside the historical passive DNS info. Simple, but massively important in some aspects of my work. Boom.
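For the curious, the live-versus-historical comparison can be sketched in a few lines of Python. This is not VTB's actual code: the function names `live_ips` and `compare` are made up for illustration, the passive DNS history is assumed to have been fetched already, and the lookup relies on the standard library's `socket.getaddrinfo`.

```python
import socket
from typing import Callable, Dict, List


def live_ips(host: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IPs that `host` resolves to right now."""
    try:
        infos = resolver(host, None)
    except socket.gaierror:
        return []  # the host does not resolve at the moment
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def compare(host: str, historical: List[str],
            resolver: Callable = socket.getaddrinfo) -> Dict[str, object]:
    """Put the live answer next to an already-fetched passive DNS history."""
    current = live_ips(host, resolver)
    return {
        "host": host,
        "live": current,
        "historical": historical,
        # Live IPs that never appeared in the history are often the interesting ones.
        "new_ips": [ip for ip in current if ip not in historical],
    }
```

The resolver is passed in as a parameter so the comparison can be exercised without touching the network; in normal use the default does a real lookup.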

Adam saw me fiddling with some data one day. The sad truth is he had become aware of my hidden shame. You see, my jam tends to be an unholy amalgam of compromised sites, malicious domains, third-level TLDs, sub-domain providers' hostnames, and Short URLs. The task, one I did often, was the laborious, boring and error-prone work of reducing these to the smallest element needed for reports to various service providers for remediation.

Say hello to my little friend dMunge. dMunge brings all the domains to the yard: it has the Alexa Top 1 Million memorized cold, a user-defined Whitelist to prevent false positives, and a Service Provider list to highlight those if you wish. dMunge refines millions of entries to the base unit and format needed for abuse reporting, at supersonic speed.
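Here is a rough sketch of that kind of reduction in Python. It is not dMunge's code. The three sets are tiny hypothetical stand-ins for the real lists (a real tool would load something like the Public Suffix List, a full whitelist, and a proper provider list), and `reduce_entry` and `reduce_all` are names invented here.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical sample data, for illustration only.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au"}       # suffixes that span two labels
WHITELIST = {"google.com"}                       # never report these
SERVICE_PROVIDERS = {"blogspot.com", "bit.ly"}   # report the full hostname instead


def hostname_of(entry: str) -> str:
    """Pull a lowercase hostname out of a URL or a bare host."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "//" + entry  # lets urlsplit treat a bare host as a network location
    try:
        return (urlsplit(entry).hostname or "").rstrip(".")
    except ValueError:
        return ""


def reduce_entry(entry: str):
    """Return (value, kind) for reporting, or None if the entry should be dropped."""
    host = hostname_of(entry)
    if not host:
        return None
    try:
        ipaddress.ip_address(host)
        return (host, "ip")  # IP literals pass through untouched
    except ValueError:
        pass
    labels = host.split(".")
    if len(labels) < 2:
        return None
    # Base unit: the suffix plus one label to its left.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        base = ".".join(labels[-3:])
    else:
        base = ".".join(labels[-2:])
    if base in WHITELIST:
        return None
    if base in SERVICE_PROVIDERS:
        return (host, "provider")  # the provider needs the exact hostname
    return (base, "domain")


def reduce_all(entries):
    """Reduce many entries, dropping duplicates and whitelisted hosts."""
    results = {reduce_entry(e) for e in entries}
    results.discard(None)
    return sorted(results)
```

With the sample sets above, `http://Shop.Example.co.uk/login` reduces to `example.co.uk`, `evil.blogspot.com/post` stays as the full hostname and is flagged as a provider, and anything under `google.com` is dropped.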

We had entered the test phase when Adam requested a data-set sufficiently large and complex to really put dMunge under stress. I had one, but it contained personal information and thus came with strict confidentiality constraints. I couldn't even show it to him.

I told Adam about the problem: Email Addresses. Did I hear you pshaw? Go ahead and try to get rid of email addresses, at scale ... I'll wait here. ... Back so soon? Yeah, I know! A simple search for @ fails, because the @ appears elsewhere in logs AND not in every email address AND even some file attachments have @ in their naming convention. I began to hate the @.

Happily, the only other requirement I had was a simple one: That the log file maintain absolute data integrity beyond the address change. Easy-peasy. :-|

Email addresses are seemingly simple to eliminate in theory, devilishly difficult in practice, and a potentially expensive mistake under GDPR. Send an unredacted address to the wrong place, and someone in Europe becomes a Euro Millionaire. Whoops.

Our widget Fire eliminates those pesky email addresses and serves up a log-file so clean it will be a hit at your next Article 29 Working Party party.
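To give a flavor of why this is fiddly, here is a minimal regex-based sketch in Python. To be clear, this is not Fire's code. It rests on a few loudly stated assumptions: that an address "without an @" means a URL-encoded `%40`, that a short hypothetical list of file extensions is enough to spot attachment names like `logo@2x.png`, and that a fixed placeholder is an acceptable replacement. Real logs will contain cases this pattern misses.

```python
import re

# Hypothetical list; a real tool would need a much longer one.
FILE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "pdf", "zip"}

EMAIL_RE = re.compile(
    r"""
    (?<![\w.+-])                # do not start in the middle of a longer token
    [A-Za-z0-9][\w.+-]*         # local part (deliberately simplified)
    (?:@|%40)                   # a literal @ or its URL-encoded form (assumption)
    (?:[A-Za-z0-9-]+\.)+        # one or more domain labels
    (?P<tld>[A-Za-z]{2,})       # final label
    (?![\w-])                   # and the token ends here
    """,
    re.VERBOSE,
)


def redact_line(line: str, placeholder: str = "[email-redacted]") -> str:
    """Replace email addresses in one log line, leaving every other byte alone."""
    def _swap(match):
        # Heuristic: 'logo@2x.png' looks like an address but ends in a file extension.
        if match.group("tld").lower() in FILE_EXTENSIONS:
            return match.group(0)
        return placeholder
    return EMAIL_RE.sub(_swap, line)


def redact_lines(lines):
    """Redact an iterable of lines lazily, so large logs need not fit in memory."""
    for line in lines:
        yield redact_line(line)
```

Only the matched span is swapped out, so timestamps, delimiters, and line endings come through unchanged, which is the data-integrity requirement described above. Something like `user@host` with no dot in the domain is left alone, which may or may not be what you want.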

ORDER BEFORE MIDNIGHT TONIGHT
By now, you are surely asking where you can get these miraculous utilities. They needed a home befitting their station in life. Enter the sensory abomination WHOis Apocalypse.

Please, help yourself. They are free (and no, they are not loss-leaders for paid versions) and MIT Licensed.

I learned a lot of things in this process:

Doing stuff like this exercises the creative part of my brain, not unlike the part I used when I was in the music industry.

Great coding is extremely difficult. We saw stuff out there that is so poor it would have to be twice as good to be half-assed. I'm not saying our stuff is perfect, but we tested our stuff. A LOT. Adam wanted to strangle me when the tools failed the tests I devised. A. LOT.

One super-cool thing about tool-making is that not everything has been invented. Adam and I created three things for my benefit and, we hope, for the benefit of others.

Doing stuff like this is fun, and I now have a cunning plan to develop something so epic that epic just resigned.

If you use the tools and like them, or hate them, please give us feedback. Heck, fork them if your vision differs.

Hey, by the way, in his spare time Adam wrote some cool articles on the History of Computer Programming that are popular over at Medium, and I can be found pondering how to use VR to bring peace to the Middle East.

By Neil Schwartzman, Executive Director, The Coalition Against Unsolicited Commercial Email (CAUCE)

Share your comments

Clarifications
The Famous Brett Watson, May 26, 2018 9:45 PM PDT

You got so carried away with the adrenaline rush of creating useful tools that your article was practically written in Geek Code.

So, to be clear, this article talks about three utilities (firewheel, domain munger, and VT batch), written in Python (an important detail when you're inviting other people to contribute). The actual link to the software is the "WHOis Apocalypse" one, and yes, be prepared for a mid-90s visual abomination if you dare follow the link. (Bug report: fix the page title, you goose.)

You are not wrong …
Neil Schwartzman, May 27, 2018 12:03 PM PDT

The utilities are not only mere python, they are Python3. Voila.

As far as the page title goes, I see absolutely nothing wrong with it.

Consistency
The Famous Brett Watson, May 28, 2018 7:54 AM PDT

Armageddon/apocalypse — toMAYto/toMAHto, am I right?
