DIY Threat Intel: Mining Spam For Malware

If you use email, you already have a wonderful resource available to you for doing some quick and dirty threat intelligence work: your spam folder.

Every day, people receive anywhere from dozens to hundreds of spam emails, ranging from plain vanilla unsolicited emails, to unwanted content, to phishing attempts and malware.

There’s a wealth of information to be mined from your spam folder. Right now, we’ll focus on extracting URLs and attachments from your spam emails and automatically analyzing them. Lots of commercial security controls, many costing tens of thousands to hundreds of thousands of dollars per year to operate, perform this sort of operation. I’m going to show you how to do it for free.

To accomplish this, you’ll need a VirusTotal account and API key.

Once you have both, you can download the two Python scripts I wrote for this from GitHub.

Quick How-To

These scripts assume you have access to a folder containing spam email on a system from which you can execute Python code. This could be because you’re running it on a mail server, or because you’ve configured your email client to save your spam as plain text in a folder, each mail in a separate file.

Either way, once you have the emails in a parseable format accessible to the code, read through both scripts and edit them wherever you need to supply correct paths (and, in runvt.py, your VirusTotal API key).
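To make "parseable format" concrete, here is a minimal sketch of reading a folder that holds one raw email per file. It is not the code from process-spam.py; the function name load_spam and the folder layout are assumptions for illustration only.

```python
import email
from email import policy
from pathlib import Path


def load_spam(folder):
    """Yield (path, message) for every regular file in the folder.

    Assumes each file holds exactly one raw email (headers plus body).
    """
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open("rb") as fh:
            # policy.default gives the modern EmailMessage API
            msg = email.message_from_binary_file(fh, policy=policy.default)
        yield path, msg
```

Looping over load_spam("/path/to/spam") then gives you one parsed message per file, where the path is whatever folder you configured.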

Next, create a crontab entry similar to that in the repository’s README.md file, to automatically call process-spam.py and runvt.py on a regular basis.

I’ll admit, it’s not very user friendly. If there’s interest, I’ll polish the code, clean things up, and make usage much simpler; right now I’m just sharing something I wrote for myself that suits my needs.

Once you’ve got things set up, the following will occur:

all emails in your spam folder will be analyzed and deleted periodically
any URLs present will be saved to a file
any attachments will be extracted and saved
attachments will be hashed and the hashes looked up on VirusTotal automatically
if the hash is known, the results will be saved to a file; otherwise, the full file will be submitted to VT, and the results will be retrieved and saved on the next execution of runvt.py (which is why my crontab runs it every 4 hours)
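The extraction and hashing steps above can be sketched roughly as follows. This is an illustration, not the actual process-spam.py or runvt.py. The helper names, the loose URL regex, the choice of SHA-256, and the use of the VirusTotal v3 file-lookup endpoint with an x-apikey header are all assumptions on my part; the real scripts may differ.

```python
import hashlib
import re
import urllib.request
from email import message_from_bytes, policy

# Deliberately loose pattern (an assumption); real spam needs more care.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(msg):
    """Collect URLs from every non-attachment text part of the message."""
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        try:
            text = part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: skip this part
        urls.extend(URL_RE.findall(text))
    return urls


def extract_attachments(msg):
    """Return (filename, raw_bytes, sha256_hex) for each attachment."""
    found = []
    for part in msg.walk():
        if not part.is_attachment():
            continue
        data = part.get_payload(decode=True)  # undoes base64/quoted-printable
        if data:
            digest = hashlib.sha256(data).hexdigest()
            found.append((part.get_filename() or "unnamed", data, digest))
    return found


def vt_lookup_request(sha256_hex, api_key):
    """Build (but do not send) a hash lookup request for the v3 API."""
    return urllib.request.Request(
        "https://www.virustotal.com/api/v3/files/" + sha256_hex,
        headers={"x-apikey": api_key},
    )
```

Parse each mail with message_from_bytes(raw, policy=policy.default) before calling these helpers. Passing the request to urllib.request.urlopen would actually send it; a hash VirusTotal has never seen comes back as an HTTP 404, which urlopen raises as urllib.error.HTTPError, and that is the point at which the full file would be uploaded instead.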

Now, sit back, relax, and enjoy having your incoming malware automatically analyzed for you!

Next steps

I definitely need to clean up the code and make it more usable and easier to deploy.
I want to include a JSON parser to make the results human-readable (currently, I eyeball the JSON or grep the results, since I’m doing this on my personal email; that doesn’t scale).
I want to submit the extracted URLs to VT as well, to easily identify spam and phishing campaigns along with malware.
If/when I ever get around to learning some modern web development frameworks, I may create a nice web-based front-end for displaying current status and results, along with statistics regarding number of mails processed overall, per unit time, and findings.
Last, and perhaps most relevant to this audience, I need some way to indicate which attachments were previously unknown to VirusTotal, so analysts can easily identify those items that they need to write new rules or signatures for.
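As a starting point for the JSON summary and the "previously unknown" flag, here is a small sketch. It assumes the saved results follow the VirusTotal v3 response shape (data.attributes.last_analysis_stats for known files, and an error object with code NotFoundError for unknown hashes). The results my scripts currently save may not look like this, and the function name summarize is hypothetical.

```python
import json


def summarize(sha256_hex, report_text):
    """Turn one saved VirusTotal JSON response into a single readable line.

    Assumes the v3 response shape; adjust the keys if your results differ.
    """
    report = json.loads(report_text)
    error = report.get("error")
    if error is not None:
        if error.get("code") == "NotFoundError":
            # The hash lookup failed, so the sample was new to VirusTotal.
            return f"{sha256_hex}  NEW (not previously known to VirusTotal)"
        return f"{sha256_hex}  lookup failed: {error.get('code')}"
    stats = report["data"]["attributes"]["last_analysis_stats"]
    total = sum(stats.values())
    return f"{sha256_hex}  {stats.get('malicious', 0)}/{total} engines flagged it as malicious"
```

The "new" flag has to be recorded at the first hash lookup: once the file has been uploaded and analyzed, later lookups will report it as known.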

Alternatives

If you prefer, you could set up your own sandbox environment and submit your samples to it instead of VirusTotal.