Super-quick analysis of account credentials (username/password pairs, in various forms) posted to Pastebin over roughly a day:

                       Start time: 20171113 2100UTC
       Credentials parsed to date: 792,488
Clean (unproblematic) credentials: 734,807
         Unique clean credentials: 475,653

Credentials parsed to date: For a while now, I’ve been running a homebrew Pastebin scraper that analyzes new pastes and watches for email addresses. This figure is the number of credentials it had extracted as of Start time.

Clean (unproblematic) credentials: I wrote a somewhat lazy parser that helps me identify patterns in the extracted paste bodies, so I can more reliably grab credentials pasted in a variety of formats. There are still some formats I haven’t worked through yet, so this count excludes those lines and keeps only the credentials I’m confident in.
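
To make the clean/unparsed split concrete, here is a minimal sketch of a pattern-based line parser. It is not my actual parser: the two formats shown (email, then a `:`, `;` or `|` separator, then a secret; or email, whitespace, secret), the `PATTERNS` list, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical email pattern and line formats, for illustration only;
# real pastes come in many more shapes than these two.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PATTERNS = [
    # email, then a ':' / ';' / '|' separator, then a secret
    re.compile(rf"^(?P<user>{EMAIL})\s*[:;|]\s*(?P<secret>\S+)$"),
    # email, whitespace, then a secret
    re.compile(rf"^(?P<user>{EMAIL})\s+(?P<secret>\S+)$"),
]


def parse_line(line):
    """Return (user, secret) if the line matches a known format, else None."""
    text = line.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("user"), match.group("secret")
    return None


def split_clean(lines):
    """Separate confidently parsed credentials from lines that need review."""
    clean, unparsed = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines entirely
        parsed = parse_line(line)
        if parsed is not None:
            clean.append(parsed)
        else:
            unparsed.append(line)
    return clean, unparsed
```

Lines that land in `unparsed` correspond to the ones left out of the clean count; adding a new format would mean appending another entry to `PATTERNS`.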

Unique clean credentials: The number of distinct credentials among the clean credentials parsed from the Pastebin data as of Start time.

Another day’s data

                       Start time: 20171114 2100UTC
       Credentials parsed to date: 806,267
Clean (unproblematic) credentials: 744,126
         Unique clean credentials: 478,642
Analysis
 Potential credentials posted in 24 hours: 13,779
Identified credentials posted in 24 hours:  9,319
    Unique credentials posted in 24 hours:  2,989

In a 24-hour period, I observed 2,989 new unique credentials posted to Pastebin. That is likely an undercount: my current script for extracting credentials from the potential pool isn’t 100% effective, and it skipped 4,460 lines (13,779 potential minus 9,319 identified), some of which may have contained multiple credentials per line.
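
The 24-hour figures are just differences between two consecutive snapshots. A small sketch of that arithmetic, using the counts from the first two snapshots above (the `Snapshot` class and `daily_delta` function are names made up for this example, not part of my scraper):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    start: str   # snapshot label, e.g. "20171113 2100UTC"
    parsed: int  # credentials parsed to date
    clean: int   # clean (unproblematic) credentials
    unique: int  # unique clean credentials


def daily_delta(earlier, later):
    """Subtract cumulative counts to get what was added between snapshots."""
    potential = later.parsed - earlier.parsed
    identified = later.clean - earlier.clean
    return {
        "potential": potential,
        "identified": identified,
        "unique": later.unique - earlier.unique,
        # lines the parser could not confidently handle
        "skipped": potential - identified,
    }


day1 = Snapshot("20171113 2100UTC", 792_488, 734_807, 475_653)
day2 = Snapshot("20171114 2100UTC", 806_267, 744_126, 478_642)

delta = daily_delta(day1, day2)
print(delta)
# {'potential': 13779, 'identified': 9319, 'unique': 2989, 'skipped': 4460}
```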

Yet another day

Checking the following day at the same time:

                       Start time: 20171115 2100UTC
       Credentials parsed to date: 875,568
Clean (unproblematic) credentials: 808,895
         Unique clean credentials: 492,259
Analysis
 Potential credentials posted in 24 hours: 34,650
Identified credentials posted in 24 hours: 32,384 
    Unique credentials posted in 24 hours:  6,808

Summary

I don’t yet have a feel for whether these two data points are representative of the number of credentials posted per day. Given the wide variance between the two days (2,989 versus 6,808 unique credentials), I strongly suspect they aren’t. I intend to automate the processing of the data and collect a few weeks’ worth to obtain a more representative rate of credential posting.
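
Once more days are collected, summarizing the rate could look something like the sketch below. It only uses the two daily unique counts reported above, and the `summarize_daily_counts` helper is a hypothetical name; with two samples the spread mostly shows how little can be concluded so far.

```python
import statistics


def summarize_daily_counts(counts):
    """Return the mean, sample standard deviation, and relative spread."""
    mean = statistics.mean(counts)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(counts) if len(counts) > 1 else 0.0
    return {
        "days": len(counts),
        "mean": mean,
        "stdev": stdev,
        # coefficient of variation: spread relative to the mean
        "cv": stdev / mean if mean else 0.0,
    }


# Unique credentials per day, as reported in the two Analysis blocks.
unique_per_day = [2_989, 6_808]
summary = summarize_daily_counts(unique_per_day)
print(summary)
```

Appending each new day’s count to `unique_per_day` and rerunning gives a running estimate; a coefficient of variation that stays large after a few weeks would suggest the daily rate is genuinely bursty rather than just undersampled.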
