md5 hashing emails - that's dumb, right?
#1
I work with nonprofits who often share their email lists lower cased and hashed with unsalted md5 in order to verify how many people signed a petition, etc.
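
For reference, here is a minimal Python sketch of what I understand is being shared: each address is lowercased and run through plain MD5 with no salt. The function name `email_md5` and the sample address are made up for illustration.

```python
import hashlib

def email_md5(email: str) -> str:
    """Lowercase an address and return its unsalted MD5 hex digest."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()

# Hypothetical address: both spellings collapse to the same digest,
# which is what lets two groups compare lists without sharing plaintext.
print(email_md5("Jane.Doe@Gmail.com"))
print(email_md5("jane.doe@gmail.com"))
```

Because there is no salt or secret, anyone who can guess an address can compute the same digest and check it against the list.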

That sounds like it would be fairly trivial to crack. I haven't been able to figure it out yet, but I'd like to demonstrate it on my own list rather than just say that it can be done.

Two Questions:

1.  Someone who knows what they're doing could unhash someone's email list fairly easily, right?

2.  What is the mask I should be using?  Roughly ten percent of my list has a dot in the name.  If they have a number in the name, it's at the end usually.

The vast majority of emails are from these domains.


@gmail.com
@yahoo.com
@aol.com
@hotmail.com
@comcast.net
@sbcglobal.net
@msn.com
@verizon.net
@earthlink.net
@att.net


I feel like I need to relearn regular expressions in backwards land....
#2
In this context (assuming the donors, beneficiaries, or whoever are meant to be kept confidential), yes, it is dumb to use MD5. The hashes should be trivial to crack.

You wouldn't use a mask. You'd either do a combinator attack or use rules to append domain names.
#3
(02-11-2017, 09:36 PM)epixoip Wrote: In this context (assuming the donors or beneficiaries or whomever are intended to be kept confidential) it is dumb to use MD5, yes. They should be trivial to crack.


Okay, short of me figuring out how to do this and demonstrating it, can someone help me explain why this is a very dumb idea? How trivial would it be to crack a list of a million donor email addresses? They are all lowercased and hashed with MD5, no salt (just the MySQL MD5 function).


What is a better way to do this, keeping in mind that I'm dealing with people who wander up and down the hallway demanding that someone tell them what their password is?
#4
Email addresses (and usernames themselves for that matter) are really no different from very weak passwords: short, predictable, low-entropy, human generated strings. And of course the domain names are also extremely predictable (you'll have some one-off domains here and there, but all the major domains are well-known and publicly available.) The cracking approach is pretty simple: list of domains in one wordlist, regular password wordlists in another wordlist, run combinator attack.
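
To make the combinator idea concrete, here is a toy Python sketch, not how you would do it at scale (hashcat's combinator mode, `-a 1`, does the same thing far faster). It joins every entry from one wordlist with every entry from a domain list, hashes the result, and checks it against the target hashes. The word lists, addresses, and the function names `md5_hex` and `combinator_attack` are all made up for illustration.

```python
import hashlib
from itertools import product

def md5_hex(text: str) -> str:
    """Unsalted MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def combinator_attack(target_hashes, local_parts, domains):
    """Try every local_part + domain pair and return {hash: email} for hits."""
    targets = set(target_hashes)
    recovered = {}
    for local, domain in product(local_parts, domains):
        candidate = f"{local}{domain}"
        digest = md5_hex(candidate)
        if digest in targets:
            recovered[digest] = candidate
    return recovered

# Hypothetical inputs: a tiny "password-style" wordlist and a domain list.
local_parts = ["jsmith", "jane.doe", "mike1975"]
domains = ["@gmail.com", "@yahoo.com", "@aol.com"]

# Hypothetical shared list of hashed addresses.
shared_hashes = [
    md5_hex("jane.doe@gmail.com"),
    md5_hex("mike1975@aol.com"),
    md5_hex("zq8-unlikely@example.org"),  # not covered by the wordlists
]

found = combinator_attack(shared_hashes, local_parts, domains)
for digest, email in found.items():
    print(digest, "->", email)
```

With real wordlists the left side would be millions of names and common passwords, which is why predictable local parts plus a handful of big domains fall so quickly.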

I've encountered lots (millions?) of MD5'd emails in the past and have not really had any difficulty cracking them (85-90% success rate?) See Gravatar, for instance. Really easy to get someone's email address from their Gravatar hash.

Better way to do it would be to treat the emails just like passwords, and as such, you'd use a proper password hashing algorithm like bcrypt or something.
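
As a rough sketch of "treat the emails like passwords", the example below swaps in scrypt from Python's standard library in place of bcrypt, purely so the snippet needs no third-party package. One assumption that goes beyond the post above: for two groups to still match hashes against each other, they would all have to use the same salt, so the sketch uses a single shared secret value (`SHARED_SALT`, hypothetical) rather than a random per-record salt. The function name and cost parameters are illustrative, not a recommendation.

```python
import hashlib

# Hypothetical secret that the participating groups agree on out of band.
SHARED_SALT = b"hypothetical-secret-agreed-out-of-band"

def slow_email_hash(email: str, salt: bytes = SHARED_SALT) -> str:
    """Lowercase an address and hash it with scrypt, a deliberately slow KDF."""
    normalized = email.lower().encode("utf-8")
    digest = hashlib.scrypt(
        normalized,
        salt=salt,
        n=2**14,  # CPU/memory cost; higher is slower for defenders and attackers
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,
        dklen=32,
    )
    return digest.hex()

# Same address (any casing) with the same salt still matches across lists.
print(slow_email_hash("Jane.Doe@Gmail.com") == slow_email_hash("jane.doe@gmail.com"))
```

Anyone who holds the shared salt can still run the same guessing attack, just much more slowly per guess, so this raises the cost of cracking rather than removing the risk.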
#5
(02-12-2017, 08:14 AM)epixoip Wrote: Email addresses (and usernames themselves for that matter) are really no different from very weak passwords: short, predictable, low-entropy, human generated strings. And of course the domain names are also extremely predictable (you'll have some one-off domains here and there, but all the major domains are well-known and publicly available.) The cracking approach is pretty simple: list of domains in one wordlist, regular password wordlists in another wordlist, run combinator attack.


I'd like to show people that hashing emails with MD5 is a bad idea and propose an alternative.

This is what we are doing. We (nonprofits, community organizing groups) do big swaps of emails based on double-opt-in, co-branded petitions. People sign a petition, a legitimate petition that we do deliver in person (for example, demanding that the Senate do background checks on all nominees).

Then we calculate how many people each group contributed and then distribute a proportionate number of "new to list" emails to each group.  That way people who sign a petition don't start receiving emails from 20 groups and get overwhelmed and each group gets back new names for their list.
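
Roughly, the bookkeeping looks like the Python sketch below. It is a simplification under stated assumptions: each group's list is treated as a set of hashes, "new to list" is a set difference, and the shares use integer division, so a few names can be left over. The function names, group names, and numbers are invented for the example.

```python
def new_to_list(signer_hashes, group_hashes):
    """Hashes of petition signers that are not already on a group's list."""
    return set(signer_hashes) - set(group_hashes)

def proportional_shares(contributions, pool_size):
    """Split pool_size new names in proportion to each group's contribution."""
    total = sum(contributions.values())
    if total == 0:
        return {group: 0 for group in contributions}
    return {group: pool_size * count // total
            for group, count in contributions.items()}

# Hypothetical hashes, shortened for readability.
signers = {"aa11", "bb22", "cc33", "dd44"}
group_a_list = {"aa11", "zz99"}
print(sorted(new_to_list(signers, group_a_list)))

# Hypothetical contribution counts and a pool of 500 new names.
contributions = {"group_a": 600, "group_b": 300, "group_c": 100}
print(proportional_shares(contributions, 500))
```

The comparison step is the only reason the hashed lists get passed around at all.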

It's not ideal, but it adheres to CAN-SPAM and groups that don't honor unsubscribes are blackballed forever.

The vulnerability is that we share these MD5-hashed email lists to determine new names. We need something better.

Would SHA2 be better?  If so, how?
#6
No, SHA2 is not better than MD5 for these purposes. I already gave you the answer in my previous post.