Studying demographics in effective wordlist/rule generation

Studying demographics in effective wordlist/rule generation - JCas - 08-25-2018

Hi everyone! I've recently become fascinated with the art of effective password cracking. I've been honing my skills and building my personal knowledge by running a variety of exercises and experiments.

I'm interested in the effect that culture, language, geographic region, and other demographics may have on the effectiveness of various wordlists and rules. A wordlist and rule list may be effective against hash lists from an English-speaking demographic but might not work so well against a hash list that is primarily from a non-English-speaking demographic. Culture also seems to have an effect: video gamers might have different password habits than people in a corporate environment. Age definitely appears to have an effect.

Has anyone found any white papers or other research papers that looked into this very effect? What about hash lists that are confirmed to be from a particular demographic, or at least from a particular part of the world? I've been toying around with the Battlefield hash list, as that is statistically from a younger crowd.

Not to go completely off topic, but as more of a side question: what about non-Latin alphabetic writing systems such as Cyrillic, Greek, or Armenian? I know hashcat supports them. Are there any decent baseline/starter wordlists or rule lists for those systems, and what about hash files?

RE: Studying demographics in effective wordlist/rule generation - lakiw - 09-07-2018

Hi, good questions!

First of all, I'd recommend digging into the papers on http://passwordresearch.com/ for actual studies on the topic. For example, here is a paper going into differences between Chinese and English datasets:

http://passwordresearch.com/papers/paper480.html

One challenge in answering this question is that most datasets are very messy, and it's hard to compare apples to apples. For example, many of these websites have different value to their users, so comparing a gaming website to the LinkedIn breach might seem to indicate differences between younger and older users, when the difference may in fact just be because the users value the accounts differently.

As for the encoding issues you mentioned, they are by far the bane of my password cracking existence.

The following post is getting a little long in the tooth (hello oclHashcat), but it has some info on using mask attacks for other character sets:

https://blog.bitcrack.net/2013/09/cracking-hashes-with-other-language.html
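To illustrate why masks get awkward for non-Latin scripts, here is a rough Python sketch (my own illustration, not taken from the linked post). It assumes the target passwords are UTF-8, where each lowercase Cyrillic letter takes two bytes, so a byte-oriented mask needs two positions per letter, typically with custom charsets given in hex (hashcat's `--hex-charset` option). The helper name `utf8_byte_columns` is hypothetical.

```python
def utf8_byte_columns(chars):
    """Return, per byte position, the sorted set of byte values used
    when the given characters are encoded as UTF-8.
    Assumes every character encodes to the same number of bytes."""
    encoded = [c.encode("utf-8") for c in chars]
    width = len(encoded[0])
    if any(len(e) != width for e in encoded):
        raise ValueError("characters have mixed UTF-8 lengths")
    return [sorted({e[i] for e in encoded}) for i in range(width)]


# Lowercase Russian letters U+0430..U+044F (а..я, without ё).
cyrillic_lower = [chr(cp) for cp in range(0x0430, 0x0450)]

columns = utf8_byte_columns(cyrillic_lower)

# Hex strings of the kind you could feed to hex-encoded custom charsets.
hex_columns = ["".join(f"{b:02x}" for b in col) for col in columns]

for position, hex_charset in enumerate(hex_columns, start=1):
    print(f"byte {position}: {hex_charset}")
```

Note that combining the two columns freely also generates byte pairs that are not in the original letter range (and some that are other characters entirely), so a mask built this way covers a superset of the letters you actually wanted.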

Most of the time you need to have your dictionaries in the same encoding that was used when the target hashes were created.
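A quick way to see why this matters: the hash is computed over bytes, not characters, so the same word gives different hashes under different encodings. The sketch below assumes unsalted MD5 and uses UTF-8 versus Windows-1251 (cp1251) purely as examples. The function names `md5_hex` and `reencode_wordlist` are made up for illustration.

```python
import hashlib


def md5_hex(word, encoding):
    """Hash the bytes of `word` as they would look in `encoding`."""
    return hashlib.md5(word.encode(encoding)).hexdigest()


def reencode_wordlist(words, target_encoding):
    """Convert candidate words to the target encoding, skipping any
    word that cannot be represented in it."""
    converted = []
    for word in words:
        try:
            converted.append(word.encode(target_encoding))
        except UnicodeEncodeError:
            continue  # e.g. Chinese characters have no cp1251 form
    return converted


word = "пароль"  # Russian for "password"
utf8_hash = md5_hex(word, "utf-8")
cp1251_hash = md5_hex(word, "cp1251")

print(utf8_hash == cp1251_hash)          # same word, different bytes
print(len(word.encode("utf-8")), len(word.encode("cp1251")))

candidates = reencode_wordlist(["пароль", "password", "密码"], "cp1251")
```

If the site hashed cp1251 bytes and your dictionary is UTF-8, the word is effectively missing from your list even though it "looks" present.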

John the Ripper can be somewhat encoding-aware, which really helps with mangling rules (a whole other issue). Here is a link to its documentation:

https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/doc/ENCODINGS
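As a small illustration of why encoding awareness helps with mangling, here is a Python sketch of a "capitalize first letter" rule applied two ways. This is only an analogy for the general problem and makes no claim about how John the Ripper or hashcat implement their rules. Both function names are hypothetical.

```python
def capitalize_bytes(raw):
    """Byte-level rule: uppercase the first byte.
    bytes.upper() only changes ASCII letters."""
    return raw[:1].upper() + raw[1:]


def capitalize_text(raw, encoding):
    """Encoding-aware rule: decode, uppercase the first character,
    then encode back to the same encoding."""
    text = raw.decode(encoding)
    return (text[:1].upper() + text[1:]).encode(encoding)


ascii_word = "password".encode("utf-8")
cyrillic_word = "пароль".encode("utf-8")

# ASCII input: both approaches agree.
print(capitalize_bytes(ascii_word) == capitalize_text(ascii_word, "utf-8"))

# Cyrillic input: the byte-level rule silently does nothing.
print(capitalize_bytes(cyrillic_word) == cyrillic_word)
print(capitalize_text(cyrillic_word, "utf-8").decode("utf-8"))
```

The byte-level version leaves the Cyrillic word unchanged because its first byte is not an ASCII letter, so that rule produces no new candidate for the word.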

This is a topic that can get pretty deep. I'd be interested to hear about other people's experiences!

RE: Studying demographics in effective wordlist/rule generation - royce - 09-09-2018

Great reply, lakiw.