Studying demographics in effective wordlist/rule generation
Hi, good questions!

First of all, I'd recommend digging into the research papers for actual studies on the topic. For example, here is a paper going into differences between Chinese and English datasets:

One challenge in answering this question is that most datasets are very messy, and it's hard to compare apples to apples. For example, users place different value on their accounts at different websites. Comparing a gaming website to the LinkedIn breach might seem to show differences between younger and older users, when the real cause may simply be that users value those accounts differently.

As to the encoding issues you mentioned, that is by far the bane of my password cracking existence. 

The following post is getting a little long in the tooth (hello, oclHashcat), but it has some info on using mask attacks with other character sets:
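The core trick with non-ASCII masks is that a byte-oriented mask engine sees a UTF-8 character as several bytes, so each multibyte character needs several mask positions. Here's a rough Python sketch, assuming every character in your alphabet encodes to exactly two bytes in UTF-8 (true for Cyrillic, not for CJK). It derives the set of lead bytes and trail bytes as hex strings; you could then feed these to a mask attack as hex custom charsets (in hashcat, something like `--hex-charset` with `-1`/`-2`, but check the help for your version). The function name `utf8_byte_charsets` is just made up for this example.

```python
def _to_hex(byte_values: set[int]) -> str:
    """Join byte values into a sorted, contiguous hex string."""
    return "".join(f"{b:02x}" for b in sorted(byte_values))


def utf8_byte_charsets(alphabet: str) -> tuple[str, str]:
    """Hypothetical helper: return (lead_bytes_hex, trail_bytes_hex) for an
    alphabet whose characters all encode to exactly two bytes in UTF-8."""
    leads: set[int] = set()
    trails: set[int] = set()
    for ch in alphabet:
        encoded = ch.encode("utf-8")
        if len(encoded) != 2:
            raise ValueError(f"{ch!r} is {len(encoded)} bytes in UTF-8, expected 2")
        leads.add(encoded[0])   # first byte of the two-byte sequence
        trails.add(encoded[1])  # continuation byte
    return _to_hex(leads), _to_hex(trails)


if __name__ == "__main__":
    # Lowercase Russian letters U+0430..U+044F (ё left out for simplicity)
    russian = "".join(chr(cp) for cp in range(0x0430, 0x0450))
    lead_hex, trail_hex = utf8_byte_charsets(russian)
    print("lead bytes: ", lead_hex)
    print("trail bytes:", trail_hex)
```

One caveat with this approach: because the lead and trail positions are independent, the mask also generates byte pairs that aren't in your alphabet (e.g. `d0 80`), so the keyspace ends up a bit bigger than the real character set.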

Most of the time, your dictionaries need to be in the same character encoding that was used when the target hashes were created.
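To make the encoding point concrete, here's a small Python sketch using unsalted MD5 as a stand-in hash (an assumption for illustration only). The same word hashes differently depending on whether it was encoded as UTF-8 or Latin-1 (ISO-8859-1), so a UTF-8 wordlist will never hit Latin-1 hashes for any word containing non-ASCII characters. The helper names `md5_hex` and `transcode_line` are hypothetical.

```python
import hashlib


def md5_hex(candidate: str, encoding: str) -> str:
    """MD5 of a candidate password after encoding it to bytes."""
    return hashlib.md5(candidate.encode(encoding)).hexdigest()


def transcode_line(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode one wordlist line, e.g. from UTF-8 to Latin-1.
    Raises UnicodeEncodeError if a character doesn't exist in dst."""
    return raw.decode(src).encode(dst)


if __name__ == "__main__":
    word = "café"
    utf8_hash = md5_hex(word, "utf-8")      # hashes bytes 63 61 66 c3 a9
    latin1_hash = md5_hex(word, "latin-1")  # hashes bytes 63 61 66 e9
    print(utf8_hash)
    print(latin1_hash)
    print("same hash?", utf8_hash == latin1_hash)  # False
    print(transcode_line(word.encode("utf-8"), "utf-8", "latin-1"))  # b'caf\xe9'
```

Pure-ASCII words hash the same either way, which is partly why encoding problems are so easy to miss: most of a wordlist "works" and only the non-ASCII entries silently fail.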

John the Ripper can be somewhat encoding-aware, which really helps with mangling rules (a whole other issue). Here is a link to its documentation:
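To show why encoding awareness matters for mangling rules, here's a generic Python illustration. It is not how John implements its rules; the function names are hypothetical. A byte-oriented "capitalize" rule only recognizes ASCII letters, so it leaves a UTF-8 "é" untouched. A byte-oriented "truncate" rule can also cut a multibyte character in half and produce invalid UTF-8. The character-aware versions decode first, apply the rule, then re-encode.

```python
def capitalize_bytewise(raw: bytes) -> bytes:
    # Treats each byte as ASCII; non-ASCII bytes (like UTF-8 0xC3 0xA9) pass through unchanged.
    return raw.capitalize()


def capitalize_charwise(raw: bytes, encoding: str = "utf-8") -> bytes:
    # Decode to characters, capitalize, re-encode.
    return raw.decode(encoding).capitalize().encode(encoding)


def truncate_bytewise(raw: bytes, n: int) -> bytes:
    # Keeps the first n bytes, which may split a multibyte character.
    return raw[:n]


def truncate_charwise(raw: bytes, n: int, encoding: str = "utf-8") -> bytes:
    # Keeps the first n characters, so multibyte characters stay whole.
    return raw.decode(encoding)[:n].encode(encoding)


if __name__ == "__main__":
    elan = "élan".encode("utf-8")
    print(capitalize_bytewise(elan))  # b'\xc3\xa9lan' (unchanged)
    print(capitalize_charwise(elan))  # b'\xc3\x89lan' ("Élan")
    cafe = "café".encode("utf-8")
    print(truncate_bytewise(cafe, 4))  # b'caf\xc3' (broken UTF-8)
    print(truncate_charwise(cafe, 4))  # b'caf\xc3\xa9' ("café")
```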

This is a topic that can get pretty deep. I'd be interested to hear about other people's experiences!

RE: Studying demographics in effective wordlist/rule generation - by lakiw - 09-07-2018, 01:22 AM