Hashcat Utilities List Cleaner
This item was in my mailbox today:

It is now possible to download large amounts of "n-grams" data from the COCA corpus for offline use from http://www.ngrams.info. This is in addition to the data on the top 500,000 word forms and the top 5,000 lemmas in COCA, which has been available for free from http://www.wordfrequency.info for the past few months.

Starting today, registered users can freely download large n-gram datasets, which contain the frequency of the one million most frequent 2, 3, 4, and 5-word sequences in the corpus, and then use this data offline for research and teaching. Other versions of the n-grams datasets allow users to download tens and even hundreds of millions of rows of data.

In addition to the COCA data, starting today you can also download n-grams data from the 400 million word Corpus of Historical American English (COHA). This data allows you to search offline to see the frequency of every word, and every 2, 3, 4, and 5-gram that occurs at least three times in the corpus, along with its frequency in each of the 20 decades (1810s-2000s).

For more information on this new n-grams data, please see http://www.ngrams.info.


Mark Davies
Brigham Young University

(11-22-2011, 05:33 AM)atom Wrote: I don't think using "base-words" is a good idea in password cracking.
My idea about corpora as a starting point for word lists was a response to the point made that most of the so-called word lists on the Internet are basically garbage. Using them as input lists for mangling is a big waste of time.

In the big picture, many types of lists and analyses are needed: leaked passwords, already cracked passwords, tools that build a target-specific word list, and so on.

But a good list of real words is a good starting point for applying good mangling rules.
Thank you very much Kgx Pnqvhm, great links, and thanks for sharing them!

I really hope this doesn’t start an argument about which type of word list to use; that was not my intention at all. I thank you both for your input and I understand both sides of this.

atom's idea to use genuine found lists makes sense, and I do this sort of attack. Adding further rules, and especially his special method of the fingerprint attack, is very effective.

However, as Kgx Pnqvhm said, he was only helping / humouring me by replying to my question about word lists. As I am relatively new to this, I was surprised at the quality of the word lists being shared around, and Kgx Pnqvhm was just helping me make my own.

I do fully understand that my home-made lists may not be as effective as real password lists, but they would at least be clean and properly formatted, in an attempt to avoid any duplication or wasted password candidates.
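
For anyone wanting to try the "clean and properly formatted" part, here is a minimal sketch in Python. It assumes that "clean" means trimmed whitespace, no blank lines, and no exact duplicates; the function name `clean_wordlist` is made up for illustration and is not one of the hashcat utilities.

```python
def clean_wordlist(lines):
    """Strip whitespace, drop blank lines, and remove exact duplicates.

    Order of first appearance is kept. Case is left alone, so
    "password" and "Password" count as different words (an assumption;
    lowercase first if you want them merged).
    """
    seen = set()
    cleaned = []
    for line in lines:
        word = line.strip()
        if not word or word in seen:
            continue
        seen.add(word)
        cleaned.append(word)
    return cleaned


if __name__ == "__main__":
    raw = ["password\n", "  letmein \n", "\n", "password\r\n", "Password\n"]
    for word in clean_wordlist(raw):
        print(word)
```

The same function works on an open file, since iterating a file yields its lines one at a time.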

I am only playing with this sort of thing as a hobby, and I will enjoy modifying these “clean” lists in an attempt to test my skill at predicting what people may use. Applying rules to these lists will add another layer altogether.

I think this is a case where an amateur such as myself is getting involved between professionals, and I shouldn’t do that.

So I apologise if this has caused any problems. Thank you both very much for your help.


Still trying to work out what “Kgx Pnqvhm” is!!! :D
What I'm gearing up for is pass phrases. E.g., see "The Diceware Passphrase Home Page."
People are being encouraged to use long passwords these days, and the easiest way to have something longer is to use more real words, which are easier to remember than cryptic sequences.
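
As a rough illustration of why a clean list of real words matters for pass phrases, here is a minimal sketch in Python that joins words from a list into candidate phrases. The function name `passphrase_candidates` and the `separator` parameter are hypothetical, chosen only for this example; words may repeat within a phrase, as they can in a Diceware-style phrase.

```python
from itertools import product


def passphrase_candidates(words, count, separator=""):
    """Yield every ordered sequence of `count` words joined by `separator`.

    Words may repeat within a phrase. This is a generator, so nothing
    is held in memory beyond the current candidate.
    """
    for combo in product(words, repeat=count):
        yield separator.join(combo)


if __name__ == "__main__":
    for phrase in passphrase_candidates(["correct", "horse"], 2, " "):
        print(phrase)
```

The number of candidates is `len(words) ** count`, so even a modest list gets large quickly once you go past two or three words.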
(11-24-2011, 03:22 AM)Kgx Pnqvhm Wrote: What I'm gearing up for is pass phrases.

Me too! :D

As I am sure you already know, WPA heavily encourages this too. Pass phrases are becoming increasingly popular as education in password usage has spread.

This is a good reason to make a feature request for oclHashcat-plus to support 63 characters, or at least something more than 15.
The rule engine in oclHashcat-plus is not compatible with plaintext passwords longer than 15 characters.
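
To see how much of a pass phrase list that limit would affect, a minimal sketch in Python that splits candidates at 15 characters. The limit is simply the figure quoted in this thread; the names `MAX_RULE_LENGTH` and `split_by_length` are made up for this example, and length is counted in characters, which is an assumption.

```python
# Limit quoted in this thread; treat it as an assumption, not a checked spec.
MAX_RULE_LENGTH = 15


def split_by_length(candidates, limit=MAX_RULE_LENGTH):
    """Return (within, too_long): candidates at or under `limit`, and the rest."""
    within = []
    too_long = []
    for candidate in candidates:
        if len(candidate) <= limit:
            within.append(candidate)
        else:
            too_long.append(candidate)
    return within, too_long


if __name__ == "__main__":
    ok, over = split_by_length(["correcthorse", "correcthorsebattery"])
    print(len(ok), "within the limit,", len(over), "too long")
```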
(11-24-2011, 05:29 PM)atom Wrote: The rule engine in oclHashcat-plus is not compatible with plaintext passwords longer than 15 characters.


Perhaps a rough-and-ready version with no rules? Ha ha, call it "Tom Cat"! :D