Wordlists (Tips and Tricks)
#1
In another thread, epixoip posted this quote:
Quote:...And remember, quality over quantity. Most wordlists you download from the Internet are going to be pure garbage...

This started a few PMs back and forth, and with epixoip's permission, I'm turning the discussion into a public thread in the hope of helping myself and others.

epixoip Wrote:...Real-world passwords make the best wordlists, so the easiest way to clean your wordlist is to simply download a bunch of public leaks, run your wordlists through them, and then only save the passwords that were found in those leaks as the new wordlist.

Then you can "unrule" this list by "unapplying" best64.rule (this is kind of difficult) so that when your list is run with best64.rule, you re-create the same plains + more.

That's the simplest approach.
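
As a rough illustration of the filtering step described above, here is a minimal Python sketch. It assumes the public leaks are already available as plaintext passwords; for hashed leaks you would instead run your wordlist as an attack and keep the cracked plains. The function name `keep_confirmed` and the sample data are made up for the example.

```python
def keep_confirmed(candidates, leaked):
    """Return the candidates that also appear in the leaked passwords.

    Order of first appearance is preserved and repeats are dropped,
    since the result is meant to be used as an attack wordlist.
    """
    leaked_set = set(leaked)
    seen = set()
    kept = []
    for word in candidates:
        if word in leaked_set and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


# Hypothetical sample data, for illustration only.
candidates = ["password", "zzqx!!77", "letmein", "password"]
leaked = ["letmein", "password", "dragon"]
print(keep_confirmed(candidates, leaked))  # ['password', 'letmein']
```

For real lists you would read both inputs from files line by line; a set of the leaked plains keeps each lookup cheap.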

I have actually been doing that, or close to it. I'm thinking of two lists actually. First is the raw captures, unfiltered. This I use for generating an hcstat file. I'm debating whether to leave the duplicates (not introducing them intentionally though) simply to give more weight to more common "words".

For the second, I remove 1-4 digits at the beginning and end, lowercase everything, and remove duplicates. This becomes my attack dictionary.
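
The cleanup for the second list could be sketched in Python roughly like this. It is only one reading of the steps above: strip up to four digits from each end, lowercase, then de-duplicate. The names `to_baseword` and `build_attack_dictionary` are hypothetical, and entries that end up empty (pure-digit passwords) are dropped as an extra assumption.

```python
import re

LEADING_DIGITS = re.compile(r"^\d{1,4}")   # up to four digits at the start
TRAILING_DIGITS = re.compile(r"\d{1,4}$")  # up to four digits at the end


def to_baseword(password):
    """Strip 1-4 digits from each end of the password and lowercase it."""
    stripped = LEADING_DIGITS.sub("", password)
    stripped = TRAILING_DIGITS.sub("", stripped)
    return stripped.lower()


def build_attack_dictionary(passwords):
    """Reduce raw passwords to unique base words, keeping first-seen order."""
    seen = set()
    result = []
    for password in passwords:
        base = to_baseword(password)
        if base and base not in seen:  # skip entries that were only digits
            seen.add(base)
            result.append(base)
    return result


raw = ["Swamp8861", "swamp12", "2014Pilot", "Scrap1932", "1234"]
print(build_attack_dictionary(raw))  # ['swamp', 'pilot', 'scrap']
```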

epixoip responded:
Quote:You absolutely want to leave duplicates. It's extremely important to have duplicates.

Usually I trim out everything < 6 chars, with the exception of my "Top X" wordlists which are unaltered and probabilistically ordered.
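
A small Python sketch of the two kinds of list mentioned here, under the assumption that "probabilistically ordered" means sorted by how often each password occurs in the raw captures (which is why the duplicates matter). The function names and sample data are invented for the example.

```python
from collections import Counter


def top_x(passwords, x):
    """Return the x most frequent passwords, most common first.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(passwords)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:x]]


def trim_short(words, min_len=6):
    """Drop every word shorter than min_len characters."""
    return [word for word in words if len(word) >= min_len]


# Hypothetical raw captures, duplicates left in on purpose.
raw = ["123456", "password", "123456", "qwerty", "password", "123456", "abc"]
print(top_x(raw, 2))                                       # ['123456', 'password']
print(trim_short(["swamp", "password", "qwerty", "abc"]))  # ['password', 'qwerty']
```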

The leaving of duplicates, is that for the "generate an hcstat file" list, the attack list, or both?

As for the <6 characters, I'm on the fence about that. For instance, I've been practicing on the mayhem list. I've found several passwords, including:
  • Swamp8861
  • Scrap1932
  • Pilot8969
just to name a few. There are more examples.

All of these are <6 character words with 4 digits appended. Doing a simple hybrid attack where I append 1-4 digits is easy. But the base words, swamp, scrap, pilot, etc. are all less than 6 characters. If I filter them out, the hybrid would miss these passwords.

I could run a mask of ?u?l?l?l?l?d?d?d?d and pick them back up. I'd love to hear from more experienced hash crackers on which approach is better, plus any other tips and tricks for creating great wordlists.
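
To get a feel for what the mask route costs, here is a rough Python sketch. It assumes the positions after the capital letter use hashcat's built-in lowercase set (?l), and it only understands the ?u, ?l and ?d placeholders; the helper names are hypothetical. It checks that the three example passwords fit the mask and counts how many candidates the mask generates.

```python
import re

# hashcat built-in charsets used here: regex equivalent and size.
CHARSETS = {
    "u": ("[A-Z]", 26),
    "l": ("[a-z]", 26),
    "d": ("[0-9]", 10),
}


def parse_mask(mask):
    """Split a mask such as '?u?l?d' into its charset letters."""
    if len(mask) % 2 != 0 or any(mask[i] != "?" for i in range(0, len(mask), 2)):
        raise ValueError("this sketch only supports ?x placeholders")
    return [mask[i] for i in range(1, len(mask), 2)]


def mask_to_regex(mask):
    """Build a regex that matches exactly the strings the mask generates."""
    return re.compile("".join(CHARSETS[c][0] for c in parse_mask(mask)))


def mask_keyspace(mask):
    """Number of candidates the mask generates."""
    total = 1
    for c in parse_mask(mask):
        total *= CHARSETS[c][1]
    return total


MASK = "?u?l?l?l?l?d?d?d?d"
pattern = mask_to_regex(MASK)
found = ["Swamp8861", "Scrap1932", "Pilot8969"]
print(all(pattern.fullmatch(p) for p in found))  # True
print(mask_keyspace(MASK))                       # 118813760000
```

That keyspace is 26^5 x 10^4 candidates. A hybrid attack that appends 1-4 digits tries 11,110 digit suffixes per base word, so it covers far fewer candidates but only finds passwords whose base word is actually in the list.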
#2
(12-30-2014, 06:35 AM)rsberzerker Wrote: The leaving of duplicates, is that for the "generate an hcstat file" list, the attack list, or both?

The context was around generating an hcstat file. Duplicates in your wordlist make no sense.

(12-30-2014, 06:35 AM)rsberzerker Wrote: As for the <6 characters, I'm on the fence about that. For instance, I've been practicing on the mayhem list. I've found several passwords, including:

It's your call. It really just comes down to how you typically attack hashes, and I'm trying to give generic advice. Personally I would typically pick up those types of passwords with a mask attack.
#3
(12-30-2014, 07:01 AM)epixoip Wrote: The context was around generating an hcstat file. Duplicates in your wordlist make no sense.

That's what I thought. Just wanted to be sure I wasn't missing something. More than once in my life, I've gone, "D'uh! How could I have missed that?!?!"
#4
I have made myself a ~800 MB wordlist of all real passwords. It's a very nice addition to the hashes.org one, which also contains only real passwords. If you want some nice found wordlists, go here.

http://www.adeptus-mechanicus.com/codex/...shpass.php

Unified List Manager has a function to get basewords, but I'm not sure how good it is.
#5
(12-30-2014, 01:41 PM)Saint Wrote: I have made myself a ~800 MB wordlist of all real passwords. It's a very nice addition to the hashes.org one, which also contains only real passwords. If you want some nice found wordlists, go here.

http://www.adeptus-mechanicus.com/codex/...shpass.php

Unified List Manager has a function to get basewords, but I'm not sure how good it is.

That's a good link. Thanks.
#6
Anyone willing to share how they "un-rule" a wordlist to get base words?

I've been using sed and awk to remove numbers and special characters, but I'm not sure that's the best way to tackle this...
#7
(01-07-2015, 01:50 AM)forumhero Wrote: Anyone willing to share how they "un-rule" a wordlist to get base words?

I've been using sed and awk to remove numbers and special characters, but I'm not sure that's the best way to tackle this...

epixoip shares a command for getting basewords in this thread: http://hashcat.net/forum/thread-1305.html. I've been using it successfully.

Best regards,
Azren