Dictionary Filter
#1
I have compiled my wordlist from a hundred smaller ones, and now I'm looking for a program or script/command, on either Windows or Linux, to filter it. I want to filter out any password that includes special characters or is shorter than 8 characters, and also remove duplicates.
#2
It should be very easy to do this, for instance by running a few shell commands.

A few things you should consider:
1. Why not sort and unique the dict beforehand?
2. How large is the dict? Must the filtering be really fast, i.e. do you need to optimize it?
3. Do you really need a dict at all? It might be more clever to brute-force the short candidates (length 1 to x, where x depends on the algo and could be up to 7-8) and then use the dict only for lengths > x (and < 8, for instance).

Anyway, on Linux you could do something like:
Code:
$ sort -u orig_dict.txt -o dict_unique.txt
$ grep -E '^[0-9a-zA-Z]{0,7}$' dict_unique.txt > less_than_8.txt

OR

Code:
$ < orig_dict.txt parallel --pipe --gnu grep -E "'^[0-9a-zA-Z]{0,7}$'" | sort -u -o less_than_8.txt

Note: there are many ways to speed up the filtering. For instance, running grep with LC_ALL=C (or LANG=C) often helps. You could also consider awk and other tools: if the dict is not too large, a simple awk one-liner can filter and unique it in one go. Unfortunately this doesn't scale very well to larger dicts, since such a one-liner typically has to keep every unique line in memory.
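
The awk one-liner mentioned in the note might look roughly like this. It is only a sketch: `sample_dict.txt` and `sample_filtered.txt` are made-up file names, the sample entries are invented, and the minimum length of 8 assumes the goal is to keep alphanumeric passwords of at least 8 characters, as the original question asks.

```shell
# Hypothetical sample input; a real wordlist has one candidate per line.
printf '%s\n' 'password1' 'short1' 'p@ssw0rd123' 'password1' 'Summer2014' 'abcdefgh' > sample_dict.txt

# One pass over the file:
#   length($0) >= 8        keeps lines with at least 8 characters
#   $0 !~ /[^0-9a-zA-Z]/   drops lines containing anything but letters and digits
#   !seen[$0]++            prints a line only the first time it appears
# LC_ALL=C forces byte-wise, ASCII-only matching, which is usually faster.
LC_ALL=C awk 'length($0) >= 8 && $0 !~ /[^0-9a-zA-Z]/ && !seen[$0]++' sample_dict.txt > sample_filtered.txt

cat sample_filtered.txt
```

Unlike `sort -u`, this keeps the original order of the list, but the `seen` array grows with every unique line, which is why it only suits dicts that fit in memory.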

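UPDATE: I think I got it wrong and you meant to keep all passwords of length >= 8. If so, then something like:
Code:
$ awk '$0~/^[0-9a-zA-Z]{8,}$/' dict_unique.txt > 8_or_longer.txt

OR
Code:
$ grep -E '^[0-9a-zA-Z]{8,}$' dict_unique.txt > 8_or_longer.txt
might work. (Some awk implementations, e.g. older mawk or gawk versions, don't handle the {8,} repetition by default; the grep variant avoids that.)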
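
To check that a filtered list really satisfies the rules, something along these lines could be used. This is a sketch on invented sample data: the `check_*.txt` file names are hypothetical, and the rule checked is "8 or more characters from [0-9a-zA-Z], no duplicates".

```shell
# Hypothetical sample standing in for orig_dict.txt.
printf '%s\n' 'Summer2014' 'abc' 'hunter2!' 'Summer2014' 'qwertyuiop' > check_orig.txt

# Dedupe first, then keep only lines of 8+ alphanumeric characters.
LC_ALL=C sort -u check_orig.txt -o check_unique.txt
LC_ALL=C grep -E '^[0-9a-zA-Z]{8,}$' check_unique.txt > check_filtered.txt

# Count lines that do NOT match the rule. grep -c prints 0 but exits
# non-zero when nothing matches, hence the "|| true".
bad=$(LC_ALL=C grep -cvE '^[0-9a-zA-Z]{8,}$' check_filtered.txt || true)

# sort -c -u exits non-zero if the file is unsorted or contains a duplicate.
if [ "$bad" -eq 0 ] && LC_ALL=C sort -c -u check_filtered.txt; then
    echo "OK: $(wc -l < check_filtered.txt | tr -d ' ') entries, no violations"
fi
```

The same two checks can be pointed at a real output file; on a large dict the `sort -c` pass is cheap because it only verifies the order instead of sorting again.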
#3
(01-09-2014, 02:22 PM)philsmd Wrote: it should be very easy to do this w/ for instance running some (shell) commands.


Thanks for your help. I've been using dictionaries and lists generated by Crunch for a little while now and have only recently begun looking into other modes. If my router has a default 10-character lowercase alphanumeric password such as f2667d1351, what would be the proper syntax and the most effective mode?