This is what I use to find base words in a list of plains. I am posting it both to share and to see if others have ideas for improving it.
Code:
tr A-Z a-z < plains | sed 's/^[^a-z]*//; s/[^a-z]*$//; y/1234567890@$!#/izeasbzbgoasih/; s/[^a-z]//g; /^$/d' > basewords
A few explanations:
First, I use tr instead of sed to convert upper to lower case, both because it's much faster and because it plays better with Unicode.
I then strip out all non-alpha chars from the beginning and end of the line.
Then I do common l33t substitutions (this can probably be improved).
Then I strip out everything that is not a lowercase letter, and delete any empty lines.
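For anyone who wants to tweak individual stages, here is a rough sketch of the same steps split out one per line. The function name basewords and the LC_ALL=C setting are my own assumptions, not part of the one-liner; LC_ALL=C is there so that [a-z] and A-Z are treated as plain ASCII ranges. The y/// list keeps one mapping per source character (1 to i, 7 to z), since POSIX leaves the result undefined when a character is repeated in the first string, and it drops the backslashes before $, ! and #, which are not needed inside single quotes.

```shell
#!/bin/sh
# Sketch only: basewords() is a hypothetical helper name.
# Reads plains on stdin, writes candidate base words on stdout.
basewords() {
    # Stage 1: lowercase ASCII letters.
    LC_ALL=C tr 'A-Z' 'a-z' |
    # Stage 2: trim non-letters from the start and end of each line.
    # Stage 3: map common l33t characters to letters (one mapping per character).
    # Stage 4: drop any remaining non-letters.
    # Stage 5: delete lines that ended up empty.
    LC_ALL=C sed -e 's/^[^a-z]*//' \
                 -e 's/[^a-z]*$//' \
                 -e 'y/1234567890@$!#/izeasbzbgoasih/' \
                 -e 's/[^a-z]//g' \
                 -e '/^$/d'
}
```

It can be used as `basewords < plains > basewords.txt` after pasting the function into a shell or sourcing it from a file. Note that trimming happens before the substitutions, so a plain made up entirely of digits ends up empty and is dropped.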
Example: take the following plains
Code:
l33t1979
h4$hcaT2012
39bananas
69cockmaster69
Becomes:
Code:
leet
hashcat
bananas
cockmaster
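To see where each character goes, here is a small trace of one of the sample plains through the stages. This is only an illustration: the variable names s1 to s4 are made up, and the y/// list is cut down to the three characters this sample actually needs.

```shell
#!/bin/sh
# Trace a single sample plain through each stage of the pipeline.
plain='h4$hcaT2012'

# Stage 1: lowercase.
s1=$(printf '%s\n' "$plain" | tr 'A-Z' 'a-z')
# Stage 2: trim non-letters from both ends (the trailing year goes away here).
s2=$(printf '%s\n' "$s1" | sed 's/^[^a-z]*//; s/[^a-z]*$//')
# Stage 3: l33t substitutions, reduced to 3->e, 4->a, $->s for this sample.
s3=$(printf '%s\n' "$s2" | sed 'y/34$/eas/')
# Stage 4: drop anything that is still not a lowercase letter.
s4=$(printf '%s\n' "$s3" | sed 's/[^a-z]//g')

# Print the intermediate results, one per line.
printf '%s\n' "$s1" "$s2" "$s3" "$s4"
```

The four printed lines are h4$hcat2012, h4$hcat, hashcat and hashcat: the year is removed by the trimming step before any digit can be turned into a letter.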
All comments, thoughts, and flames welcome.