Getting "unruly": Finding base words
This is what I use to find base words in a list of plains. I am posting it both to share and to see if others have ideas for improving it.

Code:
tr A-Z a-z <plains | sed 's/^[^a-z]*//; s/[^a-z]*$//; y/112345677890@$!#/ilzeasbzvbgoasih/; s/[^a-z]//g; /^$/d' >basewords

A few explanations:

First, I use tr instead of sed to convert uppercase to lowercase, both because it's much faster and because it plays better with Unicode.

I then strip out all non-alpha chars from the beginning and end of the line.

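Then I do common l33t substitutions with sed's y command (this can probably be improved). Note that 1 and 7 each appear twice in the source list; y can only map a character to a single replacement, so only one mapping of each pair can take effect, and which one wins is not something to rely on.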

Then I strip out all non-lower alpha chars, and delete any empty lines.
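
To make the four steps above easier to follow, here is a rough sketch that runs one of the sample plains through each stage separately and prints the intermediate results. The variable names (plain, s1 to s4) are mine, purely for illustration, and the backslashes before $, ! and # are left out here since those characters are not special inside single quotes or in a y list.

```shell
#!/bin/sh
# Walk a single sample plain through each stage of the one-liner.
# Variable names are illustrative only.
plain='h4$hcaT2012'

# Stage 1: lowercase everything with tr.
s1=$(printf '%s\n' "$plain" | tr A-Z a-z)

# Stage 2: trim non-letters from the start and end of the line.
s2=$(printf '%s\n' "$s1" | sed 's/^[^a-z]*//; s/[^a-z]*$//')

# Stage 3: apply the l33t substitutions (one character in, one character out).
s3=$(printf '%s\n' "$s2" | sed 'y/112345677890@$!#/ilzeasbzvbgoasih/')

# Stage 4: drop anything that is still not a lowercase letter, and empty lines.
s4=$(printf '%s\n' "$s3" | sed 's/[^a-z]//g; /^$/d')

printf '%s\n' "$s1" "$s2" "$s3" "$s4"
# prints:
# h4$hcat2012
# h4$hcat
# hashcat
# hashcat
```

Splitting it up like this is only for inspection; the single sed invocation does the same work in one pass.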

Example: take the following plains:

Code:
l33t1979
h4$hcaT2012
39bananas
69cockmaster69

Becomes:

Code:
leet
hashcat
bananas
cockmaster
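
If you want to check the example yourself, something like the following should do. It wraps the pipeline in a function reading stdin (basewords_filter is a made-up name, not part of any tool) and compares the result against the list above. Again, the backslashes in the y list are dropped as unnecessary.

```shell
#!/bin/sh
# basewords_filter is a hypothetical helper: the one-liner, reading stdin
# and writing to stdout instead of using the plains/basewords files.
basewords_filter() {
    tr A-Z a-z |
    sed 's/^[^a-z]*//; s/[^a-z]*$//; y/112345677890@$!#/ilzeasbzvbgoasih/; s/[^a-z]//g; /^$/d'
}

# The four sample plains from the post, and the base words they should give.
actual=$(printf '%s\n' 'l33t1979' 'h4$hcaT2012' '39bananas' '69cockmaster69' | basewords_filter)
expected=$(printf '%s\n' leet hashcat bananas cockmaster)

if [ "$actual" = "$expected" ]; then
    echo "sample plains OK"
else
    echo "mismatch, got:"
    printf '%s\n' "$actual"
fi
```

A plain made up only of digits or symbols produces no output at all, since the line ends up empty and is deleted.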

All comments, thoughts, and flames welcome.


Messages In This Thread
Getting "unruly": Finding base words - by epixoip - 06-19-2012, 01:03 AM
RE: Getting "unruly": Finding base words - by d2 - 08-17-2016, 01:05 PM