hashcat Forum
Getting "unruly": Finding base words

Getting "unruly": Finding base words - epixoip - 06-19-2012

This is what I use to find base words in a list of plains. I am posting it both to share and to see if others have ideas for improving it.

Code:
cat plains | tr A-Z a-z | sed 's/^[^a-z]*//g; s/[^a-z]*$//g; y/112345677890@\$\!\#/ilzeasbzvbgoasih/; s/[^a-z]//g; /^$/d' >basewords

A few explanations:

First, I use tr instead of sed to convert uppercase to lowercase, both because it's much faster and because it plays better with Unicode.

I then strip out all non-alpha chars from the beginning and end of the line.

Then I do common l33t substitutions (this can probably be improved).

Then I strip out all non-lower alpha chars, and delete any empty lines.

Example: take the following plains:

Code:
l33t1979
h4$hcaT2012
39bananas
69cockmaster69

Becomes:

Code:
leet
hashcat
bananas
cockmaster
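
The steps and the example above can be tried as a small shell function. This is only a sketch of the pipeline as described in the post, not a replacement for it. The function name `basewords` is made up for illustration, and the l33t map is trimmed so that each character appears only once (`1` to `i`, `7` to `z`), because POSIX leaves the result of `y` undefined when a source character repeats. Which of the duplicated pairs the original command ends up applying is not stated in the post, so that choice is an assumption.

```shell
#!/bin/sh
# Sketch of the base-word pipeline, one sed expression per step:
#   1. strip leading non-letters
#   2. strip trailing non-letters
#   3. undo common l33t substitutions (assumed map: one letter per character)
#   4. drop any remaining non-letters
#   5. delete lines that ended up empty
basewords() {
    tr 'A-Z' 'a-z' |
    sed -e 's/^[^a-z]*//' \
        -e 's/[^a-z]*$//' \
        -e 'y/1234567890@$!#/izeasbzbgoasih/' \
        -e 's/[^a-z]//g' \
        -e '/^$/d'
}

# Prints leet, hashcat, bananas, cockmaster (one per line).
printf '%s\n' 'l33t1979' 'h4$hcaT2012' '39bananas' '69cockmaster69' | basewords
```

Because trailing digits are stripped before the l33t step, `1979` and `2012` disappear instead of being turned into letters.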

All comments, thoughts, and flames welcome.


RE: Getting "unruly": Finding base words - Hash-IT - 06-22-2012

Nice work there, epixoip!

I am very interested to see if anyone here can help improve this as it is something I am hoping to be able to do.

Unfortunately you are way ahead of me, so I don't think I can contribute much apart from occasionally bumping this thread! :)


RE: Getting "unruly": Finding base words - Hash-IT - 06-25-2012

Hi epixoip

Just to let you know that your efforts on this were not in vain! :)

We have managed to inspire Blazer to add his own version of this to ULM.

He likes to do things his own way so it will be interesting to see the results.


RE: Getting "unruly": Finding base words - epixoip - 06-30-2012

Right on :)


RE: Getting "unruly": Finding base words - hashcrash - 08-17-2016

4 years later and your command is still working perfectly. Thanks for that! (I'm still learning hashcat...)

Just one question about special characters, e.g. German ones:

My wordlist contains, for example, the word "könig". In German you sometimes write "oe" for "ö".
So does it make sense to add "koenig" to my list of base words as well? Or is it better to write a rule (if one doesn't already exist somewhere) for that?
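
If the transliterated forms are added to the list itself, one possible way to generate them is sketched below. The helper name `with_transliteration` and the exact mapping (ä to ae, ö to oe, ü to ue, ß to ss) are assumptions for illustration. The sketch also assumes the list is UTF-8 and already lowercased.

```shell
#!/bin/sh
# Hypothetical helper: print every word, followed by its ASCII
# transliteration when that differs from the word itself.
# Assumes UTF-8 input that has already been lowercased.
with_transliteration() {
    awk '{
        print
        alt = $0
        gsub(/ä/, "ae", alt)
        gsub(/ö/, "oe", alt)
        gsub(/ü/, "ue", alt)
        gsub(/ß/, "ss", alt)
        if (alt != $0) print alt
    }'
}

# Prints könig, koenig, hashcat (one per line).
printf '%s\n' 'könig' 'hashcat' | with_transliteration
```

Words without umlauts pass through unchanged, so the output can be fed straight into a later deduplication step.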


And what about stuff like this:


���
����

(For me these show up as question marks, I guess because of encoding problems.)
That isn't of any value for my baseword list, is it?
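
Entries made up only of such bytes contain no a-z letters, so the last two steps of the original command (strip everything that is not a-z, delete empty lines) are the ones that would remove them. A minimal sketch of that step, run under `LC_ALL=C` so that sed treats every byte as a character regardless of encoding, could look like this. The function name `ascii_only` is made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only a-z, byte-wise, and drop lines that end up empty.
# LC_ALL=C makes sed treat each byte as one character, so bytes that are
# invalid in the current encoding are matched by [^a-z] like any other.
ascii_only() {
    LC_ALL=C sed 's/[^a-z]//g; /^$/d'
}

# The first input line is two U+FFFD replacement characters as raw bytes.
printf '\357\277\275\357\277\275\nabc\n' | ascii_only   # prints: abc
```

The same byte-wise stripping also removes legitimate non-ASCII letters, so "könig" would come out as "knig". Whether that is acceptable depends on the wordlist.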


RE: Getting "unruly": Finding base words - d2 - 08-17-2016

@epixoip: just a cosmetic change: sort and deduplicate before writing the output to the file:

Code:
 | sort -u > basewords
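
A small sketch of what the deduplication step does on a toy list follows. The function names are made up for illustration, and `LC_ALL=C` is an added assumption (plain byte-order sorting), not part of the suggestion above. The second function is a variation that is not from this thread: it orders base words by how often they occur instead of alphabetically.

```shell
#!/bin/sh
# Sort and deduplicate, as suggested in the post.
unique_words() {
    LC_ALL=C sort -u
}

# Variation (not from the thread): most frequent base words first,
# ties broken alphabetically; the count column is dropped at the end.
ranked_words() {
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 | awk '{print $2}'
}

printf '%s\n' pass love pass admin pass love | unique_words   # admin, love, pass
printf '%s\n' pass love pass admin pass love | ranked_words   # pass, love, admin
```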



RE: Getting "unruly": Finding base words - hashcrash - 08-17-2016

With your command you're not lowercasing things like German umlauts (Ä --> ä, Ö --> ö, etc.). But I'm not sure if the corresponding rule (toggle) does it... I have to check it out.


RE: Getting "unruly": Finding base words - tibit - 08-28-2016

(08-17-2016, 08:43 PM)hashcrash Wrote: With your command you're not lowercasing things like German umlauts (Ä --> ä, Ö --> ö, etc.). But I'm not sure if the corresponding rule (toggle) does it... I have to check it out.

Based on epixoip's code, you could use 'sed' with character classes instead, so that it understands non-English characters such as German, French, or Turkish ones.

The code then becomes:
Code:
sed 's/[[:upper:]]*/\L&/g' infile | sed 's/^[^[:lower:]]*//g; s/[^[:lower:]]*$//g; y/112345677890@\$\!\#/ilzeasbzvbgoasih/; s/[^[:lower:]]//g; /^$/d' >outfile
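
Two things are worth keeping in mind when trying the lowercasing step from this command. `\L` in the replacement is a GNU sed extension, and both `[[:upper:]]` and `\L` follow the current locale, so non-ASCII letters such as Ä or Ö are only folded when a suitable UTF-8 locale is active. The sketch below isolates that step. The function name `lower_gnu` is made up for illustration, and the demo line uses ASCII only so that it does not depend on which locales are installed.

```shell
#!/bin/sh
# Requires GNU sed: \L (lowercase the rest of the replacement) is a GNU
# extension. [[:upper:]] and \L both depend on the locale (LC_CTYPE).
lower_gnu() {
    # Match runs of one or more uppercase letters and lowercase each run.
    sed 's/[[:upper:]][[:upper:]]*/\L&/g'
}

printf '%s\n' 'h4$hcaT2012' | lower_gnu   # prints: h4$hcat2012
```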