Getting "unruly": Finding base words
06-19-2012, 01:03 AM
(This post was last modified: 06-19-2012 01:05 AM by epixoip.)
Post: #1
This is what I use to find base words in a list of plains. I am posting it both to share and to see if others have ideas for improving it.
Code:
cat plains | tr A-Z a-z | sed 's/^[^a-z]*//g; s/[^a-z]*$//g; y/112345677890@\$\!\#/ilzeasbzvbgoasih/; s/[^a-z]//g; /^$/d' >basewords

A few explanations: First, I use tr instead of sed to convert upper to lower, both because it's much faster and because it plays better with unicode. I then strip out all non-alpha chars from the beginning and end of the line. Then I do common l33t substitutions (this can probably be improved). Then I strip out all remaining non-lower-alpha chars and delete any empty lines.

Example: take the following plain:
Code:
l33t1979

Becomes:
Code:
leet

All comments, thoughts, and flames welcome.
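For anyone who wants to try the steps above on a few sample plains, here is a minimal sketch that wraps the same pipeline in a shell function. The function name `basewords` and the sample inputs are hypothetical. It is not the exact command from the post: as far as I know, POSIX leaves `y` undefined when a source character is repeated or when a backslash precedes an ordinary character, so this variant drops the backslashes and arbitrarily keeps one mapping per character (1 to l, 7 to v). Treat that choice as an assumption, not as the intended table.

```shell
#!/bin/sh
# Hypothetical wrapper around the base-word pipeline; the name and the
# de-duplicated l33t table below are assumptions for illustration.
basewords() {
    # 1. lowercase everything
    tr 'A-Z' 'a-z' |
    # 2. strip leading non-alpha, 3. strip trailing non-alpha,
    # 4. map common l33t characters, 5. drop any remaining non-alpha,
    # 6. delete lines that ended up empty
    sed 's/^[^a-z]*//; s/[^a-z]*$//; y/1234567890@$!#/lzeasbvbgoasih/; s/[^a-z]//g; /^$/d'
}

# Sample run: the all-digit line is stripped to nothing and deleted.
printf '%s\n' 'l33t1979' 'P@ssw0rd!' '123456' | basewords
```

Note that digits at the very start or end of a line are stripped before the l33t step runs, so a plain made up only of digits and symbols yields no base word at all.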
06-22-2012, 05:52 PM
Post: #2
RE: Getting "unruly": Finding base words
Nice work there, epixoip!
I am very interested to see if anyone here can help improve this, as it is something I am hoping to be able to do. Unfortunately you are way ahead of me, so I don't think I can contribute much apart from occasionally bumping this thread!
06-25-2012, 04:33 PM
Post: #3
RE: Getting "unruly": Finding base words
Hi epixoip
Just to let you know that your efforts on this were not in vain! We have managed to inspire Blazer to add his own version of this to ULM. He likes to do things his own way, so it will be interesting to see the results.
06-30-2012, 09:13 AM
Post: #4
RE: Getting "unruly": Finding base words
right on