Getting "unruly": Finding base words
#8
(08-17-2016, 08:43 PM)hashcrash Wrote: With your command you're not lowercasing stuff like German umlauts... (Ä --> ä, Ö --> ö, etc.). But I'm not sure if the corresponding rule (toggle) does it... I have to check it out.

Based on epixoip's code, you could use 'sed' instead. With GNU sed in a UTF-8 locale, the character classes understand non-ASCII letters such as German, French, or Turkish ones.

The code then becomes:
Code:
sed 's/[[:upper:]]*/\L&/g' infile | sed 's/^[^[:lower:]]*//g; s/[^[:lower:]]*$//g; y/112345677890@\$\!\#/ilzeasbzvbgoasih/; s/[^[:lower:]]//g; /^$/d' >outfile
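
If it helps to see what each step does, here is a rough sketch of the same pipeline as a single sed script with one commented step per line. It is not a drop-in replacement: the function name base_words is made up for this example, it assumes GNU sed (for \L) and a UTF-8 locale (so [[:upper:]] and [[:lower:]] cover letters like Ä and ä), and it lists each character only once in the y command without backslashes. The one-liner above lists 1 and 7 twice, so picking i and z for them here is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads candidate plains on stdin, writes base words to stdout.
# Assumes GNU sed; non-ASCII letters are only handled in a UTF-8 locale.
base_words() {
    sed '
        # 1. Lowercase every run of uppercase letters (\L is a GNU extension).
        s/[[:upper:]]\{1,\}/\L&/g
        # 2. Strip leading characters that are not lowercase letters.
        s/^[^[:lower:]]*//
        # 3. Strip trailing characters that are not lowercase letters.
        s/[^[:lower:]]*$//
        # 4. Undo common leet substitutions (1 to i and 7 to z are assumptions).
        y/1234567890@$!#/izeasbzbgoasih/
        # 5. Drop whatever is still not a lowercase letter.
        s/[^[:lower:]]//g
        # 6. Delete lines that ended up empty.
        /^$/d
    '
}

# Example: base_words <infile >outfile
```

Note that steps 2 and 3 run before the leet translation, so a digit at the very start or end of a line is stripped rather than translated, same as in the one-liner.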


Messages In This Thread
Getting "unruly": Finding base words - by epixoip - 06-19-2012, 01:03 AM
RE: Getting "unruly": Finding base words - by d2 - 08-17-2016, 01:05 PM
RE: Getting "unruly": Finding base words - by tibit - 08-28-2016, 08:20 PM