randomness is easy to generate but hard to detect. humans judge a string to be random when they can't find any recognizable pattern in it. programmatically speaking, you have to do the same.
one way you could approach this is kind of like the spell checker approach you mentioned -- you could check each word in your wordlist against a dictionary and only print the matches. you'd need to match case-insensitively and undo basic "l33t" substitutions, etc.
a VERY simple example that is guaranteed to have false negatives would be something like this:
Code:
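# for each plain: map basic l33t digits to letters, drop non-letters,
# strip the first and last letter, and lowercase the result
while IFS= read -r plain; do
    baseword="$(printf '%s\n' "$plain" | sed -r 'y/4310/aeio/; s/[^a-zA-Z]//g; s/^.(.*)./\1/' | tr 'A-Z' 'a-z')"
    # substring match: the baseword only has to occur inside some dictionary word
    if grep -q -- "$baseword" words.english.txt; then
        printf '%s\n' "$plain"
    else
        echo "'$plain' appears to be random: couldn't find '$baseword' in the dictionary." >&2
    fi
done < sample.plains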
this is a mediocre implementation at best, of course. the biggest problem with this code is that it doesn't detect made-up compound words ('applebanana', for example), but the spellcheck approach would have the same problem.
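for the compound word problem, here's a rough sketch of one possible approach, not a tested solution: try every split point and accept the plain if both halves are whole dictionary words. the function names (in_dict, split_compound), the DICT variable and the 3-letter minimum per half are all assumptions made up for this illustration. it expects an already-normalized, lowercase word.

```shell
# assumption: DICT is a wordlist with one word per line (e.g. words.english.txt)
DICT="${DICT:-words.english.txt}"

# in_dict WORD: succeed only on an exact, whole-line, case-insensitive match
in_dict() {
    grep -qixF -- "$1" "$DICT"
}

# split_compound WORD: print "left right" for the first split where both
# halves are dictionary words of at least 3 letters; fail if there is none
split_compound() {
    word=$1
    len=${#word}
    i=3
    while [ "$i" -le $((len - 3)) ]; do
        left=$(printf '%s' "$word" | cut -c "1-$i")
        right=$(printf '%s' "$word" | cut -c "$((i + 1))-")
        if in_dict "$left" && in_dict "$right"; then
            printf '%s %s\n' "$left" "$right"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

with 'apple' and 'banana' in the dictionary, `split_compound applebanana` prints "apple banana". it runs two greps per split point, so it's slow on big wordlists, and it only handles two-part compounds.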
here's an example. using these plains:
Code:
applebanana
69cockmaster69
8dJ3na3Ldn4
tac0v4gina5
bluebear
red123
aN7b3mlK
this script outputs the following:
Code:
'applebanana' appears to be random: couldn't find 'pplebanan' in the dictionary.
69cockmaster69
'8dJ3na3Ldn4' appears to be random: couldn't find 'jenaeldn' in the dictionary.
'tac0v4gina5' appears to be random: couldn't find 'acovagin' in the dictionary.
bluebear
red123
'aN7b3mlK' appears to be random: couldn't find 'nbeml' in the dictionary.
so, you know, not too bad. but far from good. this might give you something to build upon though.
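one direction to build on, as a sketch only: strip leading and trailing digits before undoing the l33t substitutions (otherwise 'red123' turns into 'redie'), and require a whole-word match instead of a substring match. normalize, is_word and DICT are hypothetical names for this example, and it assumes the dictionary is lowercase with one word per line.

```shell
# assumption: DICT is a lowercase wordlist, one word per line
DICT="${DICT:-words.english.txt}"

# normalize PLAIN: strip leading/trailing digits, undo basic l33t digits,
# drop whatever non-letters remain, then lowercase
normalize() {
    printf '%s\n' "$1" |
        sed -e 's/^[0-9]*//' -e 's/[0-9]*$//' -e 'y/4310/aeio/' -e 's/[^a-zA-Z]//g' |
        tr 'A-Z' 'a-z'
}

# is_word PLAIN: succeed only if the normalized plain is a whole dictionary line
is_word() {
    base="$(normalize "$1")"
    [ -n "$base" ] && grep -qxF -- "$base" "$DICT"
}
```

a whole-word match is much stricter than the substring grep, so compounds like 'bluebear' would no longer pass on their own and would need separate handling.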