In our User Contribution section, people have posted about tools they've written to remove hashes from word lists.
But I've seen references to using "MDXfind" to do the same. Does anyone here know the syntax to use MDXfind as a tool to remove hashes from word lists?
(And, out of curiosity, how often are strings that look like hashes found to be actually real passwords, created by an actual human?)
No clue, maybe ask on their forums?
Even if I could get access to their forums, it would likely be dismissed as an off-topic and/or stupid question.
I was hoping there was enough overlap between the two communities to get an answer here.
Wait, why do you think such a "stupid question" is a non-stupid question if you ask it on this forum?
Because, from what little I could find about it, what I want to do with it is quite different from its intended usage.
Meanwhile, I asked ßlazer, who passed the message on to Fred Wang, the creator of MDXfind, who replied:
"MDXfind does not process word list in the fashion you suggest."
So now I know that what I thought might be a possible use, isn't.
[a necromancer appears]
Perhaps they mean the 'mdsplit' tool, which is a companion to mdxfind. But mdsplit is designed to remove founds - hash:plain pairs - from a list of target hashes, not to strip hashes out of a wordlist.
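If the distinction isn't clear, here's a rough Python sketch of that "remove founds from a target list" idea - not mdsplit itself or how it's actually invoked, just the concept; the file names and the hash:plain parsing are my assumptions:

# Concept sketch only (not mdsplit): drop every hash that already has a
# found (a hash:plain line) from a list of target hashes.
def remove_founds(targets_path, founds_path, remaining_path):
    with open(founds_path, 'r', encoding='utf-8', errors='replace') as f:
        # keep just the hash part of each hash:plain line
        cracked = {line.split(':', 1)[0].strip().lower() for line in f if ':' in line}
    with open(targets_path, 'r', encoding='utf-8', errors='replace') as fin, \
         open(remaining_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            if line.strip().lower() not in cracked:
                fout.write(line)

# Example (placeholder file names):
# remove_founds('targets.txt', 'founds.txt', 'remaining.txt')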
If your use case is 'remove everything that looks like a hash from a wordlist', then grepping it out with a set of regular expressions like [0-9a-fA-F]{32}, [0-9a-fA-F]{40}, etc., is probably your best bet. There will be false positives, though - the only way to truly tell if they're hashes is to crack them.
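Here's roughly what I mean, as a quick Python sketch - same idea as the grep approach; the script and file names are placeholders, and the length list is just the common unsalted hex formats:

import re
import sys

# Bare-hex lengths for common unsalted hashes: 32 (MD5/NTLM), 40 (SHA-1),
# 64 (SHA-256), 128 (SHA-512). Any line matching one of these is dropped.
HASH_RE = re.compile(r'^[0-9a-fA-F]{32}$|^[0-9a-fA-F]{40}$|'
                     r'^[0-9a-fA-F]{64}$|^[0-9a-fA-F]{128}$')

def strip_hashy_lines(in_path, out_path):
    with open(in_path, 'r', encoding='utf-8', errors='replace') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            if not HASH_RE.match(line.strip()):
                fout.write(line)

if __name__ == '__main__':
    # e.g. python strip_hashes.py wordlist.txt wordlist.nohashes.txt
    strip_hashy_lines(sys.argv[1], sys.argv[2])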
And since mdxfind can crack many different kinds of hashes at once, mdxfind (followed by mdsplit) *could* be used to "detect" (by cracking), and then remove, a bunch of hashes from a wordlist. But it would miss any hashes that you can't crack.
So the real-world solution is likely to be a best-effort one: cull the wordlist based on regex, then maybe try to crack what's filtered out to catch the obvious ones, and then visually inspect the rest (or even measure their randomness on a per-string basis and sift out the ones that seem to be less random).
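By "measure their randomness" I mean something like per-string Shannon entropy - a crude heuristic, and the length and entropy thresholds below are guesses I made up, not tested cutoffs:

import math
from collections import Counter

def shannon_entropy(s):
    # Bits per character, estimated from the string's own character frequencies.
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def looks_random(s, min_len=16, threshold=3.5):
    # Arbitrary starting values - tune to taste on your own data.
    return len(s) >= min_len and shannon_entropy(s) >= threshold

if __name__ == '__main__':
    for candidate in ['password123', 'correcthorsebatterystaple',
                      '5f4dcc3b5aa765d61d8327deb882cf99']:
        print(candidate, round(shannon_entropy(candidate), 2), looks_random(candidate))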
Definitely a non-trivial problem - but a fun one.