08-02-2023, 10:09 PM
[a necromancer appears]
Perhaps they mean the 'mdsplit' tool, which is a companion to mdxfind. But mdsplit is designed to remove founds - hash:plain pairs - from a list of target hashes, not to remove just hashes.
If your use case is 'remove everything that looks like a hash from a wordlist', then grepping them out based on a set of regular expressions, like [0-9a-fA-F]{32}, [0-9a-fA-F]{40}, etc., is probably your best bet. There will be false positives, though - the only way to truly tell if they're hashes is to crack them.
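A minimal sketch of that regex filter in Python, in case it helps. The 32 and 40 lengths come from the patterns above; the 64 and 128 lengths, and the names `looks_like_hash` and `split_wordlist`, are my own assumptions for illustration. It only matches whole lines of bare hex, so adjust to taste.

```python
import re

# Hex lengths to treat as "hash-shaped". 32 and 40 are from the post;
# 64 and 128 are assumed additions (SHA-256-sized and SHA-512-sized).
HEX_LENGTHS = (32, 40, 64, 128)
HASH_RE = re.compile(
    "|".join(r"[0-9a-fA-F]{%d}" % n for n in HEX_LENGTHS)
)

def looks_like_hash(word):
    """True if the whole string is hex of one of the assumed lengths."""
    return HASH_RE.fullmatch(word.strip()) is not None

def split_wordlist(lines):
    """Split lines into (kept words, hash-shaped suspects)."""
    kept, suspects = [], []
    for line in lines:
        word = line.rstrip("\r\n")
        (suspects if looks_like_hash(word) else kept).append(word)
    return kept, suspects
```

Keeping the suspects in a separate list rather than discarding them means they can still be fed to a cracker or eyeballed afterwards, since a 40-character run of 'a' would land there too.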
And since mdxfind can crack many different kinds of hashes at once, then mdxfind (followed by mdsplit) *could* be used to "detect" (by cracking), and then remove, a bunch of hashes from a wordlist. But it would miss any hashes that you can't crack.
So the real-world solution is likely to be a best-effort one: cull the wordlist based on regex, then maybe try to crack what was filtered out to catch the obvious ones, and then visually inspect the rest (or even measure their randomness on a per-string basis, and sift out the ones that seem to be less random).
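One rough way to do the per-string randomness measurement is Shannon entropy over each string's own character counts. This is just a sketch of one possible measure, not the only one, and the function names are made up for illustration. A hex string tops out at 4 bits per character, so hash-shaped strings that score well below that are the ones worth a second look.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits per character, computed from the string's own character counts."""
    if not s:
        return 0.0
    total = len(s)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(s).values()
    )

def rank_by_randomness(candidates):
    """Sort candidates from least random-looking to most random-looking."""
    return sorted(candidates, key=shannon_entropy)
```

Anything near the front of the ranked list (long runs of one character, simple repeating patterns) is less likely to be a real hash, though this is a heuristic and not proof either way.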
Definitely a non-trivial problem - but a fun one.