generating actual hashes to stdout
#1
Hello, I was wondering if I'm missing an obvious way to generate and save the hashes themselves?
I would like to work on some sort of classifier (my guess is that a random forest would be good enough), and am having trouble finding a relatively painless way of getting a large quantity of examples for the many hash modes hashcat supports.
Essentially, I would like to re-create the example page (https://hashcat.net/wiki/doku.php?id=example_hashes), but with ~1000+ examples for each (non-file) mode.
Any suggestions would be appreciated.
#2
Hashcat can't do this for a handful of reasons, but mainly it would be very slow and would take a lot of resources for a GPU to do this. Your best bet is to copy the algorithm into a script or utility designed to spit out the hashes themselves, and avoid doing it with a device like a GPU. Be aware that a file full of hashes will get very big, very fast. Lots of the modes in hashcat are also _NOT_ hashes, but may involve hashing at some step. Those modes are not easy to recreate, as they may require encrypting/compressing data as well as hashing. The test.pl scripts in hashcat are a good place to look for some more basic Perl implementations of the algorithms.
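
For the simple raw-hash modes, the "copy the algorithm to a script" approach can be sketched in a few lines of Python. This is only a rough illustration and not something taken from hashcat: the helper names (`random_plaintext`, `generate_examples`), the plaintext alphabet, and the tab-separated output layout are all assumptions, and it only covers algorithms that Python's `hashlib` already provides.

```python
import hashlib
import secrets
import string

# Assumed plaintext alphabet; any character set would do for generating samples.
ALPHABET = string.ascii_letters + string.digits


def random_plaintext(min_len=6, max_len=12):
    """Return a random plaintext whose length is between min_len and max_len."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_examples(algorithm, count):
    """Yield (hex_digest, plaintext) pairs for a hashlib algorithm name."""
    for _ in range(count):
        plaintext = random_plaintext()
        digest = hashlib.new(algorithm, plaintext.encode("utf-8")).hexdigest()
        yield digest, plaintext


if __name__ == "__main__":
    # Print a few labelled examples per algorithm to stdout.
    for algorithm in ("md5", "sha1", "sha256"):
        for digest, plaintext in generate_examples(algorithm, 3):
            print(f"{algorithm}\t{digest}\t{plaintext}")
```

Redirecting stdout to a file gives a labelled data set. Salted, iterated, or container-style modes need their own per-mode logic and are not covered by this sketch.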

Also, I hate to be the bearer of bad news, but if you are trying to train a classifier on hashes, you are going to have a very tough time. If it's for recognizing hash formats, they are either easy to know instantly or impossible to know from the format alone. If you want to determine information about the hash itself, such as relating to the plaintext, the vast majority are designed specifically to resist that, and it will also likely not be possible.
#3
(12-03-2022, 07:32 AM)Chick3nman Wrote: Hashcat can't do this for a handful of reasons, but mainly it would be very slow and would take a lot of resources for a GPU to do this. Your best bet is to copy the algorithm into a script or utility designed to spit out the hashes themselves, and avoid doing it with a device like a GPU. Be aware that a file full of hashes will get very big, very fast. Lots of the modes in hashcat are also _NOT_ hashes, but may involve hashing at some step. Those modes are not easy to recreate, as they may require encrypting/compressing data as well as hashing. The test.pl scripts in hashcat are a good place to look for some more basic Perl implementations of the algorithms.

Also, I hate to be the bearer of bad news, but if you are trying to train a classifier on hashes, you are going to have a very tough time. If it's for recognizing hash formats, they are either easy to know instantly or impossible to know from the format alone. If you want to determine information about the hash itself, such as relating to the plaintext, the vast majority are designed specifically to resist that, and it will also likely not be possible.

Thank you for the informative reply, I will have a look at the test.pl scripts.

As for the warning, you are entirely correct. The problem indeed tends to fall into the "trivial or impossible" side of things, and I agree that knowing the hash type with certainty isn't possible, and moreover gleaning information about the initial plaintext is literally what hash functions are built to prevent with the avalanche effect.

However, what I'm really trying to accomplish is to narrow down the pool of possibilities, so that the model takes in a hash and returns an ordered list of possible hash types (for example, by sorting on the model's softmax confidences). It doesn't need to be great; at minimum I would like it to eliminate totally impossible choices (such as incompatible lengths).
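
The "eliminate incompatible lengths" step on its own doesn't need a model. A minimal sketch, assuming only a handful of raw hex-encoded modes: the table below is hand-picked and far from complete, `possible_modes` is a hypothetical helper name, and the order of candidates within each length is an arbitrary choice rather than a learned ranking.

```python
import string

# Hex digest length -> candidate (hashcat mode number, name) pairs.
# Hand-picked subset for illustration; many more modes share these lengths.
CANDIDATES_BY_HEX_LENGTH = {
    32: [(0, "MD5"), (900, "MD4"), (1000, "NTLM")],
    40: [(100, "SHA1")],
    64: [(1400, "SHA2-256")],
    128: [(1700, "SHA2-512")],
}


def possible_modes(hash_string):
    """Return the candidate modes whose digest length fits a hex string."""
    candidate = hash_string.strip()
    # Anything that is empty or not pure hex cannot be one of these raw modes.
    if not candidate or any(c not in string.hexdigits for c in candidate):
        return []
    return list(CANDIDATES_BY_HEX_LENGTH.get(len(candidate), []))
```

A classifier would then only have to rank the candidates that survive this filter. For a 32-character hex string, MD5, MD4 and NTLM remain indistinguishable from the format alone.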

I've had a look around and people have tried to write scripts to do this kind of thing by hand (see https://github.com/SmeegeSec/HashTag and https://gitlab.com/kalilinux/packages/hash-identifier),
but a peek behind the curtain at the source code reveals a quick descent into if-else hell.
If you're going to descend into if-else hell, you might as well let the model do it for you!

If anything else comes to mind or anyone has any suggestions, I'm all ears. Thanks!
#4
Hashcat accomplishes this by running ingested hashes through each of our module parsers one at a time and reviewing the output for proper tokenization/parsing. It will give you all possibilities for hashes that match several modes, with info about each. It will also give you exact matches for hashes that have a consistent format to identify them. Further, we gate some modes that should not come up in the matching, based on the usage of the kernels, to avoid matching things that aren't useful to the user. See here:

https://github.com/hashcat/hashcat/commi...d5cc46a8b0
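
The idea of trying every parser in turn and keeping the ones that accept the input can be illustrated with a toy version. This is not hashcat's code: the regular expressions below are simplified stand-ins for its module parsers, `PARSERS` and `identify` are hypothetical names, and only a few modes are included.

```python
import re

# (hashcat mode number, name, simplified format pattern).
# These patterns are rough approximations of each format, for illustration only.
PARSERS = [
    (3200, "bcrypt", re.compile(r"\$2[abxy]\$\d{2}\$[./A-Za-z0-9]{53}")),
    (500, "md5crypt", re.compile(r"\$1\$[./A-Za-z0-9]{1,8}\$[./A-Za-z0-9]{22}")),
    (0, "MD5", re.compile(r"[0-9a-fA-F]{32}")),
    (1000, "NTLM", re.compile(r"[0-9a-fA-F]{32}")),
    (100, "SHA1", re.compile(r"[0-9a-fA-F]{40}")),
]


def identify(hash_string):
    """Run the input through every parser and return all modes that accept it."""
    candidate = hash_string.strip()
    return [
        (mode, name)
        for mode, name, pattern in PARSERS
        if pattern.fullmatch(candidate)
    ]
```

A format with a distinctive signature (such as the `$2b$` prefix) comes back as a single exact match, while a bare 32-character hex string matches several modes at once, which mirrors the behaviour described above.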
#5
(12-04-2022, 12:14 AM)Chick3nman Wrote: Hashcat accomplishes this by running ingested hashes through each of our module parsers one at a time and reviewing the output for proper tokenization/parsing. It will give you all possibilities for hashes that match several modes, with info about each. It will also give you exact matches for hashes that have a consistent format to identify them. Further, we gate some modes that should not come up in the matching, based on the usage of the kernels, to avoid matching things that aren't useful to the user. See here:

https://github.com/hashcat/hashcat/commi...d5cc46a8b0

Okay, I have never seen that before. That is AMAZING and will do beautifully for my purpose.

Thanks!!