06-16-2018, 10:12 AM
This is a problem related to (a) how GPU parallelization works in general, in combination with (b) an algorithm with a very high iteration count.
Modern hashing algorithms are typically designed so that they are not parallelizable and the calculation has to be done serially. You cannot start computing iteration 2 if you have not computed iteration 1 before; they depend on each other. For slow algorithms like 7-Zip this means that, if we want to make use of the parallelization power of a GPU, we have to place a single password candidate on a single shader (of which a GPU has many) and compute the entire hash on it. This can take a very long time, depending on the iteration count of the algorithm. We're talking about times of up to a minute for a single hash computation. What we gain is that we're able to run a few hundred thousand candidates at the same time and exploit the parallelization power of the GPU that way. That's why it takes so long for hashcat to report any progress: it actually takes that long to compute a single hash with a high iteration count.
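To illustrate the serial dependency, here is a minimal Python sketch of a generic iterated hash (this is not hashcat's actual 7-Zip kernel; the construction and iteration count are made up for the example). Each round consumes the previous round's output, so the rounds of one candidate cannot be spread across shaders; only different candidates can run in parallel:
Code:
import hashlib

def iterated_hash(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Every round depends on the previous round's output, so this loop
    # is inherently serial -- it cannot be parallelized within one candidate.
    state = salt + password
    for _ in range(iterations):
        state = hashlib.sha256(state).digest()
    return state

# One candidate = one long serial computation. A GPU runs many of these
# side by side, one candidate per shader.
digest = iterated_hash(b"password1", b"\x00" * 16, 500_000)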
In the past hashcat did not report any speed for such extreme cases, resulting in a hashing speed of 0 H/s. Some users may remember such cases and wondered "why isn't it doing anything". From a technical perspective nothing changed: in the past as well as today the GPU simply needs that much time. The only difference is that newer hashcat versions create an extrapolation based on the current progress of the iterations. For example, if it knows that a shader processed 10000 of 10000000 iterations in X time, it can tell how long it will (eventually) take to process the full iteration count and converts this into a more or less valid hashing speed, which in that case is not the real one. This is what you are shown in the speed counter.
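As a rough sketch of that extrapolation (the numbers and the formula are illustrative, not hashcat's exact internals):
Code:
# Illustrative extrapolation of the displayed speed from partial iteration progress.
iterations_total     = 10_000_000
iterations_done      = 10_000
elapsed_seconds      = 0.5        # time spent on the iterations done so far
candidates_in_flight = 100_000    # candidates currently being hashed in parallel

# Estimated time for one full hash computation per candidate
estimated_total_seconds = elapsed_seconds * iterations_total / iterations_done

# Extrapolated speed for the status line (hashes per second)
extrapolated_speed = candidates_in_flight / estimated_total_seconds
print(f"{extrapolated_speed:.0f} H/s (estimated)")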
But that's the theoretical part. When it comes to GPGPU it's actually not enough to feed the GPU as many password candidates as it has shaders. There's a lot of context switching going on on the GPU which we have no control over. To keep all the shaders busy the entire time we have to feed it with many times the number of shaders in password candidates.
Here's some illustration. Hashtopolis (as every other 3rd-party overlay should) uses the --progress-only parameter in order to find the ideal chunk segmentation size for the actual attack. So for example, if you want to run this command (simplified):
Code:
./hashcat -m 11600 hash.txt rockyou.txt -D1
The process has to run this (short-lived) command first:
Code:
./hashcat -m 11600 hash.txt rockyou.txt -D1 --progress-only
And what you get from it is the following:
Code:
Progress.Dev.#2..: 2048
Runtime.Dev.#2...: 4227.98ms
This means that my testing device (an old CPU) has an ideal base workload of 2048 password candidates and it takes about 4.2 seconds to process them. This way hashtopolis can compute that, in order to have hashcat run for 10 minutes, it needs 600 / 4.2 ≈ 142.86 chunks of that size, and multiplying this by 2048 gives 292571, the perfect -l value.
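Expressed as code, the calculation an overlay like Hashtopolis does with the --progress-only output could look roughly like this (a sketch of the arithmetic above, not Hashtopolis' actual source):
Code:
# Deriving the -l (limit) value from the --progress-only output above.
base_progress  = 2048   # Progress.Dev.#2
base_runtime_s = 4.2    # Runtime.Dev.#2, 4227.98 ms rounded as in the text
target_chunk_s = 600    # desired chunk length: 10 minutes

multiplier = target_chunk_s / base_runtime_s   # ~142.86
limit = int(base_progress * multiplier)        # 292571 -> the value for -l
print(limit)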
But does that mean that if you run the command it will show you a progress of 2048 after 4.2 seconds? No! And this is where the real-life circumstances kick in. As said above, in order to keep the GPU busy even while it's doing those context switches we have to overload it with a multiple of the number of shaders. This multiplier (M) typically ranges from 10 to 1000. From a programming perspective this means the GPU returns M times later in time!
The hashcat host process has to wait for the GPU to return in order to mark all password candidates as processed. That is when it updates the progress counter. If you disable this multiplier, you should see progress updates very quickly. You can control this using the -n parameter, but you will notice how the speed drops. What we want is speed, and this is how things are.
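A back-of-the-envelope sketch of why the first progress update arrives so late (shader count, multiplier and timing are example numbers, not measured values):
Code:
# Why the progress counter only moves after the whole batch returns.
shaders             = 2048   # parallel execution units on the GPU
overload_multiplier = 100    # M, typically somewhere between 10 and 1000
batch_time_s        = 4.2    # time for one shader-full of candidates to finish

work_items = shaders * overload_multiplier   # candidates handed to the GPU at once

# The host only learns about finished candidates when the whole batch returns,
# so the first progress update shows up roughly M times later:
first_update_after_s = overload_multiplier * batch_time_s
print(work_items, first_update_after_s)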
So when running the above command and pressing "s" to update the status after 10 seconds, there's no progress:
Code:
Speed.Dev.#1.....: 9702 H/s (24.58ms) @ Accel:64 Loops:16 Thr:768 Vec:1
Progress.........: 0/14344384 (0.00%)
But if I disable the multiplier by adding -n 1 --force you can see the following after 10 seconds:
Code:
Speed.Dev.#1.....: 7382 H/s (3.10ms) @ Accel:1 Loops:128 Thr:768 Vec:1
Progress.........: 72960/14344384 (0.51%)
Note that the auto-tune engine automatically increased the loop count (-u): since the GPU was not kept busy executing so many work items in parallel, it had the headroom to do so. Anyway, the important part here is that the progress increased, but at the same time the speed dropped.
Tuning -n manually to find the perfect tradeoff in terms of speed is a painful process. This is why hashcat has the -w parameter, which tries to simplify this for you. If you use -w 1 instead of experimenting with -n, you end up with a nice tradeoff:
Code:
Speed.Dev.#1.....: 8913 H/s (3.29ms) @ Accel:16 Loops:8 Thr:768 Vec:1
Progress.........: 61440/14344384 (0.43%)
A clever way for a user to help hashtopolis to help hashcat deal with such extreme circumstances is therefore to use -w 1 in the task command line.