HashCat takes a very long time to report progress
#1
I've been working with s3inlc of Hashtopolis for a few days now, trying to iron out some issues with HashCat and Hashtopolis.

It has come down to an issue with HashCat taking a very long time to report any progress. About 10 to 15 minutes, in fact.

I'm not entirely sure why, as my hash rate for 7z is about 3,000 H/s.

How does the checkpoint system work in HashCat? What sort of workload is required to hit a checkpoint and get status updates?

Example 10 minute job, no progress, hashing at 3,000+ H/s:

Code:
C:\Users\Jeff\Desktop\x\hashcat\hashtopolis\Python\crackers\1>HashCat64.exe --quiet --runtime=600 --restore-disable --potfile-disable --session=hashtopolis -a 0 -m 11600 ..\..\hashlists\13 ..\..\files\rockyou.txt -p : --hash-type=11600 -o ..\..\hashlists\13.out


Session..........: hashtopolis
Status...........: Running
Hash.Type........: 7-Zip
Hash.Target......: $7z$1$19$0...
Time.Started.....: Thu Apr 05 16:48:14 2018 (3 mins, 47 secs)
Time.Estimated...: Thu Apr 05 18:05:52 2018 (1 hour, 13 mins; Runtime limited: 6 mins, 13 secs)
Guess.Base.......: File (..\..\files\rockyou.txt)
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:     1342 H/s (32.50ms) @ Accel:64 Loops:32 Thr:256 Vec:1
Speed.Dev.#2.....:     1895 H/s (33.53ms) @ Accel:64 Loops:32 Thr:256 Vec:1
Speed.Dev.#*.....:     3237 H/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 0/14344384 (0.00%)
Rejected.........: 0/0 (0.00%)
Restore.Point....: 0/14344384 (0.00%)
Candidates.#1....: 123456 -> 457076
Candidates.#2....: 457068 -> gogetto
HWMon.Dev.#1.....: Util:100% Core:1030MHz Mem:1250MHz Bus:8
HWMon.Dev.#2.....: Util: 80% Core:1050MHz Mem: 500MHz Bus:8



Session..........: hashtopolis
Status...........: Aborted (Runtime)
Hash.Type........: 7-Zip
Hash.Target......: $7z$1$19$0...
Time.Started.....: Thu Apr 05 16:48:14 2018 (10 mins, 13 secs)
Time.Estimated...: Thu Apr 05 18:12:18 2018 (1 hour, 13 mins; Runtime limit exceeded)
Guess.Base.......: File (..\..\files\rockyou.txt)
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:     1342 H/s (32.51ms) @ Accel:64 Loops:32 Thr:256 Vec:1
Speed.Dev.#2.....:     1894 H/s (33.53ms) @ Accel:64 Loops:32 Thr:256 Vec:1
Speed.Dev.#*.....:     3237 H/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 0/14344384 (0.00%)
Rejected.........: 0/0 (0.00%)
Restore.Point....: 0/14344384 (0.00%)
Candidates.#1....: 123456 -> 457076
Candidates.#2....: 457068 -> gogetto
HWMon.Dev.#1.....: Util:  0% Core: 425MHz Mem:1250MHz Bus:8
HWMon.Dev.#2.....: Util:  0% Core: 854MHz Mem: 500MHz Bus:8
#2
I'm really hoping not to lose momentum on my project, so I'm hoping someone can assist me here.
#3
Have you tried manually setting the status update frequency (with --status and --status-timer)?
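
For example, something along these lines (adapted from the command in the first post) would print the status screen automatically every 15 seconds instead of waiting for a keypress:

Code:
HashCat64.exe --status --status-timer=15 -a 0 -m 11600 ..\..\hashlists\13 ..\..\files\rockyou.txt -o ..\..\hashlists\13.out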

[Edit: apparently this issue occurs regardless of that; faster reporting could be a potential feature, and there's likely a GitHub issue in the works for it]
~
#4
(04-26-2018, 06:59 AM)royce Wrote: Have you tried manually setting the status update frequency (with --status and --status-timer)?

[Edit: apparently this issue occurs regardless of that; faster reporting could be a potential feature, and there's likely a GitHub issue in the works for it]

Are you suggesting that this is currently by design, and that faster reporting is in the works?

I'd imagine the many other users running Hashtopolis aren't experiencing this lag.
#5
I wouldn't say by design - but I have no idea if this longer delay is "fixable" or a natural consequence of the approach. My understanding is that there's some development discussion going on about it, but I don't know any of the details.
~
#6
This is a problem related to (a) how GPU parallelization works in general, in combination with (b) an algorithm with a very high iteration count.

Modern hashing algorithms are typically designed so that they are not parallelizable and the calculation has to be done serially. You cannot start computing iteration 2 if you have not computed iteration 1 first; each iteration depends on the previous one. For slow algorithms like 7-Zip this means that, if we want to make use of the parallelization power of a GPU, we have to place a single password candidate on a single shader (of which a GPU has many) and compute the entire hash on it. This can take a very long time, depending on the iteration count of the algorithm. We're talking about times of up to a minute here for a single hash computation. What we gain in return is the ability to run a few hundred thousand candidates at the same time, which is how we make use of the parallelization power of the GPU. That's why it takes so long for hashcat to report any progress: it actually takes that long to compute a single hash with a high iteration count.
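
To illustrate the serial dependency, here's a minimal Python sketch of such an iterated hash chain (an illustration only, not hashcat's actual kernel code; the iteration count is made up):

Code:
import hashlib

def slow_kdf(password: bytes, iterations: int = 100_000) -> bytes:
    # Each round hashes the output of the previous round, so round N
    # cannot start before round N-1 has finished -- the chain is serial.
    state = password
    for _ in range(iterations):
        state = hashlib.sha256(state).digest()
    return state

# One candidate = one full chain; a GPU shader has to walk the whole
# chain before that candidate counts as "processed".
print(slow_kdf(b"123456").hex())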

In the past, hashcat did not report any speed for such extreme cases, resulting in a displayed hashing speed of 0 H/s. Some users may remember such cases and wondering "why isn't it doing anything?". From a technical perspective nothing has changed: in the past, and also today, the GPU simply needs that much time. The only difference is that newer hashcat versions create an extrapolation based on the current progress of the iterations. For example, it knows that a shader processed 10000 of 10000000 iterations in X time, so it can tell how long it will (eventually) take to process the full iteration count, and it converts this into a more or less valid hashing speed, which in that case is not the real, measured one. This is what you are shown in the speed counter.
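
Roughly, that extrapolation boils down to something like this (a simplified sketch with made-up numbers, not hashcat's actual code):

Code:
# Extrapolating an effective hash rate before any candidate has finished.
# All values below are illustrative, not taken from hashcat internals.
candidates_in_flight = 300_000      # candidates currently loaded on the GPU
total_iterations     = 10_000_000   # iterations needed per hash
iterations_done      = 10_000       # iterations completed per candidate so far
elapsed_seconds      = 12.0         # wall-clock time spent on this batch

fraction_done = iterations_done / total_iterations
# "Virtual" hashes finished: each candidate counts proportionally to its progress.
effective_hashes = candidates_in_flight * fraction_done
extrapolated_speed = effective_hashes / elapsed_seconds

print(f"Extrapolated speed: {extrapolated_speed:.0f} H/s")   # -> 25 H/s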

But that's the theoretical part. When it comes to GPGPU, it's actually not enough to feed the GPU as many password candidates as it has shaders. There's a lot of context switching going on inside the GPU which we have no control over. To keep all the shaders busy for the entire time, we have to feed it a number of password candidates that is many times the number of shaders.

Here's an illustration. Hashtopolis uses the --progress-only parameter (and every other 3rd-party overlay should, too) in order to find the ideal chunk segmentation size for the actual attack. So, for example, if you want to run this command (simplified):

Code:
./hashcat -m 11600 hash.txt rockyou.txt -D1

The overlay first has to run this (short-lived) command:

Code:
./hashcat -m 11600 hash.txt rockyou.txt -D1 --progress-only

And what you get from it is the following:

Code:
Progress.Dev.#2..: 2048
Runtime.Dev.#2...: 4227.98ms

This means that my testing device (an old CPU) has an (ideal) processing chunk of 2048 password candidates, and it will take about 4.2 seconds to process it. This way Hashtopolis can compute that, in order to have hashcat run for 10 minutes, it should take 600 / 4.2 ≈ 142.86, use that as a multiplier for 2048, and the result (292571) is the perfect -l value.
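
The arithmetic Hashtopolis does with those two values looks roughly like this (a sketch of the calculation described above, not Hashtopolis' actual code):

Code:
# Values taken from the --progress-only output above.
ideal_batch = 2048        # Progress.Dev.#2: candidates per full device batch
batch_runtime_s = 4.2     # Runtime.Dev.#2: seconds to process one batch
target_chunk_s = 600      # desired chunk length: 10 minutes

# Scale the batch so that one chunk keeps the device busy for ~10 minutes.
limit = int(target_chunk_s / batch_runtime_s * ideal_batch)
print(limit)              # -> 292571, used as hashcat's -l (--limit) value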

But does that mean that if you run the command, it will show you a progress of 2048 after 4.2 seconds? No! And that's where the real-life circumstances kick in. As said above, in order to keep the GPU busy even while it's doing context switches, we have to overload it with a multiple of the number of shaders. This multiplier (M) typically ranges from 10 to 1000. From a programming perspective this means the GPU returns M times later!
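
To get a feeling for what that multiplier means for the delay between progress updates, here's a rough back-of-the-envelope calculation (all numbers are assumptions, not measurements from the setup above):

Code:
# Rough estimate of how long one fully overloaded batch takes to return.
shaders      = 2048    # shading units on a hypothetical GPU
multiplier_m = 100     # overload factor M (typically 10 .. 1000)
hash_rate    = 3000    # H/s, similar to the speed reported in post #1

candidates_per_batch = shaders * multiplier_m           # 204800 candidates
seconds_per_batch = candidates_per_batch / hash_rate    # ~68 seconds

# The host only marks candidates as processed (and bumps the progress
# counter) once the whole batch returns, so with a larger M or a slower
# hash the first progress update can easily take many minutes.
print(f"{seconds_per_batch / 60:.1f} minutes between progress updates")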

The hashcat host process has to wait for the GPU to return in order to mark all password candidates as processed. Only then does it update the progress counter. If you disable this multiplier, you should see progress updates very quickly. You can control this using the -n parameter; however, you will notice that the speed drops. But what we want is speed, and this is how things are.

So, running the above command and then pressing "s" to update the status after 10 seconds, there's no progress:

Code:
Speed.Dev.#1.....:     9702 H/s (24.58ms) @ Accel:64 Loops:16 Thr:768 Vec:1
Progress.........: 0/14344384 (0.00%)

But if I disable the multiplier by adding -n 1 --force you can see the following after 10 seconds:

Code:
Speed.Dev.#1.....:     7382 H/s (3.10ms) @ Accel:1 Loops:128 Thr:768 Vec:1
Progress.........: 72960/14344384 (0.51%)

Note that the auto-tune engine automatically increased the loop count (-u): since the GPU was not so busy executing many work items in parallel, it had the headroom to do so. Anyway, the important part here is that you can see the progress increased, but at the same time the speed dropped.

Controlling -n manually to find the perfect tradeoff in terms of speed is a painful process. This is why hashcat has the -w parameter, which tries to simplify this for you. That means that if you use -w 1 instead of fiddling with -n, you end up with a nice tradeoff:

Code:
Speed.Dev.#1.....:     8913 H/s (3.29ms) @ Accel:16 Loops:8 Thr:768 Vec:1
Progress.........: 61440/14344384 (0.43%)

A clever way for a user to help Hashtopolis help hashcat deal with such extreme circumstances is to use -w 1 in the task command line.
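
For example, building on the simplified command from above (the exact task command line will of course depend on your setup):

Code:
./hashcat -m 11600 hash.txt rockyou.txt -D1 -w 1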