hashcat has to read through wordlist when using restore or --skip
#1
Maybe this could be a speed improvement for hashcat when using restore or --skip:

I've been doing a 3-4 day run with a 157 GB wordlist. The initial restore/--skip step takes 15+ mins because hashcat has to READ through the huge wordlist (I'm currently at 82%, so A LOT of reading). I run every day for ~10 hours, as I have solar panels so it doesn't cost me anything to run, but that does mean I have to restore a previous session each time.

This seems pointless when the .restore file could also store the byte position in the wordlist file, which would basically give an instant start with a simple change.

It could even store the byte position every 5% in the .dictstat2 file so that, when using --skip, it can work out the nearest lower 5% mark and skip straight to it, then read up to the --skip value. Easy and FASTER!
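To make the idea concrete, here is a rough sketch in Python (not hashcat's C code, and not how dictstat2 works today as far as I know). The function name `build_checkpoints` is made up for illustration, and it assumes one candidate per line with no rules and no rejected words:

```python
def build_checkpoints(path, step_percent=5):
    """Return (total_words, [(word_index, byte_offset), ...]).

    Hypothetical helper: records where every step_percent mark of the
    wordlist starts, so a later run could seek() there directly.
    """
    # First pass: count the words (hashcat already has to do a pass like
    # this when it builds its dictionary statistics).
    with open(path, "rb") as f:
        total = sum(1 for _ in f)

    # Word indexes at 0%, 5%, 10%, ... (deduplicated for tiny files).
    targets = sorted({total * pct // 100 for pct in range(0, 100, step_percent)})

    # Second pass: remember the byte offset at which each target word starts.
    checkpoints = []
    next_target = 0
    offset = 0
    with open(path, "rb") as f:
        for index, line in enumerate(f):
            if next_target < len(targets) and index == targets[next_target]:
                checkpoints.append((index, offset))
                next_target += 1
            offset += len(line)  # binary mode, so this is an exact byte count
    return total, checkpoints
```

With a 5% step that is only 20 pairs of numbers per wordlist, so the extra storage would be tiny. A real implementation would presumably collect them during the single pass it already does.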

Seem feasible?

FYI, if you want to download the 157 GB wordlist, find it below in 7zip chunks:

http://share.blandyuk.co.uk/wordlists/huge/
#2
I think this is related: https://github.com/hashcat/hashcat/issues/1644

it's probably much more complicated than just storing 1 additional value... I think the problem has mainly to do with statistics and making sure the file didn't change etc.

what I mean by statistics is that in the status we have a lot of values... e.g. rejected count, progress, restore points etc... if we just seek to a position all those values could be computed totally wrong.

I think we discussed a while back whether we should add yet another file to improve seeking... because the current architecture is that dictstat has nothing to do with restore and the .restore file has kind of nothing to do with stats about the dict etc... therefore storing these restore/skip/seek values in one of them is actually quite misleading and wrong.

That said, I think there is still room for improvement, we just need to come up with a clever strategy that doesn't impact (performance-wise) the user who doesn't intend to restore/seek/--skip and that is also correct with all the stats/values etc (for instance, we wouldn't want to risk somehow getting progress > 100%, just because we didn't want to wait a few extra minutes to get the values correct etc).

Any volunteer who has enough insight and knowledge about the general architecture of hashcat and has a good strategy is of course very welcome... and of course also Pull Requests that try to improve speed or fix the "problem" in a very clever and most importantly correct way. Thx
#3
Thanks for the reply, philsmd. Yes, I can imagine there will be some issues to overcome, but this would help a lot for runs like this. Hopefully someone can come up with a solution, as it would be greatly appreciated :)

Hashcat does see any changes in file size / dates and rebuilds dictstat2. If a user went so far as to try and "trick" dictstat2, then that's their prerogative. Hashcat also writes .restore files anyway, so there should hopefully be no impact whatsoever.
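Just to illustrate the kind of check I mean, here is a small Python sketch. The names `file_fingerprint` and `checkpoints_still_valid` are made up, and this is only an assumption about how stored offsets could be guarded, not hashcat's actual dictstat2 logic:

```python
import os

def file_fingerprint(path):
    """Size and modification time of the wordlist, saved next to any stored offsets."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def checkpoints_still_valid(path, saved_fingerprint):
    """Trust stored byte offsets only if the wordlist looks unchanged.

    If this returns False, the offsets should be thrown away and the
    wordlist read from the start as it is today.
    """
    return file_fingerprint(path) == tuple(saved_fingerprint)
```

If the fingerprint doesn't match, nothing is lost: you just fall back to the slow path that already exists.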

Anyway, hopefully a solution will come.
#4
I was thinking about this now a little bit and I found another problem...

if the user uses --skip (or short -s) it doesn't mean that the byte offset is available in any file (neither .restore, .dictstat2 nor a new file we come up with)... therefore we would need a solution that would also work if the user wants to run --skip without previously running the dict up to a certain point
(at least this is what I think s3inlc is trying to run with hashtopolis etc and this feature would help a lot there as far as I understood)...

Therefore I'm not sure if an offset would help at all, if we can't assume that this offset is always available (only for --restore maybe, but not with --skip) !?
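One way to picture the requirement is a lookup that uses a stored offset when one exists and otherwise falls back to the linear read we have today. The following is only a Python sketch under that assumption; `position_at_skip` and the `(word_index, byte_offset)` checkpoint pairs are hypothetical and not part of hashcat:

```python
import io

def position_at_skip(f, skip, checkpoints=()):
    """Move binary file f to the start of word number `skip` (0-based).

    checkpoints is an optional iterable of (word_index, byte_offset) pairs.
    Returns (byte_position, lines_read_to_get_there).
    """
    # Best checkpoint at or below skip; start of file if none is known.
    index, offset = max((cp for cp in checkpoints if cp[0] <= skip), default=(0, 0))
    f.seek(offset)
    lines_read = 0
    while index < skip and f.readline():
        index += 1
        lines_read += 1
    return f.tell(), lines_read

# Tiny in-memory "wordlist": 100 words of 5 bytes each.
wordlist = io.BytesIO(b"".join(b"w%03d\n" % i for i in range(100)))
print(position_at_skip(wordlist, 42))                       # (210, 42)
print(position_at_skip(wordlist, 42, [(0, 0), (40, 200)]))  # (210, 2)
```

Both calls end at the same byte, so the result is identical with or without stored offsets; only the amount of reading differs. This sketch says nothing about keeping the rejected count and the other status values correct, which is the harder part.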
#5
You could store the file position OR closest offset value every 5% in the dictstat2 file for each wordlist, as mentioned in the first post. Maybe only do this for wordlists over 1 GB? Hashcat actually displays the % progress from the --skip value anyway, so all you do is round down to the nearest 5% and move to that byte value if possible:

Live Example: Progress.......: 12814444170/14481290158 (88.49%)

88.49% rounds down to the 85% mark, so we only have to read the remaining 3.49% and we are off cracking again.

14481290158 * 0.85 = 12309096634.3, and floor(12309096634.3) = 12309096634 is the offset (we would have recorded the file position at 85% when building dictstat2). I don't fully understand how dictstat2 stores the wordlist data, so this might not be possible.
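The same arithmetic as a small Python sketch, using integer math to avoid rounding surprises. `nearest_checkpoint` is a made-up name, and the "offset" here is a position in hashcat's progress count, not a byte position; the byte position would be whatever was stored for that mark:

```python
def nearest_checkpoint(progress, total, step_percent=5):
    """Round progress down to the nearest step_percent mark.

    Returns (checkpoint_percent, checkpoint_index, remaining_to_read).
    """
    percent_done = progress * 100 // total                 # whole percent, rounded down
    checkpoint_percent = percent_done - percent_done % step_percent
    checkpoint_index = total * checkpoint_percent // 100   # floor(total * pct / 100)
    return checkpoint_percent, checkpoint_index, progress - checkpoint_index

# Values taken from the status line above.
print(nearest_checkpoint(12814444170, 14481290158))
# (85, 12309096634, 505347536)
```

So instead of reading 12814444170 candidates from the start, only 505347536 (about 3.49% of the total) would need to be read after the seek.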