using hashcat to generate file-checksums - Printable Version

+- hashcat Forum (https://hashcat.net/forum)
+-- Forum: Misc (https://hashcat.net/forum/forum-15.html)
+--- Forum: General Talk (https://hashcat.net/forum/forum-33.html)
+--- Thread: using hashcat to generate file-checksums (/thread-7221.html)
using hashcat to generate file-checksums - gmk - 01-21-2018

It's not the intended purpose of hashcat, but can it (or any of its tools) be used to generate checksums of files, leveraging the GPU? Preferably SHA-256 or higher. My guess is no, but I'd love to be wrong on this one. I didn't find the answer on the forums, wiki or FAQ, nor when trying the software itself. I am looking to accelerate this process, and disk I/O doesn't seem to be the bottleneck. Thanks for any input.

RE: using hashcat to generate file-checksums - undeath - 01-21-2018

File checksumming on the GPU wouldn't make much sense because PCIe throughput would be the bottleneck. Besides, GPUs are fast because they have so many cores and can run many computations in parallel. That's not what you need for file hashing.

RE: using hashcat to generate file-checksums - gmk - 01-21-2018

Thanks for the heads-up! But isn't PCIe bandwidth usually more abundant than disk bandwidth? 8 PCIe 2.0 lanes or 4 PCIe 3.0 lanes should give around 4 gigabytes per second, in each direction. Freeing the CPU for other tasks would also be a huge bonus. I haven't checked throughput on an iGPU yet, not sure how much that would be.

edit: I am rather new to the topic, but yes, a multithreaded solution would be great. sha256sum, for instance, seems to run single-threaded. I could work around that by just processing multiple files in parallel, for now. I'm just not sure whether using a GPU could make sense in the first place, so I started by poking around here.

RE: using hashcat to generate file-checksums - undeath - 01-21-2018

Sorry, I misread your statement about the bottleneck. Most checksum functions are not designed to allow parallel processing of a single hashing operation, and the SHA-1/SHA-2 family is among them (due to the Merkle-Damgård construction). With the much slower GPU cores I would expect hashing of even multiple files (limited by either I/O or PCIe bandwidth) to be slower.

RE: using hashcat to generate file-checksums - gmk - 01-21-2018

Thanks undeath! Especially for pointing me towards the Merkle-Damgård construction; now it's much clearer why SHA-2 (and below) isn't made for GPU/parallel execution. I was hoping it would be a matter of splitting up files, calculating the checksums of the parts and combining them afterwards, while remaining deterministic, with the splitting being the part that scales better the bigger the files get. Also, my plan of just using a black-box solution without having to understand ANY of the innards failed. Oh well.

b2sum (BLAKE2) seems to be one of the few tools that support multithreading. MD6, with its Merkle tree, seems well suited for parallel calculation. Someone published work on that, maybe I can get hold of it: https://www.researchgate.net/publication/292504353_The_fast_implementation_of_MD6_on_GPU

I am aware this is off-topic with regard to hashcat, but if anyone stumbles upon this thread looking for something similar, they might not mind finding the additional information.
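To make the split-and-combine idea a bit more concrete, here is a minimal Python sketch of such a tree-style hash. Everything in it (the chunk size, the two-level layout, the use of SHA-256 at both levels) is an arbitrary choice for illustration: each chunk is hashed independently, which is the part that could run in parallel, and the chunk digests are then hashed once more into a single deterministic result. The output is NOT compatible with plain sha256sum; it only demonstrates the kind of construction that BLAKE2's and MD6's tree modes define properly.

Code:
import hashlib
import sys
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB leaves; arbitrary choice for this sketch

def read_chunks(path, chunk_size=CHUNK_SIZE):
    # Yield the file as fixed-size chunks (the last one may be shorter).
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def hash_chunk(chunk):
    # Leaves are independent of each other, so these calls can run in parallel.
    return hashlib.sha256(chunk).digest()

def tree_hash(path):
    # Two-level tree: hash every chunk, then hash the concatenated chunk
    # digests. Deterministic, but NOT the same value that sha256sum prints.
    # For simplicity all chunks are submitted at once, so memory use grows
    # with file size; a real implementation would bound the in-flight chunks.
    with ProcessPoolExecutor() as pool:
        leaf_digests = list(pool.map(hash_chunk, read_chunks(path)))
    root = hashlib.sha256()
    for digest in leaf_digests:
        root.update(digest)
    return root.hexdigest()

if __name__ == "__main__":
    print(tree_hash(sys.argv[1]), sys.argv[1])

Whether something like this is actually faster than one plain sha256sum pass depends on whether the CPU or the disk is the limiting factor, which is what the timings in the edit below try to pin down.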
e: undid my edit, something seems off

e3: redoing my first edit. Turns out the test file was on a slow hard disk, but the results on a PCIe SSD were identical. Turns out the file is sparse: only ~3 MB worth of data in a 15 GB file, so essentially no I/O overhead at all.

Comparing sha1sum with sha256sum or sha512sum, I'd expect different results if I/O were the bottleneck. Same file in all tests, file size 15G:

time md5sum      real 0m25.519s   user 0m21.643s   sys 0m3.353s
time sha1sum     real 0m18.815s   user 0m15.497s   sys 0m3.300s
time sha256sum   real 0m37.194s   user 0m33.767s   sys 0m3.420s
time sha512sum   real 0m26.384s   user 0m23.020s   sys 0m3.180s

Results remain in the same ballpark over multiple test runs. sha512sum outperforming sha256sum is a 64-bit thing (SHA-512 works on 64-bit words, so it processes more data per round on a 64-bit CPU).

A test on a "real" 7 GB file on a SATA 7200 rpm drive resulted in all of them taking 53+ seconds, no matter which tool was used; there, I/O definitely was the bottleneck. On a PCIe SSD the same file took between 10 and 120 seconds, depending on which tool was used (indicating I/O was not the bottleneck).
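For the "process multiple files in parallel" workaround mentioned earlier, a minimal sketch (Python again, purely as an illustration) could simply run a normal streaming SHA-256 per file inside a process pool, so each digest stays identical to what sha256sum prints while several files are hashed at once:

Code:
import hashlib
import sys
from concurrent.futures import ProcessPoolExecutor

def sha256_file(path, block_size=1 << 20):
    # Stream one file through SHA-256; same digest as `sha256sum <path>`.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return path, h.hexdigest()

if __name__ == "__main__":
    # One worker per CPU core by default; each file is still hashed
    # sequentially, only the files themselves are processed in parallel.
    with ProcessPoolExecutor() as pool:
        for path, digest in pool.map(sha256_file, sys.argv[1:]):
            print(f"{digest}  {path}")

On a fast SSD this tends to be CPU-bound and scales roughly with the number of cores; on a single spinning disk the concurrent reads can make things slower instead, which matches the SATA-vs-SSD observation above.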