Saving calculated hashes
#1
Hi!
Is there any option to save the calculated hashes from a dictionary to another file?
Can I just hash every line from the dict and save it?
I'm building a small database in plain:md5:sha1 format, and hashing it with PHP would take a few years ;)
#2
You can use this shell one-liner:

Code:
# print the MD5 of each line in [yourfile], one hash per line
while IFS= read -r line; do echo -n "$line" | md5sum | cut -c -32; done < [yourfile]
#3
Yeah, thanks for the reply.
I can do it with a shell script or PHP, but I want to accelerate that process with the GPU.
#4
You'd have to code that yourself.
#5
Well, as undeath wrote, you can code it yourself. Or try to contact atom and tell him about your idea.

For now, the shell script should be faster than PHP ;)
#6
You need to write a program that hashes using SSE2/SSSE3/AVX to make it fast, and you should write truncated binary hashes to the file. Since I guess you want it to be useful for lookups, you should make a lossy hash table: index on around log2(number of passwords) bits of the hash and save a binary value that represents a range of passwords. Unless you are just using an unmodified dictionary, in which case you'll need to save the full password. Well, I guess you could split the dictionary into blocks and compress each of them; then the "binary value that represents a range of passwords" would be which compressed dictionary block to look at.
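Not from this thread or from any real database, just a rough Python sketch of that bucket-indexing idea: index on a couple of dozen high bits of the MD5 and store only a short check value plus the id of the compressed dictionary block the password came from.

Code:
import hashlib
from collections import defaultdict

# Illustrative parameters (my own choice, not any real layout):
# index on the top 24 bits of the MD5, keep 16 more bits as a truncated check value.
INDEX_BITS = 24
CHECK_BITS = 16

def split_hash(digest64):
    """Split the first 64 bits of an MD5 into (bucket index, truncated check value)."""
    bucket = digest64 >> (64 - INDEX_BITS)
    check = (digest64 >> (64 - INDEX_BITS - CHECK_BITS)) & ((1 << CHECK_BITS) - 1)
    return bucket, check

# Each bucket holds (check value, id of the compressed dictionary block the word came from).
table = defaultdict(list)

def add(password, block_id):
    digest64 = int.from_bytes(hashlib.md5(password).digest()[:8], "big")
    bucket, check = split_hash(digest64)
    table[bucket].append((check, block_id))

def candidate_blocks(target_md5_hex):
    """Lossy lookup: which dictionary blocks might contain the preimage of this hash."""
    digest64 = int(target_md5_hex[:16], 16)
    bucket, check = split_hash(digest64)
    return [block for c, block in table[bucket] if c == check]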

Anyway it's not like I've done similar things... http://www.tobtu.com/md5.php. It took about 3 days to generate a database with 50 billion passwords and this was kinda slow. So far this is the best I've come up with.

You can look at this: http://en.wikipedia.org/wiki/Wikipedia_t...ano_coding. I really need to go find those papers and read them so I can add sources and get the article accepted.

I need:
"On binary representations of monotone sequences" by Peter Elias in 1972

I have this one (but haven't fully read it yet; the intro was long and boring, though I did work out that, for lossy hash tables, Huffman encoding of the number of passwords per bin, with the number of bins on the order of the number of passwords, is more efficient, but that leads to an unknown size instead of a size that is known before you start, as with Elias-Fano):
"Efficient Storage and Retrieval by Content and Address of Static Files" by Peter Elias in 1974

I found this one (but it doesn't describe anything about Elias-Fano coding, so you "don't" need to read it):
Robert M. Fano. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., n.d., 1971
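For what it's worth, here is a toy Python sketch of Elias-Fano coding of a monotone sequence (my own illustration, not taken from the papers above; the upper-part bitvector uses one byte per bit for readability, where a real implementation packs bits). The useful property is the one mentioned above: the encoded size, about n*(2 + log2(u/n)) bits, is known before you start.

Code:
import math

def ef_encode(values, universe):
    """Elias-Fano encode a sorted list of non-negative ints, all < universe."""
    n = len(values)
    l = max(0, int(math.log2(universe / n))) if n else 0   # low bits kept verbatim
    lows = [v & ((1 << l) - 1) for v in values]
    highs = bytearray((universe >> l) + n + 1)              # unary-coded upper parts
    for i, v in enumerate(values):
        highs[(v >> l) + i] = 1                             # one set bit per element
    return l, lows, highs

def ef_decode(l, lows, highs):
    """Recover the original sorted values."""
    values, i = [], 0
    for pos, bit in enumerate(highs):
        if bit:
            values.append(((pos - i) << l) | lows[i])
            i += 1
    return values

# Example: ef_decode(*ef_encode([5, 8, 8, 15, 32], 36)) == [5, 8, 8, 15, 32]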
#7
Thanks, that is really useful.
I have some work to do...
#8
A couple of other minor points, and a question: Why are you storing this in a database?

The minor points: when dealing with large amounts of data, it's important to remember that disk I/O isn't free :-)

On my system, the stock MD5() function in libcrypto runs at 5 million hashes per second on a single thread. SHA1 is a bit slower, at 2.5 million hashes per second. Both figures assume a typical 8-character password (longer passwords take longer, of course).

That same system can read a file at about 11 million words per second (again, assuming 8-character passwords plus a linefeed). It can write a little more slowly.
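If you want a rough idea of where your own machine sits, a quick single-thread check with Python's hashlib (just a sketch; the numbers won't match libcrypto's MD5() called from C) looks like this:

Code:
import hashlib
import time

# Rough single-thread throughput for MD5/SHA1 on a typical 8-byte input.
def measure(name, count=2_000_000):
    word = b"password"
    start = time.perf_counter()
    for _ in range(count):
        hashlib.new(name, word).digest()
    elapsed = time.perf_counter() - start
    print(f"{name}: {count / elapsed / 1e6:.2f} Mhash/s")

measure("md5")
measure("sha1")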

So, if you just want to create a list of

word:MD5:SHA1

you will need to read ~9 bytes and write ~9+33+41 bytes for each line.

That means you will be able to write about 1 million lines per second, before you run out of disk bandwidth.
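To make that arithmetic concrete (9 + 33 + 41 = 83 bytes written per line, so about 83 MB/s of output at a million lines per second), a plain single-threaded version of the whole job could look like this sketch in Python, with the output matching the word:MD5:SHA1 layout above:

Code:
import hashlib
import sys

# Read words from a file and write "word:MD5:SHA1" lines to stdout.
# For a typical 8-character word each output line is 9 + 33 + 41 = 83 bytes,
# so disk bandwidth, not hashing speed, is the likely bottleneck.
def main(path):
    with open(path, "rb") as infile:
        for raw in infile:
            word = raw.rstrip(b"\r\n")
            md5 = hashlib.md5(word).hexdigest()
            sha1 = hashlib.sha1(word).hexdigest()
            sys.stdout.write(f"{word.decode('utf-8', 'replace')}:{md5}:{sha1}\n")

if __name__ == "__main__":
    main(sys.argv[1])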

In other words, the stock MD5 and SHA1 functions are plenty fast enough to run your disk to saturation, if you have standard hard drives. If you have an SSD, you might need to use one or two threads to bring your speed up.

Using a GPU won't help (at all) in this application.
#9
Thanks.
I will hash it more slowly than I thought, but that isn't a problem.
I can hash 170 million words to MD5 and save them to CSV in just an hour :)