Proper method to extract the hash from a PDF file?
#1
Quote:METHOD COMPLETED at post #6.

To test the PDF attack feature of OCLHashCat, I am using these sample hash files:

+ Sample non-hashes (to be) supported by JtR
http://openwall.info/wiki/john/sample-non-hashes
For example, this one:
http://openwall.info/wiki/_media/john/pdf_samples.tar
And this file from inside that archive:
test-3-RC4-40-open-testpassword.pdf

So I extract the hash (from a Linux shell) using pdf2john from the John the Ripper suite:

Code:
$ ./pdf2john test-3-RC4-40-open-testpassword.pdf
test-3-RC4-40-open-testpassword.pdf:$pdf$Standard*9a1156c38ab8177598d1608df7d7e340ae639679bd66bc4cda9bc9a4eedeb170*1f300cd939dd5cf0920c787f12d16be22205e55a5bec5c9c6d563ab4fd0770d7*16*c015cff8dbf99345ac91c84a45667784*1*1*0*1*6*40*-4*2*1
$ ./pdf2john test-3-RC4-40-open-testpassword.pdf > test3.txt
$ cat test3.txt
test-3-RC4-40-open-testpassword.pdf:$pdf$Standard*9a1156c38ab8177598d1608df7d7e340ae639679bd66bc4cda9bc9a4eedeb170*1f300cd939dd5cf0920c787f12d16be22205e55a5bec5c9c6d563ab4fd0770d7*16*c015cff8dbf99345ac91c84a45667784*1*1*0*1*6*40*-4*2*1

And now I try to crack it (from PowerShell on Windows 7 SP1):

Code:
PS C:\Users\Luis> oclhashcat64 "test3.txt" -m 10400 -a 3 anypassword
oclHashcat v1.33 starting...

WARNING: Hashfile 'test3.txt' in line 1 (test-3-RC4-40-open-testpassword.pdf:$pdf$Standard*9a1156c38ab8177598d1608df7d7e340ae639679bd66bc4cda9bc9a4eedeb170*1f300cd939dd5cf0920c787f12d16be22205e55a5bec5c9c6d563ab4fd0770d7*16*c015cff8dbf99345ac91c84a45667784*1*1*0*1*6*40*-4*2*1): Line-length exception
Parsed Hashes: 1/1 (100.00%)

ERROR: No hashes loaded

Every hash type I try (10400, 10410, 10420, 10500, 10600, 10700) fails.
If I edit the "test3.txt" file and remove the "test-3-RC4-40-open-testpassword.pdf:" part, the attacks still fail.

I think the problem comes, according to this thread, from the PDF hash file format, which should rather look something like this (supposed example for RC4-40 encryption):

Code:
$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784*32*1f300cd939dd5cf0920c787f12d16be22205e55a5bec5c9c6d563ab4fd0770d7*32*9a1156c38ab8177598d1608df7d7e340ae639679bd66bc4cda9bc9a4eedeb170:$HEX[db34433720]

If this is the kind of hash file that must be generated, what is the method to achieve it?

Thank you.
#2
I think there are 2 problems here:
- your pdf2john must be out of date because mine gives the right format;
- you should remove the first part of the hash (before the ':') and leave ONLY the hash to use it in hashcat.
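
If it helps, here is a rough Python sketch that checks a line for both problems. The function name "diagnose_pdf_hash_line" is made up for illustration, and the 11-field minimum is only an assumption inferred from the layout of a correctly formatted RC4-40 hash (V, R, key bits, P, metadata flag, ID length, ID, U length, U, O length, O). It is not taken from hashcat's own parser.

```python
PREFIX = "$pdf$"
MIN_FIELDS = 11  # assumption: V*R*bits*P*meta*idlen*id*ulen*u*olen*o


def diagnose_pdf_hash_line(line):
    """Return a list of likely reasons why a hash line gets rejected.

    Hypothetical helper: an empty list means none of the known problems
    was spotted, not that hashcat is guaranteed to accept the line.
    """
    text = line.strip()
    start = text.find(PREFIX)
    if start == -1:
        return ["no $pdf$ signature found"]

    problems = []
    if start > 0:
        problems.append("text before $pdf$ (filename prefix)")

    body = text[start + len(PREFIX):]
    if body.startswith("Standard*"):
        # Layout printed by older pdf2john builds.
        problems.append("old pdf2john layout ($pdf$Standard*...)")
        return problems

    if ":" in body:
        problems.append("text after the hash (':' separated fields)")
        body = body.split(":", 1)[0]

    # Count the non-empty '*'-separated fields of the hash itself.
    fields = [field for field in body.split("*") if field]
    if len(fields) < MIN_FIELDS:
        problems.append("fewer than 11 fields (truncated hash)")
    return problems
```

Call it on each line of the hash file. A line that starts with a file name and contains "$pdf$Standard*" should report both the prefix and the old layout.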
#3
Well, in fact, the "pdf2john" file included in my distro (The-Distribution-Which-Does-Not-Handle-OpenCL-Well (Kali) Linux v1.0.9) seems to be different, so I downloaded the latest John the Ripper Jumbo version from here:
http://www.openwall.com/john/

and now the result (same file) is:

Code:
$ ../JTR/john-1.8.0-jumbo-1/run/pdf2john.py test-3-RC4-40-open-testpassword.pdf
test-3-RC4-40-open-testpassword.pdf:$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784*:::::test-3-RC4-40-open-testpassword.pdf

But I keep getting the same Line-length exception, even after trimming the hash file down to any of these:

Code:
$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784*:::::test-3-RC4-40-open-testpassword.pdf

$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784*:::::

$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784*

$pdf$1*2*40*-4*1*16*c015cff8dbf99345ac91c84a45667784

Maybe you could tell me where you got your pdf2john.py version?
Mine is 13,376 bytes in size.

Thank you.
#4
Download pdf2john.py from GitHub -> https://github.com/magnumripper/JohnTheRipper
and everything will be OK ;)
#5
I just made a video: http://youtu.be/4eK8MZWEYyU
#6
Here is the complete procedure (thanks to everyone who helped):

- Download "pdf2john.py" from the suite "John the Ripper":

Code:
wget https://github.com/magnumripper/JohnTheRipper/archive/bleeding-jumbo.zip
unzip bleeding-jumbo.zip
cp JohnTheRipper-bleeding-jumbo/run/pdf2john.py .

- Run "pdf2john.py" (from wherever you want) on your .pdf file, keeping only the characters between the ":" separators:

Code:
./pdf2john.py MyPDF.pdf | sed 's/::.*$//' | sed 's/^.*://' > MyPDF-Hash.txt

Note the two "sed" commands, which filter the output string. For example, the original output:
Code:
MyPDF.pdf:$pdf$4*4*128*-1028*1*16*652fc762fdb12c47a5f90ddd2b99b809*32*dd86d858f914809078a4a47348d32c0fc4e9c08042a10e6434b48b698de7731f*32*3c1e693526d5bc8da15b99eea6cbc6ed2c2397e23e2c39d1974fdc004c588cff:::::MyPDF.pdf

... would end up in the form that OCLHashCat accepts:

Code:
$pdf$4*4*128*-1028*1*16*652fc762fdb12c47a5f90ddd2b99b809*32*dd86d858f914809078a4a47348d32c0fc4e9c08042a10e6434b48b698de7731f*32*3c1e693526d5bc8da15b99eea6cbc6ed2c2397e23e2c39d1974fdc004c588cff

Or you can do it step by step:

Code:
./pdf2john.py MyPDF.pdf > MyPDF-Hash.txt
nano MyPDF-Hash.txt

On Windows, the manual edit could be done like this (it worked for me; the CR/LF line endings of text files do not seem to matter):
Code:
notepad MyPDF-Hash.txt

... and remove everything outside the hash: the text up to and including the first ":", and everything from the ":::::" onward.

- The file "MyPDF-Hash.txt" is now ready to process with OCLHashCat. Good cracking!


NOTES:
- Works on both Linux shell and CygWin (python required).
- If you do this from Windows without CygWin (for example, with another Python interpreter), remember that the "sed" utility included in UnxUtils (for Windows) does not work with single quotes (') as of February 2015, so use double quotes (") instead:

Code:
./pdf2john.py MyPDF.pdf | sed "s/::.*$//" | sed "s/^.*://" > MyPDF-Hash.txt
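
If no usable "sed" is at hand, the same filtering can be sketched in Python, which is already required for "pdf2john.py". This is only an illustration: the function name "extract_hash" is made up, and it simply mirrors the two "sed" expressions above, so it shares their limits (for example, a file name containing "::" would confuse it).

```python
import re


def extract_hash(pdf2john_line):
    """Keep only the $pdf$... part of one line printed by pdf2john.py."""
    line = pdf2john_line.strip()
    # Same as sed 's/::.*$//': drop everything from the first "::" onward.
    line = re.sub(r"::.*$", "", line)
    # Same as sed 's/^.*://': drop everything up to the last remaining ":".
    line = re.sub(r"^.*:", "", line)
    return line


if __name__ == "__main__":
    sample = (
        "MyPDF.pdf:$pdf$4*4*128*-1028*1*16*652fc762fdb12c47a5f90ddd2b99b809"
        "*32*dd86d858f914809078a4a47348d32c0fc4e9c08042a10e6434b48b698de7731f"
        "*32*3c1e693526d5bc8da15b99eea6cbc6ed2c2397e23e2c39d1974fdc004c588cff"
        ":::::MyPDF.pdf"
    )
    print(extract_hash(sample))
```

Save the printed line as "MyPDF-Hash.txt". A line that is already a bare hash passes through unchanged.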

P.S: Of course, this can be added to the Wiki as a Tutorial or whatever, if Atom considers it OK :-) .
#7
Things were a bit more confusing due to sites like this one, whose details about the PDF hash file format were incorrect (or so it seems).
#8
Good stuff, I ran into the same issues when benchmarking the new release.
#9
Hi, I found this thread because I am looking for an easy way to determine which mode to choose, that is, an easy way to tell which PDF is encrypted by which method. As an example, I have a bunch of encrypted PDFs:
Code:
$pdf$1*2*40*-4*1*16*
$pdf$1*2*40*-64*1*16*
$pdf$2*3*128*-1028*1*16*
$pdf$2*3*128*-3904*1*16*
$pdf$2*3*128*-4*1*16*
$pdf$4*4*128*-1028*1*16*
$pdf$4*4*128*-4*1*16*
$pdf$5*5*256*-1028*1*16*
$pdf$5*6*256*-1028*1*16*
Is this prefix enough to determine the mode, or is the hash length also important? Maybe it would be easier to implement this in hashcat, so that it guesses the mode automatically from the given hash. Thanks in advance for any help.
#10
It should be enough to use the right mode.

You can use this as a reference: https://hashcat.net/wiki/doku.php?id=example_hashes
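
As a rough illustration, the lookup can be sketched in Python from the first two fields of the hash (V and R). The table below is my reading of the example hashes page and should be checked against it; the names "MODE_BY_VERSION" and "guess_mode" are made up for this sketch. The 10410 and 10420 modes are not covered, since they apply to the same 40-bit hashes as 10400.

```python
# Assumption: (V, R) -> hashcat mode, as read from the example hashes page.
MODE_BY_VERSION = {
    ("1", "2"): 10400,  # PDF 1.1 - 1.3 (Acrobat 2 - 4), 40-bit RC4
    ("2", "3"): 10500,  # PDF 1.4 - 1.6 (Acrobat 5 - 8), 128-bit
    ("4", "4"): 10500,  # PDF 1.4 - 1.6 (Acrobat 5 - 8), 128-bit
    ("5", "5"): 10600,  # PDF 1.7 Level 3 (Acrobat 9), 256-bit
    ("5", "6"): 10700,  # PDF 1.7 Level 8 (Acrobat 10 - 11), 256-bit
}


def guess_mode(pdf_hash):
    """Return the likely hashcat mode for a $pdf$ hash, or None if unknown."""
    prefix = "$pdf$"
    if not pdf_hash.startswith(prefix):
        return None
    fields = pdf_hash[len(prefix):].split("*")
    if len(fields) < 2:
        return None
    # Only V and R are used; the P value (e.g. -4, -1028) is ignored here.
    return MODE_BY_VERSION.get((fields[0], fields[1]))


if __name__ == "__main__":
    for sample in ("$pdf$1*2*40*-64*1*16*", "$pdf$4*4*128*-1028*1*16*"):
        print(sample, "->", guess_mode(sample))
```

The result is then passed to hashcat with "-m". A None result means the prefix is not in the table, and the example hashes page should be checked by hand.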