I have been testing the -m 10400 through -m 10700 PDF functionality with a buddy and stumbled on a weird issue. I started out with a test PDF that has the owner password set but no user password. Using Acrobat 11 Pro I generated the PDF with the owner password 'hashcat'. I then extracted the hash using the latest pdf2john from GitHub and ran hashcat with the following parameters:
Code:
cudaHashcat64 -m 10500 -a 3 PDF-password-is-hashcat.hash ?l?l?l?l?l?l?l
The hash target specified in 'PDF-password-is-hashcat.hash' looked like this:
Code:
$pdf$2*3*128*-1028*1*16*XXXXee15d4b3e08fe5b9ecea0e02XXXX*32*XXXX9d72c7c670c42eeb4fca1d2XXXX000000000000000000000000000000000*32*XXXX3e868dc87604626c2b8c259297a14d58c6309c70b00afdfb1fbba10eXXXX
It only took about a half hour to crack on a GTX 670. So I imagine it'll be a lot faster on an ATI 7970.
Anyhow, here is where it gets weird.
The original file was output using the Adobe Acrobat 10.0 Paper Capture Plugin. Its document security properties were identical to those of my test PDF, apart from the password.
Since that worked I quickly put together a batch script to run through common password masks:
http://pastebin.com/2fDeQkQ6
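For readers who cannot open the pastebin, the idea is simply to loop one hashcat invocation over a list of masks. The sketch below is not the script from the link: the mask list is hypothetical (only the first mask appears in this post), and the binary name, mode, and hash file name are copied from the command above.

```python
# Sketch only: builds one mask-attack command line per mask.
# MASKS is a hypothetical list; only "?l?l?l?l?l?l?l" comes from this post.
HASHCAT_BINARY = "cudaHashcat64"
HASH_MODE = "10500"
HASH_FILE = "PDF-password-is-hashcat.hash"

MASKS = [
    "?l?l?l?l?l?l?l",      # seven lowercase letters (used above)
    "?d?d?d?d?d?d",        # six digits (hypothetical)
    "?u?l?l?l?l?l?d?d",    # Capitalised word plus two digits (hypothetical)
]


def build_commands(masks, hash_file=HASH_FILE, mode=HASH_MODE):
    """Return one argument list per mask, in the same shape as the command above."""
    return [
        [HASHCAT_BINARY, "-m", mode, "-a", "3", hash_file, mask]
        for mask in masks
    ]


if __name__ == "__main__":
    for command in build_commands(MASKS):
        # Print instead of executing, so the sketch runs without hashcat installed.
        print(" ".join(command))
```

Each argument list could be handed to `subprocess.run` to actually launch the attacks one after another.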
The first line immediately threw an error:
Code:
WARNING: Hashfile 'PDF-password-is-...hash' in line 1 ($pdf$4*4*128*-1324*1*32*XXXXab525883d8493ece960c6038dcdcc75a428632fd4e45ba43bfe17ec3XXXX*32*XXXX566a10eba70977b1b24f23d0XXXX00000000000000000000000000000000*32*XXXX507dabd18be0aa0d9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX): Line-length exception
Parsed Hashes: 1/1 (100.00%)
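One way to sanity-check a rejected line is to split it into fields and compare each declared length with the amount of hex data that follows it. This is a minimal sketch, assuming the pdf2john layout is `$pdf$V*R*bits*P*encrypt_metadata*id_len*id*u_len*u*o_len*o`, that lengths are declared in bytes, and that the data is hex encoded. The field names are labels chosen for this example, not anything hashcat defines, and the check says nothing about which lengths hashcat itself accepts.

```python
# Assumed field order of a pdf2john-style hash (labels are illustrative).
FIELDS = ["V", "R", "bits", "P", "encrypt_metadata",
          "id_len", "id", "u_len", "u", "o_len", "o"]


def split_pdf_hash(hash_line):
    """Split a '$pdf$...' line into a dict of named string fields."""
    prefix = "$pdf$"
    if not hash_line.startswith(prefix):
        raise ValueError("line does not start with $pdf$")
    parts = hash_line[len(prefix):].split("*")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, parts))


def length_report(fields):
    """Map 'id', 'u', 'o' to (declared bytes, bytes implied by the hex length)."""
    report = {}
    for name in ("id", "u", "o"):
        declared = int(fields[name + "_len"])
        actual = len(fields[name]) / 2  # two hex characters per byte
        report[name] = (declared, actual)
    return report


if __name__ == "__main__":
    # The obfuscated hash from the warning above; 'XXXX' stands in for hex digits.
    line = ("$pdf$4*4*128*-1324*1*32*"
            "XXXXab525883d8493ece960c6038dcdcc75a428632fd4e45ba43bfe17ec3XXXX"
            "*32*XXXX566a10eba70977b1b24f23d0XXXX00000000000000000000000000000000"
            "*32*XXXX507dabd18be0aa0d9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX")
    for name, (declared, actual) in length_report(split_pdf_hash(line)).items():
        print(name, "declared:", declared, "actual:", actual)
```

A mismatch between the two numbers for any field would point at the extraction step rather than at hashcat.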
The hash was extracted using the latest pdf2john.py, the same way I extracted it from the first PDF. That got me curious. I tried several other versions of pdf2john (john-1.8.0-jumbo-1, john179j5, and an older copy I have) and got similar output.
For example, john-1.8.0-jumbo-1 output:
Code:
PDF-password-is-...pdf:$pdf$4*4*128*-1324*1*32*XXXXab525883d8493ece960c6038dcdcc75a428632fd4e45ba43bfe17ec3XXXX*:::::PDF-password-is-...pdf
To make doubly sure the data was correct, I quickly grabbed a PDF-parser tool and wrote a companion script to convert the owner object hash to a hexstring for hashcat.
The two are clearly different:
pdf2john:
XXXX507dabd18be0aa0d9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX
manual extraction:
XXXX507dabd18be0aa5c5c729f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX
To better show the difference:
Code:
XXXX507dabd18be0aa 0d 9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX
XXXX507dabd18be0aa 5c5c72 9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX
Looking at the PDF in a hex editor I see it shows 33 bytes in total (including the starting 0x0 and ending 0xf8):
Code:
                start-v                          v-my script screws up as well?
008c5fc8: 38 2f 4f 28 XX XX 50 7d ab d1 8b e0 aa 5c 72 9f 4c 70 60 7c 0f a1 83 ba :8/O(..P}.....\r.Lp`|....
008c5fe0: 9c f5 e5 03 02 6a 52 43 19 b5 e7 XX XX 29 2f 50 20 2d 31 33 32 34 2f 52 :.....jRC...u.)/P -1324/R
                                              ^-end
I've tried all three versions (minus the spaces and the trailing colon and description):
Code:
XXXX507dabd18be0aa 0d 9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX : PDF2JOHN
XXXX507dabd18be0aa 5c5c72 9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX : Manual
XXXX507dabd18be0aa 5c72 9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX : Hex from file
And all of them give the same error.
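A quick length check separates the three candidates. The hash line declares 32 bytes for this field (the `*32*` in front of it), so the sketch below counts the bytes each hex string represents. The `XXXX` placeholders are kept as they appear in this post and counted as two bytes each.

```python
# The three candidate strings differ only in the bytes between these two parts.
PREFIX = "XXXX507dabd18be0aa"
SUFFIX = "9f4c70607c0fa183ba9cf5e503026a524319b5e7XXXX"

CANDIDATES = {
    "pdf2john": PREFIX + "0d" + SUFFIX,
    "manual": PREFIX + "5c5c72" + SUFFIX,
    "hex from file": PREFIX + "5c72" + SUFFIX,
}


def byte_length(hexstring):
    """Number of bytes a hex string represents (two characters per byte)."""
    return len(hexstring) // 2


if __name__ == "__main__":
    for label, value in CANDIDATES.items():
        print(label, byte_length(value))
    # pdf2john 32
    # manual 34
    # hex from file 33
```

Only the pdf2john string matches the declared 32 bytes. The raw file bytes come to 33, which agrees with the hex editor count.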
0x5c 0x5c 0x72 translates to: \\r
So it looks like it's being processed as a carriage return.
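For context, the PDF specification treats a backslash inside a literal string `( ... )` as the start of an escape sequence, and `\r` stands for a single carriage return byte (0x0d). On that reading, the 5c 72 in the file is the escaped form and the 0d from pdf2john is the decoded form. The 5c5c72 from the manual extraction may be an extra round of escaping added by the parser tool. The sketch below decodes the escapes by hand. The function name is hypothetical and this is not pdf2john's actual code. It also skips the rule that a raw end-of-line inside a string is read as a line feed.

```python
# Single-character escapes allowed inside a PDF literal string.
SIMPLE_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
    ord("b"): 0x08, ord("f"): 0x0C,
    ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}


def unescape_pdf_literal(raw):
    """Decode backslash escapes in the bytes found between '(' and ')'."""
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        byte = raw[i]
        if byte != 0x5C:            # ordinary byte: copy through
            out.append(byte)
            i += 1
            continue
        i += 1                      # skip the backslash itself
        if i >= n:
            break                   # dangling backslash at the end: drop it
        nxt = raw[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif 0x30 <= nxt <= 0x37:   # \ddd: one to three octal digits
            value, j = 0, i
            while j < n and j < i + 3 and 0x30 <= raw[j] <= 0x37:
                value = value * 8 + (raw[j] - 0x30)
                j += 1
            out.append(value & 0xFF)
            i = j
        elif nxt in (0x0A, 0x0D):   # backslash + end-of-line: line continuation
            i += 1
            if nxt == 0x0D and i < n and raw[i] == 0x0A:
                i += 1
        else:                       # unknown escape: the backslash is ignored
            out.append(nxt)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(unescape_pdf_literal(bytes.fromhex("aa5c729f")).hex())    # aa0d9f
    print(unescape_pdf_literal(bytes.fromhex("aa5c5c729f")).hex())  # aa5c729f
```

Decoding the bytes as they appear in the file gives the same `aa 0d 9f` sequence that pdf2john printed, while the doubly escaped form decodes back to the raw file bytes.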
Any ideas how to resolve this?
----
edited to obfuscate hashes