Cracking a PDF file with an Arabic password using hashcat
#1
I have been testing password strength with Arabic passwords and trying to crack them using hashcat. With an unsalted hash, such as a custom-generated MD5 hash, the --hex-charset parameter works like a charm. With PDF files it does not work, even after I shortened the password to 1 character.
In case you are wondering, my assumptions are:

1. Maybe the pseudo-random salt generator takes characters from the password itself.
2. The password hashing algorithm stores the password in an unknown charset when the password contains non-English characters.

So my question is:

1. Is there a workaround for this (is it possible at all)?
2. If my 2nd guess is correct, which charset is it?
#2
You do not need --hex-salt. This has nothing to do with salts.

If anything you would need --hex-charset, but that is not necessary either; it is only needed if you define a custom charset (-1, -2, -3 or -4) or a custom mask.

The only thing you need to know is the encoding.
hashcat even has the --encoding-from and --encoding-to options to facilitate the conversion between different encodings.

If your password seems to be only 1 character long, that doesn't mean it only uses 1 byte. Depending on the character encoding, only very few characters (the most common for that specific language) are encoded with 1 byte.
Therefore, a password that consists of only 1 character could use multiple bytes. hashcat (and all underlying hashing algorithms) works with bytes, so the length that hashcat shows, e.g. in a mask attack, is always in bytes, not characters.
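
To illustrate the point about bytes versus characters, here is a minimal Python sketch that prints how many bytes the single character م occupies in a few encodings. The list of encodings is only an assumption about what an encryption tool might use, not a statement of what any PDF tool actually does.

```python
# One Arabic character, several candidate encodings (an illustrative selection).
CHAR = "م"  # U+0645 ARABIC LETTER MEEM
ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "cp1256", "iso-8859-6"]


def encoded_forms(text, encodings=ENCODINGS):
    """Return {encoding: bytes} for every encoding that can represent text."""
    forms = {}
    for enc in encodings:
        try:
            forms[enc] = text.encode(enc)
        except UnicodeEncodeError:
            pass  # this encoding has no representation for the text
    return forms


if __name__ == "__main__":
    for enc, raw in encoded_forms(CHAR).items():
        # hashcat sees len(raw) bytes here, not 1 character
        print(f"{enc:<11} {len(raw)} byte(s)  hex={raw.hex()}")
```

In UTF-8 the character is the two bytes d9 85, which is why a 1-character password still needs a 2-position mask.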

I think you should try different encodings and try to understand character encoding.
#3
(11-21-2017, 05:58 PM)philsmd Wrote: You do not need --hex-salt. This has nothing to do with salts.

Sorry for my English, but if you look at my question again, with raw hashes like MD5 it worked for me when I followed the steps in the given link using --hex-charset. To elaborate, I used
Code:
echo -n م | md5sum
to generate the MD5 hash, and then
Code:
hashcat -a 3 -m 0 md5hash --hex-charset -1 d8d9 -2 808182838485 ?1?2
So the character م in hex is D985, and hashcat successfully recovered it. But when I tried it with files such as PDFs, it did not recover the password. The steps I used were: run pdf2john.py, remove the filename from the generated hash text file, then run
Code:
hashcat -a 3 -m 10500 hash.txt --hex-charset -1 d8d9 -2 808182838485 ?1?2
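
The -1 and -2 values in the commands above were picked by hand for UTF-8. As a rough sketch, the per-position byte sets for --hex-charset could be derived for any encoding like this. Note that hex_charsets is a hypothetical helper, not part of hashcat, and it assumes every character encodes to the same number of bytes.

```python
def hex_charsets(chars, encoding):
    """Return one hex string per byte position, covering all given chars.

    Hypothetical helper: assumes every character in chars encodes to the
    same number of bytes in the chosen encoding.
    """
    if not chars:
        raise ValueError("need at least one character")
    encoded = [c.encode(encoding) for c in chars]
    width = len(encoded[0])
    if any(len(e) != width for e in encoded):
        raise ValueError("characters encode to different byte lengths")
    positions = []
    for i in range(width):
        # Collect the distinct byte values seen at position i, sorted.
        values = sorted({e[i] for e in encoded})
        positions.append(bytes(values).hex())
    return positions


if __name__ == "__main__":
    letters = "لمنه"  # a small sample of Arabic letters, chosen only for illustration
    for enc in ("utf-8", "utf-16-le"):
        print(enc, hex_charsets(letters, enc))
```

Each returned string would go to -1, -2 and so on, with a mask such as ?1?2 for a 2-byte encoding. Such a mask also covers byte combinations that are not in the original character set.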
#4
This of course depends on what the tool you use to encrypt the PDF (Acrobat Reader etc.) does with the input. It could use UTF-8 by default, it could use UTF-16LE, etc.
The Linux echo tool won't change the encoding, so the default encoding of your shell is used (probably UTF-8, since d985 is the UTF-8 encoding of م).

Therefore you can't really compare a simple echo with what the PDF tool does. It could use a different encoding if the characters are outside the ASCII range. There are many possibilities, but only a small number of encodings are used in the wild (the most common are of course the UTF-8, UTF-16 and UTF-32 variants).
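
A small Python sketch of why the echo test cannot be carried over: the same character gives a different MD5 digest for each encoding, because the hash is computed over bytes. The choice of encodings here is an assumption for illustration only.

```python
import hashlib


def md5_by_encoding(password, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return {encoding: MD5 hex digest of the password's bytes in that encoding}."""
    return {
        enc: hashlib.md5(password.encode(enc)).hexdigest()
        for enc in encodings
    }


if __name__ == "__main__":
    digests = md5_by_encoding("م")
    for enc, digest in digests.items():
        print(f"{enc:<10} {digest}")
    # In a UTF-8 shell, `echo -n م | md5sum` corresponds to the utf-8 row only.
```

A mask built for the UTF-8 bytes can only ever match the first of these digests, so the same reasoning applies to whatever bytes the encryption tool feeds into its own key derivation.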
#5
(11-21-2017, 07:27 PM)philsmd Wrote: this of course depends on what the tool that you use to encrypt the pdf (acrobat reader etc) does with the input. It could use utf-8 by default. it could use utf16-le etc...

I think my question is exactly that, and I quote: "2.the password hashing algorithm stores the password in unknown charset format when the password is from non-English chars". What I was looking for is, and I quote again: "2.if my 2nd guess is correct please tell me what charset?" Of course it might depend on the tool used to encrypt the document. I am using Adobe Acrobat, but the problem is not limited to PDF: I also tried a Word file, with the same result.
If you know which Unicode format they use, please tell me.
NB: a correction: when I said a different charset, I meant a Unicode format.
#6
I haven't tried this tool for this exact purpose yet, but I did use it to extract data from a protected PDF in Arabic and it worked. The tool is a simple extraction service, https://www.altoextractpdf.com, and it may be promising here as well.