office2hashcat.py error
#1
I hope it is OK to post on this forum for help with office2hashcat

I am getting the following output when using office2hashcat on a 97-2003 PowerPoint file.

Traceback (most recent call last):
  File "office2hashcat.py", line 2018, in <module>
    ret = process_file(sys.argv[i].decode("utf8"))
  File "office2hashcat.py", line 1995, in process_file
    find_rc4_passinfo_ppt(filename, sppt, offset)
  File "office2hashcat.py", line 1728, in find_rc4_passinfo_ppt
    persistOffset = unpack("<L", stream.read(4))[0]
struct.error: unpack requires a string argument of length 4
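
For reference, this struct.error is what unpack raises when stream.read(4) returns fewer than 4 bytes, which usually means the parser has run past the end of the stream. A minimal sketch that reproduces it, assuming an in-memory io.BytesIO as a stand-in for the PowerPoint stream (the helper name read_u32 is hypothetical and not part of office2hashcat.py):

```python
import io
from struct import unpack, error as StructError


def read_u32(stream):
    """Read one little-endian unsigned 32-bit value.

    Returns None instead of raising when fewer than 4 bytes are left.
    Hypothetical helper, for illustration only.
    """
    data = stream.read(4)
    if len(data) < 4:
        return None
    return unpack("<L", data)[0]


# Made-up 6-byte stream: one complete 32-bit value followed by 2 stray bytes.
stream = io.BytesIO(b"\x01\x00\x00\x00\x02\x00")
first = read_u32(stream)   # complete read
second = read_u32(stream)  # short read, only 2 bytes remain

# The unguarded pattern from the traceback raises on the same short read.
raised = False
try:
    unpack("<L", io.BytesIO(b"\x02\x00").read(4))
except StructError:
    raised = True
```

Here first is 1, second is None, and raised is True. Checking the length before unpacking does not fix the parsing itself, it only makes the failure easier to diagnose.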

I have used the script on a file that I created and password protected and got the hash no problem.

I would appreciate it if someone could point me in the right direction on how I can extract the hash from this file.

Thank you
#2
Looks to me like that file isn't encrypted.
#3
Thanks for the reply

When I open the file in MS PowerPoint, a password is required, and LibreOffice says the file can't be opened because encrypted files are not supported.

The only other information that might be important is that the file is password protected for opening and is also password protected for editing with a different password.
#4
There is a difference between being password-protected and being encrypted. If it's not encrypted then it is not supported.
#5
Thanks,

I need to do some further study on these types of files to understand the different password protection / encryption options that are available.

I assume if the file is password protected but not encrypted there is still a method to obtain the password from the file.

Thanks again
#6
It should be possible to recover the password but I'm not sure of any tools that support that. The tools I'm aware of are able to simply strip out the password protection to allow you to open the document.
#7
I will take another look at this file when I get a chance; it is not urgent at the moment.

I will post back if I find anything.

Thanks for taking the time to help.
#8
There has been a bug in this office2john / office2hashcat Python script for a long time for ppt files that no one seems to have wanted to fix or admit to. So I took up the challenge. The original author seems to admit to the bugginess: "# BUGGY: PersistDirectoryAtom and PersistDirectoryEntry processing"

I should probably let the guys on https://github.com/magnumripper/JohnTheRipper know about this, but I'm feeling lazy. If you do let them know, please give me credit. I'll probably get around to it soon enough anyway. Here's how I fixed it:

Replace:

Code:
   # BUGGY: PersistDirectoryAtom and PersistDirectoryEntry processing
   i = 0
   stream.read(4)  # unused
   while i < encryptSessionPersistIdRef:
       i += 1
       persistOffset = unpack("<L", stream.read(4))[0]


With this:

Code:
   # print("recLen: %d" % recLen)

   # PersistDirectoryAtom and PersistDirectoryEntry processing
   byteCount = 0
   found = False
   while byteCount < recLen and not found:
       # Each entry starts with a 32-bit header:
       # low 20 bits = persistId, high 12 bits = cPersist
       persistData = unpack("<L", stream.read(4))[0]
       byteCount += 4
       persistId = persistData & 0xFFFFF
       cPersist = (persistData >> 20) & 0xFFF

       # print("persistId: %d" % persistId)
       # print("cPersist: %d" % cPersist)
       for i in range(persistId, persistId + cPersist):
           # print("i: %d" % i)
           persistOffset = unpack("<L", stream.read(4))[0]
           byteCount += 4
           # print("byteCount: %d" % byteCount)
           if i == encryptSessionPersistIdRef:
               # Stop the outer loop too, so persistOffset is not overwritten
               found = True
               break
           if byteCount >= recLen:
               break

Details: There was a problem with how PersistDirectoryAtom and PersistDirectoryEntry processing is done. Yes, it's tricky because you have to deal with a 20-bit persistId and a 12-bit cPersist field packed into one 32-bit value. See page 43 in [MS-PPT].pdf in the zip files: http://download.microsoft.com/download/2...tocols.zip
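
To make the bit layout concrete, here is a minimal standalone sketch of the same decoding. It assumes the PersistDirectoryAtom payload is already in memory as bytes; the function name parse_persist_directory and the sample payload are made up for illustration and are not part of office2hashcat.py:

```python
import io
from struct import pack, unpack


def parse_persist_directory(data):
    """Return {persistId: persistOffset} for a PersistDirectoryAtom payload.

    Hypothetical helper: each entry is a 32-bit header (low 20 bits =
    first persistId, high 12 bits = cPersist) followed by cPersist
    32-bit offsets, one per consecutive persistId.
    """
    stream = io.BytesIO(data)
    offsets = {}
    while stream.tell() < len(data):
        header = unpack("<L", stream.read(4))[0]
        persist_id = header & 0xFFFFF        # low 20 bits
        c_persist = (header >> 20) & 0xFFF   # high 12 bits
        for i in range(c_persist):
            offsets[persist_id + i] = unpack("<L", stream.read(4))[0]
    return offsets


# Synthetic payload (not taken from a real file):
#   entry 1: persistId=1, cPersist=2, offsets 0x100 and 0x200
#   entry 2: persistId=5, cPersist=1, offset 0x300
payload = (
    pack("<L", (2 << 20) | 1) + pack("<LL", 0x100, 0x200)
    + pack("<L", (1 << 20) | 5) + pack("<L", 0x300)
)
directory = parse_persist_directory(payload)
```

With this payload, directory maps persistId 1 to 0x100, 2 to 0x200 and 5 to 0x300. Looking up encryptSessionPersistIdRef in such a dictionary avoids having to break out of nested loops, at the cost of reading the whole atom. A truncated payload would still raise struct.error.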

Let me know if that works for you. I actually haven't tested it extensively, but it should work much better than the old code.
#9
Cool, any chance you can create a PR for office2hashcat.py on github?