Unknown excel password encoding
#1
Hello, I've got a hex-encoded password for an Excel 2007 .xlsx file (found with hashcat mode 9400). When I convert the hex string to ASCII, one of the symbols looks like this: ê. Excel throws an invalid password error. I'm using Excel 2016 to open the file. UTF codecs can't convert the hex to text because of a start byte error. My question: is it possible to open the Excel file with the hex string? I really can't find any information about it. If it's not possible, what can you recommend for my situation?
#2
(12-20-2021, 01:28 AM)andy_larkin Wrote: Hello, I've got hex encoded password for 2007 excel .xlsx file (using 9400 rule). [...]

Looks like your converter doesn't convert non-ASCII hex characters like the German umlauts äöü (which naturally don't belong to plain ASCII).

Try this one:

50e4737377f67264 > Pässwörd

https://www.rapidtables.com/convert/numb...ascii.html
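
The same conversion can be sketched in a few lines of Python. This is only an illustration: the function name decode_hex is made up here, and the choice of Latin-1 is an assumption based on the fact that each umlaut in the example is a single byte (e4, f6). The same bytes are not valid UTF-8, which may be why UTF codecs complain.

```python
def decode_hex(hex_string: str, encoding: str) -> str:
    """Turn a hex string into text using the given character encoding."""
    return bytes.fromhex(hex_string).decode(encoding)


sample = "50e4737377f67264"

# Single-byte encoding: every byte maps to exactly one character.
print(decode_hex(sample, "latin-1"))

# UTF-8 expects multi-byte sequences for non-ASCII characters, so this fails.
try:
    print(decode_hex(sample, "utf-8"))
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)
```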
#3
(12-20-2021, 03:33 PM)Snoopy Wrote: try this one: 50e4737377f67264 > Pässwörd [...]

Thank you for the answer. I've tried a lot of standard conversion tools and encodings and none of them work. I think some non-standard encoding was used in this case. But hashcat somehow gets the password bytes and uses them to decrypt the Excel file, so I think there should be some way to do it.

https://www.rapidtables.com/convert/numb...ascii.html doesn't work either.
#4
So you have an Excel 2007 file that's password protected and you don't know the password?

What is the source of this hex encoded password? What encoding was used when it came into existence? It's unlikely that Excel's password encoding is unknown to Excel; it's more likely that it's unknown to you and the program you're using to crack it.

Are you saying that "ê" is not one of the characters of the password? How can you know this if you don't know the password? You have to know something about the original password to be able to rule that out.

Are you 100% sure that this hex encoded password can unlock the file if it was written out in plain text (not hex encoded)?

Assuming that the original password was "qwerty", you can't unlock the file by typing its hex equivalent "717765727479". The password "qwerty" is "717765727479" regardless of whether you use ASCII or extended ASCII. Excel expects you to type in "qwerty", not "717765727479" or some other form of gibberish.
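
The "qwerty" point can be checked with a small Python sketch. The helper name hex_in_encodings is made up for this example, and cp1252 stands in for "extended ASCII" as an assumption. Plain ASCII text gives the same hex everywhere, while a character like ê does not, which is where the choice of encoding starts to matter.

```python
def hex_in_encodings(text, encodings=("ascii", "cp1252", "utf-8")):
    """Return {encoding: hex string}, skipping encodings that cannot represent the text."""
    result = {}
    for encoding in encodings:
        try:
            result[encoding] = text.encode(encoding).hex()
        except UnicodeEncodeError:
            # e.g. plain ASCII has no byte for "ê"
            continue
    return result


print(hex_in_encodings("qwerty"))  # identical hex in all three encodings
print(hex_in_encodings("ê"))       # differs between cp1252 and UTF-8
```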

There is no magic, only a lot of hard work.

I think you're approaching the problem the wrong way. I haven't looked at this myself (yet), but here is a useful and commonly referenced bit of information:

https://hashcat.net/wiki/doku.php?id=fre..._documents

You may also find it interesting to read this:

https://en.wikipedia.org/wiki/Microsoft_...protection
#5
(12-21-2021, 02:24 AM)meow Wrote: So you have an Excel 2007 file that's password protected and you don't know the password? [...]

Hey, I already know the password. I got it using hashcat mode 9400 (I first used office2hashcat to get the hash of the xlsx file). Brute force was successful, but the result looks like this: $office$*2007$*20*128*16*e284...$HEX[hex encoded password here].

Now I want to open this Excel file using the password from hashcat. But when I convert it through online hex-to-text tools, one of the symbols looks like ê or even �. I've also written a simple Python script to convert the hex string into text using the encodings Python supports. I think some non-standard encoding was used when the Excel file was password protected. None of the passwords I converted from hex opens the Excel file.
Maybe you can advise me on how to open that file? I hope my case is clear now; sorry for my bad English.
I don't understand well how encoding and decoding from hex works, but hashcat somehow matched the file's hash with one of the passwords, so I think there should be a way for me to open the Excel file using that hex password, or by brute forcing the encoding in the worst case.
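
A rough sketch of the "try every encoding" idea in Python, in case it helps to compare notes. The hex string 70ea7373 is made up for illustration (it is not the real password), the list of code pages is just a guess at likely candidates, and the function name is hypothetical.

```python
# Code pages worth trying; this selection is an assumption, not a complete list.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1", "cp1251", "cp437")


def candidate_passwords(hex_string):
    """Decode the same bytes under several encodings, skipping the ones that fail."""
    raw = bytes.fromhex(hex_string)
    candidates = {}
    for encoding in CANDIDATE_ENCODINGS:
        try:
            candidates[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return candidates


for name, text in candidate_passwords("70ea7373").items():
    print(name, repr(text))
```

Each printed candidate is a different reading of the same bytes, so only one of them (at most) is the text that was originally typed.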
#6
andy_larkin, you can use Alt+Numpad to enter non-ASCII symbols.

For example, the symbol "ê" has hexadecimal code 0x00EA (decimal 234), i.e. you can type Alt+0234.
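
For other characters, the Alt code can be worked out with a short Python sketch. It assumes that Alt+0nnn uses the Windows code page 1252 (Western European); on a system with a different code page the numbers would differ. The function name alt_code is made up for this example.

```python
def alt_code(char: str, codepage: str = "cp1252") -> str:
    """Return the Alt+0nnn sequence for a single character in the given code page."""
    encoded = char.encode(codepage)  # raises UnicodeEncodeError if not representable
    if len(encoded) != 1:
        raise ValueError("character is not a single byte in this code page")
    return f"Alt+{encoded[0]:04d}"


print(alt_code("ê"))
```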
#7
I think we have a very different view on what the problem is.

I am by no means an expert on text encoding, brute forcing, or Hashcat for that matter. But I understand it well enough to tell you that text encoding is not encryption, so there is nothing to brute force. Text encoding is just a matter of turning numbers into letters in computer systems, and there have been many different encodings, like ASCII and later Extended ASCII, and now most commonly UTF-8 as part of Unicode. I have a lot to read up on this myself, but I can confidently say that there is nothing to brute force when it comes to dealing with text encodings.

If the problem is mainly that you are missing the "ê" letter on your keyboard and you simply need to enter it in the password field of Excel, then you can use the Alt code that Nick provided (Alt+0234). But I fear that the problem is not that simple. If it were, you could simply select the text inside "[hex encoded password here]", copy it and then paste that in the password field of Excel. I suspect that this is what the real problem is, because it's not accepting it as the correct password.

I'm sorry, I still don't fully understand the problem. I think I will have to test this on my end to understand the process.

If you already know the password, can you open the Excel file with it the normal way? Also, is it the same as what you see inside brackets in "[hex encoded password here]"?
#8
I learned something today. I created a password protected Excel document with Excel and then cracked the password with Hashcat. I used the latest Excel version ("2111" if that's a thing) and Hashcat 6.2.5. I used a very weak password: qwerty.

The Python script linked to in the Hashcat FAQ section above did not work at all, neither on Windows nor on Linux. Here is the link if someone wants to verify it.

https://raw.githubusercontent.com/strict...hashcat.py

It runs but doesn't return anything, not even a blank line.


Code:
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2hashcat.py ..\world.xls

PS C:\ExcelCracking\hashcat-6.2.5>


I then found a guide on how to do this on "Stuff Jason Does", at link below.

https://stuffjasondoes.com/2018/07/18/cr...g-hashcat/

Jason used Hashcat for cracking, version 5.1, but he used another script called "office2john.py". Direct link can be found below.

https://github.com/truongkma/ctf-tools/b...ce2john.py

I immediately realized that this must be the original name of the original password cracker, for I know about "John The Ripper". Sadly this script did not work either. It was spitting out errors, on Windows and on Linux. I tested it in both systems to ensure it was not custom made for Linux file search paths, considering the error message.


Code:
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2john.py ..\world.xls
Traceback (most recent call last):
  File "C:\ExcelCracking\office2john.py", line 2674, in process_file
    if accdb_magic in data and accdb_xml_start in data:
TypeError: a bytes-like object is required, not 'str'
..\world.xls : OLE check failed, a bytes-like object is required, not 'str'
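
This kind of TypeError is what Python 3 raises when a text string is searched for inside raw bytes, which suggests (as an assumption, going only by the traceback) that this copy of the script was written for Python 2. A minimal sketch that reproduces the message; the magic value and the sample data are made up for illustration:

```python
def contains_magic(data: bytes, magic) -> bool:
    """Check whether the magic marker occurs in the raw file data."""
    return magic in data


# Hypothetical file contents read in binary mode.
data = b"\x00\x01\x00\x00Standard ACE DB\x00"

try:
    contains_magic(data, "Standard ACE DB")  # str marker, as in Python 2 era code
except TypeError as err:
    print("Python 3 rejects str-in-bytes:", err)

print(contains_magic(data, b"Standard ACE DB"))  # bytes marker works
```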


After trying this one, I went on to find a third one. This time I came across the good one by following a YouTube tutorial on the "squidsup" channel. The link is below.

https://www.youtube.com/watch?v=mpCae81ziio

The direct link to the script is below.

https://raw.githubusercontent.com/magnum...ce2john.py

You can tell by the URL who the author of each script is (or who copied from whom). Check them out on GitHub. Using this script I was able to extract the hash needed for my Excel document. The script linked to in FAQ seems to belong to "oclHashcat", an old Hashcat derivative.

There are a lot of different variants of this script in circulation on the net, apparently. You don't want to use a bad copy or one that's so customized that it's killing your progress (or worse, one with a virus). This script is made available by "JohnTheRipper" himself. So the lesson here is, always go to the source!

So my question to Andy here is, where did you get the script from? A link would be nice, for comparison.

I think I know what the problem is, Andy. The script you used has extracted too much information: it went beyond the upper boundary of the hash in its raw form in the Excel document. I suspect this is what happened, but I am not 100% sure. To be sure I would need to look at the original Excel document, or at the very least know which Python script you used, or if possible replicate your exact environment and scenario. Having any sort of Excel 2007 document would be of great help, even if it's not the original one, as I don't know how to make these old formats and I don't have the old Excel to make one. Maybe the issue you are seeing is a direct result of using that old format. Perhaps the script doesn't work with Excel 2007? That's one thing you need to research.

Does it really say "$HEX" in that string?

For this input (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde

You should get this output (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde:qwerty

Where "qwerty" is the password.

If I understand you correctly, your situation is as follows.

For this kind of input (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde

You are seeing this output (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde$HEX[E9626F72676E6572]

Where "E9626F72676E6572" is hex for "éborgner" (French for putting someone's eye out), which is the password to open the document.
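
For reference, hashcat normally prints a result as hash:plain, and as far as I understand it wraps the plain in $HEX[...] when the password contains bytes outside printable ASCII (its help lists an --outfile-autohex-disable option for this, if I read it correctly). Under that assumption, a result line could be taken apart with a sketch like the one below. The function name is made up, and decoding with Latin-1 is only one possible guess at the encoding.

```python
import re

# Matches hashcat's $HEX[...] wrapper around a hex-encoded plain.
HEX_PLAIN = re.compile(r"^\$HEX\[([0-9a-fA-F]*)\]$")


def split_cracked_line(line: str):
    """Split 'hash:plain' and return (hash, password bytes).

    Assumes the hash itself contains no colon, as in the $office$ examples here.
    """
    hash_part, _, plain = line.strip().partition(":")
    match = HEX_PLAIN.match(plain)
    raw = bytes.fromhex(match.group(1)) if match else plain.encode("utf-8")
    return hash_part, raw


example = (
    "$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc"
    "*117d532441c63968bee7647d9b7df7d6"
    "*df1d601ccf905b375575108f42ef838fb88e1cde:$HEX[E9626F72676E6572]"
)
hash_part, raw = split_cracked_line(example)
print(raw.decode("latin-1"))
```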

Have you tried using that long string as your hash with Hashcat? What does Hashcat say? It's worth a try, even if it doesn't look like a normal string you would expect. You can't know what to expect unless you have done this before (with same file format, script, file versions, program versions, etc.).

Although it's not very likely that the hex part spells out to something meaningful like "éborgner". What would be the point of that? What would be the point of using Hashcat then on top of that? If you already have the password, in hex, then you don't need Hashcat at all, whatever script you used has done all the work for you, including not only extracting the hash but also cracking it. All you would have to do is decode the hex back to normal text using some decoding system that gives you meaningful (or less meaningful) results you can use for password candidates. You don't need Hashcat for that.

Have you tried removing the "$HEX[blabla]" part of the hash string? If it's just hex garbage that the script has erroneously extracted, you may be better off removing that. It might be useful to know if these hashes are fixed length (that I don't know).

So anyway, I used Python3 to extract the hash, and I used the script I linked to above (the original). I had to manually remove "world.xls:" from the hash string as it is not used by Hashcat and will result in a parsing error. This is the main difference between the "office2john.py" and "office2hashcat.py" scripts. So if you know how to edit text files, you can use the original script instead of the Hashcat script.
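
If you'd rather not edit the file by hand, the prefix can be removed with a few lines of Python. This is a sketch that assumes the office2john output looks like the lines in this thread (file name, a colon, then the hash starting with "$"); the function name is made up.

```python
def strip_filename_prefix(line: str) -> str:
    """Drop a leading 'filename:' so only the '$...' hash remains."""
    line = line.strip()
    if line.startswith("$"):
        return line  # already in the form Hashcat expects
    # Split at the first ':$' so a Windows drive letter like 'C:\' is not mistaken for it.
    _, separator, rest = line.partition(":$")
    return "$" + rest if separator else line


print(strip_filename_prefix("world.xls:$oldoffice$4*83328705222323020515404251156288"))
```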

I didn't know how to create Excel 2007 documents using the latest version, it may have been removed as an option. So I went with "Excel 97 - 2003". Therefore my hash looked a little different.

Following is just an example hash, and not the one I used.

For this input:
Code:
$oldoffice$4*83328705222323020515404251156288*2855956a165ff6511bc7f4cd77b9e101*941861655e73a09c40f7b1e9dfd0c256ed285acd

I got this output:
Code:
$oldoffice$4*83328705222323020515404251156288*2855956a165ff6511bc7f4cd77b9e101*941861655e73a09c40f7b1e9dfd0c256ed285acd:qwerty

I had to use the -m9800 flag as I was using an older format version.
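
Picking the mode from the hash prefix can be sketched like this. Only 9400 (for $office$*2007) and 9800 (for $oldoffice$4) come from this thread; the other numbers are from my memory of Hashcat's mode list and should be checked against the Hashcat help before relying on them. The function name is made up.

```python
def hashcat_mode(hash_line: str):
    """Guess the Hashcat -m value from an Office hash prefix, or return None."""
    old_prefix = "$oldoffice$"
    if hash_line.startswith(old_prefix):
        version = hash_line[len(old_prefix):len(old_prefix) + 1]
        if version in ("0", "1"):
            return 9700  # assumption: Office <= 2003, MD5 + RC4
        if version in ("3", "4"):
            return 9800  # used in this thread for $oldoffice$4
        return None
    if hash_line.startswith("$office$*"):
        year = hash_line.split("*")[1]
        # 9400 is from this thread; 9500 and 9600 are assumptions to verify.
        return {"2007": 9400, "2010": 9500, "2013": 9600}.get(year)
    return None


print(hashcat_mode("$oldoffice$4*83328705222323020515404251156288"))
print(hashcat_mode("$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc"))
```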

As you can see, there is no such thing as "$HEX" in my output. Are you sure you have identified the Excel version correctly? Have you used a good extraction script?

Looking at the beginning of your hash I see this:
Code:
$office$*2007$*20*128

Where I should be seeing this:
Code:
$office$*2007*20*128

Is this a typo? You have one $ too many there. Did you paste it in correctly? Does the output really look like that? If it does then something must be wrong. It's probably the script you're using.
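
A quick way to check a hash line for this sort of problem is a pattern match. The pattern below is inferred only from the Excel 2007 examples in this thread (fixed "20*128*16" fields, then 32, 32 and 40 hex digits), so treat it as a sanity check under that assumption and not as a format specification.

```python
import re

# Shape of the $office$*2007 hashes shown in this thread (an inferred pattern).
OFFICE_2007 = re.compile(
    r"^\$office\$\*2007\*20\*128\*16"
    r"\*[0-9a-f]{32}\*[0-9a-f]{32}\*[0-9a-f]{40}$"
)


def looks_like_office2007_hash(line: str) -> bool:
    """Return True if the line has the same shape as the thread's 2007 examples."""
    return bool(OFFICE_2007.match(line.strip()))


good = (
    "$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc"
    "*117d532441c63968bee7647d9b7df7d6"
    "*df1d601ccf905b375575108f42ef838fb88e1cde"
)
print(looks_like_office2007_hash(good))
print(looks_like_office2007_hash(good.replace("*2007*", "*2007$*")))
```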

It worked for me, but your mileage may vary. Like I said, make sure you're using a good script that you know is working correctly. That's the first thing to look at. If I were you I would also try that script on a newer version of the Excel file format, and on an older version.

One thing you can always do is go deeper and explore the Excel file format, try to understand it, use a hex editor to search for magic strings, try to understand the Python script and then do the same work manually. This will take a lot of time and effort and I don't recommend it, but it's possible to manually do all the things that the script does automagically. But then you can truly appreciate the heavy lifting that the script does for us. Like I said, there is no magic, only a lot of hard work.

Do keep me posted on your progress. I'm interested to know how it goes. If I can help somehow I will try to find the time to do so. This was a fun and useful exercise even for me (first time cracker).
#9
I installed Office 2007 so I can run Excel 2007. I now have my Excel 2007 document that's password protected, and I know it's made in Excel 2007. I have some surprising results.


Code:
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2hashcat.py ..\hello.xlsx
$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2john.py ..\hello.xlsx
Traceback (most recent call last):
  File "C:\ExcelCracking\office2john.py", line 2674, in process_file
    if accdb_magic in data and accdb_xml_start in data:
TypeError: a bytes-like object is required, not 'str'
..\hello.xlsx : OLE check failed, a bytes-like object is required, not 'str'
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2john2.py ..\hello.xlsx
hello.xlsx:$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8
PS C:\ExcelCracking\hashcat-6.2.5>


As you can see, the original "office2john" script still works. I have named it "office2john2" as I saved it after the bad one. The bad "office2john" script still doesn't work, as expected. But the surprising bit is that the "office2hashcat" script now works.

So there you have it. The Hashcat variant of the script that's on the FAQ page works with some Excel formats, but not all of them. Cracking is not a matter of luck over skill as some will say, but more of a matter of staying consistent and diligent. Otherwise you're left with this impression of playing Russian roulette, a hit or miss game.

"office2john":
Code:
hello.xlsx:$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8

"office2hashcat":
Code:
$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8

If you compare these two, you can see that the file name prefix is the only difference:

Code:
$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8

hello.xlsx:$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8


First of all, I don't have a "$HEX[blabla]" in either string. Also, I have sent the output to a new file called "hello.hash" and then edited out the "hello.xlsx:" (because I used the original script).

However, despite having everything set and ready for Hashcat to crack the password, I ran into trouble, using Hashcat 6.2.5. I haven't tried other versions yet.

For this input:
Code:
PS C:\hashcat-6.2.5> .\hashcat.exe -a3 -m9400 -o ..\hello.password.txt ..\hello.hash

I got this output:

Code:
Hashfile '..\hello.hash' on line 1 ($): Signature unmatched
Hashfile '..\hello.hash' on line 2 (): Separator unmatched
No hashes loaded.


First of all, the hash file was ready to be loaded and Hashcat could locate it, but it failed to load it because it encountered an unmatched signature on the first line. The second line is just a blank line. It looks like it's picking up the newline as a separator. Separator for what? I don't know. Maybe someone can fill me in. It worked the first time for the "Excel 97 - 2003" document. Comparing "world.hash" (first document I tested) with "hello.hash" (the new document), I see a newline in both of them.

There was also this message:
Code:
..\hello.hash: Byte Order Mark (BOM) was detected

I will look at it at a later time. But to me, this suggests that John The Ripper is probably the better choice for cracking newer Excel hashes. I haven't tried older versions of Hashcat yet.
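
One possible explanation for the BOM message, offered as a guess: depending on the PowerShell version, redirecting output to a file writes UTF-16 LE with a byte order mark. In that encoding every ASCII character is followed by a zero byte, which might be why only "$" shows up in the "Signature unmatched" line. A small sketch to inspect the first bytes of a hash file (the function name is made up):

```python
import codecs


def describe_bom(raw: bytes) -> str:
    """Name the byte order mark at the start of the data, if any."""
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE with BOM"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE with BOM"
    return "no BOM"


# What a UTF-16 LE hash file would start with at the byte level.
utf16_file = codecs.BOM_UTF16_LE + "$office$*2007*".encode("utf-16-le")
print(describe_bom(utf16_file))
print(utf16_file[:8])
```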
#10
I am finally cracking it. I converted "hello.hash" from CRLF to LF, and from UTF-16 LE to UTF-8. Now it's accepted by Hashcat 6.2.5 and I will tell you in about 25 minutes whether it worked. I'm brute forcing it, so it will take a bit of time, even for a 6-character password (it's a more challenging hash than MD5).

So to sum this all up:
1. Use a good script to get the hash.
2. The "office2hashcat" script doesn't work with all versions of Excel documents (the same may apply to other Office apps as well).
3. Make sure to use LF instead of CRLF in your hash file. You can easily avoid this by using Linux for hash extraction rather than Windows. (You may not be able to fully utilize your GPU without CUDA runtime if you're not running Windows, unless it's made available for Linux by Nvidia, and you may not have a choice but to use Windows if your GPU is installed in a Windows PC for the purpose of DirectX gaming. Take your pick.)
4. Make sure you don't use BOM in your hash file. This too can be easily avoided by using Linux when you pipe the output of the script to a new file.
5. There should be no "$HEX[blabla]" at the end of the hash string for an Excel 2007 document.
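
Points 3 and 4 can also be handled on Windows with a small Python sketch that rewrites the hash file as UTF-8 without BOM and with LF line endings. The function names are made up, and the sketch only handles the UTF-16 LE and UTF-8 cases seen in this thread.

```python
import codecs
from pathlib import Path


def normalize_hash_bytes(raw: bytes) -> bytes:
    """Return the hash lines as BOM-free UTF-8 with LF endings and no blank lines."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        text = raw.decode("utf-8")
    # splitlines() handles both CRLF and LF; empty lines are dropped.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return ("\n".join(lines) + "\n").encode("utf-8")


def normalize_hash_file(source: str, target: str) -> None:
    """Read a hash file in any of the handled encodings and write the cleaned copy."""
    Path(target).write_bytes(normalize_hash_bytes(Path(source).read_bytes()))
```

For example, normalize_hash_file("hello.hash", "hello.clean.hash") would write a cleaned copy next to the original (both file names are just placeholders).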

Update: Cracking it with brute force will take more time than I anticipated, as the password was not found within the first few rounds of candidates and an optimized kernel is not available.

Update 2: Password for Excel 2007 document is now cracked as well. Patience is a virtue.

Input:
Code:
$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8

Output:
Code:
$office$*2007*20*128*16*bd72fadd630f6706d2265bb2670744d8*ffd55bec1246280becc69478087b5e45*19871af11d8ff42d730128763a13229cf67ee6e8:qwerty

Again, no "$HEX[blabla]" at the end of the output. Not for the Excel 2007 document and not for the "Excel 97 - 2003" document. I think it's safe to say that something must have gone wrong during your hash extraction operation, Andy.