I learned something today. I created a password-protected document in Excel and then cracked the password with Hashcat. I used the latest Excel version ("2111", if that's a thing) and Hashcat 6.2.5, with a very weak password: qwerty.
The Python script linked to in the Hashcat FAQ section above did not work at all, neither on Windows nor on Linux. Here is the link if someone wants to verify it.
https://raw.githubusercontent.com/strict...hashcat.py
It runs but doesn't return anything, not even a blank line.
Code:
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2hashcat.py ..\world.xls
PS C:\ExcelCracking\hashcat-6.2.5>
I then found a guide on how to do this on "Stuff Jason Does", at the link below.
https://stuffjasondoes.com/2018/07/18/cr...g-hashcat/
Jason used Hashcat 5.1 for cracking, but he used a different script to extract the hash, called "office2john.py". The direct link is below.
https://github.com/truongkma/ctf-tools/b...ce2john.py
I immediately realized this must be the original version of the script, from the original password cracker, since I know about "John The Ripper". Sadly, this script did not work either. It threw errors on both Windows and Linux. Judging by the error message, I tested it on both systems to make sure it was not written specifically for Linux file paths.
Code:
PS C:\ExcelCracking\hashcat-6.2.5> python.exe ..\office2john.py ..\world.xls
Traceback (most recent call last):
  File "C:\ExcelCracking\office2john.py", line 2674, in process_file
    if accdb_magic in data and accdb_xml_start in data:
TypeError: a bytes-like object is required, not 'str'
..\world.xls : OLE check failed, a bytes-like object is required, not 'str'
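Looking at that message again, it may not be about file paths at all. "a bytes-like object is required, not 'str'" is what Python 3 raises when code written for Python 2 checks whether a text string occurs in raw file bytes. Here is a minimal reproduction; only the name accdb_magic comes from the traceback above, the sample data and the magic values are made up for illustration:

```python
# Minimal reproduction of the office2john.py error under Python 3.
# SAMPLE_DATA and the magic values are made up; only the name
# accdb_magic comes from the traceback above.

SAMPLE_DATA = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1 rest of file"  # like open(path, "rb").read()
ACCDB_MAGIC_STR = "Standard ACE DB"     # str literal (Python 2 era style)
ACCDB_MAGIC_BYTES = b"Standard ACE DB"  # bytes literal (Python 3 style)


def membership_error(data: bytes, needle) -> str:
    """Return the TypeError message raised by `needle in data`, or "" if none."""
    try:
        _ = needle in data
    except TypeError as err:
        return str(err)
    return ""


# str in bytes: prints "a bytes-like object is required, not 'str'"
print(membership_error(SAMPLE_DATA, ACCDB_MAGIC_STR))
# bytes in bytes: no error (the needle is simply not in this sample)
print(repr(membership_error(SAMPLE_DATA, ACCDB_MAGIC_BYTES)))
```

The usual fix when porting a script like that is to turn every literal that gets compared against file data into a bytes literal (b"...").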
After that, I went looking for a third one. This time I came across the good one by following a YouTube tutorial on the "squidsup" channel. The link is below.
https://www.youtube.com/watch?v=mpCae81ziio
The direct link to the script is below.
https://raw.githubusercontent.com/magnum...ce2john.py
You can tell from the URL who the author of each script is (or who copied from whom); check them out on GitHub. Using this script I was able to extract the hash from my Excel document. The script linked to in the FAQ seems to belong to "oclHashcat", the older GPU version of Hashcat.
There are apparently a lot of variants of this script in circulation on the net. You don't want to use a bad copy, or one so customized that it kills your progress (or worse, one with a virus). This copy comes from the John the Ripper project itself. So the lesson here is: always go to the source!
So my question to you, Andy, is: where did you get your script from? A link would be nice, for comparison.
I think I know what the problem is, Andy, although I am not 100% sure. I suspect the script you used extracted too much information: it read past the end of the hash as it is stored in the Excel document. To confirm that, I would need to look at the original Excel document, or at the very least know which Python script you used, or ideally replicate your exact environment and scenario. Any Excel 2007 document would be a great help, even if it's not the original, because I don't know how to create files in these old formats and I don't have an old version of Excel to make one with. Maybe the issue you are seeing is a direct result of using that old format. Perhaps the script doesn't support Excel 2007? That's one thing you need to research.
Does it really say "$HEX" in that string?
For this input (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde
You should get this output (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde:qwerty
Where "qwerty" is the password.
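To make the structure of that hash string a bit more concrete, here is a small sketch that splits a $office$*2007 hash into its parts. The field names are my own reading of what office2john writes (verifier hash size in bytes, key size in bits, salt size in bytes, then salt, encrypted verifier and encrypted verifier hash as hex), so treat them as an assumption:

```python
from dataclasses import dataclass


@dataclass
class Office2007Hash:
    # Field names are my interpretation of office2john's 2007 output.
    version: int
    verifier_hash_size: int  # bytes (20 = SHA-1 digest size)
    key_size: int            # bits
    salt_size: int           # bytes
    salt: bytes
    encrypted_verifier: bytes
    encrypted_verifier_hash: bytes


def parse_office2007(hash_line: str) -> Office2007Hash:
    """Split a '$office$*2007*...' string into its seven '*'-separated fields."""
    prefix = "$office$*"
    if not hash_line.startswith(prefix):
        raise ValueError("not a $office$ hash")
    fields = hash_line[len(prefix):].split("*")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    version, vh_size, key_size, salt_size = (int(f) for f in fields[:4])
    salt, verifier, verifier_hash = (bytes.fromhex(f) for f in fields[4:])
    return Office2007Hash(version, vh_size, key_size, salt_size,
                          salt, verifier, verifier_hash)


example = ("$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc"
           "*117d532441c63968bee7647d9b7df7d6"
           "*df1d601ccf905b375575108f42ef838fb88e1cde")
parsed = parse_office2007(example)
print(parsed.version, parsed.key_size, len(parsed.salt),
      len(parsed.encrypted_verifier_hash))  # 2007 128 16 20
```

With the example hash, the salt comes out as 16 bytes and the verifier hash as 20 bytes, which matches the 16 and 20 in the header.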
If I understand you correctly, your situation is the following:
For this kind of input (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde
You are seeing this output (example):
Code:
$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc*117d532441c63968bee7647d9b7df7d6*df1d601ccf905b375575108f42ef838fb88e1cde$HEX[E9626F72676E6572]
Where "E9626F72676E6572" is hex for "éborgner" (French for "to put someone's eye out"), which is the password that opens the document.
Have you tried using that long string as your hash with Hashcat? What does Hashcat say? It's worth a try, even if it doesn't look like the string you would expect. You can't know what to expect unless you have done this before (with the same file format, script, file versions, program versions, etc.).
That said, it's not very likely that the hex part spells out something meaningful like "éborgner". What would be the point of that? And what would be the point of running Hashcat on top of it? If you already have the password in hex, you don't need Hashcat at all: whatever script you used has done all the work for you, not only extracting the hash but also cracking it. All you would have to do is decode the hex back to normal text, using whatever decoding gives you meaningful (or less meaningful) results you can use as password candidates.
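For what it's worth, Hashcat itself writes a cracked password as $HEX[...] when it contains non-ASCII bytes (the --outfile-autohex-disable option turns that off), so it's worth checking whether that string came from Hashcat rather than from the script. Decoding it is trivial. A minimal sketch, assuming the bytes are Latin-1 (Windows-1252 gives the same result here); E9 is "é" in Latin-1, whereas in UTF-8 "é" would be C3 A9:

```python
import re

# Hashcat-style notation for plaintexts with non-printable/non-ASCII bytes.
HEX_PLAIN = re.compile(r"\$HEX\[([0-9A-Fa-f]*)\]")


def decode_hex_plain(text: str, encoding: str = "latin-1") -> str:
    """Decode '$HEX[...]' to text; anything else is returned unchanged.

    The encoding is an assumption: E9 is 'é' in Latin-1/Windows-1252,
    but these bytes are not valid UTF-8.
    """
    match = HEX_PLAIN.fullmatch(text)
    if match is None:
        return text
    return bytes.fromhex(match.group(1)).decode(encoding)


print(decode_hex_plain("$HEX[E9626F72676E6572]"))  # éborgner
print(decode_hex_plain("qwerty"))                  # qwerty
```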
Have you tried removing the "$HEX[blabla]" part of the hash string? If it's just hex garbage that the script extracted by mistake, you may be better off without it. It would also be useful to know whether these hashes are fixed length (I don't know).
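Here is a minimal sketch of both ideas: stripping a trailing $HEX[...] (with or without a ":" in front of it) and checking whether the hex fields have the lengths the header declares. The length check assumes my reading of the 2007 layout, where the salt should be as long as the 16 in the header and the verifier hash as long as the 20 (in bytes, so twice that in hex characters). It's a sanity check, not a proof:

```python
import re

# Optional ":" separator, then $HEX[...] at the very end of the line.
HEX_TAIL = re.compile(r":?\$HEX\[[0-9A-Fa-f]*\]$")


def strip_hex_tail(hash_line: str) -> str:
    """Remove a trailing $HEX[...] from a hash line, if there is one."""
    return HEX_TAIL.sub("", hash_line.strip())


def lengths_match_header(hash_line: str) -> bool:
    """Check a '$office$*2007*...' line: salt and verifier hash hex lengths
    must equal twice the byte sizes declared in the header."""
    parts = hash_line.split("*")
    if len(parts) != 8 or not (parts[2].isdigit() and parts[4].isdigit()):
        return False
    verifier_hash_size, salt_size = int(parts[2]), int(parts[4])
    salt_hex, verifier_hash_hex = parts[5], parts[7]
    return (len(salt_hex) == 2 * salt_size
            and len(verifier_hash_hex) == 2 * verifier_hash_size)


raw = ("$office$*2007*20*128*16*411a51284e0d0200b131a8949aaaa5cc"
       "*117d532441c63968bee7647d9b7df7d6"
       "*df1d601ccf905b375575108f42ef838fb88e1cde$HEX[E9626F72676E6572]")
print(lengths_match_header(raw))                  # False
print(lengths_match_header(strip_hex_tail(raw)))  # True
```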
So anyway, I used Python 3 to extract the hash, with the script I linked to above (the original). I had to manually remove "world.xls:" from the hash string, because Hashcat doesn't use it and it causes a parsing error. This is the main difference between the "office2john.py" and "office2hashcat.py" scripts, so if you know how to edit a text file, you can use the original script instead of the Hashcat one.
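If you'd rather not edit the file by hand, a few lines of Python can do it. This is a minimal sketch, assuming each output line has the form "filename:hash" and that the hash itself contains no ":" (true for both example hashes in this post); world.xls is just my test file:

```python
import re

# A $office$ or $oldoffice$ hash, up to the first ":" or whitespace.
OFFICE_HASH = re.compile(r"\$(?:old)?office\$[^:\s]+")


def to_hashcat_lines(john_output: str) -> list[str]:
    """Drop the 'filename:' prefix office2john puts in front of each hash."""
    hashes = []
    for line in john_output.splitlines():
        match = OFFICE_HASH.search(line)
        if match:
            hashes.append(match.group(0))
    return hashes


sample = ("world.xls:$oldoffice$4*83328705222323020515404251156288"
          "*2855956a165ff6511bc7f4cd77b9e101"
          "*941861655e73a09c40f7b1e9dfd0c256ed285acd\n")
print(to_hashcat_lines(sample))
```

You could pipe the output of office2john.py through something like this and save the result as the hash file you give to Hashcat.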
I didn't know how to create an Excel 2007 document in the latest version; the option may have been removed. So I went with "Excel 97-2003", which is why my hash looks a little different.
The following is just an example hash, not the one I used.
For this input:
Code:
$oldoffice$4*83328705222323020515404251156288*2855956a165ff6511bc7f4cd77b9e101*941861655e73a09c40f7b1e9dfd0c256ed285acd
I got this output:
Code:
$oldoffice$4*83328705222323020515404251156288*2855956a165ff6511bc7f4cd77b9e101*941861655e73a09c40f7b1e9dfd0c256ed285acd:qwerty
I had to use the -m9800 flag, since I was using an older format version.
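Picking the right mode is mostly a matter of reading the hash prefix. The mapping below is Hashcat's list of Office hash modes as I know it (9400/9500/9600 for Office 2007/2010/2013, 9700 for Office <= 2003 $0/$1, 9800 for Office <= 2003 $3/$4); double-check it against `hashcat --help` for your version:

```python
# Prefix -> Hashcat hash mode (-m) for the Office formats in this thread.
OFFICE_MODES = {
    "$office$*2007*": 9400,  # MS Office 2007
    "$office$*2010*": 9500,  # MS Office 2010
    "$office$*2013*": 9600,  # MS Office 2013
    "$oldoffice$0*": 9700,   # MS Office <= 2003, MD5 + RC4
    "$oldoffice$1*": 9700,
    "$oldoffice$3*": 9800,   # MS Office <= 2003, SHA1 + RC4
    "$oldoffice$4*": 9800,
}


def hashcat_mode(hash_line: str) -> int:
    """Return the -m value for an Office hash, based only on its prefix."""
    for prefix, mode in OFFICE_MODES.items():
        if hash_line.startswith(prefix):
            return mode
    raise ValueError(f"unrecognised Office hash prefix: {hash_line[:24]!r}")


print(hashcat_mode("$oldoffice$4*83328705222323020515404251156288*"))  # 9800
```

For my file that gives 9800, i.e. something like `hashcat -m 9800 -a 0 hash.txt wordlist.txt`, where hash.txt and wordlist.txt are placeholder names.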
As you can see, there is no "$HEX" anywhere in my output. Are you sure you have identified the Excel version correctly? Are you using a good extraction script?
Looking at the beginning of your hash, I see this:
Code:
$office$*2007$*20*128
Where I should be seeing this:
Code:
$office$*2007*20*128
Is this a typo? There is one $ too many. Did you paste it in correctly? Does the output really look like that? If it does, something must be wrong, most likely with the script you're using.
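A quick way to spot this kind of problem is to check the header before feeding the hash to Hashcat. A minimal sketch, assuming the Office 2007-2013 header is "$office$*" followed by the year and two numbers, each separated by "*":

```python
import re

# "$office$*" + year + "*" + number + "*" + number, then "*" or end of string.
HEADER = re.compile(r"\$office\$\*(2007|2010|2013)\*\d+\*\d+(\*|$)")


def header_problem(hash_line: str) -> str:
    """Return "" if the header looks well-formed, else a short description."""
    if HEADER.match(hash_line):
        return ""
    if not hash_line.startswith("$office$*"):
        return "does not start with '$office$*'"
    fields = hash_line.split("*")
    for index, field in enumerate(fields[1:4], start=1):
        if not field.isdigit():
            return f"field {index} is {field!r}, expected digits only"
    return "unexpected year or field layout"


print(header_problem("$office$*2007*20*128"))   # prints an empty line
print(header_problem("$office$*2007$*20*128"))  # field 1 is '2007$', expected digits only
```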
It worked for me, but your mileage may vary. Like I said, make sure you're using a good script that you know works correctly; that's the first thing to look at. If I were you, I would also try that script on both a newer and an older version of the Excel file format.
One thing you can always do is go deeper: explore the Excel file format and try to understand it, use a hex editor to search for magic strings, study the Python script, and then do the same work by hand. This takes a lot of time and effort and I don't recommend it, but it's possible to do manually everything the script does automagically. Then you can truly appreciate the heavy lifting the script does for us. Like I said, there is no magic, only a lot of hard work.
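If you do go down that road, the first magic string to look for is the file signature. A minimal sketch based on two well-known signatures: D0 CF 11 E0 A1 B1 1A E1 for the OLE2 compound file (used by Excel 97-2003 .xls and also as the wrapper for encrypted 2007+ files) and PK\x03\x04 for ZIP (a plain .xlsx). Encrypted 2007+ files contain a stream named EncryptionInfo, and stream names are stored as UTF-16LE. Note that this check says nothing about whether an old .xls is encrypted, since that is recorded inside the workbook data itself:

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")          # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                               # ZIP local file header
ENCRYPTION_INFO = "EncryptionInfo".encode("utf-16-le")  # stream names are UTF-16LE


def describe_office_bytes(data: bytes) -> str:
    """Rough classification of an Excel file from its raw bytes."""
    if data.startswith(ZIP_MAGIC):
        return "ZIP container: unencrypted 2007+ file (.xlsx)"
    if data.startswith(OLE2_MAGIC):
        if ENCRYPTION_INFO in data:
            return "OLE2 container with EncryptionInfo: encrypted 2007+ file"
        return "OLE2 container: e.g. Excel 97-2003 (.xls)"
    return "unknown format"


def describe_office_file(path: str) -> str:
    """Read a whole file and classify it (fine for typical spreadsheet sizes)."""
    return describe_office_bytes(Path(path).read_bytes())


# Usage (world.xls is my test file): print(describe_office_file("world.xls"))
```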
Do keep me posted on your progress. I'm interested to know how it goes, and if I can help somehow, I will try to find the time. This was a fun and useful exercise even for me (a first-time cracker).