Hi,
I found a bug which is related (I guess) to my previous thread
http://hashcat.net/forum/thread-2228.html
from which I created a Trac ticket (
https://hashcat.net/trac/ticket/154)
The rule file (2.rule, as passed with -r in the commandline below) contains these functions:
- Swap @ N (*XY): swaps character X with Y
- Duplicate last N (ZN): duplicates the last character N times
- Overwrite @ N (oNX): overwrites the character at position N with X
hashlist :
Code:
C4CA4238A0B923820DCC509A6F75849B
A933D13F81649BEBE035DC21F4002FF1
commandline :
Code:
oclHashcat-plus64.exe -m 0 --outfile-format=7 h.txt dic.dic -r 2.rule
dic.dic contains only the number "1"
result :
Code:
c4ca4238a0b923820dcc509a6f75849b:1:31
a933d13f81649bebe035dc21f4002ff1:1:31
The first one (c4ca4..) is the correct MD5 for '1'.
The second one is wrong.
That's why I guess the problem comes from a null byte somewhere, as in my previous post.
version : oclHashcat-plus-0.14 / Win 7 64bits
Thanks for helping me to understand the problem, coming from this complex rule.
Try to execute this:
Code:
echo -en "1\00002"|md5sum
a933d13f81649bebe035dc21f4002ff1 -
Basically it is: 1 || \0 || 2 where || means concatenated.
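For anyone without md5sum at hand (e.g. on Win 7), the same check can be sketched in Python. This is only an illustration: the names md5_hex, HASH_OF_ONE and SECOND_HASH are made up here, and the two hashes are simply copied from the hashlist in the first post.

```python
import hashlib

# Hashes copied from the hashlist in the first post (lowercased).
HASH_OF_ONE = "c4ca4238a0b923820dcc509a6f75849b"
SECOND_HASH = "a933d13f81649bebe035dc21f4002ff1"

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as lowercase hex."""
    return hashlib.md5(data).hexdigest()

plain = b"1"           # the only word in dic.dic
candidate = b"1\x002"  # 1 || \0 || 2, i.e. the three bytes 31 00 32

print(md5_hex(plain) == HASH_OF_ONE)      # does "1" match the first hash?
print(md5_hex(candidate) == SECOND_HASH)  # does 1 NUL 2 match the second hash?
print(candidate.hex())                    # raw bytes of the candidate, as hex
```

If the md5sum output above is right, both comparisons should come out true.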
Yes, there seems to be a bug (or at least a strange behaviour) in oclHashcat. I documented some of my findings in this trac ticket:
https://hashcat.net/trac/ticket/154
I would suggest that we collect the samples HERE and discuss what is strange, why, and how it could best be fixed (or rather, what the expected output should be), and comment on trac once we have enough details for the devs.
BTW: the correct md5(1) can be generated as:
Code:
echo -n 1|md5sum
c4ca4238a0b923820dcc509a6f75849b -
Please see bug 148 for a solution which I think is quite helpful. I've been using this in my code for the past month or so, and this fixes so many problems.
For example, when dealing with UNICODE vs UTF-8, the $HEX[] format allows simple re-use of a solved output file, with no hacking. Dealing with embedded CR/CRLF/LF/LFLF is also automatic with this.
Hey Waffle,
while I agree with most (if not all) of your suggestions here (
https://hashcat.net/trac/ticket/148 ) , I also think that it is not really the solution for this problem (or perfectly related to this bug).
The suggested output format would indeed help to understand what has been checked and what the real matching plaintext was, but the problem here is that I/we currently don't understand why "1\00002" is even considered with those rules and dict (because in my opinion it shouldn't be). Also here
https://hashcat.net/trac/ticket/154 in my opinion the \00 shouldn't be concatenated by those rules, since it isn't in the dict... This is the real problem/bug we are trying to understand here...
... but yes another (and most importantly working) output format could/should be able to help us to understand what is going on here.
When I spoke with Atom about this a while back - in relation to this exact issue - he explained that there was no way to "conditionalize" (poor word, I know) the execution of rules, particularly on the GPU. For example, truncating a short word; if the truncation is at a point longer than the word length, the new password length would be set to the "truncated" length, creating a null-appended value.
There is no way to "fix" this, and retain high speed.
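To make the truncation example concrete, here is a small Python sketch. It is NOT oclHashcat's kernel code: the zero-filled fixed-size buffer, BUF_SIZE and all function names are assumptions made up for illustration. It only shows how an unconditional length update differs from a checked one.

```python
BUF_SIZE = 16  # hypothetical fixed-size, zero-initialised candidate buffer

def load_word(word: bytes) -> bytearray:
    """Copy a dictionary word into a zero-filled fixed-size buffer."""
    if len(word) > BUF_SIZE:
        raise ValueError("word longer than buffer")
    buf = bytearray(BUF_SIZE)
    buf[:len(word)] = word
    return buf

def truncate_unconditional(length: int, n: int) -> int:
    """Branch-free variant: the new length is always n, even if n > length."""
    return n

def truncate_checked(length: int, n: int) -> int:
    """Conditional variant: truncation can never make the word longer."""
    return min(length, n)

word = b"1"
buf = load_word(word)

fast_len = truncate_unconditional(len(word), 3)
safe_len = truncate_checked(len(word), 3)

print(bytes(buf[:fast_len]))  # b'1\x00\x00' (null-appended value)
print(bytes(buf[:safe_len]))  # b'1'
```

The checked variant needs a per-word comparison, which is the kind of conditional execution described above as too costly on the GPU.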
As a result, I had to change how I used oclHashcat - I never use it with the --remove option, and always post-check the "found" passwords against the original hashes.
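A post-check like that can be sketched in a few lines of Python. Assumptions here: the found file uses plain "hash:plain" lines, the hashes are raw MD5 (-m 0), and the plaintext is encoded as UTF-8. verify_found is a made-up helper name, not part of any hashcat tool.

```python
import hashlib

def verify_found(lines):
    """Split 'hash:plain' lines into (confirmed, rejected) lists."""
    confirmed, rejected = [], []
    for line in lines:
        # Split on the first colon only, so plaintexts may contain colons.
        digest, _, plain = line.rstrip("\r\n").partition(":")
        recomputed = hashlib.md5(plain.encode("utf-8")).hexdigest()
        if recomputed == digest.lower():
            confirmed.append(line)
        else:
            rejected.append(line)
    return confirmed, rejected

# The two results from the first post, reduced to hash:plain.
found = [
    "c4ca4238a0b923820dcc509a6f75849b:1",
    "a933d13f81649bebe035dc21f4002ff1:1",
]
confirmed, rejected = verify_found(found)
print(confirmed)  # only the c4ca... line
print(rejected)   # the a933... line, since md5("1") is not a933...
```

Anything that lands in rejected was reported with a plaintext that does not actually produce the hash, so it should not be removed from the hashlist.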
Moving to the new output format doesn't "fix" the unfixable problem of conditional-rules-on-gpu, but it does at least show what was found, and does offer a much better long-term storage of found passwords in the output file (and gives hope to the re-use of this data).
This _does_ require oclHashcat to pass back the password length correctly to the output, *after* rule application. It does not do so currently, and assumes that the input password length is the same as the output password length (I think). Either that, or the length is not preserved, and a strlen() is being used - which is not a good idea. Most hash algorithms (including MD5) do not care which bytes are being hashed, and use a "pointer-and-length" format; we need to be able to represent that on input (in the dictionaries), and in the output.
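The strlen() concern can be illustrated with a short Python sketch. c_strlen below only mimics what C's strlen() does; whether oclHashcat really uses strlen() on output is a guess (see the "I think" above), so treat this as an illustration and not a diagnosis.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen(): count bytes up to the first NUL."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx

candidate = b"1\x002"            # the bytes that were hashed: 1 NUL 2
true_len = len(candidate)        # pointer-and-length view: 3 bytes
naive_len = c_strlen(candidate)  # strlen view: stops at the NUL

print(candidate[:true_len].hex())   # 310032
print(candidate[:naive_len].hex())  # 31
```

The second line would be consistent with the "1:31" shown in the result of the first post, where only the byte 31 is reported.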
In this example, seeing the output:
c4ca4238a0b923820dcc509a6f75849b:1
a933d13f81649bebe035dc21f4002ff1:$HEX[310032]
makes it perfectly obvious what happened... and you can then reuse
$HEX[310032]
in a password dictionary file to indicate the 3 character string 1 NUL 2.
This is even more significant with passwords ending in a CR/CRLF.
The ISW 2012 cracking challenge, for example, had more than 264,000 passwords which required hex encoding.
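Here is a rough Python sketch of reading and writing the $HEX[] notation. The rule for WHEN to hex-encode (anything that is not printable ASCII, anything containing a colon, anything that itself starts with $HEX[) is an assumption for this sketch, not something taken from ticket 148. encode_plain and decode_plain are made-up names.

```python
import re

# Matches $HEX[...] with an even number of hex digits inside.
HEX_RE = re.compile(r"^\$HEX\[((?:[0-9a-fA-F]{2})*)\]$")

def encode_plain(raw: bytes) -> str:
    """Return raw as text if it is safe printable ASCII, else as $HEX[...]."""
    printable = all(0x20 <= b <= 0x7E for b in raw)
    needs_hex = (not printable) or b":" in raw or raw.startswith(b"$HEX[")
    if needs_hex:
        return "$HEX[" + raw.hex() + "]"
    return raw.decode("ascii")

def decode_plain(text: str) -> bytes:
    """Turn a dictionary/outfile entry back into the raw password bytes."""
    m = HEX_RE.match(text)
    if m:
        return bytes.fromhex(m.group(1))
    return text.encode("utf-8")

print(encode_plain(b"1"))            # 1
print(encode_plain(b"1\x002"))       # $HEX[310032]
print(encode_plain(b"pass\r\n"))     # $HEX[706173730d0a]
print(decode_plain("$HEX[310032]"))  # b'1\x002'
```

Because decoding gives back the exact bytes, an outfile written this way can be fed straight back in as a dictionary, including passwords with embedded NUL or trailing CR/LF.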
Perfect, thx Waffle for these details.
It is just a pity that I/we weren't aware of this speed-related truncation mechanism. Unfortunately, I think nobody had documented this behaviour, or why it can't be avoided, either here
http://hashcat.net/forum/thread-2228.html or here
https://hashcat.net/trac/ticket/154 before.
So your last post seems to be *THE* valid answer. If nobody has valid arguments against it, I suggest that we close
https://hashcat.net/trac/ticket/154 as unresolvable AND/OR treat it as an output problem only from *now* on (because those \00 bytes sometimes need to be there to avoid performance issues), AND state there (again) that while those unexpected null bytes cannot easily be fixed, a correct output format should make it possible to detect them.
Thx Waffle for these good points
Thank you, this helped me a lot!