Custom charsets for Cyrillic passwords in NTLM hashes
#1
Hi,

If I understand correctly, to provide a valid charset for NTLM hashes we need to encode it as little-endian UTF-16 (UTF-16LE).

In the example from https://hashcat.net/forum/thread-6384-po...l#pid34048:

Code:
$ hashcat -m900 -a3 --hex-charset -1 04354045 c2767da21725edccced3fd251e4d8619 ?1?1?1?1?1?1

how would you calculate '04354045'?

I've tried:

Code:
$ echo -n хер | iconv -f utf8 -t utf16le | xxd -e -p
450435044004

This works too, but the generated charset is 2 bytes longer (6 vs. 4).
#2
I'm not sure what you're trying to do. The last command you show does of course produce 6 bytes, since utf-16 uses a fixed number of two bytes per character. Hence three characters require 6 bytes in utf-16.
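
For what it's worth, you can check the byte count directly (assuming iconv is available):

Code:
$ echo -n хер | iconv -f utf8 -t utf16le | wc -c
6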
#3
I'd like to have charsets as small as possible.

How did the author of the linked post obtain their charset?
#4
just sort and unique it

Code:
echo -n хер | iconv -f utf8 -t utf16le | xxd -p -c 1 | sort -u | tr -d '\n'
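
For what it's worth, that should print exactly the charset from the linked post, 04354045, which then plugs into the command from post #1:

Code:
$ hashcat -m900 -a3 --hex-charset -1 04354045 c2767da21725edccced3fd251e4d8619 ?1?1?1?1?1?1

The sort has to stay plain lexicographic (sort -u): a numeric sort would parse hex bytes such as 45 and 4a both as the number 4, and -u would then drop one of them as a duplicate. With --hex-charset each ?1 in the mask stands for one byte from this set, so two ?1 positions together form one 2-byte utf16le character.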
#5
thanks @philsmd!
#6
(06-13-2019, 03:12 PM)undeath Wrote: since utf-16 uses a fixed number of two bytes per character.

I am sorry, I may have misunderstood you.
UTF-16 stores one Unicode code point in either two or four bytes. For example, "𝓮𝔁𝓪𝓶𝓹𝓵𝓮" is built out of seven code points / characters, and in UTF-16 it takes 28 bytes. The same 28 bytes apply to UTF-8 and UTF-32 here, too, since each of these code points needs four bytes in all three encodings.
Code:
$ cat example_.txt
𝓮𝔁𝓪𝓶𝓹𝓵𝓮
$ cat example_.txt | hexdump -C
00000000  f0 9d 93 ae f0 9d 94 81  f0 9d 93 aa f0 9d 93 b6  |................|
00000010  f0 9d 93 b9 f0 9d 93 b5  f0 9d 93 ae 0d 0a        |..............|
$ cat example_.txt | iconv -f utf8 -t utf16le > example_u16.txt
$ cat example_u16.txt | hexdump -C
00000000  35 d8 ee dc 35 d8 01 dd  35 d8 ea dc 35 d8 f6 dc  |5...5...5...5...|
00000010  35 d8 f9 dc 35 d8 f5 dc  35 d8 ee dc 0d 00 0a 00  |5...5...5.......|
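
A plain byte count shows the same thing: 28 bytes of text plus the CRLF line break (2 bytes in UTF-8, 4 bytes in UTF-16LE):

Code:
$ cat example_.txt | wc -c
30
$ cat example_u16.txt | wc -c
32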

Of course you can combine multiple Unicode code points into a single visible glyph, e.g. "🧛🏽‍♀️", but that is not what I mean here.

side note: I am aware that the correct visualisation of my examples depends on several factors. The first example should be fine on most systems, while the last one seems to render broken on most.
#7
this discussion is very difficult to follow. are you using 2 forum accounts, or why are you, BotPass, answering as if the reply was addressed to you (it was actually given to boreas)? How are you related, if I may ask? Do you "just" have the same problem?

I actually kind of agree with undeath. utf16 uses at least 2 bytes, as your own example also proves: the newline (ascii 0a) is represented as 0a 00 in your hex dump
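
quick demo, for what it's worth:

Code:
$ printf '\n' | iconv -f utf8 -t utf16le | xxd -p
0a00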

of course non-ascii characters might use more than 2 bytes, but most characters of english words in dicts are represented with exactly 2 bytes (statistically speaking). and of course both utf8 and utf16 can use more bytes when needed (utf32, in contrast, always uses exactly 4 bytes per code point)
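
e.g. with a random english word every character ends up as exactly 2 bytes:

Code:
$ echo -n hello | iconv -f utf8 -t utf16le | xxd -p
680065006c006c006f00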
#8
There is no relation.

I just referred to the "fixed number of two bytes", which to me is not the same as "2+ bytes". I often see people, even trainers, proclaim that UTF-16 uses exactly two bytes per element and that you can determine the length of a text by dividing its byte count by two. Some of them even defend that idea until they see an example proving them wrong.
I do not know how it was meant here, but I thought that people reading this thread might take that "fixed number" the wrong way, so I came back to it.
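
One such example (one visible character, four bytes, assuming a UTF-8 terminal):

Code:
$ echo -n 𝓮 | iconv -f utf8 -t utf16le | wc -c
4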

The Windows line break is still there because I was too lazy to remove it. Sorry if that caused even more confusion.

If it is not allowed to post in other people's threads here, I am sorry and will try not to do it in the future. Sad
#9
no, it's perfectly fine to answer like you did and clarify the situation.

Thanks for the clarification!

It just wasn't very clear (maybe only to me) that this was a general clarification and that your point was only to show that the "fixed 2 bytes" rule isn't always true.

The phrase "I may misunderstand you" sounded to me as if it was "your problem" (the original post) and as if you were trying to fix it without understanding the argument undeath gave... now we ALL know it was just a general clarification/correction of the wrong 2-bytes-for-utf16 rule. I think undeath also knows that both utf8 and utf16 are variable length and that 2 bytes isn't the maximum utf16 needs to represent a single "character". The main point is that with "just" 2 bytes you can already represent so many characters in utf16 that it's quite rare (with most languages, not all of course! I'm mainly focusing on english here, and I know that's not always a good idea, because some languages need 4 bytes most of the time) that you have to use 4 bytes with utf16.
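
just to illustrate the difference (same iconv/xxd tools as above): a character inside the BMP needs only 2 bytes, a character outside of it needs 4:

Code:
$ echo -n é | iconv -f utf8 -t utf16le | xxd -p
e900
$ echo -n 𝓮 | iconv -f utf8 -t utf16le | xxd -p
35d8eedc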

Sorry for my misunderstanding and confusion above... it wasn't meant to stop people from discussing at all, it was just a little unclear to me whether you two were somehow related or just had the same problem to fix.

I guess at least from now on we ALL know that utf16 uses a minimum of 2 bytes, and possibly more, per "character". and hell yeah, character encoding (and conversion) can be quite difficult to understand and deal with! it's sometimes a nightmare. it's for sure not as simple as "always 2 bytes"!