Maskuni: a standalone mask generator with Unicode support
#1
Hi,

I’m making this self-quarantine project public. Masks are still a trending topic in many countries so ... let’s have Unicode masks :-)

https://github.com/flbdx/maskuni

Maskuni is similar to Maskprocessor. Its syntax is compatible. It’s also fully compatible with hashcat’s mask files.

By default it iterates over 8-bit charsets like Maskprocessor (maskuni is faster though).

When its Unicode support is enabled, maskuni reads UTF-8 encoded charsets and writes UTF-8 encoded words, internally iterating over 32-bit Unicode code points.

Code:
$ ./maskuni.exe --unicode -1 👍🤘 'emojis ?1?1'
emojis 👍👍
emojis 👍🤘
emojis 🤘👍
emojis 🤘🤘
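
For readers who want to see the idea in code, here is a minimal Python sketch of mask expansion over code points. It is not maskuni's implementation; the function name `expand_mask` and the dictionary layout are made up for the illustration. It relies on Python strings being sequences of code points, so each emoji counts as one candidate.

```python
from itertools import product

def expand_mask(mask, charsets):
    """Yield every word described by `mask` (hypothetical helper).

    `charsets` maps a one-character key such as "1" to a string whose
    code points are the candidates for the placeholder "?1".
    "??" is treated as a literal question mark, as in hashcat masks.
    """
    positions = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            positions.append(["?"] if key == "?" else list(charsets[key]))
            i += 2
        else:
            positions.append([ch])  # literal character
            i += 1
    # The rightmost position varies fastest, like an odometer.
    for combo in product(*positions):
        yield "".join(combo)

words = list(expand_mask("emojis ?1?1", {"1": "👍🤘"}))
print("\n".join(words))
```

This prints the same four words, in the same order, as the maskuni run above.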

I’m mainly using it to extend the predefined lowercase and uppercase charsets with accented Latin letters (maskuni lets you redefine or extend the predefined charsets):

Code:
$ ./maskuni --unicode --charset 'l:?léèà' --no-delim '?l'
abcdefghijklmnopqrstuvwxyzéèà
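
A rough Python sketch of how such a charset definition could be expanded, assuming that `?l` inside a definition refers to the previous content of the `l` charset. The names `PREDEFINED` and `expand_charset` are hypothetical and maskuni's real parsing may differ.

```python
import string

# Assumed subset of the predefined charsets, keyed by their one-letter name.
PREDEFINED = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
}

def expand_charset(definition, known=PREDEFINED):
    """Expand "?x" references inside a charset definition (hypothetical helper)."""
    out = []
    i = 0
    while i < len(definition):
        ch = definition[i]
        if ch == "?" and i + 1 < len(definition):
            key = definition[i + 1]
            # "??" is a literal "?", otherwise splice in the referenced charset.
            out.extend("?" if key == "?" else known[key])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

extended_lower = expand_charset("?léèà")
print(extended_lower)
```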

There is one huge limitation with Unicode masks though. They can’t generate characters made of multiple combined code points (like the very common red heart emoji ❤️).
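
A short Python check illustrates the problem. The red heart is the code point U+2764 followed by the variation selector U+FE0F, so a generator that works one code point at a time sees two separate charset entries instead of one character.

```python
heart = "\u2764\ufe0f"    # HEAVY BLACK HEART + VARIATION SELECTOR-16
thumbs_up = "\U0001F44D"  # a single code point

codepoints = [f"U+{ord(c):04X}" for c in heart]
print(codepoints)                  # ['U+2764', 'U+FE0F']
print(len(heart), len(thumbs_up))  # 2 1

# Splitting a charset by code point breaks the heart into two entries.
charset_entries = list(heart + thumbs_up)
print(len(charset_entries))        # 3
```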

In addition, maskuni also has a more traditional "bruteforce" mode. It’s defined by a password length and a set of charsets, each with a minimum and maximum number of occurrences. For example, passwords of length 8 with 0 to 6 lowercase letters, 0 to 2 uppercase letters and 0 to 2 digits (see https://hashcat.net/forum/thread-8847.html for a related post).
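
As an illustration only, the constraint can be sketched in Python as a depth-first enumeration that tracks how often each charset has been used. This is not how maskuni does it; the function name `bruteforce` and the `(chars, min, max)` tuples are assumptions, and the sketch assumes the charsets do not overlap.

```python
def bruteforce(length, charsets):
    """Yield words of `length` where each charset is used between min and max times.

    `charsets` is a list of (chars, min_count, max_count) tuples.
    Assumes the charsets are disjoint, otherwise words would repeat.
    """
    counts = [0] * len(charsets)

    def rec(prefix):
        remaining = length - len(prefix)
        # Positions still needed to reach every minimum must fit in what is left.
        owed = sum(max(0, lo - c) for c, (_, lo, _) in zip(counts, charsets))
        if owed > remaining:
            return
        if remaining == 0:
            yield "".join(prefix)
            return
        for idx, (chars, _lo, hi) in enumerate(charsets):
            if counts[idx] >= hi:
                continue  # this charset is already used up
            counts[idx] += 1
            for ch in chars:
                prefix.append(ch)
                yield from rec(prefix)
                prefix.pop()
            counts[idx] -= 1

    yield from rec([])

# Tiny example: length 2, up to two of "ab", at most one "1".
small = list(bruteforce(2, [("ab", 0, 2), ("1", 0, 1)]))
print(small)  # ['aa', 'ab', 'a1', 'ba', 'bb', 'b1', '1a', '1b']
```

The example from the text would be `bruteforce(8, [(lowercase, 0, 6), (uppercase, 0, 2), (digits, 0, 2)])`, which is far too large to print but follows the same logic.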

The bruteforce mode also comes in 8-bit and Unicode flavors.

Regarding maskuni’s speed: well, it’s faster than maskprocessor. But this doesn’t mean much, as performance will be killed by the pipe between maskuni and its consumer, or even by bad stdin handling in the consumer... So it’s better suited to slow hashes, custom cracking software or wordlist building.

Hopefully maskuni will find some use!
#2
interesting. but the name of the tool is a little bit confusing because we always use maskgen for the hcmask generation tool of PACK (https://github.com/iphelix/pack/). That's very confusing, because one generates .hcmask files from dictionaries (statistics) and yours is a tool that pipes the output of masks (a "mask processor") to stdout or files etc.

I'm also interested in what the strategy is to improve the speed over maskprocessor, where do the differences lie in ? Any clever strategy to improve the maskprocessor speed, because as far as I know it's already one of the fastest tools with clever buffering/flushing etc ?
#3
(06-03-2020, 08:54 AM)philsmd Wrote: interesting. but the name of the tool is a little bit confusing because we always use maskgen for the hcmask generation tool of PACK (https://github.com/iphelix/pack/). That's very confusing, because one generates .hcmask files from dictionaries (statistics) and yours is a tool that pipes the output of masks (a "mask processor") to stdout or files etc.

Ah, I forgot about PACK/maskgen.py. Thanks for reminding me. I will think about another name. Maybe something like "umaskgen" or "unimask"?

Quote:I'm also interested in what the strategy is to improve the speed over maskprocessor, where do the differences lie in ? Any clever strategy to improve the maskprocessor speed, because as far as I know it's already one of the fastest tools with clever buffering/flushing etc ?

Regarding its speed, it’s mostly excessive care due to excessive boredom during this self-quarantine time...

Looking at mp.c, I see that the "--occurrence-max" option causes additional work to keep the count. Maybe it’s significant enough. I'll try a benchmark tonight after patching it out.

Otherwise:
  • aggressive inlining
  • GCC being reliable (Clang won’t optimize this as well as GCC…)
  • regarding the output: I’m using the same buffer size as maskprocessor (8192) and did not find it worthwhile to use bigger buffers
  • Only for the Windows port, I’m avoiding the libc’s memcpy for the buffering. I noticed with both Cygwin and MSYS2 that GCC doesn’t replace memcpy with an efficient builtin and calls the one from the system libc instead. They kinda suck...
  • I remember having an issue with the size of my "Charset" structure (a mask being a vector of this structure). Having for example a 40-byte structure would hurt the speed noticeably, probably because some charsets were straddling 2 cache lines. Keeping it at 32 bytes or padding it up to 64 bytes did improve the speed.
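
The cache-line point in the last bullet can be checked with a bit of arithmetic. The Python sketch below assumes 64-byte cache lines and an array that starts on a line boundary (both are assumptions, not measurements), and counts how many elements of a contiguous array span two lines.

```python
CACHE_LINE = 64  # bytes; assumed, typical for x86-64

def straddling(struct_size, count, line=CACHE_LINE):
    """Count array elements whose first and last byte fall in different cache lines."""
    n = 0
    for i in range(count):
        first_line = (i * struct_size) // line
        last_line = (i * struct_size + struct_size - 1) // line
        if first_line != last_line:
            n += 1
    return n

for size in (32, 40, 64):
    print(size, straddling(size, 8))
# 32 0
# 40 4
# 64 0
```

With 40-byte elements, half of an 8-element mask would cross a line boundary, while 32-byte and 64-byte elements never do. That fits the observation, although it does not prove it is the only cause.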
#4
So I'll change the name of the project to Maskuni to avoid any confusion with PACK.
I’ll update the first post and thread title once it's pushed and the repo is moved.

Regarding the speed difference with Maskprocessor, I did some more benchmarks. As I suspected, the "--occurrence-max" option accounts for almost all of the difference. Still, Maskuni keeps a slight edge after removing the occurrence counting and logic.
Running at around 150 million words/s is about 1 word every 23 cycles on my laptop. So every line counts, and it’s also probably very sensitive to GCC’s version (and mood).
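
As a quick sanity check of that figure, the two numbers together imply a clock of roughly 3.45 GHz. The clock speed is not stated in the post, it is only what the arithmetic gives:

```python
words_per_second = 150e6  # throughput quoted above
cycles_per_word = 23      # cycles per word quoted above

implied_clock_hz = words_per_second * cycles_per_word
print(f"{implied_clock_hz / 1e9:.2f} GHz")  # 3.45 GHz
```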