Maskuni: a standalone mask generator with Unicode support
#3
(06-03-2020, 08:54 AM)philsmd Wrote: interesting. but the name of the tool is a little bit confusing because we always use maskgen for the hcmask generation tool of PACK (https://github.com/iphelix/pack/). That's very confusing, because one generates .hcmask files from dictionaries (statistics) and yours is a tool that pipes the output of masks (a "mask processor") to stdout or files etc.

Ah, I forgot about PACK/maskgen.py. Thanks for reminding me. I will think about another name. Maybe something like "umaskgen" or "unimask"?

Quote:I'm also interested in what the strategy is to improve the speed over maskprocessor, where do the differences lie in ? Any clever strategy to improve the maskprocessor speed, because as far as I know it's already one of the fastest tools with clever buffering/flushing etc ?

Regarding its speed, it's mostly excessive care due to excessive boredom during this self-quarantine time...

Looking at mp.c, I see that the "--occurrences-max" option causes additional work to keep the count. Maybe it’s significant enough. I'll try a bench tonight after patching it out.

Otherwise:
  • aggressive inlining
  • GCC being reliable (Clang won’t optimize this as well as GCC…)
  • regarding the output: I’m using the same buffer size as maskprocessor (8192) and did not find it worthwhile to use bigger buffers
  • On the Windows port only, I’m avoiding the libc’s memcpy for the buffering. I noticed with both Cygwin and MSYS2 that GCC doesn’t replace memcpy with an efficient builtin and calls the one from the system libc instead. They kinda suck...
  • I remember having an issue with the size of my "Charset" structure (a mask being a vector of this structure). Having, for example, a 40-byte structure would hurt the speed noticeably, probably because some charsets straddled two cache lines. Keeping it at 32 bytes or padding it up to 64 bytes did improve the speed.
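
To illustrate the last point about structure size, here is a minimal sketch in C. It is not Maskuni's actual code: `padded_charset`, `straddling_elements` and `CACHE_LINE` are hypothetical names, and it assumes 64-byte cache lines and an array that starts on a cache-line boundary (an allocator does not guarantee the latter by default). It only counts how many consecutive elements of a given size end up spanning two cache lines.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 CPUs. */
#define CACHE_LINE 64

/* Hypothetical stand-in for a 40-byte charset record. Aligning the member
 * forces the compiler to pad the whole structure up to one cache line. */
struct padded_charset {
    alignas(CACHE_LINE) unsigned char payload[40];
};
static_assert(sizeof(struct padded_charset) == CACHE_LINE,
              "structure should occupy exactly one cache line");

/* Count how many of `count` consecutive elements of `elem_size` bytes
 * (elem_size > 0) span two cache lines, assuming the first element
 * starts exactly on a cache-line boundary. */
size_t straddling_elements(size_t elem_size, size_t count)
{
    size_t straddling = 0;
    for (size_t i = 0; i < count; i++) {
        size_t first = i * elem_size;         /* offset of the first byte */
        size_t last = first + elem_size - 1;  /* offset of the last byte */
        if (first / CACHE_LINE != last / CACHE_LINE)
            straddling++;
    }
    return straddling;
}
```

Under these assumptions, `straddling_elements(40, 8)` counts 4 elements out of 8 crossing a line boundary, while sizes of 32 and 64 bytes give 0, which is consistent with the slowdown described above.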

Messages In This Thread
RE: Maskgen : a standalone mask generator with unicode support - by flbdx - 06-03-2020, 12:11 PM