Adding PCFG to slow_candidates
Reviving the discussion started in this thread:

BLUF: I'm starting to write a PCFG guesser implementation in C, with the eventual goal of getting it included as a slow candidate mode in Hashcat.

Current Python PCFG code:

I will be starting a new repo shortly for the C-only version of the guess generator. I plan on having both versions use the Python training program to generate rulesets/grammars.

1) Are there templates for the five functions for sc mode? I'm going to quote Atom below on those five functions:

The five functions in general are the following:
  • sc_pcfg_init - A function which resets all internal structures of the generator as it would be started freshly from the commandline. It will also provide the mandatory and optional parameters a user can specify in a struct. It will return a context to work with. The context enables multi threading functionality.

  • sc_pcfg_keyspace - A function which simply returns the total number of candidates which the generator will create based on the parameter configuration. If the total number is unknown this has some disadvantages. For instance, the ETA can not be computed or it may not be possible to distribute it via hashtopolis. In this case return (u64) -1 and hashcat will assume the generator will give a negative returncode in the seek/next function (explained next).

  • sc_pcfg_seek - Seek to a specific candidate position. This is mandatory, the parameter will be just a number. Will also have a returncode if there's no such position

  • sc_pcfg_next - Output the next password candidate (based on the context)

  • sc_pcfg_shutdown - A cleanup function
Note: Based on other discussions, seek and keyspace may not be practical to implement with the current next algorithm that the PCFG code uses.
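To make the five entry points concrete, here is a minimal sketch in C of what such an interface might look like. Only the sc_pcfg_* names come from the quote above; the signatures, the context struct, and the toy "word plus one digit" generator are assumptions for illustration, not hashcat's actual API and not the real PCFG algorithm. User parameters are left out. The toy generator is seekable only because it is trivial, which per the note above may not hold for the real PCFG next algorithm.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t u64;

/* Sentinel a generator could return when the keyspace is unknown. */
#define SC_KEYSPACE_UNKNOWN ((u64) -1)

/* Hypothetical context: all generator state lives here, one per thread. */
typedef struct {
    const char **words;  /* stand-in for a trained grammar */
    u64 word_cnt;
    u64 pos;             /* index of the next candidate to emit */
} sc_pcfg_ctx_t;

static const char *TOY_WORDS[] = { "password", "liberty", "jessy" };

/* Reset everything to a fresh state and hand back a context. */
sc_pcfg_ctx_t *sc_pcfg_init(void)
{
    sc_pcfg_ctx_t *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) return NULL;
    ctx->words = TOY_WORDS;
    ctx->word_cnt = sizeof(TOY_WORDS) / sizeof(TOY_WORDS[0]);
    ctx->pos = 0;
    return ctx;
}

/* Toy keyspace: every word followed by one digit 0-9. */
u64 sc_pcfg_keyspace(const sc_pcfg_ctx_t *ctx)
{
    return ctx->word_cnt * 10;
}

/* Jump to candidate number pos; -1 if there is no such position. */
int sc_pcfg_seek(sc_pcfg_ctx_t *ctx, u64 pos)
{
    if (pos >= sc_pcfg_keyspace(ctx)) return -1;
    ctx->pos = pos;
    return 0;
}

/* Write the next candidate into buf; returns its length,
 * or -1 when exhausted or when buf is too small. */
int sc_pcfg_next(sc_pcfg_ctx_t *ctx, char *buf, size_t buf_sz)
{
    if (ctx->pos >= sc_pcfg_keyspace(ctx)) return -1;
    const char *word = ctx->words[ctx->pos / 10];
    int len = snprintf(buf, buf_sz, "%s%u", word, (unsigned) (ctx->pos % 10));
    if (len < 0 || (size_t) len >= buf_sz) return -1;
    ctx->pos++;
    return len;
}

/* Release everything owned by the context. */
void sc_pcfg_shutdown(sc_pcfg_ctx_t *ctx)
{
    free(ctx);
}
```

A caller would run init once per thread, optionally seek to its starting offset, then loop on next until it returns a negative value, and finish with shutdown.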
I've heard about the excellent performance of the new pcfg. Congrats!

There are no templates for the SC integration. The functions I've explained are functions your candidate generator library should expose. If it does, I'll do the integration into hashcat.
What you describe sounds good for a plan going forward. I'll code up the C PCFG guesser so it can also run standalone (which will help with my own personal testing), but it will also expose those functions, with the exception of sc_pcfg_seek, to other programs. I've been talking with some other people and think I may have a way to tackle sc_pcfg_keyspace in a reasonable manner.

For the first PoC, I'm going to leave out some features of the Python pcfg_guesser. Most notably, I'm going to only handle UTF-8 guesses, drop support for OMEN, and not handle save/restore for cracking sessions. That should significantly simplify things, which means I might actually get something working in a somewhat reasonable timeframe. I figure those features could then be implemented at a later point.
Periodic check-in to say that I'm making progress. The current code can be found here:

TLDR: It *does not* generate guesses right now, but progress is being made.

I'm currently at the point where I'm loading a ruleset/grammar, and am troubleshooting the priority queue and "next function". Once I'm somewhat happy they are working the way I expect them to, I'll focus on generating guesses from parse trees, (basically rule + dictionary combos).
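For readers unfamiliar with the data structure, a probability-ordered priority queue can be sketched as a binary max-heap. This is a generic illustration, not the code from the repo: the item layout (a probability plus a hypothetical tree_id) and all function names are assumptions. A "next" step would pop the most probable entry and push whatever follow-on entries it produces.

```c
#include <stdlib.h>

/* Hypothetical queue item: a parse tree reduced to an id and its probability. */
typedef struct {
    double prob;
    int    tree_id;
} pq_item_t;

/* Growable array-backed max-heap; zero-initialize before first use. */
typedef struct {
    pq_item_t *items;
    size_t     len;
    size_t     cap;
} pq_t;

static void pq_swap(pq_item_t *a, pq_item_t *b)
{
    pq_item_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert an item; returns 0 on success, -1 if memory could not be allocated. */
int pq_push(pq_t *pq, pq_item_t item)
{
    if (pq->len == pq->cap) {
        size_t new_cap = pq->cap ? pq->cap * 2 : 16;
        pq_item_t *tmp = realloc(pq->items, new_cap * sizeof(*tmp));
        if (tmp == NULL) return -1;
        pq->items = tmp;
        pq->cap = new_cap;
    }
    size_t i = pq->len++;
    pq->items[i] = item;
    /* Sift up until the parent is at least as probable. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (pq->items[parent].prob >= pq->items[i].prob) break;
        pq_swap(&pq->items[parent], &pq->items[i]);
        i = parent;
    }
    return 0;
}

/* Remove the most probable item into *out; returns 0, or -1 if the queue is empty. */
int pq_pop(pq_t *pq, pq_item_t *out)
{
    if (pq->len == 0) return -1;
    *out = pq->items[0];
    pq->items[0] = pq->items[--pq->len];
    /* Sift down, always swapping with the more probable child. */
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, best = i;
        if (left < pq->len && pq->items[left].prob > pq->items[best].prob) best = left;
        if (right < pq->len && pq->items[right].prob > pq->items[best].prob) best = right;
        if (best == i) break;
        pq_swap(&pq->items[i], &pq->items[best]);
        i = best;
    }
    return 0;
}

void pq_free(pq_t *pq)
{
    free(pq->items);
    pq->items = NULL;
    pq->len = pq->cap = 0;
}
```

Both push and pop run in O(log n), which matters because the queue tends to grow as a session runs.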

Currently I'm using a modified version of the Makefile included in Hashcat to try and limit the challenges of including this in HC at a later point. The program I'm working on right now is stand-alone in that it'll parse the command line and generate guesses without having to use another program, as I figure that will make the initial troubleshooting easier. After I get it working I plan on adding in (abstracting) the functionality to make it easier to incorporate into other tools.
Oh, and the ruleset needs to be generated by the Python version of the PCFG toolset, available here:

I currently have no plans of porting the training program over to C, since it heavily relies on several Python libraries. Once a ruleset is generated and saved to disk though, either the Python PCFG or the compiled C PCFG will be able to make use of it.
The newest version of the compiled PCFG is finally generating password guesses! There are a ton of features/improvements I still want to make (the top one: I'm pretty sure I'm handling non-ASCII UTF-8 characters horribly), but it's at the point where end-to-end testing can be conducted.
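One common source of trouble with non-ASCII UTF-8 in C is treating bytes as characters, for example when measuring the length of a string. The following is only a sketch of that general technique, not a diagnosis of the actual issue in the repo; the function name is hypothetical and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Count characters (code points) rather than bytes in a UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx; every other byte
 * starts a new character. Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) count++;
    }
    return count;
}
```

For a word like "café" this returns 4, whereas strlen reports 5 because the "é" takes two bytes.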

To generate guesses with the code in the repo, generate/copy a ruleset/grammar using the Python PCFG toolset, and then copy the entire "Rules" folder into the compiled_pcfg directory. Yes, figuring out a better way to set up the PCFG trainer to support multiple projects is also on my to-do list.

I was running into challenges using it to pipe guesses into hashcat using the following command:

<Path omitted>/pcfg_guesser | ./hashcat64.exe -a 0 -m 100 ../../research/password_lists/hashcat_fmt_test_list.hsh

But that may be because I was running it under Windows Subsystem for Linux ("Ubuntu"), so weirdness can pop up. The error was that the pcfg_guesser seemed to die/stop generating guesses while Hashcat was still starting up. If I cat a dictionary file in instead, hashcat will crack passwords. This probably points to an area I need to dig into on my end to handle blocking output gracefully. Or it could be something else entirely ¯\_(ツ)_/¯. What I really need to do is simply test it out on a real Linux system.
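Since the cause is undiagnosed, one low-risk step is to make output failures visible instead of silent. The sketch below assumes a POSIX system; the function names are hypothetical and this is not the code from the repo. It ignores SIGPIPE so that a closed pipe shows up as a write error, and it checks every write.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Call once at startup. With SIGPIPE ignored, writing to a pipe whose
 * reader has gone away fails with EPIPE instead of killing the process. */
void guess_output_init(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write one guess followed by a newline.
 * Returns 0 on success, -1 (after logging the reason to stderr) on failure. */
int write_guess(FILE *out, const char *guess)
{
    if (fputs(guess, out) == EOF || fputc('\n', out) == EOF) {
        fprintf(stderr, "pcfg_guesser: write failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Flush buffered guesses; stdio may only report a broken pipe at this point. */
int flush_guesses(FILE *out)
{
    if (fflush(out) == EOF) {
        fprintf(stderr, "pcfg_guesser: flush failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

If the guesser then stops early, the message on stderr at least says whether the consumer closed the pipe or something else went wrong.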

I was able to do some initial testing using John the Ripper though. 

./pcfg_guesser | ../JohnTheRipper/run/john --stdin -format=Raw-MD5 ../../research/password_lists/test_list.txt

Side note: You can invoke JtR's status output when piping in guesses via stdin using the following command:

kill -SIGUSR1 <PID of JTR>

Doing this, I was getting about 4 to 5 million guesses a second. As an example:

0g 0:00:03:17  0g/s 5253Kp/s 5253Kc/s 183066MC/s jessykira3..liberty@2009

It does slow down the longer you run a cracking session, but so far I haven't run one using the compiled PCFG for more than an hour. That being said, this is about 20 times faster than the Python PCFG guesser so that's a huge improvement!

Next on my to-do list:
1) Run the compiled PCFG on a different computer. If issues still arise with piping guesses into Hashcat, dig into that.
2) Start abstracting the functions listed earlier that will be used to integrate this into other programs. Aka sc_pcfg_next, sc_pcfg_init, etc.
3) Testing, and more testing. For example, I want to make sure it is generating the same guesses as the Python PCFG.
(08-22-2019, 10:46 PM)lakiw Wrote: I'm going to... drop support for OMEN

I understand that you're the author of PCFG and thus more in favor of including PCFG in hashcat, but may I ask why you want to drop support for OMEN in the first place when, according to the benchmark plots, it's superior to PCFG in terms of the number of guesses one needs to make?