11-08-2018, 11:20 PM

I'm very excited about Hashcat's Brain server and have started to look into if it would be possible to incorporate a PCFG guesser into it. I don't think I'll actually be doing any coding until January, but I wanted to write down some notes and get people's input into it.

---Random Notes---

Current PCFG code: https://github.com/lakiw/pcfg_cracker

Quick description: PCFG stands for probabilistic context-free grammar. In a nutshell, it creates a model of how people create passwords, assigns each candidate password a probability of occurring according to that model, and then generates password guesses in probability order.

How Brain could help: The current PCFG implementation has a lot of overhead in generating guesses because it attempts to generate them in probability order. That's a nicer way of saying it is really slow. That being said, a PCFG-based attack could be sped up dramatically (perhaps to the speed of Markov attacks) if you didn't care about generating guesses in strict probability order and instead took another approach, such as generating all guesses above a certain probability threshold. There also exists the ability to distribute the work fairly easily. Having a Brain server keep track of sessions could really help with this.

Current PCFG password cracking grammar:

Base Structures: This looks a lot like a hashcat mask if you squint, and it contains the basic structure of a password guess. The current fields are:

A=Alpha characters/Letters

D=Digits

O=Other/Special Characters

K=Keyboard Combos

X=conteXt sensitive replacements (such as <3, #1...)

M=Markov-generated string (OMEN support is currently being added)

A typical base structure would look like:

Password123! = A8D3O1

Notice that the base structure does not cover capitalization or l33t mangling. That comes later...

There are also pre-terminal replacements. These are replacements for each of the variables in the base structure. This is also where transforms like case mangling happen. So you might have transforms such as:

A8 -> ULLLLLLL -> Password

In the above example, the case mangling is applied as a mask that is then applied to the dictionary word.
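A minimal sketch of applying such a case mask to a dictionary word (U = uppercase, L = lowercase):

```python
def apply_case_mask(word, mask):
    """Apply an upper/lower case mask to a word, e.g.
    ("password", "ULLLLLLL") -> "Password". Mask must match word length."""
    assert len(word) == len(mask), "mask length must equal word length"
    return "".join(
        c.upper() if m == "U" else c.lower()
        for c, m in zip(word, mask)
    )

print(apply_case_mask("password", "ULLLLLLL"))  # -> Password
```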

What adds the "P" to PCFGs is that every transform has a probability associated with it. So the Base Structure has a probability, the Alpha capitalization mask has a probability, the word "password" has a probability, and the digits and special characters at the end have a probability. The final probability of a generated guess is the product of the probabilities of all the transforms it took to generate it.
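So for the running "Password123!" example, the guess probability is just the product of the individual transform probabilities. The numbers below are made up; a real grammar derives them from training data:

```python
import math

# Hypothetical probabilities for each transform behind "Password123!".
# Invented values for illustration only.
transform_probs = {
    "base structure A8D3O1": 0.05,
    "case mask ULLLLLLL":    0.30,
    "word 'password'":       0.02,
    "digits '123'":          0.10,
    "special '!'":           0.25,
}

guess_prob = math.prod(transform_probs.values())
print(guess_prob)  # product of all five transform probabilities
```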

What this means from a Brain perspective is that, in the current PCFG implementation, you can describe a cracking session by a starting and ending probability for a given grammar. If you don't care about generating guesses in probability order, you could split things up further by describing attacks as a base structure plus a starting and ending probability. That would let you distribute the work much as you currently can with traditional mask attacks.
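A hypothetical work-unit descriptor along those lines (the names here are mine, not from the Brain code) might be as small as this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PcfgWorkUnit:
    """Hypothetical descriptor a Brain server could record per client to
    track which slices of a PCFG attack have already been run."""
    base_structure: str   # e.g. "A8D3O1"; "" could mean the whole grammar
    prob_start: float     # upper probability bound of the slice
    prob_end: float       # lower probability bound of the slice

# Two clients working disjoint base structures over the same probability band
units = [
    PcfgWorkUnit("A8D3O1", 1e-5, 1e-6),
    PcfgWorkUnit("A6D2", 1e-5, 1e-6),
]
```

Checking whether a proposed attack overlaps previous work then reduces to comparing these small tuples, which is exactly the kind of bookkeeping Brain already does for mask keyspace ranges.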

---Getting PCFGs into Brain---

I started looking at the Brain code, and it appears the saving and checking of previously run PCFG attacks described above could be implemented in the brain_compute_attack function. That's likely the easy part. The hard part would be getting a PCFG-style attack to run in Hashcat. Version 2 of the PCFG cracker was actually written in C++, but the current version 3 is a Python3 script. I'm starting a full re-write of the PCFG cracker though, so this is the time to bite the bullet and maybe go back to C++. My first goal is to re-write the training program, which should be done in the next couple of months; that's why I expect to pick this up in earnest once the new year rolls around.
