How to get maximum performance from Apple Silicon (M1-M3)?
Hello, folks:

I am learning about hashcat using an Apple laptop with an M1 Max CPU, part of the Apple Silicon family. As with all Apple Silicon hardware, there are clusters of efficiency CPU cores, clusters of performance cores, a GPU, and a (poorly described) arithmetic and logical unit. I know how to run hashcat with the "-d 2" option to use 100% of the GPU and almost no CPU. I know how to run hashcat with the "-d 1" option to use a fraction of the efficiency cores, but little of the GPU or the performance cores.

How can I run hashcat to get the maximum hashes/second out of my machine? Are there any tips or know-how documented anywhere?

For example, one method might be to run "hashcat -d 2" in one process to max out the GPU, and then one or more "hashcat -d 1" processes to max out the CPU cores. But I don't know how to coordinate these hashcat instances so that they divide the work between them. I also don't know of a way to tell hashcat to prefer the performance cores over the efficiency cores. And there are probably rules of thumb about how many hashcat processes to run, depending on the number of cores and the amount of memory in a given Apple Silicon machine.
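On the coordination question, one common way to divide a single attack between several hashcat instances is to give each one a contiguous slice of the keyspace with hashcat's `-s`/`--skip` and `-l`/`--limit` options, and a distinct `--session` name so the instances don't collide. The Python sketch below only builds the command lines. It assumes you already have the keyspace (for example from `hashcat --keyspace` run with the same attack options) and relative speeds for each device from your own benchmark. The speeds, file names, and the helper names `split_keyspace`/`build_commands` are hypothetical. Running CPU and GPU instances side by side may still not raise the total hash rate, because the GPU needs CPU time to be fed.

```python
def split_keyspace(keyspace: int, weights: list[int]) -> list[tuple[int, int]]:
    """Split [0, keyspace) into contiguous (skip, limit) chunks proportional to weights.

    weights are hypothetical relative speeds (e.g. hashes/s from your own benchmark).
    Integer arithmetic keeps the chunks exact: they never overlap and cover the
    whole keyspace, because the last boundary is keyspace * total // total.
    """
    if keyspace < 0:
        raise ValueError("keyspace must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    total = sum(weights)
    chunks = []
    start = 0
    cumulative = 0
    for w in weights:
        cumulative += w
        end = keyspace * cumulative // total
        chunks.append((start, end - start))
        start = end
    return chunks


def build_commands(base_args: list[str], devices: list[int],
                   keyspace: int, weights: list[int]) -> list[list[str]]:
    """Build one hashcat argv per device; base_args holds the shared attack options."""
    if len(devices) != len(weights):
        raise ValueError("need one weight per device")
    commands = []
    for i, (device, (skip, limit)) in enumerate(
            zip(devices, split_keyspace(keyspace, weights))):
        if limit == 0:
            # Assumption: hashcat treats "-l 0" as "no limit", so drop empty chunks
            # instead of accidentally giving this instance the whole keyspace.
            continue
        commands.append([
            "hashcat",
            "--session", f"worker{i}",  # separate session per concurrent instance
            "-d", str(device),
            "-s", str(skip),
            "-l", str(limit),
            *base_args,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical: device 2 measured ~3x faster than device 1 on this attack.
    attack = ["-m", "0", "-a", "0", "hashes.txt", "words.txt"]
    for cmd in build_commands(attack, devices=[2, 1], keyspace=100, weights=[3, 1]):
        print(" ".join(cmd))
```

Each printed command could then be started in its own terminal (or via `subprocess.Popen`). Splitting by measured speed, rather than evenly, is meant to let the instances finish at roughly the same time.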

Can anyone point me to know-how or wisdom on how to get the maximum performance of hashcat on Apple Silicon?
Use the GPU and leave the CPU for the host side processing. Using both usually leads to a lower net hash rate because the GPU relies on the CPU to feed it work. Unfortunately, there's not much performance to be had from Apple silicon in the first place, so what you get from the GPU is... what you get.
On my Mac Studio M2 Ultra, -d 1 is the Metal device and -d 2 the OpenCL device; both show 76 MCU, which are the GPU cores only. I did not find a way to address the CPU cores; I guess Metal addresses the GPU directly.
On intel based Macs, the CPU was shown as a separate device.

How do you address the CPU directly?