Help understanding PCIe 16x/8x/16x/8x
#1
Hi there,

Quite new to designing a cracking rig, and I'm struggling to understand the impact of running an x16 PCIe slot at x8 (only half the lanes?).

Most of the motherboards I see (gaming ones) that have four PCIe x16 slots in fact run only two of them as full x16 slots and the other two at x8, which reduces performance if I'm right.

Am I right in saying that an x16 PCIe slot running at x8 is slower than one running at the full x16?

Is it a problem? If I want to use four RTX 3080s, is it a waste of money, and should I use another motherboard with four full x16 PCIe slots (does that even exist in ATX format?)?

I also see many motherboards where the NVMe SSD shares lanes with a PCIe slot, dropping that slot to x4 or even disabling it. Is there any ATX motherboard I could use with four RTX 3080s and at least one NVMe SSD, without having to worry about wasting money because a slot that is electrically only x8 gets further reduced to x4 by the NVMe SSD?

Thanks in advance for your help!
#2
In my experience, almost all hashcat and JtR cracking jobs are unaffected if you're at x4 or higher.

And yep, manufacturers disclose the difference between physical form factor and actual throughput.
~
#3
Does that mean hashcat or JtR are limited by the GPU rather than by the bus? I don't get it: the bus bandwidth is significantly lower at x4 than at x8 or x16 (two times and four times lower, respectively). Doesn't that have any impact on the performance of the cracking job?
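
Just to put rough numbers on that (a back-of-the-envelope sketch in Python, assuming PCIe 3.0 at about 0.985 GB/s of usable bandwidth per lane; real links lose a bit more to protocol overhead):

Code:
# Approximate usable bandwidth of a PCIe 3.0 link at different lane widths.
# 8 GT/s per lane with 128b/130b encoding ~= 0.985 GB/s per lane
# (assumption: protocol overhead ignored).
GB_PER_LANE_GEN3 = 0.985

for lanes in (4, 8, 16):
    print(f"x{lanes:<2} ~= {lanes * GB_PER_LANE_GEN3:.1f} GB/s")

# Output:
# x4  ~= 3.9 GB/s
# x8  ~= 7.9 GB/s
# x16 ~= 15.8 GB/s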

edit: thanks for the reply!
#4
only correct answer: it depends a lot on many factors.

it depends on the hash type, it also depends on the attack type, ...

if you have a slow hash, the H/s (speed in hashes per second) is low, the cracking algorithm is slow, and the GPUs are kept busy performing all the iterations (high cost factor), so only a few candidates need to cross the PCIe bus per second.

if you use a dictionary attack without rules (or a pipe / stdin attack, or a --slow-candidates run), every candidate has to be generated on the host and sent over the bus, so you need to provide a lot of work to reach full acceleration. It's not easy to do that if the x4 PCIe slot is the bottleneck.

This means it depends on where the bottleneck is, and on whether you are able to provide enough work to keep the GPUs busy. For instance, it can be possible to live with a "slow transfer" (x4) and still get almost full speed and full acceleration by using rules in a dictionary attack, because the rules are applied on the GPU and multiply each transferred candidate there (but be aware that applying rules also increases your total number of password candidates, the "keyspace", and this might not be what you want).
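
To make the bottleneck point concrete, here is a rough sketch in Python (the hash speeds and the average candidate length are loose assumptions for illustration, not benchmark numbers):

Code:
# Can an x4 PCIe 3.0 link feed password candidates from the host fast enough?
PCIE_X4_GEN3 = 4 * 0.985e9   # ~3.9 GB/s usable on an x4 gen3 link (assumption)
AVG_CANDIDATE_BYTES = 9      # assumed average candidate length incl. separator

def required_bandwidth(hashes_per_second):
    """Bytes/s the host must push if every candidate crosses the bus."""
    return hashes_per_second * AVG_CANDIDATE_BYTES

fast_hash = 60e9  # e.g. raw MD5 on one RTX 3080, order of magnitude only
slow_hash = 1e5   # e.g. bcrypt with a high cost factor, order of magnitude only

for name, speed in (("fast hash", fast_hash), ("slow hash", slow_hash)):
    need = required_bandwidth(speed)
    verdict = "NOT enough" if need > PCIE_X4_GEN3 else "plenty"
    print(f"{name}: needs ~{need / 1e9:.3f} GB/s over the bus, x4 link is {verdict}")

# fast hash: needs ~540 GB/s -> no host link can feed this; the candidates have
#            to be amplified on the GPU itself (rules, masks) instead
# slow hash: needs ~0.001 GB/s -> the bus is irrelevant
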
#5
(10-28-2020, 06:33 PM)almandin Wrote: I also see many motherboards where the NVMe SSD shares lanes with a PCIe slot, dropping that slot to x4 or even disabling it. Is there any ATX motherboard I could use with four RTX 3080s and at least one NVMe SSD, without having to worry about wasting money because a slot that is electrically only x8 gets further reduced to x4 by the NVMe SSD?


Consumer-grade/gaming platforms like AM4 and Intel LGA 1151/1200 only support 24 PCIe lanes. Motherboard manufacturers usually do not support splitting these lanes in a way that would make sense for 4 GPUs plus an NVMe drive. (It would also not really make sense for a gamer, since quad SLI is effectively dead.)

You should head for a workstation/enthusiast platform like TR4/TRX40 or Intel LGA 2066. Threadripper has 64 lanes and Intel Core X CPUs offer around 28 to 48 lanes, depending on the model. That leaves a lot more room to split the lanes in a way that makes sense, and you don't have to worry about M.2 slots disabling each other or one of your PCIe slots. If you want to save money on your CPU/mobo, I can recommend first- or second-generation Threadripper. If you want to max out your capacity in terms of GPUs, there is the Asus WS SAGE X299, which supports seven GPUs at once, running at x8 thanks to PLX chips (PCIe switches).
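
As a quick lane-budget sketch (Python) of why the consumer platforms don't work out for 4 GPUs at x16 plus an NVMe drive: the lane counts are the CPU-provided figures mentioned in this thread, and the 4 lanes for the chipset uplink are an assumed typical value, not a spec lookup.

Code:
# Rough PCIe lane budget for 4 GPUs at full x16 plus one x4 NVMe drive.
GPUS, LANES_PER_GPU = 4, 16
NVME_LANES = 4
CHIPSET_LANES = 4  # assumption: typical chipset/DMI uplink

needed = GPUS * LANES_PER_GPU + NVME_LANES + CHIPSET_LANES  # 72 lanes

platforms = {
    "AM4 / LGA 1200 (consumer)":   24,
    "Intel Core X (LGA 2066)":     48,
    "Threadripper (TR4 / TRX40)":  64,
    "EPYC 'Rome' (SP3)":          128,
}

for name, lanes in platforms.items():
    verdict = "enough" if lanes >= needed else "not enough"
    print(f"{name:<28} {lanes:>3} CPU lanes -> {verdict} ({needed} needed)")
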
#6
I suggest that you check out AMD EPYC 2 "Rome" motherboards. They have 128 native PCIe lanes. Please note that dual-socket EPYC 2 systems would still have 128 PCIe lanes, because in the dual-CPU configuration 64 PCIe lanes from each CPU are reserved for communication between the CPUs, so you are left with 2x 64 lanes.

EPYC 2 motherboards also support PCIe 4.0, and so do the RTX 3000 series GPUs.

PCIe 4.0 is (not exactly but nearly) two times faster than PCIe 3.0 in bandwidth.
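
For what it's worth, the arithmetic behind that "nearly two times" (a sketch in Python; the 128b/130b encoding factor is the spec value, real-world throughput is a bit lower because of protocol overhead):

Code:
# Per-lane throughput of PCIe 3.0 vs 4.0: the raw transfer rate doubles
# (8 GT/s -> 16 GT/s) while both use 128b/130b encoding, so the usable
# bandwidth per lane roughly doubles as well.
def usable_gb_per_lane(transfer_rate_gt):
    # 128 payload bits for every 130 bits on the wire, divided by 8 bits/byte
    return transfer_rate_gt * (128 / 130) / 8

for gen, rate in (("PCIe 3.0", 8), ("PCIe 4.0", 16)):
    print(f"{gen}: ~{usable_gb_per_lane(rate):.2f} GB/s per lane")

# Output:
# PCIe 3.0: ~0.98 GB/s per lane
# PCIe 4.0: ~1.97 GB/s per lane
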
#7
Thank you for your awesome answers, I'm looking at the hardware everyone is sharing to get an idea.
#8
Quote:You should head for a workstation/enthusiast platform like TR4/TRX40 or Intel LGA 2066. Threadripper has 64 lanes and Intel Core X CPUs offer around 28 to 48 lanes, depending on the model. That leaves a lot more room to split the lanes in a way that makes sense, and you don't have to worry about M.2 slots disabling each other or one of your PCIe slots.

I didn't know that CPUs also had a maximum number of supported lanes... Does that mean that if I use a 48-lane i9 CPU, it won't be able to run 4 GPUs at x16 (16*4 = 64, while the CPU only has 48 lanes)?
#9
Yes, that's right. Unfortunately, you won't be able to get your 4 x16 GPUs even with Threadripper, because there are always at least 4 lanes reserved for the chipset. The only option to get 4 times x16 on your GPUs is to go with a server platform like EPYC, which has 128 lanes (as Longtail already mentioned). But since running GPUs at x4 is already fine (especially with PCIe 4.0), you shouldn't worry too much about it. The rig I built runs 4 GPUs in a 16x/8x/16x/8x configuration, which is perfectly fine.
#10
Thanks for your answer NoReply.

I was looking at EPYC CPUs just now. How do we know that running GPUs at x4 is fine? Was it all found through a trial-and-error approach?

I guess it's a waste of money, then, to get an EPYC with 128 lanes and its associated (quite costly) motherboard when it could all run on x4 PCIe 4.0 slots...