Posts: 19
Threads: 5
Joined: Aug 2024
Hi there,
I'd like to build a mid-range hash-cracking rack for about 2,000 bucks. The focus is a good price-performance ratio.
PS: I am aware that the RTX 5000 series is about to be released and that GPU prices are rather high right now, as Nvidia stopped production of the RTX 4000 series before the RTX 5000 cards became available. However, for tax reasons I have to buy this year.
1. I know it is recommended to use blower-style cards, but those are hard to find in the RTX 4000 generation.
I think 2-3 GPUs in a 4U rack should be fine without blower-style coolers, shouldn't they?
So I am considering 2 RTX 4070 GPUs and a Ryzen 5600G, so that there is an independent iGPU for the OS. I am considering a board with 3 x16 slots for a later upgrade.
2. How relevant is the electrical PCIe bandwidth for hash cracking?
There are boards at ~200 bucks that have either (the electrical layout of the slots is given in parentheses)
2x PCIe 4.0 x16 (1x x16, 1x x8)
and 1x PCIe 3.0 x16 (x4)
or
3x PCIe 4.0 x16 (1x x16, 1x x4, 1x x2),
and there are boards at ~500 bucks with
3x PCIe 4.0 x16 (1x x16, 1x x8, 1x x4)
or 3x PCIe 4.0 x16 (1x x16, 2x x8),
If hash cracking works fine with 3.0 x4 or even 4.0 x2, I'd definitely pick a 3-slot board. If x8 is required, I'd rather pick a board with only 2 slots.
3. The 4070 comes with either GDDR6 or GDDR6X VRAM. For gaming I read that the X variant is about 3% faster, but is that relevant for hash cracking, too?
4. I read that system RAM should be twice the total VRAM for hash cracking. According to this, 2 x (2 x 12 GB) = 48 GB would be the recommendation, so I'd use 2x 32 GB DDR-3200 for dual channel (128 GB in 4 slots is the maximum of all these boards anyway). Or is overclocking the RAM worth the effort and additional cost for hash cracking?
That would be:
2x 4070: ~1100
Board: ~200
64 GB ECC-RAM ~ 150
5600G ~ 100
PSU 1000W 80+ Platinum: ~150 (maybe FSP VITA GM 1000W ATX 3.1)
Case: Chenbro RM41300G ~ 200 (because it has 8 PCIe slots)
= ~1900
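As a quick sanity check of the sum above, here is a minimal Python sketch. The prices are just the rough figures from my list, and the dictionary keys are labels I made up for illustration.

```python
# Rough part prices copied from the list above (approximate figures).
parts = {
    "2x RTX 4070": 1100,
    "Board": 200,
    "64 GB ECC RAM": 150,
    "Ryzen 5600G": 100,
    "PSU 1000 W": 150,
    "Case": 200,
}
budget = 2000  # stated target for the whole build

total = sum(parts.values())
headroom = budget - total
print(f"total: ~{total}, headroom: ~{headroom}")
```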
What do you think? Anything I missed that may be an issue?
Thanks a lot, greatly appreciated.
Posts: 413
Threads: 2
Joined: Dec 2015
12-06-2024, 04:39 PM
(This post was last modified: 12-06-2024, 04:40 PM by Chick3nman.)
1. Don't try to put non-blower cards into a server chassis. They won't be happy, if they fit in the first place, even in 4U.
2. Relevant but not overly so, anything above x4 lanes should be enough to not make a huge difference in your performance. You should certainly always strive for the best possible, but as long as you have x4 lanes of at least 3.0, you should be fine.
3. This is effectively irrelevant and launching a CUDA kernel causes the drivers to automatically downclock your VRAM anyway, which so far hasn't had major impacts in the vast majority of hashcat's use cases.
4. Twice is a bit much, really the rule should be System RAM >= combined VRAM, but 2x obviously gives you a lot more room to breathe. System RAM speeds are effectively irrelevant and you likely won't even see _usage_ for a lot of it. The reason it's necessary is an allocation quirk in the runtime(s), I believe, but these days it's a lot less of an issue. Sticking to system RAM >= combined VRAM should be fine and avoid any issues.
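To make the rule of thumb from point 4 concrete, here is a small Python sketch that checks a planned build against both the minimum (system RAM >= combined VRAM) and the roomier 2x variant. The function name `ram_check` and its return format are made up for illustration; only the two rules come from this thread.

```python
def ram_check(system_ram_gb, vram_per_gpu_gb):
    """Compare system RAM against the combined VRAM of all GPUs."""
    combined = sum(vram_per_gpu_gb)
    return {
        "combined_vram_gb": combined,
        "meets_minimum": system_ram_gb >= combined,   # RAM >= combined VRAM
        "meets_2x": system_ram_gb >= 2 * combined,    # the more generous rule
    }

# Planned build from the first post: 64 GB RAM, two 12 GB cards.
check = ram_check(64, [12, 12])
print(check)
```

With a third 12 GB card added later, the combined VRAM would be 36 GB, so 64 GB would still satisfy the minimum but no longer the 2x variant.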
If you want the "right" GPU to get right now, especially with the 4090 discontinued, look here: https://www.amazon.com/GIGABYTE-GeForce-...B0D87BVDWQ
or here: https://www.amazon.com/MSI-Gaming-Graphi...B0CWS78Y5J
Posts: 19
Threads: 5
Joined: Aug 2024
12-06-2024, 05:44 PM
(This post was last modified: 12-06-2024, 05:45 PM by fsdafsadfsdsdaf.)
(12-06-2024, 04:39 PM)Chick3nman Wrote: 1. Don't try to put non-blower cards into a server chassis. They won't be happy, if they fit in the first place, even in 4U.
If you want the "right" GPU to get right now, especially with the 4090 discontinued, look here: https://www.amazon.com/GIGABYTE-GeForce-...B0D87BVDWQ
or here: https://www.amazon.com/MSI-Gaming-Graphi...B0CWS78Y5J
Thanks for pointing me to these cards.
So even for mid-range cards like the 4070 it is not possible to properly do a 19" 4U setup?
It seems that the MSI Gaming Aero is not available over here in Europe. Not even the MSI webstore lists it, and the MSI products page only has a GTX 1650 Aero. It is also not listed on eBay.
I looked at OpenCL benchmarks and at the hashcat benchmark results posted in this forum before. To me it looked like the sweet spot in GH/€ is the 4060, followed closely by the 4060 Ti and, with a slightly larger gap, the 4070. The larger cards (4070 Ti and up) had a significantly worse price-performance ratio than these, which is why I am surprised that you recommend the 4070 Ti Super.
The Gigabyte 4070 Ti Super AI TOP is about 1,000€ each. The cheapest 4070 Ti Super is about 830€. As both seem to have the same specs, that is ~170€ just for being blower style.
Looking at benchmarks here in the forum, the 4070 does about 92 GH/s NTLM and the 4070 Ti Super about 133 GH/s.
So compared to the non-blower 4070, this card has 1.45 times the performance but 1.8 times the price. This would also mean that instead of two 4070s I'd get one 4070 Ti Super for the budget. That does not look like a good price-performance ratio, does it?
There is an ASUS Turbo 4070 12GB GDDR6X, which is blower style and costs about 620€. Compared to that, the 4070 Ti Super AI TOP has 1.45 times the performance for roughly 1.6 times the price. So I guess 2x the ASUS Turbo 12G would be a better option than one 4070 Ti Super AI TOP, wouldn't it?
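The ratios above can be reproduced with a few lines of Python. This is only a sketch using the rounded numbers quoted in this post. The 550€ for a non-blower 4070 is my assumption, taken as half of the ~1100 I budgeted for two cards.

```python
# Rounded figures from this post; all prices in EUR, speeds in GH/s (NTLM).
perf_4070 = 92
perf_ti_super = 133
price_4070 = 550      # assumption: half of the ~1100 budgeted for two 4070s
price_turbo = 620     # ASUS Turbo 4070 (blower)
price_ai_top = 1000   # Gigabyte 4070 Ti Super AI TOP (blower)

perf_ratio = perf_ti_super / perf_4070
price_vs_4070 = price_ai_top / price_4070
price_vs_turbo = price_ai_top / price_turbo

print(f"performance: {perf_ratio:.2f}x")
print(f"price vs non-blower 4070: {price_vs_4070:.2f}x")
print(f"price vs ASUS Turbo 4070: {price_vs_turbo:.2f}x")
```

In both comparisons the price factor is larger than the performance factor, which is what makes the bigger card look worse on paper at these prices.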
I am actually considering a tower instead, seeing that I pay ~200€ more just for blower style (in each of the cases).
(12-06-2024, 04:39 PM)Chick3nman Wrote: 2. Relevant but not overly so, anything above x4 lanes should be enough to not make a huge difference in your performance. You should certainly always strive for the best possible, but as long as you have x4 lanes of at least 3.0, you should be fine.
Ok, so as 3.0 x4 is the same bandwidth as 4.0 x2, all of the mentioned boards would be fine, perfect.
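For reference, the bandwidth equivalence can be checked with a short Python sketch. It uses the nominal per-lane transfer rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding both generations use. The result is a theoretical per-direction maximum that ignores protocol overhead, and the function name `link_gb_per_s` is made up for illustration.

```python
# Nominal per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and 4.0

def link_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth of a link in GB/s."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # divide by 8: bits -> bytes

for gen, lanes in [("3.0", 4), ("4.0", 2), ("4.0", 4), ("4.0", 8)]:
    print(f"PCIe {gen} x{lanes}: {link_gb_per_s(gen, lanes):.2f} GB/s")
```

This only says that the two configurations are equal on paper. It says nothing about whether a given board can physically take the cards in those slots.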
(12-06-2024, 04:39 PM)Chick3nman Wrote: Sticking to system ram >= combined VRAM should be fine and avoid any issues.
Good to know. Thanks.
Posts: 413
Threads: 2
Joined: Dec 2015
While the smaller cards may look like better price/performance at first, there tends to be some non-linearity to it as well as some extra consideration. For a single rig of just 2 GPUs, it's quite possible that the lower end GPUs do continue to make sense. For rigs with more GPUs, you need to consider the price per "density" as well. If you go with more of the lower end cards, you run into a limit on how many can be put into 1 system effectively and end up having to build additional systems, which means additional cost to continue scaling.
This is the speed I'm getting on a 4070 Ti Super AI TOP right now:
-----------------------
* Hash-Mode 1000 (NTLM)
-----------------------
Speed.#1.........: 144.2 GH/s (14.89ms) @ Accel:128 Loops:1024 Thr:256 Vec:8
And on the MSI card:
-----------------------
* Hash-Mode 1000 (NTLM)
-----------------------
Speed.#1.........: 143.2 GH/s (15.05ms) @ Accel:1024 Loops:1024 Thr:32 Vec:8
So a bit above the 133GH/s, which may skew the numbers a bit.
If we use your numbers for a second:
620€ for 92GH/s (4070 ASUS Turbo)
vs
830€ for 144GH/s (cheaper 4070Ti Super)
vs
1000€ for 144GH/s (AI TOP)
This gives us:
6.74€/GH/s for the 4070
5.76€/GH/s for the 4070Ti Super
6.94€/GH/s for the 4070Ti Super AI TOP
So while the AI TOP variant is indeed a little worse due to its price, the cheaper price you have for the 4070Ti Super is actually better than the price you have for the ASUS 4070 Turbo. Still, as mentioned, in this case you're right that the smaller GPUs may end up being more cost effective, especially if you aren't planning on scaling to more GPUs later. If this were a larger rig, however, like 8-10 GPUs, the 4070Ti Super AI TOP or the MSI variant are ideal due to their density, power, form factor, and price. The only major problem now is that their availability is potentially declining as they are phased out for the 50 series cards.
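The cost-per-speed comparison above is easy to redo whenever prices change. A minimal Python sketch, using only the prices and NTLM speeds from this thread (the labels are just for illustration):

```python
# (label, price in EUR, NTLM speed in GH/s) as used in the comparison above.
cards = [
    ("4070 ASUS Turbo", 620, 92),
    ("4070 Ti Super (cheapest)", 830, 144),
    ("4070 Ti Super AI TOP", 1000, 144),
]

# Cost per GH/s: lower is better.
cost = {name: price / ghs for name, price, ghs in cards}
best = min(cost, key=cost.get)

for name, value in cost.items():
    print(f"{name}: {value:.2f} EUR per GH/s")
print("lowest cost per GH/s:", best)
```

This only covers the cards themselves. Board, PSU, and chassis costs per GPU slot are what shift the picture further toward the bigger cards in larger rigs.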
>Ok, so as 3.0 x4 is the same bandwidth as 4.0 x2, all of the mentioned boards would be fine, perfect.
I don't know about this. While the lane configs you mentioned _might_ be ok, I wouldn't count on them being _physically_ possible. Most consumer motherboards these days are not built with multi-GPU as an intended use case.
Posts: 19
Threads: 5
Joined: Aug 2024
Well, if you put in more cards you need to go to the Threadripper line with very expensive motherboards etc. That makes sense if you want something like "max performance density", but it is way above the ~2,000 I planned to spend.
I also hoped that blower style would not be as important for such smaller configurations, but I am glad to know that I was wrong (rather than building it and going crazy with heat issues).
What do you think, would non-blower cards work nicely in a tower, or will 2 or 3 GPUs produce too much heat for a tower (without water cooling) anyway?
Posts: 81
Threads: 15
Joined: Dec 2019
12-10-2024, 12:04 PM
(This post was last modified: 12-10-2024, 12:06 PM by Sondero.)
You don't have to build a Threadripper rig for only 2-3 GPUs.
A few weeks ago I built a 3x 4090 (non-blower) air-cooled 19" rig. The price limit was 7,500€. I took a 9950X CPU with an ASUS B650 Creator board, with all cards on PCIe x16->x16 risers. On the electrical side it is 2x PCIe 4.0 x8 and 1x PCIe 4.0 x4.
Everything works well, with a max temperature of <75 °C using Noctua industrial fans (ambient temperature perhaps ~15 °C, server room).
Posts: 413
Threads: 2
Joined: Dec 2015
Non-blower cards CAN work, but you have to be mindful of their airflow and realize that while they may be able to get enough air through their coolers, they mostly rely on the case airflow to actually REMOVE the heat, whereas blower cards exhaust the heat out of the case immediately. Recycling hot air back through the cooler will cause the case to heat up and the cards will get much hotter that way, especially if drawing hot air off the back of another card.
Posts: 19
Threads: 5
Joined: Aug 2024
Hi again. I decided to follow your recommendation and bought the 4070 Ti Super AI TOP. In the benchmark it shows ~133 GH/s, which is nice. However, even though it is the blower design, hashcat warns about temperature throttling pretty much right after starting.
Is there something that needs to be tweaked? Is the default driver configuration of the temperature target not good for hash cracking?
In the default "brute force" mask attack, which I assumed is an optimal workload as it is easy to parallelize, I get about 55 GH/s.
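To see what the card is doing while hashcat runs, something like the following Python sketch could be run in a second terminal. It assumes `nvidia-smi` is on the PATH and only reads values; it does not change any driver setting. The helper names `parse_line` and `sample` are made up for illustration.

```python
import subprocess

# Query fields passed to nvidia-smi's --query-gpu option.
FIELDS = ["temperature.gpu", "clocks.sm", "power.draw", "fan.speed"]

def parse_line(line):
    """Turn one CSV line of nvidia-smi output into a field -> value dict."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))

def sample():
    """Read the current values for every GPU (requires nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]
```

Calling `sample()` every few seconds during the attack and comparing it against a run of the benchmark should show whether the SM clock drops as the temperature climbs, which would be consistent with the throttle warning.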
Thanks