03-13-2022, 04:37 AM
I suspect the problem is the amount of memory being allocated, given the error "cuMemFree(): unknown error".
The usual rule of thumb for a system this large is to have at least as much system RAM as total VRAM, so in this scenario you would need 120 GB of RAM in total. I doubt a swap file would work as intended here, so either add physical RAM or disable some of the cards.
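If it helps, here is a rough sketch of that RAM-versus-VRAM check in Python. The card layout (five 24 GB cards) and the 64 GB of installed RAM are made-up numbers for illustration only, chosen so the VRAM adds up to 120 GB; the function names are hypothetical too, so plug in your own values.

```python
def ram_shortfall_gb(vram_per_card_gb, system_ram_gb):
    """Return how many GB of system RAM are missing to match total VRAM (0 if none)."""
    total_vram = sum(vram_per_card_gb)
    return max(0, total_vram - system_ram_gb)


def cards_to_keep(vram_per_card_gb, system_ram_gb):
    """Keep cards, in the given order, while their combined VRAM still fits in system RAM."""
    kept, used = [], 0
    for index, vram in enumerate(vram_per_card_gb):
        if used + vram <= system_ram_gb:
            kept.append(index)
            used += vram
    return kept


# Hypothetical rig: five 24 GB cards (120 GB VRAM) with 64 GB of system RAM.
cards = [24, 24, 24, 24, 24]
installed_ram = 64

print("RAM shortfall (GB):", ram_shortfall_gb(cards, installed_ram))
print("Card indices to leave enabled:", cards_to_keep(cards, installed_ram))
```

With those assumed numbers you would either add the missing RAM or run with only the listed cards enabled. This only encodes the VRAM = RAM rule of thumb; it does not query the actual hardware.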