06-18-2016, 04:00 PM
Hi! I haven't seen much about using VM:s for GPU workloads, and I haven't seen anything on this forum. So, I thought I would write a bit about it here, because I think it's great.
Summary
It works. No performance penalty found, yet.
TL;DR version at the bottom.
About me
I work pretty much every day with designing/installing/maintaining small to almost-enterprise size VMware virtualization systems, using almost exclusively HPE hardware. It's not ALL I do, but it's a major part of my work.
My posts here are my own and are not endorsed, or even known, by my employer, HPE or VMware.
Why virtualize?
(Warning, some VMware marketspeak included)
Virtualization, generally, provides the following benefits:
- Flexibility
Resources can be assigned from a pool, rather than being a fixed size decided at installation time
- Agility
Network redesign can be done without affecting the guests
VM:s can be deployed "instantly", without having to order new hardware
- Availability
Much reduced hardware in the guests reduces the number of drivers that can cause problems
In this case, using ESXi to host guests running oclHashcat, the following benefits are also realized:
- Hardware utilization
- Maintenance made easy
OS, driver and hashcat updates may cause problems or performance issues. Resolution: power down guest, snapshot, power up and perform update(s), test. If issues occur, revert snapshot. In case of a multi-GPU box, do this for one VM with one GPU. When the update procedure is finalized, repeat it on the other guests; downtime is greatly reduced.
- Different versions
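The snapshot routine described under "Maintenance made easy" could be scripted roughly like this from the ESXi shell. This is only a sketch: the VM ID and snapshot name are made-up examples, and it defaults to a dry run that prints the vim-cmd calls instead of executing them. It follows the procedure above (snapshot taken while the guest is powered off, no memory dump).

```shell
#!/bin/sh
# Sketch of the snapshot-before-update routine, meant for the ESXi shell.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# power.shutdown returns immediately, so poll until the guest is really off.
wait_for_poweroff() {
    vmid=$1
    [ "$DRY_RUN" = "1" ] && return 0
    while vim-cmd vmsvc/power.getstate "$vmid" | grep -q "Powered on"; do
        sleep 5
    done
}

# Shut the guest down, snapshot it, and power it back on for the update.
snapshot_before_update() {
    vmid=$1   # numeric ID as listed by: vim-cmd vmsvc/getallvms
    name=$2   # hypothetical snapshot name
    run vim-cmd vmsvc/power.shutdown "$vmid"   # clean shutdown, needs open-vm-tools
    wait_for_poweroff "$vmid"
    # last two arguments: includeMemory=0, quiesced=0
    run vim-cmd vmsvc/snapshot.create "$vmid" "$name" "before-update" 0 0
    run vim-cmd vmsvc/power.on "$vmid"
}

# Roll back if the update misbehaves.
revert_update() {
    vmid=$1
    snapshot_id=$2   # ID shown by: vim-cmd vmsvc/snapshot.get <vmid>
    run vim-cmd vmsvc/snapshot.revert "$vmid" "$snapshot_id" 0
}

# Example with a made-up VM ID.
snapshot_before_update 12 pre-update
```

Set DRY_RUN=0 to actually run the commands. On a multi-GPU box you would call this for one guest first and repeat it for the others once the update has proven itself.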
Why NOT virtualize?
The oldschool way of thinking is that "if you need performance, you need dedicated hardware". This is not true anymore; first, hypervisors have matured and do not cause much overhead. Second, most overhead from virtualization is on the CPU part. CPU:s today are hardly ever bottlenecks - and if they are, you can add more cores. The only case where you may actually require dedicated hardware is if your application needs very high single-thread performance. Well, that and if your idiot vendor says virtualization is unsupported...
A real reason not to virtualize is increased complexity. Yes, virtualization makes the system more complex; it adds another layer of "stuff" that you need to plan, maintain and troubleshoot. If you are unwilling to learn something new, buy hardware.
Design
The goal here is to install a vSphere host with one or more GPU:s. These GPU:s are then assigned to VM:s using PCI passthrough. The operating system is ESXi; it is available for free from vmware.com. You need to register and check out a license, or it will not power on guests after 60 days. The free license is limited to 8 vCPU/guest and cannot be connected to VMware vCenter (=> no cluster).
ESXi only supports/contains drivers for a subset of hardware. It can be installed on consumer hardware if the hardware is built from supported chips; unsupported hardware may also work with community drivers. Pretty much all ready-made servers today support ESXi, but a lot of servers do not support high-power GPU:s, or they have insufficient room for large GPU coolers, or they do not have (enough) PCI-e power.
There are a lot of options for hardware; you need to research a bit to find something that works. I use an HPE ML350p Gen8: it can fit two 2-slot GPU:s/CPU but you need a special power cable for PCI-e power. Fortunately, the power socket was compatible with Corsair modular PSU power cables.
Preparation
Download ESXi. If you have an HPE/Dell server, download the custom ISO or you may not have the drivers you need. You can add drivers to the installation CD but you cannot add drivers on the fly during the installation.
Fit the GPU and make sure it has enough power.
When you (later) add the GPU to your VM, ESXi will reserve 100% of the RAM that you configured for that VM. If you need 4 gig RAM/VM and 8 VM:s, you need at least 4gig x 8 + (overhead + ESXi) RAM ~= 34gig RAM. You are not constrained to whole GB:s here; you may assign 3750MB RAM instead, if you want to.
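The RAM arithmetic above can be written as a tiny helper. This is just a sketch: the 2048 MB allowance for ESXi plus per-VM overhead is an assumption chosen to match the ~34 gig figure, not a measured value.

```shell
#!/bin/sh
# Estimate the minimum host RAM (in MB) for guests whose memory is fully reserved.
# Arguments: RAM per VM in MB, number of VMs, allowance for ESXi + overhead in MB.
required_host_ram_mb() {
    per_vm_mb=$1
    vm_count=$2
    overhead_mb=$3   # assumption: a flat allowance, not a measured value
    echo $(( per_vm_mb * vm_count + overhead_mb ))
}

# The example from the text: 8 VM:s with 4 GB each, plus roughly 2 GB of overhead.
required_host_ram_mb 4096 8 2048   # prints 34816 (MB), i.e. 34 GB
```

With 3750 MB per guest instead, the same call gives 32048 MB.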
(Optional:
I recommend setting the host advanced option "Mem.ShareForceSalting" to 0; this lets you overallocate RAM better (it enables RAM dedup between guests). It won't make a difference on guests with passthrough, though.
Since these VM:s will be in an isolated environment (I hope), I would turn off ASLR to increase guest RAM dedup.
Overallocating CPU is OK, but you will notice increased latency for certain applications. Web servers generally do not perform well on overallocated CPU - you may apply this fix to the VM if you think you need it: http://kb.vmware.com/kb/1018276. But consider the fact that you will be burning away some CPU that may be used for other VM:s instead. Also, those idle loops are counted towards the guest resource shares - you may end up using half your CPU for nothing and not be eligible for CPU when you need it. SQL servers are not bothered by CPU latency; performance decreases with less CPU available, but they will use those CPU cycles effectively.
If you REALLY want to reserve an entire core (or several) for your guest, set the CPU reservation to 100% and set the advanced option "monitor_control.halt_desched" to false. Your VM will now never be required to share its core. Also, if you have hyperthreading active (you should), you may ensure that the other thread is not used by setting Hyperthreaded core sharing: none. This ensures that your VM will have 100% of the core cache and CPU time.
)
Installation
Install ESXi.
Connect to ESXi using a web browser and download vSphere Client (the client is version specific and the link is to an Internet URL).
Configure time and date. This is important: when a guest is restarted or a snapshot reverted, the guest time is set to whatever ESXi has. It is a recurring problem that this step is forgotten, and VM:s start up with the wrong time/date.
In vSphere client, select your host, tab Configuration, option Advanced Settings. Enable PCI passthrough on your GPU. Reboot.
Install a VM to run oclHashcat. I use Ubuntu according to this guide: https://hashcat.net/wiki/doku.php?id=linux_server_howto Thus, I have a 64bit-only installation, version 14.04, with 'fglrx' drivers.
Install open-vm-tools; trust me, you want it. You will then be able to right-click "shutdown" instead of "poweroff" and you will see the IP addresses in vSphere client.
When the VM is installed and updated and works like you want it to, shut it down and edit the virtual hardware.
Add the PCI device (your GPU).
Start your VM, get oclHashcat.
Done!
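Before starting oclHashcat it can be worth checking that the guest actually sees the passed-through GPU. A minimal sketch, assuming that lspci is installed in the guest and that any display adapter other than VMware's virtual one is the passthrough device:

```shell
#!/bin/sh
# Read `lspci` output on stdin and print display adapters that are not
# VMware's virtual SVGA adapter. Exit status is non-zero if none is found.
find_passthrough_gpu() {
    grep -i -E 'VGA compatible controller|Display controller|3D controller' \
        | grep -v -i 'VMware'
}

# Usage inside the guest:
#   lspci | find_passthrough_gpu || echo "GPU not visible, check the passthrough settings"
```

If nothing is printed, the PCI device was probably not added to the VM, or passthrough was not enabled on the host before the reboot.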
Benchmarking
This is from my server, running ESXi 5.5 on an HPE ML350p Gen8, and a second-hand Radeon 6970. The host is running a total of 12 VM:s with total 17 vCPU:s, on a single 4-core Intel E5-2609 (no hyperthreading). 48gig RAM with 60gig RAM assigned to guests.
CPU usage during benchmark was average ~250MHz, or 10% of core speed. That's a lot of CPU that can be used for other things... ;) It's hash dependent though, some hashes actually used 97% CPU.
Code:
oclHashcat v2.01 starting in benchmark-mode...
Device #1: Cayman, 2010MB, 880Mhz, 24MCU
Hashtype: MD4
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 10566.3 MH/s
Hashtype: MD5
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 5698.3 MH/s
Hashtype: Half MD5
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3465.1 MH/s
Hashtype: SHA1
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1895.7 MH/s
Hashtype: SHA256
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 772.0 MH/s
Hashtype: SHA384
Workload: 256 loops, 256 accel
Speed.GPU.#1.: 214.7 MH/s
Hashtype: SHA512
Workload: 256 loops, 256 accel
Speed.GPU.#1.: 217.4 MH/s
Hashtype: SHA-3(Keccak)
Workload: 512 loops, 256 accel
Hashtype: SipHash
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 5490.3 MH/s
Hashtype: RipeMD160
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 1209.3 MH/s
Hashtype: Whirlpool
Workload: 512 loops, 32 accel
Speed.GPU.#1.: 76044.8 kH/s
Hashtype: GOST R 34.11-94
Workload: 512 loops, 64 accel
Speed.GPU.#1.: 59761.0 kH/s
Hashtype: GOST R 34.11-2012 (Streebog) 256-bit
Workload: 512 loops, 16 accel
Speed.GPU.#1.: 11139.3 kH/s
Hashtype: GOST R 34.11-2012 (Streebog) 512-bit
Workload: 512 loops, 16 accel
Speed.GPU.#1.: 11049.8 kH/s
Hashtype: phpass, MD5(Wordpress), MD5(phpBB3), MD5(Joomla)
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 1583.9 kH/s
Hashtype: scrypt
Workload: 1 loops, 64 accel
Speed.GPU.#1.: 167.7 kH/s
Hashtype: PBKDF2-HMAC-MD5
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 467.4 kH/s
Hashtype: PBKDF2-HMAC-SHA1
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 638.0 kH/s
Hashtype: PBKDF2-HMAC-SHA256
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 324.2 kH/s
Hashtype: PBKDF2-HMAC-SHA512
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 67549 H/s
Hashtype: Skype
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3149.5 MH/s
Hashtype: WPA/WPA2
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 95159 H/s
Hashtype: IKE-PSK MD5
Workload: 256 loops, 128 accel
Speed.GPU.#1.: 225.9 MH/s
Hashtype: IKE-PSK SHA1
Workload: 256 loops, 128 accel
Speed.GPU.#1.: 158.7 MH/s
Hashtype: NetNTLMv1-VANILLA / NetNTLMv1+ESS
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 5379.0 MH/s
Hashtype: NetNTLMv2
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 262.5 MH/s
Hashtype: IPMI2 RAKP HMAC-SHA1
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 350.3 MH/s
Hashtype: Kerberos 5 AS-REQ Pre-Auth etype 23
Workload: 128 loops, 32 accel
Speed.GPU.#1.: 13883.0 kH/s
Hashtype: DNSSEC (NSEC3)
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 586.1 MH/s
Hashtype: PostgreSQL Challenge-Response Authentication (MD5)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1075.2 MH/s
Hashtype: MySQL Challenge-Response Authentication (SHA1)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 544.2 MH/s
Hashtype: SIP digest authentication (MD5)
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 291.4 MH/s
Hashtype: SMF > v1.1
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1607.9 MH/s
Hashtype: vBulletin < v3.8.5
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1513.6 MH/s
Hashtype: vBulletin > v3.8.5
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 1089.3 MH/s
Hashtype: IPB2+, MyBB1.2+
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 1136.0 MH/s
Hashtype: WBB3, Woltlab Burning Board 3
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 223.8 MH/s
Hashtype: Joomla < 2.5.18
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 5697.9 MH/s
Hashtype: PHPS
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1513.7 MH/s
Hashtype: Drupal7
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 8962 H/s
Hashtype: osCommerce, xt:Commerce
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3149.4 MH/s
Hashtype: PrestaShop
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1903.2 MH/s
Hashtype: Django (SHA-1)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1607.9 MH/s
Hashtype: Django (PBKDF2-SHA256)
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 16607 H/s
Hashtype: Mediawiki B type
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1467.5 MH/s
Hashtype: Redmine Project Management Web App
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 359.3 MH/s
Hashtype: PostgreSQL
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 5696.5 MH/s
Hashtype: MSSQL(2000)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 2011.1 MH/s
Hashtype: MSSQL(2005)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 2011.9 MH/s
Hashtype: MSSQL(2012)
Workload: 256 loops, 256 accel
Speed.GPU.#1.: 215.8 MH/s
Hashtype: MySQL323
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 11829.0 MH/s
Hashtype: MySQL4.1/MySQL5
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 890.7 MH/s
Hashtype: Oracle H: Type (Oracle 7+)
Workload: 128 loops, 64 accel
Speed.GPU.#1.: 188.7 MH/s
Hashtype: Oracle S: Type (Oracle 11+)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1904.2 MH/s
Hashtype: Oracle T: Type (Oracle 12+)
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 16568 H/s
Hashtype: Sybase ASE
Workload: 512 loops, 32 accel
Speed.GPU.#1.: 88246.0 kH/s
Hashtype: EPiServer 6.x < v4
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1607.8 MH/s
Hashtype: EPiServer 6.x > v4
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 675.3 MH/s
Hashtype: md5apr1, MD5(APR), Apache MD5
Workload: 1000 loops, 32 accel
Speed.GPU.#1.: 2459.1 kH/s
Hashtype: ColdFusion 10+
Workload: 256 loops, 128 accel
Speed.GPU.#1.: 372.8 MH/s
Hashtype: hMailServer
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 675.3 MH/s
Hashtype: SHA-1(Base64), nsldap, Netscape LDAP SHA
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1902.1 MH/s
Hashtype: SSHA-1(Base64), nsldaps, Netscape LDAP SSHA
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1870.7 MH/s
Hashtype: SSHA-512(Base64), LDAP {SSHA512}
Workload: 256 loops, 256 accel
Speed.GPU.#1.: 217.4 MH/s
Hashtype: LM
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3851.4 MH/s
Hashtype: NTLM
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 10536.4 MH/s
Hashtype: Domain Cached Credentials (DCC), MS Cache
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 2919.9 MH/s
Hashtype: Domain Cached Credentials 2 (DCC2), MS Cache 2
Workload: 1024 loops, 16 accel
Speed.GPU.#1.: 76150 H/s
Hashtype: MS-AzureSync PBKDF2-HMAC-SHA256
Workload: 100 loops, 256 accel
Speed.GPU.#1.: 3050.6 kH/s
Hashtype: descrypt, DES(Unix), Traditional DES
Workload: 1024 loops, 64 accel
Speed.GPU.#1.: 77154.6 kH/s
Hashtype: BSDiCrypt, Extended DES
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1104.9 kH/s
Hashtype: md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5
Workload: 1000 loops, 32 accel
Speed.GPU.#1.: 2457.2 kH/s
Hashtype: bcrypt, Blowfish(OpenBSD)
Workload: 32 loops, 2 accel
Speed.GPU.#1.: 2320 H/s
Hashtype: sha256crypt, SHA256(Unix)
Workload: 1024 loops, 4 accel
Speed.GPU.#1.: 92315 H/s
Hashtype: sha512crypt, SHA512(Unix)
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 8286 H/s
Hashtype: OSX v10.4, v10.5, v10.6
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1607.9 MH/s
Hashtype: OSX v10.7
Workload: 256 loops, 256 accel
Speed.GPU.#1.: 180.9 MH/s
Hashtype: OSX v10.8+
Workload: 1024 loops, 2 accel
Speed.GPU.#1.: 1724 H/s
Hashtype: AIX {smd5}
Workload: 1000 loops, 32 accel
Speed.GPU.#1.: 2460.1 kH/s
Hashtype: AIX {ssha1}
Workload: 64 loops, 128 accel
Speed.GPU.#1.: 10458.6 kH/s
Hashtype: AIX {ssha256}
Workload: 64 loops, 128 accel
Speed.GPU.#1.: 4584.9 kH/s
Hashtype: AIX {ssha512}
Workload: 64 loops, 32 accel
Speed.GPU.#1.: 1074.2 kH/s
Hashtype: Cisco-PIX MD5
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3994.7 MH/s
Hashtype: Cisco-ASA MD5
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3981.6 MH/s
Hashtype: Cisco-IOS SHA256
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 772.0 MH/s
Hashtype: Cisco $8$
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 16616 H/s
Hashtype: Cisco $9$
Workload: 1 loops, 4 accel
Speed.GPU.#1.: 764 H/s
Hashtype: Juniper Netscreen/SSG (ScreenOS)
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 3149.3 MH/s
Hashtype: Juniper IVE
Workload: 1000 loops, 32 accel
Speed.GPU.#1.: 2463.4 kH/s
Hashtype: Android PIN
Workload: 1024 loops, 16 accel
Speed.GPU.#1.: 1364.8 kH/s
Hashtype: Citrix NetScaler
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1703.6 MH/s
Hashtype: RACF
Workload: 128 loops, 256 accel
Speed.GPU.#1.: 547.4 MH/s
Hashtype: GRUB 2
Workload: 1024 loops, 2 accel
Speed.GPU.#1.: 6025 H/s
Hashtype: Radmin2
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 1922.3 MH/s
Hashtype: SAP CODVN B (BCODE)
Workload: 1024 loops, 64 accel
Speed.GPU.#1.: 161.0 MH/s
Hashtype: SAP CODVN F/G (PASSCODE)
Workload: 512 loops, 32 accel
Speed.GPU.#1.: 11874.8 kH/s
Hashtype: SAP CODVN H (PWDSALTEDHASH) iSSHA-1
Workload: 1024 loops, 16 accel
Speed.GPU.#1.: 1401.9 kH/s
Hashtype: Lotus Notes/Domino 5
Workload: 128 loops, 32 accel
Speed.GPU.#1.: 59499.5 kH/s
Hashtype: Lotus Notes/Domino 6
Workload: 128 loops, 32 accel
Speed.GPU.#1.: 11415.7 kH/s
Hashtype: Lotus Notes/Domino 8
Workload: 1024 loops, 64 accel
Speed.GPU.#1.: 134.4 kH/s
Hashtype: PeopleSoft
Workload: 1024 loops, 256 accel
Speed.GPU.#1.: 2011.2 MH/s
Hashtype: 7-Zip
Workload: 1024 loops, 4 accel
Speed.GPU.#1.: 2030 H/s
Hashtype: RAR3-hp
Workload: 16384 loops, 32 accel
Speed.GPU.#1.: 5191 H/s
Hashtype: TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 + XTS 512 bit
Workload: 1024 loops, 64 accel
Speed.GPU.#1.: 26717 H/s
Hashtype: TrueCrypt 5.0+ PBKDF2-HMAC-SHA512 + XTS 512 bit
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 65811 H/s
Hashtype: TrueCrypt 5.0+ PBKDF2-HMAC-Whirlpool + XTS 512 bit
Workload: 1000 loops, 8 accel
Speed.GPU.#1.: 12991 H/s
Hashtype: TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 + XTS 512 bit + boot-mode
Workload: 1000 loops, 128 accel
Speed.GPU.#1.: 52890 H/s
Hashtype: Android FDE <= 4.3
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 166.9 kH/s
Hashtype: eCryptfs
Workload: 1024 loops, 8 accel
Speed.GPU.#1.: 2616 H/s
Hashtype: MS Office <= 2003 MD5 + RC4, oldoffice$0, oldoffice$1
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 14539.3 kH/s
Hashtype: MS Office <= 2003 MD5 + RC4, collision-mode #1
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 27812.3 kH/s
Hashtype: MS Office <= 2003 SHA1 + RC4, oldoffice$3, oldoffice$4
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 18776.4 kH/s
Hashtype: MS Office <= 2003 SHA1 + RC4, collision-mode #1
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 29260.3 kH/s
Hashtype: Office 2007
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 27056 H/s
Hashtype: Office 2010
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 13553 H/s
Hashtype: Office 2013
Workload: 1024 loops, 4 accel
Speed.GPU.#1.: 1512 H/s
Hashtype: PDF 1.1 - 1.3 (Acrobat 2 - 4)
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 28542.8 kH/s
Hashtype: PDF 1.1 - 1.3 (Acrobat 2 - 4) + collider-mode #1
Workload: 1024 loops, 32 accel
Speed.GPU.#1.: 32523.0 kH/s
Hashtype: PDF 1.4 - 1.6 (Acrobat 5 - 8)
Workload: 70 loops, 256 accel
Speed.GPU.#1.: 1330.0 kH/s
Hashtype: PDF 1.7 Level 3 (Acrobat 9)
Workload: 512 loops, 256 accel
Speed.GPU.#1.: 771.8 MH/s
Hashtype: PDF 1.7 Level 8 (Acrobat 10 - 11)
Workload: 64 loops, 8 accel
Speed.GPU.#1.: 6060 H/s
Hashtype: Password Safe v2
Workload: 1000 loops, 16 accel
Speed.GPU.#1.: 43785 H/s
Hashtype: Password Safe v3
Workload: 1024 loops, 16 accel
Speed.GPU.#1.: 351.1 kH/s
Hashtype: Lastpass
Workload: 500 loops, 64 accel
Speed.GPU.#1.: 641.4 kH/s
Hashtype: 1Password, agilekeychain
Workload: 1000 loops, 64 accel
Speed.GPU.#1.: 669.8 kH/s
Hashtype: 1Password, cloudkeychain
Workload: 1024 loops, 2 accel
Speed.GPU.#1.: 1508 H/s
Hashtype: Bitcoin/Litecoin wallet.dat
Workload: 1024 loops, 2 accel
Speed.GPU.#1.: 376 H/s
Hashtype: Blockchain, My Wallet
Workload: 10 loops, 256 accel
Speed.GPU.#1.: 11149.6 kH/s
Performance figures for the 6970 that I find around the Internet are MD5=5878MH/s and WPA2≈82kH/s.
I was expecting to get at least 90% of that speed, but the benchmark got _more_ than others have posted for WPA2, and (almost) the same for MD5. It seems the performance of hashcat is totally unaffected by being virtualized.
My performance is MD5=5698MH/s and WPA2=95kH/s.
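To put numbers on the comparison, here is a small sketch that expresses a measured speed as a percentage of a reference speed. The reference values are the figures found online for bare-metal 6970 runs (MD5=5878MH/s, WPA2≈82kH/s), so treat them as rough.

```shell
#!/bin/sh
# Print a measured speed as a percentage of a reference speed (same unit for both).
percent_of_reference() {
    awk -v measured="$1" -v reference="$2" \
        'BEGIN { printf "%.1f\n", measured * 100 / reference }'
}

percent_of_reference 5698 5878   # MD5, MH/s
percent_of_reference 95 82       # WPA2, kH/s
```

This gives about 96.9% for MD5 and 115.9% for WPA2, both above the 90% I was hoping for.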
Extras
I use hashtopus to control this VM. To start the agent, I use this little script:
Code:
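#!/bin/bash
# Pin the fan of GPU 0 to 60% before starting work
aticonfig --pplib-cmd "set fanspeed 0 60"
# Start the hashtopus agent in a detached screen session named HASHAGENT01
screen -d -m -S HASHAGENT01 mono hashtopus.exe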
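The agent execution command line (set in hashtopus agent config) must contain "--gpu-temp-disable" for the fan speed config to work.
Here's a quick reference to the screen command: http://aperiodic.net/screen/quick_reference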
You may check GPU temp and fan speed this way:
aticonfig --odgt --pplib-cmd "get fanspeed 0"
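If you want to check the temperature from a script rather than by eye, something like the following could work. It is a sketch that assumes aticonfig --odgt prints a line of the form "Sensor 0: Temperature - 65.00 C"; the 85 degree limit is an arbitrary example value.

```shell
#!/bin/sh
# Extract the temperature in whole degrees Celsius from `aticonfig --odgt`
# output on stdin. Assumes a line like "Sensor 0: Temperature - 65.00 C".
parse_gpu_temp() {
    awk '/Temperature/ { printf "%d\n", $(NF-1); exit }'
}

# Warn when the GPU is hotter than a limit (default 85, an arbitrary example).
check_gpu_temp() {
    limit=${1:-85}
    temp=$(aticonfig --odgt | parse_gpu_temp)
    if [ "${temp:-0}" -gt "$limit" ]; then
        echo "WARNING: GPU at ${temp} C (limit ${limit} C)"
    fi
}

# Usage inside the guest, e.g. from cron:
#   check_gpu_temp 80
```

Since the fan is pinned to a fixed speed and hashcat's own temperature handling is disabled, a check like this is a cheap safety net.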
Conclusion
Running oclHashcat in a VM instead of on hardware can be a viable way to optimize hardware usage and ease management, IF you have the knowledge and time to set it up. So far, it seems there is no impact on performance, though further testing on multiple and more recent GPU:s is needed to establish this as a fact.
TL;DR:
1. Install ESXi
2. Activate PCI Passthrough, reboot
3. Install a virtual guest as usual - add open-vm-tools
4. Shut down VM and add GPU to VM
5. Start VM, install oclHashcat
6. ?
7. Profit!
Try it, you'll like it! :)