Fail to compile kernel, may need to increase reserved registers for spilling.

Fail to compile kernel, may need to increase reserved registers for spilling. - joshmore - 11-28-2016

Running Hashcat on Ubuntu 16.04, using the nvidia-367 and Intel i915 drivers. It works fine when cracking MD5. It works fine when cracking NTLMv2 so long as I'm not using rules. As soon as I try something like:

Code:
# ./hashcat -m 5600 -a 0 /path/to/responder_hashes.txt ../WordLists/rockyou.txt  -r rules/best64.rule

I get the following:

Code:
hashcat (v3.10-829-g646a472) starting...

OpenCL Platform #1: Intel

=========================

* Device #1: Intel(R) HD Graphics Haswell GT2 Mobile, 1024/2048 MB allocatable, 20MCU

Hashes: 58 digests; 58 unique digests, 58 unique salts

Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Rules: 77

Applicable Optimizers:

* Zero-Byte

* Not-Iterated

Watchdog: Hardware Monitoring Interface not found on your system

Watchdog: Temperature abort trigger disabled

Watchdog: Temperature retain trigger disabled

Initializing device kernels and memory...ASSERTION FAILED: Fail to compile kernel, may need to increase reserved registers for spilling.

  at file /build/beignet-5qGeBM/beignet-1.1.1/backend/src/backend/gen_program.cpp, function virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool), line 200

Trace/breakpoint trap (core dumped)

I get the same behavior whether the NVidia drivers are enabled or not (using prime-select or directly through nvidia-settings). It works fine in the other attack modes ... just not the one I need.

I tried both the latest release and version from git.

I have no idea what to try next. Advice?

RE: Fail to compile kernel, may need to increase reserved registers for spilling. - atom - 11-28-2016

Clearly not a hashcat issue; the compiler error makes that clear. We should ban beignet support to reduce support overhead. Can you please paste the output of hashcat -I and ideally clinfo?

RE: Fail to compile kernel, may need to increase reserved registers for spilling. - joshmore - 11-28-2016

(11-28-2016, 02:42 PM)atom Wrote: Clearly not a hashcat issue; the compiler error makes that clear. We should ban beignet support to reduce support overhead. Can you please paste the output of hashcat -I and ideally clinfo?

After doing some more research, as I understand things, the issue is in beignet, which is how (I think) hashcat accesses the Intel chip. What I couldn't find was a way to tell hashcat to only use the NVidia. I think, if I could do that, I could work around the immediate issue. Is that possible?

Interestingly, there is no "-I" option in the binary version of Hashcat for Linux. Here it is from the version I compiled off of github:

Code:
# ./hashcat -I

hashcat (v3.10-829-g646a472) starting...

OpenCL Info:

Platform ID #1

  Vendor  : Intel

  Name    : Intel Gen OCL Driver

  Version : OpenCL 1.2 beignet 1.1.1

  Device ID #1

    Type           : GPU

    Vendor ID      : 4

    Vendor         : Intel

    Name           : Intel(R) HD Graphics Haswell GT2 Mobile

    Version        : OpenCL 1.2 beignet 1.1.1

    Processor(s)   : 20

    Clock          : 1000

    Memory         : 1024/2048 MB allocatable

    OpenCL Version : OpenCL C 1.2 beignet 1.1.1

    Driver Version : 1.1.1

And here is clinfo:

Code:
# clinfo

Number of platforms                               1

  Platform Name                                   Intel Gen OCL Driver

  Platform Vendor                                 Intel

  Platform Version                                OpenCL 1.2 beignet 1.1.1

  Platform Profile                                FULL_PROFILE

  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

  Platform Extensions function suffix             Intel

  Platform Name                                   Intel Gen OCL Driver

Number of devices                                 1

  Device Name                                     Intel(R) HD Graphics Haswell GT2 Mobile

  Device Vendor                                   Intel

  Device Vendor ID                                0x8086

  Device Version                                  OpenCL 1.2 beignet 1.1.1

  Driver Version                                  1.1.1

  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.1.1

  Device Type                                     GPU

  Device Profile                                  FULL_PROFILE

  Max compute units                               20

  Max clock frequency                             1000MHz

  Device Partition                                (core)

    Max number of sub-devices                     1

    Supported partition types                     None, None, None

  Max work item dimensions                        3

  Max work item sizes                             512x512x512

  Max work group size                             512

  Preferred work group size multiple              16

  Preferred / native vector sizes                 

    char                                                16 / 8       

    short                                                8 / 8       

    int                                                  4 / 4       

    long                                                 2 / 2       

    half                                                 0 / 8        (n/a)

    float                                                4 / 4       

    double                                               0 / 2        (n/a)

  Half-precision Floating-point support           (n/a)

  Single-precision Floating-point support         (core)

    Denormals                                     No

    Infinity and NANs                             Yes

    Round to nearest                              Yes

    Round to zero                                 No

    Round to infinity                             No

    IEEE754-2008 fused multiply-add               No

    Support is emulated in software               No

    Correctly-rounded divide and sqrt operations  No

  Double-precision Floating-point support         (n/a)

  Address bits                                    32, Little-Endian

  Global memory size                              2147483648 (2GiB)

  Error Correction support                        No

  Max memory allocation                           1073741824 (1024MiB)

  Unified memory for Host and Device              Yes

  Minimum alignment for any data type             128 bytes

  Alignment of base address                       1024 bits (128 bytes)

  Global Memory cache type                        Read/Write

  Global Memory cache size                        8192

  Global Memory cache line                        64 bytes

  Image support                                   Yes

    Max number of samplers per kernel             16

    Max size for 1D images from buffer            65536 pixels

    Max 1D or 2D image array size                 2048 images

    Max 2D image size                             8192x8192 pixels

    Max 3D image size                             8192x8192x2048 pixels

    Max number of read image args                 128

    Max number of write image args                8

  Local memory type                               Global

  Local memory size                               65536 (64KiB)

  Max constant buffer size                        134217728 (128MiB)

  Max number of constant args                     8

  Max size of kernel argument                     1024

  Queue properties                                

    Out-of-order execution                        No

    Profiling                                     Yes

  Prefer user sync for interop                    Yes

  Profiling timer resolution                      80ns

  Execution capabilities                          

    Run OpenCL kernels                            Yes

    Run native kernels                            Yes

    SPIR versions                                 <printDeviceInfo:138: get   SPIR versions size : error -30>

  printf() buffer size                            1048576 (1024KiB)

  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;

  Device Available                                Yes

  Compiler Available                              Yes

  Linker Available                                Yes

  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

NULL platform behavior

  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform

  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform

  clCreateContext(NULL, ...) [default]            No platform

  clCreateContext(NULL, ...) [other]              Success [Intel]

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

RE: Fail to compile kernel, may need to increase reserved registers for spilling. - joshmore - 11-28-2016

(11-28-2016, 04:53 PM)joshmore Wrote: (11-28-2016, 02:42 PM)atom Wrote: Clearly not a hashcat issue; the compiler error makes that clear. We should ban beignet support to reduce support overhead. Can you please paste the output of hashcat -I and ideally clinfo?

And just to be complete, here are the infodumps for when I have the NVidia side activated as well:

Code:
# ./hashcat -I

hashcat (v3.10-829-g646a472) starting...

OpenCL Info:

Platform ID #1

  Vendor  : Intel

  Name    : Intel Gen OCL Driver

  Version : OpenCL 1.2 beignet 1.1.1

  Device ID #1

    Type           : GPU

    Vendor ID      : 4

    Vendor         : Intel

    Name           : Intel(R) HD Graphics Haswell GT2 Mobile

    Version        : OpenCL 1.2 beignet 1.1.1

    Processor(s)   : 20

    Clock          : 1000

    Memory         : 1024/2048 MB allocatable

    OpenCL Version : OpenCL C 1.2 beignet 1.1.1

    Driver Version : 1.1.1

Platform ID #2

  Vendor  : NVIDIA Corporation

  Name    : NVIDIA CUDA

  Version : OpenCL 1.2 CUDA 8.0.46

  Device ID #2

    Type           : GPU

    Vendor ID      : 32

    Vendor         : NVIDIA Corporation

    Name           : Quadro K1100M

    Version        : OpenCL 1.2 CUDA

    Processor(s)   : 2

    Clock          : 705

    Memory         : 499/1998 MB allocatable

    OpenCL Version : OpenCL C 1.2 

    Driver Version : 367.57

Code:
# clinfo 

Number of platforms                               2

  Platform Name                                   Intel Gen OCL Driver

  Platform Vendor                                 Intel

  Platform Version                                OpenCL 1.2 beignet 1.1.1

  Platform Profile                                FULL_PROFILE

  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

  Platform Extensions function suffix             Intel

  Platform Name                                   NVIDIA CUDA

  Platform Vendor                                 NVIDIA Corporation

  Platform Version                                OpenCL 1.2 CUDA 8.0.46

  Platform Profile                                FULL_PROFILE

  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event

  Platform Extensions function suffix             NV

  Platform Name                                   Intel Gen OCL Driver

Number of devices                                 1

  Device Name                                     Intel(R) HD Graphics Haswell GT2 Mobile

  Device Vendor                                   Intel

  Device Vendor ID                                0x8086

  Device Version                                  OpenCL 1.2 beignet 1.1.1

  Driver Version                                  1.1.1

  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.1.1

  Device Type                                     GPU

  Device Profile                                  FULL_PROFILE

  Max compute units                               20

  Max clock frequency                             1000MHz

  Device Partition                                (core)

    Max number of sub-devices                     1

    Supported partition types                     None, None, None

  Max work item dimensions                        3

  Max work item sizes                             512x512x512

  Max work group size                             512

  Preferred work group size multiple              16

  Preferred / native vector sizes                 

    char                                                16 / 8       

    short                                                8 / 8       

    int                                                  4 / 4       

    long                                                 2 / 2       

    half                                                 0 / 8        (n/a)

    float                                                4 / 4       

    double                                               0 / 2        (n/a)

  Half-precision Floating-point support           (n/a)

  Single-precision Floating-point support         (core)

    Denormals                                     No

    Infinity and NANs                             Yes

    Round to nearest                              Yes

    Round to zero                                 No

    Round to infinity                             No

    IEEE754-2008 fused multiply-add               No

    Support is emulated in software               No

    Correctly-rounded divide and sqrt operations  No

  Double-precision Floating-point support         (n/a)

  Address bits                                    32, Little-Endian

  Global memory size                              2147483648 (2GiB)

  Error Correction support                        No

  Max memory allocation                           1073741824 (1024MiB)

  Unified memory for Host and Device              Yes

  Minimum alignment for any data type             128 bytes

  Alignment of base address                       1024 bits (128 bytes)

  Global Memory cache type                        Read/Write

  Global Memory cache size                        8192

  Global Memory cache line                        64 bytes

  Image support                                   Yes

    Max number of samplers per kernel             16

    Max size for 1D images from buffer            65536 pixels

    Max 1D or 2D image array size                 2048 images

    Max 2D image size                             8192x8192 pixels

    Max 3D image size                             8192x8192x2048 pixels

    Max number of read image args                 128

    Max number of write image args                8

  Local memory type                               Global

  Local memory size                               65536 (64KiB)

  Max constant buffer size                        134217728 (128MiB)

  Max number of constant args                     8

  Max size of kernel argument                     1024

  Queue properties                                

    Out-of-order execution                        No

    Profiling                                     Yes

  Prefer user sync for interop                    Yes

  Profiling timer resolution                      80ns

  Execution capabilities                          

    Run OpenCL kernels                            Yes

    Run native kernels                            Yes

    SPIR versions                                 <printDeviceInfo:138: get   SPIR versions size : error -30>

  printf() buffer size                            1048576 (1024KiB)

  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;

  Device Available                                Yes

  Compiler Available                              Yes

  Linker Available                                Yes

  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

  Platform Name                                   NVIDIA CUDA

Number of devices                                 1

  Device Name                                     Quadro K1100M

  Device Vendor                                   NVIDIA Corporation

  Device Vendor ID                                0x10de

  Device Version                                  OpenCL 1.2 CUDA

  Driver Version                                  367.57

  Device OpenCL C Version                         OpenCL C 1.2 

  Device Type                                     GPU

  Device Profile                                  FULL_PROFILE

  Device Topology (NV)                            PCI-E, 01:00.0

  Max compute units                               2

  Max clock frequency                             705MHz

  Compute Capability (NV)                         3.0

  Device Partition                                (core)

    Max number of sub-devices                     1

    Supported partition types                     None

  Max work item dimensions                        3

  Max work item sizes                             1024x1024x64

  Max work group size                             1024

  Preferred work group size multiple              32

  Warp size (NV)                                  32

  Preferred / native vector sizes                 

    char                                                 1 / 1       

    short                                                1 / 1       

    int                                                  1 / 1       

    long                                                 1 / 1       

    half                                                 0 / 0        (n/a)

    float                                                1 / 1       

    double                                               1 / 1        (cl_khr_fp64)

  Half-precision Floating-point support           (n/a)

  Single-precision Floating-point support         (core)

    Denormals                                     Yes

    Infinity and NANs                             Yes

    Round to nearest                              Yes

    Round to zero                                 Yes

    Round to infinity                             Yes

    IEEE754-2008 fused multiply-add               Yes

    Support is emulated in software               No

    Correctly-rounded divide and sqrt operations  Yes

  Double-precision Floating-point support         (cl_khr_fp64)

    Denormals                                     Yes

    Infinity and NANs                             Yes

    Round to nearest                              Yes

    Round to zero                                 Yes

    Round to infinity                             Yes

    IEEE754-2008 fused multiply-add               Yes

    Support is emulated in software               No

    Correctly-rounded divide and sqrt operations  No

  Address bits                                    64, Little-Endian

  Global memory size                              2095251456 (1.951GiB)

  Error Correction support                        No

  Max memory allocation                           523812864 (499.5MiB)

  Unified memory for Host and Device              No

  Integrated memory (NV)                          No

  Minimum alignment for any data type             128 bytes

  Alignment of base address                       4096 bits (512 bytes)

  Global Memory cache type                        Read/Write

  Global Memory cache size                        32768

  Global Memory cache line                        128 bytes

  Image support                                   Yes

    Max number of samplers per kernel             32

    Max size for 1D images from buffer            134217728 pixels

    Max 1D or 2D image array size                 2048 images

    Max 2D image size                             16384x16384 pixels

    Max 3D image size                             4096x4096x4096 pixels

    Max number of read image args                 256

    Max number of write image args                16

  Local memory type                               Local

  Local memory size                               49152 (48KiB)

  Registers per block (NV)                        65536

  Max constant buffer size                        65536 (64KiB)

  Max number of constant args                     9

  Max size of kernel argument                     4352 (4.25KiB)

  Queue properties                                

    Out-of-order execution                        Yes

    Profiling                                     Yes

  Prefer user sync for interop                    No

  Profiling timer resolution                      1000ns

  Execution capabilities                          

    Run OpenCL kernels                            Yes

    Run native kernels                            No

    Kernel execution timeout (NV)                 No

  Concurrent copy and kernel execution (NV)       Yes

    Number of async copy engines                  1

  printf() buffer size                            1048576 (1024KiB)

  Built-in kernels                                

  Device Available                                Yes

  Compiler Available                              Yes

  Linker Available                                Yes

  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event

NULL platform behavior

  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform

  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform

  clCreateContext(NULL, ...) [default]            No platform

  clCreateContext(NULL, ...) [other]              Success [Intel]

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform

  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

RE: Fail to compile kernel, may need to increase reserved registers for spilling. - atom - 11-29-2016

I've pushed a new version to GitHub which hopefully informs the user about the beignet problem. Could you please test whether the warning message shows up for you? I've also added a new precompiled beta to https://hashcat.net/beta/

In case you want to ignore the beignet device, you can use either the -d or the --opencl-platforms option.

RE: Fail to compile kernel, may need to increase reserved registers for spilling. - joshmore - 11-29-2016

(11-29-2016, 02:05 PM)atom Wrote: I've pushed a new version to GitHub which hopefully informs the user about the beignet problem. Could you please test whether the warning message shows up for you? I've also added a new precompiled beta to https://hashcat.net/beta/

In case you want to ignore the beignet device, you can use either the -d or the --opencl-platforms option.

This is what I get:

Code:
hashcat (v3.10-1617-g72d0b27) starting...

* Device #1: Intel beignet driver detected!

The beignet driver has been marked as half-baked and likely to fail kernel compilation

You can use --force to override this but do not post error reports if you do so

Started: Tue Nov 29 09:54:28 2016

Stopped: Tue Nov 29 09:54:28 2016

Also, I had tried the -d option before posting and, as far as I could tell, even when I skipped the Intel card, it still tried to compile against beignet. However, --opencl-platforms=2 did the trick. CUDA cracking is working for me now.

Thanks for your help.
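
For anyone who lands here with the same Intel + NVIDIA laptop setup, here is a rough sketch of looking up the platform number from the hashcat -I output instead of hard-coding 2. The helper name platform_id_for_vendor is made up for illustration, and the parsing assumes the "Platform ID #N" / "Vendor : ..." layout shown in the dumps above; other hashcat builds may print this differently (and, as noted earlier in the thread, some builds have no -I at all).

```shell
#!/bin/sh
# Hypothetical helper (not part of hashcat): read `hashcat -I` style text on
# stdin and print the ID of the first platform whose Vendor line contains the
# string given as $1. Prints nothing if no platform matches.
platform_id_for_vendor() {
    awk -v vendor="$1" '
        # Remember the number from the most recent "Platform ID #N" header.
        /^Platform ID #/ { sub(/^Platform ID #/, ""); id = $0 + 0; next }
        # "Vendor : ..." lines (but not "Vendor ID : ...") name the vendor.
        /^[[:space:]]*Vendor[[:space:]]*:/ && index($0, vendor) { print id; exit }
    '
}

# Example use (left commented out, since it needs hashcat and the GPU drivers):
# the command from the first post, restricted to the NVIDIA platform.
#   id=$(./hashcat -I | platform_id_for_vendor NVIDIA)
#   ./hashcat -m 5600 -a 0 /path/to/responder_hashes.txt ../WordLists/rockyou.txt -r rules/best64.rule --opencl-platforms="$id"
```

On the machine in this thread that lookup should give 2 for NVIDIA, which matches the --opencl-platforms=2 that worked above. It is only a convenience; reading the number off hashcat -I by eye works just as well.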