[split] BFI_INT from OpenCL
(12-17-2010, 10:02 AM)gat3way Wrote: I find it a bit strange that 1), 4) and 5) are slower than 3), though. 3) has one bitwise operation fewer, but it should still be worse because it adds one more dependency between instructions (b depends on c^d, then d depends on b&(c^d)), while in 1), 4) and 5) (b&c) and ((~b)&d) can be computed in parallel. Stranger still, both behaved the same on the CPU, even though the extra dependencies should cause a pipeline bubble.
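A quick way to confirm that options 1) and 3) compute the same selection function is a plain C sketch. Only these two forms can be recovered from this thread: option 1 is (b & c) | (~b & d), and option 3 is read off the SSE listing as d ^ (b & (c ^ d)). Options 4) and 5) are not shown here, so they are left out. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: bits of c where b is 1, bits of d where b is 0.
   (b & c) and (~b & d) do not depend on each other, so they can issue in parallel. */
static uint32_t sel_andor(uint32_t b, uint32_t c, uint32_t d) {
    return (b & c) | (~b & d);
}

/* Option 3: one operation fewer, but every step waits for the previous one. */
static uint32_t sel_xor(uint32_t b, uint32_t c, uint32_t d) {
    uint32_t t = c ^ d;   /* step 1 */
    t &= b;               /* step 2 needs step 1 */
    return t ^ d;         /* step 3 needs step 2 */
}

/* Both functions are purely bitwise, so every output bit depends only on the
   matching bits of b, c and d. Checking all 8 combinations of all-zero and
   all-one words therefore covers every possible 32-bit input. */
static int forms_agree(void) {
    const uint32_t w[2] = { 0u, 0xFFFFFFFFu };
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                if (sel_andor(w[i], w[j], w[k]) != sel_xor(w[i], w[j], w[k]))
                    return 0;
    return 1;
}
```

So the speed difference between 1) and 3) can only come from instruction count and scheduling, not from the result.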

CPU version: SSE instructions overwrite their first operand, so if you want to use variable "b" later you can't compute into it; you must make a copy first...
So the first case, (b & c) | (~b & d), translates into something like this:
Code:
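movdqa tmp1, b     ; copy b so it survives
movdqa tmp2, b     ; second copy of b for the AND-NOT
pand   tmp1, c     ; tmp1 = b & c
pandn  tmp2, d     ; tmp2 = ~b & d  (PANDN inverts the destination)
por    tmp1, tmp2  ; tmp1 = (b & c) | (~b & d)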
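The same first case can be written in C with SSE2 intrinsics, which map one-to-one onto PAND, PANDN and POR. This is a sketch assuming an x86/x86-64 target with SSE2; the helper names are hypothetical. Note that `_mm_andnot_si128(x, y)` computes `(~x) & y`, matching PANDN. When compiling for plain (non-VEX) SSE2, the compiler should add the MOVDQA copies itself if `b` is still needed afterwards.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Option 1: (b & c) | (~b & d). The two ANDs are independent. */
static __m128i sel_andor_sse2(__m128i b, __m128i c, __m128i d) {
    __m128i t1 = _mm_and_si128(b, c);     /* pand:  b & c  */
    __m128i t2 = _mm_andnot_si128(b, d);  /* pandn: ~b & d */
    return _mm_or_si128(t1, t2);          /* por           */
}

/* Hypothetical helper: broadcast scalars into all four lanes, run the SIMD
   version, and return lane 0 so the result is easy to inspect. */
static uint32_t sel_andor_u32(uint32_t b, uint32_t c, uint32_t d) {
    __m128i r = sel_andor_sse2(_mm_set1_epi32((int)b),
                               _mm_set1_epi32((int)c),
                               _mm_set1_epi32((int)d));
    return (uint32_t)_mm_cvtsi128_si32(r);
}
```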

Option 3 produces a longer dependency chain, but needs one instruction fewer:

Code:
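movdqa tmp, d      ; copy d
pxor   tmp, c      ; tmp = c ^ d
pand   tmp, b      ; tmp = b & (c ^ d), waits for the pxor
pxor   tmp, d      ; tmp = d ^ (b & (c ^ d)), waits for the pand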
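For comparison, option 3 with the same SSE2 intrinsics (again assuming an x86 target with SSE2; helper names are hypothetical). Each intrinsic consumes the result of the previous one, which is the serial dependency chain discussed above.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Option 3: d ^ (b & (c ^ d)). Three operations, strictly serial. */
static __m128i sel_xor_sse2(__m128i b, __m128i c, __m128i d) {
    __m128i t = _mm_xor_si128(d, c);  /* pxor: c ^ d                    */
    t = _mm_and_si128(t, b);          /* pand: needs the pxor result     */
    return _mm_xor_si128(t, d);       /* pxor: needs the pand result     */
}

/* Hypothetical helper: broadcast scalars, return lane 0. */
static uint32_t sel_xor_u32(uint32_t b, uint32_t c, uint32_t d) {
    __m128i r = sel_xor_sse2(_mm_set1_epi32((int)b),
                             _mm_set1_epi32((int)c),
                             _mm_set1_epi32((int)d));
    return (uint32_t)_mm_cvtsi128_si32(r);
}
```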

On the GPU the problem is different: there is no single AND-NOT instruction, so you also have to compute the bitwise NOT separately.
But maybe I'm wrong, so please correct me; I don't have much time right now...
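To make the GPU instruction count concrete, here is a C sketch of a hypothetical ALU that has AND, OR, XOR and NOT but no AND-NOT; each helper stands for one instruction and increments a counter. Under that assumption option 1 costs 4 instructions and option 3 costs 3. BFI_INT, the subject of this thread, is included as a single-instruction bit select; its semantics here, (mask & x) | (~mask & y), are our reading of the AMD Evergreen ISA and should be checked against the official documentation. The model ignores VLIW packing, where independent operations may share a bundle, so the raw count is not the whole story.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ALU model: counts one per instruction issued. */
typedef struct { unsigned ops; } alu_t;

static uint32_t op_and(alu_t *a, uint32_t x, uint32_t y) { a->ops++; return x & y; }
static uint32_t op_or (alu_t *a, uint32_t x, uint32_t y) { a->ops++; return x | y; }
static uint32_t op_xor(alu_t *a, uint32_t x, uint32_t y) { a->ops++; return x ^ y; }
static uint32_t op_not(alu_t *a, uint32_t x)             { a->ops++; return ~x; }

/* Option 1 without AND-NOT: NOT, AND, AND, OR = 4 instructions. */
static uint32_t sel_andor_gpu(alu_t *a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t nb = op_not(a, b);
    return op_or(a, op_and(a, b, c), op_and(a, nb, d));
}

/* Option 3: XOR, AND, XOR = 3 instructions. */
static uint32_t sel_xor_gpu(alu_t *a, uint32_t b, uint32_t c, uint32_t d) {
    return op_xor(a, op_and(a, b, op_xor(a, c, d)), d);
}

/* BFI_INT as assumed above: the whole select in one instruction. */
static uint32_t op_bfi_int(alu_t *a, uint32_t mask, uint32_t x, uint32_t y) {
    a->ops++;
    return (mask & x) | (~mask & y);
}
```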


Messages In This Thread
[split] BFI_INT from OpenCL - by gat3way - 12-15-2010, 12:21 AM
RE: [split] BFI_INT from OpenCL - by gat3way - 12-17-2010, 10:02 AM
RE: [split] BFI_INT from OpenCL - by Dalibor - 12-17-2010, 03:17 PM
RE: AMD Stream 2.3 SDK released - by atom - 12-15-2010, 09:09 AM
RE: AMD Stream 2.3 SDK released - by IvanG - 12-15-2010, 05:48 PM
RE: AMD Stream 2.3 SDK released - by atom - 12-15-2010, 07:33 PM
RE: AMD Stream 2.3 SDK released - by IvanG - 12-15-2010, 08:24 PM
RE: AMD Stream 2.3 SDK released - by atom - 12-15-2010, 09:22 PM
RE: AMD Stream 2.3 SDK released - by gat3way - 12-15-2010, 11:06 PM
RE: AMD Stream 2.3 SDK released - by Dalibor - 12-16-2010, 04:21 PM