10-16-2013, 09:07 PM
Hi guys, is this still alive?
I stumbled upon this one and gave it a try with VS on Windows. It seems to work fine for me, but I don't get anywhere near a 4x speedup: *only* about double compared to fgets from the VS runtime, and about 30% compared to doing the same thing over a buffer read with fread without intrinsics. I guess this SSE2-optimized version of fgets is similar in concept to using fread, but the performance comes from parallel processing with SSE2.
With fread:
duration 0: 0.957747
duration 1: 0.869212
duration 2: 0.842793
Mean value: 0.889917
With fgets:
duration 0: 1.258209
duration 1: 1.258632
duration 2: 1.258991
Mean value: 1.258611
With sse2 optimized fgets:
duration 0: 0.661175
duration 1: 0.661901
duration 2: 0.661526
Mean value: 0.661534
I used 100 MB of randomly generated ASCII with a relatively uniform distribution of newlines. I had to make some changes to the code to get it to compile with VS (just in the struct declarations). My test code can be seen here: http://www.nextpoint.se/?p=580 and if anyone would like the modified code I can make it available.
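For anyone wondering what I mean by "parallel processing with SSE2", here is a rough sketch of the general idea: scan a buffer that has already been read (e.g. with fread) for the newline, 16 bytes at a time. This is not the code from this thread and not my benchmark code; `find_newline_sse2` and `lowest_set_bit` are names I made up for illustration, and the sketch assumes an x86/x64 target with SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#ifdef _MSC_VER
#include <intrin.h>      /* _BitScanForward on Visual Studio */
#endif

/* Index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(unsigned mask)
{
#ifdef _MSC_VER
    unsigned long idx;
    _BitScanForward(&idx, mask);
    return (unsigned)idx;
#else
    return (unsigned)__builtin_ctz(mask);   /* GCC/Clang builtin */
#endif
}

/* Hypothetical helper: offset of the first '\n' in buf[0..len),
 * or len if the buffer contains no newline. */
static size_t find_newline_sse2(const char *buf, size_t len)
{
    const __m128i nl = _mm_set1_epi8('\n');  /* 16 copies of '\n' */
    size_t i = 0;

    assert(buf != NULL);

    /* Compare 16 bytes per iteration against '\n'. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* One bit per byte: set where the byte equals '\n'. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));
        if (mask != 0)
            return i + lowest_set_bit(mask);
    }

    /* Fewer than 16 bytes left: plain byte-by-byte scan. */
    for (; i < len; ++i)
        if (buf[i] == '\n')
            return i;

    return len;
}
```

A line reader built on this would fread a large block, call the helper repeatedly to find each line end, and refill the buffer when it runs out. The unaligned load keeps the sketch simple; an aligned variant might be faster, but I have not measured that.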