Sorry for reviving this thread.
Out of curiosity, I converted the len utility to use fgets_sse2 instead of fgets, and I am seeing some odd behavior. Output:
Code:
mangix@Mangix ~/testing
$ len 2 6 < enwik8 | wc -l
33205
mangix@Mangix ~/testing
$ ./len 2 6 < enwik8 | wc -l
33070
Oddly enough, when I use rockyou.txt everything is fine:
Code:
mangix@Mangix ~/testing
$ len 2 6 < rockyou.txt | wc -l
2227662
mangix@Mangix ~/testing
$ ./len.exe 2 6 < rockyou.txt | wc -l
2227662
When I piped the outputs to two different files and compared them, it seems that the version using fgets_sse2 is not keeping lines that have a . at the end. Maybe newline characters are being handled differently. enwik8 is available here: http://mattmahoney.net/dc/enwik8.zip
edit: I found a second problem. When you feed it wordlists that have \r\n at the end of a line, the \r gets treated as part of the word. It looks like some filtering is needed.
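For the \r\n case, something like the following might be enough. This is only a sketch: trim_line_ending is a name I made up, not anything from the len source, and I am assuming the line sits in a NUL-terminated buffer with len equal to its string length. I have not checked what fgets_sse2 actually returns, so the call site may need adjusting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the len utility): strips any trailing
 * '\n' and '\r' bytes from line[0..len) and returns the new length.
 * Assumes the buffer is NUL-terminated at line[len] on entry. */
size_t trim_line_ending(char *line, size_t len)
{
    assert(line != NULL);

    /* Walk back over "\n", "\r\n", or a stray "\r" at the end of the line. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        len--;
    }

    line[len] = '\0'; /* re-terminate at the trimmed length */
    return len;
}
```

The idea would be to call this on every line right after reading it and do the min/max length check against the returned length, so "password\r\n" counts as 8 characters instead of 9 or 10. It only touches the end of the line, so a \r in the middle of a word is left alone.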