I think it is simply not possible to increase the speed significantly (and increasing the number of threads certainly doesn't help at all).
I'm not sure you understood what I meant when I said that stdout is the problem.
Think of it like this: there is a worker that is *very*, very fast, but the output it hands over to input/output (I/O), to the operating system let's say, takes a long time to get printed. The threads have to wait until the I/O has done its job. This is what we call a bottleneck, and there is pretty much nothing we can do about it: there is too much to print and too much I/O involved. If we used additional threads, they would have to wait for the same I/O again, so more threads do not help.
Faster I/O may help. How slow the output *really* is depends on what you do with it (e.g. your pipe, or your disk if you redirect to a file), and it also depends on the operating system and how well it handles I/O.
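If you want to see for yourself where the time goes, here is a rough Python sketch (not the program we were talking about, just an illustration). It assumes a temporary file as a stand-in for stdout redirected to disk; the names make_lines and timed_write are made up for this example. It times producing the lines in memory and then handing them to the operating system, once with one system call per line and once through a 1 MiB buffer.

```python
import os
import tempfile
import time


def make_lines(n):
    # The "fast worker": producing the data in memory is cheap.
    return [b"result %d\n" % i for i in range(n)]


def timed_write(lines, path, buffering):
    # buffering=0       -> every write() is handed straight to the OS
    # buffering=1 << 20 -> lines are collected in a 1 MiB buffer first
    start = time.perf_counter()
    with open(path, "wb", buffering=buffering) as out:
        for line in lines:
            out.write(line)
    return time.perf_counter() - start


if __name__ == "__main__":
    start = time.perf_counter()
    lines = make_lines(200_000)
    produce_time = time.perf_counter() - start

    # Assumption: a temporary file stands in for "stdout redirected to disk".
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        unbuffered_time = timed_write(lines, path, 0)
        buffered_time = timed_write(lines, path, 1 << 20)

    print(f"producing the lines: {produce_time:.3f} s")
    print(f"writing, unbuffered: {unbuffered_time:.3f} s")
    print(f"writing, buffered:   {buffered_time:.3f} s")
```

The actual numbers will differ a lot between machines, disks and operating systems, which is exactly the point: the worker is the same in every run, only the I/O side changes.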