Over on the Hashes.org Forum, in the General section, there are discussions such as "fake, corrupt and other crap hashes"
(https://hashes.org/forum/viewtopic.php?f=3&t=1709).
And they have "junk lists" on
https://hashes.org/crackers.php.
But the examples above are not in there.
Feeding the ones I listed above into analysis tools such as PACK, or using them as "training lists" for other tools, is a waste of time and leads to erroneous or useless results. The research crowd seems to agree that the 8-character Stratfor words are mostly machine-generated.
(If I remember correctly, atom's combined password in one of those articles about cracking was made up of human-created passwords, something like "mom of 8 great kids".)
--------------------------------------------------------------------------
One more item for my "book" above: even Team Hashcat's unix-ninja, in his "Password DNA" article at
https://www.unix-ninja.com/p/Password_DNA, mentioned the need to sanitize:
"finally, entries which are known to belong to bots will be removed (these entries do not accurately reflect password authors' behaviours and only skew the results of a dictionary in unfavourable ways)"
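For anyone wanting to try that kind of sanitizing before running a list through PACK, here is a rough Python sketch. The heuristic is purely my own assumption for illustration. It is not what Hashes.org, PACK, or the Password DNA article actually use. It flags an entry as "likely machine-generated" when it is exactly 8 characters, mixes letters and digits, and never has more than 3 letters in a row, on the theory that random generators rarely produce word fragments. The names `looks_machine_generated` and `sanitize` and the thresholds are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical thresholds (my assumption, not from any tool or article):
# entries of exactly TARGET_LENGTH chars that mix letters and digits and
# never have more than MAX_LETTER_RUN consecutive letters are flagged.
TARGET_LENGTH = 8
MAX_LETTER_RUN = 3

_LETTER_RUN = re.compile(r"[A-Za-z]+")


def longest_letter_run(pw: str) -> int:
    """Length of the longest run of consecutive ASCII letters in pw."""
    return max((len(run) for run in _LETTER_RUN.findall(pw)), default=0)


def looks_machine_generated(pw: str) -> bool:
    """Heuristic guess: True if pw resembles a random generated string."""
    if len(pw) != TARGET_LENGTH:
        return False
    has_digit = any(c.isdigit() for c in pw)
    has_letter = _LETTER_RUN.search(pw) is not None
    return has_digit and has_letter and longest_letter_run(pw) <= MAX_LETTER_RUN


def sanitize(passwords: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split a wordlist into (kept, removed); blank lines are dropped."""
    kept: List[str] = []
    removed: List[str] = []
    for pw in passwords:
        pw = pw.rstrip("\r\n")  # tolerate lines read straight from a file
        if not pw:
            continue
        (removed if looks_machine_generated(pw) else kept).append(pw)
    return kept, removed


if __name__ == "__main__":
    sample = ["password1", "x7Kp2Qm9", "iloveyou", "k3j9Lq2w", "Summer2012"]
    kept, removed = sanitize(sample)
    print("kept:", kept)
    print("removed:", removed)
```

Running it prints `kept: ['password1', 'iloveyou', 'Summer2012']` and `removed: ['x7Kp2Qm9', 'k3j9Lq2w']`. It will misfire on human keyboard patterns like "a1b2c3d4", so I would treat the "removed" pile as something to eyeball, not to delete blindly.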