Method for removing crap/generated passwords from word lists? - hashcat Forum (https://hashcat.net/forum), General Talk, thread /thread-5694.html
Method for removing crap/generated passwords from word lists? - Kgx Pnqvhm - 07-24-2016

With all these new leaked lists, there is the issue of crap/generated passwords "contaminating" the word lists. What methods do people here use to attempt to remove them?

RE: Method for removing crap/generated passwords from word lists? - atom - 07-24-2016

Can you give an example of a "contaminated" password?

RE: Method for removing crap/generated passwords from word lists? - Kgx Pnqvhm - 07-24-2016

The most recent article (2016) about this is "So, Just Why Is 18atcskd2w Such a Popular Password?" at: http://www.tripwire.com/state-of-security/featured/so-just-why-is-18atcskd2w-such-a-popular-password/

One of the many articles discussing the Stratfor list, "Challenges with Evaluating Password Cracking Algorithms" at: http://reusablesec.blogspot.com/2015/08/challenges-with-evaluating-password.html has the sentence: "A majority of the passwords in the Stratfor dataset were machine generated."

One method of detection is described in "A list of flaws in the data set", where Mark Burnett writes about the "Ten Million Passwords" list he released: "I have an algorithm in my Hurl script that looks for situations where both the username and password have abnormally high entropy and therefore likely both were computer-generated. The algorithm looks at many weighted criteria (such as both being exactly 8 characters long or containing only hex characters) and comes up with a score. I had the weight a littler lower than it should be to avoid false positives but that means there are still many passwords that were obviously not selected by humans."

RE: Method for removing crap/generated passwords from word lists? - atom - 07-25-2016

I asked for an example, not a book list.

RE: Method for removing crap/generated passwords from word lists?
- Kgx Pnqvhm - 07-26-2016

E.g., in the Stratfor list:

gyq3eftf gyq3hmpr gyq3vrwf gyq9natb gyq9z9cv gyqbv3bl gyqctog6 gyqggjkb gyqgzubc gyqh2eww gyqjhue7 gyqjuaf7 gyqkc9b6 gyqkcern gyqndsww gyqpfmek gyqpurue gyqrect9 gyqruaud gyqrxqp6 gyqu9tyq gyqzvnvu gyr5td9h gyr5ywcp gyr7dta6 gyr9kbsb gyrbhszg gyrc3fok gyrc8ar7 gyrfdekt gyrh4gaw gyrh6ab3 gyrjohup gyrkei7p gyrkmebl gyrokgw9 gyrprrya gyrqkvuj gyrrf7nz gyrt4ar3 gyrtnepj gyrunw6p gyrv92bp gyrx6qqr gyrxdtvj gyrxdu9d gys2rfxr

Those look machine-generated to me, not something a human would choose. The majority of the 8-character words are like that.

RE: Method for removing crap/generated passwords from word lists? - epixoip - 07-26-2016

So you're looking for a way to distinguish human-generated passwords from machine-generated passwords? I know of no public tool that can do this, but it certainly can be done with some degree of accuracy using Markov chains, machine learning, etc.

RE: Method for removing crap/generated passwords from word lists? - Kgx Pnqvhm - 07-27-2016

Right. Those machine-generated passwords clutter up word lists and make them less efficient. And when word lists get combined, the clutter/crap/noise increases geometrically. That well-known computer saying "Garbage In = Garbage Out", applied to combining bad word lists, becomes "Garbage * Garbage = Garbage Squared."

RE: Method for removing crap/generated passwords from word lists? - atom - 07-27-2016

Note that you don't want to drop them completely, because such random-looking passwords are mostly the golden passwords. If a person actually uses one, there's an additional chance that this password is reused, especially for more important hashes. A cracked password shouldn't be removed from a wordlist, even if it looks random.

RE: Method for removing crap/generated passwords from word lists?
- Kgx Pnqvhm - 07-27-2016

Over on the Hashes.org Forum, under General, there are discussions such as "fake, corrupt and other crap hashes" (https://hashes.org/forum/viewtopic.php?f=3&t=1709). They also have "junk lists" on https://hashes.org/crackers.php. But the examples above are not in there.

Feeding the ones I listed above into analysis tools such as PACK, or using them as "training lists" for other tools, is a waste of time and leads to erroneous or useless results. The research crowd seems to agree that the 8-character Stratfor words are mostly machine-generated. (If I remember correctly, atom's combined password in one of those articles about cracking was a combination of human-created passwords, something to do with "mom of 8 great kids" or similar.)

--------------------------------------------------------------------------

One more item for my "book" above: even Team Hashcat's unix-ninja, in his "Password DNA" article at https://www.unix-ninja.com/p/Password_DNA, mentioned the need to sanitize: "finally, entries which are known to belong to bots will be removed (these entries do not accurately reflect password authors' behaviours and only skew the results of a dictionary in unfavourable ways)"

RE: Method for removing crap/generated passwords from word lists? - Kgx Pnqvhm - 07-28-2016

After posting my answer to the MDXfind question, while looking at my notes on cleaning programs, I noticed that a tool I came across earlier this year had recently been updated. See "Introducing bstrings, a Better Strings utility!" at http://binaryforay.blogspot.com/2015/07/introducing-bstrings-better-strings.html and "bstrings v1.1 released!" at https://binaryforay.blogspot.com/2016/04/bstrings-v11-released.html

It has some useful built-in regular expressions.
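
For anyone wanting to try the Markov chain idea epixoip mentioned earlier in the thread, here is a rough sketch of how it might look. It is not an existing tool and not anyone's actual method from this thread. It trains a character-bigram model on a list of human-chosen words and scores each candidate by its average log-probability per transition; low scores suggest machine-generated strings. The class name `BigramModel`, the function `split_wordlist`, the tiny `TRAINING` list, the smoothing constants, and the `-4.0` threshold are all assumptions made for illustration. A real run would train on a large, already-cleaned list and tune the threshold. Following atom's advice, the low scorers are split into a separate list rather than thrown away.

```python
import math
from collections import Counter, defaultdict

# Control characters as start/end markers so they cannot collide with password text.
START, END = "\x02", "\x03"


class BigramModel:
    """Character-bigram (first-order Markov) model with add-k smoothing."""

    def __init__(self, add_k=0.1, vocab_size=96):
        # vocab_size is an assumption: 95 printable ASCII characters plus the end marker.
        self.add_k = add_k
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)  # counts[prev][cur] = times cur followed prev

    def train(self, words):
        for word in words:
            padded = START + word + END
            for prev, cur in zip(padded, padded[1:]):
                self.counts[prev][cur] += 1

    def score(self, word):
        """Average log-probability per transition; higher looks more human-like."""
        padded = START + word + END
        total_logp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            following = self.counts.get(prev, {})
            seen = following.get(cur, 0)
            total = sum(following.values())
            p = (seen + self.add_k) / (total + self.add_k * self.vocab_size)
            total_logp += math.log(p)
        return total_logp / (len(padded) - 1)


def split_wordlist(words, model, threshold):
    """Split words into (kept, flagged); flagged words are set aside, not deleted."""
    kept, flagged = [], []
    for word in words:
        (kept if model.score(word) >= threshold else flagged).append(word)
    return kept, flagged


# Toy training data, for illustration only; far too small for real use.
TRAINING = ["password", "sunshine", "dragon", "monkey", "letmein",
            "welcome", "shadow", "master", "summer", "football"]

model = BigramModel()
model.train(TRAINING)

# The threshold is hand-picked for this toy model and would need tuning on real data.
kept, flagged = split_wordlist(["sunshine", "gyq3eftf", "gyrc8ar7"], model, threshold=-4.0)
print(kept, flagged)  # ['sunshine'] ['gyq3eftf', 'gyrc8ar7']
```

With a realistic training list, the flagged output could be written to its own file and still used as a low-priority wordlist, while the kept output goes into PACK or other analysis tools.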