HTML and conversations in rockyou
#1
I'm probably late to this, but I was doing some analysis on the rockyou list from skullsecurity.org and noticed a lot of HTML and what look like private conversations in the list. I'm assuming these are just artifacts from the data breach. I never looked through the whole list in detail before, so this is new to me.

Stuff like this:

LINE: 7522828 href="http://www.enchulatupagina.com/comentarios-en-espanol/sexy/p
arty

LINE: 7557111 honeyDo you realize who is in this image: http://thecoolpics.com/who.jpg . Just think for a moment and tell me o you realize who is in this image:
http://thecoolpics.com/who.jpg . Just think for a moment and tell me soon Wink)

In practice, does everyone strip out these entries, or just load them up as valid password candidates?
#2
I've raised the topic of garbage in leaked lists on the forum over on hashes.org.

Another issue is "How I found encoding errors in rockyou.txt trying to import dictionaries into MySQL" at:
http://thepasswordproject.com/2012-01-25...into_mysql
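
If you want to see which entries would trip up a strict importer, here is a minimal sketch that reports lines that don't decode as UTF-8. It assumes the target expects UTF-8, which may not match your setup. The name find_invalid_utf8 and the sample bytes are made up for illustration and are not taken from the linked article.

```python
def find_invalid_utf8(lines):
    """Return (line_number, raw_bytes) pairs for entries that are not valid UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        entry = raw.rstrip(b"\r\n")  # drop the line ending, keep the entry itself
        try:
            entry.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, entry))
    return bad


# Made-up sample: the second entry is Latin-1 encoded, the third is valid UTF-8.
sample = [b"password\n", b"caf\xe9\n", b"ni\xc3\xb1o\n"]
for number, entry in find_invalid_utf8(sample):
    print("LINE:", number, entry)
```

To run it against the real list, pass a file handle opened with open(path, "rb") instead of the sample. Reading as bytes matters here, because opening in text mode would either raise on the first bad line or silently replace the bytes you are trying to find.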
#3
(08-19-2014, 03:12 PM)Kgx Pnqvhm Wrote: I've raised the topic of garbage in leaked lists on the forum over on hashes.org.

Another issue is "How I found encoding errors in rockyou.txt trying to import dictionaries into MySQL" at:
http://thepasswordproject.com/2012-01-25...into_mysql

Ok, so it's a known problem. Just wondering.
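
In case anyone wants to strip them, this is a rough sketch of the kind of filter I mean. The length cutoff and the pattern are my own guesses, not an established rule set, and the names looks_like_junk and split_candidates are made up. It will also throw away any real password that happens to contain a URL, so keep the dropped entries around for review.

```python
import re

# Assumed heuristics: tune both for your own list.
MAX_LENGTH = 32  # hypothetical cutoff; longer entries are treated as junk
JUNK_PATTERN = re.compile(r"https?://|href=|<[a-z/!]", re.IGNORECASE)


def looks_like_junk(entry, max_length=MAX_LENGTH):
    """Guess whether an entry is HTML or message residue rather than a password."""
    return len(entry) > max_length or JUNK_PATTERN.search(entry) is not None


def split_candidates(entries):
    """Split entries into (kept, dropped) lists without discarding anything."""
    kept, dropped = [], []
    for entry in entries:
        (dropped if looks_like_junk(entry) else kept).append(entry)
    return kept, dropped


entries = [
    "123456",
    'href="http://www.enchulatupagina.com/comentarios-en-espanol/sexy/p',
    "<3love",
]
kept, dropped = split_candidates(entries)
print(kept)
print(dropped)
```

The pattern only treats "<" as markup when a letter, "/" or "!" follows it, so something like "<3love" stays in the kept list.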