Wordlist from Weakpass - File Format
#1
Question 
Hello @all,

Can someone explain to me why some wordlists from Weakpass are easy to read (e.g. with $ less), while for other wordlists less prompts "... may be a binary file"? I do not know how to make them readable.

To give you an example, rockyou.txt works just fine while Hashes.org does not.

Thanks! And sorry for
Reply
#2
(09-16-2021, 01:28 PM)hashmenow Wrote: Hello @all,

Can someone explain to me why some wordlists from Weakpass are easy to read (e.g. with $ less), while for other wordlists less prompts "... may be a binary file"? I do not know how to make them readable.

To give you an example, rockyou.txt works just fine while Hashes.org does not.

Thanks! And sorry for

If you open the Hashes.org wordlist in a hex editor, you will see some non-ASCII or control characters right at the start. I think that is the reason why Linux "thinks" this is a binary data file; running "file" on the Hashes.org wordlist also reports it as a data file.
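
If you do not have a hex editor at hand, the same check can be sketched in a few lines of Python. This is only an illustration: the function names and the 256-byte window are my own assumptions, and it is not the heuristic that less or file actually use. It simply counts control bytes and non-ASCII bytes at the start of a file.

```python
def control_byte_report(chunk: bytes) -> dict:
    """Count control bytes and non-ASCII bytes in a chunk of raw data."""
    # Tab, line feed and carriage return are normal in text files.
    allowed = {0x09, 0x0A, 0x0D}
    control = sum(
        1 for b in chunk
        if (b < 0x20 and b not in allowed) or b == 0x7F
    )
    non_ascii = sum(1 for b in chunk if b >= 0x80)
    return {"total": len(chunk), "control": control, "non_ascii": non_ascii}


def peek(path: str, size: int = 256) -> dict:
    """Inspect only the first `size` bytes of the file (size is an arbitrary choice)."""
    with open(path, "rb") as fh:
        return control_byte_report(fh.read(size))
```

Calling peek() on a wordlist that opens cleanly in less should show zero control bytes, while a wordlist that triggers the "may be a binary file" prompt would be expected to show a non-zero count.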
Reply
#3
(09-17-2021, 05:20 PM)Snoopy Wrote:
If you open the Hashes.org wordlist in a hex editor, you will see some non-ASCII or control characters right at the start. I think that is the reason why Linux "thinks" this is a binary data file; running "file" on the Hashes.org wordlist also reports it as a data file.

Thanks! But why do wordlists contain control chars? Even if it is possible to use them in a real password, wouldn't they be extremely rare?
Reply
#4
Your instincts are good. Yes, they should be rare - even historically, and pretty much non-existent for web-based passwords. Many "wordlists" are just massive mashups of other people's ideas of what might make a good wordlist, and/or other character encodings - and when they're sorted, a lot of the non-printable junk ends up at the top of the file.
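
To make the sorting point concrete, here is a rough Python sketch. The helper names are made up for illustration, and treating any control byte as junk is my assumption, not a rule from any tool. It shows that a plain byte-wise sort puts control bytes ahead of digits and letters, and it splits a list into entries with and without control bytes. Bytes above 0x7F are left alone, since they may belong to a legitimate non-ASCII encoding.

```python
def has_control_bytes(word: bytes) -> bool:
    """True if the entry contains an ASCII control byte (0x00-0x1F or 0x7F)."""
    return any(b < 0x20 or b == 0x7F for b in word)


def split_clean(lines):
    """Separate entries without control bytes from those containing them."""
    clean, junk = [], []
    for line in lines:
        word = line.rstrip(b"\r\n")  # drop the line ending before checking
        (junk if has_control_bytes(word) else clean).append(word)
    return clean, junk


# Byte-wise sorting: control bytes compare lower than digits and letters,
# so these entries float to the top of a sorted wordlist.
sample = [b"password", b"\x01\x02abc", b"123456", b"\x00junk"]
ordered = sorted(sample)
```

Running split_clean() over a file opened in binary mode and writing only the clean list back out would be one way to get a copy that less opens without the prompt, at the cost of dropping those rare entries.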
~
Reply
#5
Well, that explains it. Thanks :)
Reply