Need Help Parsing 10M list
#1
There is a very large list of passwords that has been posted online. I opened the file in Notepad++ (it took a while to load), and when I went through it I noticed something: it still had all the usernames listed, and all the username data had been replaced with zeros. I thought I could get around this by using --username, but some of the entries have no spacing between the password and the username, so it didn't work. I used the gate utility to chop the file into pieces, then used the Notepad++ regex editor to turn tabs into newlines so I could at least run the list against hashcat in plaintext. Unfortunately, this means I can't run a mask without picking up 250k false positives from all the username data. I need help parsing this list.

https://download.g0tmi1k.com/wordlists/large/
#2
I was thinking of trying the --separator flag, but I wasn't sure how to use it:

"--username --separator \t"
"--username --separator\t"
"--username --separator <tab key>"
"--username --separator <ASCII number>"
etc.
#3
Those flags do not do what you think they do. The cool kids would actually process the list and transform the data into the required format. This is where knowing your OS really comes in handy (and indeed using an OS that has the necessary tools in the first place, which Windows does not, at least not out of the box, though there is Cygwin, WSL, etc.).
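As a rough illustration of that kind of processing, here is a minimal Python sketch. It assumes each line is username<TAB>password (the tab layout described in the first post), keeps everything after the first tab as the password, and writes lines with no tab at all to a separate reject file, since a fused username and password can't be split reliably. The script name, function names, and file names are hypothetical.

```python
import sys
from typing import Optional


def extract_password(line: bytes, sep: bytes = b"\t") -> Optional[bytes]:
    """Return the bytes after the first separator, or None if there are none.

    Assumes the username itself never contains the separator, so any further
    tabs are treated as part of the password.
    """
    line = line.rstrip(b"\r\n")
    _user, found, password = line.partition(sep)
    return password if found and password else None


def convert(src: str, dst: str, rejects: str) -> None:
    # Binary mode: leaked lists often mix encodings, so nothing is decoded.
    with open(src, "rb") as fin, open(dst, "wb") as fout, open(rejects, "wb") as frej:
        for line in fin:
            password = extract_password(line)
            if password is not None:
                fout.write(password + b"\n")
            elif line.strip():
                # No separator (or nothing after it): username and password
                # may be fused, so set the line aside for manual review.
                frej.write(line.rstrip(b"\r\n") + b"\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
```

Run it as python split_list.py combo.txt passwords.txt rejects.txt (placeholder names). The passwords file can then be used as a wordlist on its own, without the username data mixed in.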

Protip: with a couple of noteworthy exceptions, pretty much any wordlist publicly available on the Internet is going to be 100% horseshit.
#4
>Those commands do not do what you think they do
--username                |      | Enable ignoring of usernames in hashfile
--separator               | Char | Separator char for hashlists and outfile

Well, I gotta say, those are some pretty misleading descriptors then.

Like I mentioned, the list has several entries where there is no separator between the username and password; it's all one word. I'm not sure how you would process this.

Speaking of stuff you find on the internet being shit, there isn't any possibility of running some sort of command line injection script while reading a massive wordlist, is there?

PS What is the standard hashcat input format for hashlists with username data?
#5
You said this was a wordlist, not a hash list. Those flags do not apply to wordlists.

I'm not sure what you mean by 'command line injection script'. Do you mean some malicious code embedded in a wordlist to exploit some vulnerability in a utility that is parsing it? I suppose that's within the realm of things possible, but I wouldn't think that would be very probable.

The standard format is user:hash
#6
My bad.  I was using my wordlist as a hashlist using the plaintext hash mode.

Maybe if I use a regex editor, I can replace all the tabs with colons and it will format properly.
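One caveat with a blanket tabs-to-colons replace: it would also rewrite any tab inside a password, and it does nothing about the lines that have no separator. Below is a minimal Python sketch of a more careful conversion to the user:hash format mentioned above, assuming tab-separated input. Only the first tab on each line becomes a colon. Lines with no tab are skipped and counted, and so are lines whose username already contains a colon, on the assumption that this would presumably make the user:hash split ambiguous. The script and function names are hypothetical; this is not a hashcat tool.

```python
import sys
from typing import Optional


def to_user_colon(line: bytes) -> Optional[bytes]:
    """Turn b'user\\tpass' into b'user:pass' by replacing only the first tab.

    Returns None for lines that can't be converted safely: no tab, an empty
    username, or a username that already contains a colon.
    """
    line = line.rstrip(b"\r\n")
    user, found, password = line.partition(b"\t")
    if not found or not user or b":" in user:
        return None
    return user + b":" + password


def convert(src: str, dst: str) -> int:
    """Write user:pass lines to dst and return how many input lines were skipped."""
    skipped = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            out = to_user_colon(line)
            if out is None:
                skipped += 1
            else:
                fout.write(out + b"\n")
    return skipped


if __name__ == "__main__" and len(sys.argv) == 3:
    print("skipped lines:", convert(sys.argv[1], sys.argv[2]))
```

The skipped count gives a quick idea of how many fused or malformed entries are left to deal with separately.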