Splitting Large Single-Line File Into Passwords
#1
Hi all, and thanks in advance for the help.

I have a large text file (>10GB) that I know contains a password I'm looking for. The file is formatted properly (ASCII), but it contains no line breaks and is basically one massive single line of text. I also don't know the length of the password, but I know it exists in the file.

I have written a PowerShell script to pull out all possible strings of X length, going byte by byte (i.e., all strings of 10 characters, all strings of 11 characters, etc.) to text files, but the process is extremely slow and memory intensive (and much slower than hashcat itself).

My hash is slow, so pure bruteforce is not an option here. Does anyone know if hashcat has the capability to ingest a large text file and try all possible passwords of a specific length? Or have a better method to do this?

86Ranger
#2
(04-17-2023, 06:48 PM)86Ranger Wrote: Hi all, and thanks in advance for the help.


is there at least a space between the passwords, like: pass1 pass2 pass3, or is it all run together, like: pass1pass2pass3?

with space separation it's no problem to generate a new wordlist from that. without separation (style 2) you will probably never get your pass. why? because it is very unlikely that your script will hit the exact starting point of your pass. just imagine you are looking for a pass of length six: if the script cuts fixed blocks of 6, the pass has to start at a position where position modulo 6 = 0. or is your script working like this: start at position 0, extract 6, move to position 1, extract 6, and so on?
input, cut length 6:
ultrawhitelongpasswordline
result:
ultraw
ltrawh
trawhi
rawhit
...

this way you COULD hit your pass, but you will generate a huge new wordlist for each length (i think bruteforce of lengths 1-6 is maybe possible, but you will need to generate a list for each length from 7 up to X, and every list will consume more than 10GB on its own).

anyway, without knowing the exact password length, and with this massive file, it is very unlikely that you will find the single pass you are looking for
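
To put rough numbers on the disk usage: an input of S bytes has S - n + 1 windows of length n, and each window costs n + 1 bytes once written out (the candidate plus a newline). The following is a minimal sketch of that arithmetic in shell; the function name wordlist_bytes and the 10 GB input size are only illustrative assumptions, not figures from the actual file.

```shell
# wordlist_bytes SIZE LEN: bytes needed to store every LEN-byte window of a
# SIZE-byte input, one window per line (LEN bytes plus a newline each).
wordlist_bytes() {
    if [ "$2" -gt "$1" ]; then
        echo 0
        return
    fi
    echo $(( ($1 - $2 + 1) * ($2 + 1) ))
}

# Illustrative numbers only: a 10 GB (10^10 byte) input at three lengths.
for len in 7 16 32; do
    echo "length $len: $(wordlist_bytes 10000000000 "$len") bytes"
done
```

For a 10 GB input this works out to roughly 80 GB at length 7 and roughly 330 GB at length 32, so the per-length lists grow almost linearly with the candidate length.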
#3
(04-20-2023, 02:53 PM)Snoopy Wrote:
is there at least a space between the passwords, like: pass1 pass2 pass3, or is it all run together, like: pass1pass2pass3?


Thanks for the feedback, Snoopy. To answer your question, there is no delineation between potential passwords; it is one giant single line of characters.

I was able to get the functionality I'm looking for using nested for loops on Linux, as it turns out Linux tools handle large raw files much better than PowerShell. hashcat's stdin mode is great because it avoids the issue you mentioned of MASSIVE text files. Here is what I have so far:

Code:
# The outer loop covers password length and can be modified. This one covers 6-64 character passwords.
for i in $(seq 6 64)
do
     # Rotate the starting position through the candidate length, split into $i-character lines with sed, pipe to hashcat
     for j in $(seq 1 "$i")
     do
          cut -c "$j"- [LARGE-FILE] | sed -r 's/(.{'"$i"'})/\1\n/g' | hashcat -m [MODE] -a 0 [HASH]
     done
     echo "Password Length $i Completed!"
done

The functionality is exactly as expected. For example, at i=6 with input ultrawhitelongpassword, the six starting offsets together pipe every 6-character window:
ultraw
ltrawh
trawhi
...

My question is: when running this at large password lengths (e.g., ~32 characters), hashcat returns the warning "ATTENTION! Read timeout in stdin mode. Password candidates input is too slow".

I'm assuming hashcat is consuming candidates faster than the pipeline can supply them. Do you know if there is any way to optimize hashcat in this regard? Such as having the next set of candidates piped in while it's still hashing the previous set?
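
One possible factor, offered as a guess rather than a diagnosis: sed works on whole lines, so with an input that has no line breaks it has to read the entire file before it prints the first candidate, and the loop above also restarts hashcat for every offset. The sketch below swaps in tail -c and fold -b, which both work on a byte stream, and feeds a single hashcat process. The function name windows is made up for illustration, the approach assumes coreutils-style tail and fold, and it has not been benchmarked against the original loop.

```shell
# windows FILE LEN: print every LEN-byte window of FILE, one per line.
# tail -c +K starts output at byte K (1-based); fold -b -w LEN cuts the stream
# into LEN-byte pieces without needing a newline in the input.
windows() (
    file=$1
    len=$2
    off=1
    while [ "$off" -le "$len" ]; do
        tail -c "+$off" "$file" | fold -b -w "$len" |
            awk -v n="$len" 'length($0) == n'   # drop the short piece at the end
        off=$((off + 1))
    done
)

# Hypothetical usage, one hashcat process for all lengths
# ([LARGE-FILE], [MODE] and [HASH] are the same placeholders as in the loop above):
# for len in $(seq 6 64); do windows [LARGE-FILE] "$len"; done |
#     hashcat -m [MODE] -a 0 [HASH]
```

Unlike the sed version, this drops the leftover piece that is shorter than the requested length, so each pass only emits candidates of exactly that length.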
#4
well no. you could increase the timeout, but i think once you reach such long candidates it would be a better approach to switch to really generating a dictionary from your input.

so generate a dict of length 32 and run hashcat with it, and while hashcat is running generate the dicts of length 33, 34 and so on (depending on how fast or slow hashcat is compared to your script). when hashcat finishes you can delete length 32 and switch to 33.

sorry, but i don't have any better idea
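
The hand-off described above could be scripted roughly as follows. This is only a sketch: make_list and crack_by_length are made-up helper names, the dict_N.txt files are written to the current directory, and the hashcat arguments in the usage comment are the same placeholders used earlier in the thread.

```shell
# make_list FILE LEN OUT: write every LEN-byte window of FILE to OUT.
make_list() (
    file=$1; len=$2; out=$3
    off=1
    : > "$out"
    while [ "$off" -le "$len" ]; do
        tail -c "+$off" "$file" | fold -b -w "$len" |
            awk -v n="$len" 'length($0) == n' >> "$out"
        off=$((off + 1))
    done
)

# crack_by_length FILE FIRST LAST CMD...: run CMD with each wordlist appended as
# its last argument, building the next length's list in the background meanwhile.
crack_by_length() (
    file=$1; len=$2; last=$3
    shift 3
    make_list "$file" "$len" "dict_$len.txt"
    while [ "$len" -le "$last" ]; do
        next=$((len + 1))
        pid=
        if [ "$next" -le "$last" ]; then
            make_list "$file" "$next" "dict_$next.txt" &
            pid=$!
        fi
        "$@" "dict_$len.txt"                     # attack the current length
        if [ -n "$pid" ]; then wait "$pid"; fi   # next list must be complete
        rm -f "dict_$len.txt"
        len=$next
    done
)

# Hypothetical usage:
# crack_by_length [LARGE-FILE] 32 64 hashcat -m [MODE] -a 0 [HASH]
```

At most two lists exist on disk at any time, the one being attacked and the one being generated, which bounds the space needed to roughly two lists of the largest length.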
#5
Hi 86Ranger

It seems like what you are asking for was released 2 days ago. Please see this link (https://github.com/Cynosureprime/slider) for the source and for release binaries (win/nix).

Please read the instructions carefully and note that since the max ingestion is BUFSIZ - 1, if there is no break within a block of text that size you will miss some windows of text.

Stay safe and happy password cracking.
#6
(05-03-2023, 12:52 PM)blazer Wrote: Hi 86Ranger


Very cool tool. I tested it on some smaller files with stdin/hashcat and it works great.

On larger files, I was not able to get it to work. It looks like it needs the entire file to be loaded into memory, which is not feasible per my original post. The files I'm dealing with are LARGE (40, 60GB+). 

Could you elaborate a little about the BUFSIZ issue? What determines its size? My files have no breaks, and the chance I might "miss some windows of text" is not a great prospect...

Thank you!
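
For context on the question: BUFSIZ is a macro from C's <stdio.h>, and its value is chosen by the C library rather than by the user (8192 on glibc, smaller on some other platforms). How slider itself uses it is not something this thread establishes. The general problem with fixed-size blocks is that a window straddling a block boundary is lost unless consecutive blocks overlap by LEN - 1 bytes. The sketch below shows that overlap technique with bounded memory; windows_chunked is a made-up name, it assumes coreutils-style tail and head, and it is not a description of how slider works.

```shell
# windows_chunked FILE LEN CHUNK: print every LEN-byte window of FILE while
# handling at most CHUNK bytes at a time. Consecutive chunks overlap by LEN - 1
# bytes so that windows straddling a chunk boundary are not lost.
windows_chunked() (
    file=$1; len=$2; chunk=$3
    size=$(wc -c < "$file")
    size=$((size))                       # strip padding some wc versions add
    step=$((chunk - len + 1))            # new window start positions per chunk
    [ "$step" -ge 1 ] || { echo "CHUNK must be at least LEN" >&2; exit 1; }
    start=0
    while [ $((start + len)) -le "$size" ]; do
        # Take CHUNK bytes beginning at byte offset START (0-based).
        tail -c "+$((start + 1))" "$file" | head -c "$chunk" |
            awk -v n="$len" '{ for (i = 1; i + n - 1 <= length($0); i++) print substr($0, i, n) }'
        start=$((start + step))
    done
)

# Hypothetical usage with 1 MiB chunks:
# windows_chunked [LARGE-FILE] 32 1048576 | hashcat -m [MODE] -a 0 [HASH]
```

Each window start position falls in exactly one chunk, so nothing is skipped or emitted twice, and the file never has to fit in memory.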